Katara

Website


Last updated 2 months ago

This loader collects raw text from websites, ignoring images and code blocks. Like other data loaders, it can refresh and track URLs, detecting changes in the website structure and prompting the user accordingly. Planned features include parsing media (images, videos, code blocks) and flagging data conflicts or unwanted elements such as PII.
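
To make "raw text, ignoring images and code blocks" concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not Katara's actual loader: the class `RawTextExtractor`, the function `extract_raw_text`, and the choice of skipped tags (`pre`, `code`, `script`, `style`) are all assumptions made for this example.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and skips code-like blocks."""

    # Assumption: these tags stand in for "code blocks" and non-content markup.
    SKIPPED_TAGS = {"pre", "code", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Images carry no text data, so they are dropped without special handling.
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_raw_text(html):
    """Return the text of an HTML string without images or code blocks."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<h1>Docs</h1><p>Install the tool.</p>"
    '<img src="a.png" alt="diagram">'
    "<pre><code>pip install x</code></pre><p>Done.</p>"
)
print(extract_raw_text(sample))
```

With this sample page, the heading and paragraph text survive while the image and the contents of the `pre`/`code` block are left out.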

You cannot have multiple website agents attached to the same URL.

Website agents can be linked to either a direct URL or a sitemap. If you provide a link to a sitemap, we'll scrape only the pages defined in the sitemap, and the scraping depth setting is ignored.
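
To check in advance which pages a sitemap defines, you can list its `<loc>` entries yourself. The sketch below assumes a standard sitemaps.org `urlset` document and uses only the Python standard library. The function name `urls_from_sitemap` and the `example.com` URLs are illustrative, and this is not a description of how Katara parses sitemaps internally.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the page URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap content, inlined so the example runs without a network call.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/quickstart</loc></url>
</urlset>
"""

for url in urls_from_sitemap(sample_sitemap):
    print(url)
```

Only the two listed pages are returned. No links are followed from them, which mirrors the point above that scraping depth does not apply when a sitemap is provided. Sitemap index files (`<sitemapindex>`) are not handled by this sketch.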

How to Find the Sitemap of a Website (7 Options) | SEOcrawl