Website

This loader collects raw text from websites, ignoring images and code blocks. Like other data loaders, it can refresh and track URLs: it detects changes in the website structure and prompts the user when it finds them. Planned features include parsing media (images, videos, code blocks) and flagging data conflicts or unwanted elements such as personally identifiable information (PII).

Only one Website Agent can be attached to a given URL.

Website Agents can be linked to either a direct URL or a sitemap. If you provide a link to a sitemap, only the pages listed in the sitemap are scraped, and the scraping depth setting is ignored.
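
To preview which pages a sitemap link would cover, you can list the entries of the sitemap yourself. The following is a minimal sketch, not the loader's actual implementation: it assumes a standard `<urlset>` sitemap as defined by the sitemaps.org protocol, and the function name `pages_in_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def pages_in_sitemap(sitemap_xml: str) -> list[str]:
    """Return the page URLs listed in a <urlset> sitemap document.

    Hypothetical helper for illustration; it does not follow
    <sitemapindex> files that point to further sitemaps.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    # Each <url> entry carries its page address in a <loc> child.
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Illustrative sitemap content (made-up URLs).
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""

if __name__ == "__main__":
    for page in pages_in_sitemap(EXAMPLE_SITEMAP):
        print(page)
```

Only the URLs returned by a listing like this would be in scope for a sitemap-linked agent; pages that are reachable by links but absent from the sitemap would not be scraped.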

How to find the sitemap of a website
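
Two common places to look are the `Sitemap:` lines in a site's `robots.txt` file and the conventional `/sitemap.xml` path at the site root. The sketch below builds a list of candidate sitemap URLs from those two sources. It is an illustration under those assumptions rather than the product's own lookup; the function name `sitemap_candidates` and the example URLs are hypothetical, and a site may publish its sitemap elsewhere or not at all.

```python
from urllib.parse import urljoin


def sitemap_candidates(site_url: str, robots_txt: str = "") -> list[str]:
    """List likely sitemap URLs for a site.

    Hypothetical helper: takes the already-downloaded text of robots.txt
    (if any) and returns its "Sitemap:" entries first, followed by the
    conventional /sitemap.xml location as a fallback guess.
    """
    candidates = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" is preserved.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    # Resolve /sitemap.xml against the site root, whatever page was given.
    fallback = urljoin(site_url, "/sitemap.xml")
    if fallback not in candidates:
        candidates.append(fallback)
    return candidates


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap_index.xml\n"
    for url in sitemap_candidates("https://example.com/blog/post", robots):
        print(url)
```

Each candidate still has to be opened to confirm that it exists and contains valid sitemap XML.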
