Website
This loader collects raw text from websites and ignores images and code blocks. Like other data loaders, it can refresh and track URLs, detect changes in a website's structure, and prompt the user when a change is found. Planned features include parsing media (images, videos, code blocks) and flagging data conflicts or unwanted elements such as PII.
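To illustrate what "raw text, ignoring images and code blocks" can mean in practice, here is a minimal sketch built on Python's standard html.parser module. It is not the loader's actual implementation; the names RawTextExtractor and extract_raw_text, and the exact set of skipped tags, are assumptions made for this example.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Hypothetical helper: keeps visible text, drops code blocks, scripts, styles."""

    # Assumed skip list; the real loader may treat other tags differently.
    SKIPPED_TAGS = {"pre", "code", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images carry no text nodes, so they are dropped without special handling.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_raw_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, calling extract_raw_text on a page fragment with a heading, a paragraph containing an image, and a code block returns only the heading and paragraph text.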
Website Agents can be linked to either a direct URL or a sitemap. If you provide a link to a sitemap, we'll scrape only the pages listed in that sitemap, and the scraping depth setting is ignored.
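The difference between the two link types can be sketched as follows. This is an illustration under stated assumptions, not the product's code: urls_from_sitemap and plan_scrape are hypothetical names, the default depth of 2 is made up, and the sketch handles a plain sitemap in the standard sitemaps.org format rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def plan_scrape(source_url, sitemap_xml=None, depth=2):
    """Hypothetical planner: decide which pages to scrape and whether depth applies.

    With a sitemap, only the listed pages are scraped and depth is unused (None).
    With a direct URL, scraping starts at that URL and follows links up to `depth`.
    """
    if sitemap_xml is not None:
        return {"urls": urls_from_sitemap(sitemap_xml), "depth": None}
    return {"urls": [source_url], "depth": depth}
```

Returning depth as None in the sitemap case makes it explicit that the setting plays no role there, instead of silently carrying a value that is never used.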