📑 What is a Corpus?

A corpus is a collection of documents used to inform your AI agent, allowing it to produce personalized responses and actions.

The problem

AI models like ChatGPT are trained on vast amounts of data spanning everything from cat pictures to Reddit threads about baking cookies. Much of this data is not directly helpful for organizational needs such as technical writing or answering questions about a specific set of documentation.

Additionally, a foundation model's training data is frozen at a cutoff date, so the model knows nothing about developments in a topic after that date. As a result, its answers are often incomplete or misleading.

Solution: Extending AI context

A Corpus is the knowledge base behind retrieval-augmented generation (RAG), a method for extending an AI model's context with relevant, targeted information about a specific topic. Katara lets organizations use data loader agents to pull information from live sources such as Discord, Slack, websites, and GitHub. This data is collected and categorized in your Corpus, and generative agents draw on the Corpus to inform their answers and content. This grounding is critical to producing meaningful, accurate answers. You can then refresh the linked sources periodically or automatically to pull in the latest data about the topic.
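The flow described above (collect documents from sources, retrieve the ones relevant to a question, and hand them to a generative model as context) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Katara's implementation: the `Document`, `Corpus`, `refresh`, and `build_prompt` names are hypothetical, documents are added by hand rather than by loader agents, and simple TF-IDF keyword scoring stands in for the embedding-based search that production RAG systems typically use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical source label, e.g. "github" or "discord"
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class Corpus:
    """Hypothetical in-memory corpus for illustration; not Katara's API."""

    def __init__(self) -> None:
        self.docs: list[Document] = []

    def add(self, doc: Document) -> None:
        self.docs.append(doc)

    def refresh(self, source: str, texts: list[str]) -> None:
        # Drop stale documents from one source and load its latest content.
        self.docs = [d for d in self.docs if d.source != source]
        self.docs.extend(Document(source, t) for t in texts)

    def retrieve(self, query: str, k: int = 2) -> list[Document]:
        # Rank documents by TF-IDF-weighted overlap with the query terms.
        counts = [Counter(tokenize(d.text)) for d in self.docs]
        df = Counter(term for c in counts for term in c)  # document frequency
        n = len(self.docs)
        terms = set(tokenize(query))
        scored = []
        for doc, c in zip(self.docs, counts):
            # Smoothed IDF keeps every weight positive; absent terms add 0.
            score = sum(c[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in terms)
            if score > 0:
                scored.append((score, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


def build_prompt(question: str, docs: list[Document]) -> str:
    # Prepend the retrieved passages so the model answers from them.
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


corpus = Corpus()
corpus.add(Document("github", "Release 2.1 adds a dry-run flag to the deploy command."))
corpus.add(Document("discord", "Users asked how to reset their API key from the dashboard."))
corpus.add(Document("website", "Our office is closed on public holidays."))

question = "How do I use the dry-run flag when I deploy?"
hits = corpus.retrieve(question)
print(build_prompt(question, hits))
```

In a real deployment, the prompt returned by `build_prompt` would be sent to a language model; the sketch stops at constructing it. Calling `refresh` with a source's latest content mirrors the idea of re-pulling linked sources so that retrieval reflects current data.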
