# What is a Corpus?

### A corpus is a collection of documents used to inform your AI agents, allowing them to produce personalized responses and actions.

### The problem

AI models like ChatGPT are trained on vast amounts of data, spanning everything from cat pictures to Reddit threads about baking cookies. Much of this data is not directly helpful for organizational needs such as technical writing or answering questions about a specific set of documentation.

Additionally, a foundation model's training data is frozen at a certain date, so the model is unaware of anything about a topic that happened after that date. As a result, its answers are often incomplete or misleading.

### Solution: Extending AI context

A Corpus is the knowledge source for [retrieval-augmented generation (RAG)](https://cloud.google.com/use-cases/retrieval-augmented-generation), a method for extending AI context to include relevant, targeted information about a specific topic.

Katara allows [organizations](/about/organizations/manage-your-organization.md) to use data loader agents to pull information from live sources such as Discord, Slack, websites, and GitHub. This data is collected and categorized in your Corpus.
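
To make the idea of extending AI context concrete, here is a minimal sketch of the retrieval step in a RAG workflow. It is not Katara's implementation: the `retrieve` and `build_prompt` functions, the word-overlap scoring, and the sample documents are all assumptions chosen for illustration. Real systems typically use embedding-based search instead of word overlap.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, text):
    """Toy relevance score: number of query words that also appear in the text."""
    query_words = Counter(tokenize(query))
    text_words = Counter(tokenize(text))
    return sum(min(count, text_words[word]) for word, count in query_words.items())


def retrieve(query, corpus, k=2):
    """Return up to k documents with the highest non-zero overlap score."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc["text"]), reverse=True)
    return [doc for doc in ranked[:k] if score(query, doc["text"]) > 0]


def build_prompt(query, corpus, k=2):
    """Prepend the retrieved documents to the question as extra context."""
    context = "\n".join(
        f"[{doc['source']}] {doc['text']}" for doc in retrieve(query, corpus, k)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical corpus entries, standing in for data pulled from live sources.
corpus = [
    {"source": "github", "text": "Release 2.1 adds a retry flag to the sync command."},
    {"source": "slack", "text": "The team lunch moved to Thursday."},
    {"source": "website", "text": "The sync command uploads local changes to the server."},
]

question = "What does the sync command do?"
prompt = build_prompt(question, corpus)
print(prompt)
```

The resulting prompt is what would be sent to the model: the question plus only the documents that appear relevant to it, rather than the model's general training data alone.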

### Key Concepts

To effectively manage your corpus, you should understand how Katara handles document access and organization:

* [**Collections**](/about/corpuses/what-is-a-corpus/collections.md): Organize your documents into logical groups to scope AI context.
* [**Document Ownership**](/about/corpuses/what-is-a-corpus/document-ownership.md): Understand who is responsible for each document and how ownership is assigned.
* [**Sharing**](/about/corpuses/what-is-a-corpus/sharing.md): Learn how to grant access to specific users within your organization.
* [**Visibility**](/about/corpuses/what-is-a-corpus/visibility.md): Discover the rules that determine who can see and search for your documents.
* [**Sensitivity Classification**](/about/corpuses/what-is-a-corpus/sensitivity-classification.md): Protect your most confidential data with clearance-based access.

Generative agents use the corpus to inform answers and content, which is critical to producing meaningful and accurate answers. You can refresh the links periodically or automatically to pull in the latest data about the topic.
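
As a rough illustration of how collections and clearance-based access from the Key Concepts list could narrow what an agent sees, consider the sketch below. The `Document` fields, the numeric sensitivity scale, and the `visible_documents` function are hypothetical and do not reflect Katara's actual data model or API; see the linked pages for the real rules.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    collection: str   # hypothetical: name of the group the document belongs to
    sensitivity: int  # hypothetical scale: 0 = public, higher = more restricted


def visible_documents(documents, collection, clearance):
    """Keep documents in the given collection that the clearance level permits."""
    return [
        doc
        for doc in documents
        if doc.collection == collection and doc.sensitivity <= clearance
    ]


# Hypothetical sample data.
documents = [
    Document("API guide", collection="docs", sensitivity=0),
    Document("Security runbook", collection="docs", sensitivity=2),
    Document("Hiring plan", collection="hr", sensitivity=1),
]

allowed = visible_documents(documents, collection="docs", clearance=1)
print([doc.title for doc in allowed])
```

In this sketch, an agent scoped to the `docs` collection with clearance `1` would draw context only from the API guide.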

<figure><img src="/files/1FeQeZC2zD3mwJ6eE0qW" alt=""><figcaption></figcaption></figure>

