# Data Collection and Usage Policy

Katara collects only publicly available, non-sensitive personally identifiable information (PII), such as social media handles, for the sole purpose of supporting client workflows. Clients retain full ownership and control of their data at all times, including the ability to access, export, or permanently delete it on demand.

Importantly, we do not attempt to resolve, enrich, geolocate, or otherwise identify individual community members. Our platform is expressly designed to preserve user anonymity. Instead, we focus exclusively on aggregating data to analyze macro-level trends and content consumption patterns within community sub-groups. This information is used solely to inform and improve content strategy and effectiveness, not to target or profile individuals.
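
As a rough illustration of what aggregate-only analysis can look like, the sketch below counts messages per sub-group and topic without ever reading a user field. The record layout, the field names (`sub_group`, `topic`, `user_handle`), and the `aggregate_topics` helper are all hypothetical and are not taken from Katara's implementation.

```python
from collections import Counter

# Hypothetical message records; the field names are illustrative only.
messages = [
    {"sub_group": "#support", "topic": "billing", "user_handle": "alice"},
    {"sub_group": "#support", "topic": "billing", "user_handle": "bob"},
    {"sub_group": "#general", "topic": "roadmap", "user_handle": "alice"},
]


def aggregate_topics(records):
    """Count messages per (sub_group, topic); user fields are never read."""
    return Counter((r["sub_group"], r["topic"]) for r in records)


trend_counts = aggregate_topics(messages)
```

The result contains only group-level counts, so no individual handle appears in the analytics output.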

## FAQ

<details>

<summary>Which data is collected from the platform (usernames, exact content, other metadata)?</summary>

* **Katara App User** - Name, Email, Latest Login, Date Created, Browser
* **Discord** - Channel Name, User Handle, User ID, Timestamps, Message Text, Emojis (if responding to a Katara-generated answer)
* **Telegram** - Group Name, User Handle, User ID, Name (First/Last), Timestamps, Message Text
* **Slack** - Channel Name, User ID, User Handle, Name (First/Last), Timestamps, Message Text
* **Website/Web-Widget** - URL, Timestamps, Messages, IP Address (optional; configured by the user, off by default)

</details>

<details>

<summary>How is the data processed?</summary>

* **Katara App User** - Authorization and authentication of Katara processes
* **Message Text** - Evaluated for question identification. If the message text meets the question criteria, the message is submitted to the Q\&A agent to generate a response. That response is then evaluated against the response threshold settings for routing purposes.
* **Classifier enrichment** - Some data is enriched by classifiers. Katara may assign a topic, taxonomy, category, or user-generated label to inbound data to help users with filtering and analytics.
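
The message flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Katara's actual code: the question heuristic, the `qa_agent` placeholder, the confidence score, the threshold value, and the route names (`ignored`, `auto_reply`, `human_review`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical value; the real threshold is a configurable response setting.
RESPONSE_THRESHOLD = 0.75


@dataclass
class Routed:
    route: str                    # "ignored", "auto_reply", or "human_review"
    answer: Optional[str] = None


def looks_like_question(text: str) -> bool:
    """Toy stand-in for question identification."""
    starters = {"who", "what", "when", "where", "why", "how", "can", "does"}
    words = text.strip().lower().split()
    return bool(words) and (text.strip().endswith("?") or words[0] in starters)


def qa_agent(text: str) -> Tuple[str, float]:
    """Placeholder Q&A agent returning (answer, assumed confidence score)."""
    known = {
        "how do i reset my password?": ("Use the reset link on the login page.", 0.9),
    }
    return known.get(text.strip().lower(), ("I am not sure.", 0.2))


def process_message(text: str, threshold: float = RESPONSE_THRESHOLD) -> Routed:
    # Step 1: only messages that meet the question criteria go further.
    if not looks_like_question(text):
        return Routed("ignored")
    # Step 2: the Q&A agent generates a response.
    answer, score = qa_agent(text)
    # Step 3: the response is checked against the threshold for routing.
    route = "auto_reply" if score >= threshold else "human_review"
    return Routed(route, answer)
```

For example, `process_message("Nice launch today")` is not treated as a question and is ignored, while a question the placeholder agent answers with a low score is routed for review instead of being sent automatically.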

</details>

<details>

<summary>What data is stored?</summary>

See the first two answers above, which list the data that is collected and how it is processed.

</details>

<details>

<summary>How is stored data secured?</summary>

* All data is encrypted in transit and at rest.
* Access to stored data is controlled by logins for our core developers and by Auth0 for clients, who have full access via the Katara App UI. The Organizational “Owner” account is responsible for inviting other members and assigning them access roles and permissions.
* Some data may occasionally be replicated into Google Cloud Logging for application debugging and monitoring.

</details>

<details>

<summary>What attributes of the data are labeled in storage?</summary>

If users opt in, we label the data with Topic, Taxonomy, or Categorical labels, depending on the workflow. Users can also explicitly label source data when it is loaded by configuring labels in several agent UIs. For example, #Discord-Channel-Name or “GitBook” could be a label if the user sets it up. Labels are helpful for filtering and analytics.
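
To make the role of labels concrete, here is a small sketch of attaching labels at load time and filtering on them afterwards. The `SourceItem` type and both helper functions are hypothetical names used only for illustration; they do not describe Katara's storage schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SourceItem:
    text: str
    labels: Set[str] = field(default_factory=set)


def load_with_labels(texts: List[str], labels: Set[str]) -> List[SourceItem]:
    """Attach the configured labels to every item as it is loaded."""
    return [SourceItem(text=t, labels=set(labels)) for t in texts]


def filter_by_label(items: List[SourceItem], label: str) -> List[SourceItem]:
    """Keep only the items that carry the given label."""
    return [item for item in items if label in item.labels]


# Example: two sources, each loaded with its own user-configured label.
items = (
    load_with_labels(["How do I stake?"], {"#Discord-Channel-Name"})
    + load_with_labels(["Staking guide", "API reference"], {"GitBook"})
)
gitbook_texts = [item.text for item in filter_by_label(items, "GitBook")]
```

Filtering on `"GitBook"` here returns only the two items loaded with that label, which is the kind of narrowing that labels enable for analytics.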

</details>

<details>

<summary>What is your data retention and destruction policy?</summary>

We abide by GDPR and follow a contractual, consent-based model. We agree to store your data for as long as you maintain an active subscription, and we store data only with your consent. You have full access to all of your data via the Content Explorer UI, where you can delete any data you no longer wish to store. Upon cancellation or deletion of your “Organization” account on Katara, you can request an extract of all captured data or request its deletion.

</details>

<details>

<summary>Is our data used to train models? Or is our data the input to models only? Where are results stored and how are they secured?</summary>

If you opt in to creating an “Organizational” taxonomy, we will use your data to train a classification model that labels and enriches your own dataset. These models are available only to your organization and are not shared across organizations. Their lifecycle follows that of all other organization-specific data: they will be destroyed or transferred to the organization upon request.

Support and Q\&A workflows rely on retrieval-augmented generation (RAG), so no model is trained for those automations. Similarly, other generative AI workflows (for example, those in which we use the data to produce a rewritten article or a new piece of content) draw on existing captured content specific to your organization.

All results are loaded back into the same DocumentDB that original content is stored in. Occasionally custom research/information is shared directly with clients via Google Drive, Google Sheets, Google Docs, or Notion.

</details>

<details>

<summary>How is our data secured in transit?</summary>

Industry-standard encryption is used when moving data into or out of any integrated platform.

</details>

