Taxonomy
The taxonomy agent is coming soon! The documentation below explains the agent as it exists, but development is still in progress and much of it is subject to change.
The taxonomy agent allows you to generate taxonomies from the data you load into the Katara platform. A taxonomy is a representation of your data as a list of topics and the keywords associated with each topic.
Runs are at the core of the taxonomy agent's functionality: each run generates a taxonomy. An agent can have many runs, and it can also trigger runs on a schedule. Each agent also has a default run, which is the run used by other agents connected to the agent unless another run is specifically chosen. By default, the default run for a taxonomy agent is the most recent successful run. If you set a specific run as the default instead, the agent no longer uses the most recent successful run as its default output, even if more runs are created.
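The default-run rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the platform's implementation: the `Run` fields, the `resolve_default_run` function, and the behaviour when a pinned run cannot be found are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Run:
    # Hypothetical fields; the real run record may look different.
    run_id: str
    finished_at: datetime
    succeeded: bool


def resolve_default_run(runs: list[Run], pinned_run_id: Optional[str] = None) -> Optional[Run]:
    """Pick the run that connected agents would read from."""
    if pinned_run_id is not None:
        # A run explicitly set as the default wins, even if newer runs exist.
        for run in runs:
            if run.run_id == pinned_run_id:
                return run
        return None  # assumption: an unknown pinned run yields no default
    # Otherwise fall back to the most recent successful run.
    successful = [run for run in runs if run.succeeded]
    if not successful:
        return None
    return max(successful, key=lambda run: run.finished_at)


example_runs = [
    Run("run-1", datetime(2024, 1, 1), succeeded=True),
    Run("run-2", datetime(2024, 1, 2), succeeded=True),
    Run("run-3", datetime(2024, 1, 3), succeeded=False),
]
```

With no pinned run, `resolve_default_run(example_runs)` selects `run-2`, because `run-3` is newer but did not succeed. Passing `pinned_run_id="run-1"` keeps `run-1` as the default regardless of later runs.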
There are three ways to start a taxonomy run: manually, on a schedule, or by re-training an existing run. All three use your agent's configuration options when the run starts, and all require the agent to be enabled. To start a run manually, go to the runs tab of your taxonomy agent and click the button labelled "Start a new run". From here, you'll have the option to adjust your agent's configuration before starting the run. To start a run on a schedule, configure the "Schedule" option under the details tab of your taxonomy agent. Finally, to start a re-training run, open an existing run of your taxonomy agent and click the "Re-train" button. Re-training differs from scheduled and manual runs in that it starts with the model generated by the existing run and continues to train it. This can be useful if you want to refine an existing run rather than re-run from scratch.
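To make the differences between the three triggers concrete, here is a small Python sketch of the checks involved. The names (`Agent`, `RunRequest`, `plan_run`, `example_option`) are hypothetical, and the sketch assumes that configuration adjustments made before a manual run apply to that run only; the documentation above does not specify this.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

TRIGGERS = {"manual", "scheduled", "retrain"}


@dataclass
class Agent:
    enabled: bool
    config: dict = field(default_factory=dict)


@dataclass
class RunRequest:
    trigger: str
    config: dict
    base_run_id: Optional[str]  # set only for re-training runs


def plan_run(agent: Agent, trigger: str, overrides: Optional[dict] = None,
             base_run_id: Optional[str] = None) -> RunRequest:
    """Describe the run that a given trigger would start."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    if not agent.enabled:
        # Every kind of run requires the agent to be enabled.
        raise RuntimeError("agent must be enabled to start a run")
    if trigger == "retrain" and base_run_id is None:
        # Re-training continues from the model of an existing run.
        raise ValueError("re-training needs an existing run to start from")
    if trigger != "retrain" and base_run_id is not None:
        raise ValueError("only re-training runs start from an existing run")
    # Every run starts from a copy of the agent's configuration.
    config = dict(agent.config)
    if trigger == "manual" and overrides:
        # Assumption: manual adjustments affect this run only.
        config.update(overrides)
    return RunRequest(trigger=trigger, config=config, base_run_id=base_run_id)


example_agent = Agent(enabled=True, config={"example_option": 10})
manual = plan_run(example_agent, "manual", overrides={"example_option": 5})
retrain = plan_run(example_agent, "retrain", base_run_id="run-2")
```

Here `manual` carries the adjusted configuration while `example_agent.config` is left unchanged, and `retrain` records the run whose model it continues to train.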
After a run has successfully completed, you can click on it to view and edit its details. The output of a run consists of generated topics and their keywords, as well as a generated description for the entire taxonomy. This page also provides the ability to change the run's name, leave notes for yourself, and view the configuration that was used to generate the run.
Topics are the primary output of a taxonomy run. Each topic has a name and a set of keywords that the model determined to be part of that topic. We also expose the BERTopic name, a generally less human-readable name that the model itself gives to each topic. This name can be useful for debugging.
For each topic, you can configure two properties: the name, and whether to include the topic in the output. The name of a topic maps directly to a tag name, which can be used later to tag documents classified with this topic. You might want to edit this name if a different name makes more sense to you, given the keywords of the topic. The name has no effect on any models, so you are free to choose whatever name works best for you. Each topic also has a toggle for including it in the output of the run. If this toggle is disabled, the topic is left out whenever the run is used. By default, all topics are included in the output.
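The two editable properties can be pictured with a short Python sketch. The `Topic` class, its field names, the `output_tags` helper, and the sample values (including the BERTopic-style name) are illustrative assumptions, not the platform's data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Topic:
    name: str            # editable; maps directly to a tag name
    bertopic_name: str   # name assigned by the model, kept for debugging
    keywords: list[str]
    included: bool = True  # all topics are included by default


def output_tags(topics: list[Topic]) -> list[str]:
    """Return the tag names of the topics included in the run's output."""
    return [topic.name for topic in topics if topic.included]


example_topics = [
    Topic("invoice payment", "0_invoice_payment_refund", ["invoice", "payment", "refund"]),
    Topic("login password", "1_login_password_reset", ["login", "password", "reset"]),
]

# Renaming changes only the tag name; the model's name and keywords stay as they were.
example_topics[0].name = "Billing"
# Turning the toggle off removes the topic from the output.
example_topics[1].included = False

tags = output_tags(example_topics)
```

After these two edits, `tags` contains only `"Billing"`, while the first topic's `bertopic_name` and keywords are untouched.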