Text Classification · Prodigy · An annotation tool for AI, Machine Learning & NLP

Fast and flexible annotation

Prodigy’s web-based annotation app has been carefully designed to be as efficient as possible. It allows you to mix and match annotation interfaces to build the experience that works best for your task.

Text classification tasks often have multiple categories to choose between, and the categories may or may not be mutually exclusive. Prodigy has full support for all of these problem types. You can also ask “yes-or-no” questions, allowing you to zoom through the data. This binary approach is especially powerful when combined with Prodigy's active learning capabilities, because you can let the model select only the questions it's most unsure about, maximising your information per click.
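
To make the exclusive/non-exclusive distinction concrete, here is a minimal, illustrative Python sketch of how a classifier might turn raw scores into categories in each case. It is not Prodigy's or spaCy's code; the label names, logits and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Mutually exclusive categories: scores compete and sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid_scores(logits: dict[str, float]) -> dict[str, float]:
    """Non-exclusive categories: each label is scored independently in (0, 1)."""
    return {label: 1.0 / (1.0 + math.exp(-value)) for label, value in logits.items()}


def predict_exclusive(logits: dict[str, float]) -> str:
    """Pick exactly one category: the highest-scoring one."""
    scores = softmax(logits)
    return max(scores, key=scores.get)


def predict_multilabel(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Pick every category whose independent score clears the threshold."""
    scores = sigmoid_scores(logits)
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical raw model outputs for one text.
logits = {"SPORTS": 2.0, "POLITICS": 1.5, "TECH": -1.0}
print(predict_exclusive(logits))   # SPORTS
print(predict_multilabel(logits))  # ['POLITICS', 'SPORTS']
```

With exclusive categories only SPORTS wins, while the non-exclusive setup can assign both SPORTS and POLITICS to the same text.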

patterns.jsonl
{"pattern": [{"lemma": "acquire"}], "label": "COMPANY_SALE"}

Bootstrap with powerful patterns

Prodigy is a fully scriptable annotation tool, letting you automate as much as possible with custom rule-based logic. If your classes are imbalanced, you don't want to waste time labeling irrelevant examples. Instead, give Prodigy rules or a list of trigger words, review the matches in context and annotate the exceptions. As you annotate, a statistical model can learn to suggest similar examples, generalising beyond your initial patterns.
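
As a rough illustration of the idea (not Prodigy's actual pattern matcher), the sketch below reads patterns in the patterns.jsonl format, matches them against token lemmas, and keeps only the texts worth reviewing. The tiny lemma table, the whitespace tokenizer and the function names are hypothetical stand-ins for a real NLP pipeline.

```python
from __future__ import annotations

import json

# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS = {"acquired": "acquire", "acquires": "acquire", "acquiring": "acquire"}


def load_patterns(lines: list[str]) -> list[dict]:
    """Parse JSONL lines such as {"pattern": [{"lemma": "acquire"}], "label": "COMPANY_SALE"}."""
    return [json.loads(line) for line in lines if line.strip()]


def lemmatize(token: str) -> str:
    """Lowercase, strip surrounding punctuation and look up a toy lemma."""
    word = token.lower().strip(".,!?;:")
    return TOY_LEMMAS.get(word, word)


def match(text: str, patterns: list[dict]) -> list[str]:
    """Return labels whose token-lemma sequence occurs in the text.

    Only {"lemma": ...} token specs are considered; a spec without a
    "lemma" key never matches in this sketch.
    """
    lemmas = [lemmatize(tok) for tok in text.split()]
    labels = []
    for entry in patterns:
        wanted = [spec.get("lemma") for spec in entry["pattern"]]
        n = len(wanted)
        if n == 0:
            continue  # ignore empty patterns instead of matching everything
        if any(lemmas[i:i + n] == wanted for i in range(len(lemmas) - n + 1)):
            labels.append(entry["label"])
    return labels


def candidates(texts: list[str], patterns: list[dict]):
    """Yield only the texts a pattern matched, paired with the suggested label."""
    for text in texts:
        for label in match(text, patterns):
            yield {"text": text, "label": label}


patterns = load_patterns(['{"pattern": [{"lemma": "acquire"}], "label": "COMPANY_SALE"}'])
texts = ["Google acquired a startup.", "The weather is nice today."]
for task in candidates(texts, patterns):
    print(task)  # {'text': 'Google acquired a startup.', 'label': 'COMPANY_SALE'}
```

The irrelevant weather sentence never reaches the annotator, which is the point of bootstrapping with patterns on imbalanced data.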

Focus on what the model is most uncertain about

Prodigy puts the model in the loop, so that it can actively participate in the training process, using what it already knows to figure out what to ask you next. The model learns as you go, based on the answers you provide. Most annotation tools deliberately make no suggestions to the user, so as not to bias the annotations. Prodigy takes the opposite approach: let the model make suggestions, and ask the user as little as possible.
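
A common way to decide what to ask next is uncertainty sampling: for a yes-or-no question, a score near 0.5 means the model is least sure of the answer. The sketch below is a generic illustration of that idea, not Prodigy's own sorting logic; the scores and example IDs are made up.

```python
from __future__ import annotations


def uncertainty(score: float) -> float:
    """Map a binary 'yes' probability to uncertainty: 1.0 at 0.5, 0.0 at 0 or 1."""
    return 1.0 - abs(score - 0.5) * 2


def most_uncertain(scored: list[tuple[float, str]], k: int) -> list[tuple[float, str]]:
    """Return the k (score, example) pairs the model is least sure about."""
    return sorted(scored, key=lambda pair: uncertainty(pair[0]), reverse=True)[:k]


# Hypothetical model scores for four unlabelled examples.
scored = [(0.97, "a"), (0.52, "b"), (0.10, "c"), (0.45, "d")]
print([example for _, example in most_uncertain(scored, 2)])  # ['b', 'd']
```

Confident predictions such as 0.97 or 0.10 are skipped, so each answer the annotator gives is more likely to change what the model believes.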

Example
prodigy train ./textcat-model --textcat dataset_a,dataset_b --textcat-multilabel dataset_c --training.max-steps 1000

Example
prodigy textcat.correct text-dataset ./textcat-model examples.jsonl --label pos,neu,neg

Immediately train custom text classifiers

Once you've got your first annotations, you can immediately have Prodigy train a spaCy model (https://spacy.io/). Point the train command at the datasets of interest and you get a machine learning pipeline for text classification. You can even train a model that handles multiple tasks, and override its settings from the command line.
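
The command-line override in the train example (--training.max-steps 1000) addresses a nested setting through a dotted path. The sketch below shows one way such an override can be applied to a nested config dictionary. It assumes that dashes in the flag correspond to underscores in config keys and that values are parsed as JSON where possible; it is an illustration, not Prodigy's or spaCy's implementation, and the config values are hypothetical.

```python
from __future__ import annotations

import json


def parse_value(raw: str):
    """Interpret a CLI value as JSON (numbers, booleans, quoted strings), else keep the string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_override(config: dict, flag: str, value) -> dict:
    """Set a nested setting from a dotted flag such as "--training.max-steps".

    Assumption: dashes in the flag correspond to underscores in config keys.
    """
    *sections, leaf = flag.lstrip("-").replace("-", "_").split(".")
    node = config
    for section in sections:
        node = node.setdefault(section, {})  # create missing sections on the way down
    node[leaf] = value
    return config


# Hypothetical starting config; the values are illustrative only.
config = {"training": {"max_steps": 20000, "dropout": 0.1}}
apply_override(config, "--training.max-steps", parse_value("1000"))
print(config)  # {'training': {'max_steps': 1000, 'dropout': 0.1}}
```

Only the addressed key changes; sibling settings such as dropout keep their values.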

From here, you can re-use the model to make annotation easier via textcat.correct, or even use it for active learning via textcat.teach.
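
In a correction workflow the model's predictions are shown as a starting point and the annotator only fixes the mistakes. The sketch below illustrates that idea with a hypothetical preselect helper, a hypothetical task layout and a 0.5 threshold, using the pos, neu and neg labels from the textcat.correct example. It is not how textcat.correct is implemented.

```python
from __future__ import annotations


def preselect(scores: dict[str, float], exclusive: bool, threshold: float = 0.5) -> list[str]:
    """Turn model scores into suggested labels for the annotator to confirm or fix."""
    if exclusive:
        # Mutually exclusive labels: suggest at most the single best label.
        best = max(scores, key=scores.get)
        return [best] if scores[best] >= threshold else []
    # Non-exclusive labels: suggest every label that clears the threshold.
    return sorted(label for label, score in scores.items() if score >= threshold)


def make_task(text: str, scores: dict[str, float], exclusive: bool, threshold: float = 0.5) -> dict:
    """Build a hypothetical task dict with the model's suggestions pre-selected."""
    return {
        "text": text,
        "labels": sorted(scores),
        "suggested": preselect(scores, exclusive, threshold),
    }


print(make_task("Great service!", {"pos": 0.7, "neu": 0.2, "neg": 0.1}, exclusive=True))
# {'text': 'Great service!', 'labels': ['neg', 'neu', 'pos'], 'suggested': ['pos']}
```

When the model is right, the annotator just accepts; when no label clears the threshold, nothing is pre-selected and the choice is left to the annotator.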
