Text Classification
Whether you’re doing intent detection, information extraction, semantic role labeling or sentiment analysis, Prodigy provides easy, flexible and powerful annotation options. Active learning keeps you efficient even if your classes are heavily imbalanced.
Fast and flexible annotation
Prodigy’s web-based annotation app has been carefully designed to be as efficient as possible. It allows you to mix and match annotation interfaces to build the experience that works best for your task.
Text classification tasks often have multiple categories to choose between, and the categories may or may not be mutually exclusive. Prodigy has full support for all of these problem types. You can also ask “yes-or-no” questions, allowing you to zoom through the data. This binary approach is especially powerful when combined with Prodigy's active learning capabilities, because you can let the model select only the questions it's most unsure about, maximising the information you gain per click.
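As a rough illustration of the difference between mutually exclusive and non-exclusive categories, the sketch below turns one annotated example into a category dictionary of the kind spaCy's text classifiers train on. The record layout (a `text` field plus an `accept` list of chosen options), the helper name `to_cats` and the example labels are assumptions made for this sketch, not Prodigy's documented export format.

```python
def to_cats(record, labels, exclusive):
    """Hypothetical helper: map a choice-style annotation to {label: score}.

    Assumes the record stores the selected options in an "accept" list.
    """
    chosen = set(record.get("accept", []))
    unknown = chosen - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if exclusive and len(chosen) != 1:
        # Mutually exclusive categories: exactly one label must be selected.
        raise ValueError("exclusive categories need exactly one selected label")
    # Every label gets an explicit score: 1.0 if selected, 0.0 otherwise.
    return {label: 1.0 if label in chosen else 0.0 for label in labels}


sentiment = {"text": "Great service, slow delivery.", "accept": ["pos"]}
topics = {"text": "Bank cuts rates as tech stocks rally.", "accept": ["FINANCE", "TECH"]}

print(to_cats(sentiment, ["pos", "neu", "neg"], exclusive=True))
print(to_cats(topics, ["FINANCE", "TECH", "SPORTS"], exclusive=False))
```

With exclusive categories (such as sentiment) exactly one label is "on"; with multilabel categories (such as topics) any number can be.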
patterns.jsonl
{"pattern": [{"lemma": "acquire"}], "label": "COMPANY_SALE"}
Bootstrap with powerful patterns
Prodigy is a fully scriptable annotation tool, letting you automate as much as possible with custom rule-based logic. If your classes are imbalanced, you don't want to waste time labeling irrelevant examples. Instead, give Prodigy rules or a list of trigger words, review the matches in context and annotate the exceptions. As you annotate, a statistical model can learn to suggest similar examples, generalising beyond your initial patterns.
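To make pattern-based pre-selection concrete, here is a deliberately simplified stand-in. It reads patterns in the same patterns.jsonl format as the example above, where token patterns follow spaCy's Matcher syntax. Unlike a real pipeline, it only handles one-token patterns and uses a tiny hand-written lemma table instead of a lemmatizer. The second pattern, the lemma table and all function names are hypothetical.

```python
import json
import re

# Toy lemma table standing in for a real lemmatizer (assumption for this sketch).
TOY_LEMMAS = {"acquired": "acquire", "acquires": "acquire", "acquiring": "acquire"}

# The first line matches the patterns.jsonl example; the second is hypothetical.
PATTERNS_JSONL = """\
{"pattern": [{"lemma": "acquire"}], "label": "COMPANY_SALE"}
{"pattern": [{"lower": "merger"}], "label": "COMPANY_SALE"}
"""


def load_patterns(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def token_matches(token, spec):
    """Check a single token against a one-token pattern spec."""
    lower = token.lower()
    if "lower" in spec:
        return lower == spec["lower"]
    if "lemma" in spec:
        return TOY_LEMMAS.get(lower, lower) == spec["lemma"]
    return False


def suggest(texts, patterns):
    """Yield (text, label, matched_token) candidates for a human to review."""
    for text in texts:
        tokens = re.findall(r"\w+", text)
        for entry in patterns:
            spec = entry["pattern"][0]  # this sketch only handles one-token patterns
            for token in tokens:
                if token_matches(token, spec):
                    yield text, entry["label"], token
                    break  # one match per pattern is enough to suggest the text


texts = [
    "Acme acquired a rival startup last week.",
    "The weather was mild today.",
    "Shareholders approved the merger.",
]
for candidate in suggest(texts, load_patterns(PATTERNS_JSONL)):
    print(candidate)
```

Only the two matching texts are surfaced; the irrelevant one is never shown, which is the point when the positive class is rare. Your accept or reject decision on each candidate then becomes the training signal.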
Focus on what the model is most uncertain about
Prodigy puts the model in the loop, so that it can actively participate in the training process, using what it already knows to figure out what to ask you next. The model learns as you go, based on the answers you provide. Most annotation tools deliberately make no suggestions to the user, to avoid biasing the annotations. Prodigy takes the opposite approach: the model makes suggestions, so the user is asked as little as possible.
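One common way to decide what to ask next is uncertainty sampling: for a binary yes-or-no question, prefer the examples whose predicted score is closest to 0.5. The sketch below illustrates that general technique with hard-coded scores standing in for a model. The function names and scores are hypothetical, and this is not Prodigy's internal sorting logic.

```python
def uncertainty(score):
    """1.0 when the model is maximally unsure (score 0.5), 0.0 when fully certain."""
    return 1.0 - abs(score - 0.5) * 2


def most_uncertain(examples, score_fn, k):
    """Return the k examples the model is least sure about, most uncertain first."""
    # sorted() is stable, so ties keep their original order.
    return sorted(examples, key=lambda ex: uncertainty(score_fn(ex)), reverse=True)[:k]


# Hypothetical model scores for one label (probability that the answer is "yes").
toy_scores = {
    "I loved every minute of it.": 0.97,
    "It was fine, I guess.": 0.52,
    "Worst purchase ever.": 0.03,
    "Not bad, not great.": 0.41,
}
queue = most_uncertain(list(toy_scores), toy_scores.get, k=2)
print(queue)  # → ['It was fine, I guess.', 'Not bad, not great.']
```

The confident predictions (0.97 and 0.03) are skipped because an answer there would teach the model little. In a real loop the model is updated as answers come in, so the scores, and therefore the queue, keep changing while you annotate.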
Example
prodigy train ./textcat-model --textcat dataset_a,dataset_b --textcat-multilabel dataset_c --training.max-steps 1000
Example
prodigy textcat.correct text-dataset ./textcat-model examples.jsonl --label pos,neu,neg
Immediately train custom text classifiers
Once you've got your first annotations, you can immediately have Prodigy train a spaCy (https://spacy.io/) model. Point the train command at the datasets of interest and you immediately get a machine learning pipeline for text classification. You can even train a model that handles multiple tasks and override its settings from the command line.
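Binary yes-or-no answers have to be combined into training targets before a classifier can learn from them. A minimal sketch, assuming each answer is a record with `text`, `label` and an `answer` of "accept", "reject" or "ignore" (modelled on binary annotation tasks, but treat the field names as assumptions): accepted labels become 1.0, rejected ones 0.0, and labels that were never asked about are left out, because the annotator said nothing about them. The helper `merge_binary_answers` and the `LAWSUIT` label are hypothetical, and this is not how Prodigy's train command works internally.

```python
from collections import defaultdict


def merge_binary_answers(records):
    """Combine per-label yes/no answers into one partial cats dict per text."""
    cats_by_text = defaultdict(dict)
    for rec in records:
        if rec["answer"] == "accept":
            cats_by_text[rec["text"]][rec["label"]] = 1.0
        elif rec["answer"] == "reject":
            cats_by_text[rec["text"]][rec["label"]] = 0.0
        # "ignore" answers carry no information and are skipped.
        # If the same text/label pair is answered twice, the last answer wins.
    return dict(cats_by_text)


answers = [
    {"text": "Acme acquired a rival startup.", "label": "COMPANY_SALE", "answer": "accept"},
    {"text": "Acme acquired a rival startup.", "label": "LAWSUIT", "answer": "reject"},
    {"text": "The weather was mild today.", "label": "COMPANY_SALE", "answer": "reject"},
    {"text": "Unclear headline.", "label": "COMPANY_SALE", "answer": "ignore"},
]
print(merge_binary_answers(answers))
```

Keeping unanswered labels out of the dictionary, rather than defaulting them to 0.0, avoids teaching the model that an unasked question had a "no" answer.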
From here, you can reuse the model to make annotation easier via textcat.correct, or even use it for active learning via textcat.teach.