Large Language Models

Nothing is stopping you from integrating Prodigy with services that can help you annotate. This includes services that provide large language models that offer zero/few-shot learning. Prodigy provides a few built-in recipes to help you get started.

From OpenAI to Prodigy diagram

Named Entity Recognition

You can use ner.openai.correct to annotate examples with live suggestions from OpenAI. This recipe marks entity predictions obtained from a large language model and allows you to accept them as correct or to manually curate them. Alternatively, you can fetch examples ahead of time: the ner.openai.fetch recipe gives you the same suggestions but downloads a large batch of examples upfront. These examples can then be annotated and corrected via the ner.manual recipe.

Both recipes can be used to detect entities that spaCy models aren't trained on, and you're free to adapt them: you can provide examples so that OpenAI does few-shot learning, change the hyperparameters from the command line, or send your own custom prompts.


Example

prodigy ner.openai.fetch examples.jsonl openai-out.jsonl dish,ingredient,equipment

Example of pre-highlighted entities by LLM

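Once you've fetched a batch of suggestions like this, you can load the output file into ner.manual to review and correct them. A minimal sketch, reusing the output file and labels from the example above (the dataset name ner-annotations is just illustrative):

prodigy ner.manual ner-annotations blank:en openai-out.jsonl --label dish,ingredient,equipment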

Text Classification

The recipe textcat.openai.correct lets you classify texts faster with the help of large language models. It also provides a reason why a particular label was chosen. Just like the named entity recipes, you can also choose to fetch examples upfront via the textcat.openai.fetch recipe.

By fetching the examples upfront, you'll also be able to filter based on the LLM predictions. This can be incredibly useful when you're dealing with an imbalanced classification task with a rare label: instead of going through all the examples manually, you can restrict yourself to the examples in which OpenAI predicts the label of interest.

You can also provide extra context to the prompt by adding examples to steer the large language model. Alternatively, you can customise the prompt completely by writing your own jinja2 templates.

Example

prodigy textcat.openai.fetch examples.jsonl openai-out.jsonl recipe,feedback,question

Example response from LLM with reasoning
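By way of illustration, here is a minimal Python sketch of the filtering step described above, assuming the file fetched in the example and assuming each saved task stores the suggested label under a "label" key. The exact field names in the output of textcat.openai.fetch may differ, so inspect the file before relying on this:

import json

RARE_LABEL = "question"

with open("openai-out.jsonl", encoding="utf8") as f_in, \
     open("rare-label.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        task = json.loads(line)
        # keep only the examples where the LLM suggested the rare label
        if task.get("label") == RARE_LABEL:
            f_out.write(line)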

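The fetched file, or a filtered subset like the rare-label.jsonl from the sketch above, can then be reviewed like any other source, for example with textcat.manual. The dataset name below is illustrative:

prodigy textcat.manual textcat-annotations openai-out.jsonl --label recipe,feedback,question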

Generate terminology lists from scratch

There are many ways to use a large language model with zero-shot capabilities. You can have it make predictions to pre-annotate examples, but you can also have it bootstrap terminology lists via the terms.openai.fetch recipe. These terms can then be reviewed and used for named entity recognition, span categorization or weak supervision.


Example

prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl
skateboard-tricks.jsonl
{"text": "kickflip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "nose manual", "meta": {"openai_query": "skateboard tricks"}}
{"text": "heelside flip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "ollie", "meta": {"openai_query": "skateboard tricks"}}
{"text": "frontside boardslide", "meta": {"openai_query": "skateboard tricks"}}
{"text": "5050 Grind", "meta": {"openai_query": "skateboard tricks"}}