Instructor: The Most Popular Library for Simple Structured Outputs
This blog is your step-by-step guide to installing and using Instructor. We'll break down the code, share sample outputs, provide helpful resources, and touch on advanced use cases. By the end, you'll be ready to leverage structured outputs in your AI applications.

Introduction
As AI models like GPT become more powerful and flexible, developers are often faced with a challenge: how do we get structured outputs from large language models (LLMs)? Enter Instructor, a library designed to simplify structured data extraction from LLMs. In this blog post, we'll explore what makes Instructor so effective, break down the code, and understand how you can integrate it with models like OpenAI's GPT and Cohere's models.
Why Use Instructor?
Instructor makes it easy to prompt LLMs for structured outputs, such as JSON data. Instead of receiving unstructured text, you can request LLMs to provide responses in the format you need. This is especially useful for:
- Form Data Extraction: Automating extraction of specific fields from documents.
- APIs & Automation: Structuring data for APIs or downstream processing.
- Enterprise Use-Cases: Tasks that require predictable and structured results.
- Data Pipelines: When you need clean, structured data for analytics or reporting.
- Chatbots and Assistants: Ensuring responses from AI assistants follow a predictable format.
Instructor abstracts away this complexity, enabling you to build robust applications faster. By specifying a schema for the output, you get responses that are validated against the fields and types you expect.
Installation
First, let's install the necessary libraries. The notebook starts with a simple installation step:
!pip install instructor openai==1.57.4 cohere --quiet
- Instructor: The main library for structured outputs.
- OpenAI: For accessing OpenAI models like GPT-3.5 and GPT-4.
- Cohere: An alternative to OpenAI, providing different LLM capabilities.
Output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 249.9/249.9 kB 13.6 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 77.9 MB/s eta 0:00:00
This installs all necessary packages quietly (without verbose output).
Troubleshooting Installation Issues
- Network Issues: If the installation is slow or fails, check your internet connection.
- Version Conflicts: If you have older versions of libraries installed, update them using pip install --upgrade.
- Environment Issues: Ensure you're working in a clean virtual environment or Colab instance to avoid conflicts.
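If you suspect a version conflict, it can help to print what is actually installed before upgrading anything. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages used in this post (pydantic is pulled in as a dependency).
for name in ("instructor", "openai", "cohere", "pydantic"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Comparing this output against the pinned version in the install command (openai==1.57.4) is a quick way to confirm the environment matches the one used here.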
Setting Up API Keys
Next, you need API keys for OpenAI and Cohere. The code fetches these from Google Colab's userdata storage:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['CO_API_KEY'] = userdata.get('CO_API_KEY')
How to Obtain API Keys
1. OpenAI API Key:
   - Sign up at OpenAI.
   - Go to your account settings and generate a new API key.
2. Cohere API Key:
   - Sign up at Cohere.
   - Navigate to the API section and generate a new API key.
Security Tips
- Never share your API keys publicly or commit them to repositories.
- Use environment variables or secure storage options to manage keys.
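Outside Colab, one way to follow these tips is to read keys from environment variables and fail early when one is missing. This is a small sketch, not part of the original notebook; `require_env` is a hypothetical helper name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage when the keys were exported in the shell beforehand:
# openai_key = require_env("OPENAI_API_KEY")
# cohere_key = require_env("CO_API_KEY")
```

Failing at startup with the variable name in the message is usually easier to debug than an authentication error from the API later on.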
Importing Libraries
Now, let's import the required libraries:
import instructor
from openai import OpenAI
from pydantic import BaseModel
- Instructor: The core library for handling structured outputs.
- OpenAI: For interfacing with OpenAI's models.
- Pydantic: For defining structured data models.
What is Pydantic?
Pydantic is a powerful data validation and parsing library in Python. It allows you to define schemas (structured models) for your data using Python classes. These schemas ensure that data conforms to the expected format and type, providing a reliable way to validate incoming data and prevent errors. Pydantic is particularly useful when you need to ensure consistency and correctness of data in applications.
Key Features of Pydantic
- Type Enforcement: Ensures that data matches specified types, such as str, float, int, or custom types.
- Validation: Automatically validates data against the defined schema and raises clear error messages if the data is incorrect.
- Serialization/Deserialization: Converts data between different formats (e.g., JSON to Python objects and vice versa).
- Nested Models: Supports defining complex schemas with nested data structures.
- Error Handling: Provides detailed error messages when validation fails, making debugging easier.
- Automatic Data Parsing: Automatically parses input data, transforming it to the correct types.
Example of Pydantic Model
Here's an example of a simple Pydantic model:
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
    email: str
# Creating a User instance
user = User(name="Alice", age=30, email="alice@example.com")
print(user)
Output:
name='Alice' age=30 email='alice@example.com'
If you provide incorrect data types, Pydantic will raise a validation error:
try:
    user = User(name="Alice", age="thirty", email="alice@example.com")
except Exception as e:
    print(e)
Output (abridged, in Pydantic v1 wording; Pydantic v2 reports the same failure with a different message):
age value is not a valid integer (type=type_error.integer)
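The nested-model, parsing, and serialization features listed earlier can be sketched in a few lines. This example assumes Pydantic v2 (for `model_dump_json`); the `Address` and `Customer` models are invented for illustration.

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    age: int
    addresses: List[Address]  # nested models


# Deserialization: plain dicts are parsed into typed objects, and the
# numeric string "42" is converted to an integer by default.
customer = Customer(
    name="Bob",
    age="42",
    addresses=[{"city": "Pune", "country": "India"}],
)

print(type(customer.age).__name__)
print(customer.addresses[0].city)

# Serialization back to a JSON string (Pydantic v2 method name).
print(customer.model_dump_json())
```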
Why Use Pydantic with Instructor?
When combined with Instructor, Pydantic helps define the structure of the data you expect from an LLM. This means you can:
- Enforce Data Integrity: Ensure the LLM’s response conforms to your schema.
- Reduce Errors: Identify and handle invalid outputs gracefully.
- Streamline Processing: Easily integrate structured outputs into your workflows, APIs, and data pipelines.
Instructor uses Pydantic models to guide the LLM in generating consistent, structured outputs, making your applications more reliable and easier to maintain.
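Data integrity can go beyond plain types. As a hedged sketch, the model below adds range constraints and descriptions with Pydantic's `Field`; the `Person` model and the `try_parse` helper are hypothetical names. Field descriptions end up in the model's JSON Schema, which is one way to give the LLM more context about each field.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="Full name of the person")
    age: int = Field(ge=0, le=150, description="Age in years")


def try_parse(data: dict) -> Optional[Person]:
    """Return a Person if the data is valid, otherwise None."""
    try:
        return Person(**data)
    except ValidationError as exc:
        print(f"Rejected: {len(exc.errors())} validation error(s)")
        return None


print(try_parse({"name": "Alice", "age": 30}))
print(try_parse({"name": "Alice", "age": -5}))
```

The second call is rejected because the age violates the `ge=0` constraint, even though it is a valid integer.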
Creating a Structured Data Model
Here's an example of how to define a structured output using Pydantic and Instructor:
class WeatherResponse(BaseModel):
    location: str
    temperature: float
    condition: str
In this example:
- WeatherResponse: A Pydantic model specifying the desired fields:
  - location: Name of the location (string).
  - temperature: The temperature in degrees (float).
  - condition: The weather condition (string).
This model tells the LLM to output responses matching this structure.
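To see the structure this model describes, you can print the JSON Schema that Pydantic derives from it. This is a minimal sketch assuming Pydantic v2 (`model_json_schema`); Instructor builds on a schema like this when it asks the LLM for structured output, though the exact request format depends on the provider and mode.

```python
import json

from pydantic import BaseModel


class WeatherResponse(BaseModel):
    location: str
    temperature: float
    condition: str


# Derive a JSON Schema from the model (Pydantic v2 method name).
schema = WeatherResponse.model_json_schema()
print(json.dumps(schema, indent=2))
```

The schema lists all three fields as required and maps `float` to the JSON type `number`.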
Why Use Structured Models?
- Consistency: Ensures the LLM output follows a predictable structure.
- Error Reduction: Reduces the chances of unexpected or unusable data.
- Easier Parsing: Simplifies downstream processing and integration with APIs or databases.
Error Handling
Instructor can gracefully handle errors when the model output doesn't match the expected structure. If the LLM returns output that doesn't match the defined Pydantic model, validation fails and Instructor raises an error that your code can catch.
Example of Error Handling
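import instructor
from openai import OpenAI

# Wrap the OpenAI client so that create() accepts response_model
client = instructor.from_openai(OpenAI())

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Give me a list of temperatures."}],
        response_model=WeatherResponse,
    )
    print(response)
except Exception as e:
    # Raised when the output cannot be validated against WeatherResponse
    print("Error:", e)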
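Output (illustrative; the exact message depends on the model's reply and on the installed library versions):
Error: 1 validation error for WeatherResponse response -> location field required (type=value_error.missing)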
This helps ensure your application can handle unexpected outputs gracefully.
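You can reproduce this kind of failure offline, without calling an API, by validating a JSON string directly against the model. The sketch below assumes Pydantic v2 (`model_validate_json`); `raw_output` is a made-up stand-in for a model reply that is missing a field.

```python
from pydantic import BaseModel, ValidationError


class WeatherResponse(BaseModel):
    location: str
    temperature: float
    condition: str


# Hypothetical raw model output that lacks the "location" field.
raw_output = '{"temperature": 21.5, "condition": "sunny"}'

try:
    WeatherResponse.model_validate_json(raw_output)
    problems = []
except ValidationError as exc:
    # Each entry names the failing field ("loc") and the reason ("msg").
    problems = [(err["loc"], err["msg"]) for err in exc.errors()]

for loc, msg in problems:
    print(loc, msg)
```

Inspecting `exc.errors()` rather than the printed message gives you structured details you can log or act on, for example to decide whether retrying the request is worthwhile.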
Conclusion
The Instructor library is a powerful tool for extracting structured data from large language models like OpenAI's GPT and Cohere's models. By combining the flexibility of LLMs with the precision of Pydantic schemas, Instructor lets you build applications that require consistent, structured outputs with ease.
Key Takeaways:
- Ease of Use: Instructor simplifies prompting for structured outputs.
- Consistency: Ensure predictable results by defining Pydantic schemas.
- Flexibility: Works with both OpenAI and Cohere models.
- Robustness: Built-in error handling for invalid outputs.
Whether you're building chatbots, automating data pipelines, or working on enterprise AI solutions, Instructor can help streamline your development process.
Resources
- Instructor GitHub Repository: Instructor on GitHub
- OpenAI API Documentation: OpenAI Docs
- Cohere API Documentation: Cohere Docs
- Pydantic Documentation: Pydantic Docs
- Instructor Build Fast with AI: Notebook