Guidance: Structured LLM Generation

The best time to start with AI was yesterday. The second best time? Right after reading this post.
Join Build Fast with AI’s Gen AI Launch Pad 2025—a 6-week transformation program designed to accelerate your AI mastery and empower you to build revolutionary applications.
Introduction
Large Language Models (LLMs), such as OpenAI's GPT series, have significantly advanced natural language processing, enabling sophisticated applications across industries. Yet, conventional methods like fine-tuning and prompt engineering can pose challenges, including increased costs, latency, and a lack of precise control over output structure.
Guidance emerges as a game-changer, offering a programming paradigm designed to enhance interaction with LLMs. With features like structured generation, output constraints, and dynamic control logic, Guidance allows developers to optimize performance while reducing operational complexities.
This blog delves into the capabilities of Guidance, providing a step-by-step walkthrough of its features, practical applications, and advanced techniques. By the end, you'll have the tools and knowledge to:
- Efficiently interact with LLMs.
- Control and constrain model outputs.
- Implement multistep workflows for complex tasks.
- Apply Guidance in real-world scenarios with confidence.
Detailed Explanation
Setting Up Guidance
To begin, ensure your environment is configured correctly. Install the necessary libraries and set up your API key.
```python
!pip install guidance gpustat llama-cpp-python

import os
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

from guidance import models
gpt4 = models.OpenAI("gpt-4o")
```
What this does:
- `guidance`: The main library enabling structured LLM interactions.
- `gpustat`: Tracks GPU usage, useful for monitoring resource consumption in high-performance tasks.
- `llama-cpp-python`: Provides integration with LLaMA models for additional flexibility.
- `OPENAI_API_KEY`: Authenticates your session with OpenAI's API.
Why it matters: Setting up this environment ensures seamless integration with GPT models, laying the foundation for advanced use cases.
Where to apply: This setup is essential for projects requiring structured LLM interactions, such as chatbots, decision-making systems, or custom workflows.
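Before constructing the model, it can help to confirm the key actually made it into the environment. A minimal sketch (the helper name `api_key_configured` is illustrative, not part of Guidance):

```python
import os

def api_key_configured(var="OPENAI_API_KEY"):
    """Return True if the named API-key variable is set and non-empty."""
    return bool(os.environ.get(var, "").strip())

# Check before creating models.OpenAI(...), so a missing key fails fast
# with a clear message rather than deep inside an API call.
print(api_key_configured())
```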
Simple Generation
A straightforward way to generate responses is by appending a query to a model instance.
```python
lm = gpt4 + "Who won the last Kentucky Derby and by how much?"
```
Explanation:
- The `+` operator combines the model instance (`gpt4`) with the query string.
- The model processes the query and generates a response.
Expected Output:
- A specific answer detailing the winner of the Kentucky Derby and the margin of victory.
Use Case: Ideal for single-turn Q&A systems, basic chatbot applications, or quick information retrieval.
Pro Tip: Experiment with varying query structures to understand how phrasing impacts the model’s responses.
Making Decisions with select
Guidance introduces the ability to choose between predefined alternatives, enabling dynamic responses based on user input.
```python
from guidance import select

query = "Who won the last Kentucky Derby and by how much?"  # example query

gpt4 + f'''
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}
'''
```
Explanation:
- `select`: Allows the model to choose between predefined options.
- The variable `choice` stores the model's selection, influencing subsequent actions.
Expected Output:
- If the model chooses “SEARCH”: It indicates the need for external input.
- If it chooses “RESPOND”: It directly generates an answer.
Applications: This is particularly useful for scenarios like:
- Decision trees in interactive chatbots.
- Routing logic in customer support systems.
- Dynamic branching workflows.
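The routing pattern above can be sketched in plain Python: once the model's selection is captured (as `lm["choice"]` in a Guidance workflow), a handler table dispatches on it. The handler functions below are illustrative placeholders, not Guidance APIs:

```python
def handle_search(query):
    # Placeholder: in a real system this would call a search backend.
    return f"[searching the web for: {query}]"

def handle_respond(query):
    # Placeholder: in a real system this would generate an answer.
    return f"[answering directly: {query}]"

HANDLERS = {"SEARCH": handle_search, "RESPOND": handle_respond}

def route(choice, query):
    # choice would come from lm["choice"] after select() runs
    return HANDLERS[choice](query)

print(route("SEARCH", "latest Kentucky Derby winner"))
```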
Interleaved Generation and Control
Guidance excels in blending logic with text generation, enabling sophisticated control over outputs. Here’s an example:
```python
import guidance
from guidance import gen, select

@guidance
def qa_bot(lm, query):
    lm += f'''
Q: {query}
Now I will choose to either SEARCH the web or RESPOND.
Choice: {select(["SEARCH", "RESPOND"], name="choice")}
'''
    if lm["choice"] == "SEARCH":
        lm += "A: I don't know, Google it!"
    else:
        lm += f'A: {gen(stop="Q:", name="answer")}'
    return lm

gpt4 + qa_bot(query)
```
Explanation:
- `guidance` decorator: Marks the function as a Guidance-enabled workflow.
- `gen`: Dynamically generates text with defined constraints (e.g., `stop="Q:"`).
- Conditional logic determines whether to defer to external resources or generate a direct response.
Expected Output:
- For “SEARCH”: A suggestion to consult external sources.
- For “RESPOND”: A complete answer.
Applications: Build chatbots or assistants that adaptively handle queries based on their complexity or the availability of information.
Regex-Guided Output
Regex constraints ensure outputs adhere to specific formats, which is invaluable for structured data generation.
```python
from guidance import gen

gpt4 + f'''
Tweak this proverb to apply to model instructions instead.

Where there is no guidance, a people falls,
but in an abundance of counselors there is safety. - Proverbs 11:14

UPDATED
Where there is no guidance{gen('rewrite', stop="- ")} - GPT {gen('chapter', regex="[0-9]+")}:{gen('verse', regex="[0-9]+")}
'''
```
Explanation:
- `regex`: Ensures generated outputs match specific patterns (e.g., numerical values for `chapter` and `verse`).
- The `gen` function dynamically generates text adhering to these constraints.
Expected Output: A rephrased proverb, properly formatted with new chapter and verse references.
Applications:
- Academic citations.
- Template-based text generation.
- Ensuring data consistency in structured datasets.
Pro Tip: Experiment with more complex regex patterns to enforce advanced constraints.
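As a starting point, here are a few patterns of the kind you might pass to `gen(..., regex=...)`. Outside Guidance, Python's `re` module lets you sanity-check the same expressions before using them as generation constraints (the sample strings below are illustrative):

```python
import re

# Patterns you might hand to gen(..., regex=...) to constrain output shape.
patterns = {
    "chapter:verse": r"[0-9]+:[0-9]+",
    "ISO date":      r"[0-9]{4}-[0-9]{2}-[0-9]{2}",
    "price":         r"\$[0-9]+\.[0-9]{2}",
}

samples = {
    "chapter:verse": "11:14",
    "ISO date":      "2025-01-31",
    "price":         "$19.99",
}

for name, pattern in patterns.items():
    # fullmatch requires the entire string to fit the pattern,
    # mirroring how a regex constraint bounds the generated span.
    ok = re.fullmatch(pattern, samples[name]) is not None
    print(name, ok)
```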
Multistep Interaction
Here’s how Guidance can handle multistep processes with conditional branching:
```python
import guidance
from guidance import gen, system, user, assistant

@guidance
def experts(lm, query):
    with system():
        lm += "You are a helpful assistant."
    with user():
        lm += f'''
I want a response to the following question:
{query}
Who are 3 world-class experts (past or present) who would be great at answering this?
Please don't answer the question or comment on it yet.'''
    with assistant():
        lm += gen(name='experts', max_tokens=300)
    with user():
        lm += '''
Great, now please answer the question as if these experts had collaborated in
writing a joint anonymous answer.
If the experts would disagree, just present their different positions as alternatives.'''
    with assistant():
        lm += gen(name='answer', max_tokens=500)
    return lm

gpt4 + experts(query='What is the meaning of life?')
```
Explanation:
- Context blocks:
  - `system`: Provides initial setup or context.
  - `user`: Captures user queries or instructions.
  - `assistant`: Generates responses based on the above inputs.
- `gen`: Dynamically generates expert recommendations and a collaborative response.
Expected Output:
- A list of three world-class experts.
- A nuanced answer reflecting diverse perspectives.
Applications: Use this approach for research, consulting, or expert-driven content creation.
Conclusion
Guidance transforms the way developers interact with LLMs, offering unparalleled control and flexibility. By mastering its tools—from regex constraints to dynamic workflows—you can unlock new possibilities in text generation while optimizing for efficiency and cost.