A Practical Look at Generating Structured Outputs with LLMs
Over the past weeks, I’ve tested various methods to get LLMs to produce consistent, well-structured responses—an essential step when integrating these models into programmatic solutions, where free-form text can break automated workflows and data pipelines. Below are two approaches that helped me solve these problems: prompt engineering and Pydantic schema validation.
Structured Data for Reliability
Converting raw LLM text into predictable formats like JSON streamlines integration with APIs, databases, and UIs. Without structure, even extracting a single field can become error-prone; for instance, {"date": "2024-10-17"} is much easier to process than “Next Thursday.”
Method 1: Prompt Engineering (The “Ask Nicely” Approach)
This approach relies on explicit instructions in the prompt to guide the model’s output format.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Extract keywords from: 'Quantum computing is revolutionizing cryptography.' Return as JSON: {\"keywords\": []}"}
    ]
)

print(response.choices[0].message.content)
# Might look like:
# {"keywords": ["Quantum computing", "cryptography"]}
Pros
- Universal Compatibility: Works with most models, including older ones.
- Flexibility: No dependency on specific libraries or SDKs.
Cons
- Fragility: Even with precise prompts, the model might omit fields, add typos, or return invalid JSON.
- No Type Safety: Manual validation is required, increasing code complexity (see the sketch at the end of this section).
When to Use
- Prototyping or one-off tasks.
- Working with endpoints/models that don’t support JSON schema enforcement.
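To see what that manual validation costs, here is a minimal defensive-parsing sketch for the prompt-only approach, continuing the keyword example above (the fallback policy is just one option):

import json

raw = response.choices[0].message.content
try:
    data = json.loads(raw)
    keywords = data["keywords"]  # KeyError if the model renamed or dropped the field
except (json.JSONDecodeError, KeyError):
    keywords = []  # fall back to an empty list, or log and retry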
Method 2: Pydantic Schema Validation (The “Strict Enforcer” Approach)
Using OpenAI’s SDK with Pydantic, you can define strict schemas and parse responses directly into Python objects.
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # as in the first example

class ProductReview(BaseModel):
    product: str
    sentiment: str  # "positive", "neutral", "negative"
    features: list[str]

response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Analyze this review: 'The camera is amazing, but battery life disappoints.'"}
    ],
    response_format=ProductReview
)

parsed_data = response.choices[0].message.parsed
print(parsed_data)
# Might look like:
# ProductReview(product='Camera', sentiment='positive', features=['camera', 'battery life'])
Pros
- Type Safety: Invalid outputs raise validation errors immediately.
- Integration: Directly maps to Python objects, reducing boilerplate code.
- Refusal Handling: Detects safety-based refusals programmatically (see the sketch at the end of this section).
Cons
- Model Limitations: Requires newer models that fully support response_format.
- Schema Rigidity: May struggle with ambiguous inputs (e.g., extracting data from messy text).
When to Use
- Production systems needing reliability.
- Complex or nested data structures (e.g., multi-field documents).
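For the refusal handling mentioned above, the parsed response exposes a refusal message you can check before trusting the data. A minimal sketch, continuing the ProductReview example:

message = response.choices[0].message
if message.refusal:
    # The model declined to answer (e.g., for safety reasons)
    print("Refused:", message.refusal)
else:
    review = message.parsed  # a validated ProductReview instance
    print(review.sentiment)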
Pydantic in Practice: Lessons Learned
What Went Well
Validation as a Guardrail
In one text-classification workflow, the LLM sometimes returned incomplete fields for category tags. By defining a CategoryItem(BaseModel) with the required keys, Pydantic immediately caught missing tags and enforced type correctness (e.g., tags: list[str]). This quick feedback loop saved a lot of time troubleshooting downstream logic.
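A minimal sketch of that guardrail (the CategoryItem fields here are illustrative, and this assumes Pydantic v2):

from pydantic import BaseModel, ValidationError

class CategoryItem(BaseModel):
    category: str
    tags: list[str]

# A response missing a required key fails fast instead of corrupting downstream logic:
try:
    CategoryItem.model_validate({"category": "billing"})  # no "tags" field
except ValidationError as e:
    print(e)  # reports that "tags" is missing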
IDE Support
Pydantic models provided clear hints and autocomplete in the IDE. Whenever a new field was added or renamed, my code editor flagged mismatches, reducing errors before runtime.
Common Pitfalls
Schema Design Too Strict
If you define a field as int but the LLM returns "12 pages" instead of 12, you'll get a validation error. One solution is to allow a str type and parse it manually. Another is to instruct the model more explicitly to return only numeric data, though that sometimes still fails if the user prompt is ambiguous.
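A middle ground is a lenient "before" validator that coerces strings like "12 pages" into an int. A minimal sketch, assuming Pydantic v2 (the Doc model and coerce_pages are illustrative):

from pydantic import BaseModel, field_validator

class Doc(BaseModel):
    pages: int

    @field_validator("pages", mode="before")
    @classmethod
    def coerce_pages(cls, value):
        # Keep only the digits from strings like "12 pages"; let ints pass through
        if isinstance(value, str):
            digits = "".join(ch for ch in value if ch.isdigit())
            return int(digits) if digits else value
        return value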
Model/Endpoint Constraints
Some endpoints on AWS or Azure don't fully support strict JSON schema modes. Pydantic is best paired with newer OpenAI models that accept a response_format. If you must use an endpoint without this feature, be prepared to do manual validation or allow a more lenient schema.
Practical Tips
- Start Simple
Prompt-based JSON is faster to prototype. Transition to strict schemas later for reliability.
- Handle Edge Cases
Sometimes the model returns incomplete or invalid data:

import logging
from pydantic import ValidationError

logger = logging.getLogger(__name__)

try:
    item = CategoryItem.parse_raw(response.choices[0].message.content)
except ValidationError as e:
    logger.warning("Validation error: %s", e)
    # Decide whether to retry or skip this item
- Why? Even with a schema, a refusal or partial response can break validation. Catch these errors early and decide whether to log, skip, or retry.
- Combine Approaches
You can still use a prompt that asks for JSON, plus enforce a schema:
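# Assuming a hypothetical MetaDataSchema Pydantic model matching the prompt:
from pydantic import BaseModel

class MetaDataSchema(BaseModel):
    title: str
    tags: list[str]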
messages = [
    {"role": "system", "content": "Provide a JSON response with 'title' and 'tags'."},
    {"role": "user", "content": "Extract metadata from the paragraph: '...'"}
]

result = client.chat.completions.parse(
    model="your-model",
    messages=messages,
    response_format=MetaDataSchema
)
- Why? Explicitly telling the model to produce JSON reduces errors, and the schema ensures type safety if the model strays.
- Use anyOf for Flexibility
Sometimes your data can take multiple forms. A simple example:

{
  "anyOf": [
    { "$ref": "#/$defs/AddressFormatA" },
    { "$ref": "#/$defs/AddressFormatB" }
  ]
}

- Why? If the LLM might return different valid structures, anyOf helps accommodate variations without failing validation every time.
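With Pydantic you usually don't write this schema by hand: a Union field is emitted as anyOf in the generated JSON schema. A minimal sketch (the address models are illustrative):

from typing import Union
from pydantic import BaseModel

class AddressFormatA(BaseModel):
    street: str
    city: str

class AddressFormatB(BaseModel):
    po_box: str
    postal_code: str

class Contact(BaseModel):
    # Serialized as "anyOf" in Contact.model_json_schema()
    address: Union[AddressFormatA, AddressFormatB]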
Final Thoughts
Structured outputs are a major advantage in LLM workflows, but not a universal fix. Through trial and error, I’ve found:
- Prompt-only methods are quick to set up but risk invalid responses.
- Pydantic-backed schemas provide stricter validation and easier debugging.
- Validation should happen early—never fully trust raw model output in production.
Balancing simplicity with strong formatting constraints ensures your LLM-based system stays robust and dependable.