Using Structured Outputs in Azure OpenAI’s GPT-4o for consistent document data processing
When using language models for AI-driven document processing, ensuring reliability and consistency in data extraction is crucial for downstream processing.
This article outlines how the Structured Outputs feature of GPT-4o offers the most reliable and cost-effective solution to this challenge.
To jump into action and use Structured Outputs for document processing, get hands-on with our Python samples on GitHub.
Key challenges in generating consistent structured outputs
ISVs and Startups building document data extraction solutions grapple with the complexities of ensuring that language models generate a consistent output in line with their defined schemas. The key challenges include:
Limitations of inline JSON output. While some models can produce JSON output, inconsistencies still arise: a model may generate a response that doesn’t conform to the provided schema, requiring additional prompt engineering or post-processing to resolve.
Complexity in prompts. Including detailed inline JSON schemas within prompts increases the overall number of input tokens consumed. This is particularly problematic if you have a large, complex output structure.
Benefits of using the Structured Outputs feature in Azure OpenAI’s GPT-4o
To overcome the limitations and inconsistencies of inline JSON outputs, GPT-4o’s Structured Outputs feature enables the following capabilities:
Strict schema adherence. Structured Outputs dynamically constrains the model’s outputs to adhere to JSON schemas provided in the response format of the request to GPT-4o. This ensures that the response is always well-formed for downstream processing.
Reliability and consistency. By combining Structured Outputs with libraries such as Pydantic, developers can define exactly how data should be constrained to a specific model. This minimizes post-processing and improves data validation.
Cost optimization. Unlike inline JSON schemas, the schema provided via Structured Outputs does not count towards the total number of input tokens consumed in a request to GPT-4o. This leaves more of the input token budget for document content.
Let’s explore how to use Structured Outputs with document processing in more detail.
Understanding Structured Outputs in document processing
Introduced in September 2024, the Structured Outputs feature in Azure OpenAI’s GPT-4o model provided much-needed flexibility in requests to generate a consistent output using class models and JSON schemas.
For document processing, this enables a more streamlined approach to both structured data extraction and document classification. This is particularly useful when building document processing pipelines.
By utilizing a JSON schema format, GPT-4o constrains the generated output to a JSON structure that is consistent across every request. These JSON structures can then be deserialized into a model object that other services or systems can process easily. This eliminates potential errors often caused by language models misinterpreting inline JSON structures.
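As a small illustration of that deserialization step, a constrained JSON response validates directly into a typed object. The LineItem model and JSON string below are hypothetical examples, assuming Pydantic v2's model_validate_json helper:

```python
from typing import Optional

from pydantic import BaseModel


class LineItem(BaseModel):
    description: Optional[str]
    total: Optional[float]


# A JSON string shaped like the model's schema-constrained output.
raw_json = '{"description": "Widget", "total": 19.99}'

# Pydantic validates the payload and builds a typed object in one step.
item = LineItem.model_validate_json(raw_json)
print(item.total)
```

If the payload did not match the schema, Pydantic would raise a validation error rather than silently producing a malformed object.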
Implementing consistent outputs using GPT-4o in Python
To take full advantage of Structured Outputs and simplify schema generation in Python, Pydantic is the ideal supporting library for building class models that define the desired output structure. Pydantic offers built-in schema generation for producing the JSON schema required by the request, as well as data validation.
Below is an example of extracting data from an invoice, demonstrating the capabilities of a complex class structure using Structured Outputs.
from typing import Optional

from pydantic import BaseModel


class InvoiceSignature(BaseModel):
    type: Optional[str]
    name: Optional[str]
    is_signed: Optional[bool]


class InvoiceProduct(BaseModel):
    id: Optional[str]
    description: Optional[str]
    unit_price: Optional[float]
    quantity: Optional[float]
    total: Optional[float]
    reason: Optional[str]


class Invoice(BaseModel):
    invoice_number: Optional[str]
    purchase_order_number: Optional[str]
    customer_name: Optional[str]
    customer_address: Optional[str]
    delivery_date: Optional[str]
    payable_by: Optional[str]
    products: Optional[list[InvoiceProduct]]
    returns: Optional[list[InvoiceProduct]]
    total_product_quantity: Optional[float]
    total_product_price: Optional[float]
    product_signatures: Optional[list[InvoiceSignature]]
    returns_signatures: Optional[list[InvoiceSignature]]
With a well-defined model in place, requests to the Azure OpenAI chat completions endpoint are as simple as providing the model as the request’s response format. This is demonstrated below in a request to extract data from an invoice.
completion = openai_client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant that extracts data from documents.",
        },
        {
            "role": "user",
            "content": """Extract the data from this invoice.
- If a value is not present, provide null.
- Dates should be in the format YYYY-MM-DD.""",
        },
        {
            "role": "user",
            "content": document_markdown_content,
        },
    ],
    response_format=Invoice,
    max_tokens=4096,
    temperature=0.1,
    top_p=0.1,
)
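Once the request returns, the SDK's parse helper attaches the validated Pydantic object to the response message as message.parsed (with any schema refusal surfaced on message.refusal). Below is a minimal sketch of pulling the result out; the get_invoice helper and the stub response are illustrative, not part of the SDK:

```python
from types import SimpleNamespace


def get_invoice(completion):
    """Return the parsed result from a beta.chat.completions.parse() response."""
    message = completion.choices[0].message
    # The SDK surfaces schema refusals on message.refusal.
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused: {message.refusal}")
    return message.parsed


# Stub standing in for a live API response, for illustration only.
stub = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                parsed={"invoice_number": "INV-001"}, refusal=None
            )
        )
    ]
)

print(get_invoice(stub))  # {'invoice_number': 'INV-001'}
```

Handling the refusal case explicitly keeps downstream pipelines from treating a missing result as extracted data.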
Best practices for utilizing Structured Outputs for document data processing
Schema/model design. Use well-defined names for nested objects and properties to make it easier for the GPT-4o model to interpret how to extract these key pieces of information from documents. Be specific in terminology to ensure the model determines the correct value for each field.
Utilize prompt engineering. Continue to use your input prompts to provide direct instruction to the model on how to work with the document provided. For example, include the definitions for domain jargon, acronyms, and synonyms that may exist in a document type.
Use libraries that generate JSON schemas. Libraries, such as Pydantic for Python, make it easier to focus on building out models and data validation without the complexities of understanding how to convert or build a JSON schema from scratch.
Combine with GPT-4o vision capabilities. Processing document pages as images in a request to GPT-4o using Structured Outputs can yield higher accuracy and cost-effectiveness when compared to processing document text alone.
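To see what the schema-generation practice above produces, Pydantic can emit the JSON schema for any model via model_json_schema() (a Pydantic v2 method). The Signature model here is a cut-down illustration:

```python
import json
from typing import Optional

from pydantic import BaseModel


class Signature(BaseModel):
    name: Optional[str]
    is_signed: Optional[bool]


# Pydantic derives the JSON schema that the request's response format is built from.
schema = Signature.model_json_schema()
print(json.dumps(schema, indent=2))
```

Inspecting the generated schema is a quick way to check that property names and types match what you want the model to extract, before sending any requests.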
Summary
Leveraging Structured Outputs in Azure OpenAI’s GPT-4o provides a necessary solution to ensure consistent and reliable outputs when processing documents. By enforcing adherence to JSON schemas, this feature minimizes the chances of errors, reduces post-processing needs, and optimizes token usage.
The one key recommendation to take away from this guidance is:
Evaluate Structured Outputs for your use cases. We have provided a collection of samples on GitHub to guide you through potential scenarios, including extraction and classifications. Modify these samples to the needs of your specific document types to evaluate the effectiveness of the techniques. Get the samples on GitHub.
By exploring this approach, you can further streamline your document processing workflows, enhancing developer productivity and satisfaction for end users.
Read more on document processing with Azure AI
Thank you for taking the time to read this article. We are sharing our insights for ISVs and Startups that enable document processing in their AI-powered solutions, based on real-world challenges we encounter. We invite you to continue your learning through our additional insights in this series.
Discover how to enhance data extraction accuracy with Azure AI Document Intelligence by tailoring models to your unique document structures.
Discover how Azure AI Document Intelligence and Azure OpenAI efficiently extract structured data from documents, streamlining document processing workflows for AI-powered solutions.
Evaluating the quality of AI document data extraction with small and large language models
Discover our evaluation of the effectiveness of AI models in quality document data extraction using small and large language models (SLMs and LLMs).
Further reading
How to use structured outputs with Azure OpenAI Service | Microsoft Learn
Discover how the structured outputs feature works, including limitations with schema size and field types.
Prompt engineering techniques with Azure OpenAI | Microsoft Learn
Discover how to improve your prompting techniques with Azure OpenAI to maximize the accuracy of your document data extraction.
Why use Pydantic | Pydantic Docs
Discover more about why you should adopt Pydantic for using the structured outputs feature in Python applications, including details on how the JSON Schema output works.