Integrating Azure AI and OpenAI for Smart App Development
Hello! Abdulhamid here! I am a Microsoft Student Ambassador writing from Aston University, where I am studying Applied Artificial Intelligence. I'm excited to share insights on leveraging Azure AI and OpenAI to build cutting-edge applications. Whether you're a fellow student, a developer, or an AI enthusiast, this step-by-step guide will help you harness the power of OpenAI's models to enhance existing apps or create innovative new ones. Azure AI Studio offers a wide array of easily deployable AI services, and the Azure OpenAI service grants access to advanced generative models, so integrating these tools has become more straightforward than ever.
In this article, I will guide you through the process of building a smart nutritionist app by leveraging Azure Document Intelligence service for text extraction and Azure OpenAI’s GPT-4o for human-like, accurate responses.
Prerequisites:
An active Azure subscription
Registered access to Azure OpenAI service
Basic knowledge of Python programming language
Text Extraction with Azure’s pre-built model
For our application, we want to extract the ingredients and nutrition information from the food products we consume. Document Intelligence simplifies this process by accurately extracting text, tables, key-value pairs and other structures from images and documents. While you can train and build custom models with this service, Azure accelerates your app development process by provisioning highly accurate pre-built models.
One such model, which we will be leveraging here, is prebuilt-layout. It extracts the different structures in a document, which is especially useful for retrieving nutritional information since it is usually printed in tables.
To deploy a Document Intelligence resource:
Sign in to https://portal.azure.com/ and search Document Intelligence
Click on the search result, then click Create on the top pane
Fill in the following to create and deploy the service
Subscription: select your active subscription
Resource group: Select an existing resource group or create a new one
Name: name your resource.
Region: select any region you wish (East US is the default)
Pricing: Select free tier (F0)
Click on review + create and then create to deploy the resource
Once the resource has been deployed, click on "Go to resource". Scroll to the bottom of the page and copy one of the access keys along with the endpoint; you need these to connect your app to the deployed service.
Setting up your development environment
Next, set up the environment for your app development:
Create a .env file to hold your credentials
Open your notepad and paste the following:
AZURE_FORM_RECOGNIZER_ENDPOINT="YOUR_AZURE_FORM_RECOGNIZER_ENDPOINT"
AZURE_FORM_RECOGNIZER_KEY="YOUR_AZURE_FORM_RECOGNIZER_KEY"
b. Copy your endpoint and any one of the keys, and paste them into the placeholders
c. Save the file in a new folder named "nutrition_app" with the file name .env, setting "Save as type" to "All files"
Open VS code and open your newly created folder
Open the integrated terminal (Ctrl + `) and run the following command: pip install azure-ai-formrecognizer==3.3.0 python-dotenv
Press Ctrl + Alt + Windows + N and select Python File to create a new Python file
We need to import the necessary libraries and set up the configuration required to connect to the pre-built model. Copy and paste the code below to do that. I have also included links to sample images of a nutrition label and an ingredients list.
# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

load_dotenv()

# Azure Form Recognizer configuration
azure_form_recognizer_endpoint = os.getenv("AZURE_FORM_RECOGNIZER_ENDPOINT")
azure_form_recognizer_key = os.getenv("AZURE_FORM_RECOGNIZER_KEY")

ingredients_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240609_002105.jpg?raw=true"
nutrition_table_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240608_192914.jpg?raw=true"
fileModelId = "prebuilt-layout"
This pre-built model can recognize tables, lines, and words. We will extract the ingredients as lines and the nutrition facts as a table, so that each structure is parsed properly without any of the content getting jumbled up.
We will add a few more lines of code to achieve this before finally printing the extracted text.
Add the code below to the existing script.
document_analysis_client = DocumentAnalysisClient(
    endpoint=azure_form_recognizer_endpoint,
    credential=AzureKeyCredential(azure_form_recognizer_key)
)

# Analyze the ingredients image
poller_ingredients = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId,
    document_url=ingredients_file_uri
)
result_ingredients = poller_ingredients.result()

# Extract text lines from the ingredients image
ingredients_content = ""
if result_ingredients.pages:
    for page in result_ingredients.pages:
        for line in page.lines:
            ingredients_content += f"{line.content}\n"

# Connect to Azure Form Recognizer for the nutrition table image
print(f"\nConnecting to Form Recognizer at: {azure_form_recognizer_endpoint}")
print(f"Analyzing nutrition table at: {nutrition_table_file_uri}")
poller_nutrition_table = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId,
    document_url=nutrition_table_file_uri
)
result_nutrition_table = poller_nutrition_table.result()

# Extract table content from the nutrition table image
nutrition_table_content = ""
if result_nutrition_table.tables:
    for table_idx, table in enumerate(result_nutrition_table.tables):
        # Build an empty row x column grid, then place each cell by its indices
        table_content = [[""] * table.column_count for _ in range(table.row_count)]
        for cell in table.cells:
            table_content[cell.row_index][cell.column_index] = cell.content
        nutrition_table_content += f"\nTable #{table_idx + 1}:\n"
        for row in table_content:
            nutrition_table_content += "\t".join(row) + "\n"

combined_content = f"Ingredients:\n{ingredients_content}\nNutrition Table:\n{nutrition_table_content}"
print(combined_content)
Now save the file as app.py and run it in your terminal:
python app.py
You should get an output similar to this:
Connecting to Azure OpenAI GPT-4o and passing the extracted text
We have completed the first half of this project. Next, we set up the GPT-4o model and pass our extracted data to it to generate results.
First, create an Azure OpenAI resource (fill out the registration form if you don't already have access):
Subscription: select your active subscription
Resource group: Select an existing resource group or create a new one
Name: name your resource.
Region: Select any region from the available lists of regions.
Pricing: Select Standard S0
After deploying the resource, click on "Go to Azure OpenAI Studio" on the top pane
Scroll down the left pane and click on the "Deployments" page
Click on "Create new deployment"
Select gpt-4o from the list of models
Assign a name to the deployment (note it down so you can connect to it later)
Reduce the tokens-per-minute rate limit to 7K and then click "Create"
Once the model has successfully deployed, return to the Azure OpenAI resource in the Azure portal and copy the endpoint and an access key
Next, go to the app.py script. Open the terminal and run:
pip install openai
Navigate to the .env file and add the following lines, replacing the placeholders with the endpoint, access key, and deployment name from your Azure OpenAI resource
AZURE_OAI_ENDPOINT="YOUR_AZURE_OPENAI_ENDPOINT"
AZURE_OAI_KEY="YOUR_AZURE_OPENAI_KEY"
AZURE_OAI_DEPLOYMENT="YOUR_DEPLOYED_MODEL_NAME"
Replace the imports in the current script with the ones below
# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from openai import AzureOpenAI
Set up the configuration right after the print statement for the extracted text:
# Azure OpenAI configuration
azure_oai_endpoint = os.getenv("AZURE_OAI_ENDPOINT")
azure_oai_key = os.getenv("AZURE_OAI_KEY")
azure_oai_deployment = os.getenv("AZURE_OAI_DEPLOYMENT")

client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version="2024-02-01"
)
Prompt Engineering
An important part of using LLMs and getting accurate results is prompting.
Prompt engineering lets you give the model precise instructions on how to behave, which largely determines how well it executes its task. It's good practice to spend a few minutes crafting a prompt tailored to your use case.
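One technique worth knowing: besides embedding an example inside the system message (as this tutorial does), you can seed the conversation with an example exchange as separate user and assistant messages, known as few-shot prompting. Here is a minimal sketch of that structure; the example content is made up purely for illustration:

```python
# Few-shot prompting: seed the conversation with an example exchange so the
# model sees the exact output format you expect before the real query.
system_message = "You are a nutritionist who analyzes ingredient lists."

example_query = "Ingredients: Potatoes, Vegetable Oils, Salt"
example_answer = "Summary: Standard crisp ingredients; watch the salt."

messages_array = [
    {"role": "system", "content": system_message},
    # One example exchange acts as a template for the model's replies
    {"role": "user", "content": example_query},
    {"role": "assistant", "content": example_answer},
]

# The real query always goes last
messages_array.append({"role": "user", "content": "Ingredients: Oats, Honey, Almonds"})
```

This list would then be passed as the messages parameter of the chat completion call, exactly like the single system message later in this article.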
For this project, we want our model to tell us about the ingredients and provide helpful advice about them. We also want it to print a summary of its report before going into extensive detail about the ingredients. Another useful tip is to include an example query and an example model output in the prompt. See the implementation below:
# Create a system message
system_message = """
You are a smug, funny nutritionist who provides health advice based on ingredients and nutrition tables.
Provide advice on what is safe to consume based on the ingredients and nutrition table.
Discuss the ingredients as a whole but single out scientifically named ingredients so the user can understand them better.
Mention the adequate consumption or potential harm based on excessive amounts of substances.
Identify any potential allergies. Output a general summary first before giving further details. Here is an example:

Example:
{User query}:
Please analyze the following ingredients and nutrition label content:
Ingredients: Potatoes, Vegetable Oils, Salt, Potassium phosphates
Nutrition Table:
- Energy: 532 kcal per 100g
- Fat: 31.5g per 100g
- Sodium: 1.28g per 100g
{System}:
Summary: The ingredients are pretty standard for potato crisps. Potatoes and vegetable oils provide the base, while salt adds flavor. Watch out for the high fat and sodium content if you're trying to watch your heart health or blood pressure. As for allergies, you're mostly safe unless you're allergic to potatoes or sunflower/rapeseed oil. Potassium phosphates? Just some friendly muscle helpers, but keep it moderate!
Potassium phosphates: Ah, the magical salts that help keep your muscles happy. Just don't overdo it!
"""
Copy and paste the message above into the existing Python script. Next, we pass the extracted text to the model and make a request.
messages_array = [{"role": "system", "content": system_message}]

# Add the extracted nutrition label content to the user messages
messages_array.append({"role": "user", "content": f"Please analyze the following nutrition label content:\n{combined_content}"})

# Send the request to the Azure OpenAI model
response = client.chat.completions.create(
    model=azure_oai_deployment,
    temperature=0.6,
    max_tokens=1200,
    messages=messages_array
)
generated_text = response.choices[0].message.content

# Print the summary generated by the model
print("Summary: " + generated_text + "\n")
Copy and paste the code above. Save your changes and run in your terminal.
python app.py
Voila! You have your own personal food nutritionist. See sample result below:
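If the model call occasionally fails with a transient rate-limit or network error, a small retry helper can smooth things over. This is only a minimal sketch; the helper name and backoff values are my own and not part of the tutorial code:

```python
import time

def call_with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage (hypothetical): wrap the chat completion request in a lambda, e.g.
# response = call_with_retries(lambda: client.chat.completions.create(...))
```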
You can further refine the system message to fit your diet, such as watching out for foods with high sugar content, specifying allergies, or helping you find halal food.
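As a sketch of that idea, here is one simple way to append dietary preferences to a base system message before sending it; the rule strings below are purely illustrative, not part of the tutorial code:

```python
# Tailor a base system message with user-specific dietary rules.
base_system_message = "You are a nutritionist who analyzes ingredients and nutrition tables."

dietary_rules = [
    "Flag any product with more than 10g of sugar per 100g.",
    "Warn me about peanuts and tree nuts; I am allergic.",
    "Tell me whether the ingredients appear to be halal.",
]

# Append each rule as an extra instruction line under the base message
system_message = base_system_message + "\n" + "\n".join(f"- {rule}" for rule in dietary_rules)
```

The resulting string replaces the original system_message, and the rest of the script stays unchanged.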
Check out the following resources to improve the app and build your own specific use cases:
Document Intelligence pre-built models
Project on Github with streamlit interface