Optimizing Models: Fine-Tuning, RAG and Application Strategies
Before diving in, let’s take a moment to review the key resources and foundational concepts that will guide us through this blog. That will ensure we’re well-equipped to follow along. This brief review will provide a strong starting point for exploring the main topics ahead.
Microsoft Azure: Microsoft offers a cloud computing platform and a suite of cloud services. It provides a wide range of cloud-based
services and solutions that enable organizations to build, deploy, and manage applications and services through Microsoft’s global network of data centers.
AI Studio: a platform that helps you evaluate model responses and orchestrate prompt application components with prompt flow for better performance. The platform facilitates scalability for transforming proof of concepts into full-fledged production with ease, continuous monitoring and refinement support long-term success.
Fine-tuning: is the process of retraining pretrained models on specific datasets. The purpose is typically to improve model performance on specific tasks or to introduce information that wasn’t well represented when you originally trained the base model.
Retrieval Augmented Generation (RAG): is a pattern that works with pretrained large language models (LLM) and your own data to generate responses. In Azure Machine Learning, you can implement RAG in a prompt flow.
Our hands-on learning will be developing an AI-based solution that helps the user extract financial information and insights from investment/finance books and newspaper in our database.
The process is divided into three main parts:
Fine-tune a base model with financial data to help the model provide more specific responses and be grounded and rooted with data related to finance and investment.
Implement RAG so that the response won’t be only based on the data it was trained with (fine-tuned with) but also based on other data sources (the user’s input in our case).
Integration of the deployed model into a web app so that it could be used through a user interface.
1- Setup:
Create a resource group which is defined as a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group.
You need to specify your subscription, a unique resource group name, and the region.
Create an Azure OpenAI resource: Azure OpenAI Service provides REST API access to OpenAI’s powerful language models including GPT-4o, GPT-4 Turbo with Vision, GPT-4, GPT-3.5-Turbo, and Embeddings model series. These models can be easily adapted to your specific task including but not limited to content generation, summarization, image understanding, semantic search, and natural language to code translation
– Create a text embedding model: the embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format.
Create an AI search resource: Azure AI Search (“Azure Cognitive Search” previously) provides secure information retrieval at scale over user-owned content in traditional and generative AI search applications. Information retrieval is foundational to any app that surfaces text and vectors. Common scenarios include data exploration, and increasingly feeding query results to prompts based on your proprietary grounding data for conversational search as we will do in our example.
Create a storage account: it contains all your Azure Storage data objects: blobs, files, queues, and tables. The storage account provides a unique namespace for your Azure Storage data that is accessible from anywhere in the world over HTTP or HTTPS.
– Create a blob container: blob Storage is Microsoft’s object optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data. it will be used to store your data.
Navigate to your storage resource -> Click on Storage browser tab on the left -> Click Blob Containers -> Click on + add container then Upload your data. Our data was pdf files (books and newspapers) and csv files from Kaggle, all are related to finance and investment.
Create a search Index: is your searchable content, available to the search engine for indexing, full text search, vector search, hybrid search, and filtered queries. Check that the status of your AI search is “Running”.
– Import and vectorize data: integrated vectorization is an extension of the indexing and query pipelines in Azure AI Search. It adds the following capabilities: Data chunking (split the data into smaller, manageable pieces) during indexing. Text-to-vector conversion during indexing.
Navigate to you AI search service -> Click on Indexes tab on the left -> Click on “Import and Vectorize data” -> Select your text embedding model previously deployed.
2- Fine Tune a Base Model
Preparing Dataset: before starting fine tuning, we need to prepare our training and validation data. Which could be collected manually or generated from text files or tabular data. However, it is important that the data confirm the following template:
{“messages”: [{“role”: “system”, “content”: “content goes here”}, {“role”: “user”, “content”: “query goes here?”}, {“role”: “assistant”, “content”: “response goes here.”}]}.
To satisfy this, we prepared our two data sets training_data.jsonl and val_data.jsonl for training and validation, respectively.
Below is an example of our training_data.jsonl:
{“messages”: [{“role”: “system”, “content”: “You are a helper in finance and an advisor in investment”}, {“role”: “user”, “content”: “What makes a successful startup?”}, {“role”: “assistant”, “content”: “Luck, hard work, and maybe skills… mostly luck, hard work, and consistency.”}]}
Both data files are attached to this blog. They were collected manually through some examples.
Evaluate data to ensure its quality, check number of tokens and its distribution.
import json
import tiktoken
import numpy as np
from collections import defaultdict
encoding = tiktoken.get_encoding(“cl100k_base”)
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
num_tokens = 0
for message in messages:
if not isinstance(message, dict):
print(f”Unexpected message format: {message}”)
continue
num_tokens += tokens_per_message
for key, value in message.items():
if not isinstance(value, str):
print(f”Unexpected value type for key ‘{key}’: {value}”)
continue
num_tokens += len(encoding.encode(value))
if key == “name”:
num_tokens += tokens_per_name
num_tokens += 3
return num_tokens
def num_assistant_tokens_from_messages(messages):
num_tokens = 0
for message in messages:
if not isinstance(message, dict):
print(f”Unexpected message format: {message}”)
continue
if message.get(“role”) == “assistant”:
content = message.get(“content”, “”)
if not isinstance(content, str):
print(f”Unexpected content type: {content}”)
continue
num_tokens += len(encoding.encode(content))
return num_tokens
def print_distribution(values, name):
if values:
print(f”n#### Distribution of {name}:”)
print(f”min / max: {min(values)}, {max(values)}”)
print(f”mean / median: {np.mean(values)}, {np.median(values)}”)
print(f”p5 / p95: {np.quantile(values, 0.05)}, {np.quantile(values, 0.95)}”)
else:
print(f”No values to display for {name}”)
files = [
r’train_data.jsonl’,
r’val_data.jsonl’
]
for file in files:
print(f”Processing file: {file}”)
try:
with open(file, ‘r’, encoding=’utf-8′) as f:
total_tokens = []
assistant_tokens = []
for line in f:
try:
ex = json.loads(line)
messages = ex.get(“messages”, [])
if not isinstance(messages, list):
raise ValueError(“The ‘messages’ field should be a list.”)
total_tokens.append(num_tokens_from_messages(messages))
assistant_tokens.append(num_assistant_tokens_from_messages(messages))
except json.JSONDecodeError:
print(f”Error decoding JSON line: {line}”)
except ValueError as ve:
print(f”ValueError: {ve} – line: {line}”)
except Exception as e:
print(f”Unexpected error processing line: {e} – line: {line}”)
if total_tokens and assistant_tokens:
print_distribution(total_tokens, “total tokens”)
print_distribution(assistant_tokens, “assistant tokens”)
else:
print(“No valid data to process.”)
print(‘*’ * 50)
except FileNotFoundError:
print(f”File not found: {file}”)
except Exception as e:
print(f”An unexpected error occurred: {e}”)
Login to AI Studio
Navigate to the Fine-tuning tab
Check the available models for fine-tuning within your region.
Upload your training and validation data
Since we have our data locally, we uploaded them. In case you want to save your data in the cloud and use the URL for later in place of the “Uploading files” option, you can use SDK and follow this code:
# Initialize AzureOpenAI client
client = AzureOpenAI(
azure_endpoint=azure_oai_endpoint,
api_key=azure_oai_key,
api_version=version # Ensure this API version is correct
)
training_file_name = r’path’
validation_file_name = r’path’
try:
# Upload the training dataset file
with open(training_file_name, “rb”) as file:
training_response = client.files.create(
file=file, purpose=”fine-tune”
)
training_file_id = training_response.id
print(“Training file ID:”, training_file_id)
except Exception as e:
print(f”Error uploading training file: {e}”)
try:
# Upload the validation dataset file
with open(validation_file_name, “rb”) as file:
validation_response = client.files.create(
file=file, purpose=”fine-tune”
)
validation_file_id = validation_response.id
print(“Validation file ID:”, validation_file_id)
except Exception as e:
print(f”Error uploading validation file: {e}”)
You can specify the hyperparameters such as batch size, or leave them with default values.
Review the settings before submitting
Check the status of the fine-tuning in your dashboard, changing from Queued to Running to Completed.
Once completed, your fine-tuned model is ready to be deployed. Click on ‘Deploy’
After successful deployment, you can go back to Azure Open AI and find your fine-tuned model deployed along with your previous text embedding model.
3- Integration into Web App
The concept here is to rely on the model’s knowledge + users’ documentation. We have two options and both provide high precision for responses:
Look for the answer in the documents, and if not found, return a response based on the internal knowledge of the model.
Combine the two responses from the retriever and the model. Which is the one we opt for here.
Also, for integration, we have two ways we may follow: through the Azure OpenAI User Interface and deploying into an Azure static web app or develop your own web app and use the Azure SDK to integrate your model.
1- Deploying into Azure static web app
Click on “Open in Playground” below your deployments list in Azure open AI
Click “Add your data”
Choose your Azure blob storage as data source à Choose Index name “myindex”
Customize the system message to “You are a financial advisor and an expert in investment. You have access to a wide variety of documents. Use your own knowledge to answer the question and verify it or supplement it using the relevant documents when possible.” This system message will enable the model not only to rely on documents but also rely on its internal knowledge.
Complete the setup and click on “Apply changes”
Deploy to a new web app and configure the web app name, subscription, resource group, location, and pricing plan.
2- Develop your own web App and use Azure SDK
Prepare your environment
load_dotenv ()
azure_oai_endpoint = os.getenv(“AZURE_OAI_FINETUNE_ENDPOINT2”)
azure_oai_key = os.getenv(“AZURE_OAI_FINETUNE_KEY2”)
azure_oai_deployment = os.getenv(“AZURE_OAI_FINETUNE_DEPLOYMENT2”)
azure_search_endpoint = os.getenv(“AZURE_SEARCH_ENDPOINT”)
azure_search_key = os.getenv(“AZURE_SEARCH_KEY”)
azure_search_index = os.getenv(“AZURE_SEARCH_INDEX”)
Initialize your AzureOpenAI client
client = AzureOpenAI(
base_url=f”{azure_oai_endpoint}/openai/deployments/{azure_oai_deployment}/extensions”,
api_key=azure_oai_key,
api_version=”2023-09-01-preview)
Configure your data source for Azure AI search. This will retrieve response from our stored files.
extension_config = dict(
dataSources= [
{
“type”: “AzureCognitiveSearch”,
“parameters”: {
“endpoint”: azure_search_endpoint,
“key”: azure_search_key,
“indexName”: azure_search_index,
}
}
]
)
RAG is used to enhance a model’s capabilities by adding more grounded information, not to eliminate the model’s internal knowledge.
RAG is used to enhance a model’s capabilities by adding more grounded information, not to eliminate the model’s internal knowledge.
Some issues that you may face during development:
Issue 1: make sure to verify the OpenAI version. You can pin the version to openai=0.28 or upgrade it and follow migration steps.
Issue 2: you may run out of quota and be asked to wait for 24 hours till the next try. Make sure to always have enough quota in your subscription.
Issue 1: make sure to verify the OpenAI version. You can pin the version to openai=0.28 or upgrade it and follow migration steps.
Issue 2: you may run out of quota and be asked to wait for 24 hours till the next try. Make sure to always have enough quota in your subscription.
Next, you can look at how to do real-time injection so that you personalize more of the responses. Try to find how to rely between your web app, the user’s input I/O, the searching index, and LLM.
Keyword: Langchain, Databricks
Resources:
what-is-azure-used-for.
What is Azure AI Studio? – Azure AI Studio | Microsoft Learn.
Fine-tuning in Azure AI Studio – Azure AI Studio | Microsoft Learn.
machine-learning/concept-retrieval-augmented-generation.
Manage resource groups – Azure portal – Azure Resource Manager | Microsoft Learn.
What is Azure OpenAI Service? – Azure AI services | Microsoft Learn
Introduction to Azure AI Search – Azure AI Search | Microsoft Learn
storage-account-create
Introduction to Blob (object) Storage – Azure Storage | Microsoft Learn
How to generate embeddings with Azure OpenAI Service – Azure OpenAI | Microsoft Learn.
Azure OpenAI Service models – Azure OpenAI | Microsoft Learn
Search index overview – Azure AI Search | Microsoft Learn
Integrated vectorization – Azure AI Search | Microsoft Learn
Easy Guide to Transitioning from OpenAI to Azure OpenAI: Step-by-Step Process
LangChain on Azure Databricks for LLM development – Azure Databricks | Microsoft Learn
Build a RAG-based copilot solution with your own data using Azure AI Studio – Training | Microsoft Learn
RAG and generative AI – Azure AI Search | Microsoft Learn
Retrieval augmented generation in Azure AI Studio – Azure AI Studio | Microsoft Learn
Retrieval Augmented Generation using Azure Machine Learning prompt flow (preview) – Azure Machine Learning | Microsoft Learn
Retrieval-Augmented Generation (RAG) with Azure AI Document Intelligence – Azure AI services | Microsoft Learn
Microsoft Tech Community – Latest Blogs –Read More