Azure AI Model Inference API
The Azure AI Model Inference API provides a unified interface for developers to interact with various foundational models deployed in Azure AI Studio. This API allows developers to generate predictions from multiple models without changing their underlying code. By providing a consistent set of capabilities, the API simplifies the process of integrating and switching between different models, enabling seamless model selection based on task requirements.
Features
Foundational models have made significant advancements, particularly in natural language processing and computer vision. However, these models often excel in specific tasks and may approach the same problem differently. The Azure AI Model Inference API enables developers to:
Enhance performance by selecting the most suitable model for a particular task.
Optimize efficiency by using smaller, faster models for simpler tasks.
Create complex experiences by composing multiple models.
Maintain code portability across different models without sacrificing performance or capabilities.
Availability of Models
The Azure AI Model Inference API is available for the following models:
Serverless API Endpoints:
Cohere Embed V3 family
Cohere Command R family
Meta Llama 2 chat family
Meta Llama 3 instruct family
Mistral-Small
Mistral-Large
Jais
Jamba family
Phi-3 family
Managed Inference:
Meta Llama 3 instruct family
Phi-3 family
Mistral and Mixtral family
Additionally, the API is compatible with Azure OpenAI model deployments. Note that models deployed after June 24, 2024, can take advantage of managed inference capabilities.
API Capabilities
The API supports multiple modalities, covering the following capabilities:
Retrieve model information: Get details about the deployed model.
Text embeddings: Generate an embedding vector for the input text.
Text completions: Generate text based on a provided prompt.
Chat completions: Create responses for chat conversations.
Image embeddings: Generate embedding vectors for text and image inputs.
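For example, model information and embeddings are served from the same client surface. Here is a minimal sketch (not from the original post) that retrieves model details and generates a text embedding, assuming an embeddings model is deployed behind the same environment variables used later in this post:

import os
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

# Create an embeddings client (assumes an embeddings deployment)
client = EmbeddingsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

# Retrieve model information: get details about the deployed model
info = client.get_model_info()
print(info.model_name, info.model_provider_name)

# Text embeddings: generate an embedding vector for the input text
response = client.embed(input=["What's the capital of France?"])
print(len(response.data[0].embedding))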
Inference SDK Support
The Azure AI Inference SDK provides streamlined clients in several languages, including Python, JavaScript, and C#, making it easy to consume predictions from models using the Azure AI Model Inference API.
Installation
To install the Python package, use:
pip install azure-ai-inference
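The JavaScript and C# clients install the same way; at the time of writing, the corresponding packages are published as:

npm install @azure-rest/ai-inference
dotnet add package Azure.AI.Inference --prerelease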
Example: Creating a Client for Chat Completions
Here’s a quick example of how to create a client for chat completions using Python:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

# Create a client using an API key
client = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY"]),
)

# Alternatively, create a client using Microsoft Entra ID
from azure.identity import DefaultAzureCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL"],
    credential=DefaultAzureCredential(),
)
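Once the client is created, requesting a completion has the same shape regardless of the underlying model. A minimal sketch, using the same prompt as the examples below:

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)
print(response.choices[0].message.content)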
Extensibility
The API allows developers to pass additional parameters to models beyond the specified modalities, using the extra-parameters header. For example, you can pass the safe_mode parameter to the Mistral-Large model, which isn’t specified in the API, like this:
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    # Forward a model-specific parameter not defined in the API
    model_extras={"safe_mode": True},
)

print(response.choices[0].message.content)
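Behind the scenes, setting model_extras causes the SDK to send the extra-parameters header (with the value pass-through), which tells the service to forward the unknown parameters to the model instead of rejecting the request.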
Handling Different Model Capabilities
If a model doesn't support a specific parameter, the API returns an HTTP 422 (Unprocessable Entity) error. You can handle these cases by catching the exception and inspecting its payload:
import json
from azure.ai.inference.models import SystemMessage, UserMessage, ChatCompletionsResponseFormat
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="How many languages are in the world?"),
        ],
        response_format={"type": ChatCompletionsResponseFormat.JSON_OBJECT},
    )
except HttpResponseError as ex:
    if ex.status_code == 422:
        error_payload = json.loads(ex.response._content.decode("utf-8"))
        # Each entry describes one parameter the model rejected
        for offending in error_payload.get("detail", []):
            param = ".".join(str(part) for part in offending["loc"])
            value = offending["input"]
            print(f"Model doesn't support the parameter '{param}' with value '{value}'")
    else:
        raise
Content Safety
The API also integrates with Azure AI Content Safety, filtering potentially harmful content. If a request triggers content safety measures, the response will indicate this, allowing developers to handle it accordingly.
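As a sketch of one way to handle this, assuming the service surfaces a filtered request as an HTTP 400 error whose payload carries a content_filter error code (and reusing the client created earlier):

import json
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Tell me about your day."),  # any user-supplied text
        ],
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 400:
        payload = json.loads(ex.response._content.decode("utf-8"))
        if payload.get("error", {}).get("code") == "content_filter":
            print("The request was blocked by Azure AI Content Safety.")
        else:
            raise
    else:
        raise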
Getting Started
To start using the Azure AI Model Inference API, deploy any of the supported models to serverless API endpoints or managed online endpoints and use the code samples above to consume predictions.
Model Swapping Demo
Here’s an example of how easy it is to swap models in a Python solution while keeping the code consistent:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Example function to swap models
def swap_model(endpoint_url, api_key):
    client = ChatCompletionsClient(
        endpoint=endpoint_url,
        credential=AzureKeyCredential(api_key),
    )
    return client

# Swapping between two models for evaluation
model_1 = swap_model(os.environ["MODEL1_ENDPOINT"], os.environ["MODEL1_KEY"])
model_2 = swap_model(os.environ["MODEL2_ENDPOINT"], os.environ["MODEL2_KEY"])

response_1 = model_1.complete(messages=[UserMessage(content="What's the weather today?")])
response_2 = model_2.complete(messages=[UserMessage(content="What's the weather today?")])

# Compare the results from the two models
print("Model 1 Response:", response_1.choices[0].message.content)
print("Model 2 Response:", response_2.choices[0].message.content)
Comparing Model Outputs Using the Azure AI Model Inference API
The Azure AI Model Inference API provides a convenient way to evaluate the effectiveness of different models and compare their outputs. By using the API, you can easily swap between models and test their performance on a given input prompt.
Here is an example of how to use the Azure Inference API to compare the outputs of two models:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference.models import UserMessage

# Set up the models
model_1_endpoint = "https://your-resource-name.cognitiveservices.azure.com/"
model_1_key = "your-model-1-key"
model_2_endpoint = "https://your-resource-name.cognitiveservices.azure.com/"
model_2_key = "your-model-2-key"

# Example function to swap models
def swap_model(endpoint_url, api_key):
    client = ChatCompletionsClient(
        endpoint=endpoint_url,
        credential=AzureKeyCredential(api_key),
    )
    return client

# Swapping between two models for evaluation
model_1 = swap_model(model_1_endpoint, model_1_key)
model_2 = swap_model(model_2_endpoint, model_2_key)

# Set the model names for clarity
model_1_name = "text-davinci-002"
model_2_name = "text-curie-001"

# Set the input prompt
input_prompt = "What's the weather today?"

# Get responses from both models
response_1 = model_1.complete(messages=[UserMessage(content=input_prompt)], model=model_1_name)
response_2 = model_2.complete(messages=[UserMessage(content=input_prompt)], model=model_2_name)

# Compare the results from the two models
print(f"Model 1 ({model_1_name}) Response:", response_1.choices[0].message.content)
print(f"Model 2 ({model_2_name}) Response:", response_2.choices[0].message.content)
Comparison and Contrast:
Both models respond to the input prompt but with different styles and levels of detail.
text-davinci-002 (Model 1) provides a more conversational and friendly response, acknowledging the user’s question and offering suggestions on how to find the answer. The response is longer and more elaborate, with a more personal touch.
text-curie-001 (Model 2) provides a more concise and direct response, simply stating that it’s not aware of the current weather and offering suggestions on how to find out. The response is shorter and more to the point.
In general, text-davinci-002 is the larger, more capable model and tends to produce more creative, conversational output, while text-curie-001 is smaller and faster, favoring concise, to-the-point answers. This is reflected in their responses to this prompt.
Conclusion:
Using the Azure AI Model Inference API is a great way to evaluate the effectiveness of models and compare outputs quickly and easily. By swapping between models and testing their performance on a given input prompt, you can gain valuable insights into the strengths and weaknesses of each model and make informed decisions about which model to use for your specific use case.
Explore our samples and read the API reference documentation to get started.