Ministral 3B small model from Mistral is now available in the Azure AI Model Catalog
At Microsoft, we are committed to driving innovation in AI by continually enhancing our offerings. On the first anniversary of Mistral 7B, we are excited to continue our collaboration with Mistral and announce the addition of a new state-of-the-art model to the Azure AI Model Catalog: Ministral 3B. Despite its size, this model sets a new standard in performance and efficiency.
A New Frontier in AI Performance
According to Mistral, Ministral 3B represents a significant advancement in the sub-10B category, focusing on knowledge, commonsense reasoning, function-calling, and efficiency. With support for up to a 128k context length, the model is tailored for a diverse array of applications—from orchestrating agentic workflows to developing specialized task workers.
Enhancing Workflow Efficiency
When used alongside larger language models like Mistral Large, Ministral 3B can serve as an efficient intermediary for function-calling in multi-step agentic workflows. Its ability to be fine-tuned allows it to excel in tasks such as:
Input Parsing: Quickly understanding user inputs to streamline processing.
Task Routing: Directing tasks to appropriate models or functions based on user intent.
API Calling: Efficiently interfacing with APIs while minimizing latency and operational costs.
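To make the parse → route → call pattern above concrete, here is a minimal sketch of an orchestrator that takes a structured function call emitted by a small model and dispatches it to the right handler. The handler names and the JSON tool-call shape are illustrative assumptions, not the actual Mistral or Azure API.

```python
import json

# Hypothetical downstream functions the small model can route to.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"

def get_billing_summary(account_id: str) -> str:
    return f"Account {account_id}: balance $0.00"

TOOLS = {"lookup_order": lookup_order, "get_billing_summary": get_billing_summary}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call emitted by the model as JSON of the
    (assumed) form: {"name": "<function>", "arguments": {...}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"Unknown tool: {call['name']}")
    return fn(**call["arguments"])

# In practice this JSON would come from Ministral 3B's function-calling output.
print(dispatch('{"name": "lookup_order", "arguments": {"order_id": "A123"}}'))
```

Because the small model only has to produce a short, structured tool call rather than a full answer, latency and cost stay low while the heavy lifting is delegated to the invoked function or larger model.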
Creating Powerful Agents with Small Models
Two Key Use Cases
Ministral 3B excels in two primary macro use cases:
Multi-Step Agentic Workflows: This use case involves orchestrating complex workflows where agents need to call larger models selectively. Ministral 3B serves as a highly efficient intermediary, identifying the appropriate larger models to invoke, ensuring that the right model is used for the right task in the workflow.
Low-Latency, High-Volume Use Cases: For applications requiring rapid, high-throughput responses—such as real-time customer support, data processing, and high-volume API calls—Ministral 3B delivers exceptional low-latency performance, allowing enterprises to process large volumes of requests with minimal delay.
Versatile Use Cases
Ministral 3B can also be utilized for a wide range of agentic use cases, including:
Customer Support Automation: Enhancing customer interactions with efficient automated responses.
Back Office and Process Automation: Streamlining operations and improving productivity.
Code Migration and CI/CD: Facilitating smoother transitions in software development cycles.
Improved RAG Architectures & Retrieval: Optimizing retrieval-augmented generation tasks.
Moderation and LLM Output Checking: Ensuring the quality and appropriateness of AI outputs.
Agentic Benefits of Smaller Models
Ability: Ministral 3B and other small models in its category are highly performant, fast, and cost-efficient. They excel in function calling, making them ideal for agentic workflows.
Security: These models can be deployed securely and efficiently in your environment, ensuring that your internal data remains private. They can be implemented on a VPC, in the cloud, on-premises, or accessed via API.
Customizability: Smaller models can be fine-tuned easily for specific tasks and may outperform larger models in certain domains. Their size allows for efficient fine-tuning and retraining, facilitating adaptability to evolving needs.
Agentic Architecture with Small Models
Ministral 3B is designed to work within an efficient agentic architecture that leverages specialized smaller models. Here’s how it works:
User Request Handling: When a user makes a request (e.g., “Please give me my customer number”), it is processed through a router, which can be an embedding model or a larger language model (LLM).
Routing to Specialized Agents: The router directs the request to specialized agents, such as:
Account Management Agent
Fraud Detection Agent
Billing Details Agent
Customer Support Agent
Efficiency Benefits: This architecture is much faster and cheaper than using a single large model. Small and edge models are fine-tuned for specific domain tasks, enabling them to outperform larger general models in many scenarios.
Microservices Approach: The architecture allows for easier adaptation of models compared to a large LLM handling everything. This microservices approach leads to improved performance in function calling and overall user experience.
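The request-handling flow above can be sketched as follows. A trivial keyword matcher stands in for the embedding model or LLM router, and each agent function stands in for a fine-tuned small model behind its own service endpoint; all names and responses here are illustrative placeholders.

```python
# Hypothetical specialized agents; in production each would wrap a
# fine-tuned small model deployed as its own microservice.
def account_management_agent(request: str) -> str:
    return "Your customer number is C-0042."  # placeholder response

def billing_details_agent(request: str) -> str:
    return "Your last invoice was $12.00."    # placeholder response

# A trivial keyword router standing in for an embedding model or LLM.
ROUTES = {
    "customer number": account_management_agent,
    "invoice": billing_details_agent,
}

def route_request(request: str) -> str:
    """Send the user request to the first matching specialized agent."""
    for keyword, agent in ROUTES.items():
        if keyword in request.lower():
            return agent(request)
    return "Escalating to a general customer support agent."

print(route_request("Please give me my customer number"))
```

The design choice mirrors the microservices point above: each agent can be fine-tuned, redeployed, or swapped independently, rather than retraining one large model that handles everything.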
How to use Ministral 3B on Azure?
Here’s how you can use the newly introduced les Ministraux models in the Azure AI Model Catalog:
Prerequisites:
If you don’t have an Azure subscription, get one here: https://azure.microsoft.com/en-us/pricing/purchase-options/pay-as-you-go
Familiarize yourself with Azure AI Model Catalog
Create an Azure AI Studio hub and project. Make sure you pick East US, West US3, South Central US, West US, North Central US, East US 2 or Sweden Central as the Azure region for the hub.
Create a deployment to obtain the inference API and key:
Open the model card in the model catalog on Azure AI Studio.
Click on Deploy and select the Pay-as-you-go option.
Subscribe to the Marketplace offer and deploy. You can also review the API pricing at this step.
Within a minute, you should land on the deployment page, which shows the API endpoint and key. You can try out your prompts in the playground.
The prerequisites and deployment steps are explained in the product documentation. You can use the API and key with various clients. Check out the samples to get started.
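Once you have the endpoint and key from the deployment page, you can call the model from any HTTP client. The sketch below builds a chat-completions request using only the Python standard library; the endpoint URL, the `/chat/completions` path, and the payload shape follow the common chat-completions convention and should be checked against the product documentation for your deployment.

```python
import json
import os
import urllib.request

# Replace with the values shown on your deployment page (placeholders here).
ENDPOINT = os.environ.get("AZURE_AI_ENDPOINT", "https://<your-deployment>.models.ai.azure.com")
API_KEY = os.environ.get("AZURE_AI_KEY", "<your-key>")

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions HTTP request for the deployed endpoint."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{ENDPOINT}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

# To actually send the request (requires a live deployment):
#   with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#       print(json.load(resp))
req = build_chat_request("Please give me my customer number")
```

The official samples linked from the documentation use dedicated client libraries; this raw-HTTP version is only meant to show what those clients send under the hood.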
Conclusion
The introduction of Ministral 3B marks an exciting milestone in our journey to enhance AI capabilities on Azure. By integrating these state-of-the-art models into the Azure AI Model Catalog, we empower developers and businesses to innovate with confidence, leveraging advanced AI solutions for edge computing and on-device applications.
With its combination of low-latency performance, versatility across use cases, and cost-efficiency at $0.04 per million tokens, Ministral 3B is a game-changer for enterprises looking to harness the power of AI without breaking the bank.
Join us in exploring the future of AI with Ministral 3B—where cutting-edge technology meets practical applications for a smarter, more efficient world.