Meta’s new Llama 3.2 SLMs and image reasoning models now available on Azure AI Model Catalog
In collaboration with Meta, Microsoft is excited to announce that Meta’s new Llama 3.2 models are now available on the Azure AI Model Catalog. Starting today, the Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct models – Llama’s first ever multimodal models – are ready to be deployed via managed compute in the Model Catalog.
Coming soon: inferencing through Models-as-a-Service serverless APIs.
Additionally, Llama 3.2 1B, 3B, 1B Instruct, and 3B Instruct are Meta’s first ever SLMs, built for on-device and local inferencing on mobile and edge devices, keeping sensitive data on the device and enabling low-cost agentic applications such as multilingual summarization and RAG. We’re delighted to be one of Meta’s launch partners for this release and to empower developers with the latest Llama models, with the 3.2 release fit for purpose for edge, mobile, and image reasoning use cases. This release brings together the capabilities of Azure’s secure and scalable cloud infrastructure, Azure AI Studio’s tools such as Azure AI Content Safety, Azure AI Search, and prompt flow, and Meta’s cutting-edge AI models to offer a powerful, customizable, and secure AI experience.
Introducing Llama 3.2: A New Era of Vision and Lightweight AI Models
Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. To support image recognition tasks, the Llama 3.2-Vision models use separately trained image reasoning adapter weights that integrate with the core LLM weights via cross-attention and are invoked when an input image is presented to the model.
These models are designed for a variety of use cases, including image reasoning, multilingual summarization, and personalized on-device agents. With these models, developers can create AI applications that prioritize user privacy, reduce reliance on the cloud, and deliver faster, more efficient processing. All models support long context lengths (up to 128K tokens) and are optimized for inference with grouped query attention (GQA).
Starting today, developers can access the following models via managed compute inferencing:
Llama 3.2 1B
Llama 3.2 3B
Llama 3.2-1B-Instruct
Llama 3.2-3B-Instruct
Llama Guard 3 1B
Llama 3.2 11B Vision Instruct
Llama 3.2 90B Vision Instruct
Llama Guard 3 11B Vision
Fine-tuning is available for Llama 3.2 1B Instruct and 3B Instruct and coming soon for the rest of the collection.
Coming soon to Models-as-a-Service, Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct will be available via serverless API deployment.
Key Features and Benefits of Llama 3.2
Multimodal Capabilities for image reasoning applications: Llama 3.2’s Vision models (11B and 90B) are the first Llama models to support multimodal tasks, integrating image encoder representations into the language model. Developers can create applications that analyze visual data and generate accurate insights, helping bridge the gap between vision and language in AI models.
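For example, here is a minimal sketch of image reasoning against a deployed Llama 3.2 Vision endpoint using the azure-ai-inference Python package; the endpoint URL, key, and image URL are placeholders, and the sketch assumes the deployment exposes the Azure AI model inference chat-completions API:

```python
# Minimal image-reasoning sketch; endpoint, key, and image URL are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ImageContentItem,
    ImageUrl,
    TextContentItem,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.<region>.inference.ml.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
)

# Send a text question together with an image for the model to reason over.
response = client.complete(
    messages=[
        UserMessage(
            content=[
                TextContentItem(text="What trend does this chart show?"),
                ImageContentItem(image_url=ImageUrl(url="https://example.com/chart.png")),
            ]
        )
    ],
)
print(response.choices[0].message.content)
```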
Lightweight Models for Mobile and Edge Applications: Llama 3.2’s 1B and 3B text-only models are ideal for edge applications, offering local, on-device inferencing that keeps sensitive information on the device and significantly reduces the risk of data breaches or unauthorized access. These models enable fast, real-time responses for on-device agents, making them ideal for tasks like summarizing messages, retrieving information, and providing multilingual support, all while maintaining user privacy.
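As a taste of what local inferencing looks like, here is a minimal sketch using Hugging Face transformers; it assumes you have accepted Meta’s license for the gated repository, and the model ID and generation settings are illustrative:

```python
# Local, on-device text generation with a Llama 3.2 SLM; no cloud round-trip.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # gated repo; requires accepted license
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize: the team meeting moved to 3pm Friday."}]
result = generator(messages, max_new_tokens=64)

# The pipeline returns the full chat; the last message is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```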
System-Level Safety and Customization: Llama 3.2 introduces Llama Guard 3, a safety layer that ships alongside the models to support responsible innovation. This safeguard helps developers maintain compliance and trust while building AI solutions. Additionally, developers have full control and customization over the models, with direct access to model weights and architecture.
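Here is a hedged sketch of screening a user prompt with Llama Guard 3 1B via transformers; the model ID and typed-content message format follow Meta’s model cards, and the exact verdict strings (e.g. “safe” or “unsafe” plus a category code) are assumptions based on earlier Llama Guard releases:

```python
# Prompt screening with Llama Guard 3 1B; message format per Meta's model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"  # gated repo; requires accepted license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I write a phishing email?"}]}
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g. "unsafe\nS2" when the prompt violates a policy category
```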
Llama Stack for Seamless Development: Llama 3.2 is built on top of the Llama Stack, a standardized interface that simplifies the development of AI applications. This stack integrates with PyTorch and includes tools for fine-tuning, synthetic data generation, and agentic application development. The Llama Stack API allows developers to manage Llama models with ease, providing a streamlined experience from evaluation to deployment: https://github.com/meta-llama/llama-stack
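For a flavor of the client side, here is a heavily hedged sketch using the llama-stack-client package against a locally running Llama Stack server; the base URL, model identifier, and method names follow early releases of the API and may differ in your version:

```python
# Chat completion through a local Llama Stack server; names follow early releases.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")  # assumed local server

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # illustrative model identifier
    messages=[{"role": "user", "content": "Draft a three-step onboarding email plan."}],
)
print(response.completion_message.content)
```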
What Sets Llama 3.2 Apart
According to Meta, Llama 3.2 stands out for its combination of flexibility, privacy, and performance:
Deep Customization: Developers can tailor models to their specific needs, with full control over weights and architecture.
Infrastructure Control: With the flexibility to deploy in any environment—whether on-prem, cloud, or virtual—Llama 3.2 offers unmatched versatility.
Ironclad Security: Processing data locally maintains sovereignty over sensitive information, ensuring that privacy is a top priority.
Complete Transparency: Llama 3.2 provides full visibility into model behavior, supporting regulatory compliance and trust building.
Why Llama 3.2 on Azure?
Developers using Llama 3.2 models can work seamlessly with tools in Azure AI Studio, such as Azure AI Content Safety, Azure AI Search, and prompt flow, to enhance ethical and effective AI practices. Here are some key advantages that highlight the smooth integration and strong support system provided by Llama 3.2 with Azure, Azure AI, and Models-as-a-Service:
Enhanced Security and Compliance: Azure places a strong emphasis on data privacy and security, adopting Microsoft’s comprehensive security protocols to protect customer data. With Llama 3.2 on Azure AI Studio, enterprises can operate confidently, knowing their data remains within the secure bounds of the Azure cloud, thereby enhancing privacy and operational efficiency.
Content Safety Integration: Customers can integrate Llama 3.2 models with content safety features available through Azure AI Content Safety, enabling additional responsible AI practices. This integration facilitates the development of safer AI applications, ensuring content generated or processed is monitored for compliance and ethical standards.
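For example, here is a minimal sketch of screening model output with the azure-ai-contentsafety package before returning it to users; the resource endpoint and key are placeholders:

```python
# Screen Llama output with Azure AI Content Safety before surfacing it.
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<your-content-safety-key>"),  # placeholder
)

llama_output = "Model response text to screen."
result = client.analyze_text(AnalyzeTextOptions(text=llama_output))

# Each category (Hate, SelfHarm, Sexual, Violence) comes back with a severity score.
for category in result.categories_analysis:
    print(category.category, category.severity)
```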
Simplified Assessment of LLM Flows: Azure AI’s prompt flow allows evaluation flows, which help developers measure how well the outputs of LLMs match given standards and goals by computing metrics. This feature is useful for workflows created with Llama 3.2; it enables a comprehensive assessment using metrics such as groundedness, which gauges the relevance and accuracy of the model’s responses based on the input sources when using a retrieval augmented generation (RAG) pattern.
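Here is a hedged sketch of a standalone groundedness check with the promptflow-evals package; class and parameter names follow its early releases, and the judge-model configuration values are placeholders:

```python
# Groundedness scoring of a RAG answer against its retrieved context.
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluators import GroundednessEvaluator

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-aoai>.openai.azure.com",  # placeholder
    api_key="<your-key>",  # placeholder
    azure_deployment="gpt-4o",  # illustrative judge-model deployment
)

groundedness = GroundednessEvaluator(model_config)
score = groundedness(
    answer="Llama 3.2 supports context lengths of up to 128K tokens.",
    context="All Llama 3.2 models support long context lengths of up to 128K tokens.",
)
print(score)  # e.g. {"gpt_groundedness": 5.0}
```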
Client integration: You can use the endpoint API and key with various clients, including Large Language Model (LLM) tools such as prompt flow, OpenAI, LangChain, and LiteLLM, as well as the CLI with curl and Python web requests. Deeper integrations and further capabilities are coming soon.
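As a minimal example, here is a plain-HTTP call with Python requests; the URL shape shown is for a serverless (MaaS) deployment, and the endpoint and key are placeholders from your own deployment:

```python
# Call a deployed Llama 3.2 endpoint with nothing but the requests library.
import requests

url = "https://<your-deployment>.<region>.models.ai.azure.com/chat/completions"  # placeholder
headers = {
    "Authorization": "Bearer <your-api-key>",  # placeholder
    "Content-Type": "application/json",
}
payload = {
    "messages": [{"role": "user", "content": "Give me three taglines for a coffee shop."}],
    "max_tokens": 128,
    "temperature": 0.7,
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```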
Simplified Deployment and Inference: By deploying Meta models through MaaS with pay-as-you-go inference APIs, developers can take advantage of the power of Llama 3.2 without managing underlying infrastructure in their Azure environment.
These features demonstrate Azure’s commitment to offering an environment where organizations can harness the full potential of AI technologies like Llama 3.2 efficiently and responsibly, driving innovation while maintaining high standards of security and compliance.
Getting Started with Meta Llama 3.2 on Azure AI
To get started with Azure AI Studio and deploy your first model, follow these clear steps:
Familiarize Yourself: If you’re new to Azure AI Studio, start by reviewing this documentation to understand the basics and set up your first project.
Access the Model Catalog: Open the model catalog in AI Studio.
Find the Model: Use the filter to select the Meta collection or click the “View models” button on the MaaS announcement card.
Select the Model: Open the Llama 3.2 model you want from the list.
Deploy the Model: Click on ‘Deploy’ and choose the managed compute option.
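If you prefer to script the deployment, here is a hedged sketch using the azure-ai-ml Python SDK; the workspace details, registry model path, and VM size are assumptions, so copy the exact model asset ID and recommended GPU SKU from the model’s catalog page:

```python
# Programmatic managed compute deployment of a Llama 3.2 model from the registry.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
    resource_group_name="<resource-group>",  # placeholder
    workspace_name="<workspace-or-project>",  # placeholder
)

endpoint = ManagedOnlineEndpoint(name="llama-32-3b-instruct")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="default",
    endpoint_name=endpoint.name,
    # Assumed registry path; copy the real model asset ID from the catalog page.
    model="azureml://registries/azureml-meta/models/Llama-3.2-3B-Instruct/labels/latest",
    instance_type="Standard_NC24ads_A100_v4",  # illustrative GPU SKU
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```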
FAQ
What does it cost to use Llama 3.2 models on Azure?
For managed compute deployments, you’ll be billed based on the minimum GPU SKU that can host the deployment, provided you have sufficient GPU quota.
For models via MaaS, you’ll be billed based on prompt and completion tokens. Pricing will be available soon and shown in Azure AI Studio (on the Marketplace Offer details tab when deploying the model) and in Azure Marketplace.
Do I need GPU capacity in my Azure subscription to use Llama 3.2 models?
Yes, for models available via managed compute deployment, you will need GPU capacity in your subscription; the required quota varies by model.
When you deploy the model, you’ll see the VM that is automatically selected for deployment.
For the 11B Vision Instruct and 90B Vision Instruct available via serverless API (coming soon), no GPU capacity is required.
When Llama 3.2 11B Vision Instruct and 90B Vision Instruct models are listed on the Azure Marketplace, can I purchase and use these models directly from Azure Marketplace?
Azure Marketplace enables the purchase and billing of Llama 3.2, but the purchase experience can only be accessed through the model catalog.
Attempting to purchase Llama 3.2 models from the Marketplace will redirect you to Azure AI Studio.
Given that Llama 3.2 11B Vision Instruct and 90B Vision Instruct will be billed through the Azure Marketplace, would it retire my Azure consumption commitment (aka MACC) when these models are available via MaaS?
Yes, both the Llama 3.2 11B Vision Instruct and 90B Vision Instruct models will be “Azure benefit eligible” Marketplace offers, indicating MACC eligibility. Learn more about MACC here: https://learn.microsoft.com/en-us/marketplace/azure-consumption-commitment-benefit
Is my inference data shared with Meta?
No, Microsoft does not share the content of any inference request or response data with Meta.
Are there rate limits for the Meta models on Azure?
Meta models come with a limit of 200K tokens per minute and 1,000 requests per minute. Reach out to Azure customer support if this doesn’t suffice.
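Hitting those limits surfaces as HTTP 429 responses; here is a minimal, generic backoff sketch that honors the standard Retry-After header, reusing placeholder endpoint details like the earlier request example:

```python
# Retry on 429 rate-limit responses with Retry-After or exponential backoff.
import time

import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After when the service provides it; otherwise back off exponentially.
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Rate limit persisted after retries")
```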
Can I use MaaS models in any Azure subscription types?
Customers can use MaaS models in all Azure subscription types with a valid payment method, except for the CSP (Cloud Solution Provider) program. Free or trial Azure subscriptions are not supported.
Can I fine-tune Llama 3.2 models?
You can fine-tune the Llama 3.2 1B Instruct and 3B Instruct models. Fine-tuning for the rest of the collection is coming soon.