Author: Telkom University PuTI
News to Know: Volume 1, Edition 9
Welcome to the Viva Glint newsletter. These recurring communications coincide with platform releases and enhancements to help you get the most out of the Viva Glint product. You can access the current newsletter and past editions on the Viva Glint blog.
Glint released its latest new features and enhancements on August 24, 2024. Scheduled monthly releases and regular maintenance are how we best serve our customers. Your dashboard provides date and timing details two or three days before releases. See our future release and downtime dates. Follow along with what features are ahead: Viva Glint product roadmap.
Glint celebrates one year at Microsoft Viva
In July 2023, Glint moved from its home at LinkedIn and joined Microsoft Viva, providing a comprehensive approach to improving employee engagement and performance. It has been a year of incredible accomplishments. Read how Glint enriches the Microsoft Viva experience, written by Quentin Mackey Principal Group Product Manager.
To celebrate one year of Viva Glint, we are hosting our first Viva Glint Customer Town Hall. Tune in on September 18 or 19 to hear about the latest Viva news, product updates, and stories to help you make the most of your Viva Glint experience. Choose an option below and register:
Option 1: September 18 – NAMER and EMEA
Option 2: September 18/19 – APAC
New on your Viva Glint platform
New feature! 360 feedback programs are live! Microsoft Viva Glint 360 feedback programs provide insights into an individual’s strengths and opportunities, with a long-term focus on improvement. Rather than relying solely on the perspective of an immediate supervisor or manager, 360s invite multiple perspectives. 360 feedback programs are included with your Viva Glint platform.
Read about the 360 feedback feature and watch a 2-minute video detailing the process.
Share the Viva Glint 360 Subject Guide with your employees.
Read the blog.
Need to reopen or extend your survey cycle? Sometimes survey takers need extra time to complete a survey, or your response rate may not be as high as expected. Admins can reopen or extend a survey cycle as long as a new cycle isn’t started. For best results, reopen a recently closed survey within a day or two of its closing date.
Confidently communicate with your global employee population. Use Viva Glint’s multi-language emails to meet local language guidelines for work-related communications and ensure that your global organization fully understands survey notifications. Learn more about multi-language emails.
Seamlessly discover and navigate between Microsoft 365 apps on the web. Glint is integrating to the Microsoft 365 header to allow one-touch navigation and convenient suite-wide usage from:
The Viva App Bar in other Viva apps (when they are updated to the latest App Bar package)
The Microsoft 365 Apps window (search or select the Employee Experience category)
The Microsoft 365 app launcher
Connect and learn with Viva Glint
For managers
Psychological safety for managers | September 17
Invite your managers to our first webinar on psychological safety. Building Psychological Safety on Your Team will help managers identify and cultivate psychological safety within their team. Share this link with your managers.
For all stakeholders
Ask the Experts: Identifying insights through reports | September 10
Geared towards new Viva Glint customers who are deploying their first programs, this session focuses on identifying insights using our reports. Register here.
Industry cohort meetings | September 11, 12, and 25
Join us for our Healthcare, Manufacturing, and Retail quarterly meetings where we’ll share industry-specific insights and learnings. Read the blog and register here.
HR Tech Conference | September 24-26
HR Tech is a top industry event with this year focusing on engagement, talent acquisition, and performance management in the era of AI. Don’t miss the Viva Glint, Insights, and Pulse Customer Roundtable and stop by the Microsoft booth to help influence the direction of our product. To reserve your seat (includes lunch), email jgonzales@microsoft.com. Spots are limited.
Exciting new resources
Articulate your business priorities and needs. To build your holistic employee listening strategy, take time to align with your internal stakeholders and complete our new Holistic Listening Vision and Strategy Discovery Workbook. This will help you discuss information that is important to consider for your listening strategy and for tracking progress.
Accelerate AI transformation. Review this new eBook from the Viva People Science team which outlines findings on AI readiness, discusses implications, and provides practical guidance for leaders and HR on how they can best support people through change related to AI at work. eBook: The state of AI change readiness.
Learn how Microsoft HR uses Viva and Microsoft 365 Copilot to empower employees. Microsoft HR understands that technology is about people. A human-centric approach was the natural fit when we rolled out Copilot for Microsoft 365 to the global HR organization. Read the blog about how Microsoft HR uses the Viva suite to communicate, encourage skilling and development, and measure success.
Maximize your Viva Glint experience with a Microsoft partner. Microsoft partners are certified experts who have undergone rigorous training and possess deep knowledge of our products and services. Read this blog and learn how to engage a partner.
How are we doing?
If you have any feedback on this newsletter, please reply to this email. Also, if there are people on your team that should be receiving this newsletter, please have them sign up using this link.
Microsoft Tech Community – Latest Blogs –Read More
Unlocking Secure VM Connectivity with Azure Bastion
In today’s digital landscape, where security breaches are an unfortunate reality, safeguarding sensitive data and infrastructure has become more critical than ever. Recent cyberattacks have highlighted the importance of minimizing the blast radius and fortifying defenses against potential threats. Among the myriad security measures available, Azure Bastion provides a robust solution, offering a secure and seamless pathway for accessing virtual machines (VMs)—from Dev/Test environments to enterprise production workloads.
Enhanced Security Measures
Public ports exposed to the internet are often prime targets for malicious actors. Hackers leverage port-scanning on open RDP (Remote Desktop Protocol) and SSH (Secure Shell Protocol) entry points to gain unauthorized access to systems, potentially wreaking havoc on organizations’ operations and compromising sensitive data. Azure Bastion acts as a shield against such threats by hardening at one centrally managed gateway, closing RDP/SSH ports from the public internet while providing private connectivity to VMs.
With Azure Bastion, the need to expose VMs to the public internet, along with the associated risks, is eliminated. Instead of relying on traditional methods like VPNs or on-premises jump servers, Bastion offers a simplified yet highly secure approach. By leveraging Transport Layer Security (TLS) encryption and integrating with Azure’s robust authentication mechanisms, Bastion provides seamless RDP/SSH connectivity to your VMs while hardening the attack surface to one point.
The importance of securing RDP/SSH access cannot be overstated. These protocols are essential for remote management and troubleshooting, but they also represent significant vulnerabilities if not properly secured. Azure Bastion ensures that these critical access points are protected, reducing the risk of unauthorized access and potential breaches. By centralizing the management of RDP/SSH access, Bastion simplifies the security landscape, making it easier for organizations to enforce consistent security policies and monitor access activities.
The above diagram shows connections to virtual machines via a Bastion dedicated deployment that uses a Basic or Standard SKU.
The above diagram shows connections to virtual machines via a Bastion Premium SKU deployment using the private-only feature.
Streamlined Management
Azure Bastion not only enhances security but also simplifies management tasks. Since it is a fully managed service provided by Azure, users are relieved of the burden of setting up and maintaining infrastructure components. With just a few clicks, administrators can deploy Bastion and start securely accessing their VMs without worrying about infrastructure overhead, manual updates and patching, or configuration complexities.
Moreover, Azure Bastion offers scalability and flexibility, allowing users to connect to multiple VMs across their Azure environment effortlessly. This centralized management approach streamlines operations and enhances productivity, especially in large-scale deployments where managing connectivity to numerous VMs can be challenging.
The ease of deployment and management is a significant advantage for organizations of all sizes. Small and medium-sized businesses (SMBs) can benefit from the reduced complexity and lower operational overhead, while large enterprises can leverage Bastion’s scalability to manage extensive VM environments efficiently. By providing a consistent and reliable method for accessing VMs, Azure Bastion helps organizations maintain high levels of productivity and operational efficiency.
Choosing the Right SKU
Azure Bastion offers four distinct SKUs tailored to various needs and use cases, each with its unique advantages.
Developer SKU: Generally available in 6 public regions, the new Developer SKU provides a cost-effective option for developers and testers seeking access to VMs. It offers one connection per VNET without the configuration, scaling, and features of more advanced VM solutions. Bastion Developer is an excellent choice for users looking to provide secure access to their development and testing environments without incurring expenses. Bastion Developer is estimated to be available in all other Bastion supported public regions within the year.
Basic SKU: For small to medium-sized businesses (SMBs) and enterprises, the Basic SKU of Azure Bastion offers a well-rounded solution. This SKU offers a dedicated deployment within your Virtual Network (VNET), providing secure access for organizations seeking a comprehensive yet cost-effective option. With support for NSGs, peered VNets, and connectivity to 40-45 VMs at a time, Bastion Basic SKU is perfect for customers looking to secure their environment on a smaller scale.
Standard SKU: For enterprises with production workloads demanding high availability, scalability, and advanced features, the Standard SKU of Azure Bastion is the top offering. With Bastion Standard SKU, customers can scale up to 50 instances and support up to 400 VM connections. Customers can also enable advanced features such as CLI support, IP-based connection to non-Azure VMs, and shared connections.
Premium SKU: The Premium SKU is designed for customers with highly regulated workloads, such as financial services, government, and healthcare customers. Bastion Premium expands on the scalability of Standard SKU and adds advanced features such as graphical session recording of VM sessions and private-only connection on the Bastion host. Premium SKU provides enhanced audit and risk management for organizations that require the highest level of security and performance for their critical workloads.
Conclusion
Azure Bastion stands out as a vital tool in the arsenal of cloud security measures. By providing a secure, centrally managed gateway for RDP/SSH access to VMs, it significantly reduces the attack surface and enhances the overall security posture of organizations. Its ease of deployment, scalability, and integration with Azure’s authentication mechanisms make it an indispensable solution for modern enterprises looking to protect their digital assets.
The ability to choose from different SKUs allows organizations to tailor their Bastion deployment to their specific needs, ensuring that they can achieve the right balance of security, cost, and functionality. Whether for development and testing, small-scale production environments, or large-scale enterprise deployments, Azure Bastion offers a flexible and robust solution that can adapt to a wide range of use cases.
As cyber threats continue to evolve, the importance of robust security measures like Azure Bastion will only grow. By investing in secure and scalable solutions, organizations can better protect their sensitive data and infrastructure, ensuring that they remain resilient in the face of ever-changing security challenges. Azure Bastion represents a critical component of a comprehensive security strategy, providing the tools and capabilities needed to safeguard the modern digital landscape.
Microsoft Tech Community – Latest Blogs –Read More
Designing and running a Generative AI Platform based on Azure AI Gateway
Designing and Operating a Generative AI Platform
Summary
Are you in a platform team who has been tasked with building an AI Platform to serve the Generative AI needs of your internal consumers? What does that mean? It’s a daunting challenge to be set, and even harder if you’re operating in a highly regulated environment.
As enterprises scale out usage of Generative AI past a few initial use-cases they will face into a new set of challenges – scaling, onboarding, security and compliance to name a few.
This article discusses such challenges and approaches to building an AI Platform to serve your internal consumers.
Needs and more needs
To successfully run Generative AI at scale, organisations are utilising new features in API Management platforms such as Azure API Management’s AI Gateway (https://techcommunity.microsoft.com/t5/azure-integration-services-blog/introducing-genai-gateway-capabilities-in-azure-api-management/ba-p/4146525). The key to success for these platforms will be based on effective CI / CD and automation strategies. As we will see, an architecture to run Azure Open AI safely at scale involves safely deploying and managing lots of moving pieces, which together solve for scenarios such as:
How many Azure Open AI (AOAI) APIs should I create?
How do I version AOAI APIs?
How do I support consumers with different content-safety and model requirements?
How do I restrict throughput per Consumer, per deployment?
How do I scale out AOAI services?
How do I log all prompts and responses including streaming, without disruption?
What other value add services should a platform offer consumers?
Further, we need to understand how common services and libraries involved in building Generative AI Services fit into the architecture. We can build the best AI Platform in the world but if our consumers find they cannot use common Generative AI Libraries with it, have we really succeeded?
This document iterates through use-cases to build out a reference implementation that can safely run Azure API Management (AI Gateway) and Azure Open AI at scale, supporting most common libraries and services. You can find a reference implementation here:
https://github.com/graemefoster/APImAIPlatform
Target Azure Architecture
Decision Matrix
You might not need all the components. This matrix should help you understand what each iteration brings, allowing you to make an informed decision on when to stop.
AOAI
APIm / AOAI
Proxy / APIm / AOAI
Proxy / APIm / AOAI / Defender
Chargeback
✗
✓
✓
✓
Prompt flow (JWT & key)
✗
✓
✓
✓
Advanced Logging (PII redaction / streaming)
✗
✗
✓
✓
SIEM
✗
✗
✗
✓
Out of scope
We are focusing on the requirements for running Azure Open AI and Generative AI services, rather than specific application stacks. A Generative AI Orchestrator may involve multiple services such an Azure App Service, Azure AI Search and Storage. These will vary between applications and are considered out-of-scope for this document.
Roles
We have identified the following roles involved in running Generative AI Applications. NB: these may not map one to one to people. We leave team structure to your own personal choice.
Role
Responsibilities
Gen AI Developers
Building and testing AI Orchestrators including promptflow
GEN AI Operators
Understanding consumer usage, prompt performance, overall system response.
AI Platform
Managing Azure Open AI services, access, security, and audit logging
AI Governance
Monitoring AI safety / abuse / groundedness
The remainder of the document will introduce fictitious use-cases and map them to the above roles. We will iterate on the platform and show how it provides for the requirements. It can be overwhelming to see a target architecture without insight into why individual services exist. My hope is that by the end of the document you see why each service is there. Also, this is an evolving space. As services gain new features, knowing why something is there will enable you to consolidate accordingly.
Gen AI Application Engineers
Let’s start with a simple use-case. Ultimately, we want our feature teams to deliver applications that deliver real business value. There is no application or business value without the Application Engineers.
Use Case:
So that I can build Generative AI Applications
As a Gen AI Application Engineer
I need access to Azure Open AI API models
Use Case:
So that I can iterate on a Generative AI Application
As a Gen AI Application Engineer
I want to see diagnostics information like latency and prompt response time of my prompts
These two use-cases can be delivered with a single Azure Open AI resource:
It’s simple, but it might be all you need. We run the Orchestrator in a PaaS like Azure App Service, send telemetry to Application Insights, and use a single Azure Open AI service.
We can use Managed Identities for secure authentication, and content-filters inside Azure Open AI to keep us safe. They can do things like detect Jailbreaks and moderate responses.
We can provision PTUs (Provisioned Throughput Units https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/provisioned-throughput-onboarding) to guarantee capacity for Generative AI.
If you can stop now, then great. But most organisations will soon hit the next set of challenges.
I need to fairly share my PTU amongst multiple consumers
My consumers need access to embedding models which are not part of my PTU, and one deployment isn’t enough
I want more control over the identities accessing Azure Open AI such my own IdP
I need to log all prompts and responses for a central AI safety team
If this sounds like you, then read on as we build a platform!
Gen AI Platform Engineers
Use Case:
So that my organisation can run many AI applications on a PTU
As a Gen AI Platform Engineer
I need to share a PTU fairly across consumers
Use Case:
So that I can charge-back my PTU deployments
As a Gen AI Platform Engineer
I need to report over token usage metrics
Use Case:
So that we can leverage existing security controls
As a Gen AI Platform Engineer
I need to use identities from an existing IdP
To deliver this feature we’re going to reach for an AI Gateway. These are becoming commonplace in APIm products. In our case let’s drop Azure API Management with its AI Gateway components into the architecture and see what happens:
This is nice – we’ve now introduced an AI gateway which will give us lots of extra functionality
We’ll expose an API from Azure APIm that looks and feels like the Azure Open AI API. This will make it easier for our consumers using frameworks that ‘expect’ the AOAI surface area. It also makes it simpler for our platform team as there’s only one API for them to maintain.
We can create new versions of this API as Open AI releases new versions.
We can also restrict the surface area of the Azure Open AI API.
Let’s say we don’t want to expose Assistants, or the ability to upload files. With this model we can just choose not to expose those APIs.
We can use APIm’s load balanced pools to balance requests.
We can get it to prioritise our PTU and fail over to PAYG if we run out of capacity.
This will reduce the likelihood of 429’s getting back to our consumers.
We can use APIm to authorise incoming requests using our own IdP or a Platform Entra Application.
This lets us use custom roles and have more control over the JWT’s our consumers need.
We can use APIm policies like token-rate-limiting to fairly share our PTU stopping greedy consumers.
We’ll use “products” to lower the blast radius of breaking our consumers.
When (not if!) the platform gets popular we will need to support many consumers with different token limits. Products let us model this as lots of small policy files. These will be easier to manage than one big one.
We can use policies that emit token count metrics (including for streaming endpoints) allowing chargeback scenarios.
GEN AI Engineers… part II
This is looking great… it’s an amazing platform but the Engineers are not happy… What we’ve built is a brilliant launchpad for engineers who want to build Generative AI Applications, but it doesn’t cater for all tooling they might want.
Use Case:
So that I can support libraries like Prompt flow
As a Gen AI Platform Engineer
I need to cater for assumptions the libraries make on authentication
Some AI Libraries were built before the idea of AI Platforms existed and have taken tight dependencies on the way the ‘old world’ worked. These will be fixed over time but for now our platform is going to have to deal with them.
Let’s take Promptflow as an example. Our platform makes heavy use of API Management’s AI Gateway products to reduce the blast radius of change.
As of August 2024, APIm requires a Subscription Key to trigger Product behaviour. Our fictitious security team has mandated OAuth2 for all API calls. If you use Prompt flow’s OAuth flow it acquires tokens against Azure Open AI’s scope. And it’s tricky to attach a subscription key to its calls. It’s low-level but it’s causing friction between the application teams and the security team.
Prompt flow makes our authentication life a bit more difficult… We’re faced with a risk-based decision. Do we:
Allow Generative AI applications built with Prompt flow to take a less secure approach sending an APIm subscription-key
Allow Gen AI Apps to have direction permissions against Azure Open AI introducing a risk they could bypass our gateway protections and call Azure Open AI directly.
Introduce something ‘in the middle’ to adapt Prompt flow’s requests to meet our security requirements.
There are a few ways to approach this. Let us start with an approach that doesn’t add any new services. We are going to introduce a new product to our APIm AI Gateway which will:
Authorise the incoming JWT provided by Prompt flow and map the caller to a subscription key.
Make a new call back into APIm with the subscription key appended as a header to the original request
This will incur an impact on APIm’s performance so it’s not a free lunch. But it will enable us to reduce the risk of Prompt flow using just Api-Keys.
Great – what we can do now is:
Let our AI Applications use tokens acquired for Azure Open AI without giving them permissions to Azure Open AI
This is a nice trick here as we getting a token for AOAI doesn’t authorize us. Auth is done at the AOAI service.
The APIm product can:
Check the incoming claims on the JWT to authenticate the caller
Acquire a new token on-behalf-of the original (optional step)
Append a subscription key (using a pre-configured lookup)
Make the outbound call back to APIm where it will trigger the expected product behaviour.
For many organisations this might be enough. But there are still a few things we might care about. The OWASP LLM Top 10 identifies common vulnerabilities for LLM usage. Wouldn’t it be great if we can help our consumers detect some of them?
Bigger organisations tend to have AI Safety teams who need to report over all LLM consumption. They are asking questions like:
Can I get confidence prompts aren’t being jail-broken?
Can we be confident prompts are using grounding data, and not hallucinating?
Can we maintain a redacted audit log of all prompts and responses?
Let’s step into the shoes of our Responsible AI team…
Responsible AI team
Use Case:
So that my organisation stays within responsible AI boundaries
As a Responsible AI team
We want to spot check redacted prompt inputs, and outputs
Use Case:
So that my organisation stays within responsible AI boundaries
As a Responsible AI team
We want centralised alerting on attempted attacks on our Gen AI applications
Use Case:
So that my organisation stays within responsible AI boundaries
As a Responsible AI team
We want to know about CVE’s from LLM models and images our applications are using
We are going to reach for a few new tools to achieve these use-cases. AI Threat Detection from Azure Defender (https://learn.microsoft.com/en-us/azure/defender-for-cloud/ai-threat-protection) will help us with centralised alerts and CVE detection. Great news if we’re using Azure Open AI as it can take signals directly from the Content Safety layer meaning we don’t have to build anything to integrate the services.
For logging those prompts and responses we’re going to have to think outside the box. Most API Management platforms offer support for logging the bodies of requests and responses but there’s a caveat. They often don’t support streaming (Sever Side Events) which are used to provide better response latency to callers. If you’ve ever used Chat GPT you’ll have seen a streaming response in action. That ticker-tacker typewriter like experience when your response is returned is a Streaming Response.
API Management solutions currently buffer these responses in-order to log them which reduces the user-experience.
I’ll avoid the question of ‘do you need to use Server-Side Events’. It’s safe to say like all questions in IT, the answer is ’It Depends’. But what if you do need them?
To handle logging, I’ll introduce a second Gen AI Gateway into the architecture. It’s called AI Central https://github.com/microsoft/aicentral (there are lots of these out there – disclaimer, I am the primary maintainer of AI Central), and it runs as a Docker Container / Web API sitting in-front of APIm. Another good option is AI Sentry – https://github.com/microsoft/ai-sentry
AI Central will drop prompts and responses into a queue for PII redaction and logging. It works with streaming responses and doesn’t buffer so won’t degrade the end user experience. It currently uses the Azure Language service, but we are investigating using the PTU overnight when it’s not used as much, or PHI-3 models running in a sidecar.
AI Central logs PII Redacted prompts and responses to a Cosmos Database that a Responsible AI Team can use.
We’ve also enabled Azure Defender for AI. The AI Content Filter built into AOAI is transmitting data to Azure Defender looking for attacks on their LLM. This is all surfaced in the standard Azure Defender dashboards.
But before we call ‘done’, let’s face into our last hurdle. How do or Gen AI Application Engineers onboard to this platform?
Gen AI Platform Engineers… part II
Use Case:
So that I can simplify onboarding
As a Gen AI Platform Engineer
I want to streamline consumer onboarding
Use Case:
So that I can manage consumer demand
As a Gen AI Platform Engineer
I want a tool to simplify managing multiple Azure Open AI deployments
Use Case:
So that I can deploy on a Friday
As a Gen AI Platform Engineer
I want to deploy daily
How does a feature team express their requirements to a platform team? We are using a JSON document which could be added via a Pull Request into the Platform team’s repository. Something like this should suffice. It will capture enough information about the consumer to:
Understand their token / model requirements
Understand their content safety requirements
Understand when they want to promote into environments
Get in touch with them
Support Chargeback
{
“consumerName”: “consumer-1”,
“requestName”: “my-amazing-service”,
“contactEmail”: “engineer.name@myorg.com”,
“costCentre”: “92304”,
“constantAppIdIdentifiers”: [],
“models”: [
{
“deploymentName”: “embeddings-for-my-purpose”,
“modelName”: “text-embedding-ada-002”,
“contentSafety”: “high”,
“environments”: {
“dev”: {
“thousandsOfTokens”: 1,
“deployAt”: “2024-07-02T00:00:0000”
},
“test”: {
“thousandsOfTokens”: 1,
“deployAt”: “2024-07-02T00:00:0000”
},
“prod”: {
“thousandsOfTokens”: 15,
“deployAt”: “2024-07-02T00:00:0000”
}
}
},
{
“deploymentName”: “gpt35-for-my-purpose”,
“modelName”: “gpt-35-turbo”,
“contentSafety” : “high”,
“environments”: {
“dev”: {
“thousandsOfTokens”: 1,
“deployAt”: “2024-07-02T00:00:0000”
},
“test”: {
“thousandsOfTokens”: 1,
“deployAt”: “2024-07-02T00:00:0000”
},
“prod”: {
“thousandsOfTokens”: 15,
“deployAt”: “2024-07-02T00:00:0000”
}
}
}
]
}
This will be a conversation. When a consumer opens a Pull Request to add / update this information into the Platform repository use the Pull Request to query / suggest alternate deployment approaches (maybe gpt-4o is a better fit for their requirement then gpt-4).
When you are comfortable with a request, merge the Pull Request into the platform repository.
Platform Team Mapping
Now you have the feature teams’ requirements, as a Platform team you need to express these as Azure Open AI deployments and API Management Products. This is unlikely to be a one-to-one mapping.
For example, to maximise use of PTU you might want to consolidate multiple consumer demands into a single deployment. To maximise PTU further you will need to consolidate Content Filter policies into ‘low’, ‘medium’, or ‘high’ (as content filter policies have one-to-one affinity to a deployment).
These are all backend implementation decisions. Your contract to the consumer is to provide them access to Azure Open AI with the agreed deployment names, throughput, and content-filters, securing using their provided Entra Application Ids.
Platform Team Decisions
How many AOAI services will you need? AOAI has limits on quotas per region. For example, as of August 2024 Australia East allows you 350k tokens per minute for Text-Embedding-Ada-002. https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits
If you have more demand than available capacity for a region / subscription pair, then you will need to deploy more Azure Open AI resources to either different subscriptions, or different regions. The code sample provided uses a single subscription, scaling out over multiple regions.
We’ll leave the science of sizing PTU deployments out-of-scope for here. There’s lots of good documentation out there for sizing PTUs, e.g. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/right-size-your-ptu-deployment-and-save-big/ba-p/4053857.
Back to our platform let’s start simple – we will define a deployment to AOAI using this JSON. Our platform will be built using a list of these.
{
“aoaiName”: “graemeopenai”,
“deploymentName”: “testdeploy2”,
“enableDynamicQuota”: false,
“deploymentType”: “PAYG”,
“model”: “gpt-35-turbo”,
“modelVersion”: “0613”,
“thousandsOfTokensPerMinute”: 5
}
Finally, we need a mapping between the demands of our feature team to the platform deployments. We will use this JSON:
{
“consumerName”: “consumer-1”,
“requirements”: [
{
“outsideDeploymentName”: “text-embedding-ada-002”,
“platformTeamDeploymentMapping”: “text-embedding-ada-002”,
“platformTeamPoolMapping”: “graemeopenai-embedding-pool”
},
{
“outsideDeploymentName”: “gpt35”,
“platformTeamDeploymentMapping”: “testdeploy2”,
“platformTeamPoolMapping”: “graemeopenai-pool”
}
]
}
The platform uses APIm policies to rewrite consumer requests to the ‘outside’ deployment name to the actual deployment names, allowing multiple consumers to share single deployments. This mapping lets the Platform team introduce new deployments potentially with different names without affecting the consumers.
In future iterations we want to write a simple User Interface to help manage this mapping exercise.
Deployment
Deployment of a platform should be no different to deployment of a feature. The more you do it, the more confident you will be. The Bicep templates that form the platform (https://github.com/graemefoster/APImAIPlatform) are designed to deploy everything – Azure Open AI services, deployments, APIs, Products, AI Central, etc in a single “line-of-sight”.
Our entire platform is deployed using a single ‘az deployment sub create‘ command. If we had to rebuild the entire thing from scratch it would be the same single deployment command.
My recommendation is to deploy daily at least and potentially more frequently.
Optional Extras
In building this document some other ideas popped up that I think would be helpful in making it even easier and quicker for your internal customers to get onboard:
Providing platform endpoints to test your Prompts against well-known jail-breaks and other OWASP LLM threats
Running random checks against audited prompts and responses to check the grounded-ness of the responses
CI / CD Pipelines that ‘collect’ an application’s prompts and responses and pro-actively run guardrail evaluations over them
Think about all the requirements that your consumers need to ‘tick’ before they can deploy into production. Platforms succeed when their customers fall into the proverbial pit-of-success. The more general requirements you can automate, the more your engineers are going to love your platform.
Final thoughts
And that’s it. We’ve covered a lot of detail but have setup:
Azure Open AI Deployments
An API Management AI Gateway capable of authenticating, load balancing and enforcing consumer quota
An AI proxy that can provide advanced logging as-well as bridging libraries like Prompt flow which cannot easily send Entra JWT’s and APIm Subscription Keys
Discussed AI Threat Detection from Azure Defender to assist SIEM monitoring
Thought about future enhancements that could make your platform even more valuable.
What do you think? How are you approaching this problem? Try the sample. Make it better! We’d love to hear from you so please leave some feedback.
Microsoft Tech Community – Latest Blogs –Read More
Inside Maia 100: Revolutionizing AI Workloads with Microsoft’s Custom AI Accelerator
Authored by:
Sherry Xu, Partner Lead SoC Architect, Azure Maia
Chandru Ramakrishnan, Partner Software Engineering Manager
As the advancement of artificial intelligence continues to demand new innovations in the cloud, we find ourselves squarely in a moment where the co-optimization of hardware with software is critical to optimizing AI infrastructure for peak performance, scalability, and fungibility.
At Hot Chips 2024, Microsoft shared specifications on Maia 100, Microsoft’s first-generation custom AI accelerator designed specifically for large-scale AI workloads deployed in Azure. Vertically integrated to optimize performance and reduce costs, the Maia 100 system includes a platform architecture featuring custom server boards with tailor-made racks and a software stack built to increase performance and cost efficiency for advanced AI capabilities on services like Azure OpenAI Services.
Maia 100 Accelerator Architecture
The Maia 100 accelerator is purpose-built for a wide range of cloud-based AI workloads. The chip measures out at ~820mm2 and utilizes TSMC’s N5 process with COWOS-S interposer technology. Equipped with large on-die SRAM, Maia 100’s reticle-size SoC die, combined with four HBM2E die, provide a total of 1.8 terabytes per second of bandwidth and 64 gigabytes of capacity to accommodate AI-scale data handling requirements.
An AI accelerator built for high throughput and diverse data formats
Designed to support up to 700W TDP but provisioned at 500W, Maia 100 can deliver high performance while managing power efficiently based on its targeted workloads.
Chip architecture designed to support advanced machine learning needs
Maia 100’s architecture, tailored to modern machine learning needs, reflects the application of thoughtful research on AI systems for optimal computational speed, performance, and accuracy.
A high-speed tensor unit offers rapid processing for training and inferencing while supporting a wide range of data types, including low precision data types such as the MX data format, first introduced by Microsoft through the MX Consortium in 2023. This tensor unit is constructed as a 16xRx16 unit.
The vector processor is a loosely coupled superscalar engine built with custom instruction set architecture (ISA) to support a wide range of data types, including FP32 and BF16.
A Direct Memory Access (DMA) engine supports different tensor sharding schemes.
Hardware semaphores enable asynchronous programming on the Maia system.
A software-led approach to data utilization and power efficiency
The Maia accelerator is designed with a lower-precision storage data type and a data compression engine to reduce the bandwidth and capacity ask required for large inferencing jobs often bottlenecked by data movement. To further improve data utilization and power efficiency, large L1 and L2 scratch pads are software-managed for optimal data utilization and power efficiency.
Ethernet-based interconnects support large-scale AI models
In 2023, Microsoft led the development of the Ultra Ethernet Consortium, helping enable the industry to use Ethernet-based interconnects designed for ultra-high bandwidth compute. Maia 100 supports up to 4800 Gbps all-gather and scatter-reduced bandwidth, and 1200 Gbps all-to-all bandwidth. This ethernet interconnect utilizes a custom RoCE-like protocol, offering enhanced reliability and balance. Maia’s backend network protocol supports AES-GCM encryption, also making It ideal for confidential compute. Maia 100 is also supported by a unified backend network for scale-up and scale-out workloads, providing flexibility to support both direct and switch connectivity.
Enabling quick deployment and model portability on the Maia SDK
With hardware and software architecture designed from the ground up to run large-scale workloads more efficiently, Maia 100 vertically integrates what we have learned across every layer of our cloud architecture – from advanced cooling and networking needs to the software stack that allows quick deployment of models. The Maia software development kit (SDK) allows users to quickly port their models written in PyTorch and Triton to Maia.
The Maia SDK provides a comprehensive set of components for developers to enable quick deployment of models to Azure OpenAI Services:
Framework integration: a first-class PyTorch backend which supports both eager mode and graph mode;
Developer tools: Tools for debugging and performance-tuning models such as a debugger, profiler, visualizer, and model quantization and validation tools;
Compilers: We have 2 programming models and compilers for Maia – Triton programming model offers agility and portability, while the Maia API is suited for the highest performance.
Kernel and Collective Library: Using the compilers, we’ve developed a set of highly optimized ML compute and communication kernels enabling you to get started quickly on Maia. Authoring of custom kernels is also supported.
Maia Host/Device Runtime: A host-device runtime layer comes with a hardware abstraction layer that is responsible for memory allocation, kernel launches, scheduling, and device management.
Dual programming models ensure efficient data handling and synchronization
The Maia programming model leverages asynchronous programming with semaphores for synchronization, enabling the overlap of computation with memory and network transfers. It operates with two execution streams: control processors issuing asynchronous commands via queues and hardware threads executing these commands, ensuring efficient data handling through semaphore-based synchronization.
To program Maia, developers can choose from two programming models: Triton, a popular open-source domain-specific language (DSL) for deep neural networks (DNNs) that simplifies coding and runs on both GPUs and Maia, or the Maia API, a Maia-specific custom programming model built for maximum performance with more detailed control. Triton requires fewer lines of code and handles memory and semaphore management automatically, while Maia API demands more code and explicit management by the programmer.
Optimizing data flow with gather-based matrix multiplication
Maia uses a Gather-based approach for large distributed General Matrix Multiplication (GEMMs), as opposed to an All-Reduce based approach. This approach offers several advantages: enhanced processing speed and efficiency by fusing the post-GEMM activation function (like GELU) directly in SRAM; reduction of idle time by overlapping computation with network communications, and reduction of latency by sending quantized data over the network – leading to faster data transmission and overall system performance.
Additionally, we leverage Static Random-Access Memory (SRAM) at the cluster level to buffer activations and intermediate results. Network reads and writes are also served directly from SRAM, enabling direct access to CSRAM. This significantly reduces HBM reads, improving latency.
We further enhance performance by parallelizing computations across clusters and utilizing the Network On Chip (NOC) for on-chip activation gathering.
Optimizing workload performance with portability and flexibility
Key to Maia 100’s fungibility is its ability to execute PyTorch models against Maia with a single line of code. This is supported by a PyTorch backend, which operates both in both eager mode for the optimal developer experience, and graph mode for the best performance. Leveraging PyTorch with Triton, developers can optimize workload performance with complete portability and flexibility between hardware backends without sacrificing efficiency and the ability to target AI workloads.
With its advanced architecture, comprehensive developer tools, and seamless integration with Azure, the Maia 100 is revolutionizing the way Microsoft manages and executes AI workloads. Through the algorithmic co-design or hardware with software, built-in hardware optionality for both model developers and custom kernel authors, and a vertically integrated design to optimize performance and improve power efficiency while reducing costs, Maia 100 offers a new option for running advanced cloud-based AI workloads on Microsoft’s AI infrastructure.
Microsoft Tech Community – Latest Blogs –Read More
Fine Tune GPT-4o on Azure OpenAI Service
Get excited – you can now fine tune GPT-4o using the Azure OpenAI Service!
We’re thrilled to announce the public preview of fine-tuning for GPT-4o on Azure. After a successful private preview, GPT-4o is now available to all of our Azure OpenAI customers, offering unparalleled customization and performance in Azure OpenAI Service.
Why fine-tuning matters
Fine-tuning is a powerful tool that allows you to tailor our advanced models to your specific needs. Whether you’re looking to enhance the accuracy of responses, ensure outputs align with your brand voice, reduce token consumption or latency, or optimize the model for a particular use case, fine-tuning allows you to customize best-in-class models with your own proprietary data.
GPT-4o: Make a great model even better with your own training data
GPT-4o offers the same performance as GPT-4 Turbo, but improved efficiency – and the best performance on non-English language content of any OpenAI model. With the launch of fine tuning for GPT-4o, you now have the ability to customize it for your unique needs. Fine-tuning GPT-4o enables developers to train the model on domain-specific data, creating outputs that are more relevant, accurate, and contextually appropriate.
This release marks a significant milestone for Azure OpenAI Service, as it allows you to build highly specialized models that drive better outcomes, use fewer tokens with greater accuracy, and create truly differentiated models to support your use cases.
Fine tuning capabilities
Today, we’re announcing the availability of text-to-text fine tuning for GPT-4o. In addition to basic customization, we support advanced features to help you create customized models for your needs:
Tool Calling: Include function and tool calls in your training data to empower your custom models to do even more!
Continuous Fine Tuning: Fine tune previously fine-tuned models with new, or additional, data to update or improve accuracy
Deployable Snapshots: No need to worry about overfitting – you can now deploy snapshots, preserved at each epoch, and evaluate them alongside your final model
Built in Safety: GPT-4, 4o, and 4o mini models have automatic guardrails in place to ensure that your fine-tuned models are not capable of generating harmful content.
GPT-4o is available to customers using Azure OpenAI resources in North Central US and Sweden Central. Stay tuned as we add support in additional regions.
Lowering prices to make experimentation accessible
We’ve heard your feedback about the costs of fine-tuning and hosting models. To make it easier for you to experiment and deploy fine-tuned models, we’ve updated our pricing structure to:
Bill for training based on the total tokens trained – not the number of hours
Reduce hosting charges by ~40% for some of our most popular models, including the GPT-35-Turbo family.
These changes make experimentation easier (and less expensive!) than ever before. You can find the updated pricing for Fine tuning models on the Azure OpenAI Service – Pricing | Microsoft Azure
Get started today!
Whether you’re new to fine-tuning or an experienced developer, getting started with Azure OpenAI Service has never been easier. Fine-tuning is available through both Azure OpenAI Studio and Azure AI Studio, providing a user-friendly interface for those who prefer a GUI and robust APIs for advanced users.
Ready to get started?
Learn more about Azure OpenAI Service
Check out our How-To Guide for Fine Tuning with Azure OpenAI
Try it out with Azure AI Studio
Microsoft Tech Community – Latest Blogs –Read More
Imaji Ketahanan Digital & Akselerasi Ekonomi AI
Post Content
Mengenal Potensi AI di Indonesia
Post Content
Urgensi Akselerasi Regulasi AI
Post Content