Category: News
OpenAI at Scale: Maximizing API Management through Effective Service Utilization
Scenario
In this blog post, I will demonstrate how to leverage Azure API Management to enhance the resiliency and capacity of your Azure OpenAI Service.
Azure API Management is a tool that assists in creating, publishing, managing, and securing APIs. It offers features like routing, caching, throttling, authentication, transformation, and more.
By utilizing Azure API Management, you can:
Distribute requests across multiple instances of the Azure OpenAI Service using priority-based load balancing, which supports priority groups with weighted distribution inside each group. This spreads the load across multiple resources and regions, enhancing the availability and performance of your service.
Implement the circuit breaker pattern to protect your backend service from being overwhelmed by excessive requests. This helps prevent cascading failures and improves the stability and resiliency of your service. You can configure the circuit breaker property in the backend resource and define rules for tripping the circuit breaker, such as the number or percentage of failure conditions within a specified time frame and a range of status codes indicating failures.
Diagram 1: API Management with circuit breaker implementation.
>Note: Backends in lower priority groups will only be used when all backends in higher priority groups are unavailable because circuit breaker rules are tripped.
Diagram 2: API Management load balancer with circuit breaker in action.
In the following sections I will guide you through circuit breaker deployment with API Management and Azure OpenAI services. You can use the same solution with the native OpenAI service.
The GitHub repository for this article can be found at github.com/eladtpro/api-management-ai-policies
Prerequisites
If you don’t have an Azure subscription, create a free account before you begin.
Use the Bash environment in Azure Cloud Shell. For more information, see Quickstart for Bash in Azure Cloud Shell.
If you prefer to run CLI reference commands locally, install the Azure CLI.
If you’re using a local installation, sign in to the Azure CLI by using the az login command. To finish the authentication process, follow the steps displayed in your terminal. For other sign-in options, see Sign in with the Azure CLI.
If you don’t have an Azure API Management instance, create a new one.
Azure OpenAI services for the backend pool; each service should have the same model deployed with the same name and version across all the services.
Step I: Provision Azure API Management Backend Pool (bicep)
Bicep CLI
Install or Upgrade Bicep CLI.
# az bicep install
az bicep upgrade
Deploy the Backend Pool using Bicep
Login to Azure.
az login
>Important: Update the names of the backend services in the deploy.bicep file before running the next command.
Create a deployment at the resource group scope from the template file, updating the parameters in the file as needed.
az deployment group create --resource-group <resource-group-name> --template-file <path-to-your-bicep-file> --name apim-deployment
>Note: You can learn more about the Bicep backend resource Microsoft.ApiManagement service/backends, and about the CircuitBreakerRule.
>Note: The following warning may be displayed when running the above command:
/path/to/deploy.bicep(102,3) : Warning BCP035: The specified "object" declaration is missing the following required properties: "protocol", "url". If this is an inaccuracy in the documentation, please report it to the Bicep Team. [https://aka.ms/bicep-type-issues]
Output:
{
  "id": "<deployment-id>",
  "location": null,
  "name": "apim-deployment",
  "properties": {
    "correlationId": "754b1f5b-323f-4d4d-99e0-7303d8f64695",
    ...
    "provisioningState": "Succeeded",
    "templateHash": "8062591490292975426",
    "timestamp": "2024-09-07T06:54:37.490815+00:00"
  },
  "resourceGroup": "azure-apim",
  "type": "Microsoft.Resources/deployments"
}
>Note: To view failed operations, filter operations with the ‘Failed’ state.
az deployment operation group list --resource-group <resource-group-name> --name apim-deployment --query "[?properties.provisioningState=='Failed']"
The following is the deploy.bicep backend circuit breaker and load balancer configuration:
resource apiManagementService 'Microsoft.ApiManagement/service@2023-09-01-preview' existing = {
  name: apimName
}

resource backends 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = [for (name, i) in backendNames: {
  name: name
  parent: apiManagementService
  properties: {
    url: 'https://${name}.openai.azure.com/openai'
    protocol: 'http'
    description: 'Backend for ${name}'
    type: 'Single'
    circuitBreaker: {
      rules: [
        {
          acceptRetryAfter: true
          failureCondition: {
            count: 1
            interval: 'PT10S'
            statusCodeRanges: [
              {
                min: 429
                max: 429
              }
              {
                min: 500
                max: 503
              }
            ]
          }
          name: '${name}BreakerRule'
          tripDuration: 'PT10S'
        }
      ]
    }
  }
}]
And the part for the backend pool:
resource aoailbpool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: 'openaiopool'
  parent: apiManagementService
  properties: {
    description: 'Load balance openai instances'
    type: 'Pool'
    pool: {
      services: [
        {
          id: '/backends/${backendNames[0]}'
          priority: 1
          weight: 1
        }
        {
          id: '/backends/${backendNames[1]}'
          priority: 2
          weight: 1
        }
        {
          id: '/backends/${backendNames[2]}'
          priority: 2
          weight: 1
        }
      ]
    }
  }
}
Step II: Create the API Management API
>Note: The following policy can be used in existing APIs or new APIs. The important part is to set the backend service to the backend pool created in the previous step.
Option I: Add to existing API
All you need to do is add the following set-backend-service and retry policies to activate the load balancer with the circuit breaker module:
<set-backend-service id="lb-backend" backend-id="openaiopool" />
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="1" first-fast-retry="true">
    <forward-request buffer-request-body="true" />
</retry>
Option II: Create new API
Add new API
Go to your API Management instance.
Click on APIs.
Click on Add API.
Select ‘HTTP’ API.
Give it a name and set the URL suffix to ‘openai’.
>Note: The URL suffix is the path that will be appended to the API Management URL. For example, if the API Management URL is ‘https://apim-ai-features.azure-api.net‘, the URL suffix is ‘openai’, and the full URL will be ‘https://apim-ai-features.azure-api.net/openai‘.
Add “catch all” operation
Click on the API you just created.
Click on the ‘Design’ tab.
Click on Add operation.
Set the method to ‘POST’.
Set the URL template to ‘/{*path}’.
Set the name.
Click on ‘Save’.
>Note: The ‘catch all’ operation is intended to match all OpenAI requests; we achieve this by setting the URL template to ‘/{*path}’. For example:
Base URL will be: https://my-apim.azure-api.net/openai
Postfix URL will be: /deployments/gpt-4o/chat/completions?api-version=2024-06-01
The full URL will be: https://my-apim.azure-api.net/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01
Add the Load Balancer Policy
Select the operation you just created.
Click on the ‘Design’ tab.
Click on ‘Inbound processing’ policy button ‘</>’.
Replace the existing policy with the policy shown below.
Click on ‘Save’.
This policy is set up to distribute requests across the backend pool and retry requests if the backend service is unavailable:
<policies>
    <inbound>
        <base />
        <set-backend-service id="lb-backend" backend-id="openaiopool" />
        <azure-openai-token-limit tokens-per-minute="400000" counter-key="@(context.Subscription.Id)" estimate-prompt-tokens="true" tokens-consumed-header-name="consumed-tokens" remaining-tokens-header-name="remaining-tokens" />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com/" />
        <azure-openai-emit-token-metric namespace="genaimetrics">
            <dimension name="Subscription ID" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
        <set-variable name="traceId" value="@(Guid.NewGuid().ToString("N"))" />
        <set-variable name="traceparentHeader" value="@("00-" + context.Variables["traceId"] + "-0000000000000000-01")" />
        <set-header name="traceparent" exists-action="skip">
            <value>@((string)context.Variables["traceparentHeader"])</value>
        </set-header>
    </inbound>
    <backend>
        <retry condition="@(context.Response.StatusCode == 429)" count="3" interval="1" first-fast-retry="true">
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    <outbound>
        <base />
        <set-header name="backend-host" exists-action="skip">
            <value>@(context.Request.Url.Host)</value>
        </set-header>
        <set-status code="@(context.Response.StatusCode)" reason="@(context.Response.StatusReason)" />
    </outbound>
    <on-error>
        <base />
        <set-header name="backend-host" exists-action="skip">
            <value>@(context.LastError.Reason)</value>
        </set-header>
        <set-status code="@(context.Response.StatusCode)" reason="@(context.LastError.Message)" />
    </on-error>
</policies>
>Important: The main policies that distribute requests to the backend pool created in the previous step are the following:
set-backend-service: This policy sets the backend service to the backend pool created in the previous step.
<set-backend-service id="lb-backend" backend-id="openaiopool" />
retry: This policy retries the request if the backend service is unavailable. If the circuit breaker is tripped, the request is immediately retried against the next available backend service.
>Important: The value of count should be equal to the number of backend services in the backend pool.
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="1" first-fast-retry="true">
    <forward-request buffer-request-body="true" />
</retry>
Step III: Configure Monitoring
Go to your API Management instance.
Click on ‘APIs’.
Click on the API you just created.
Click on ‘Settings’.
Scroll down to ‘Diagnostics Logs’.
Check the ‘Override global’ checkbox.
Add the ‘backend-host’ and ‘Retry-After’ headers to log.
Click on ‘Save’.
>Note: The ‘backend-host‘ header is the host of the backend service that the request was actually sent to. The ‘Retry-After‘ header is the time in seconds that the client should wait before retrying the request; it is returned by the OpenAI service and overrides the tripDuration setting of the backend circuit breaker.
>Note: You can also add the request and response bodies to the logged HTTP requests in the ‘Advanced Options’ section.
Step IV: Prepare the OpenAI Service
Deploy the model
>Important: To use the load balancer configuration seamlessly, all the OpenAI services should have the same model deployed, with the same name and version across all the services.
Go to the OpenAI service.
Select the ‘Model deployments’ blade.
Click the ‘Manage Deployments’ button.
Configure the model.
Click on ‘Create’.
Repeat the above steps for all the OpenAI services, making sure that the model is deployed with the same name and version across all the services.
Set the Managed Identity
>Note: The API Management instance should have the System/User ‘Managed Identity’ set to the OpenAI service.
Go to the OpenAI service.
Select the ‘Access control (IAM)’ blade.
Click on ‘Add role assignment’.
Select the role ‘Cognitive Services OpenAI User’.
Select the API Management managed identity.
Click on ‘Review + assign’.
Repeat the above steps for all the OpenAI services.
Step V: Test the Load Balancer
>Note: Calling the API Management API will require the ‘api-key’ header to be set to the subscription key of the API Management instance.
We are going to run the Chat Completion API from the OpenAI service through the API Management API. The API Management API will distribute the requests to the backend pool created in the previous steps.
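For illustration, a single chat completion call through the gateway might look like the Python sketch below. The gateway name, deployment name, API version, and key are placeholders; adjust them to your environment. The backend-host header is returned because the outbound policy above sets it.

import requests

APIM_URL = "https://my-apim.azure-api.net/openai"   # API Management gateway base URL (placeholder)
DEPLOYMENT = "gpt-4o"                                # model deployment name shared by all backends
API_VERSION = "2024-06-01"

url = f"{APIM_URL}/deployments/{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
headers = {
    "api-key": "<APIM_SUBSCRIPTION_KEY>",            # API Management subscription key, not an OpenAI key
    "Content-Type": "application/json",
}
body = {
    "messages": [{"role": "user", "content": "Hello, which backend served me?"}],
    "max_tokens": 50,
}

response = requests.post(url, headers=headers, json=body)
# backend-host shows which Azure OpenAI instance actually answered the request
print(response.status_code, response.headers.get("backend-host"))
print(response.json()["choices"][0]["message"]["content"])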
Run the Python load-test script
Execute the Python test script main.py to test the load balancer and circuit breaker configuration.
python main.py --apim-name apim-ai-features --subscription-key APIM_SUBSCRIPTION_KEY --request-max-tokens 200 --workers 5 --total-requests 1000 --request-limit 30
Explanation
python main.py: This runs the main.py script.
--apim-name apim-ai-features: The name of the API Management instance.
--subscription-key APIM_SUBSCRIPTION_KEY: This passes the API Management subscription key.
--request-max-tokens 200: The maximum number of tokens to generate per request in the completion (optional, as it defaults to 200).
--workers 5: The number of parallel requests to send (optional, as it defaults to 20).
--total-requests 1000: This sets the total number of requests to 1000 (optional, as it defaults to 1000).
--request-limit 30: The number of requests to send per second (optional, as it defaults to 20).
>Note: You can adjust the values of --workers and --total-requests as needed. If you omit them, the script will use the default values specified in the argparse configuration.
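As a rough illustration of what the script does (this is not the repository's main.py, just a minimal sketch that maps the flags above to constants and reuses the request from the earlier sketch):

import time
import requests
from concurrent.futures import ThreadPoolExecutor

WORKERS = 5            # --workers: parallel requests
TOTAL_REQUESTS = 1000  # --total-requests: overall number of calls
REQUEST_LIMIT = 30     # --request-limit: target requests per second

def send_one(_):
    # url, headers, and body as defined in the earlier single-request sketch
    resp = requests.post(url, headers=headers, json=body)
    return resp.status_code, resp.headers.get("backend-host")

results = []
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    for batch_start in range(0, TOTAL_REQUESTS, REQUEST_LIMIT):
        batch = range(batch_start, min(batch_start + REQUEST_LIMIT, TOTAL_REQUESTS))
        started = time.monotonic()
        results.extend(pool.map(send_one, batch))
        # crude rate limiting: keep each batch of REQUEST_LIMIT calls to roughly one second
        elapsed = time.monotonic() - started
        if elapsed < 1:
            time.sleep(1 - elapsed)

print(sum(1 for code, _ in results if code == 200), "successful of", len(results))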
Test Results
ApiManagementGatewayLogs
| where OperationId == "chat-completion"
| summarize CallCount = count() by BackendId, BackendUrl
| project BackendId, BackendUrl, CallCount
| order by CallCount desc
| render barchart
Conclusion
In conclusion, leveraging Azure API Management significantly enhances the resiliency and capacity of the Azure OpenAI Service by distributing requests across multiple instances and implementing load balancing with retry and circuit breaker patterns.
These strategies improve service availability, performance, and stability. To read more, see Backends in API Management.
References
Azure API Management
Azure API Management terminology
API Management policy reference
API Management policy expressions
Backends in API Management
Error handling in API Management policies
Azure Tech Community
AI Hub Gateway Landing Zone accelerator
Unlocking the Power of Responsible AI with Microsoft Azure
Discover how Microsoft is leading the way in responsible AI development with our comprehensive resources on Azure AI products. Learn about the latest features in Azure AI Content Safety, including prompt shields, custom categories, and groundedness detection, and understand their importance and implementation.
Learn More about Responsible AI
Key Resources:
Responsible AI Learning Modules: Dive deep into responsible AI practices.
YouTube Playlist: Watch tutorials and demos on Azure AI Content Safety.
Azure AI Content Safety Workshops: Hands-on training available on MS Learn and GitHub.
The resources are a deep-dive into the latest features of Azure AI Content Safety, including prompt shields, custom categories, and groundedness detection, and offer guidance on their importance and implementation.
Collection of Responsible AI Learning Modules: Explore Here
Responsible AI YouTube Playlist: Watch Here
Learn Modules for Azure AI Content Safety:
Azure AI Content Safety Studio Workshop
Azure AI Content Safety Code Workshop
Fundamentals of Responsible Generative AI – Training | Microsoft Learn
Manage and review models in Azure Machine Learning – Training | Microsoft Learn
Discover Microsoft guidelines for responsible conversational AI development – Training | Microsoft Learn
Discover Microsoft safety guidelines for responsible conversational AI development – Training | Microsoft Learn
Train a model and debug it with Responsible AI dashboard – Training | Microsoft Learn
Responsible AI in AI Studio – How to safeguard your generative AI applications in Azure AI (youtube.com)
Azure AI Studio Evaluations – How to evaluate and improve generative AI responses with Azure AI Studio (youtube.com)
Content Safety – How to build safe and responsible AI applications with Azure AI Content Safety (youtube.com)
Prompt Shields – How to mitigate GenAI security threats with Azure AI Content Safety prompt shields (youtube.com)
Tech Community Prompt Shields GA Blog – https://aka.ms/PromptShieldsGA
Tech Community Protected Material Detection GA Blog – https://aka.ms/ProtectedMaterialGA
Customizing content safety (thresholds, custom categories, etc.) – How to customize generative AI guardrails with Azure AI Content Safety (youtube.com)
Groundedness detection – How to detect and mitigate generative AI hallucinations with Azure AI Content Safety (youtube.com)
YouTube: Azure AI Content Safety demo videos on Microsoft Developer channel for customer tutorials:
Visit the playlist here and check back for more updates: aka.ms/rai-playlist
GitHub: Azure AI Content Safety Workshop in Azure AI Studio
GitHub repo with train-the-trainer slide deck – GitHub – Azure-Samples/aacs-workshops: Samples for Azure AI Content Safety training modules
MS Learn: Azure AI Content Safety Workshop
Azure AI Content Safety Workshop on MS Learn (UI-based) aka.ms/aacs-studio-workshop
Azure AI Content Safety Workshop on MS Learn (Code) aka.ms/aacs-code-workshop
Integrating vision into RAG applications
Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. What do you do when your knowledge base includes images, like graphs or photos? By adding multimodal models into your RAG flow, you can get answers based on image sources, too!
Our most popular RAG solution accelerator, azure-search-openai-demo, now has an optional feature for RAG on image sources. In the example question below, the app answers a question that requires correctly interpreting a bar graph:
This blog post will walk through the changes we made to enable multimodal RAG, both so that developers using the solution accelerator can understand how it works, and so that developers using other RAG solutions can bring in multimodal support.
First let’s talk about two essential ingredients: multimodal LLMs and multimodal embedding models.
Multimodal LLMs
Azure now offers multiple multimodal LLMs: gpt-4o and gpt-4o-mini, through the Azure OpenAI service, and Phi-3.5-vision-instruct, through the Azure AI Model Catalog. These models allow you to send in both images and text, and return text responses. (In the future, we may have LLMs that take audio input and return non-text outputs!)
For example, an API call to the gpt-4o model can contain a question along with an image URL:
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": { "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }
    }
  ]
}
Those image URLs can be specified as full HTTP URLs, if the image happens to be available on the public web, or they can be specified as base-64 encoded Data URIs, which is particularly helpful for privately stored images.
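For instance, a small helper like the sketch below (the file path is illustrative) can turn a privately stored image into a data URI suitable for the image_url field:

import base64
import mimetypes

def image_to_data_uri(path: str) -> str:
    # Read a privately stored image and encode it as a base64 data URI
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# The resulting string can be used directly as the "url" value:
# {"type": "image_url", "image_url": {"url": image_to_data_uri("page42.png")}}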
For more examples working with gpt-4o, check out openai-chat-vision-quickstart, a repo which can deploy a simple Chat+Vision app to Azure, plus includes Jupyter notebooks showcasing scenarios.
Multimodal embedding models
Azure also offers a multimodal embedding API, as part of the Azure AI Vision APIs, that can compute embeddings in a multimodal space for both text and images. The API uses the state-of-the-art Florence model from Microsoft Research.
For example, this API call returns the embedding vector for an image:
curl.exe -v -X POST "https://<endpoint>/computervision/retrieval:vectorizeImage?api-version=2024-02-01-preview&model-version=2023-04-15"
--data-ascii "{ 'url':'https://learn.microsoft.com/azure/ai-services/computer-vision/media/quickstarts/presentation.png' }"
Once we have the ability to embed both images and text in the same embedding space, we can use vector search to find images that are similar to a user’s query. For an example, check out this notebook that sets up a basic multimodal search of images using Azure AI Search.
Multimodal RAG
With those two multimodal models, we were able to give our RAG solution the ability to include image sources in both the retrieval and answering process.
At a high-level, we made the following changes:
Search index: We added a new field to the Azure AI Search index to store the embedding returned by the multimodal Azure AI Vision API (while keeping the existing field that stores the OpenAI text embeddings).
Data ingestion: In addition to our usual PDF ingestion flow, we also convert each PDF document page to an image, store that image with the filename rendered on top, and add the embedding to the index.
Question answering: We search the index using both the text and multimodal embeddings. We send both the text and the image to gpt-4o, and ask it to answer the question based on both kinds of sources.
Citations: The frontend displays both image sources and text sources, to help users understand how the answer was generated.
Let’s dive deeper into each of the changes above.
Search index
For our standard RAG on documents approach, we use an Azure AI search index that stores the following fields:
content: The extracted text content from Azure Document Intelligence, which can process a wide range of files and can even OCR images inside files.
sourcefile: The filename of the document
sourcepage: The filename with page number, for more precise citations
embedding: A vector field with 1536 dimensions, to store the embedding of the content field, computed using the text-only OpenAI ada-002 model.
For RAG on images, we add an additional field:
imageEmbedding: A vector field with 1024 dimensions, to store the embedding of the image version of the document page, computed using the AI Vision vectorizeImage API endpoint.
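As a rough sketch, the two vector fields could be declared with the azure-search-documents Python SDK roughly as follows; the profile names here are assumptions, not the accelerator's exact schema:

from azure.search.documents.indexes.models import SearchField, SearchFieldDataType

# Existing text embedding field (OpenAI ada-002, 1536 dimensions)
embedding_field = SearchField(
    name="embedding",
    type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
    searchable=True,
    vector_search_dimensions=1536,
    vector_search_profile_name="text-vector-profile",   # assumed profile name
)

# New multimodal image embedding field (AI Vision / Florence, 1024 dimensions)
image_embedding_field = SearchField(
    name="imageEmbedding",
    type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
    searchable=True,
    vector_search_dimensions=1024,
    vector_search_profile_name="image-vector-profile",  # assumed profile name
)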
Data ingestion
For our standard RAG approach, data ingestion involves these steps:
Use Azure Document Intelligence to extract text out of a document
Use a splitting strategy to chunk the text into sections. This is necessary in order to keep chunk sizes at a reasonable size, as sending too much content to an LLM at once tends to reduce answer quality.
Upload the original file to Azure Blob storage.
Compute ada-002 embeddings for the content field.
Add each chunk to the Azure AI search index.
For RAG on images, we add two additional steps before indexing: uploading an image version of each document page to Blob Storage and computing multi-modal embeddings for each image.
Generating citable images
The images are not just a direct copy of the document page. Instead, they contain the original document filename written in the top left corner of the image, like so:
This crucial step will enable the GPT vision model to later provide citations in its answers. From a technical perspective, we achieved this by first using the PyMuPDF Python package to convert documents to images, then using the Pillow Python package to add a top border to the image and write the filename there.
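A minimal sketch of that conversion, assuming PyMuPDF and Pillow are installed (the DPI and border size are illustrative, not the repository's exact values):

from pathlib import Path
import fitz  # PyMuPDF
from PIL import Image, ImageDraw

def page_to_citable_image(pdf_path: str, page_number: int) -> Image.Image:
    # Render the PDF page to a bitmap with PyMuPDF
    doc = fitz.open(pdf_path)
    pix = doc[page_number].get_pixmap(dpi=150)
    page_img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

    # Add a white top border with Pillow and write the source filename into it
    border = 40  # pixels, illustrative
    framed = Image.new("RGB", (page_img.width, page_img.height + border), "white")
    framed.paste(page_img, (0, border))
    label = f"SourceFileName:{Path(pdf_path).name}"
    ImageDraw.Draw(framed).text((10, 10), label, fill="black")
    return framed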
Question answering
Now that our Blob storage container has citable images and our AI search index has multi-modal embeddings, users can start to ask questions about images.
Our RAG app has two primary question-asking flows: one for “single-turn” questions, and the other for “multi-turn” questions, which incorporates as much conversation history as can fit in the context window. To simplify this explanation, we’ll focus on the single-turn flow.
Our single-turn RAG on documents flow looks like:
Receive a user question from the frontend.
Compute an embedding for the user question using the OpenAI ada-002 model.
Use the user question to fetch matching documents from the Azure AI search index, using a hybrid search that does a keyword search on the text and a vector search on the question embedding.
Pass the resulting document chunks and the original user question to the gpt-3.5 model, with a system prompt that instructs it to adhere to the sources and provide citations with a certain format.
Our single-turn RAG on documents-plus-images flow looks like this:
Receive a user question from the frontend.
Compute an embedding for the user question using the OpenAI ada-002 model AND an additional embedding using the AI Vision API multimodal model.
Use the user question to fetch matching documents from the Azure AI search index, using a hybrid multivector search that also searches on the imageEmbedding field using the additional embedding. This way, the underlying vector search algorithm will find results that are both similar semantically to the text of the document but also similar semantically to any images in the document (e.g. “what trends are increasing?” could match a chart with a line going up and to the right).
For each document chunk returned in the search results, convert the Blob image URL into a base64 data-encoded URI. Pass both the text content and the image URIs to a GPT vision model, with this prompt that describes how to find and format citations:
The documents contain text, graphs, tables and images.
Each image source has the file name in the top left corner of the image with coordinates (10,10) pixels and is in the format SourceFileName:<file_name>
Each text source starts in a new line and has the file name followed by colon and the actual information. Always include the source name from the image or text for each fact you use in the response in the format: [filename]
Answer the following question using only the data provided in the sources below.
The text and image source can be the same file name, don’t use the image title when citing the image source, only use the file name as mentioned.
Now, users can ask questions where the answers are entirely contained in the images and get correct answers! This can be a great fit for diagram-heavy domains, like finance.
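To make the retrieval step concrete, here is a hedged sketch of the hybrid multivector query described above, using the azure-search-documents SDK. The endpoint, index name, and the two embedding helper functions are assumptions standing in for whatever your ingestion pipeline uses:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="docs-index",                      # assumed index name
    credential=AzureKeyCredential("<search-key>"),
)

question = "What trends are increasing?"
text_vector = compute_ada002_embedding(question)          # 1536-dim text embedding (helper assumed)
image_vector = compute_ai_vision_embedding(question)      # 1024-dim multimodal embedding (helper assumed)

results = search_client.search(
    search_text=question,  # keyword part of the hybrid search
    vector_queries=[
        VectorizedQuery(vector=text_vector, k_nearest_neighbors=5, fields="embedding"),
        VectorizedQuery(vector=image_vector, k_nearest_neighbors=5, fields="imageEmbedding"),
    ],
    top=5,
)
for doc in results:
    print(doc["sourcepage"])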
Considerations
We have seen some really exciting uses of this multimodal RAG approach, but there is much to explore to improve the experience.
More file types: Our repository only implements image generation for PDFs, but developers are now ingesting many more formats, both image files like PNG and JPEG as well as non-image files like HTML, docx, etc. We’d love help from the community in bringing support for multimodal RAG to more file formats.
More selective embeddings: Our ingestion flow uploads images for *every* PDF page, but many pages may be lacking in visual content, and that can negatively affect vector search results. For example, if your PDF contains completely blank pages, and the index stored the embeddings for those, we have found that vector searches often retrieve those blank pages. Perhaps in the multimodal space, “blankness” is considered similar to everything. We’ve considered approaches like using a vision model in the ingestion phase to decide whether an image is meaningful, or using that model to write a very descriptive caption for images instead of storing the image embeddings themselves.
Image extraction: Another approach would be to extract images from document pages, and store each image separately. That would be helpful for documents where the pages contain multiple distinct images with different purposes, since then the LLM would be able to focus more on only the most relevant image.
We would love your help in experimenting with RAG on images, sharing how it works for your domain, and suggesting what we can improve. Head over to our repo and follow the steps for deploying with the optional GPT vision feature enabled!
Microsoft 365 Admin Center to Support Continuous Access Evaluation
Continuous Access Evaluation Revokes Access Immediately
The announcement in message center notification MC884015 (5 Sept 2024) that the Microsoft 365 admin center (Figure 1) will implement continuous access evaluation (CAE) in September 2024 is very welcome. Microsoft implemented CAE for Exchange Online, SharePoint Online, and Teams in January 2022.
Implementing CAE means that the Microsoft 365 admin center can respond to critical events that occur such as user account password changes or if a connection originates from an unexpected IP address. If an administrator account is unfortunate enough to be compromised, CAE will ensure that the credentials used to access the admin center will expire immediately after the password is changed for the account or access is revoked for the account.
Speed is Key
Speed is of the essence when it comes to responding to attacks and making sure that credentials are invalidated and forcing reauthentication as soon as possible is helpful. CAE replaces older methods like waiting for an access token to expire. The problem with waiting for access tokens to age out is that unauthorized access could persist for up to an hour after the compromise occurs.
Of course, it’s even better to stop compromise by making sure that administrator accounts are protected by strong multifactor authentication such as the Microsoft Authenticator app or passkeys. Even though we’ve known that this is true for years, the percentage of Microsoft 365 accounts protected by multifactor authentication is still disappointing (38% in February 2024). In that context, being able to revoke access to critical administrative tools like the Microsoft 365 admin center is important.
Other Microsoft 365 Administrative Portals
The Microsoft 365 Admin Center is a headline administrative portal and it’s important that Microsoft protects it with CAE. However, this step shouldn’t be seen as bulletproof protection for a tenant because it is not. There’s no news about support for CAE in other important administrative portals like the Purview compliance portal and the Defender portal.
Although it would be good for CAE to be supported in all Microsoft 365 admin centers, the fact remains that this might not be enough to stop an attacker. As noted above, speed is key after an attacker penetrates a tenant. Working through a GUI slows an attacker down; instead, they can use automated PowerShell scripts and Graph API requests to perform actions like creating new accounts and permissioned apps. Firing off some scripts to infect a tenant thoroughly is a lot more efficient than using an admin center. This underlines the need to stop attackers getting into a tenant. CAE is a kind of plaster that will heal some of the damage, but it can’t stop attackers wreaking havoc if they manage to compromise an account holding administrative roles.
Continuous Access Evaluation is a Good Thing
Don’t get me wrong. I strongly endorse the implementation of Continuous Access Evaluation across the administrative landscape of Microsoft 365 tenants. Anything that slows or obstructs attackers is a good thing. Everything that complicates the process of compromise is valued.
The sad thing is that 38% figure for accounts protected by multifactor authentication reported above. Taking Microsoft’s reported figure of 400 million paid Office 365 seats, that means only 152 million accounts use multifactor authentication and almost 250 million do not. That’s just too many lucrative targets for the bad guys to go after. We need to do better.
So much change, all the time. It’s a challenge to stay abreast of all the updates Microsoft makes across the Microsoft 365 ecosystem. Subscribe to the Office 365 for IT Pros eBook to receive monthly insights into what happens, why it happens, and what new features and capabilities mean for your tenant.
GoTo Group Collaborates with Microsoft to Boost Engineering Team Productivity with GitHub Copilot
Featured image: GoTo engineers working with GitHub Copilot to accelerate the pace of innovation (Photo by GoTo Group)
Read the English version here
Jakarta, 10 September 2024 – GoTo Group, Indonesia’s leading digital ecosystem, has taken a significant step to boost the productivity of its engineering team by collaborating with Microsoft Indonesia to use GitHub Copilot, the AI solution most widely used by developers worldwide.
Equipped with AI capabilities, GitHub Copilot improves engineers’ productivity and satisfaction in everyday coding tasks: from providing real-time coding suggestions and assisting throughout the coding process in the integrated development environment[1] via its chat feature, to simplifying complex coding concepts into everyday language.
Hans Patuwo, Chief Operating Officer, GoTo, said, “Since June 2024, nearly a thousand GoTo engineers have begun adopting GitHub Copilot, with full implementation expected to be completed by mid-October 2024. Using this AI-based coding assistant allows our engineers to improve code quality and accomplish more in less time. GoTo engineers have already reported significant time savings, averaging more than 7 hours per week, enabling them to innovate faster and deliver more value to our users.”
GoTo engineers speed up the coding process with GitHub Copilot (Photo by GoTo Group)
Beyond the time savings, GoTo engineers also accepted 30% of the code recommendations offered by GitHub Copilot in their first month using the AI solution, a solid figure at the upper end of GitHub Copilot’s code recommendation acceptance rate, which globally is typically 26-30%.
Naya Hodi, Merchant Gateway Manager, GoTo, said, “GitHub Copilot significantly reduces syntax errors and provides a very helpful autocomplete feature. By leveraging GitHub Copilot, the team can reduce repetitive work and make coding more efficient. This allows me and the team to focus on the more complex aspects of developing software.”
“We are delighted to support GoTo with GitHub Copilot, equipping their engineers with AI assistance in their daily work, particularly in software development. We also appreciate GoTo’s decision to collaborate directly with its engineers to quantify the value of Copilot from the outset. Through this quantification, GoTo has been able to build strong momentum to expand and accelerate AI adoption across its entire engineering team. We are excited to keep helping GoTo deliver software faster and accelerate a pace of innovation with real impact for Indonesia,” said Sharryn Napier, Vice President, APAC at GitHub.
GoTo’s move to integrate AI into the company’s operations reinforces its commitment to driving innovation while empowering its workforce, in order to deliver technology that has a positive impact on users.
“We are honored to support GoTo in its mission to empower progress by delivering technology solutions that enable everyone to thrive in the digital economy. By integrating GitHub Copilot into its engineers’ workflows, GoTo gives its teams the opportunity to innovate faster, increase productivity, and ultimately deliver more value to users. Trust and technology always go hand in hand, and this collaboration underscores our shared commitment to accelerating inclusive digital transformation, as part of Microsoft’s Berdayakan Indonesia (Empowering Indonesia) initiative,” said Andrew Boyd, General Manager, Digital Natives & Startups, Microsoft Asia.
###
[1] Software that provides various tools for software development in a single application
GoTo Group Collaborates with Microsoft to Boost Engineering Productivity with GitHub Copilot
Featured image: GoTo engineers work using GitHub Copilot to accelerate the pace of innovation (Photo by GoTo Group)
Jakarta, 10 September 2024 – GoTo Group, Indonesia’s leading digital ecosystem, has taken a significant step forward in enhancing productivity across its engineering teams by collaborating with Microsoft Indonesia to adopt GitHub Copilot, the world’s most widely adopted AI developer tool.
GitHub Copilot significantly boosts engineers’ productivity and happiness in daily coding tasks with its AI capabilities, from real-time code suggestions, chat assistance in the integrated development environment*, to breaking down complex coding concepts using daily conversational language.
Hans Patuwo, Chief Operating Officer, GoTo said, “Since June 2024, almost a thousand GoTo’s engineers have adopted GitHub Copilot, with full rollout expected by mid-October 2024. The adoption of this AI-powered coding assistant has enabled our engineers to enhance code quality and to accomplish more in less time. GoTo engineers have reported significant time savings averaging over seven hours per week, allowing them to innovate with greater speed and to bring more value to our users.”
GoTo engineer accelerates coding process with GitHub Copilot (Photo by GoTo Group)
Aside from time saving, GoTo’s engineers are also already seeing an early code acceptance rate of 30% within their first month adopting GitHub Copilot. This means 30% of code suggestions made by GitHub Copilot are accepted or used by GoTo engineers — a solid figure on the higher end of the average acceptance rate of 26-30% typically seen among engineers using GitHub Copilot.
Nayana Hodi, Engineering Manager, GoTo shared, “GitHub Copilot has significantly reduced syntax errors and provided helpful autocomplete features, eliminating repetitive tasks and making coding more efficient. This has allowed me to focus on the more complex elements in building great software.”
“We are thrilled to empower GoTo with GitHub Copilot, equipping their engineers with AI across the software development lifecycle. GoTo has implemented an impressive evaluation strategy, collaborating directly with engineers to collect first-hand measurements that showcase real impact. By quantifying the value of Copilot from the start, GoTo is building strong momentum for widespread adoption and accelerated learning across their engineering team. We’re excited to continue this journey, helping GoTo ship software ahead of the curve and accelerate the pace of innovation,” said Sharryn Napier, Vice President, APAC at GitHub.
GoTo’s move in integrating AI into the company’s workflow underscores its commitment to driving innovation while empowering its workforce to deliver impactful technology at scale.
“We are proud to support GoTo in their mission to empower progress by offering technology infrastructure and solutions that enable everyone to thrive in the digital economy. By integrating GitHub Copilot into their engineering processes, GoTo equips their teams with the tools to innovate faster, enhance productivity, and ultimately deliver greater user value. Trust and technology go hand-in-hand, and this collaboration underscores our shared commitment to harnessing AI technology, creating meaningful opportunities for all Indonesians, and accelerating an inclusive digital transformation agenda as part of Microsoft’s Empowering Indonesia initiative,” stated Andrew Boyd, General Manager, Digital Natives & Startups, Microsoft Asia.
###
* Software that provides various tools for software development in a single application.
Accelerating water wading simulation using Altair® nanoFluidX® on Azure Nvidia A100 and Nvidia H100
Over the last few weeks we have been working together with Altair engineers to verify and validate their nanoFluidX v2024 product on Azure. This software offers significant advantages for engineers tackling problems where traditional CFD technology requires significant manual time and heavy computational resources. Vehicle wading is an important durability attribute where engineers monitor water reach and accumulation, and assess the potential for damage caused by water impact.
nanoFluidX’s Lagrangian meshless approach was designed from inception for GPU compute using NVIDIA CUDA, making it one of the fastest SPH solvers on the market. Models can be setup incredibly quickly, giving engineers the power to iterate faster.
With this validation, the intention was to look at the GPU compute possibilities in two ways: how will nanoFluidX perform on the Nvidia H100 series GPUs, and how it will work while scaling up to 8-way GPU virtual machines (VMs). Let’s look at the A100 and the H100 first.
The NC_A100_v4 has 3 flavors, with 1, 2 or 4 A100 80GB GPUs. Fundamentally, these are PCIe-based GPUs, but internally they are NVLink-connected in pairs. The rest of the system consists of 24 (non-multithreaded) AMD Milan CPU cores, 220 GB main memory, and a 960 GB NVMe local scratch disk per GPU. When selecting a 2- or 4-GPU VM, these numbers are multiplied, up to a total of 880 GB main memory.
The NC_H100_v5 has grown together with the GPU capabilities. It is available in a 1 or 2 GPU configuration built around the Nvidia 94GB H100 NVL. While this GPU has a PCIe interface towards the main system, many of the capabilities are in line with the SXM H100 series. The CPU cores are increased to 40 (non-multi-threaded) AMD Genoa CPU cores and 320 GB of main memory together with an upgraded 3.5TB NVME local scratch disk.
The benchmark that was run for this validation is the Altair CX-1 car model. This benchmark represents a production-scale model of a full size vehicle traveling at 10 km/h through a 24 meter wading channel in 15 seconds.
“Collaborating with Microsoft and NVIDIA, we have successfully validated nanoFluidX v2024 on NVIDIA’s A100 and H100 GPUs. The latest release boasts a solver that is 1.5x faster than previously, and offers improved scaling on multiple GPUs. These benchmarks show the use of NVIDIA H100 significantly enhances performance by up to 1.8x, cutting simulation times and accelerating design cycles. These advancements solidify nanoFluidX as one of the fastest Smoothed-particle Hydrodynamic (SPH) GPU code on the market.” – David Curry, Senior Vice President, CFD and EDEM, Altair.
As can be seen in the table below, the H100 delivers higher performance than the A100, which is in line with the performance increase between the two generations published by Nvidia. Therefore, both the software and the Azure VMs allow these GPUs to reach their compute potential.
Since nanoFluidX supports multi-GPU systems, we wanted to validate the scalability and test them on the 8-way GPU ND series. Again, we tested on both the Nvidia A100 systems, the NDads_A100_v4, and its successor based on the Nvidia H100: the NDisr_H100_v5. Both of these systems have all 8 GPUs interconnected through NVlink.
Chart showing performance increase for H100 (NCv5) over A100 (NCv4)
As shown in the table above, nanoFluidX effectively utilizes all GPU power. On the NDisr_H100_v5, it achieved a 1-hour simulation duration, significantly impacting turnaround and design cycle time.
While you can simply go to the portal, request a quota, and spin up these VMs, we often see customers seeking an HPC environment that integrates better into their workflow for production. Altair offers a solution to run projects on Azure through their Altair One platform. Please collaborate with your Altair representative to enable this Azure-based solution for you. Alternatively, you can use the Altair SaaS solution, Altair Unlimited, from the virtual appliance marketplace to deploy and manage your own HPC cluster on Azure. To enable GPU quotas for HPC, please coordinate with your Azure account manager.
#AzureHPCAI
Deploying .dacpacs to Azure SQL via Azure DevOps Pipelines
Introduction
This post is part of the SQL Database series that I am compiling. This specific topic assumes that you have already built a .dacpac file via an Azure DevOps YAML Pipeline and are ready to deploy your .dacpac to Azure. Congratulations! If you’d like to follow along, all source code is in my GitHub repository.
PreReqs
To be successful here we’d require some items to be setup:
An Azure SQL Server and Database already deployed in Azure
An Azure DevOps Service Connection that has access to deploy the database (for me I like to have the Service Connection be a member of the Entra SQL Admin group, more on that later)
A .dacpac built by ADO and ready to publish. I used the SDK style project to create mine, you could use other methods as long as you have the .dacpac file.
Network connectivity to the database. For this specific example we will be using an Azure SQL instance and leverage Microsoft-hosted Azure DevOps agents. Variations of this process are possible, leveraging either Windows self-hosted build agents or the newer Managed DevOps Pools.
Deploy Steps
When writing one of these I have found it can be helpful to write out the individual steps required for our build. In our case it will consist of:
Download the pipeline artifact
Open up on the Azure SQL Server Firewall to the Agent
Deploy .dacpac
Delete the Azure SQL Server Firewall rule
The good news here is that, first, the job will automatically download the pipeline artifact. To reiterate, this will download the .dacpac which was built in the previous stage so that subsequent jobs can leverage it for deployments to one or more environments. The second piece of good news is that opening the Azure SQL Server firewall rules can be handled by the SqlAzureDacpacDeployment@1 task. In addition, there is the option to delete the firewall rule after the task has been completed.
So, this means we effectively just need a single job in our deployment stage!
SqlAzureDacpacDeployment@1
Here is the YAML code for the job to handle the deployment:
jobs:
- deployment: sqlmoveemecicd_app_dev_eus
  environment:
    name: dev
  dependsOn: []
  strategy:
    runOnce:
      deploy:
        steps:
        - task: SqlAzureDacpacDeployment@1
          displayName: Publish sqlmoveme on sql-moveme-dev2-eus.database.windows.net
          inputs:
            DeploymentAction: Publish
            azureSubscription: [Insert Service Connection Name]
            AuthenticationType: servicePrincipal
            ServerName: [SQL Server Destination]
            DatabaseName: [SQL Database Destination]
            deployType: DacpacTask
            DacpacFile: $(Agent.BuildDirectory)/sqlmoveme/**/*.dacpac
            AdditionalArguments: ''
            DeleteFirewallRule: True
So, an item to discuss here, which is a bit of a prerequisite when discussing Azure DevOps YAML Pipelines, is the deployment job concept. I cover this in a previous post in the YAML Pipeline series, Azure DevOps Pipelines: Tasks, Jobs, Stages. Suffice it to say deployment jobs are special types of jobs in Azure DevOps which are to be leveraged for the actual deployment of artifacts, and one of the key capabilities of deployment jobs is the ability to tie them to an environment. Environments can have gates, which are a set of criteria that can include manual or automatic checks prior to deployment.
Let’s take a step back and talk a little more about some of the requirements. First, we need to establish authentication from Azure DevOps to our Azure SQL Server. The most secure way to do this would be through Entra authentication. The credentials we will be using would be the service principal associated with an Azure DevOps service connection. This connection will either have credentials stored as part of the App Registration or leverage workload identity federation.
Personally, I would recommend using the workload identity federation process as this will eliminate the need for a secret. This Service Connection can be the same one used to deploy other resources in the environment, though I understand and respect the separation of data and management plane activities so a separate one specific to the database is acceptable. If you’d rather not use Entra Auth for authenticating to the database, you can alternatively pass credentials stored in a Variable Group, though usually it’s a good idea not to use passwords when possible.
So now that we know how and what we are going to authenticate with it’s time to go over how the access to the service account would be provisioned. When configuring an Azure SQL Server one can designate an Entra Security group as the Admin.
Below is a screenshot showing that the Microsoft Entra Admin ID has been granted to an Entra Security Group. If this is a new concept please follow up with Using automation to set up the Microsoft Entra admin for SQL Server – SQL Server | Microsoft Learn. Additionally, here is a great walkthrough put together by MVP Stephan van Rooij, Azure SQL and Entra ID authentication, tips from the field.
The service principal being used for the deployment is in turn added to this group. Thus, our deployment will have full access to deploy the .dacpac to the Azure SQL Server. Additionally, I have Microsoft Entra-only authentication configured; this is considered a best practice as SQL user credentials expose a potential credential liability. If new to this concept, feel free to read more at Microsoft Entra-only authentication – Azure SQL Database & Azure SQL Managed Instance & Azure Synapse Analytics | Microsoft Learn
Results
After we add this deployment stage to our previous build stage our results in ADO will look like:
A two stage pipeline where the first stage will generate an artifact of our .dacpac and the second stage which will take the .dacpac produced in the first stage and deploy it. A complete YAML definition of this pipeline can be found on my GitHub repository.
Next Steps
Now that we have covered how to effectively build a .sqlproj into a .dacpac and deploy said .dacpac to Azure, our next step will be to deploy to multiple environments via different configurations! Feel free to subscribe to this series on SQL Databases; alternatively, if you like my posts, feel free to follow me.
Exploring AI Agent-Driven Auto Insurance Claims RAG Pipeline.
Introduction:
In this post, I explore a recent experiment aimed at creating a RAG pipeline tailored for the insurance industry, specifically for handling automobile insurance claims, with the goal of potentially reducing processing times.
I also showcase the implementation of Autogen AI Agents to enhance search retrieval through agent interaction and function calls on sample auto insurance claims documents, a Q&A use case, and how this workflow can substantially reduce the time required for claims processing.
RAG workflows in my opinion represent a novel data stack, distinct from traditional ETL processes. Although they encompass data ingestion and processing similar to traditional ETL in data engineering, they introduce additional pipeline stages like chunking, embedding, and the loading of data into vector databases, diverging from the standard Lakehouse or data warehouse pipelines.
Each stage of the RAG application workflow is pivotal to the accuracy and pertinence of the downstream LLM application. One of these stages is the chunking method, and for this proof of concept, I chose to test a page-based chunking technique that leverages the document’s layout without relying on third party packages.
Key Services and Features:
By leveraging enterprise-grade features of Azure AI services, I can securely integrate Azure AI Document Intelligence, Azure AI Search, and Azure OpenAI through private endpoints. This integration ensures that the solution adheres to best practice cybersecurity standards. In addition, it offers secure network isolation and private connectivity to and from virtual networks and associated Azure services.
Some of these services are:
Azure AI Document Intelligence and the prebuilt-layout model.
Azure AI Search Index and Vector database configured with the HNSW search algorithm.
Azure OpenAI GPT-4-o model.
Page-based Chunking technique.
Autogen AI Agents.
Azure Open AI Embedding model: text-ada-003.
Azure Key Vault.
Private Endpoints integration across all services.
Azure Blob Storage.
Azure Function App. (This serverless compute platform can be replaced with Microsoft Fabric or Azure Databricks)
Document Extraction and Chunking:
The sample claim documents include forms with data detailing the accident location, description, vehicle information of the involved parties, and any injuries sustained. Thanks to the folks at LlamaIndex for providing the sample claims documents. Below is a sample of the forms template.
claims sample form
The claim documents are PDF files housed in Azure Blob Storage. Data ingestion begins from the container URL of the blob storage using the Azure AI Document Intelligence Python SDK.
This implementation of a page-based chunking method utilizes the markdown output from the Azure AI Document Intelligence SDK. The SDK, set up with the prebuilt-layout extraction model, extracts the content of pages, including forms and text, into markdown format, preserving the document’s specific structure, such as paragraphs and sections, and its context.
The SDK facilitates the extraction of documents page by page, via the pages collection of the documents, allowing for the sequential organization of markdown output data. Each page is preserved as an element within a list of pages, streamlining the process of efficiently extracting page numbers for each segment. More details about the document intelligence service and layout model can be found at this link.
The snippet below illustrates the process of page-based extraction, preprocessing of page elements, and their assignment to a Python list:
page extraction
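Since the original snippet is shown as a screenshot, here is a minimal sketch of the same idea; the endpoint, key, and blob URL are placeholders, and the exact keyword names may vary slightly between SDK versions:

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://<doc-intel-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<key>"),
)

poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source="<blob-url-to-claim-pdf>"),
    output_content_format="markdown",
)
result = poller.result()

# Collect each page's markdown content as its own chunk, keeping the page number
pages = []
for page in result.pages:
    page_md = "".join(
        result.content[span.offset : span.offset + span.length] for span in page.spans
    )
    pages.append({"page_number": page.page_number, "content": page_md.strip()})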
Each page content will be used as the value of the content field in the vector database index, alongside other metadata fields in the vector index. Each page content is its own chunk and will be embedded before being loaded into the vector database. The following snippet demonstrates this operation:
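A minimal sketch of that operation, assuming the page list from the previous snippet; the resource names, embedding deployment name, and index field names are placeholders, not the repository's exact schema:

from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-06-01",
)
search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="claims-index",                     # assumed index name
    credential=AzureKeyCredential("<search-key>"),
)

docs = []
for page in pages:  # the page list built in the previous snippet
    embedding = openai_client.embeddings.create(
        model="<embedding-deployment-name>",
        input=page["content"],
    ).data[0].embedding
    docs.append({
        "id": f"claim-page-{page['page_number']}",
        "content": page["content"],
        "page_number": page["page_number"],
        "contentVector": embedding,                # assumed vector field name
    })

search_client.upload_documents(documents=docs)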
Define Autogen AI Agents and Agent Tool/Function:
The concept of an AI Agent is modeled after human reasoning and the question-and-answer process. The agent is driven by a Large Language Model (its brain), which assists in determining whether additional information is required to answer a question or if a tool needs to be executed to complete a task.
In contrast, non-agentic RAG pipelines incorporate meticulously designed prompts that integrate context information (typically through a context variable within the prompt) sourced from the vector store before initiating a request to the LLM for a response. AI agents possess the autonomy to determine the “best” method for accomplishing a task or providing an answer. This experiment presents a straightforward agentic RAG workflow. In upcoming posts, I will delve into more complex, agent-driven RAG solutions. More details about Autogen Agents can be accessed here.
I set up two Autogen agent instances designed to simulate or engage in a question-and-answer chat conversation among themselves to carry out search tasks based on the input messages. To facilitate the agents’ ability to search and fetch query results from the Azure AI Search vector store via function calls, I authored a Python function that will be associated with these agents. The AssistantAgent, which is configured to invoke the function, and the UserProxyAgent, which is tasked with executing the function, are both examples of the Autogen Conversable Agent class.
The user agent begins a dialogue with the assistant agent by asking a question about the search documents. The assistant agent then gathers and synthesizes the response according to the system message prompt instructions and the context data retrieved from the vector store.
The snippets below provide the definition of Autogen agents and a chat conversation between the agents. The complete notebook implementation is available in the linked GitHub repository.
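As a hedged sketch of the agent setup (the deployment names, system message wording, sample question, and the retrieval helper are placeholders, not the notebook's exact code):

from typing import Annotated
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{
    "model": "<gpt-4o-deployment>",
    "api_type": "azure",
    "base_url": "https://<aoai-resource>.openai.azure.com",
    "api_key": "<aoai-key>",
    "api_version": "2024-06-01",
}]}

assistant = AssistantAgent(
    name="claims_assistant",
    system_message="Answer questions about auto insurance claims using only the "
                   "retrieved context. Cite the source document for every fact.",
    llm_config=llm_config,
)
user_proxy = UserProxyAgent(
    name="claims_user",
    human_input_mode="NEVER",
    code_execution_config=False,
)

@user_proxy.register_for_execution()                                   # the user proxy executes the call
@assistant.register_for_llm(description="Search the claims vector index")  # the assistant decides when to call it
def search_claims(query: Annotated[str, "The user's question"]) -> str:
    # placeholder: run the vector query against Azure AI Search and
    # return the concatenated page contents as context
    return retrieve_from_ai_search(query)   # helper assumed, defined elsewhere

user_proxy.initiate_chat(
    assistant,
    message="Where did the accident described in the sample claim take place?",
)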
Last Thoughts:
The assistant agent correctly answered all six questions, aligning with my assessment of the documents’ information and ground truth. This proof of concept demonstrates the integration of pertinent services into a RAG workflow to develop an LLM application, which aims to substantially decrease the time frame for processing claims in the auto insurance industry scenario.
As previously stated, each phase of the RAG workflow is crucial to the response quality. The system message prompt for the Assistant agent needs precise crafting, as it can alter the response outcomes based on the set instructions. Similarly, the custom retrieval function’s logic plays a significant role in the agent’s ability to locate and synthesize responses to the messages.
The accuracy of the responses has been assessed manually. Ideally, this process should be automated.
In an upcoming post, I intend to explore the automated evaluation of the RAG workflow. Which methods can be utilized to accurately assess and subsequently refine the RAG pipeline?
Both the retrieval and generative stages of the RAG process require thorough evaluation.
What tools can we use to accurately evaluate the end-to-end phases of a RAG workflow, including extraction, processing, and chunking strategies? How can we compare various chunking methods, such as the page-based chunking described in this article versus the recursive character text split chunking option?
How do we compare the retrieval results of an HNSW vector search algorithm against the KNN exhaustive algorithm?
What kind of evaluation tools are available and what metrics can be captured for agent-based systems?
Is a one-size-fits-all tool available to manage these? We will find answers to these questions.
Moreover, I would like to examine how this and other RAG and generative AI workflows are reviewed to ensure alignment with the standards of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability, as defined in the Responsible AI Ethics framework for building and developing these systems.
Microsoft Defender for Identity: the critical role of identities in automatic attack disruption
In today’s digital landscape, cyber-threats are becoming increasingly sophisticated and frequent. Advanced attacks are often multi-workload and cross-domain, requiring organizations to deploy robust security solutions to counter this complexity and protect their assets and data. Microsoft Defender XDR offers a comprehensive suite of tools designed to prevent, detect and respond to these threats. With speed and effectiveness being the two most important elements in incident response, Defender XDR tips the scale back to defenders with automatic attack disruption.
What is Automatic attack disruption?
Automatic attack disruption is an AI-powered capability that uses the correlated signals in Microsoft Defender XDR to stop and prevent further damage of in-progress attacks. What makes this disruption technology so differentiated is our ability to recognize the intent of an attacker and accurately predict, then stop, their next move with an extremely high level of confidence. This includes automated response actions such as containing compromised devices, disabling compromised user accounts, or disabling malicious OAuth apps. The benefits of attack disruption include:
Disruption of attacks at machine speed: with an average time of 3 minutes to disrupt ransomware attacks, attack disruption changes the speed of response for most organizations.
Reduced Impact of Attacks: by minimizing the time attackers have to cause damage, attack disruption limits the lateral movement of threat actors within your network, reducing the overall impact of the threat. This means less downtime, fewer compromised systems, and lower recovery costs.
Enhanced Security Operations: attack disruption allows security operations teams to focus on investigating and remediating other potential threats, improving their efficiency and overall effectiveness.
The role of Defender for Identity
While attack disruption occurs at the Defender XDR level, it's important to note that Microsoft Defender for Identity delivers critical identity signals and response actions to the platform. At a high level, Defender for Identity helps customers better protect their identity fabric through identity-specific posture recommendations, detections, and response actions. These are correlated with the other workload signals in the Defender platform and attributed to a high-fidelity incident. Within the context of attack disruption, Defender for Identity enables user-specific response actions, including:
Disabling user accounts: When a user account is compromised, Defender for Identity can automatically disable the account to prevent further malicious activities. Whether the identity in question is managed in Active Directory on-premises or Entra ID in the cloud, Defender is able to take immediate action and help contain the threat and protect your organization’s assets.
Resetting passwords: In cases where a user's credentials have been compromised, Defender for Identity can force a password reset. This ensures that the attacker can no longer use the compromised credentials to access your systems.
Microsoft Defender XDR’s automatic disruption capability is a game-changer in the world of cybersecurity. Powered by Microsoft Security intelligence and leveraging AI and machine learning, it provides real-time threat mitigation, reduces the impact of attacks, and enhances the efficiency of security operations. However, to fully realize the benefits of automatic disruption, it’s essential to include Defender for Identity in your security strategy, filling a critical need in your defenses.
Use this quick installation guide to deploy Defender for Identity.
Azure Communication Services at the DEVIntersection Conference
Join us for the DEVintersection Conference from September 10 to 12, 2024, in Las Vegas, Nevada. This event gathers technology enthusiasts from across the globe. Whether you are a developer, IT professional, or business leader, this conference provides an exceptional chance to explore the latest in cloud technology.
Our Azure Communication Services experts will be at the event hosting the following sessions.
Take Your Apps to the Next Level: Azure OpenAI, Communication, and Organizational Data Features
Sept 10, 14:00 – 15:00 | Lab | Grand Ballroom 118 | Dan Wahlin
Many of us are building Line of Business (LOB) apps that can integrate data from custom APIs and 3rd party data sources. That’s great, but when was the last time you sat down to think through how you can leverage the latest technologies to take the user experience to the next level?
In this session, Dan Wahlin introduces different ways to enhance customer experience by adding calling and SMS. You can integrate organizational data to minimize user context shifts, leverage the power of AI to enhance customer productivity, and take your LOB apps to the next level.
Register and add this session.
Bridging the Gap: Integrating Custom Applications with Microsoft Using Azure
Sept 11, 15:30 – 16:30 | Session | Grand Ballroom 122 | Milan Kaur
Discover how Azure can facilitate seamless integration between non-Teams users and Microsoft Teams. If you’re invested in Teams and seeking to develop audio-video solutions to connect your custom third-party applications with Teams, this session is for you. Join us to explore the possibilities and streamline collaboration beyond internal Teams.
Register and add this session.
Beyond chatbots: add multi-channel communication to your AI apps
Sept 12, 08:30 – 09:30 | Session | Grand Ballroom 122 | Milan Kaur
Unlock the potential of conversational AI with Azure!
In this session, discover how to extend your bot’s functionality beyond standard chat interactions. We’ll learn together how to add voice and other messaging channels such as WhatsApp to build pro-code AI bots grounded in custom data.
Register and add this session.
About our speakers
Dan Wahlin
Principal Cloud Developer Advocate
Milan Kaur
Senior Product Manager
Dan Wahlin is a Principal Cloud Developer Advocate at Microsoft focusing on Microsoft 365 and Azure integration scenarios. In addition to his work at Microsoft, Dan creates training courses for Pluralsight, speaks at conferences and meetups around the world, and offers webinars on a variety of technical topics.
Twitter: @DanWahlin
Milan is a seasoned software engineer turned product manager passionate about building innovative communication tools. With over a decade of experience in the industry, she has a deep understanding of the challenges and opportunities in the field of cloud communications.
LinkedIn: @milankaurintech
Unlock Analytics and AI for Oracle Database@Azure with Microsoft Fabric and OCI GoldenGate
The strategic partnership between Oracle and Microsoft has redefined the enterprise cloud landscape. Oracle Database@Azure seamlessly integrates Oracle’s database services with Microsoft’s Azure cloud platform, empowering businesses to maintain the performance, security, and reliability of Oracle databases while modernizing with Azure’s extensive cloud services.
As companies strive to accelerate their digital transformation, reduce complexity, and optimize their cloud strategies, data remains central to their success. High-quality data underpins effective business insights and serves as the foundation for AI innovation. Now in public preview, this integration lets customers use OCI GoldenGate—a database replication and heterogeneous data integration service—to sync their data estates with Microsoft Fabric, unlocking new prospects for data analytics and AI applications by unifying diverse datasets so teams can identify patterns and visualize opportunities.
A Unified Platform for Data and AI
Microsoft Fabric is an AI-powered real-time analytics and business intelligence platform that consolidates data engineering, integration, warehousing, and data science into one unified solution. By simplifying the complexity and cost of integrating analytics services, Microsoft Fabric provides a seamless experience for data professionals across various roles.
Microsoft Fabric integrates tools like Azure Synapse Analytics and Azure Data Factory into a cohesive Software as a Service (SaaS) platform, featuring seven core workloads tailored to specific tasks and personas. This platform enables organizations to manage their entire data lifecycle within a single solution, streamlining the process of building, managing, and deploying data-driven applications. With its unified architecture, Microsoft Fabric reduces the complexity of managing a data estate and simplifies billing by offering a shared pool of capacity and storage across all workloads. It also enhances data management and protection with robust governance and security features.
A key highlight of Microsoft Fabric is its integration with native generative AI services, such as Copilot, which enables richer insights and more compelling visualizations. This AI-driven approach can significantly impact business growth by improving decision-making and collaboration across teams. With Power BI and Synapse workloads built in and native integration with Azure Machine Learning, you can accelerate the deployment of AI-powered solutions, making it an essential tool for organizations looking to advance their data strategies.
OCI GoldenGate integration with Microsoft Fabric
OCI GoldenGate is a real-time data integration and replication solution that ensures high availability, disaster recovery, and transactional integrity across diverse environments. When integrated with Microsoft Fabric, OCI GoldenGate adds significant value by enabling seamless, real-time data synchronization between Oracle databases and the AI-powered analytics platform of Fabric. This ensures that data professionals can work with the most up-to-date information across their data ecosystem, enhancing the accuracy and timeliness of insights.
OCI GoldenGate’s ability to support complex data transformations and migrations allows organizations to leverage Microsoft Fabric’s advanced analytics and AI capabilities without disruption, driving faster, more informed decision-making and enabling businesses to unlock new levels of innovation.
Get started
Enhance your data strategy and drive more informed decision-making by leveraging your existing Microsoft and Oracle investments: integrate Oracle Database@Azure with Microsoft Fabric. Get started today through the Azure Marketplace!
Read the Oracle CloudWorld blog: https://aka.ms/OCWBlog24
Learn more about Microsoft Fabric at https://aka.ms/fabric
Learn more about Oracle Database@Azure: https://aka.ms/oracle
Technical documentation: Overview – Oracle Database@Azure | Microsoft Learn
To set up OCI GoldenGate, refer to the documentation: Implement OCI GoldenGate on an Azure Linux VM – Azure Virtual Machines | Microsoft Learn
Get skilled: https://aka.ms/ODAA_Learn
Announcing availability of Oracle Database@Azure in Australia East
Microsoft and Oracle are excited to announce that we are expanding the general availability of Oracle Database@Azure for the Azure Australia East region.
Customer demand for Oracle Database@Azure continues to grow – that’s why we’re announcing plans to expand regional availability to a total of 21 regions around the world. Oracle Database@Azure is now available in six Azure regions – Australia East, Canada Central, East US, France Central, Germany West Central, and UK South. To meet growing global demand, the service will soon be available in more regions, including Brazil South, Central India, Central US, East US 2, Italy North, Japan East, North Europe, South Central US, Southeast Asia, Spain Central, Sweden Central, United Arab Emirates North, West Europe, West US 2, and West US 3. In addition to the 21 primary regions, we will also add support for disaster recovery in a number of other Azure regions including Brazil Southeast, Canada East, France South, Germany North, Japan West, North Central US, South India, Sweden South, UAE Central, UK West, and West US.
As part of the continued expansion of Oracle services on Azure, we have new integrations with Microsoft Fabric and Microsoft Sentinel and support for Oracle Autonomous Recovery Service. Visit our sessions at Oracle CloudWorld and read our blog to learn more.
Learn more: https://aka.ms/oracle
Technical documentation: Overview – Oracle Database@Azure | Microsoft Learn
Get skilled: https://aka.ms/ODAA_Learn
Day zero support for iOS/iPadOS 18 and macOS 15
With Apple’s recent announcement of iOS/iPadOS 18.0 and macOS 15.0 Sequoia, we’ve been working hard to ensure that Microsoft Intune can provide day zero support for Apple’s latest operating systems so that existing features work as expected.
We’ll continue to upgrade our service and release new features that integrate elements of support for the new operating system (OS) versions.
Apple User Enrollment with Company Portal
With iOS/iPadOS 18, Apple no longer supports profile-based User Enrollment. Due to these changes, Intune will end support for Apple User Enrollment with Company Portal shortly after the release of iOS/iPadOS 18, and you'll need to use an alternate management method for enrolling devices. We recommend enrolling devices with account-driven User Enrollment for similar functionality and an improved user experience. For those looking for a simpler enrollment experience, try the new web-based device enrollment for iOS/iPadOS.
Please note, device enrollment with Company Portal will remain unaffected by these changes.
Impact to existing devices and profiles:
After Intune ends support for User Enrollment with Company Portal:
Existing enrolled devices are not impacted and will continue to be enrolled.
Users won’t be able to enroll new devices if they’re targeted with this enrollment type profile.
Intune technical support will only be provided for existing devices enrolled with this method. We won’t provide technical support for any new enrollments.
New settings and payloads
We’ve continued to invest in the data-driven infrastructure that powers the settings catalog, enabling us to provide day zero support for new settings as they’re released by Apple. The Apple settings catalog has been updated to support all of the newly released iOS/iPadOS and macOS settings for both declarative device management (DDM) and mobile device management (MDM) so that your team can have your devices ready for day zero. New settings for DDM include:
Disk Management
External Storage: Control the mount policy for external storage
Network Storage: Control the mount policy for network storage
Safari Extension Settings
Allowed Domains: Control the domain and sub-domains that the extension can access
Denied Domains: Control the domain and sub-domains that the extension cannot access
Private Browsing: Control whether an extension is allowed in Private Browsing
State: Control whether an extension is allowed, disallowed, or configurable by the user
Software Update Settings
Allow Standard User OS Updates: Control whether a standard user can perform Major and Minor software updates
Software Update Settings > Automatic updates
Allowed: Specifies whether automatic downloads of available updates can be controlled by the user
Download: Specifies whether automatic downloads of available updates can be controlled by the user
Install OS Updates: Specifies whether automatic install of available OS updates can be controlled by the user
Install Security Update: Specifies whether automatic install of available security updates can be controlled by the user
Software Update Settings > Deferrals
Combined Period In Days: Specifies the number of days to defer a major or minor OS software update on the device
Major Period In Days: Specifies the number of days to defer a major OS software update on the device
Minor Period In Days: Specifies the number of days to defer a minor OS software update on the device
System Period In Days: Specifies the number of days to defer system or non-OS updates. When set, updates only appear after the specified delay, following the release of the update
Notifications: Configure the behavior of notifications for enforced updates
Software Update Settings > Rapid Security Response
Enable: Control whether users are offered Rapid Security Responses when available
Enable Rollback: Control whether users are offered Rapid Security Response rollbacks
Recommended Cadence: Specifies how the device shows software updates to the user
New settings for MDM include:
Extensible Single Sign On (SSO) > Platform SSO
Authentication Grace Period: The amount of time after a ‘FileVault Policy’, ‘Login Policy’, or ‘Unlock Policy’ is received or updated that unregistered local accounts can be used
FileVault Policy: The policy to apply when using Platform SSO at FileVault unlock on Apple Silicon Macs
Login Policy: The policy to apply when using Platform SSO at the login window
Non Platform SSO Accounts: The list of local accounts that are not subject to the ‘FileVault Policy’, ‘Login Policy’, or ‘Unlock Policy’
Offline Grace Period: The amount of time after the last successful Platform SSO login a local account password can be used offline
Unlock Policy: The policy to apply when using Platform SSO at screensaver unlock
Extensible Single Sign On Kerberos
Allow Password: Allow the user to switch the user interface to Password mode
Allow SmartCard: Allow the user to switch the user interface to SmartCard mode
Identity Issuer Auto Select Filter: A string with wildcards that can be used to filter the list of available SmartCards by issuer, e.g. “*My CA2*”
Start In Smart Card Mode: Control if the user interface will start in SmartCard mode
Restrictions
Allow ESIM Outgoing Transfers
Allow Personalized Handwriting Results
Allow Video Conferencing Remote Control
Allow Genmoji
Allow Image Playground
Allow Image Wand
Allow iPhone Mirroring
Allow Writing Tools
System Policy Control
Enable XProtect Malware Upload
With the upcoming Intune September (2409) release, the new DDM settings will be:
Math
Calculator
Basic Mode
Add Square Root
Scientific Mode – Enabled
Programmer Mode – Enabled
Input Modes – Unit Conversion
System Behavior – Keyboard Suggestions
System Behavior – Math Notes
New MDM settings for Intune’s 2409 (September) release include:
System Extensions
Non Removable System Extensions
Non Removable System Extensions UI
Web Content Filter
Hide Deny List URLs
More information on configuring these new settings using the settings catalog can be found at Create a policy using settings catalog in Microsoft Intune.
Updates to ADE Setup Assistant screens within enrollment policies
With Intune’s September (2409) release, there’ll be six new Setup Assistant screens that admins can choose to show or hide when creating an Automated Device Enrollment (ADE) policy. These include three iOS/iPadOS and three macOS Skip Keys that will be available for both existing and new enrollment policies.
Emergency SOS (iOS/iPadOS 16+)
The IT admin can choose to show or hide the iOS/iPadOS Safety (Emergency SOS) setup pane that is displayed during Setup Assistant.
Action button (iOS/iPadOS 17+)
The IT admin can choose to show or hide the iOS/iPadOS Action button configuration pane that is displayed during Setup Assistant.
Intelligence (iOS/iPadOS 18+)
The IT admin can choose to show or hide the iOS/iPadOS Intelligence setup pane that is displayed during Setup Assistant.
Wallpaper (macOS 14+)
The IT admin can choose to show or hide the macOS Sonoma wallpaper setup pane that is displayed after an upgrade. If the screen is hidden, the Sonoma wallpaper will be set by default.
Lockdown mode (macOS 14+)
The IT admin can choose to show or hide the macOS Lockdown Mode setup pane that is displayed during Setup Assistant.
Intelligence (macOS 15+)
The IT admin can choose to show or hide the macOS Intelligence setup pane that is displayed during Setup Assistant.
For more information refer to Apple’s SkipKeys | Apple Developer Documentation.
Updates to supported vs. allowed versions for user-less devices
We previously introduced a new model of supported and allowed OS versions for enrolling user-less devices (devices without a primary user) to keep enrolled devices secure and efficient. The support statements have been updated to reflect the changes with the iOS/iPadOS 18 and upcoming macOS 15 releases:
Support statement for supported versus allowed macOS versions for devices without a primary user.
If you have any questions or feedback, leave a comment on this post or reach out on X @IntuneSuppTeam. Stay tuned to What’s new in Intune for additional settings and capabilities that will soon be available!
LLM Load Test on Azure (Serverless & Managed-Compute)
Introduction
In the ever-evolving landscape of artificial intelligence, the ability to efficiently load test large language models (LLMs) is crucial for ensuring optimal performance and scalability. llm-load-test-azure is a powerful tool designed to facilitate load testing of LLMs running in various Azure deployment settings.
Why Use llm-load-test-azure?
The ability to load test LLMs is essential for ensuring that they can handle real-world usage scenarios. By using llm-load-test-azure, developers can identify potential bottlenecks, optimize performance, and ensure that their models are ready for deployment. The tool’s flexibility, comprehensive feature set, and support for various Azure AI models make it an invaluable resource for anyone working with LLMs on Azure.
Some scenarios where this tool is helpful:
You set up an endpoint and need to determine the number of tokens it can process per minute and the latency expectations.
You implemented a Large Language Model (LLM) on your own infrastructure and aim to benchmark various compute types for your application.
You intend to test the real token throughput and conduct a stress test on your premium PTUs.
Key Features
llm-load-test-azure is packed with features that make it an indispensable tool for anyone working with LLMs on Azure. Here are some of the highlights:
Customizable Testing Dataset: Generate a custom testing dataset tailored to settings similar to your use case. This flexibility ensures that the load tests are as relevant and accurate as possible.
Load Testing Options: The tool supports customizable concurrency, duration, and warmup options, allowing users to simulate various load scenarios and measure the performance of their models under different conditions.
Support for Multiple Azure AI Models: Whether you’re using Azure OpenAI, Azure OpenAI Embedding, Azure Model Catalog serverless (MaaS), or managed-compute (MaaP), llm-load-test-azure has you covered. The tool’s modular design enables developers to integrate new endpoints with minimal effort.
Detailed Results: Obtain comprehensive statistics like throughput, time-to-first-token, time-between-tokens, and end-to-end latency in JSON format, providing valuable insights into the performance of your models.
Getting Started
Using llm-load-test-azure is straightforward. Here’s a quick guide to get you started:
Generate Dataset (Optional): Create a custom dataset using the generate_dataset.py script. Specify the input and output lengths, the number of samples, and the output file name.
python datasets/generate_dataset.py --tok_input_length 250 --tok_output_length 50 --N 100 --output_file datasets/random_text_dataset.jsonl
--tok_input_length: The length of the input (minimum 25).
--tok_output_length: The length of the output.
--N: The number of samples to generate.
--output_file: The name of the output file (default is random_text_dataset.jsonl).
Run the Tool: Execute the load_test.py script with the desired configuration options. Customize the tool’s behavior using a YAML configuration file, specifying parameters such as output format, storage type, and warmup options.
load_test.py [-h] [-c CONFIG] [-log {warn,warning,info,debug}]
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        config YAML file name
  -log {warn,warning,info,debug}, --log_level {warn,warning,info,debug}
                        Provide logging level. Example: --log_level debug (default: warning)
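The authoritative config schema is in the repository's sample files; as a rough sketch of the idea (the key names below are assumptions inferred from the options described above and from the config section echoed in the example output that follows), a config could even be generated programmatically for parameter sweeps:

# Sketch only: key names are assumptions based on the options described in this post
# (dataset, output format, storage type, warmup, load options). Check the repository's
# sample config for the authoritative schema.
import yaml  # PyYAML

config = {
    "dataset": {"file": "datasets/random_text_dataset.jsonl"},  # assumed key
    "output": {"format": "json", "dir": "output"},              # assumed keys
    "storage": {"type": "local"},                               # assumed keys
    "warmup": True,                                             # assumed key
    "load_options": {  # mirrors the load_options block in the example output below
        "type": "constant",
        "concurrency": 8,
        "duration": 20,
    },
}

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Then run: python load_test.py -c config.yaml -log info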
Results
The tool produces comprehensive statistics like throughput, time-to-first-token, time-between-tokens, and end-to-end latency in JSON format, providing valuable insights into the performance of your Azure LLM endpoint.
Example of the JSON output:
{
  "results": [ # stats on a request level
    ...
  ],
  "config": { # the run settings
    ...
    "load_options": {
      "type": "constant",
      "concurrency": 8,
      "duration": 20
    },
    ...
  },
  "summary": { # overall stats
    "output_tokens_throughput": 159.25729928295627,
    "input_tokens_throughput": 1592.5729928295625,
    "full_duration": 20.093270540237427,
    "total_requests": 16,
    "complete_request_per_sec": 0.79, # number of completed requests / full_duration
    "total_failures": 0,
    "failure_rate": 0.0,
    # time per output token
    "tpot": {
      "min": 0.010512285232543946,
      "max": 0.018693844079971312,
      "median": 0.01216195583343506,
      "mean": 0.012808671338217597,
      "percentile_80": 0.012455177783966065,
      "percentile_90": 0.01592913103103638,
      "percentile_95": 0.017840550780296324,
      "percentile_99": 0.018523185420036312
    },
    # time to first token
    "ttft": {
      "min": 0.4043765068054199,
      "max": 0.5446293354034424,
      "median": 0.46433258056640625,
      "mean": 0.4660029411315918,
      "percentile_80": 0.51033935546875,
      "percentile_90": 0.5210948467254639,
      "percentile_95": 0.5295632600784301,
      "percentile_99": 0.54161612033844
    },
    # input token latency
    "itl": {
      "min": 0.008117493672586566,
      "max": 0.01664590356337964,
      "median": 0.009861880810416522,
      "mean": 0.010531313198552402,
      "percentile_80": 0.010261738599844314,
      "percentile_90": 0.013813444118403915,
      "percentile_95": 0.015781731761280615,
      "percentile_99": 0.016473069202959836
    },
    # time to ack
    "tt_ack": {
      "min": 0.404374361038208,
      "max": 0.544623851776123,
      "median": 0.464330792427063,
      "mean": 0.46600091457366943,
      "percentile_80": 0.5103373527526855,
      "percentile_90": 0.5210925340652466,
      "percentile_95": 0.5295597910881042,
      "percentile_99": 0.5416110396385193
    },
    "response_time": {
      "min": 2.102457046508789,
      "max": 3.7387688159942627,
      "median": 2.3843793869018555,
      "mean": 2.5091602653265,
      "percentile_80": 2.4795608520507812,
      "percentile_90": 2.992232322692871,
      "percentile_95": 3.541854977607727,
      "percentile_99": 3.6993860483169554
    },
    "output_tokens": {
      "min": 200,
      "max": 200,
      "median": 200.0,
      "mean": 200.0,
      "percentile_80": 200.0,
      "percentile_90": 200.0,
      "percentile_95": 200.0,
      "percentile_99": 200.0
    },
    "input_tokens": {
      "min": 2000,
      "max": 2000,
      "median": 2000.0,
      "mean": 2000.0,
      "percentile_80": 2000.0,
      "percentile_90": 2000.0,
      "percentile_95": 2000.0,
      "percentile_99": 2000.0
    }
  }
}
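To compare runs side by side, the summary block can be post-processed with a few lines of Python; the file name results.json is an assumption, so point it at wherever your run wrote its output.

import json

# Load the JSON produced by a load-test run (file name is an assumption).
with open("results.json") as f:
    run = json.load(f)

summary = run["summary"]
print(f"requests: {summary['total_requests']} (failures: {summary['total_failures']})")
print(f"output tokens/sec: {summary['output_tokens_throughput']:.1f}")
print(f"ttft p95 (s): {summary['ttft']['percentile_95']:.3f}")
print(f"tpot p95 (s): {summary['tpot']['percentile_95']:.3f}")
print(f"end-to-end p95 (s): {summary['response_time']['percentile_95']:.3f}")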
Conclusion
llm-load-test-azure is a powerful and versatile tool that simplifies the process of load testing large language models on Azure. Whether you’re a developer or AI enthusiast, this repository provides the tools you need to ensure that your models perform optimally under various conditions. Check out the repository on GitHub and start optimizing your LLMs today!
Bookmark this GitHub link: maljazaery/llm-load-test-azure (github.com)
Acknowledgments
Special thanks to Zack Soenen for code contributions, Vlad Feigin for feedback and reviews, and Andrew Thomas, Gunjan Shah and my manager Joel Borellis for ideation and discussions.
The llm-load-test-azure tool is derived from the original load-test tool openshift-psap/llm-load-test (github.com). Thanks to the creators.
Disclaimer
This tool is unofficial and not a Microsoft product. It is still under development, so feedback and bug reports are welcome.
Microsoft at Open Source Summit Europe 2024
Join Microsoft at Open Source Summit Europe, from September 16 to 18, 2024. This event gathers open source developers, technologists, and community leaders to collaborate, share insights, address challenges, and gain knowledge—advancing open source innovation and ensuring a sustainable ecosystem. Open Source Summit features a series of events focused on the most critical technologies, topics, and issues in the open source community today.
Register for Open Source Summit Europe 2024 today!
Attend Microsoft sessions
Attend a Microsoft session at Open Source Summit Europe to learn more about Microsoft’s contributions to open source communities, gain valuable insights from industry experts, and stay up to date on the latest open source trends. Be sure to add these exciting sessions to your event schedule.
Monday, September 16, 2024
Session
Speakers
Time
The Open Source AI Definition is (Almost) Ready
Justin Colannino, Microsoft
Stefano Maffulli, Open Source Initiative
2:15 PM to 2:55 PM CEST
Tuesday, September 17, 2024
Session
Speakers
Time
Keynote: OSS Security Through Collaboration
Ryan Waite, Open Source Strategy and Incubations, Microsoft
9:50 AM to 10:05 AM CEST
Linux Sandboxing with Landlock
Mickaël Salaün, Senior Software Engineer, Microsoft
11:55 AM to 12:35 PM CEST
Danielle Tal, Microsoft; Mauro Morales, Spectro Cloud; Felipe Huici, Unikraft GmbH; Richard Brown, SUSE; Erik Nordmark, Zededa
11:55 AM to 12:35 PM PDT
Wednesday, September 18, 2024
Session
Speakers
Time
Panel: Why Open Source AI Matters for Europe
Justin Colannino, Microsoft; Sachiko Muto, OpenForum; Stefano Maffulli, Open Source Initiative; Cailean Osborne, The Linux Foundation
11:55 AM to 12:35 PM CEST
Open-Source Software Engineering Education
Stephen Walli, Principal Programmer Manager, Microsoft
3:10 PM to 3:50 PM CEST
Visit us at the Microsoft booth and experience exciting sessions and demos
Come visit us at booth D3 to engage with fellow open source enthusiasts at Microsoft, experience live demos on the latest open source technologies, and discuss the future of open source. You can also catch exciting sessions in the booth to learn more about a wide range of open source topics, including the following and more:
.NET 9
Azure Kubernetes Service
Flatcar Container Linux
Headlamp
Inspektor Gadget and eBPF observability
Linux on Azure
PostgreSQL
WebAssembly
We hope to see you in Vienna next week!
Learn more about Linux and open source at Microsoft
Open Source at Microsoft — explore the open source projects, programs, and tools at Microsoft.
Linux on Azure — learn more about building, running, and deploying your Linux applications in Azure.
Purview eDiscovery’s Big Makeover
New Purview eDiscovery Due “by end of 2024”
eDiscovery is probably not where most Microsoft 365 tenant administrators spend a lot of time. Running eDiscovery cases is quite a specialized task. Often, large enterprises have dedicated compliance teams to handle finding, refining, analyzing, and understanding the material unearthed during eDiscovery, along with liaising with outside legal and other experts.
Starting with Exchange 2010, Microsoft recognized that eDiscovery was a necessity. SharePoint Server had its own eDiscovery center, and these elements moved into Office 365. In concert with their own work, Microsoft bought Equivio, a specialized eDiscovery company, in January 2015 to acquire the technology that became the eDiscovery premium solution.
Over the last few years, Microsoft has steadily added to the feature set of the eDiscovery premium solution while leaving the eDiscovery standard and content search solutions relatively unchanged. The last makeover that content search received was in 2021, and it wasn’t very successful. I thought it was slow and unwieldy. Things have improved since, but content searches have never been a great example of snappy performance and functionality, even if some good changes arrived, like the KQL query editor in 2022. (Microsoft has now renamed the keyword-based query language to KeyQL to differentiate it from the Kusto Query Language used with products like Sentinel.)
Time marches on, and Microsoft has decided to revamp eDiscovery. In an August 12, 2024, announcement, Microsoft laid out its plans for the next generation of eDiscovery. The software is available in preview, but only in the new Microsoft Purview portal.
The new portal handles both Purview compliance and data governance solutions. Microsoft plans to retire the current Purview compliance portal by the end of 2024 (Figure 1). Whether that date is achieved is quite another matter. As reported below, there’s work to be done to perfect the new portal before retirement is possible.
Big Changes in the New Purview eDiscovery
Apart from a refreshed UI, the big changes include:
Rationalization of eDiscovery into a single UI. Today, Purview includes content searches, eDiscovery standard, and eDiscovery premium, each with their own UI and quirks. In the new portal, a single eDiscovery solution covers everything, with licensing dictating the functionality revealed to users. If you have an E5 license, you get premium eDiscovery with all its bells and whistles. If you have E3, you’ll get standard eDiscovery.
Better data source management: Microsoft 365 data sources span many different types of information. In the past, eDiscovery managers picked individual mailboxes, sites, and OneDrive accounts to search. A new data source picker integrates all sources.
Support for sensitivity labels and sensitive information types within queries: The query builder supports looking for documents and messages that contain sensitive information types (SITs, as used by DLP and other Purview solutions) or are protected by sensitivity labels. Overall, the query builder is much better than before (Figure 2).
The output of queries is handled differently too. Statistics are presented after a query runs (Figure 3), and the ability to test a sample set to determine if the query finds the kind of items that you’re looking for still exists.
Exporting query results doesn’t require downloading an app. Everything is taken care of by a component called the Process manager, which coordinates the retrieval of information from the various sources where the query found hits. The export is delivered as a compressed file containing individual SharePoint files, PSTs for messages found in Exchange mailboxes, and a folder called “LooseFile” that appears to include Copilot for Microsoft 365 chats and meeting recaps.
Not Everything Works in the New Purview eDiscovery
Like any preview, not everything works in the software that’s currently online. For instance, I could not create a query based on sensitivity labels. More frustratingly, I could find no trace of content searches in the new interface, despite Microsoft’s assertion that “users still have access to all existing Content Searches and both Standard and Premium eDiscovery cases on the unified eDiscovery case list page in the Microsoft Purview portal.” Eventually, after I originally posted this article, a case called Content Searches appeared at the bottom of the case list. Navigating to the bottom of a case list (which could be very long) isn’t a great way to find content searches, and it seems unnecessarily complicated. Perhaps a dedicated button to open content searches would work better?
Many administrators have created content searches in the past to look for data. For instance, you might want to export selective data from an inactive mailbox. In the new eDiscovery, content searches are created as standard eDiscovery cases, a change that Microsoft says improves security control by allowing the addition or removal of users from the case. Given that I have 100+ content searches in one case, I think that the new arrangement overcomplicates matters (how can I impose granular security on any one of the content searches if they’re all lumped together into one case?). It’s an example of how the folks developing the eDiscovery solution have never considered how tenant administrators use content searches in practice.
Interestingly, Microsoft says that the purge action for compliance searches can now remove 100 items at a time from an Exchange mailbox. They mention Teams in the same sentence, but what this really means is that the purge can remove compliance records for Teams from the mailbox that later synchronize with Teams clients to remove the actual messages.
Much More to Discover
Leaving aside the obvious pun, there is lots more to investigate in the new eDiscovery. If you are an eDiscovery professional, you’ll be interested in understanding how investigations work and whether Copilot (Security and Microsoft 365) can help, especially with large review sets. If you’re a tenant administrator, you should make sure that you understand how content searches and exports work. Microsoft has an interactive guide to help, but more importantly, we will update the eDiscovery chapter in the Office 365 for IT Pros eBook once the new software is generally available.
Learn how to exploit eDiscovery and the data available to Microsoft 365 tenant administrators through the Office 365 for IT Pros eBook. We love figuring out how things work.
Using Guest Accounts to Bypass the Teams Meeting Lobby
And Why You Might Need to Change Account to Attend a Teams Meeting
Earlier this week I discussed a change made in how Teams copies text from messages that reduces user irritation. Let me balance the books by explaining a different aspect of Teams that continues to vex me.
I’m waiting to be accepted into a Teams meeting and wondering why I’m forced to wait in the lobby. I know that the organization wants people to use their guest accounts when attending meetings because of concerns about data leakage, so it’s annoying to have to twiddle my thumbs in the virtual lobby as the minutes tick by. And then the answer strikes: I’m attempting to join the meeting using my account rather than a guest account. After exiting, I rejoin after selecting my guest identity and enter the meeting without pausing in the lobby.
The UI to Change User Accounts
All of this happens because of what seems to be a major (to me) UI flaw in Teams. Figure 1 is the screen that appears when attempting to join a Teams meeting in a host tenant. By default, the user account from the home tenant is selected. If other accounts are available, the Change option appears to allow the user to select a different account. Teams knows if you have a guest account for the host tenant because it is listed under Accounts and Orgs in Teams settings.
Figure 1: The option to change account to attend a Teams meeting in another tenant
You can switch to the account by selecting it from the list (Figure 2).
Because the meeting is limited to tenant and guest accounts, a connection request using the guest account sails through without meeting any lobby restrictions.
I can appreciate what the Teams UI designers were trying to do when they placed the Change button on the dialog. It makes sense to offer users the choice to switch accounts. The problem is that the option is just a tad too subtle and that leads to it being overlooked. I know I am not the only one in this situation because it has happened to a bunch of people who might know better.
Managing Access to Confidential Calls
MVPs are members of the Microsoft Most Valuable Professional program. Part of the benefits of being an MVP are product briefings about new features or plans that Microsoft has to improve their software, including Teams. All such briefings are under a strict Non-Disclosure Agreement (NDA) and people are required to join meetings using the guest account created for them by Microsoft. The restriction is enforced by the lobby setting for meetings to allow tenant accounts and guests to bypass the lobby. It is a reasonable restriction because Microsoft needs to know who they’re talking to, and a guest account is a good indication that an external person has been vetted for access to a tenant.
I commonly attend several product briefings each week. And on a regular basis, I fail to switch to my guest account before attempting to join calls. The result is that I spend time waiting in the lobby, thinking it would be nice if someone started the call soon, before I realize what’s going on or a presenter recognizes my name in the lobby and lets me in. I’ve been known to become distracted while waiting to be admitted from the lobby and miss the entire call.
Automatic Switching Would Help
Teams knows what the meeting setting is for lobby bypass. It knows if the person joining a call can bypass the lobby with one or more accounts. It would be terrific if Teams could apply some intelligence to the situation and prompt the user to change if their current account can’t bypass the lobby. I might make more calls then.
Make sure that you’re not surprised about changes that appear inside Microsoft 365 applications by subscribing to the Office 365 for IT Pros eBook. Our monthly updates make sure that our subscribers stay informed.
Copilot’s Automatic Summary for Word Documents
Automatic Document Summary in a Bulleted List
Last week, I referenced the update for Word where Copilot for Microsoft 365 generates an automatic summary for documents. This is covered in message center notification MC871010 (Microsoft 365 roadmap item 399921). Automatic summaries are included in Copilot for Microsoft 365 and Microsoft Copilot Pro (the version that doesn’t ground prompts using Graph data).
As soon as I published the article where I referred to the feature, it turned up in Word. Figure 1 shows the automatic summary generated for a document (in this case, the source of an article).
The summary is the same output as the bulleted list Copilot will generate if you open the Copilot pane and ask Copilot to summarize this doc. Clicking the Ask a question button opens the Copilot pane with the summary prepopulated ready for the user to delve deeper into the summary.
The summary is only available after a document is saved and closed. The next time someone opens the document, the summary pane appears at the top of the document and Copilot generates the summary. The pane remains at the top of the document and doesn’t appear on every page. If Copilot thinks it necessary (for instance, if more text is added to a document), it displays a Check for new summary button to prompt the user to ask Copilot to regenerate the summary.
Apart from removing the Copilot license from an account (in which case the summaries don’t appear), there doesn’t seem to be a way to disable the feature. You can collapse the summary, but it’s still there and can be expanded at any time.
Summarizing Large Word Documents
When Microsoft launched Copilot support for Word, several restrictions existed. For instance, Word couldn’t ground user prompts against internet content. More importantly, summarization could only handle relatively small documents. The guidance was that Word could handle documents with up to 15,000 words but would struggle thereafter.
This sounds a lot, and it’s probably enough to handle a large percentage of the documents generated within office environments. However, summaries really come into their own when they extract information from large documents commonly found in contracts and plans. The restriction, resulting from the size of the prompt that could be sent to the LLM, proved to be a big issue.
Microsoft responded in August 2024 with an announcement that Word could now summarize documents of up to 80,000 words. In their text, Microsoft says that the new limit is four times greater than the previous limit. The new limit is rolling out for desktop, mobile, and browser versions of Word. For Windows, the increased limit is available in Version 2310 (Build 16919.20000) or later.
Processing Even Larger Word Documents
Eighty thousand words sounds a lot. At an average of 650 words per page, that’s 123 pages filled with text. I wanted to see how Copilot summaries coped with larger documents.
According to this source, the maximum size of a text-only Word document is 32 MB. With other elements included, the theoretical size extends to 512 MB. I don’t have documents quite that big, but I do have the source document for the Office 365 for IT Pros eBook. At 1,242 pages and 679,800 characters, including many figures, tables, cross-references, and so on, the file size is 29.4 MB.
Copilot attempted to generate a summary for Office 365 for IT Pros but failed. This wasn’t surprising because the file is so much larger than the maximum supported.
The current size of the Automating Microsoft 365 with PowerShell eBook file is 1.72 MB and spans 113,600 words in 255 pages. That’s much closer to the documented limit, and Copilot was able to generate a summary (Figure 2).
Although the bulleted list contains information extracted from the file, it doesn’t reflect the true content of the document because Copilot was unable to send the entire file to the LLM for processing. The bulleted list comes from the first two of four chapters and completely ignores the chapters dealing with the Graph API and Microsoft Graph PowerShell SDK.
Summaries For Standard Documents
Microsoft hasn’t published any documentation that I can find for Copilot’s automatic document summary feature. When it appears, perhaps the documentation will describe how to disable the feature for those who don’t want it. If not, we’ll just have to cope with automatic summaries. At least they will work for regular Word documents of less than 80,000 words.
So much change, all the time. It’s a challenge to stay abreast of all the updates Microsoft makes across the Microsoft 365 ecosystem. Subscribe to the Office 365 for IT Pros eBook to receive monthly insights into what happens, why it happens, and what new features and capabilities mean for your tenant.