Category Archives: Microsoft
Windows Insider updates no longer update?
Recently I’ve noticed that my Windows Insider updates have stopped updating. I’ve checked the Settings app and the Update & Security section, but it seems that Windows is no longer downloading or installing any new updates. I’m worried that my system might be vulnerable to security risks if it’s not getting the latest updates.
I’m not sure if it’s a problem with my Windows Insider account, my computer settings, or something else entirely. I’d appreciate any help or guidance on how to get my Windows Insider updates working again.
Efficient Time-Based Audit Log Filtering for Azure SQL Database
We are happy to announce that we’ve released an enhanced version of fn_get_audit_file. Customers can now easily specify their desired time range to retrieve audit logs more efficiently.
The sys.fn_get_audit_file_v2 function in Azure SQL Database is designed to retrieve audit log data with enhanced efficiency compared to its predecessor, sys.fn_get_audit_file. The new function introduces time-based filtering at both the file and record levels, providing significant performance improvements, particularly for queries targeting specific time ranges.
Function Syntax
sys.fn_get_audit_file_v2 ( file_pattern, initial_file_name, audit_record_offset, start_time, end_time )
For more details, please refer to the documentation here: https://learn.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-get-audit-file-v2-transact-sql
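As a quick illustration (not part of the announcement), the sketch below shows how the time-range parameters might be used, here wrapped in a small Python script with pyodbc; the connection string, storage path, and time window are placeholders you would replace with your own values.

# Hypothetical sketch: pull a one-day window of audit records via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

query = """
SELECT event_time, action_id, succeeded, statement
FROM sys.fn_get_audit_file_v2(
    'https://<your-storage-account>.blob.core.windows.net/sqldbauditlogs/',
    DEFAULT,                  -- initial_file_name
    DEFAULT,                  -- audit_record_offset
    '2024-06-01T00:00:00',    -- start_time
    '2024-06-02T00:00:00'     -- end_time
);
"""

for row in conn.cursor().execute(query):
    print(row.event_time, row.action_id, row.succeeded)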
Copilot in Outlook – Access from top nav bar is always in English
Hi,
I’ve noticed that within Outlook, the Copilot chat interface remains in English, despite the application being set to a different language (French, in my case). Could someone explain how to adjust the language settings of the Copilot chat (found in the top navigation bar) to match the language used in the app?
After deploying my Python Bot in azure its not working
I am trying to create an Azure bot using the Python Bot Framework SDK and deploy it to Azure App Service with an Azure Bot resource. The bot works well locally with the Bot Framework Emulator (on my Windows laptop), but once deployed, it does not work in the Test in Web Chat option in Azure.
My app.py code:
import sqlite3
from pathlib import Path
import sys
import traceback
from datetime import datetime

from googleai import ai
from aiohttp import web
from aiohttp.web import Request, Response, json_response
from botbuilder.core import (
    BotFrameworkAdapterSettings,
    TurnContext,
    BotFrameworkAdapter,
    ConversationState,
    UserState,
    MemoryStorage
)
from botbuilder.core.integration import aiohttp_error_middleware
from botbuilder.schema import Activity, ActivityTypes

from bot import MyBot
from config import DefaultConfig

CONFIG = DefaultConfig()

# Create adapter.
# See https://aka.ms/about-bot-adapter to learn more about how bots work.
SETTINGS = BotFrameworkAdapterSettings(CONFIG.APP_ID, CONFIG.APP_PASSWORD)
ADAPTER = BotFrameworkAdapter(SETTINGS)


# Catch-all for errors.
async def on_error(context: TurnContext, error: Exception):
    # This check writes out errors to the console log vs. App Insights.
    # NOTE: In a production environment, you should consider logging this to
    # Azure Application Insights.
    print(f"\n [on_turn_error] unhandled error: {error}", file=sys.stderr)
    traceback.print_exc()

    # if context.activity.text == ""
    await context.send_activity(
        "I didn't understand what you mean. Please rewrite your question with more info about what you are looking for."
    )

    # Send a trace activity if we're talking to the Bot Framework Emulator
    if context.activity.channel_id == "emulator":
        # Create a trace activity that contains the error object
        trace_activity = Activity(
            label="TurnError",
            name="on_turn_error Trace",
            timestamp=datetime.utcnow(),
            type=ActivityTypes.trace,
            value=f"{error}",
            value_type="https://www.botframework.com/schemas/error",
        )
        # Send a trace activity, which will be displayed in the Bot Framework Emulator
        await context.send_activity(trace_activity)


ADAPTER.on_turn_error = on_error


def create_database():
    conn = sqlite3.connect("Chinook.db")
    speech_file_path = Path(__file__).parent / "Chinook_Sqlite.sql"
    with open(speech_file_path, "r", encoding="cp1252", errors="replace") as f:
        sql_script = f.read()
    conn.executescript(sql_script)
    conn.close()


memstore = MemoryStorage()
constate = ConversationState(memstore)
userstate = UserState(memstore)

# Create the Bot
BOT = MyBot(constate, userstate, CONFIG.EXPIRE_AFTER_SECONDS)


# Listen for incoming requests on /api/messages
async def messages(req: Request) -> Response:
    # Main bot message handler.
    if "application/json" in req.headers["Content-Type"]:
        body = await req.json()
    else:
        return Response(status=415)

    activity = Activity().deserialize(body)
    auth_header = req.headers["Authorization"] if "Authorization" in req.headers else ""

    response = await ADAPTER.process_activity(activity, auth_header, BOT.on_turn)
    if response:
        return json_response(data=response.body, status=response.status)
    return Response(status=201)


def init_func(argv):
    APP = web.Application(middlewares=[aiohttp_error_middleware])
    APP.router.add_post("/api/messages", messages)
    return APP


if __name__ == "__main__":
    APP = init_func(None)
    try:
        web.run_app(APP, host="0.0.0.0", port=CONFIG.PORT)
    except Exception as error:
        raise error
Startup command: python3 -m aiohttp.web -H 0.0.0.0 -P 8000 app:init_func
Error:
In App Service -> Log stream:
Container bot-webapp01_0_aa92351a for site bot-webapp01 has exited, failing site start
Container bot-webapp01_0_aa92351a didn’t respond to HTTP pings on port: 8000, failing site start. See container logs for debugging.
Taskbar thumbnails preview not showing on top taskbar
I’m not usually one to ask for help, but I’m stuck with an issue. When I move the taskbar to the top of the screen in Windows 11 (version 21H2), the thumbnail previews don’t show up when I hover over the taskbar icons. All the taskbar settings are correct because the previews work fine when the taskbar is at the bottom.
I’ve searched all over the internet but haven’t found a solution or even a clue about how to fix this. Is it possible to get the previews working with the taskbar at the top? Any help would be appreciated.
How to Resolve Script Error in QBO/QBS?
I’m encountering a script error in QBO/QBS that disrupts my workflow. The error appears when accessing certain features or running reports. How can I fix this issue? Any troubleshooting steps or solutions would be greatly appreciated.
What to Do When Getting Error 503 in QBO?
I’m getting Error 503 in QBO and it’s disrupting my workflow. Can anyone provide detailed steps or troubleshooting tips to resolve this issue?
How to sum the minimum values for each Building with other conditions
Hello,
I want to sum the minimum values (Column Capacity) for each Building (column Building) for a certain process while meeting these conditions:
The Building Status should be “Ready”
The Tool Status should be “Ready”
I want the formula to be in a single cell. For example, the result of the formula would be 12 based on the example data below.
Thank you
How to Resolve QBO Error PS107 During Update?
I’m encountering Error PS107 while updating QBO. Has anyone else experienced this issue and found a solution? Any detailed steps to fix this error would be greatly appreciated!
Cross-tenant user access to Planner (in Teams)
Hi
We are facing an issue with Planner access from cross-tenant member users.
After the user switches tenants and tries to access a Planner tab in Teams, they instantly receive a “Your session has expired.” popup, which keeps repeating if they hit Login now.
We are aware that Planner is not supported in Shared channels (which are meant to be used for cross-tenant collaboration), but this is not a Shared channel.
Can someone confirm whether Planner is (or will be) compatible with cross-tenant usage?
Gergely Boruzs
How to change the language for an end user eLearning module in attack simulation e-learning
Hello,
I have started a campaign and some users would like to have the content delivered in their preferred language, which might be different from the browser or the M365 account language settings.
How can an end user select another language for an e-learning module? I remember there was a drop-down menu, but it no longer seems to appear. The campaign is based on the standard e-learning from the library, which supports more than 20 languages.
I am blocked out of Microsoft Learn
Hi there,
I am trying to access the ‘Sales and presales training’ section within ‘Microsoft Learning’, but it states “This account is currently blocked”.
My company is a Microsoft partner, and I am using (as per instructions) my company email.
Any ideas why my account is blocked and how to resolve this issue? I have been trying to log in for 2 weeks now, to no avail. Thanks in advance.
Windows Explorer makes sound when pressing Alt+Left or Alt-Right
Hello. I just started having this problem with Windows Explorer. Every time I press Alt+Left or Alt+Right to navigate through folders, it makes the “Asterisk” sound. It’s getting to be very annoying and I can’t seem to find any way to fix this. The only “solution” that I found was completely disabling system sounds which I don’t wanna do. However, I later found out that if I press Alt+Left or Alt+Right, but I hold down the Alt key, let go of the Left/Right key, press the Down key, and release the Alt key, it doesn’t make any sounds. Can you please get the sounds to stop without me having to press the Down key?
I can’t create a checkbox character inside a cell
I have many checkboxes written in a desktop Excel document, and now I have to use the Excel browser version. The document already has checkboxes with a cross in them, or an empty box. If I try to copy those checkboxes to other parts, they disappear because the original font, OpenSymbol, does not exist. There is no option to insert symbols (I can’t understand why). The odd thing is that in some parts of the document the font is used and I can see the checkboxes.
Azure OpenAI Best Practices Insights from Customer Journeys
Introduction
When integrating Azure OpenAI’s powerful models into your production environment, it’s essential to follow best practices to ensure security, reliability, and scalability. Azure provides a robust platform with enterprise capabilities that, when leveraged with OpenAI models like GPT-4, DALL-E 3, and various embedding models, can revolutionize how businesses interact with AI. This guidance document contains best practices for scaling OpenAI applications within Azure, detailing resource organization, quota management, rate limiting, and the strategic use of Provisioned Throughput Units (PTUs) and Azure API Management (APIM) for efficient load balancing.
Why do we care?
Large transformer models are mainstream nowadays, producing state-of-the-art (SoTA) results for a variety of tasks. They are powerful but very expensive to train and use. The extremely high inference cost, in both time and memory, is a major bottleneck for adopting a powerful transformer to solve real-world tasks at scale.
Why is it hard to run inference for large transformer models? Besides the increasing size of SoTA models, there are two main factors contributing to the inference challenge (Pope et al. 2022):
1. Large memory footprint. Both model parameters and intermediate states need to be held in memory at inference time. For example:
The KV cache must be stored in memory during decoding; e.g., for a batch size of 512 and a context length of 2048, the KV cache totals 3 TB, that is, 3x the model size.
Inference cost from the attention mechanism scales quadratically with input sequence length.
2. Low parallelizability. Inference generation is executed in an autoregressive fashion, making the decoding process hard to parallelize.
Best Practices for Azure OpenAI Resources
Consolidate Azure OpenAI workloads under a single Azure subscription to streamline management and cost optimization.
Treat Azure OpenAI resources as a shared service to ensure efficient usage of PTU and PAYG resources.
Utilize separate subscriptions only for distinct development and production environments or for geographic requirements.
Prefer resource groups for regional isolation, which simplifies scaling and management compared to multiple subscriptions.
Maintain a single Azure OpenAI resource per region, allowing up to 30 enabled regions within a single subscription.
Create both PAYG and PTU deployments within each Azure OpenAI resource for each model to ensure flexible scaling.
Leverage PTUs for business critical usage and PAYG for traffic that exceeds the PTU allocation.
Quotas and Rate Limiting
Azure imposes certain quotas and limits to manage resources effectively. Be aware of these limits and plan your usage accordingly. If your application is expected to scale, consider how you’ll manage dynamic quotas and provisioned throughput units (PTUs) to handle the load.
Tokens: Tokens are basic text units processed by OpenAI models. Efficient token management is crucial for cost and load balancing.
Quotas:
OpenAI sets API quotas based on subscription plans, dictating API usage within specific time frames.
Quotas are per model, per region, and per subscription.
Proactively monitor quotas to prevent unexpected service disruptions.
Quotas do not guarantee capacity, and traffic may be throttled if the service is overloaded.
During peak traffic, the service may throttle requests even if the quota has not been reached.
Rate Limiting
Rate limiting ensures equitable API access and system stability.
Rate Limits are imposed on the number of requests per minute (RPM) and the number of tokens per minute (TPM).
Implement backoff strategies to handle rate limit errors effectively (see the sketch below).
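A minimal sketch of such a backoff strategy (assuming the openai Python SDK v1.x and a hypothetical deployment name) could look like this; it retries 429 responses, preferring the service-suggested Retry-After value and otherwise backing off exponentially with jitter:

import os
import random
import time

from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def chat_with_backoff(messages, deployment="gpt-4o", max_retries=5):
    # Retry 429s, honoring Retry-After when the service provides it.
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError as err:
            retry_after = err.response.headers.get("retry-after") if err.response is not None else None
            wait = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(wait)
    raise RuntimeError("Rate limited: retries exhausted")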
PTUs: In the Azure OpenAI service, which provides Azure customers access to these models, there are fundamentally 2 different levels of service offerings:
Pay-as-you-go, priced based on usage of the service
Provisioned Throughput Units (PTU), fixed-term commitment pricing
Metrics and Monitoring
Azure OpenAI Metrics Dashboards: Start with the out-of-box dashboards provided by Azure OpenAI in the Azure portal. These dashboards display key metrics such as HTTP requests, tokens-based usage, PTU utilization, and fine-tuning activities, offering a quick snapshot of your deployment’s health and performance.
Analyze Metrics: Utilize Azure Monitor metrics explorer to delve into essential metrics captured by default:
Azure OpenAI Requests: Tracks the total number of API calls split by Status Code.
Generated Completion Tokens and Processed Inference Tokens: Monitors token usage, which is crucial for managing capacity and operational costs.
Provision-managed Utilization V2: Provides insights into utilization percentages, helping prevent overuse and ensuring efficient resource allocation.
Time to Response: Time taken for the first response to appear after a user sends a prompt.
To calculate usage-based chargebacks for Provisioned Throughput Units (PTUs) when sharing an Azure OpenAI instance across multiple business units, it is essential to monitor and log token consumption accurately. Incorporate the “azure-openai-emit-token-metric” policy in Azure API Management to emit token consumption metrics directly into Application Insights. This policy facilitates tracking various token metrics such as Total Tokens, Prompt Tokens, and Completion Tokens, allowing for a thorough analysis of service utilization. Configure the policy with specific dimensions such as User ID, Client IP, and API ID to enhance granularity in reporting and insights. By implementing these strategies, organizations can ensure transparent and fair chargebacks based on actual usage, fostering accountability and optimized resource allocation across different business units.
https://learn.microsoft.com/en-us/azure/api-management/azure-openai-emit-token-metric-policy
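The accounting itself happens in the APIM policy (XML), but as a rough application-side illustration of the same idea, the sketch below accumulates the usage object returned with each chat completion per business unit; the dimension name is an example, not something the policy defines:

from collections import defaultdict

token_usage_by_unit = defaultdict(lambda: {"prompt": 0, "completion": 0, "total": 0})

def record_usage(business_unit, response):
    # Accumulate token counts from an Azure OpenAI chat completion response.
    usage = response.usage
    bucket = token_usage_by_unit[business_unit]
    bucket["prompt"] += usage.prompt_tokens
    bucket["completion"] += usage.completion_tokens
    bucket["total"] += usage.total_tokens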
Given the two options, you would probably gravitate toward the pay-as-you-go pricing; this is a logical conclusion for customers just starting to use these models in proof-of-concept or experimental use cases. But as customer use cases become production-ready, the PTU model becomes the obvious choice.
Utilize PTUs for baseline usage of OpenAI workloads to guarantee consistent throughput.
PAYG deployments should handle traffic that exceeds the PTU allocation.
If you think of the Azure OpenAI service as analogous to a freeway, the service helps facilitate cars (requests) travelling to the models and ultimately back to their original location. The funny thing about highways and interstates, like standard pay-as-you-go deployments, is that you cannot control who is using the highway at the same time as you, which is akin to the service utilization we all experience during peak hours of the day. We all have a posted speed limit, like rate limits, but may never reach the speed we expect due to the factors mentioned above. Moreover, if you managed a fleet of vehicles (think of them as service calls) all using different parts of the highway, you also cannot predict which lane you will get stuck in. Some may luckily find the fast lane, but you can never prevent the circumstances ahead on the road. That’s the risk we take when using the highway, but tollways (token-based consumption) give us the right to use it whenever we want. While some high-demand times are foreseeable, such as rush hour, there can also be phantom slowdowns with no rhyme or reason as to why they occur. As a result, your estimated travel time (response latency) can vary drastically based on the different traffic scenarios on the road.
Provisioned throughput (PTUs) is more analogous to The Boring Company’s Loop than anything else. Unlike public transportation with predefined stops, the Loop provides a predetermined estimate of the time it will take to arrive at your destination because there are no scheduled stops; you travel directly to your destination. Provisioned throughput, like a Loop, is a function of how many tunnels (capacity), stations (client-side queuing), and vehicles (concurrency) you can handle at any one time. During peak travel times, even with the queued wait to get into the Loop at your first station (time to first token), you may arrive at your destination (end-to-end response time) faster than by taking the highway, because the speed limit is conceptually much higher with no traffic. This makes provisioned throughput much more advantageous, if and only if we implement a new methodology in our client-side retry logic compared with how we’ve handled it previously. For instance, if you have a tolerance for longer per-call latencies, adding only a little latency in front of the call (time to first token) and exploiting the retry-after-ms value returned from the 429 response, you can define how long you are willing to wait before you redirect traffic to pay-as-you-go or other models. This implementation also ensures you are getting the highest possible throughput out of your PTUs.
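A minimal client-side sketch of that spillover pattern might look like the following; the deployment names and the 500 ms tolerance are hypothetical, and it reuses the client, RateLimitError, and time imports from the backoff sketch above:

PTU_DEPLOYMENT = "gpt-4o-ptu"        # hypothetical PTU-backed deployment
PAYG_DEPLOYMENT = "gpt-4o-paygo"     # hypothetical pay-as-you-go deployment
MAX_ACCEPTABLE_WAIT_MS = 500         # example latency tolerance, tune to your workload

def chat_ptu_first(messages):
    # Try the PTU deployment first; spill over to PAYG when the suggested wait is too long.
    try:
        return client.chat.completions.create(model=PTU_DEPLOYMENT, messages=messages)
    except RateLimitError as err:
        headers = err.response.headers if err.response is not None else {}
        retry_after_ms = int(headers.get("retry-after-ms", 0))
        if 0 < retry_after_ms <= MAX_ACCEPTABLE_WAIT_MS:
            time.sleep(retry_after_ms / 1000)
            return client.chat.completions.create(model=PTU_DEPLOYMENT, messages=messages)
        # Waiting would exceed the latency budget, so redirect this call to pay-as-you-go.
        return client.chat.completions.create(model=PAYG_DEPLOYMENT, messages=messages)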
In summary, for Azure OpenAI use cases that require Predictable, Consistent, and Cost Efficient usage of the service the Provisioned Throughput Unit (PTU) offering becomes the most reasonable solution especially when it comes to business-critical production workloads.
Latency Improvement Techniques:
There are many techniques that can improve latency substantially; the main factors that drive it are:
the model used
number of input tokens
number of completion tokens
infrastructure and load on the inferencing engine
Let’s now talk about some of the best practices for reducing latency.
1. Prompt Compression using LLMLingua: Benchmarks show that time to first token increases with the number of input tokens, so it is imperative to reduce input tokens as effectively as possible. Here we use the LLMLingua library, which reduces the number of input tokens passed to the model. I discussed this in more detail in my previous blog here.
2. Skeleton of Thought: This technique makes it possible to generate longer outputs more quickly by first generating a skeleton and then generating each point of the outline. SoT first assembles a skeleton request using the skeleton prompt template with the original question. The skeleton prompt template is written to guide the LLM to output a concise skeleton of the answer. Then, we extract the B points from the skeleton response of the LLM. You can find the implementation here.
3. Maximizing Shared Prompt Prefix: Maximize the shared prompt prefix by putting dynamic portions (e.g., RAG results, history, etc.) later in the prompt. This makes your request more KV cache-friendly (which most LLM providers rely on) and means fewer input tokens are processed on each request.
4. Streaming: The single most effective approach, as it cuts the waiting time to a second or less. (ChatGPT would feel pretty different if you saw nothing until each response was done.)
5. Generating tokens is almost always the highest-latency step when using an LLM: as a general heuristic, cutting 50% of your output tokens may cut roughly 50% of your latency. Setting MAX_TOKENS close to the actual expected generation size helps too.
6. Parallelization: For use cases like classification, you can parallelize requests and use async as much as possible (see the async sketch below).
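A short async sketch of the streaming, output-token cap, and parallelization points above (assuming the openai Python SDK v1.x async client and a hypothetical gpt-4o deployment name):

import os
import asyncio

from openai import AsyncAzureOpenAI

aclient = AsyncAzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

async def stream_answer(question):
    # Streaming: print tokens as they arrive instead of waiting for the full completion.
    stream = await aclient.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
        max_tokens=256,   # cap output tokens to what the use case actually needs
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

async def classify_many(texts):
    # Parallelization: issue independent classification requests concurrently.
    async def classify(text):
        resp = await aclient.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Classify the sentiment (positive/negative): {text}"}],
            max_tokens=3,
        )
        return resp.choices[0].message.content
    return await asyncio.gather(*(classify(t) for t in texts))

asyncio.run(stream_answer("Give me three tips for reducing LLM latency."))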
Load Balancing with Azure API Management (APIM)
APIM plays a pivotal role in managing, securing, and analyzing APIs.
Policies within APIM can be used to manage traffic, secure APIs and enforce usage quotas.
Load Balancing within APIM distributes traffic evenly, ensuring no single instance is overwhelmed.
Circuit Breaker policies in APIM prevent cascading failures and improve system resilience.
Smart Load Balancing with APIM ensures prioritized traffic distribution across multiple OpenAI resources.
APIM Policies for OpenAI:
Many service providers, including OpenAI, set limits on API calls. Azure OpenAI, for instance, has limits on tokens per minute (TPM) and requests per minute (RPM). Exceeding these limits results in a 429 ‘TooManyRequests’ HTTP Status code and a ‘Retry-After’ header, indicating a pause before the next request.
This solution incorporates a comprehensive approach, considering UX/workflow design, application resiliency, fault-handling logic, appropriate model selection, API policy configuration, logging, and monitoring. It introduces an Azure API Management Policy that seamlessly integrates a single endpoint to your applications while efficiently managing consumption across multiple OpenAI or other API backends based on their availability and priority.
Smart vs. Round-Robin Load Balancers
Our solution stands out in its intelligent handling of OpenAI throttling. It is responsive to the HTTP status code 429 (Too Many Requests), a common occurrence due to rate limits in Azure OpenAI. Unlike traditional round-robin methods, our solution dynamically directs traffic to non-throttling OpenAI backends, based on a prioritized order. When a high-priority backend starts throttling, traffic is automatically rerouted to lower-priority backends until the former recovers.
This smart load balancing solution effectively addresses the challenges posed by API limit constraints in Azure OpenAI. By implementing the strategies outlined in the provided documentation, you can ensure efficient and reliable application performance, leveraging the full potential of your OpenAI and Azure API Management resources.
Learn more about this implementation on this github repo.
With support for round-robin, weighted (new), and priority-based (new) load balancing, you can now define your own load distribution strategy according to your specific requirements.
Define priorities within the load balancer configuration to ensure optimal utilization of specific Azure OpenAI endpoints, particularly those purchased as PTUs. In the event of any disruption, a circuit breaker mechanism kicks in, seamlessly transitioning to lower-priority instances based on predefined rules.
Our updated circuit breaker now features dynamic trip duration, leveraging values from the retry-after header provided by the backend. This ensures precise and timely recovery of the backends, maximizing the utilization of your priority backends to their fullest.
Learn more about load balancer and circuit breaker here.
Import OpenAI in APIM
The new Import Azure OpenAI as an API capability in Azure API Management provides an easy, single-click experience for importing your existing Azure OpenAI endpoints as APIs.
We streamline the onboarding process by automatically importing the OpenAPI schema for Azure OpenAI and setting up authentication to the Azure OpenAI endpoint using managed identity, removing the need for manual configuration. Additionally, within the same user-friendly experience, you can pre-configure Azure OpenAI policies, such as token limit and emit token metric, enabling swift and convenient setup.
Learn more about Import Azure OpenAI as an API here.
High Availability
Use Azure API Management to route traffic, ensuring centralized security and compliance.
Implement private endpoints to secure OpenAI resources and prevent unauthorized access.
Leverage Managed Identity to secure access to OpenAI resources and other Azure services.
Azure API Management has built a set of GenAI Gateway capabilities:
Azure OpenAI Token Limit Policy
Azure OpenAI Emit Token Metric Policy
Load Balancer and Circuit Breaker
Import Azure OpenAI as an API
Azure OpenAI Semantic Caching Policy (in public preview)
source: https://github.com/Azure-Samples/AI-Gateway
Security and Compliance
Security is paramount when deploying any application in a production environment. Azure OpenAI offers features to help secure your data and comply with various regulations:
Role-based access control (RBAC) allows you to define who has access to what within your Azure resources.
Content filtering and asynchronous content filtering can help ensure that the content generated by the AI models aligns with your policies and standards.
Red teaming large language models (LLMs) can help identify potential vulnerabilities before they become issues.
You can also use Azure OpenAI in a scenario that does not require key sharing, which is more appropriate for a production environment. This approach has several advantages. If you employ managed identities, you don’t have to handle credentials; in fact, credentials are not even accessible to you. Moreover, you can use managed identities to authenticate to any resource that supports Microsoft Entra authentication, including your own applications. Finally, managed identities are free to use, which also matters if you have multiple applications that use OpenAI. Learn more about it here.
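As a sketch of this keyless pattern (assuming the azure-identity package, the openai Python SDK v1.x, a hypothetical gpt-4o deployment name, and an appropriate role assignment such as Cognitive Services OpenAI User on the resource):

import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# DefaultAzureCredential resolves to the managed identity in Azure (or your developer login locally).
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,   # no API key anywhere in the application
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",   # hypothetical deployment name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)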
Responsible AI Practices
Adhering to responsible AI principles is essential. Azure OpenAI provides guidelines and tools to help:
Transparency notes and a code of conduct can guide your AI’s behavior.
Data privacy and security measures are crucial to protect your and your customers’ data.
Monitoring for abuse and managing system message templates can help prevent and respond to any misuse of the AI services.
The Azure Responsible AI team recently announced the public preview of the ‘Risks & safety monitoring’ feature in Azure OpenAI Service. Microsoft is committed to ensuring that the development and deployment of AI systems are safe, secure, and trustworthy, and a set of tools helps make that possible. In addition to near-real-time detection and mitigation of harmful content, risks & safety monitoring provides a better view of how the content filter mitigation works on real customer traffic and offers insights into potentially abusive end users. With the risks & safety monitoring feature, customers can:
Visualize the volume and ratio of user inputs and model outputs blocked by the content filters, with a detailed breakdown by severity and category, and use that data to help developers or model owners understand harmful-request trends over time and inform adjustments to content filter configurations, blocklists, and application design.
Understand the risk of the service being abused by any end users through “potentially abusive user detection”, which analyzes user behavior and the harmful requests sent to the model and generates a report for further action.
Conclusion
This guidance outlines a strategy for leveraging Azure OpenAI resources at an enterprise level. By centralizing OpenAI resources and adopting smart load balancing with APIM, organizations can maximize their investment in OpenAI, ensuring scalability, cost-effectiveness, and performance across a wide range of applications and use cases.
It takes a lot of effort to write this kind of blog, so please clap for it and follow me to keep me motivated to write more such blogs.
Additional Resources
https://github.com/Azure/AI-in-a-Box/blob/main/guidance/scaling/README.md
Smart load balancing with Azure API Management
Smart load balancing with Azure Container Apps
Using Azure API Management Circuit Breaker and Load balancing with Azure OpenAI Service
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-openai-offering-models-explain-it-like-i-m-5/ba-p/4112453
https://techcommunity.microsoft.com/t5/azure-integration-services-blog/introducing-genai-gateway-capabilities-in-azure-api-management/ba-p/4146525
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/best-practice-guidance-for-ptu/ba-p/4152133
For more detailed information on Azure OpenAI’s capabilities, tokens, quotas, rate limits, and PTUs, visit the Azure OpenAI documentation: https://docs.microsoft.com/en-us/azure/cognitive-services/openai/
Integrating Azure AI and OpenAI for Smart App Development
Hello! Abdulhamid here! I am a Microsoft student ambassador writing from Aston University, where I am studying Applied Artificial Intelligence. I’m excited to share insights on leveraging Azure AI and OpenAI to build cutting-edge applications. Whether you’re a fellow student, a developer, or an AI enthusiast, this step-by-step guide will help you harness the power of OpenAI’s models to enhance existing apps or create new innovative ones. With Azure AI Studio offering a wide array of AI services that can be easily deployed, and the new Azure OpenAI services granting access to advanced generative models, integrating these tools has become more straightforward than ever.
In this article, I will guide you through the process of building a smart nutritionist app by leveraging Azure Document Intelligence service for text extraction and Azure OpenAI’s GPT-4o for human-like, accurate responses.
Prerequisites:
An active Azure subscription
Registered access to Azure OpenAI service
Basic knowledge of Python programming language
Text Extraction with Azure’s pre-built model
For our application, we want to extract the ingredients and nutrition information from the food products we consume. Document Intelligence simplifies this process by accurately extracting text, tables, key-value pairs and other structures from images and documents. While you can train and build custom models with this service, Azure accelerates your app development process by provisioning highly accurate pre-built models.
One such model we will be leveraging is prebuilt-layout, which easily extracts different structures and will be useful for retrieving nutritional information, which is mostly printed in tables.
To deploy a Document Intelligence resource:
Sign in to https://portal.azure.com/ and search Document Intelligence
Click on the search result and then click Create on the top pane
Fill in the following to create and deploy the service
Subscription: select your active subscription
Resource group: Select an existing resource group or create a new one
Name: name your resource.
Region: select any region you wish, with east US being the default
Pricing: Select free tier (F0)
Click on review + create and then create to deploy the resource
Once a resource has been deployed, click on “Go to resource”. Scroll to the bottom of the page and copy one of the access keys and endpoint–you need this to connect your app to your deployed service.
Setting up your development environment
Next, set up the environment for your app development:
Create a .env file to hold your credentials
Open your notepad and paste the following:
AZURE_FORM_RECOGNIZER_ENDPOINT="YOUR_AZURE_FORM_RECOGNIZER_ENDPOINT"
AZURE_FORM_RECOGNIZER_KEY="YOUR_AZURE_FORM_RECOGNIZER_KEY"
b. Copy your endpoint and any one of the keys and paste them into the placeholders
c. Save the file into a new folder "nutrition_app" with the name .env, choosing "All Files" as the file type
Open VS Code and open your newly created folder
Open the integrated terminal in VS Code and run the following command: pip install azure-ai-formrecognizer==3.3.0 python-dotenv
Press Ctrl + Alt + Windows + N and select Python File to create a new Python file
We need to import the necessary libraries and set up the configurations required for us to connect to the pre-built model. Copy and paste the code below to do that. I have also loaded sample images of a nutrition label and ingredients.
# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

load_dotenv()

# Azure Form Recognizer Configuration
azure_form_recognizer_endpoint = os.getenv("AZURE_FORM_RECOGNIZER_ENDPOINT")
azure_form_recognizer_key = os.getenv("AZURE_FORM_RECOGNIZER_KEY")

ingredients_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240609_002105.jpg?raw=true"
nutrition_table_file_uri = "https://github.com/HamidOna/azurelearn/blob/main/20240608_192914.jpg?raw=true"
fileModelId = "prebuilt-layout"
This pre-built model can recognize tables, lines, and words. We will extract the ingredients as lines and the nutrition facts as a table so that the structure is parsed properly without any of the content getting jumbled up.
We will also add a few more lines of code to achieve this before we finally print the extracted text.
Copy the code below into the existing script.
document_analysis_client = DocumentAnalysisClient(
    endpoint=azure_form_recognizer_endpoint,
    credential=AzureKeyCredential(azure_form_recognizer_key)
)

poller_ingredients = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId,
    document_url=ingredients_file_uri
)
result_ingredients = poller_ingredients.result()

# Extract text labels from the ingredients image
ingredients_content = ""
if result_ingredients.pages:
    for idx, page in enumerate(result_ingredients.pages):
        for line in page.lines:
            ingredients_content += f"{line.content}\n"

# Connect to Azure Form Recognizer for nutrition table image
print(f"\nConnecting to Forms Recognizer at: {azure_form_recognizer_endpoint}")
print(f"Analyzing nutrition table at: {nutrition_table_file_uri}")

poller_nutrition_table = document_analysis_client.begin_analyze_document_from_url(
    model_id=fileModelId,
    document_url=nutrition_table_file_uri
)
result_nutrition_table = poller_nutrition_table.result()

# Extract table content from the nutrition table image
nutrition_table_content = ""
if result_nutrition_table.tables:
    for table_idx, table in enumerate(result_nutrition_table.tables):
        table_content = []
        for row_idx in range(table.row_count):
            row_content = [""] * table.column_count
            table_content.append(row_content)
        for cell in table.cells:
            table_content[cell.row_index][cell.column_index] = cell.content
        nutrition_table_content += f"\nTable #{table_idx + 1}:\n"
        for row in table_content:
            nutrition_table_content += "\t".join(row) + "\n"

combined_content = f"Ingredients:\n{ingredients_content}\nNutrition Table:\n{nutrition_table_content}"
print(combined_content)
Now save the file as app.py and proceed to run it in your terminal
python app.py
You should get an output similar to this:
Connecting to Azure OpenAI GPT-4o and processing the extracted text
We have completed the first half of this project. Next, we set up the GPT-4o model and then pass our extracted data to it to generate results.
First, create an Azure OpenAI resource (fill out this registration form if you don’t already have access):
Subscription: select your active subscription
Resource group: Select an existing resource group or create a new one
Name: name your resource.
Region: Select any region from the available lists of regions.
Pricing: Select Standard S0
After deploying the resource, click on “Go to Azure OpenAI Studio” on the top pane
Scroll down on the left pane and click on the “Deployments” page
Click on “Create new deployment”
Select GPT-4o from the list of models
Assign a name to the deployment (note down the name so you can connect to it later)
Reduce the token rate limit to 7K and then click “Create”
Once the deployment has succeeded, return to the Azure OpenAI resource in portal.azure.com and copy the endpoint and an access key
Next, go to the app.py script. Open the terminal and run:
pip install openai
Navigate to the .env file and paste the following into it, replacing the placeholders with the endpoint and access key from the Azure OpenAI resource:
AZURE_OAI_ENDPOINT="YOUR_AZURE_OPENAI_ENDPOINT"
AZURE_OAI_KEY="YOUR_AZURE_OPENAI_KEY"
AZURE_OAI_DEPLOYMENT="YOUR_DEPLOYED_MODEL_NAME"
Copy the imports below and replace the existing imports in the current script
# import modules
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from openai import AzureOpenAI
Set up the configuration right after the print statement for the extracted text. The Azure OpenAI values are read from the .env entries you just added:
# Azure OpenAI Configuration
azure_oai_endpoint = os.getenv("AZURE_OAI_ENDPOINT")
azure_oai_key = os.getenv("AZURE_OAI_KEY")
azure_oai_deployment = os.getenv("AZURE_OAI_DEPLOYMENT")

client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version="2024-05-13"
)
Prompt Engineering
An important part of using LLM models and getting accurate results is prompting.
Prompt engineering allows you to pass precise instructions to the model on how to behave, which is essential to how well it executes its tasks. It’s good practice to spend a few minutes crafting an excellent prompt tailored to your use case.
For this project, we want our model to tell us about the ingredients and provide helpful advice about them. We also want it to print a summary of its report before going into extensive detail about the ingredients. Another useful tip is to include an example query and an example output for the model. See the implementation below:
# Create a system message
system_message = """
You are a smug, funny nutritionist who provides health advice based on ingredients and nutrition tables.
Provide advice on what is safe to consume based on the ingredients and nutrition table.
Discuss the ingredients as a whole but single out scientifically named ingredients so the user can understand them better.
Mention the adequate consumption or potential harm based on excessive amounts of substances.
Identify any potential allergies. Output a general summary first before giving further details. Here are a few examples:
- "Example:
{User query}:
Please analyze the following ingredients and nutrition label content:
Ingredients: Potatoes, Vegetable Oils, Salt, Potassium phosphates
Nutrition Table:
- Energy: 532 kcal per 100g
- Fat: 31.5g per 100g
- Sodium: 1.28g per 100g
{System}:
Summary: The ingredients are pretty standard for potato crisps. Potatoes and vegetable oils provide the base, while salt adds flavor. Watch out for the high fat and sodium content if you're trying to watch your heart health or blood pressure. As for allergies, you're mostly safe unless you're allergic to potatoes or sunflower/rapeseed oil. Potassium phosphates? Just some friendly muscle helpers, but keep it moderate!
Potassium phosphates: Ah, the magical salts that help keep your muscles happy. Just don't overdo it!"
"""
Copy and paste the message above into the existing Python script. Next, we pass the extracted text to the model and make a request.
messages_array = [{"role": "system", "content": system_message}]

# Add the extracted nutrition label content to the user messages
messages_array.append({"role": "user", "content": f"Please analyze the following nutrition label content:\n{combined_content}"})

# Send request to Azure OpenAI model
response = client.chat.completions.create(
    model=azure_oai_deployment,
    temperature=0.6,
    max_tokens=1200,
    messages=messages_array
)
generated_text = response.choices[0].message.content

# Print the summary generated by OpenAI
print("Summary: " + generated_text + "\n")
Copy and paste the code above. Save your changes and run the script in your terminal:
python app.py
Voila! You have your own personal food nutritionist. See sample result below:
You can further refine the system message to fit your diet such as watching out for food with high sugar content, specifying allergies, helping you find halal food and so on.
Check out the following resources to improve the app and build your own specific use cases:
Document Intelligence pre-built models
Project on Github with streamlit interface
Repeating error encountered regarding Windows Update
The prompt I’m getting is: “There were some problems installing updates, but we’ll try again later. If you keep seeing this and want to search the web or contact support for information, this may help: (0x80073701)”
Can anyone help?
How do I get the latest conversation message with its attachments from the Graph API?
When I send a request to the Microsoft Graph API with the URL below:
f'https://graph.microsoft.com/v1.0/me/mailFolders/{folder}/messages?$top=1&$orderby=receivedDateTime desc'
the email body comes back as HTML containing the entire conversation. But I only need the latest message in the conversation, whether it is a single mail or a reply, and its attachments. I have tried, but the HTML body keeps returning the full conversation.
How to embed Outlook’s inbox into my own project?
I want to ask: is it possible to embed Outlook’s email inbox into our own web project? We have already bought Microsoft 365 Office.
How to Easily Record Your Screen in Windows 11 for Free
Windows 11 and Windows 10 do offer a built-in way to record the screen, which is to use Xbox Game Bar. Some users find this method helpful, but I personally don’t think it’s convenient enough. I also tried Bandicam, which is a popular screen recorder, but I don’t like the user experience.
I finally found an easy-to-use tool to record the screen. I’ve been using it for over two years. It’s free to use, but I believe there’s a paid version. I’m very happy with the free version.
This tool has a web-based version and a desktop version. I prefer the former. You can download the desktop version if needed.
Steps: How to screen record in Windows 11 and 10 for free, with or without audio
1. In your browser, visit the official website of the Apowersoft tool.
2. You’ll find a Start Recording button on the page. Click this button.
3. In the Free Online Recording screen, you will see four options.
To record your screen with audio, select Screen and System Sound.
To record your screen without audio, select Screen only.
4. Click the Start Recording button, and a message will appear to encourage you to download the app. Click the Continue recording button.
5. A dialog will pop up, allowing you to select an area to record, such as the entire screen, a specific window, or a specific browser tab. Select the area that you need to record and click Share.
Now the screen recording will start. You can pause/resume or stop the recording at any time.
6. After you finish recording the screen, click the Stop button.
7. You will see a Save button. Click the Save button and then choose MP4 file or Original file based on your needs. Generally the MP4 file is recommended because it’s a popular video format and is widely supported.
This screen recorder is very easy to use. However, it’s not perfect. Instead of the entire screen, I only need to record a certain area on the screen. I wish it allowed me to drag to select an area. All in all, this is a good tool for screen recording considering it’s free to use.