Month: October 2024
“Unexplained Creation of Random Profile.tmp File on Desktop”
I’ve been encountering this issue for a while now. Every time I use my laptop connected to my desktop, I notice a profile.tmp file appears. I’m not certain about the cause, but after some investigation, it seems Valorant might be responsible. I’m unsure about how to address this. I’m hesitant to delete the file because it usually doesn’t solve the issue, and I’m worried deleting it may lead to Valorant crashing, as it has in the past. I plan to reach out to Riot Support for assistance, but I could use some guidance on how to identify the exact source of this file. I’ve assumed it’s related to Riot Games, but I want to confirm this. Thank you in advance for any help!
How to reboot your computer
I can’t stress enough how frustrated I am with my Windows computers lately. Since using OneDrive, things have just spiraled downhill.
Today, after enjoying a blissful 3-day break from my computer, I switched it on with the intention of seeking help on an Android forum. I needed advice on connecting a Kenwood radio in my wife’s newly acquired 2002 Toyota Tacoma using an Android app from Google Play. As I was typing my query on the Android Central forum, my screen froze out of nowhere, and I was met with an alarming message:
Now, onto an email mishap: I had already taken a screenshot that I wanted to include here, but when I tried to retrieve it, Outlook acted up and displayed an error message (screenshot: 1725502984440.png). Consequently, I can’t even send myself an email now. I will update once I make some progress, assuming I ever regain access to my email.
As for the other computer issue… The error message, akin to a Microsoft blue screen (accompanied by a sad emoji), reads: “Your device encountered a problem and needs to restart. We are in the process of gathering error information and will reboot shortly…0% complete… For more details on this issue and potential fixes, check out the Experience the Power of AI with Windows 11 OS, Computers, & Apps | Microsoft Windows site. If you decide to contact support, provide them with this information: Stop code: CLOCK_WATCHDOG_TIMEOUT.” This message has lingered on the screen for over 2 hours now. I can’t restart or turn off the laptop since everything on the screen is frozen. I’m left with no option but to wait for the battery to drain completely.
HELP! What steps should I take next?
P.S. I’m sending this message from another computer now.
Windows 11 Has Slowed Down in Boot Time
Hello, it’s been a while since we last chatted!
I recently performed a clean reinstall of Windows 11 on my computer. However, I’m facing an unexpected issue where the booting process is unusually slow. This is quite different from my previous experiences. I’ve ensured that the BIOS settings are all set correctly with secure boot enabled, CSM turned off, and UEFI in use. Additionally, I have installed all available updates.
My system disk is labeled as disk 4.
If you have any suggestions or insights, I’d greatly appreciate it!
Brand New Laptop with Windows 11 Home in S Mode
A family member recently acquired a brand-new HP laptop, and upon initial setup, it was operating on Windows 11 Home in S mode. We decided to switch it out of S mode and back to regular Windows 11 Home as they preferred to avoid the restrictions imposed by S mode.
I’m curious, is this becoming a common practice with new PC acquisitions across different brands or is it primarily associated with HP computers? I recall a recent discussion on this topic, but since it’s been years since I last purchased a new PC, I’m eager to learn more about these emerging trends.
Accumulated sum – problems with sumif()
Hi, I have a “basic” problem:
I’m trying to create a matrix of accumulated sums from the individual values in A1:J1. My proposal:
=SUMIF(SEQUENCE(,10),"<="&SEQUENCE(,10),A1:J1)
The formula works if I use a cell reference for the first argument, but with the version above the result is a #VALUE! error. Maybe there is a simple solution, or somebody has a completely different approach.
Thanks for your comments
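A note on why this fails: SUMIF expects its first argument to be a worksheet range, not an in-memory array such as SEQUENCE(,10), so the array version returns #VALUE!. One possible alternative that builds the running total directly (a sketch; SCAN and LAMBDA require Excel 365):
=SCAN(0,A1:J1,LAMBDA(acc,val,acc+val))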
Windows 11 Insider Preview 10.0.26120.2122 (ge_release_upr) unable to update
Windows 11 Insider Preview 10.0.26120.2122 (ge_release_upr) is unable to update and shows the error in the enclosed document.
Edge Canary .2891
Ever since yesterday’s build .2890, and now today’s .2891, every time I click an icon on the taskbar or just hover over another icon, Windows Explorer crashes and the whole taskbar disappears for about 30 seconds. I switched to Edge Dev and this did not happen.
Printing color bars
I have highlighted color bars in an Excel document but I can’t get them to print. I have no trouble printing color on other Excel documents, but I cannot get them to print in this particular document. HELP!
From Zero to Hero: Building Your First Voice Bot with GPT-4o Real-Time API using Python
Voice technology is transforming how we interact with machines, making conversations with AI feel more natural than ever before. With the public beta release of the Realtime API powered by GPT-4o, developers now have the tools to create low-latency, multimodal voice experiences in their apps, opening up endless possibilities for innovation.
Gone are the days when building a voice bot required stitching together multiple models for transcription, inference, and text-to-speech conversion. With the Realtime API, developers can now streamline the entire process with a single API call, enabling fluid, natural speech-to-speech conversations. This is a game-changer for industries like customer support, education, and real-time language translation, where fast, seamless interactions are crucial.
In this blog, we’ll guide you through the process of building your first real-time voice bot from scratch using the GPT-4o Realtime Model. We’ll cover key features of the Realtime API, how to set up a WebSocket connection for voice streaming, and how to leverage the API’s ability to handle interruptions and make function calls. By the end, you’ll be ready to create a voice bot that responds to users with near-human accuracy and emotion. Whether you’re a beginner or an experienced developer, this blueprint will help you get started with creating immersive voice interactions that are both responsive and engaging. Ready to dive in? Let’s get started!
Key Features
Low-Latency Streaming: Enables real-time audio input and output, facilitating natural and seamless conversations.
Multimodal Support: Handles both text and audio inputs and outputs, allowing for versatile interaction modes.
Preset Voices: Supports six predefined voices, ensuring quality and consistency in responses.
Function Calling: Allows the voice assistant to perform actions or retrieve context-specific information dynamically.
Safety and Privacy: Incorporates multiple layers of safety protections, including automated monitoring and adherence to privacy policies.
How GPT-4o Realtime API Works
Traditionally, building a voice assistant required chaining together several models: an automatic speech recognition (ASR) model like Whisper for transcribing audio, a text-based model for processing responses, and a text-to-speech (TTS) model for generating audio outputs. This multi-step process often led to delays and a loss of emotional nuance.
The GPT-4o Realtime API revolutionizes this by consolidating these functionalities into a single API call. By establishing a persistent WebSocket connection, developers can stream audio inputs and outputs directly, significantly reducing latency and enhancing the naturalness of conversations. Additionally, the API’s function calling capability allows the voice bot to perform actions such as placing orders or retrieving customer information on the fly.
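To make the single-connection flow concrete, here is a minimal sketch of the event exchange (the URL shape mirrors the Azure OpenAI Realtime preview used later in this post; swap in your own endpoint, deployment, and key):

import asyncio, json, os
import websockets

async def minimal_session():
    # Assumed URL shape for the Azure OpenAI Realtime preview.
    url = (os.environ["AZURE_OPENAI_ENDPOINT"].replace("https://", "wss://").rstrip("/")
           + "/openai/realtime?api-version=2024-10-01-preview"
           + "&deployment=gpt-4o-realtime-preview"
           + "&api-key=" + os.environ["AZURE_OPENAI_API_KEY"])
    async with websockets.connect(url) as ws:
        # One persistent connection carries control events and audio, both ways.
        await ws.send(json.dumps({"type": "session.update",
                                  "session": {"modalities": ["text", "audio"]}}))
        await ws.send(json.dumps({"type": "response.create"}))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. session.created, response.audio.delta
            if event["type"] == "response.done":
                break

asyncio.run(minimal_session())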
Building Your Real-Time Voice Bot
Let’s dive into the step-by-step process of building your own real-time voice bot using the GPT-4o Realtime API.
Prerequisites
Before you begin, ensure you have the following:
Azure Subscription: Create one for free.
Azure OpenAI Resource: Set up in a supported region (East US 2 or Sweden Central).
Development Environment: Familiarity with Python and basic asynchronous programming.
Client Libraries: Tools like LiveKit, Agora, or Twilio can enhance your bot’s capabilities.
Setting Up the API
Deploy the GPT-4o Realtime Model:
Navigate to the Azure AI Studio.
Access the Model Catalog and search for gpt-4o-realtime-preview.
Deploy the model by selecting your Azure OpenAI resource and configuring the deployment settings.
Configure Audio Input and Output:
The API supports various audio formats, primarily pcm16.
Set up your client to handle audio streaming, ensuring compatibility with the API’s requirements.
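For example, if your capture layer hands you float samples, converting them to the pcm16 payload the API expects might look like this (a sketch; the function name is illustrative):

import base64
import numpy as np

def float_to_pcm16_base64(samples: np.ndarray) -> str:
    # Clip to [-1, 1], scale to signed 16-bit integers, and base64-encode
    # so the bytes can travel inside a JSON event such as input_audio_buffer.append.
    ints = (np.clip(samples, -1.0, 1.0) * 32767).astype(np.int16)
    return base64.b64encode(ints.tobytes()).decode("ascii")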
This project demonstrates how to build a sophisticated real-time conversational AI system using Azure OpenAI. By leveraging WebSocket connections and an event-driven architecture, the system provides responsive and context-aware customer support in any language. This approach can be adapted to various languages and use cases, making it a versatile solution for businesses looking to enhance their customer service capabilities. The project consists of three main components:
Realtime API: Handles WebSocket connections to Azure OpenAI’s real-time API.
Tools: Defines various customer support functions like checking order status, processing returns, and more.
Application: Manages the interaction flow and integrates the real-time client with UI Layer.
Environment Setup
Create an .env file and update the following environment variables:
AZURE_OPENAI_API_KEY=XXXX
# Replace with your Azure OpenAI API key
AZURE_OPENAI_ENDPOINT=https://xxxx.openai.azure.com/
# Replace with your Azure OpenAI endpoint
AZURE_OPENAI_DEPLOYMENT=gpt-4o-realtime-preview
# Create a deployment for the gpt-4o-realtime-preview model and put its name here; you can name the deployment whatever you like.
AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION=2024-10-01-preview
# You don't need to change this unless you want to try other API versions.
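With the file in place, loading the values at startup is straightforward with python-dotenv (already listed in requirements.txt below); a minimal sketch:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]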
requirements.txt
chainlit==1.3.0rc1
openai
beautifulsoup4
lxml
python-dotenv
websockets
aiohttp
Implementing the Realtime Client
The heartbeat of your voice bot is the Realtime Client, which manages the WebSocket connection and handles communication with the GPT-4o Realtime API. The RealtimeAPI class is responsible for managing WebSocket connections to OpenAI’s real-time API. It handles sending and receiving messages, dispatching events, and maintaining the connection state.
Key Components:
RealtimeAPI Class:
Establishes and maintains the WebSocket connection.
Handles sending and receiving messages.
Manages event dispatching for various conversation events.
class RealtimeAPI(RealtimeEventHandler):
    def __init__(self):
        super().__init__()
        self.default_url = 'wss://api.openai.com/v1/realtime'
        self.url = os.environ["AZURE_OPENAI_ENDPOINT"]
        self.api_key = os.environ["AZURE_OPENAI_API_KEY"]
        self.api_version = "2024-10-01-preview"
        self.azure_deployment = os.environ["AZURE_OPENAI_DEPLOYMENT"]
        self.ws = None

    def is_connected(self):
        return self.ws is not None

    def log(self, *args):
        logger.debug(f"[Websocket/{datetime.utcnow().isoformat()}]", *args)

    async def connect(self, model='gpt-4o-realtime-preview'):
        if self.is_connected():
            raise Exception("Already connected")
        self.ws = await websockets.connect(
            f"{self.url}/openai/realtime?api-version={self.api_version}&deployment={model}&api-key={self.api_key}",
            extra_headers={
                'Authorization': f'Bearer {self.api_key}',
                'OpenAI-Beta': 'realtime=v1'
            }
        )
        self.log(f"Connected to {self.url}")
        asyncio.create_task(self._receive_messages())

    async def _receive_messages(self):
        async for message in self.ws:
            event = json.loads(message)
            if event['type'] == "error":
                logger.error("ERROR", message)
            self.log("received:", event)
            self.dispatch(f"server.{event['type']}", event)
            self.dispatch("server.*", event)

    async def send(self, event_name, data=None):
        if not self.is_connected():
            raise Exception("RealtimeAPI is not connected")
        data = data or {}
        if not isinstance(data, dict):
            raise Exception("data must be a dictionary")
        event = {
            "event_id": self._generate_id("evt_"),
            "type": event_name,
            **data
        }
        self.dispatch(f"client.{event_name}", event)
        self.dispatch("client.*", event)
        self.log("sent:", event)
        await self.ws.send(json.dumps(event))

    def _generate_id(self, prefix):
        return f"{prefix}{int(datetime.utcnow().timestamp() * 1000)}"

    async def disconnect(self):
        if self.ws:
            await self.ws.close()
            self.ws = None
            self.log(f"Disconnected from {self.url}")
Reference: init.py
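As a quick usage sketch (assuming the environment variables above are set), the class can be exercised on its own:

async def smoke_test():
    api = RealtimeAPI()
    await api.connect()  # opens the WebSocket and starts the background reader task
    await api.send("session.update", {"session": {"modalities": ["text", "audio"]}})
    await asyncio.sleep(1)  # give the server a moment to respond
    await api.disconnect()

asyncio.run(smoke_test())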
RealtimeConversation Class:
Manages the state of the conversation.
Processes different types of events, such as message creation, transcription completion, and audio streaming.
Queues and formats audio and text data for seamless interaction.
class RealtimeConversation:
    default_frequency = config.features.audio.sample_rate

    EventProcessors = {
        'conversation.item.created': lambda self, event: self._process_item_created(event),
        'conversation.item.truncated': lambda self, event: self._process_item_truncated(event),
        'conversation.item.deleted': lambda self, event: self._process_item_deleted(event),
        'conversation.item.input_audio_transcription.completed': lambda self, event: self._process_input_audio_transcription_completed(event),
        'input_audio_buffer.speech_started': lambda self, event: self._process_speech_started(event),
        'input_audio_buffer.speech_stopped': lambda self, event, input_audio_buffer: self._process_speech_stopped(event, input_audio_buffer),
        'response.created': lambda self, event: self._process_response_created(event),
        'response.output_item.added': lambda self, event: self._process_output_item_added(event),
        'response.output_item.done': lambda self, event: self._process_output_item_done(event),
        'response.content_part.added': lambda self, event: self._process_content_part_added(event),
        'response.audio_transcript.delta': lambda self, event: self._process_audio_transcript_delta(event),
        'response.audio.delta': lambda self, event: self._process_audio_delta(event),
        'response.text.delta': lambda self, event: self._process_text_delta(event),
        'response.function_call_arguments.delta': lambda self, event: self._process_function_call_arguments_delta(event),
    }

    def __init__(self):
        self.clear()

    def clear(self):
        self.item_lookup = {}
        self.items = []
        self.response_lookup = {}
        self.responses = []
        self.queued_speech_items = {}
        self.queued_transcript_items = {}
        self.queued_input_audio = None

    def queue_input_audio(self, input_audio):
        self.queued_input_audio = input_audio

    def process_event(self, event, *args):
        event_processor = self.EventProcessors.get(event['type'])
        if not event_processor:
            raise Exception(f"Missing conversation event processor for {event['type']}")
        return event_processor(self, event, *args)

    def get_item(self, id):
        return self.item_lookup.get(id)

    def get_items(self):
        return self.items[:]

    def _process_item_created(self, event):
        item = event['item']
        new_item = item.copy()
        if new_item['id'] not in self.item_lookup:
            self.item_lookup[new_item['id']] = new_item
            self.items.append(new_item)
        new_item['formatted'] = {
            'audio': [],
            'text': '',
            'transcript': ''
        }
        if new_item['id'] in self.queued_speech_items:
            new_item['formatted']['audio'] = self.queued_speech_items[new_item['id']]['audio']
            del self.queued_speech_items[new_item['id']]
        if 'content' in new_item:
            text_content = [c for c in new_item['content'] if c['type'] in ['text', 'input_text']]
            for content in text_content:
                new_item['formatted']['text'] += content['text']
        if new_item['id'] in self.queued_transcript_items:
            new_item['formatted']['transcript'] = self.queued_transcript_items[new_item['id']]['transcript']
            del self.queued_transcript_items[new_item['id']]
        if new_item['type'] == 'message':
            if new_item['role'] == 'user':
                new_item['status'] = 'completed'
                if self.queued_input_audio:
                    new_item['formatted']['audio'] = self.queued_input_audio
                    self.queued_input_audio = None
            else:
                new_item['status'] = 'in_progress'
        elif new_item['type'] == 'function_call':
            new_item['formatted']['tool'] = {
                'type': 'function',
                'name': new_item['name'],
                'call_id': new_item['call_id'],
                'arguments': ''
            }
            new_item['status'] = 'in_progress'
        elif new_item['type'] == 'function_call_output':
            new_item['status'] = 'completed'
            new_item['formatted']['output'] = new_item['output']
        return new_item, None

    def _process_item_truncated(self, event):
        item_id = event['item_id']
        audio_end_ms = event['audio_end_ms']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'item.truncated: Item "{item_id}" not found')
        end_index = (audio_end_ms * self.default_frequency) // 1000
        item['formatted']['transcript'] = ''
        item['formatted']['audio'] = item['formatted']['audio'][:end_index]
        return item, None

    def _process_item_deleted(self, event):
        item_id = event['item_id']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'item.deleted: Item "{item_id}" not found')
        del self.item_lookup[item['id']]
        self.items.remove(item)
        return item, None

    def _process_input_audio_transcription_completed(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        transcript = event['transcript']
        formatted_transcript = transcript or ' '
        item = self.item_lookup.get(item_id)
        if not item:
            self.queued_transcript_items[item_id] = {'transcript': formatted_transcript}
            return None, None
        item['content'][content_index]['transcript'] = transcript
        item['formatted']['transcript'] = formatted_transcript
        return item, {'transcript': transcript}

    def _process_speech_started(self, event):
        item_id = event['item_id']
        audio_start_ms = event['audio_start_ms']
        self.queued_speech_items[item_id] = {'audio_start_ms': audio_start_ms}
        return None, None

    def _process_speech_stopped(self, event, input_audio_buffer):
        item_id = event['item_id']
        audio_end_ms = event['audio_end_ms']
        speech = self.queued_speech_items[item_id]
        speech['audio_end_ms'] = audio_end_ms
        if input_audio_buffer:
            start_index = (speech['audio_start_ms'] * self.default_frequency) // 1000
            end_index = (speech['audio_end_ms'] * self.default_frequency) // 1000
            speech['audio'] = input_audio_buffer[start_index:end_index]
        return None, None

    def _process_response_created(self, event):
        response = event['response']
        if response['id'] not in self.response_lookup:
            self.response_lookup[response['id']] = response
            self.responses.append(response)
        return None, None

    def _process_output_item_added(self, event):
        response_id = event['response_id']
        item = event['item']
        response = self.response_lookup.get(response_id)
        if not response:
            raise Exception(f'response.output_item.added: Response "{response_id}" not found')
        response['output'].append(item['id'])
        return None, None

    def _process_output_item_done(self, event):
        item = event['item']
        if not item:
            raise Exception('response.output_item.done: Missing "item"')
        found_item = self.item_lookup.get(item['id'])
        if not found_item:
            raise Exception(f'response.output_item.done: Item "{item["id"]}" not found')
        found_item['status'] = item['status']
        return found_item, None

    def _process_content_part_added(self, event):
        item_id = event['item_id']
        part = event['part']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.content_part.added: Item "{item_id}" not found')
        item['content'].append(part)
        return item, None

    def _process_audio_transcript_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.audio_transcript.delta: Item "{item_id}" not found')
        item['content'][content_index]['transcript'] += delta
        item['formatted']['transcript'] += delta
        return item, {'transcript': delta}

    def _process_audio_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            logger.debug(f'response.audio.delta: Item "{item_id}" not found')
            return None, None
        array_buffer = base64_to_array_buffer(delta)
        append_values = array_buffer.tobytes()
        # TODO: make it work
        # item['formatted']['audio'] = merge_int16_arrays(item['formatted']['audio'], append_values)
        return item, {'audio': append_values}

    def _process_text_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.text.delta: Item "{item_id}" not found')
        item['content'][content_index]['text'] += delta
        item['formatted']['text'] += delta
        return item, {'text': delta}

    def _process_function_call_arguments_delta(self, event):
        item_id = event['item_id']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.function_call_arguments.delta: Item "{item_id}" not found')
        item['arguments'] += delta
        item['formatted']['tool']['arguments'] += delta
        return item, {'arguments': delta}
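In isolation, the class can be driven with hand-built server events, which is convenient for unit testing; a small sketch:

conversation = RealtimeConversation()
item, delta = conversation.process_event({
    "type": "conversation.item.created",
    "item": {"id": "item_1", "type": "message", "role": "user",
             "content": [{"type": "input_text", "text": "Hello"}]},
})
print(item["formatted"]["text"])  # -> Hello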
RealtimeClient Class:
Initialization: Sets up system prompts, session configurations, and initializes RealtimeAPI and RealtimeConversation for managing WebSocket connections and conversation events.
Connection Management: Handles connecting and disconnecting from the server, waiting for session creation, and updating session settings.
Event Handling: Listens for and processes server and client events, dispatching them to appropriate handlers.
Conversation Management: Manages creation, updating, and deletion of conversation items, including handling input audio and speech events.
Tool and Response Management: Supports adding/removing tools, invoking them based on events, sending user messages, creating responses, and managing audio content.
class RealtimeClient(RealtimeEventHandler):
    def __init__(self, system_prompt: str):
        super().__init__()
        self.system_prompt = system_prompt
        self.default_session_config = {
            "modalities": ["text", "audio"],
            "instructions": self.system_prompt,
            "voice": "shimmer",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "whisper-1"},
            "turn_detection": {"type": "server_vad"},
            "tools": [],
            "tool_choice": "auto",
            "temperature": 0.8,
            "max_response_output_tokens": 4096,
        }
        self.session_config = {}
        self.transcription_models = [{"model": "whisper-1"}]
        self.default_server_vad_config = {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200,
        }
        self.realtime = RealtimeAPI()
        self.conversation = RealtimeConversation()
        self._reset_config()
        self._add_api_event_handlers()

    def _reset_config(self):
        self.session_created = False
        self.tools = {}
        self.session_config = self.default_session_config.copy()
        self.input_audio_buffer = bytearray()
        return True

    def _add_api_event_handlers(self):
        self.realtime.on("client.*", self._log_event)
        self.realtime.on("server.*", self._log_event)
        self.realtime.on("server.session.created", self._on_session_created)
        self.realtime.on("server.response.created", self._process_event)
        self.realtime.on("server.response.output_item.added", self._process_event)
        self.realtime.on("server.response.content_part.added", self._process_event)
        self.realtime.on("server.input_audio_buffer.speech_started", self._on_speech_started)
        self.realtime.on("server.input_audio_buffer.speech_stopped", self._on_speech_stopped)
        self.realtime.on("server.conversation.item.created", self._on_item_created)
        self.realtime.on("server.conversation.item.truncated", self._process_event)
        self.realtime.on("server.conversation.item.deleted", self._process_event)
        self.realtime.on("server.conversation.item.input_audio_transcription.completed", self._process_event)
        self.realtime.on("server.response.audio_transcript.delta", self._process_event)
        self.realtime.on("server.response.audio.delta", self._process_event)
        self.realtime.on("server.response.text.delta", self._process_event)
        self.realtime.on("server.response.function_call_arguments.delta", self._process_event)
        self.realtime.on("server.response.output_item.done", self._on_output_item_done)

    def _log_event(self, event):
        realtime_event = {
            "time": datetime.utcnow().isoformat(),
            "source": "client" if event["type"].startswith("client.") else "server",
            "event": event,
        }
        self.dispatch("realtime.event", realtime_event)

    def _on_session_created(self, event):
        self.session_created = True

    def _process_event(self, event, *args):
        item, delta = self.conversation.process_event(event, *args)
        if item:
            self.dispatch("conversation.updated", {"item": item, "delta": delta})
        return item, delta

    def _on_speech_started(self, event):
        self._process_event(event)
        self.dispatch("conversation.interrupted", event)

    def _on_speech_stopped(self, event):
        self._process_event(event, self.input_audio_buffer)

    def _on_item_created(self, event):
        item, delta = self._process_event(event)
        self.dispatch("conversation.item.appended", {"item": item})
        if item and item["status"] == "completed":
            self.dispatch("conversation.item.completed", {"item": item})

    async def _on_output_item_done(self, event):
        item, delta = self._process_event(event)
        if item and item["status"] == "completed":
            self.dispatch("conversation.item.completed", {"item": item})
        if item and item.get("formatted", {}).get("tool"):
            await self._call_tool(item["formatted"]["tool"])

    async def _call_tool(self, tool):
        try:
            print(tool["arguments"])
            json_arguments = json.loads(tool["arguments"])
            tool_config = self.tools.get(tool["name"])
            if not tool_config:
                raise Exception(f'Tool "{tool["name"]}" has not been added')
            result = await tool_config["handler"](**json_arguments)
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "function_call_output",
                    "call_id": tool["call_id"],
                    "output": json.dumps(result),
                }
            })
        except Exception as e:
            logger.error(traceback.format_exc())
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "function_call_output",
                    "call_id": tool["call_id"],
                    "output": json.dumps({"error": str(e)}),
                }
            })
        await self.create_response()

    def is_connected(self):
        return self.realtime.is_connected()

    def reset(self):
        self.disconnect()
        self.realtime.clear_event_handlers()
        self._reset_config()
        self._add_api_event_handlers()
        return True

    async def connect(self):
        if self.is_connected():
            raise Exception("Already connected, use .disconnect() first")
        await self.realtime.connect()
        await self.update_session()
        return True

    async def wait_for_session_created(self):
        if not self.is_connected():
            raise Exception("Not connected, use .connect() first")
        while not self.session_created:
            await asyncio.sleep(0.001)
        return True

    async def disconnect(self):
        self.session_created = False
        self.conversation.clear()
        if self.realtime.is_connected():
            await self.realtime.disconnect()

    def get_turn_detection_type(self):
        return self.session_config.get("turn_detection", {}).get("type")

    async def add_tool(self, definition, handler):
        if not definition.get("name"):
            raise Exception("Missing tool name in definition")
        name = definition["name"]
        if name in self.tools:
            raise Exception(f'Tool "{name}" already added. Please use .removeTool("{name}") before trying to add again.')
        if not callable(handler):
            raise Exception(f'Tool "{name}" handler must be a function')
        self.tools[name] = {"definition": definition, "handler": handler}
        await self.update_session()
        return self.tools[name]

    def remove_tool(self, name):
        if name not in self.tools:
            raise Exception(f'Tool "{name}" does not exist, can not be removed.')
        del self.tools[name]
        return True

    async def delete_item(self, id):
        await self.realtime.send("conversation.item.delete", {"item_id": id})
        return True

    async def update_session(self, **kwargs):
        self.session_config.update(kwargs)
        use_tools = [
            {**tool_definition, "type": "function"}
            for tool_definition in self.session_config.get("tools", [])
        ] + [
            {**self.tools[key]["definition"], "type": "function"}
            for key in self.tools
        ]
        session = {**self.session_config, "tools": use_tools}
        if self.realtime.is_connected():
            await self.realtime.send("session.update", {"session": session})
        return True

    async def create_conversation_item(self, item):
        await self.realtime.send("conversation.item.create", {
            "item": item
        })

    async def send_user_message_content(self, content=[]):
        if content:
            for c in content:
                if c["type"] == "input_audio":
                    if isinstance(c["audio"], (bytes, bytearray)):
                        c["audio"] = array_buffer_to_base64(c["audio"])
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "message",
                    "role": "user",
                    "content": content,
                }
            })
        await self.create_response()
        return True

    async def append_input_audio(self, array_buffer):
        if len(array_buffer) > 0:
            await self.realtime.send("input_audio_buffer.append", {
                "audio": array_buffer_to_base64(np.array(array_buffer)),
            })
            self.input_audio_buffer.extend(array_buffer)
        return True

    async def create_response(self):
        if self.get_turn_detection_type() is None and len(self.input_audio_buffer) > 0:
            await self.realtime.send("input_audio_buffer.commit")
            self.conversation.queue_input_audio(self.input_audio_buffer)
            self.input_audio_buffer = bytearray()
        await self.realtime.send("response.create")
        return True

    async def cancel_response(self, id=None, sample_count=0):
        if not id:
            await self.realtime.send("response.cancel")
            return {"item": None}
        else:
            item = self.conversation.get_item(id)
            if not item:
                raise Exception(f'Could not find item "{id}"')
            if item["type"] != "message":
                raise Exception('Can only cancelResponse messages with type "message"')
            if item["role"] != "assistant":
                raise Exception('Can only cancelResponse messages with role "assistant"')
            await self.realtime.send("response.cancel")
            audio_index = next((i for i, c in enumerate(item["content"]) if c["type"] == "audio"), -1)
            if audio_index == -1:
                raise Exception("Could not find audio on item to cancel")
            await self.realtime.send("conversation.item.truncate", {
                "item_id": id,
                "content_index": audio_index,
                "audio_end_ms": int((sample_count / self.conversation.default_frequency) * 1000),
            })
            return {"item": item}

    async def wait_for_next_item(self):
        event = await self.wait_for_next("conversation.item.appended")
        return {"item": event["item"]}

    async def wait_for_next_completed_item(self):
        event = await self.wait_for_next("conversation.item.completed")
        return {"item": event["item"]}
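Putting the client to work, a minimal text-only round trip might look like this (a sketch, assuming the environment variables above are set):

async def demo():
    client = RealtimeClient(system_prompt="You are a helpful support agent.")
    await client.connect()
    await client.wait_for_session_created()
    await client.send_user_message_content([{"type": "input_text", "text": "Where is my order?"}])
    completed = await client.wait_for_next_completed_item()
    print(completed["item"]["formatted"]["transcript"])
    await client.disconnect()

asyncio.run(demo())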
Adding Tools and Handlers
Your voice bot’s functionality can be extended by integrating various tools and handlers. These allow the bot to perform specific actions based on user inputs.
Define Tool Definitions:
In tool.py, define the capabilities of your bot, such as checking order statuses, processing returns, or updating account information.
Each tool includes a name, description, and required parameters.
Implement Handlers:
Create asynchronous handler functions for each tool to execute the desired actions.
These handlers interact with your backend systems or databases to fulfill user requests.
Integrate Tools with the Realtime Client:
Register each tool and its handler with the RealtimeClient in your app.py file.
Ensure that the bot can invoke these tools dynamically during conversations.
Key Components:
Tool Definitions:
Structured descriptions of each tool, including the required parameters and functionalities.
Example:
# Function Definitions
check_order_status_def = {
    "name": "check_order_status",
    "description": "Check the status of a customer's order",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "The unique identifier for the customer"
            },
            "order_id": {
                "type": "string",
                "description": "The unique identifier for the order"
            }
        },
        "required": ["customer_id", "order_id"]
    }
}
Handler Functions:
Asynchronous functions that execute the logic for each tool.
Interact with external systems or databases, or perform specific actions based on user requests.
Example:
async def check_order_status_handler(customer_id, order_id):
    status = "In Transit"
    # Your business logic
    estimated_delivery, status, order_date = fetch_order_details(order_id, customer_id)
    # Read the HTML template
    with open('order_status_template.html', 'r') as file:
        html_content = file.read()
    # Replace placeholders with actual data
    html_content = html_content.format(
        order_id=order_id,
        customer_id=customer_id,
        order_date=order_date.strftime("%B %d, %Y"),
        estimated_delivery=estimated_delivery.strftime("%B %d, %Y"),
        status=status
    )
    # Send the Chainlit message with HTML content
    await cl.Message(content=f"Here is the detail of your order\n{html_content}").send()
    return f"Order {order_id} status for customer {customer_id}: {status}"
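Finally, the definition and its handler are registered with the realtime client so the model can invoke them during a conversation (a sketch, using the add_tool method shown earlier):

# add_tool calls update_session() internally, so the model sees the new
# function on the next session update.
await openai_realtime.add_tool(check_order_status_def, check_order_status_handler)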
Reference:
Integrating with Your Application
With the Realtime Client and tools in place, it’s time to weave everything into your application.
Initialize OpenAI Realtime:
In app.py, set up the connection to the GPT-4o Realtime API using your system prompt and session configurations.
Manage user sessions and track interactions seamlessly.
Handle User Interactions:
Implement event handlers for chat initiation, message reception, audio processing, and session termination.
Ensure that user inputs, whether text or voice, are appropriately processed and responded to in real-time.
Manage Conversation Flow:
Utilize the RealtimeConversation class to handle conversation states, manage audio streams, and maintain context.
Implement logic to handle interruptions, cancellations, and dynamic responses based on user actions.
Key Components:
Initialization:
Sets up the OpenAI Realtime Client with the system prompt and configures tools.
system_prompt = """Provide helpful and empathetic support responses to customer inquiries for ShopMe in Hindi language, addressing their requests, concerns, or feedback professionally.
Maintain a friendly and service-oriented tone throughout the interaction to ensure a positive customer experience.

# Steps
1. **Identify the Issue:** Carefully read the customer's inquiry to understand the problem or question they are presenting.
2. **Gather Relevant Information:** Check for any additional data needed, such as order numbers or account details, while ensuring the privacy and security of the customer's information.
3. **Formulate a Response:** Develop a solution or informative response based on the understanding of the issue. The response should be clear, concise, and address all parts of the customer's concern.
4. **Offer Further Assistance:** Invite the customer to reach out again if they need more help or have additional questions.
5. **Close Politely:** End the conversation with a polite closing statement that reinforces the service commitment of ShopMe.

# Output Format
Provide a clear and concise paragraph addressing the customer's inquiry, including:
- Acknowledgment of their concern
- Suggested solution or response
- Offer for further assistance
- Polite closing

# Notes
- Greet the user with "Welcome to ShopMe" the first time only.
- Always speak in Hindi.
- Ensure all customer data is handled according to relevant privacy and data protection laws and ShopMe's privacy policy.
- In cases of high sensitivity or complexity, escalate the issue to a human customer support agent.
- Keep responses within a reasonable length to ensure they are easy to read and understand."""
Event Handlers:
Manages chat start, message reception, audio streaming, and session termination events.
First, we instantiate and configure the realtime client discussed above:
async def setup_openai_realtime(system_prompt: str):
    """Instantiate and configure the OpenAI Realtime Client"""
    openai_realtime = RealtimeClient(system_prompt=system_prompt)
    cl.user_session.set("track_id", str(uuid4()))

    async def handle_conversation_updated(event):
        """Currently used to stream audio back to the client."""
        item = event.get("item")
        delta = event.get("delta")
        if delta:
            # Only one of the following will be populated for any given event
            if 'audio' in delta:
                audio = delta['audio']  # Int16Array, audio added
                await cl.context.emitter.send_audio_chunk(
                    cl.OutputAudioChunk(mimeType="pcm16", data=audio,
                                        track=cl.user_session.get("track_id")))
            if 'transcript' in delta:
                transcript = delta['transcript']  # string, transcript added
                pass
            if 'arguments' in delta:
                arguments = delta['arguments']  # string, function arguments added
                pass

    async def handle_item_completed(item):
        """Used to populate the chat context with transcription once an item is completed."""
        # print(item) # TODO
        pass

    async def handle_conversation_interrupt(event):
        """Used to cancel the client's previous audio playback."""
        cl.user_session.set("track_id", str(uuid4()))
        await cl.context.emitter.send_audio_interrupt()

    async def handle_error(event):
        logger.error(event)
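    # (Sketch) Wire the handlers to the events RealtimeClient dispatches, and
    # keep the client on the user session so other callbacks can reach it;
    # the "error" event name is an assumption.
    openai_realtime.on("conversation.updated", handle_conversation_updated)
    openai_realtime.on("conversation.item.completed", handle_item_completed)
    openai_realtime.on("conversation.interrupted", handle_conversation_interrupt)
    openai_realtime.on("error", handle_error)
    cl.user_session.set("openai_realtime", openai_realtime)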
Session Management:
Maintains user sessions, handles conversation interruptions, and ensures a smooth interaction flow. As the code below shows, whenever you receive an audio chunk, you pass it to the realtime client.
if openai_realtime:
    if openai_realtime.is_connected():
        await openai_realtime.append_input_audio(chunk.data)
    else:
        logger.info("RealtimeClient is not connected")
Reference: app.py
Testing and Deployment
Once your voice bot is built, thorough testing is essential to ensure reliability and user satisfaction.
Local Testing:
Use the AI Studio Real-time audio playground to interact with your deployed model.
Test various functionalities, including speech recognition, response generation, and tool execution.
Integration Testing:
Ensure that your application seamlessly communicates with the Realtime API.
Test the event handlers and tool integrations to verify correct behavior under different scenarios.
Deployment:
Deploy your application to a production environment, leveraging cloud services for scalability.
Monitor performance and make adjustments as needed to handle real-world usage.
Conclusion
Building a real-time voice bot has never been more accessible, thanks to the GPT-4o Realtime API. By consolidating speech-to-speech functionalities into a single, efficient interface, developers can craft engaging and natural conversational experiences without the complexity of managing multiple models. Whether you’re enhancing customer support, developing educational tools, or creating interactive applications, the GPT-4o Realtime API provides a robust foundation to bring your voice bot visions to life.
Embark on your development journey today and explore the endless possibilities that real-time voice interactions can offer your users!
Feel free to refer to the Azure OpenAI GPT-4o Realtime API documentation for more detailed information on setup, deployment, and advanced configurations.
Thanks
Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/
show a due date using data from multiple columns
I have a spreadsheet in which I have multiple date values: a client’s birthday, follow-up visit date 1, date 2, date 3. For each date column I have conditional formatting set up to mark the cell yellow if it’s within a month from today, and red if it’s within the next week. However, I’d like to add a column to the very beginning of the spreadsheet that will simply alert me if any relevant date for that client is coming up soon (i.e. the birthday is within a month or their follow-up visit 2 is within a month). I’ve been trying to use IF with AND/OR formulas, but either it’s not functioning properly or it tells me there’s an error (examples of things I’ve tried below).
I want it to show “Due” if a date from any of those columns is within the next 30 days and “Updated” if not (but don’t want it to show Due if the dates have passed).
=IF(OR(30>M16-TODAY()>0,30>N16-TODAY()>0,30>T16-TODAY()>0,30>V16-TODAY()>0,30>X16-TODAY()>0,30>Z16-TODAY()>0),"Due","Up to Date")
=IF(AND(OR(M11-TODAY()>0,N11-TODAY()>0,T11-TODAY()>0,V11-TODAY()>0,X11-TODAY()>0,Z11-TODAY()>0), OR(M11-TODAY()<30,N11-TODAY()<30,T11-TODAY()<30,V11-TODAY()<30,X11-TODAY()<30,Z11-TODAY()<30)),"Due","Up to Date")
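A note on the formulas: Excel doesn’t support chained comparisons such as 30>M16-TODAY()>0 (the first comparison returns TRUE or FALSE, which is then compared with 0), so each date needs its own AND(). One possible shape, shown here for just two of the columns and extendable to the rest (a sketch):
=IF(OR(AND(M16>TODAY(),M16<=TODAY()+30),AND(N16>TODAY(),N16<=TODAY()+30)),"Due","Up to Date")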
Unlocking the Full Potential of Azure OpenAI’s GPT-O1 Model: A Guide to Maximizing Its Capabilities
Voice technology is transforming how we interact with machines, making conversations with AI feel more natural than ever before. With the public beta release of the Realtime API powered by GPT-4o, developers now have the tools to create low-latency, multimodal voice experiences in their apps, opening up endless possibilities for innovation.
Gone are the days when building a voice bot required stitching together multiple models for transcription, inference, and text-to-speech conversion. With the Realtime API, developers can now streamline the entire process with a single API call, enabling fluid, natural speech-to-speech conversations. This is a game-changer for industries like customer support, education, and real-time language translation, where fast, seamless interactions are crucial.
In this blog, we’ll guide you through the process of building your first real-time voice bot from scratch using the GPT-4o Realtime Model. We’ll cover key features of the Realtime API, how to set up a WebSocket connection for voice streaming, and how to leverage the API’s ability to handle interruptions and make function calls. By the end, you’ll be ready to create a voice bot that responds to users with near-human accuracy and emotion. Whether you’re a beginner or an experienced developer, this blueprint will help you get started with creating immersive voice interactions that are both responsive and engaging. Ready to dive in? Let’s get started!
Key Features
Low-Latency Streaming: Enables real-time audio input and output, facilitating natural and seamless conversations.
Multimodal Support: Handles both text and audio inputs and outputs, allowing for versatile interaction modes.
Preset Voices: Supports six predefined voices, ensuring quality and consistency in responses.
Function Calling: Allows the voice assistant to perform actions or retrieve context-specific information dynamically.
Safety and Privacy: Incorporates multiple layers of safety protections, including automated monitoring and adherence to privacy policies.
How GPT-4o Realtime API Works
Traditionally, building a voice assistant required chaining together several models: an automatic speech recognition (ASR) model like Whisper for transcribing audio, a text-based model for processing responses, and a text-to-speech (TTS) model for generating audio outputs. This multi-step process often led to delays and a loss of emotional nuance.
The GPT-4o Realtime API revolutionizes this by consolidating these functionalities into a single API call. By establishing a persistent WebSocket connection, developers can stream audio inputs and outputs directly, significantly reducing latency and enhancing the naturalness of conversations. Additionally, the API’s function calling capability allows the voice bot to perform actions such as placing orders or retrieving customer information on the fly.
Building Your Real-Time Voice Bot
Let’s dive into the step-by-step process of building your own real-time voice bot using the GPT-4o Realtime API.
Prerequisites
Before you begin, ensure you have the following:
Azure Subscription: Create one for free.
Azure OpenAI Resource: Set up in a supported region (East US 2 or Sweden Central).
Development Environment: Familiarity with Python and basic asynchronous programming.
Client Libraries: Tools like LiveKit, Agora, or Twilio can enhance your bot’s capabilities.
Setting Up the API
Deploy the GPT-4o Realtime Model:
Navigate to the Azure AI Studio.
Access the Model Catalog and search for gpt-4o-realtime-preview.
Deploy the model by selecting your Azure OpenAI resource and configuring the deployment settings.
Configure Audio Input and Output:
The API supports various audio formats, primarily pcm16.
Set up your client to handle audio streaming, ensuring compatibility with the API’s requirements.
This project demonstrates how to build a sophisticated real-time conversational AI system using Azure OpenAI. By leveraging WebSocket connections and an event-driven architecture, the system provides responsive and context-aware customer support in any language. This approach can be adapted to various languages and use cases, making it a versatile solution for businesses looking to enhance their customer service capabilities. The project consists of three main components:
Realtime API: Handles WebSocket connections to Azure OpenAI’s real-time API.
Tools: Defines various customer support functions like checking order status, processing returns, and more.
Application: Manages the interaction flow and integrates the real-time client with UI Layer.
Environment Setup
Create an .env file and update the following environment variables:
AZURE_OPENAI_API_KEY=XXXX
# replace with your Azure OpenAI API Key
AZURE_OPENAI_ENDPOINT=https://xxxx.openai.azure.com/
# replace with your Azure OpenAI Endpoint
AZURE_OPENAI_DEPLOYMENT=gpt-4o-realtime-preview
#Create a deployment for the gpt-4o-realtime-preview model and place the deployment name here. You can name the deployment as per your choice and put the name here.
AZURE_OPENAI_CHAT_DEPLOYMENT_VERSION=2024-10-01-preview
#You don’t need to change this unless you are willing to try other versions.
requirements.txt
chainlit==1.3.0rc1
openai
beautifulsoup4
lxml
python-dotenv
websockets
aiohttp
Implementing the Realtime Client
The heartbeat of your voice bot is the Realtime Client, which manages the WebSocket connection and handles communication with the GPT-4o Realtime API. The RealtimeAPI class is responsible for managing WebSocket connections to OpenAI’s real-time API. It handles sending and receiving messages, dispatching events, and maintaining the connection state.
Key Components:
RealtimeAPI Class:
Establishes and maintains the WebSocket connection.
Handles sending and receiving messages.
Manages event dispatching for various conversation events.
class RealtimeAPI(RealtimeEventHandler):
def __init__(self):
super().__init__()
self.default_url = ‘wss://api.openai.com/v1/realtime’
self.url = os.environ[“AZURE_OPENAI_ENDPOINT”]
self.api_key = os.environ[“AZURE_OPENAI_API_KEY”]
self.api_version = “2024-10-01-preview”
self.azure_deployment = os.environ[“AZURE_OPENAI_DEPLOYMENT”]
self.ws = None
def is_connected(self):
return self.ws is not None
def log(self, *args):
logger.debug(f”[Websocket/{datetime.utcnow().isoformat()}]”, *args)
async def connect(self, model=’gpt-4o-realtime-preview’):
if self.is_connected():
raise Exception(“Already connected”)
self.ws = await websockets.connect(f”{self.url}/openai/realtime?api-version={self.api_version}&deployment={model}&api-key={self.api_key}”, extra_headers={
‘Authorization’: f’Bearer {self.api_key}’,
‘OpenAI-Beta’: ‘realtime=v1’
})
self.log(f”Connected to {self.url}”)
asyncio.create_task(self._receive_messages())
async def _receive_messages(self):
async for message in self.ws:
event = json.loads(message)
if event[‘type’] == “error”:
logger.error(“ERROR”, message)
self.log(“received:”, event)
self.dispatch(f”server.{event[‘type’]}”, event)
self.dispatch(“server.*”, event)
async def send(self, event_name, data=None):
if not self.is_connected():
raise Exception(“RealtimeAPI is not connected”)
data = data or {}
if not isinstance(data, dict):
raise Exception(“data must be a dictionary”)
event = {
“event_id”: self._generate_id(“evt_”),
“type”: event_name,
**data
}
self.dispatch(f”client.{event_name}”, event)
self.dispatch(“client.*”, event)
self.log(“sent:”, event)
await self.ws.send(json.dumps(event))
def _generate_id(self, prefix):
return f”{prefix}{int(datetime.utcnow().timestamp() * 1000)}”
async def disconnect(self):
if self.ws:
await self.ws.close()
self.ws = None
self.log(f”Disconnected from {self.url}”)
Reference: init.py
RealtimeConversation Class:
Manages the state of the conversation.
Processes different types of events, such as message creation, transcription completion, and audio streaming.
Queues and formats audio and text data for seamless interaction.
class RealtimeConversation:
    default_frequency = config.features.audio.sample_rate

    EventProcessors = {
        'conversation.item.created': lambda self, event: self._process_item_created(event),
        'conversation.item.truncated': lambda self, event: self._process_item_truncated(event),
        'conversation.item.deleted': lambda self, event: self._process_item_deleted(event),
        'conversation.item.input_audio_transcription.completed': lambda self, event: self._process_input_audio_transcription_completed(event),
        'input_audio_buffer.speech_started': lambda self, event: self._process_speech_started(event),
        'input_audio_buffer.speech_stopped': lambda self, event, input_audio_buffer: self._process_speech_stopped(event, input_audio_buffer),
        'response.created': lambda self, event: self._process_response_created(event),
        'response.output_item.added': lambda self, event: self._process_output_item_added(event),
        'response.output_item.done': lambda self, event: self._process_output_item_done(event),
        'response.content_part.added': lambda self, event: self._process_content_part_added(event),
        'response.audio_transcript.delta': lambda self, event: self._process_audio_transcript_delta(event),
        'response.audio.delta': lambda self, event: self._process_audio_delta(event),
        'response.text.delta': lambda self, event: self._process_text_delta(event),
        'response.function_call_arguments.delta': lambda self, event: self._process_function_call_arguments_delta(event),
    }

    def __init__(self):
        self.clear()

    def clear(self):
        self.item_lookup = {}
        self.items = []
        self.response_lookup = {}
        self.responses = []
        self.queued_speech_items = {}
        self.queued_transcript_items = {}
        self.queued_input_audio = None

    def queue_input_audio(self, input_audio):
        self.queued_input_audio = input_audio

    def process_event(self, event, *args):
        event_processor = self.EventProcessors.get(event['type'])
        if not event_processor:
            raise Exception(f"Missing conversation event processor for {event['type']}")
        return event_processor(self, event, *args)

    def get_item(self, id):
        return self.item_lookup.get(id)

    def get_items(self):
        return self.items[:]

    def _process_item_created(self, event):
        item = event['item']
        new_item = item.copy()
        if new_item['id'] not in self.item_lookup:
            self.item_lookup[new_item['id']] = new_item
            self.items.append(new_item)
        new_item['formatted'] = {
            'audio': [],
            'text': '',
            'transcript': ''
        }
        if new_item['id'] in self.queued_speech_items:
            new_item['formatted']['audio'] = self.queued_speech_items[new_item['id']]['audio']
            del self.queued_speech_items[new_item['id']]
        if 'content' in new_item:
            text_content = [c for c in new_item['content'] if c['type'] in ['text', 'input_text']]
            for content in text_content:
                new_item['formatted']['text'] += content['text']
        if new_item['id'] in self.queued_transcript_items:
            new_item['formatted']['transcript'] = self.queued_transcript_items[new_item['id']]['transcript']
            del self.queued_transcript_items[new_item['id']]
        if new_item['type'] == 'message':
            if new_item['role'] == 'user':
                new_item['status'] = 'completed'
                if self.queued_input_audio:
                    new_item['formatted']['audio'] = self.queued_input_audio
                    self.queued_input_audio = None
            else:
                new_item['status'] = 'in_progress'
        elif new_item['type'] == 'function_call':
            new_item['formatted']['tool'] = {
                'type': 'function',
                'name': new_item['name'],
                'call_id': new_item['call_id'],
                'arguments': ''
            }
            new_item['status'] = 'in_progress'
        elif new_item['type'] == 'function_call_output':
            new_item['status'] = 'completed'
            new_item['formatted']['output'] = new_item['output']
        return new_item, None

    def _process_item_truncated(self, event):
        item_id = event['item_id']
        audio_end_ms = event['audio_end_ms']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'item.truncated: Item "{item_id}" not found')
        end_index = (audio_end_ms * self.default_frequency) // 1000
        item['formatted']['transcript'] = ''
        item['formatted']['audio'] = item['formatted']['audio'][:end_index]
        return item, None

    def _process_item_deleted(self, event):
        item_id = event['item_id']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'item.deleted: Item "{item_id}" not found')
        del self.item_lookup[item['id']]
        self.items.remove(item)
        return item, None

    def _process_input_audio_transcription_completed(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        transcript = event['transcript']
        formatted_transcript = transcript or ' '
        item = self.item_lookup.get(item_id)
        if not item:
            self.queued_transcript_items[item_id] = {'transcript': formatted_transcript}
            return None, None
        item['content'][content_index]['transcript'] = transcript
        item['formatted']['transcript'] = formatted_transcript
        return item, {'transcript': transcript}

    def _process_speech_started(self, event):
        item_id = event['item_id']
        audio_start_ms = event['audio_start_ms']
        self.queued_speech_items[item_id] = {'audio_start_ms': audio_start_ms}
        return None, None

    def _process_speech_stopped(self, event, input_audio_buffer):
        item_id = event['item_id']
        audio_end_ms = event['audio_end_ms']
        speech = self.queued_speech_items[item_id]
        speech['audio_end_ms'] = audio_end_ms
        if input_audio_buffer:
            start_index = (speech['audio_start_ms'] * self.default_frequency) // 1000
            end_index = (speech['audio_end_ms'] * self.default_frequency) // 1000
            speech['audio'] = input_audio_buffer[start_index:end_index]
        return None, None

    def _process_response_created(self, event):
        response = event['response']
        if response['id'] not in self.response_lookup:
            self.response_lookup[response['id']] = response
            self.responses.append(response)
        return None, None

    def _process_output_item_added(self, event):
        response_id = event['response_id']
        item = event['item']
        response = self.response_lookup.get(response_id)
        if not response:
            raise Exception(f'response.output_item.added: Response "{response_id}" not found')
        response['output'].append(item['id'])
        return None, None

    def _process_output_item_done(self, event):
        item = event['item']
        if not item:
            raise Exception('response.output_item.done: Missing "item"')
        found_item = self.item_lookup.get(item['id'])
        if not found_item:
            raise Exception(f'response.output_item.done: Item "{item["id"]}" not found')
        found_item['status'] = item['status']
        return found_item, None

    def _process_content_part_added(self, event):
        item_id = event['item_id']
        part = event['part']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.content_part.added: Item "{item_id}" not found')
        item['content'].append(part)
        return item, None

    def _process_audio_transcript_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.audio_transcript.delta: Item "{item_id}" not found')
        item['content'][content_index]['transcript'] += delta
        item['formatted']['transcript'] += delta
        return item, {'transcript': delta}

    def _process_audio_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            logger.debug(f'response.audio.delta: Item "{item_id}" not found')
            return None, None
        array_buffer = base64_to_array_buffer(delta)
        append_values = array_buffer.tobytes()
        # TODO: make it work
        # item['formatted']['audio'] = merge_int16_arrays(item['formatted']['audio'], append_values)
        return item, {'audio': append_values}

    def _process_text_delta(self, event):
        item_id = event['item_id']
        content_index = event['content_index']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.text.delta: Item "{item_id}" not found')
        item['content'][content_index]['text'] += delta
        item['formatted']['text'] += delta
        return item, {'text': delta}

    def _process_function_call_arguments_delta(self, event):
        item_id = event['item_id']
        delta = event['delta']
        item = self.item_lookup.get(item_id)
        if not item:
            raise Exception(f'response.function_call_arguments.delta: Item "{item_id}" not found')
        item['arguments'] += delta
        item['formatted']['tool']['arguments'] += delta
        return item, {'arguments': delta}
RealtimeClient Class:
Initialization: Sets up system prompts, session configurations, and initializes RealtimeAPI and RealtimeConversation for managing WebSocket connections and conversation events.
Connection Management: Handles connecting and disconnecting from the server, waiting for session creation, and updating session settings.
Event Handling: Listens for and processes server and client events, dispatching them to appropriate handlers.
Conversation Management: Manages creation, updating, and deletion of conversation items, including handling input audio and speech events.
Tool and Response Management: Supports adding/removing tools, invoking them based on events, sending user messages, creating responses, and managing audio content.
class RealtimeClient(RealtimeEventHandler):
    def __init__(self, system_prompt: str):
        super().__init__()
        self.system_prompt = system_prompt
        self.default_session_config = {
            "modalities": ["text", "audio"],
            "instructions": self.system_prompt,
            "voice": "shimmer",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "whisper-1"},
            "turn_detection": {"type": "server_vad"},
            "tools": [],
            "tool_choice": "auto",
            "temperature": 0.8,
            "max_response_output_tokens": 4096,
        }
        self.session_config = {}
        self.transcription_models = [{"model": "whisper-1"}]
        self.default_server_vad_config = {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200,
        }
        self.realtime = RealtimeAPI()
        self.conversation = RealtimeConversation()
        self._reset_config()
        self._add_api_event_handlers()

    def _reset_config(self):
        self.session_created = False
        self.tools = {}
        self.session_config = self.default_session_config.copy()
        self.input_audio_buffer = bytearray()
        return True

    def _add_api_event_handlers(self):
        self.realtime.on("client.*", self._log_event)
        self.realtime.on("server.*", self._log_event)
        self.realtime.on("server.session.created", self._on_session_created)
        self.realtime.on("server.response.created", self._process_event)
        self.realtime.on("server.response.output_item.added", self._process_event)
        self.realtime.on("server.response.content_part.added", self._process_event)
        self.realtime.on("server.input_audio_buffer.speech_started", self._on_speech_started)
        self.realtime.on("server.input_audio_buffer.speech_stopped", self._on_speech_stopped)
        self.realtime.on("server.conversation.item.created", self._on_item_created)
        self.realtime.on("server.conversation.item.truncated", self._process_event)
        self.realtime.on("server.conversation.item.deleted", self._process_event)
        self.realtime.on("server.conversation.item.input_audio_transcription.completed", self._process_event)
        self.realtime.on("server.response.audio_transcript.delta", self._process_event)
        self.realtime.on("server.response.audio.delta", self._process_event)
        self.realtime.on("server.response.text.delta", self._process_event)
        self.realtime.on("server.response.function_call_arguments.delta", self._process_event)
        self.realtime.on("server.response.output_item.done", self._on_output_item_done)

    def _log_event(self, event):
        realtime_event = {
            "time": datetime.utcnow().isoformat(),
            "source": "client" if event["type"].startswith("client.") else "server",
            "event": event,
        }
        self.dispatch("realtime.event", realtime_event)

    def _on_session_created(self, event):
        self.session_created = True

    def _process_event(self, event, *args):
        item, delta = self.conversation.process_event(event, *args)
        if item:
            self.dispatch("conversation.updated", {"item": item, "delta": delta})
        return item, delta

    def _on_speech_started(self, event):
        self._process_event(event)
        self.dispatch("conversation.interrupted", event)

    def _on_speech_stopped(self, event):
        self._process_event(event, self.input_audio_buffer)

    def _on_item_created(self, event):
        item, delta = self._process_event(event)
        self.dispatch("conversation.item.appended", {"item": item})
        if item and item["status"] == "completed":
            self.dispatch("conversation.item.completed", {"item": item})

    async def _on_output_item_done(self, event):
        item, delta = self._process_event(event)
        if item and item["status"] == "completed":
            self.dispatch("conversation.item.completed", {"item": item})
        if item and item.get("formatted", {}).get("tool"):
            await self._call_tool(item["formatted"]["tool"])

    async def _call_tool(self, tool):
        try:
            print(tool["arguments"])  # debug: raw argument string produced by the model
            json_arguments = json.loads(tool["arguments"])
            tool_config = self.tools.get(tool["name"])
            if not tool_config:
                raise Exception(f'Tool "{tool["name"]}" has not been added')
            result = await tool_config["handler"](**json_arguments)
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "function_call_output",
                    "call_id": tool["call_id"],
                    "output": json.dumps(result),
                }
            })
        except Exception as e:
            logger.error(traceback.format_exc())
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "function_call_output",
                    "call_id": tool["call_id"],
                    "output": json.dumps({"error": str(e)}),
                }
            })
        await self.create_response()

    def is_connected(self):
        return self.realtime.is_connected()

    async def reset(self):
        # disconnect() is a coroutine, so it must be awaited
        await self.disconnect()
        self.realtime.clear_event_handlers()
        self._reset_config()
        self._add_api_event_handlers()
        return True

    async def connect(self):
        if self.is_connected():
            raise Exception("Already connected, use .disconnect() first")
        await self.realtime.connect()
        await self.update_session()
        return True

    async def wait_for_session_created(self):
        if not self.is_connected():
            raise Exception("Not connected, use .connect() first")
        while not self.session_created:
            await asyncio.sleep(0.001)
        return True

    async def disconnect(self):
        self.session_created = False
        self.conversation.clear()
        if self.realtime.is_connected():
            await self.realtime.disconnect()

    def get_turn_detection_type(self):
        return self.session_config.get("turn_detection", {}).get("type")

    async def add_tool(self, definition, handler):
        if not definition.get("name"):
            raise Exception("Missing tool name in definition")
        name = definition["name"]
        if name in self.tools:
            raise Exception(f'Tool "{name}" already added. Please use .remove_tool("{name}") before trying to add it again.')
        if not callable(handler):
            raise Exception(f'Tool "{name}" handler must be a function')
        self.tools[name] = {"definition": definition, "handler": handler}
        await self.update_session()
        return self.tools[name]

    def remove_tool(self, name):
        if name not in self.tools:
            raise Exception(f'Tool "{name}" does not exist, so it cannot be removed.')
        del self.tools[name]
        return True

    async def delete_item(self, id):
        await self.realtime.send("conversation.item.delete", {"item_id": id})
        return True

    async def update_session(self, **kwargs):
        self.session_config.update(kwargs)
        use_tools = [
            {**tool_definition, "type": "function"}
            for tool_definition in self.session_config.get("tools", [])
        ] + [
            {**self.tools[key]["definition"], "type": "function"}
            for key in self.tools
        ]
        session = {**self.session_config, "tools": use_tools}
        if self.realtime.is_connected():
            await self.realtime.send("session.update", {"session": session})
        return True

    async def create_conversation_item(self, item):
        await self.realtime.send("conversation.item.create", {
            "item": item
        })

    async def send_user_message_content(self, content=None):
        content = content or []
        if content:
            for c in content:
                if c["type"] == "input_audio":
                    if isinstance(c["audio"], (bytes, bytearray)):
                        c["audio"] = array_buffer_to_base64(c["audio"])
            await self.realtime.send("conversation.item.create", {
                "item": {
                    "type": "message",
                    "role": "user",
                    "content": content,
                }
            })
        await self.create_response()
        return True

    async def append_input_audio(self, array_buffer):
        if len(array_buffer) > 0:
            await self.realtime.send("input_audio_buffer.append", {
                "audio": array_buffer_to_base64(np.array(array_buffer)),
            })
            self.input_audio_buffer.extend(array_buffer)
        return True

    async def create_response(self):
        if self.get_turn_detection_type() is None and len(self.input_audio_buffer) > 0:
            await self.realtime.send("input_audio_buffer.commit")
            self.conversation.queue_input_audio(self.input_audio_buffer)
            self.input_audio_buffer = bytearray()
        await self.realtime.send("response.create")
        return True

    async def cancel_response(self, id=None, sample_count=0):
        if not id:
            await self.realtime.send("response.cancel")
            return {"item": None}
        else:
            item = self.conversation.get_item(id)
            if not item:
                raise Exception(f'Could not find item "{id}"')
            if item["type"] != "message":
                raise Exception('Can only cancel response messages with type "message"')
            if item["role"] != "assistant":
                raise Exception('Can only cancel response messages with role "assistant"')
            await self.realtime.send("response.cancel")
            audio_index = next((i for i, c in enumerate(item["content"]) if c["type"] == "audio"), -1)
            if audio_index == -1:
                raise Exception("Could not find audio on item to cancel")
            await self.realtime.send("conversation.item.truncate", {
                "item_id": id,
                "content_index": audio_index,
                "audio_end_ms": int((sample_count / self.conversation.default_frequency) * 1000),
            })
            return {"item": item}

    async def wait_for_next_item(self):
        event = await self.wait_for_next("conversation.item.appended")
        return {"item": event["item"]}

    async def wait_for_next_completed_item(self):
        event = await self.wait_for_next("conversation.item.completed")
        return {"item": event["item"]}
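Before wiring in tools, it can help to smoke-test the client end to end. The snippet below is a minimal sketch of such a test; the prompt text and the asyncio wrapper are illustrative, not from the repo:
import asyncio
from dotenv import load_dotenv

load_dotenv()  # make sure the .env values are available before connecting

async def smoke_test():
    client = RealtimeClient(system_prompt="You are a helpful support agent.")
    await client.connect()
    await client.wait_for_session_created()
    # Send a text-only user message and wait for the assistant's reply
    await client.send_user_message_content([{"type": "input_text", "text": "Hello!"}])
    # The user's own message completes first; keep waiting until the assistant item completes
    while True:
        result = await client.wait_for_next_completed_item()
        if result["item"].get("role") == "assistant":
            break
    print(result["item"]["formatted"]["transcript"])
    await client.disconnect()

asyncio.run(smoke_test())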
Adding Tools and Handlers
Your voice bot’s functionality can be extended by integrating various tools and handlers. These allow the bot to perform specific actions based on user inputs.
Define Tool Definitions:
In tool.py, define the capabilities of your bot, such as checking order statuses, processing returns, or updating account information.
Each tool includes a name, description, and required parameters.
Implement Handlers:
Create asynchronous handler functions for each tool to execute the desired actions.
These handlers interact with your backend systems or databases to fulfill user requests.
Integrate Tools with the Realtime Client:
Register each tool and its handler with the RealtimeClient in your app.py file (see the registration sketch after the handler example below).
Ensure that the bot can invoke these tools dynamically during conversations.
Key Components:
Tool Definitions:
Structured descriptions of each tool, including the required parameters and functionalities.
Example:
# Function Definitions
check_order_status_def = {
    "name": "check_order_status",
    "description": "Check the status of a customer's order",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "The unique identifier for the customer"
            },
            "order_id": {
                "type": "string",
                "description": "The unique identifier for the order"
            }
        },
        "required": ["customer_id", "order_id"]
    }
}
Handler Functions:
Asynchronous functions that execute the logic for each tool.
Interact with external systems or databases, or perform specific actions based on user requests.
Example:
async def check_order_status_handler(customer_id, order_id):
    # Your business logic
    estimated_delivery, status, order_date = fetch_order_details(order_id, customer_id)
    # Read the HTML template
    with open('order_status_template.html', 'r') as file:
        html_content = file.read()
    # Replace placeholders with actual data
    html_content = html_content.format(
        order_id=order_id,
        customer_id=customer_id,
        order_date=order_date.strftime("%B %d, %Y"),
        estimated_delivery=estimated_delivery.strftime("%B %d, %Y"),
        status=status
    )
    # Send the Chainlit message with HTML content
    await cl.Message(content=f"Here is the detail of your order:\n{html_content}").send()
    return f"Order {order_id} status for customer {customer_id}: {status}"
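With a definition and handler pair in place, each tool is registered on the client via add_tool. Here is a sketch of how that registration might look in app.py (the register_tools helper and the tools list are assumed names, not from the repo):
# Register each (definition, handler) pair with the realtime client
tools = [
    (check_order_status_def, check_order_status_handler),
    # ...add further (definition, handler) pairs here
]

async def register_tools(openai_realtime):
    for definition, handler in tools:
        await openai_realtime.add_tool(definition, handler)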
Reference: tool.py
Integrating with Your Application
With the Realtime Client and tools in place, it’s time to weave everything into your application.
Initialize OpenAI Realtime:
In app.py, set up the connection to the GPT-4o Realtime API using your system prompt and session configurations.
Manage user sessions and track interactions seamlessly.
Handle User Interactions:
Implement event handlers for chat initiation, message reception, audio processing, and session termination.
Ensure that user inputs, whether text or voice, are appropriately processed and responded to in real-time.
Manage Conversation Flow:
Utilize the RealtimeConversation class to handle conversation states, manage audio streams, and maintain context.
Implement logic to handle interruptions, cancellations, and dynamic responses based on user actions.
Key Components:
Initialization:
Sets up the OpenAI Realtime Client with the system prompt and configures tools.
system_prompt = """Provide helpful and empathetic support responses to customer inquiries for ShopMe in the Hindi language, addressing their requests, concerns, or feedback professionally.
Maintain a friendly and service-oriented tone throughout the interaction to ensure a positive customer experience.
# Steps
1. **Identify the Issue:** Carefully read the customer's inquiry to understand the problem or question they are presenting.
2. **Gather Relevant Information:** Check for any additional data needed, such as order numbers or account details, while ensuring the privacy and security of the customer's information.
3. **Formulate a Response:** Develop a solution or informative response based on the understanding of the issue. The response should be clear, concise, and address all parts of the customer's concern.
4. **Offer Further Assistance:** Invite the customer to reach out again if they need more help or have additional questions.
5. **Close Politely:** End the conversation with a polite closing statement that reinforces the service commitment of ShopMe.
# Output Format
Provide a clear and concise paragraph addressing the customer's inquiry, including:
- Acknowledgment of their concern
- Suggested solution or response
- Offer for further assistance
- Polite closing
# Notes
- Greet the user with "Welcome to ShopMe" the first time only
- Always speak in Hindi
- Ensure all customer data is handled according to relevant privacy and data protection laws and ShopMe's privacy policy.
- In cases of high sensitivity or complexity, escalate the issue to a human customer support agent.
- Keep responses within a reasonable length to ensure they are easy to read and understand."""
Event Handlers:
Manage chat start, message reception, audio streaming, and session termination events.
First, we initialize the realtime client discussed above and register the Chainlit-side event handlers:
async def setup_openai_realtime(system_prompt: str):
    """Instantiate and configure the OpenAI Realtime Client"""
    openai_realtime = RealtimeClient(system_prompt=system_prompt)
    cl.user_session.set("track_id", str(uuid4()))

    async def handle_conversation_updated(event):
        """Currently used to stream audio back to the client."""
        item = event.get("item")
        delta = event.get("delta")
        if delta:
            # Only one of the following will be populated for any given event
            if 'audio' in delta:
                audio = delta['audio']  # Int16Array, audio added
                await cl.context.emitter.send_audio_chunk(
                    cl.OutputAudioChunk(mimeType="pcm16", data=audio, track=cl.user_session.get("track_id"))
                )
            if 'transcript' in delta:
                transcript = delta['transcript']  # string, transcript added
                pass
            if 'arguments' in delta:
                arguments = delta['arguments']  # string, function arguments added
                pass

    async def handle_item_completed(item):
        """Used to populate the chat context with transcription once an item is completed."""
        # print(item) # TODO
        pass

    async def handle_conversation_interrupt(event):
        """Used to cancel the client's previous audio playback."""
        cl.user_session.set("track_id", str(uuid4()))
        await cl.context.emitter.send_audio_interrupt()

    async def handle_error(event):
        logger.error(event)

    # Wire the handlers to client events and keep the client in the user session
    # (event names follow the full app.py reference)
    openai_realtime.on("conversation.updated", handle_conversation_updated)
    openai_realtime.on("conversation.item.completed", handle_item_completed)
    openai_realtime.on("conversation.interrupted", handle_conversation_interrupt)
    openai_realtime.on("error", handle_error)
    cl.user_session.set("openai_realtime", openai_realtime)
Session Management:
Maintains user sessions, handles conversation interruptions, and ensures a smooth interaction flow. As the code below shows, whenever you receive an audio chunk you pass it to the realtime client.
@cl.on_audio_chunk  # Chainlit hook that fires for each incoming audio chunk (wrapper as in the full app.py)
async def on_audio_chunk(chunk):
    openai_realtime = cl.user_session.get("openai_realtime")
    if openai_realtime:
        if openai_realtime.is_connected():
            await openai_realtime.append_input_audio(chunk.data)
        else:
            logger.info("RealtimeClient is not connected")
Reference: app.py
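For completeness, here is a sketch of how the pieces could be wired together when a chat starts. It assumes the setup_openai_realtime function above stores the client in the Chainlit user session, and reuses the hypothetical register_tools helper from the tools section:
@cl.on_chat_start
async def start():
    # Greet the user, then build and connect the realtime client for this session
    await cl.Message(content="Welcome to ShopMe! Press the microphone button to start talking.").send()
    await setup_openai_realtime(system_prompt=system_prompt)
    openai_realtime = cl.user_session.get("openai_realtime")
    await openai_realtime.connect()
    await openai_realtime.wait_for_session_created()
    await register_tools(openai_realtime)  # hypothetical helper from the tools section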
Testing and Deployment
Once your voice bot is built, thorough testing is essential to ensure reliability and user satisfaction.
Local Testing:
Use the AI Studio Real-time audio playground to interact with your deployed model.
Test various functionalities, including speech recognition, response generation, and tool execution.
Integration Testing:
Ensure that your application seamlessly communicates with the Realtime API.
Test the event handlers and tool integrations to verify correct behavior under different scenarios.
Deployment:
Deploy your application to a production environment, leveraging cloud services for scalability.
Monitor performance and make adjustments as needed to handle real-world usage.
Conclusion
Building a real-time voice bot has never been more accessible, thanks to the GPT-4o Realtime API. By consolidating speech-to-speech functionalities into a single, efficient interface, developers can craft engaging and natural conversational experiences without the complexity of managing multiple models. Whether you’re enhancing customer support, developing educational tools, or creating interactive applications, the GPT-4o Realtime API provides a robust foundation to bring your voice bot visions to life.
Embark on your development journey today and explore the endless possibilities that real-time voice interactions can offer your users!
Feel free to refer to the Azure OpenAI GPT-4o Realtime API documentation for more detailed information on setup, deployment, and advanced configurations.
Thanks
Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/
Microsoft Tech Community – Latest Blogs – Read More
Can’t change my Outlook password
Every time I try to type in a new password for my email, it just clears the box and says “This information is required.”
What can I do about this issue?
Read More
WiFi icon not showing.
On my Asus laptop, the WiFi icon is not showing in the taskbar. This happened once in the past and I fixed it, but this time I’m not able to, so I need help.
I’ve tried:
1) Network reset
2) Disabling/enabling the driver
3) Uninstalling the driver, then restarting and reinstalling it
Read More