Keeping your AI out of trouble
One thing is true for most AI applications – it’s easy to get yourself in trouble if you’re not careful. AI is all about probability, and the probability of a model being incorrect, or behaving unexpectedly on a new input, is practically never zero. In the classic chatbot days, this often meant getting an answer about something you weren’t asking about, or the good old “I did not understand” default answer we all “love” to see when we’re having an issue. But with generative AI, mistakes are much more nuanced, and may take the form of plain misinformation or, even worse, harmful content!
In this article, we’ll cover some guidelines you can adopt to minimize risk in AI apps. Each section consists of a set of actions you can take, followed by good and bad examples that illustrate their role in keeping your users – and you! – safe from unexpected AI behavior.
1. User interface guidelines
Starting with UI tips – these are simple changes to the way your end-users engage with your AI application that can go a long way in preventing misuse.
Guideline: Include disclaimer text
Description: In order to interact with the AI, end-users should acknowledge the rules and limitations of the tool. A good disclaimer should mention:
- The information provided may be generated by AI
- The information provided may be incorrect
- The user is responsible for verifying the correctness of the information against the sources provided
- Any additional industry-specific disclaimers
Reasons: Users expect the platforms you provide to show them correct information. The concept of a tool that can produce incorrect information is new and needs to be explicitly called out.

Guideline: Visually separate generated and retrieved content into sections
Description: Generated content is the output of the language model and, as such, can be incorrect. Retrieved content is extracted directly from trusted sources and can be expected to be correct, though possibly not relevant. This distinction should be clear to the end user. The generated content can be grounded on retrieved content, but you should always provide an original source the user can read directly. In addition, you may want to refrain from answering a question when no content was retrieved.
Reasons: Once you establish that some content must be verified by the user, you need to define a clear boundary between what needs verification and what can be trusted without doubt. Providing both pieces of information side by side makes it easy for the user to check the answer at a glance, without leaving the app. Having that separation in the application also allows you to override the generated content: even if the AI says something, you can choose not to display it through app logic when there are no sources to support it (see the sketch after this table).

Guideline: Add a feature to report issues and provide feedback
Description: Users should be able to provide feedback whenever they face issues or receive unexpected responses. If you decide to let users include chat history with their feedback, make sure to get confirmation that no personal or sensitive data was shared.
Reasons: Feedback forms provide a simple way for users to tell you whether the app is meeting expectations.

Guideline: Establish user accountability
Description: Inform the user that the content they submit may be subject to review when harmful content is detected.
Reasons: Holding users accountable for exploiting the tool may dissuade them from repeatedly attempting to do so.
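To make the “no sources, no answer” behavior concrete, here is a minimal sketch of the app-side logic. The retrieval and generation callables, field names, and disclaimer text are illustrative assumptions rather than a prescribed API:

```python
# Minimal sketch: only surface the generated answer when retrieval returned
# sources, and always hand the UI separate fields for generated and retrieved
# content plus the disclaimer. All names here are illustrative placeholders.
from typing import Callable

DISCLAIMER = "AI-generated content may be incorrect. Verify it against the sources shown."
FALLBACK = "I couldn't find supporting sources for that question, so I can't provide a reliable answer."

def answer_with_sources(
    question: str,
    search_sources: Callable[[str], list[str]],        # your retrieval step
    generate_answer: Callable[[str, list[str]], str],  # your generation step
) -> dict:
    sources = search_sources(question)
    if not sources:
        # Nothing retrieved: override the model and politely decline to answer.
        return {"generated": FALLBACK, "retrieved": [], "disclaimer": DISCLAIMER}

    generated = generate_answer(question, sources)
    # Keep generated and retrieved content in separate fields so the UI can
    # render them as clearly distinct sections.
    return {"generated": generated, "retrieved": sources, "disclaimer": DISCLAIMER}
```

Because the decision happens in app logic rather than in the prompt, it holds even when the model itself misbehaves.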
Good examples
Let’s start with the original ChatGPT interface – notice that all the elements are present:
- Disclaimer text at the bottom
- Per-message feedback option
- Clearly distinct Retrieval and Generation sections
- Terms and Conditions – though hidden under the question mark on the bottom right
All these elements are crucial to making the user aware of how things can go wrong and to setting the right expectations for how to use the tool.
Microsoft Copilot for M365 has its disclaimer and all links right below the logo. Straight to the point!
Don’t worry about writing a huge disclaimer that contains everything – you can link the full terms and keep a clean UI.
Bad examples
Common mistakes when setting up a UI include:
- Not having the required disclaimers, sources, or highlighting
- Overstating the chatbot’s usefulness – e.g. “can help with anything about [topic]”
While some of these safeguards may seem to understate the chatbot’s usefulness, they are indispensable for setting the right expectations given the inherent limitations of the technology.
2. System message guidelines
Next, we have system message guidelines. These are instructions that are not visible to the user, but guide the chatbot to answer questions with the right focus or style. Keep in mind that these can be somewhat overridden by user prompts, and as such only prevent accidental or simple misuse.
Guideline: Define a clear scope of what the chatbot should assist with
Description: The assistant should not attempt to help with all requests. Establish a clear boundary as to which conversations it should engage in. For all other topics, it should politely decline to engage.
Reasons: Failing to specify a scope will make the bot behave like a generic utility, such as out-of-the-box ChatGPT. Users may take advantage of that fact to misuse the application or API.

Guideline: Do not personify the chatbot
Description: The chatbot should present itself as a tool to help the user navigate content, rather than as a person. Behaving as an employee or extension of the company should also be avoided.
Reasons: When users misuse a personified chatbot, it can give the impression of manipulation or gullibility, rather than simple misuse.
Good example
“You are a search engine for Contoso Technology. Your role is to assist customers in locating the right information from publicly available sources like the website. Politely decline to engage in conversations about any topic outside of Contoso Technology”
Bad example
“You are Contoso’s AI Assistant. You are a highly skilled customer service agent that can help users of the website with all their questions.”
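As a minimal sketch of how a scoped system message can be wired in server-side (the endpoint, key, API version, and deployment name below are placeholders for your own Azure OpenAI resources), the prompt from the good example above is attached to every request, out of the user’s reach:

```python
# Sketch: attach the scoped system message server-side on every request.
# Endpoint, key, API version, and deployment name are placeholders.
import os
from openai import AzureOpenAI

SYSTEM_MESSAGE = (
    "You are a search engine for Contoso Technology. Your role is to assist "
    "customers in locating the right information from publicly available sources "
    "like the website. Politely decline to engage in conversations about any "
    "topic outside of Contoso Technology."
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def ask(user_question: str) -> str:
    response = client.chat.completions.create(
        model="contoso-search-gpt",  # your deployment name (placeholder)
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},  # never shown to the user
            {"role": "user", "content": user_question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```

Keep in mind, as noted above, that a determined user can still steer around the system message, so pair it with the evaluation and content safety measures below.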
3. Evaluation guidelines
Next, we have evaluation guidelines. These practices help you quantitatively measure the correctness of responses – and how easily the app can be manipulated into generating harmful content.
Guideline: Evaluate the chatbot’s accuracy, and other metrics for quality of information
Description: Define a set of “critical” questions your chatbot should be able to answer reliably. Regularly submit this dataset for inference and evaluate its accuracy either manually or automatically; prefer a combination of manual and automatic validation for best results (a minimal regression harness is sketched at the end of this section).
Reasons: As chatbots evolve to meet your customers’ expectations, it’s common to lose track of answers the bot supposedly already knows. Updating the prompt or data sources may negatively impact those responses, and these regressions need to be properly tracked.

Guideline: Evaluate the chatbot’s ability to avoid generating harmful content
Description: Define a set of “red-team” requests that attempt to break the chatbot, force it to generate harmful content, or make it leave its scope. As with accuracy, establish a regular re-submission of this dataset for inference.
Reasons: Unfortunately, chatbots can always be misused by an ill-intended user. Keep track of the most common “jailbreaking” patterns and test your bot’s behavior against them. Azure OpenAI comes with built-in content safety, but it’s not foolproof. Make sure you objectively measure harmful content generation.
Good examples
- Leveraging Azure AI Studio to evaluate Groundedness, Relevance, Coherence, Fluency and Similarity. More information can be found in the docs!
- Using Prompt Shields for Jailbreak and Harmful Content detection.
Bad examples
- Trying to capture exact matches when evaluating accuracy.
- Not considering evaluation as part of the release cycle.
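Tying these ideas together, here is a minimal sketch of a regression harness for a “critical questions” dataset. It assumes an ask() function like the one in the previous sketch, and the keyword-overlap grader is only a placeholder; in practice you would swap in an LLM-based or semantic-similarity evaluator such as the Azure AI Studio metrics mentioned above:

```python
# Sketch of a release-gate regression run over a "critical questions" dataset.
# The dataset path, grading heuristic, and threshold are illustrative assumptions.
import json
from typing import Callable

def load_dataset(path: str) -> list[dict]:
    # One JSON object per line: {"question": "...", "expected": "ground-truth answer"}
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def grade(answer: str, expected: str) -> float:
    # Placeholder grader based on keyword overlap -- avoid exact string matching,
    # since generated answers rarely repeat the expected text verbatim.
    expected_terms = set(expected.lower().split())
    found = sum(term in answer.lower() for term in expected_terms)
    return found / max(len(expected_terms), 1)

def run_regression(ask: Callable[[str], str], path: str, threshold: float = 0.7) -> None:
    failures = []
    for case in load_dataset(path):
        answer = ask(case["question"])
        score = grade(answer, case["expected"])
        if score < threshold:
            failures.append((case["question"], round(score, 2)))
    # Fail the release pipeline if any critical question regressed.
    assert not failures, f"Regressions detected: {failures}"
```

Running the same harness over your “red-team” dataset, with a harmful-content detector in place of the grader, covers the second guideline in the table above.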
4. Data privacy guidelines
Finally, we cover some data privacy guidelines. Data privacy is about how you receive, process, persist, and discard end-user information in your applications. Be aware that this is an overview and does not cover every aspect of data privacy, but it is a good place to start thinking about privacy concerns.
Guideline: Don’t audit all model inputs and outputs unless absolutely necessary
Description: There is typically no need to log all user interactions. Even when instructed not to, users may submit personal information, which is then at risk of exposure. Debugging and monitoring tools should focus on response status codes and token counts rather than actual text content (see the sketch after this table).
Reasons: Persisting messages often poses a more severe data privacy risk than simply not doing so. Microsoft only persists messages that are suspected of breaking terms and conditions; these may then be viewed by Microsoft for the sole purpose of evaluating improper use. Review with your Data Privacy team if you require this feature to be turned off.
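As a rough sketch of metadata-only telemetry (the logger setup and helper below are assumptions, not a specific Application Insights integration), you can record outcome, latency, and token counts around each call without ever touching the prompt or completion text:

```python
# Sketch: log call metadata (outcome, latency, token counts) but never the
# prompt or completion text. Logger name and deployment are placeholders;
# the log records could be forwarded to Azure Application Insights or similar.
import logging
import time

logger = logging.getLogger("chat_telemetry")

def ask_with_telemetry(client, deployment: str, messages: list[dict]) -> str:
    start = time.monotonic()
    try:
        response = client.chat.completions.create(model=deployment, messages=messages)
    except Exception as exc:
        # Log only the failure type -- no message content.
        logger.error("chat_completion_failed", extra={"error_type": type(exc).__name__})
        raise
    usage = response.usage
    logger.info(
        "chat_completion_succeeded",
        extra={
            "latency_ms": round((time.monotonic() - start) * 1000),
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
        },
    )
    return response.choices[0].message.content
```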
Good examples
- Capturing HTTP response codes and error messages for debugging.
- Logging token usage metrics to Azure Application Insights.
- Capturing user intent for continuous improvement.
- Expiring user conversation logs and metrics once they are no longer needed to provide the experience, as disclosed in your Privacy Statement.
Bad examples
- Capturing verbatim prompt / completion pairs.
- Persisting user information for longer than necessary.
- Failing to adhere to the Privacy Statement.
Wrap up
Remember, AI misuse will happen in your applications. Your objective is to safeguard your legitimate users so they know what the application can and cannot do, while giving ill-intended users an experience that feels less like a failed or fragile tool and more like a robust toolset being used incorrectly.
We hope this cheat sheet provides a good overview of the tools available in Azure to help bring safety and responsibility to the use of AI. Do you have other tips or tools to safeguard AI Applications? Let us know in the comments!