Content filtering with Azure AI Studio
In alignment with Microsoft’s commitment to help customers use AI products responsibly, Azure OpenAI Service includes a content filtering system that works alongside core models. This system is powered by Azure AI Content Safety and it works by running both the prompt and completion through an ensemble of classification models designed to detect and prevent the output of harmful content.
This ensemble of models includes:
Multi-class classification models covering four categories of risks (hate, sexual, violence, and self-harm) across four severity levels (safe, low, medium, and high). They have been specifically trained and tested on the following languages: English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. The service can work in many other languages, but the quality might vary.
Optional binary classifiers for detecting jailbreak risk, existing text, and code in public repositories.
The default content filtering configuration is set to filter at the medium severity threshold for all four content harms categories for both prompts and completions. That means that content that is detected at severity level medium or high is filtered, while content detected at severity level low or safe is not filtered by the content filters. However, you can modify the content filters and configure the severity thresholds at resource level, according to your application needs.
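The threshold rule described above can be sketched in a few lines of Python. This is only an illustrative model of the behavior, not the service's actual implementation; the names `SEVERITY_ORDER` and `is_filtered` are hypothetical and chosen for this example.

```python
# Illustrative model of threshold-based filtering (not the service's code).
# Severity levels are ordered from least to most severe.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]


def is_filtered(detected: str, threshold: str = "medium") -> bool:
    """Return True when the detected severity is at or above the threshold."""
    return SEVERITY_ORDER.index(detected) >= SEVERITY_ORDER.index(threshold)


# With the default "medium" threshold, only medium and high are filtered.
for level in SEVERITY_ORDER:
    print(level, is_filtered(level))
```

Under this model, lowering the threshold to "low" makes `is_filtered("low", threshold="low")` return `True`, while raising it to "high" lets medium-severity content through.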
For Azure OpenAI models, only customers who have been approved for modified content filtering have full content filtering control, including configuring content filters at severity level high only or turning off content filters. Apply for modified content filters via this form: Azure OpenAI Limited Access Review: Modified Content Filters and Abuse Monitoring (microsoft.com). Models available through Models as a Service have content filtering enabled by default, and it can’t be configured.
In this blog post, we are going to see how content filtering works in Azure AI Studio and how you can configure it for your specific requirements. Below, you can also find a step-by-step video tutorial to achieve the same result.
Testing default content filtering in the Playground
To test the default content filtering system integrated in Azure OpenAI service, navigate to Azure AI Studio, go into your AI project and open the Chat Playground. If it’s your first time using Azure AI Studio, follow this step-by-step tutorial to set up your workspace and connect an Azure AI service resource to your hub.
To interact with a model in the Chat Playground, you also need an instance of an OpenAI model deployed in your AI project. You can learn how by going through this documentation.
For the sake of our example, we deployed a gpt-4 model instance.
Let’s test the model with a query containing explicit sexual content: “Is it possible to have sex on the Contoso TrekMasterCamping Chair?”. In response, we get a message that flags the input as inappropriate and then describes the proper use of the chair.
Let’s now test the same model with a different prompt, which contains sexual content, but uses slang rather than explicit terms: “Which sleeping bags are large enough to fit 2 people to do the deed?” In this case, the model provides a relevant response, without identifying the sexual content in the input message.
Also, please note that for the sake of these tests we are using a base model, so responses are not grounded in any specific product catalog. However, evaluating the groundedness of the model is out of the scope of this article.
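If you run the same tests through the API rather than the Playground, a filtered prompt typically comes back as an HTTP 400 error whose body names the categories that triggered the filter. The sketch below parses such a body. The payload shape (`error.code`, `innererror.content_filter_result`, and the per-category `filtered` and `severity` fields) is an assumption modeled on the Azure OpenAI error format, so check it against the response you actually receive; `filtered_categories` and the sample values are hypothetical.

```python
from typing import Any


def filtered_categories(error_body: dict[str, Any]) -> list[str]:
    """List the categories marked as filtered in a content-filter error body."""
    error = error_body.get("error", {})
    if error.get("code") != "content_filter":
        return []  # Not a content-filter error.
    results = error.get("innererror", {}).get("content_filter_result", {})
    return sorted(
        name
        for name, result in results.items()
        if isinstance(result, dict) and result.get("filtered")
    )


# Hypothetical sample body, assuming the shape described above.
sample = {
    "error": {
        "code": "content_filter",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "self_harm": {"filtered": False, "severity": "safe"},
                "sexual": {"filtered": True, "severity": "medium"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        },
    }
}

print(filtered_categories(sample))
```

Keeping the parsing separate from the API call makes it easy to log which category blocked a request without depending on a particular SDK version.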
Creating a content filter in Azure AI Studio
For specific scenarios, like retail customer care, we might want to add an extra layer of mitigation to identify inappropriate input content. To achieve this, we are first going to create a content filter that lowers the threshold for sexual content to the minimum value, so that sexual content detected at low severity or above gets blocked. Second, we are going to create a blocklist for specific words and phrases that we wish to block and that might not trigger the base filter because they do not contain explicit terms (e.g. slang).
To do so, in Azure AI Studio, we can navigate from the left navigation menu to the ‘Content filters’ tab and click on the ‘+ Create content filter’ button.
In the Create content filter window, fill in the requested parameters, as follows:
Name: sexual_content_filter
Connection: <your-openai-connection-name>
Next, set the threshold for the sexual filter to Low (the default is Medium) for both input and output.
In the Deployment section, select the deployment you want to apply the content filter to. Finally, review and create the filter.
Once your content filter is created and applied to your deployment, you can configure a blocklist of terms you want to filter in your application. This is enabled through the Blocklists feature, which you can configure by navigating to the Blocklists tab.
As with the content filter, you need to specify a name for the blocklist and the associated Azure OpenAI connection. You can then add the terms to be blocked manually, or add them in bulk by importing a CSV file. For example, in our case we are going to add the expression ‘do the deed’.
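To build intuition for what a blocklist adds on top of the severity classifiers, here is a minimal sketch of phrase matching against a list of terms. It assumes case-insensitive, whole-phrase matching, which may differ from how the service actually matches terms; `find_blocked_terms` is a hypothetical helper written for this example.

```python
import re


def find_blocked_terms(text: str, blocklist: list[str]) -> list[str]:
    """Return the blocklist terms that appear in text as whole phrases."""
    hits = []
    for term in blocklist:
        # \b anchors keep "deed" from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


blocklist = ["do the deed"]
prompt = "Which sleeping bags are large enough to fit 2 people to do the deed?"
print(find_blocked_terms(prompt, blocklist))  # prints ['do the deed']
```

A phrase match like this catches slang that carries no explicit terms, which is why the blocklist complements the lowered severity threshold rather than replacing it.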
Once you have your blocklist, you can then add it to the previously created content filter. Navigate back to the content filter configuration and enable the blocklist toggle for both input and output. Once enabled, select the blocklist from the drop-down list.
Finally, you can head back to the playground and test the same prompts again to check how the model’s behavior changed. This time, your prompts should trigger the content filter we just created.
And voilà, you have just configured a custom content filter for the model you are going to use for inference in your application.
Next steps
In this blog post, we explored the robust content filtering system, built on Azure AI Content Safety, which comes out of the box with Azure OpenAI Service models and can be configured to your application’s needs. We encourage you to check out our Moderate Content and Detect Harm with Azure AI Content Safety Studio self-paced workshop, which teaches you how to choose and build a content moderation system in Azure AI Content Safety Studio.
Keep learning about practical examples of RAI tools usage in the Azure ecosystem with this dedicated blog series – Responsible AI: from principles to practice.