Use Auto-Label Policies to Protect Old Files from Copilot

Combining Auto-Label Policies, Trainable Classifiers, Sensitivity Labels, and DLP to Stop Copilot Accessing Old But Still Confidential Files

I’ve been on the TEC 2025 Roadshow in Europe this week. Monday was London, Tuesday Paris, and Dusseldorf is the final stop on Thursday. These trips sound like they should be great fun, but running events in three major cities over four days takes a brutal amount of effort.

In any case, my topic this week is protecting Microsoft 365 data in the era of AI. During the talk, I recommend that people use features like Restricted Content Discovery, sensitivity labels, and the (preview) DLP policy for Copilot to exert control over confidential and sensitive documents and to stop Copilot for Microsoft 365 and Copilot agents from accessing them.

Find and Protect Old Confidential Material

All of which led to a great question at the London event: “How do I apply sensitivity labels to thousands of old but still confidential files stored in multiple SharePoint Online sites?” It’s a good example of the kind of practical issue faced by tenant administrators during deployments.

The obvious answer is to use an auto-label policy to apply sensitivity labels that are then blocked by the DLP policy for Copilot. An auto-label policy can find Office documents at rest that don’t have sensitivity labels and apply a chosen label. Manually applied sensitivity labels are never overwritten, but a policy can overwrite a lower-priority sensitivity label.
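To make the overwrite rules concrete, here’s a minimal Python sketch of the decision described above. It’s an illustration only and not Purview code or a Purview API: the ExistingLabel class, the should_apply function, and the numeric priorities are hypothetical names invented for this example, and the sketch assumes that a larger number means a higher-priority label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExistingLabel:
    """Hypothetical model of the label already on a file (not a Purview type)."""
    name: str
    priority: int           # assumption: larger number = higher priority
    applied_manually: bool  # True if a person chose the label


def should_apply(policy_label_priority: int, existing: Optional[ExistingLabel]) -> bool:
    """Return True if the auto-label policy would label the file.

    Rules modelled from the text:
      1. An unlabelled file receives the policy's label.
      2. A manually applied label is never overwritten.
      3. Otherwise, only a lower-priority label is overwritten.
    """
    if existing is None:
        return True
    if existing.applied_manually:
        return False
    return existing.priority < policy_label_priority


# Example: the policy applies a "Confidential" label with priority 2.
CONFIDENTIAL = 2
print(should_apply(CONFIDENTIAL, None))                                        # True
print(should_apply(CONFIDENTIAL, ExistingLabel("General", 1, True)))           # False
print(should_apply(CONFIDENTIAL, ExistingLabel("General", 1, False)))          # True
print(should_apply(CONFIDENTIAL, ExistingLabel("Highly Confidential", 3, False)))  # False
```

The manual-label check comes first on purpose: a label chosen by a person survives even if its priority is lower than the label that the policy wants to apply.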

Trainable Classifiers

The issue is to identify the target set of confidential files. This is where a trainable classifier can help. Purview Data Lifecycle Management includes 75-odd built-in trainable classifiers that Microsoft has taught to find different types of documents like business plans and credit reports.

It might be possible to identify old confidential material using a built-in trainable classifier. If not, tenants can create custom trainable classifiers by using machine learning to process a training set of documents unique to the business. The process isn’t difficult, and the hardest part is often to find a suitable set of sample documents to train the classifier with. Running a simulation will quickly tell if machine learning can extract an accurate digital structure from sample documents to use as a classifier.

I have a couple of trainable classifiers in use to auto-label files. To test the process, I selected the default Source Code classifier (Figure 1). Behind the scenes, Purview looks for some matching documents to demonstrate how each of the built-in classifiers works. In this case, Purview had found several items in a projects site where I store files like drafts for blog posts. Some of the matching items had sensitivity labels, others did not. It was a good set to test the theory against.

Figure 1: Details of the matching items in a site found by the source code trainable classifier

Creating an Auto-Label Policy

The next step is to create an auto-label policy. Because we want to apply sensitivity labels, the policy is created in the Purview Information Protection solution. The policy settings are very straightforward: look for files matching the source code trainable classifier in all SharePoint Online sites and apply the Confidential sensitivity label. Figure 2 shows the rule created to find files that match the trainable classifier.

Figure 2: Adding the rule to look for the source code trainable classifier to an auto-label policy

You can choose to run an auto-label policy in simulation mode before making it active. Even though the trainable classifier shows some sample files that it found, it’s still a good idea to run the simulation, just to be sure. When you’re happy with the results, you can activate the policy to have Purview assign the chosen sensitivity label to the files found by the policy. Once the files are labelled, and provided the DLP policy for Copilot blocks that label, they’ll be invisible to Copilot for Microsoft 365.

Background Processing Runs Until the Job’s Done

Depending on how many old files need to be protected, the entire process to create a trainable classifier, tweak the classifier until it’s accurate, and run auto-labeling might take several weeks to complete. Most of the work happens in the background at a pace dictated by demands on the service. The auto-label policy will continue to run until you stop it, which you can do once all those old but still valuable files are labelled.

Learn how to exploit the data available to Microsoft 365 tenant administrators through the Office 365 for IT Pros eBook. We love figuring out how things work.
