Will Microsoft 365 Copilot Errors and Hallucinations Eventually Corrupt the Microsoft Graph?
Copilot Errors in AI-Generated Text Can Persist and Spread
When I discussed working with Copilot Pages last Wednesday, I noted how useful it is to capture output generated by Microsoft 365 Copilot in response to a prompt and store it in a Loop component. That’s the happy side of the equation. The dark side is that making AI-generated text so easy to capture also makes it easier for hallucinations and mistakes to sneak into the Microsoft Graph and become the source of further Copilot errors.
Take the example I showed in Figure 1 of that article, where Copilot’s response captured in a page includes an incorrect fact about compliance search purge actions. Copilot reports that a soft-delete action moves items into the Deleted Items folder (in reality, the items go into the Deletions folder in Recoverable Items). This isn’t a big problem because I recognized the issue immediately. The Copilot results cited two documents and two websites, but I couldn’t find the erroneous text in any of these locations, which implies that the knowledge came from the LLM.
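As an aside, you don’t have to take Copilot’s word (or mine) for where soft-deleted items end up. Here’s a minimal sketch, assuming you already have a Microsoft Graph access token with the Mail.Read permission (the token and mailbox address shown are placeholders), that lists the newest messages in the Deletions folder of Recoverable Items and in the Deleted Items folder using their well-known folder names. After a soft-delete purge, the purged item should appear in the former, not the latter.

```python
# Minimal sketch: check where a purged item actually lands via Microsoft Graph.
# Assumptions: the "requests" library is installed, ACCESS_TOKEN holds a valid
# token with Mail.Read permission, and MAILBOX is the mailbox to inspect.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<token with Mail.Read permission>"  # placeholder
MAILBOX = "user@contoso.com"                        # hypothetical mailbox

HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}


def newest_subjects(well_known_folder: str, top: int = 5) -> list[str]:
    """Return the subjects of the newest messages in a well-known mail folder."""
    url = f"{GRAPH}/users/{MAILBOX}/mailFolders/{well_known_folder}/messages"
    resp = requests.get(
        url,
        headers=HEADERS,
        params={
            "$select": "subject,receivedDateTime",
            "$orderby": "receivedDateTime desc",
            "$top": top,
        },
    )
    resp.raise_for_status()
    return [msg["subject"] for msg in resp.json().get("value", [])]


# A soft-deleted (purged) item should show up in Recoverable Items\Deletions...
print("Recoverable Items\\Deletions:", newest_subjects("recoverableitemsdeletions"))
# ...and not in the Deleted Items folder that Copilot pointed to.
print("Deleted Items:", newest_subjects("deleteditems"))
```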
Copilot Errors Can Persist
The text copied into the Copilot page included the error, which I caught and corrected there, so the content stored in the Loop component is accurate. But here’s the thing: when I went back to Microsoft 365 Business Chat (aka BizChat) and repeated the question with a different prompt asking Copilot to be explicit about what happens to soft-deleted items, the error appeared once again, even though Copilot now cites the page created for the previous query (Figure 1).
Figure 1: Copilot-generated text contains an error
At this point, there’s not much more I can do. I have checked the Graph and the other sources cited by Copilot and can’t find the error there. I’ve added a Copilot page with corrected information, only to see that page cited in a response that still contains the error. There’s no other route available to track down pesky Copilot errors. I guess this experience underlines once again that any text generated by an AI tool must be carefully checked and verified before it’s accepted.
AI-Generated Text Infects the Graph
But humans are humans. Some of us are very good at reviewing AI-generated text to correct any mistakes that might be present. Some of us are less diligent and might simply accept what Copilot generates as accurate and useful information. The problem arises when AI-generated material that includes errors is stored in files in SharePoint Online or OneDrive for Business. (I’m more worried about material stored in SharePoint Online because it is shared more broadly than the personal files held in OneDrive.)
When documents containing flawed AI-generated text infect the Graph, no one knows about the errors or where they originated. The polluted text becomes part of the corporate knowledge base, and the errors are available for Copilot to recycle again and again. In fact, as more documents containing the same errors are created over time, the errors take on the appearance of established fact because Copilot has more files to cite as sources. And if people don’t know that the text originated from Copilot, they’ll regard it as content written and checked by a human.
The Human Side
Humans make mistakes too. We try to eliminate errors as much as we can by asking co-workers to review text and check facts. Important documents might be reviewed several times to identify and resolve issues prior to publication. At least, that’s what should happen.
The content of documents ages and can become less reliable over time. The digital debris accumulated in SharePoint Online and OneDrive for Business over the years is just as likely to cajole Copilot into generating inaccurate or misleading content. Unless organizations manage old content, the quality of the results generated by Copilot is likely to degrade. To be fair to Microsoft, lots of work is happening in places like SharePoint Advanced Management to tackle aspects of the problem.
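Finding stale content doesn’t have to wait for new features either. As a simple illustration (a sketch rather than a recommended tool: it assumes a delegated Graph access token that can search files, such as one with Files.Read.All or Sites.Read.All, and the 2020 cut-off is arbitrary), a Microsoft Graph search query can surface documents that haven’t been modified in years so someone can decide whether to refresh, archive, or remove them before Copilot keeps citing them.

```python
# Minimal sketch: use the Microsoft Graph search API to find documents that
# haven't been modified since before 2020. Assumptions: the "requests" library
# is installed and ACCESS_TOKEN holds a delegated token that can search files
# (e.g., Files.Read.All or Sites.Read.All).
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<delegated token>"  # placeholder

payload = {
    "requests": [
        {
            "entityTypes": ["driveItem"],
            # KQL: files last modified before 1 January 2020 (arbitrary cut-off)
            "query": {"queryString": "LastModifiedTime<2020-01-01"},
            "from": 0,
            "size": 25,
        }
    ]
}

resp = requests.post(
    f"{GRAPH}/search/query",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=payload,
)
resp.raise_for_status()

# Print the modification date and URL of each stale document found.
for container in resp.json()["value"][0]["hitsContainers"]:
    for hit in container.get("hits", []):
        item = hit["resource"]
        print(item.get("lastModifiedDateTime"), item.get("webUrl"))
```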
Protecting the Graph
I hear a lot about managing the access Copilot has to content by restricting search or blocking off individual documents. By comparison, there is little discussion about how to ensure the quality of the information generated by users (with or without AI help) to prevent the pollution of the Microsoft Graph.
Perhaps we’re moving past the initial excitement about how AI could liberate users from mundane tasks into a period where we realize that AI must be controlled and mastered to extract maximum advantage. It’s hard to stop AI pollution from creeping into the Microsoft Graph, but this is a challenge that organizations should think about before the state of their Graph descends into chaos.