Making It Easier to Search and Curate Data Assets in Microsoft Purview.
1. Introduction.
Currently, IT infrastructure stores and maintains data assets, even though IT doesn’t own or use the data.
There’s a disconnect between how data needs to be discovered and maintained within the business, and the teams that maintain it.
Without standardized procedures for data governance, data handling often relies on manual processes, leading to inefficiencies, data loss, insufficient data protection and higher operational costs.
Microsoft Purview is designed to help enterprises get the most value from their existing information assets.
Its Data Catalog makes data sources easily discoverable and understandable by the users who manage the data.
With Purview, organizations can gain insights into data lineage, data usage, and data connections, helping them to comply with regulations such as GDPR, CCPA, and others.
Microsoft Purview provides a cloud-based service into which you can register data sources. During registration, the data remains in its existing location, but a copy of its metadata is added to Microsoft Purview, along with a reference to the data source location.
After you register a data source, you can scan it and enrich its metadata.
Discovering and understanding data sources and their use is the primary purpose of registering the sources.
In this article, we describe smart features that allow you to search previously scanned data assets using natural language queries, along with automated metadata enrichment for curating these assets.
2. Data Catalog.
The Data Catalog is a Data Governance solution that enables business experts and technical data owners to collaborate and contribute to a shared understanding of data.
Among other functionalities, in the Data Catalog you can use data search to discover data assets from multiple data sources, including Microsoft Fabric items and workspaces [Exploring the Relationship Between Microsoft Fabric and Microsoft Purview: What You Need to Know].
Microsoft Purview’s Smart Data Searching primarily works with scanned data assets. For unscanned data assets, manual classification and tagging can be done, but they may not fully leverage the capabilities of Smart Data Searching. For real-time or “Live View” data, you would typically need to scan the data source first to make it searchable in Microsoft Purview.
2.1 Smart Data Search.
Once the metadata is ingested into the Microsoft Purview Data Map, it can be searched using Microsoft Purview’s Smart Data Searching in the Data Catalog.
You can now describe the data assets you are looking for in natural language when searching the Microsoft Purview Data Catalog.
Go to the new Microsoft Purview Portal: https://purview.microsoft.com
Select the Data Catalog solution, and then select Data Search.
Once you enter your search, Microsoft Purview returns a list of data assets and glossary terms that match the entered keywords, provided the user has data reader permissions for them.
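The permission rule described above can be pictured with a small in-memory sketch. This is not Microsoft Purview's implementation or API: the `Asset` and `User` classes, the `search` function, and the sample catalog are all hypothetical, and the matching is a plain substring check rather than natural language search. It only illustrates that a result must both match the query and sit in a collection where the user holds data reader permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    # Hypothetical, simplified stand-in for a scanned data asset.
    name: str
    collection: str
    description: str


@dataclass
class User:
    name: str
    # Collections where this hypothetical user holds the data reader role.
    reader_collections: set = field(default_factory=set)


def search(assets, keywords, user):
    """Return assets that mention any keyword and that the user may read."""
    terms = [k.lower() for k in keywords]
    hits = []
    for asset in assets:
        text = f"{asset.name} {asset.description}".lower()
        matches = any(term in text for term in terms)
        readable = asset.collection in user.reader_collections
        if matches and readable:
            hits.append(asset)
    return hits


catalog = [
    Asset("dengue_cases", "Latam", "Weekly disease reports by country"),
    Asset("flu_cases", "Europe", "Seasonal disease surveillance"),
    Asset("sales_2023", "Latam", "Regional sales totals"),
]
analyst = User("analyst", reader_collections={"Latam"})

results = search(catalog, ["disease"], analyst)
print([a.name for a in results])
```

In this sketch, `flu_cases` matches the keyword but is left out because the analyst has no reader permission on the "Europe" collection.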
In the example below, the search phrase was “I want to know the data related with diseases in Latam”.
Note that the search returns all data assets in the collection(s) that best match the query. If a collection contains data assets that match the phrase, all scanned items are returned, even if some items do not match exactly.
How closely the search results match what you are looking for depends on your Data Map design, the registered data sources, and the scope applied in scanning; a well-scoped design helps narrow the results for the most common searches in your business.
See, for example, a Data Map design organized by multiple regions and business concepts.
The Microsoft Purview relevance engine sorts through all the matches and ranks them based on what it believes their usefulness is to a user.
Many factors determine an asset’s relevance score, and the Microsoft Purview search team is constantly tuning the relevance engine to ensure the top search results have value to you.
2.2 Browse by applying filters.
After you enter the search phrase and wait a few seconds, a Filters pane appears where you can apply the following filtering criteria:
Asset Type
Data Source Type
Collection
Classification
Contact
Endorsement
Assigned Term
Sensitivity Label
Rating
The next figure shows the Asset Type filter:
When you filter by “Data”, you can refine your search by selecting one or more data asset types, according to the data source you are interested in:
The next figure shows the assets of type “Report”:
Then select any filter category you would like to narrow your results by, and select the values you would like to narrow the results to. For some filters, you can select the ellipsis to choose between an AND condition and an OR condition, as the next figure shows:
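The difference between the AND and OR conditions can be sketched with a few lines of Python. This is only an illustration on in-memory dictionaries, not the way Microsoft Purview evaluates filters; the function names, the asset records, and the classification values are hypothetical.

```python
def matches_filter(asset_values, selected, mode="OR"):
    """Check one asset's values against the values selected in a filter."""
    selected = set(selected)
    if not selected:
        return True  # nothing selected: the filter does not restrict results
    found = set(asset_values)
    if mode == "AND":
        return selected <= found        # every selected value must be present
    if mode == "OR":
        return bool(selected & found)   # at least one selected value is present
    raise ValueError(f"Unknown mode: {mode}")


def apply_filter(assets, attribute, selected, mode="OR"):
    """Return the names of assets that pass the filter on one attribute."""
    return [
        a["name"]
        for a in assets
        if matches_filter(a.get(attribute, []), selected, mode)
    ]


# Hypothetical search results with their classifications.
assets = [
    {"name": "patients", "classification": ["Person's Name", "Email Address"]},
    {"name": "contacts", "classification": ["Email Address"]},
    {"name": "inventory", "classification": []},
]
wanted = ["Person's Name", "Email Address"]

print(apply_filter(assets, "classification", wanted, mode="AND"))
print(apply_filter(assets, "classification", wanted, mode="OR"))
```

With AND, only `patients` carries both selected classifications; with OR, `contacts` is also kept because it carries one of them.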
2.3 Curation process.
The process of contributing to the catalog by tagging, documenting, and annotating data sources that have already been registered is known as metadata curation [Metadata curation in Microsoft Purview | Microsoft Learn].
The curation process is facilitated by selecting one or more data assets returned in the search that are assumed to be the curator’s responsibility.
For example, in the next figure we show two selected data assets:
By clicking on “View selected,” you can access a screen to start adding attributes to the data asset’s metadata.
Click on “Bulk edit”:
After selecting an attribute, you can add new values, replace an existing value with another one, or remove values.
You can add as many attributes as you need.
The values you can manage depend on the attribute you selected.
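The three bulk edit operations (add, replace, remove) can be sketched as follows. This models the behavior on plain Python dictionaries and is not the Microsoft Purview API; `bulk_edit`, the attribute names, and the sample values are hypothetical, and the sketch assumes every attribute holds a list of values.

```python
def bulk_edit(assets, attribute, operation, value, new_value=None):
    """Apply one operation to the same attribute of every selected asset."""
    for asset in assets:
        current = asset.setdefault(attribute, [])
        if operation == "add":
            if value not in current:      # avoid duplicate values
                current.append(value)
        elif operation == "replace":
            # Swap the old value for the new one, leaving other values as-is.
            asset[attribute] = [new_value if v == value else v for v in current]
        elif operation == "remove":
            asset[attribute] = [v for v in current if v != value]
        else:
            raise ValueError(f"Unknown operation: {operation}")
    return assets


# Two hypothetical assets selected from the search results.
selected = [
    {"name": "dengue_cases", "contacts": ["ana"]},
    {"name": "flu_cases", "contacts": []},
]

bulk_edit(selected, "classification", "add", "Health")
bulk_edit(selected, "contacts", "replace", "ana", new_value="luis")
print(selected)
```

Each call touches a single attribute, so several calls can be chained to edit as many attributes as needed.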
Purview’s Copilot can help enrich metadata by suggesting additional context, classifications, and annotations based on the data asset’s content and usage. It can also help ensure consistency in metadata definitions and standards across the organization, reducing discrepancies and improving data quality.
When you select “Suggestions”, you can see suggestions derived from your business concepts.
Note that the AI models draw on general internet knowledge, so they will not return company-specific or custom definitions or terms. All terms should be stewarded before being published to ensure that each term and its definition align with company use and the specific knowledge about the term [Microsoft Purview Data Catalog Responsible AI FAQ (Preview) | Microsoft Learn].
Any terms and definitions provided via suggestions should be reviewed and aligned with the company’s specific language standards. When a term is selected and created, it will be in draft status, allowing the steward to complete and finalize the term before deciding to publish it.
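The draft-then-publish flow for a suggested term can be sketched as a tiny state model. This is an illustration only, not how Microsoft Purview stores glossary terms: the `GlossaryTerm` class, its `review` and `publish` methods, and the sample definitions are hypothetical, and the rule that publishing requires a prior review is an assumption made to reflect the stewardship guidance above.

```python
from dataclasses import dataclass


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    status: str = "Draft"     # a newly created term starts as a draft
    reviewed: bool = False

    def review(self, definition):
        """Steward aligns the suggested definition with company language."""
        self.definition = definition
        self.reviewed = True

    def publish(self):
        """Publish only after a steward has reviewed the term (assumed rule)."""
        if not self.reviewed:
            raise ValueError("A steward must review the term before publishing")
        self.status = "Published"


# A term created from an AI suggestion, with a generic definition.
term = GlossaryTerm("Outbreak", "A sudden rise in cases of a disease.")
print(term.status)

term.review("A rise in reported cases above the regional weekly baseline.")
term.publish()
print(term.status)
```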
3. Conclusions.
Smart data searching and automated metadata enrichment significantly enhance the cataloging process, making it more efficient, comprehensive, and insightful.
These advanced capabilities not only improve data discoverability and governance but also empower users with richer, more contextualized information, leading to more informed and effective decision-making.
Learn more:
Microsoft Purview collections architecture and best practices | Microsoft Learn
Scans and ingestion | Microsoft Learn
How to search the Data Catalog | Microsoft Learn
Discover data with natural language search | YouTube
Metadata curation in Microsoft Purview | Microsoft Learn
Best practices for describing data in Microsoft Purview | Microsoft Learn
How to create and manage glossary terms (Preview) | Microsoft Learn
Curate your data with Business Concepts | YouTube
How to configure and manage data catalog access policies (Preview) | Microsoft Learn