Query insights
Question: How can I identify unused data in a modern data platform built with Azure Synapse and the medallion architecture using Log Analytics?
I’m working with a client who has built a modern data platform based on the medallion architecture, leveraging Azure Synapse and Azure Storage Accounts. Users access the data in various ways within Synapse workspaces: some through Python scripts, others through serverless SQL endpoints, and others via dedicated SQL pools (utilizing views and stored procedures).
We log a significant amount of information via Log Analytics, so effectively every SELECT statement executed against the data is logged. The client now wants to identify which data is not actively used, in order to reduce storage costs by removing unused datasets. In a traditional SQL data warehouse the Query Store could be used for this, but on this platform we only have the log data stored in Log Analytics.
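To make this concrete, here is a minimal KQL sketch of the direction I have in mind for the dedicated SQL pool logs. It assumes the "Exec Requests" diagnostics land in a SynapseSqlPoolExecRequests table with the statement text in a Command column (the table and column names depend on your diagnostic settings, so treat both as assumptions), and the regex-based extraction of table names is deliberately crude:

```kusto
// Assumption: dedicated SQL pool "Exec Requests" diagnostics flow into
// SynapseSqlPoolExecRequests, with the statement text in Command.
SynapseSqlPoolExecRequests
| where TimeGenerated > ago(90d)
| where Command has "select"
// Crude extraction of object names that follow FROM/JOIN; a real parser
// would also need to handle aliases, CTEs, and bracketed identifiers.
| extend referenced = extract_all(@"(?i)(?:from|join)\s+([\w\.\[\]]+)", Command)
| mv-expand referenced to typeof(string)
| summarize lastRead = max(TimeGenerated), reads = count() by referenced
| order by lastRead asc
```

The weak point is obvious: aliases, CTEs, and views that wrap other tables all defeat a simple regex, and the serverless endpoint and the Python access paths land in different log tables entirely, which is partly why I'm asking whether better tooling exists.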
My question is: How can we, based on the logs in Log Analytics, determine which data (tables, views, etc.) is processed through the various layers of the medallion architecture but not actually used?
The goal is to remove unused data to save costs.
Some additional questions:
- Is there a way to analyze usage patterns of datasets based on the raw logs in Log Analytics?
- Are there any existing tools or KQL queries that could help identify which datasets have been inactive over a certain period? (A sketch of the kind of query I mean follows this list.)
- Could a metastore tool, such as Azure Purview, play a role in identifying unused datasets? If so, how could it be integrated with our existing platform?
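On the storage side, this is the kind of inactivity check I imagine, again only as a sketch: it assumes the storage account's read diagnostics are enabled and flow into the standard StorageBlobLogs table, and that a "dataset" corresponds to the first few folder levels of the blob path (the depth of 3 is my guess, for illustration):

```kusto
// Assumption: blob/ADLS read diagnostics flow into the standard
// StorageBlobLogs table; the folder depth (3) is illustrative only.
StorageBlobLogs
| where TimeGenerated > ago(90d)
| where OperationName in ("GetBlob", "ReadFile", "QueryBlobContents")
// Reduce each blob URI to its dataset folder, e.g. "/container/bronze/sales".
| extend dataset = strcat_array(
    array_slice(split(tostring(parse_url(Uri).Path), "/"), 0, 3), "/")
| summarize lastRead = max(TimeGenerated), reads = count() by dataset
| order by lastRead asc
```

Anything whose lastRead falls outside the window we care about would then be a candidate for archiving or deletion, though that still leaves the harder question of data that is read only by downstream pipeline steps and never by an actual consumer.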
Any suggestions or insights would be greatly appreciated!