Delta Query Performance in Azure Synapse Serverless: Issues with Response Times using Serverless SQL
We are encountering significant performance issues with Delta query execution in Azure Synapse Serverless across multiple client setups. Specifically, we utilize views built on Delta tables within Synapse Serverless to deliver data via a Synapse on-demand SQL endpoint. When end users execute queries involving simple joins at this endpoint, we observe that the response times are 7-8 times slower compared to equivalent views created over Parquet files.
Efforts to optimize have included optimizing & vacuuming the Delta tables and explicitly defining data types during view creation, which resulted in only marginal improvements. We suspect the issue might be related to Synapse utilizing an outdated version of the Delta reader.
For comparison, using a Databricks SQL Warehouse dramatically enhances the query response times. However, this approach does not support querying via SQL Server Management Studio, which is a significant limitation for our end users.
Has anyone else experienced similar performance challenges with Delta in Synapse Serverless? Any recommendations or workarounds would be greatly appreciated. Also, maintaining duplicate datasets in both Delta and Parquet formats solely to support SQL endpoints feels inefficient.
Thoughts on this?
We are encountering significant performance issues with Delta query execution in Azure Synapse Serverless across multiple client setups. Specifically, we utilize views built on Delta tables within Synapse Serverless to deliver data via a Synapse on-demand SQL endpoint. When end users execute queries involving simple joins at this endpoint, we observe that the response times are 7-8 times slower compared to equivalent views created over Parquet files.Efforts to optimize have included optimizing & vacuuming the Delta tables and explicitly defining data types during view creation, which resulted in only marginal improvements. We suspect the issue might be related to Synapse utilizing an outdated version of the Delta reader. For comparison, using a Databricks SQL Warehouse dramatically enhances the query response times. However, this approach does not support querying via SQL Server Management Studio, which is a significant limitation for our end users. Has anyone else experienced similar performance challenges with Delta in Synapse Serverless? Any recommendations or workarounds would be greatly appreciated. Also, maintaining duplicate datasets in both Delta and Parquet formats solely to support SQL endpoints feels inefficient. Thoughts on this? Read More