Client-Side Compute: A Greener Approach to Natural Language Data Queries
Introduction
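This article describes two strategies for making natural language queries over structured data more efficient:
Leveraging deterministic tools to execute the domain-specific language on the appropriate systems, and
Offloading compute to client devices.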
The Challenge: Efficiently Interacting with Structured Data
A Common Use Case
Complex Query Language: Users often struggle with the complexity of SQL or other query languages required to extract insights from the data. This creates a barrier to effective data utilization.
Performance and Scalability: The server load increases significantly with complex queries, especially when multiple tenants access the data simultaneously. This can lead to performance bottlenecks and scalability issues.
Cost and Resource Management: Hosting the necessary computational resources to handle data queries on the ISV’s servers is resource-intensive and costly. This includes maintaining high-performance databases and application servers.
User Experience: Customers increasingly demand the ability to interact with their data using natural language, expecting a seamless and intuitive user experience.
The architecture diagram above illustrates the current setup:
Data Sources: Public sources and tenant data are ingested into the system.
Storage: The data lake (or lakehouse) processes the data from multiple sources, performs cleansing, and periodically stores the data in the gold tables.
Orchestrator: ELT/ETL is orchestrated using Microsoft Fabric, Azure Synapse, or Azure Data Factory pipelines.
Serving: The web application is hosted on Azure App Service, and the data is queried using Azure SQL Database.
Visualize: Data is reported using Power BI or other reporting tools, including homegrown dashboards.
Enhanced Approach: Energy-Efficient Data Interaction
Leveraging Deterministic Tools for Query Execution:
Translation: Utilize LLMs to convert natural language queries into SQL.
Execution: Create a sandbox environment for each customer’s data. This sandbox is hosted on lower-cost storage, such as a storage container per customer, which contains a snapshot of the data they can interact with.
Data Management: The same data ingestion pipeline that updates the gold table in Azure SQL is adapted to update a customer-specific data set stored in their respective storage container. The idea is to use SQLite to store the customer-specific data, ensuring it is lightweight and portable.
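The data management step above can be sketched as follows. This is a minimal illustration, not the ISV's actual pipeline: the function name write_customer_snapshot, the snapshot_meta table, and the example table and column names are all assumptions made for this sketch. It writes one customer's rows into a SQLite database and records when the snapshot was built.

```python
import sqlite3
from datetime import datetime, timezone


def write_customer_snapshot(conn, table, columns, rows, created_at=None):
    """Replace `table` in a customer's SQLite database with fresh rows.

    Hypothetical helper: `table` and `columns` are trusted values supplied
    by the ingestion pipeline, not by end users.
    """
    created_at = created_at or datetime.now(timezone.utc).isoformat()
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)

    # Rebuild the table from scratch on every pipeline run.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({column_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})', rows
    )

    # Record the build time (assumed metadata table) for later freshness checks.
    conn.execute("DROP TABLE IF EXISTS snapshot_meta")
    conn.execute("CREATE TABLE snapshot_meta (created_at TEXT NOT NULL)")
    conn.execute("INSERT INTO snapshot_meta VALUES (?)", (created_at,))
    conn.commit()
    return created_at
```

Passing a connection opened on a file, for example sqlite3.connect("tenant_a.sqlite") with a hypothetical file name, would produce a single portable file that the pipeline could then upload to that customer's storage container.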
Benefits:
Efficiency and Security: Ensures that queries are executed efficiently and securely, leveraging the robust capabilities of SQL databases while minimizing risks. By isolating each customer’s data in a sandbox, the need for sophisticated guardrails against bad queries and overloading the reporting database is significantly reduced.
Cost & Energy Savings: No need to manage or host a dedicated reporting database. Since the customer-specific data is hosted on Azure storage containers, the ISV avoids the costs and energy consumption associated with maintaining high-performance database infrastructure.
Scalability and Reliability: The ISV does not need to plan for the worst-case scenario of all customers running queries simultaneously, which could impact the health of a centralized reporting database. Each customer’s queries are isolated to their data, ensuring system stability and performance.
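To illustrate why the sandbox reduces the need for guardrails, here is a hedged sketch of running an LLM-translated query against a customer's SQLite snapshot. The function name run_translated_query and the max_rows cap are assumptions for this example. It relies on SQLite's query_only pragma, so a badly translated statement that tries to modify data fails instead of changing the snapshot.

```python
import sqlite3


def run_translated_query(conn, sql, max_rows=1000):
    """Run LLM-generated SQL against a customer snapshot without allowing writes.

    Hypothetical helper: returns (column_names, rows), capped at `max_rows`.
    """
    # With query_only enabled, SQLite rejects INSERT, UPDATE, DELETE,
    # CREATE, and DROP statements on this connection.
    conn.execute("PRAGMA query_only = ON")
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description] if cursor.description else []
    # Cap the result size so a broad query cannot exhaust memory.
    return columns, cursor.fetchmany(max_rows)
```

Because each customer only ever queries their own copy of the data, the worst case for a bad query is an error or a slow response for that one customer, not a degraded shared reporting database.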
Offloading Compute to Client Devices:
Data Transmission: The client-side application ensures it has the current data snapshot available for the customer to work with. For example, it can check the data’s timestamp or use another method to verify whether the local data is up to date, and download the latest version if necessary. This snapshot is encapsulated in a portable format such as JSON, SQLite, or Parquet.
Local Processing: The client-side application processes the data locally using the translated SQL queries.
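The timestamp check described above might look like the following sketch. It is written in Python for brevity; a browser client would need an equivalent in its own runtime. The names is_stale and ensure_current_snapshot are hypothetical, as are the injected callables that fetch the server-side timestamp and download the snapshot. Timestamps are assumed to be ISO 8601 strings that both carry the same kind of timezone information.

```python
from datetime import datetime


def is_stale(local_created_at, remote_created_at):
    """Return True when there is no local snapshot or the remote one is newer."""
    if local_created_at is None:
        return True
    local = datetime.fromisoformat(local_created_at)
    remote = datetime.fromisoformat(remote_created_at)
    return remote > local


def ensure_current_snapshot(local_created_at, fetch_remote_created_at, download_snapshot):
    """Download the snapshot only when the local copy is out of date.

    `fetch_remote_created_at` and `download_snapshot` are caller-supplied
    callables (assumptions for this sketch). Returns the timestamp of the
    snapshot the client holds afterwards.
    """
    remote_created_at = fetch_remote_created_at()
    if is_stale(local_created_at, remote_created_at):
        download_snapshot()
        return remote_created_at
    return local_created_at
```

Comparing a small timestamp before downloading keeps server communication to one lightweight request per check, with the full snapshot transferred only when it has actually changed.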
Benefits:
Performance: Reduces server load, enhances scalability, and provides faster query responses by utilizing the client’s computational resources.
Cost & Energy Savings: Significant cost savings by reducing the need for high-performance server infrastructure. Hosting a static website and leveraging client devices’ processing power also reduces overall energy consumption.
Flexibility: Ensures that customers always work with the most current data without the need for constant server communication.
Revised Architecture
Data Sources: Public sources and tenant data are ingested into the system.
Storage: The data lake (or lakehouse) processes the data from multiple sources, performs cleansing, and stores the data in customer-specific containers. This enhances security and isolation.
Orchestrator: ELT/ETL is orchestrated using Microsoft Fabric, Azure Synapse, or Azure Data Factory pipelines.
Why This Approach?
Efficiency: Data queries are executed locally, reducing the load on the server and improving performance.
Security: Data is securely isolated within a client-side sandbox, ensuring customers can only query what is provided.
Cost & Energy Savings: Hosting a static website is significantly cheaper and more energy-efficient than hosting a web application with a database. This approach leverages the processing power of client devices, further reducing infrastructure costs and energy consumption.
Scalability: By isolating each customer’s data in a sandbox, the ISV does not need to worry about the impact of simultaneous queries on a centralized database, ensuring system reliability and scalability.
Flexibility: Ensures that customers always have access to the most current data without the need for constant server communication.
Potential Downsides and Pitfalls
Client-Side Performance Variability: The approach relies on the computational power of client devices, which varies widely. Users on less powerful devices may see slower query responses.
Data Synchronization: Ensuring that the local data snapshot on client devices is up to date can be challenging. Delays in synchronization could lead to users working with outdated data.
Conclusion
By adopting these strategies, ISVs can provide a more efficient, scalable, and cost-effective solution for natural language querying of structured data. Leveraging deterministic tools for executing domain-specific languages within isolated sandboxes ensures robust and secure query execution. Offloading compute to client devices not only reduces server load but also enhances performance and scalability, providing a seamless and intuitive user experience.