Optimizing Query Performance with work_mem
work_mem plays a crucial role in optimizing query performance in Azure Database for PostgreSQL. By allocating sufficient memory for sorting, hashing, and other internal operations, you can improve overall database performance and responsiveness, especially under heavy load or in complex query scenarios. Fine-tuning work_mem based on workload characteristics is key to achieving optimal performance in your PostgreSQL environment.
Understanding work_mem
Purpose:
Memory for Operations: work_mem sets the maximum amount of memory that can be used by operations such as sorting, hashing, and joins before PostgreSQL writes data to temporary disk files. This includes the operations used to accomplish:
ORDER BY: Sort nodes are introduced in the plan when ordering cannot be satisfied by an index.
DISTINCT and GROUP BY: These can introduce Aggregate nodes with a hashing strategy, which require memory to build hash tables, and potentially Sort nodes when the Aggregate is parallelized.
Merge Joins: When the sort order of one or both of the relations being joined is not already satisfied by an index.
Hash Joins: To build hash tables.
Nested Loop Joins: When Memoize nodes are introduced in the plan because the estimated number of duplicate lookups is high enough that caching lookup results is estimated to be cheaper than repeating the lookups.
Default Value: The default work_mem value is 4 MB (or 4096 KB). This means that any operation can use up to 4 MB of memory. If the operation requires more memory, it will write data to temporary disk files, which can significantly slow down query performance.
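You can confirm the value currently in effect for your session at any time:
SHOW work_mem;
-- or, equivalently:
SELECT current_setting('work_mem');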
Concurrent Operations:
Multiple Operations: A single complex query may involve several sorts or hash operations that run in parallel. Each operation can utilize the work_mem allocated, potentially leading to high total memory consumption if multiple operations are occurring simultaneously.
Multiple Sessions: If there are several active sessions, each can also use up to the work_mem value for their operations, which further increases memory usage. For example, if you set work_mem to 10 MB and have 100 concurrent connections, the total potential memory usage for sorting and hashing operations could reach 1,000 MB (or 1 GB).
Impact of Disk Usage:
Spilling to Disk: When the memory allocated for an operation exceeds work_mem, PostgreSQL writes data to temporary files on disk. Disk I/O is significantly slower than memory access, which can lead to degraded performance. Therefore, optimizing work_mem is crucial to minimize disk spills.
Disk Space Considerations: Excessive disk spills can also lead to increased disk space usage, particularly for large queries, which may affect overall database performance and health.
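To gauge how much spilling is already happening, you can check the cumulative temporary-file counters that PostgreSQL keeps per database (a quick sketch using the standard pg_stat_database view):
SELECT datname,
       temp_files,                                 -- number of temporary files created
       pg_size_pretty(temp_bytes) AS temp_spilled  -- total bytes written to temporary files
FROM pg_stat_database
WHERE datname = current_database();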
Hash Operations:
Sensitivity to Memory: Hash-based operations (e.g., hash joins, hash aggregates) are particularly sensitive to memory availability. PostgreSQL can use a hash_mem_multiplier to allow these operations to use more memory than specified by work_mem. This multiplier can be adjusted to allocate a higher memory limit for hash operations when needed.
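For example, the multiplier can be inspected and raised for the current session; a value of 2.0 lets hash-based nodes use up to twice work_mem before spilling:
SHOW hash_mem_multiplier;
SET hash_mem_multiplier = 2.0;  -- hash tables may now grow to 2 x work_mem before spilling to disk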
Adjusting work_mem at Different Levels
Server Parameter:
Affects all connections unless overridden.
Configured globally via REST APIs, the Azure CLI, or the Azure portal. For more information, read Server parameters in Azure Database for PostgreSQL – Flexible Server.
Session Level:
Adjusted using SET work_mem = '32MB';
Affects only the current session.
Reverts to default after the session ends.
Useful for optimizing specific queries.
Role or user level:
Set using ALTER ROLE username SET work_mem = '16MB';
Applied automatically upon user login.
Tailors settings to user-specific workloads.
Database Level:
Set using ALTER DATABASE dbname SET work_mem = '20MB';
Affects all connections to the specified database.
Function/Procedure Level:
Adjusted within a stored procedure/function using SET work_mem = '64MB';
Valid for the duration of the procedure/function execution.
Allows fine-tuning of memory settings based on specific operations.
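For instance, a work_mem value can be attached directly to a function definition so it applies only while the function executes (the function name and body below are purely illustrative):
CREATE OR REPLACE FUNCTION refresh_sales_report()   -- hypothetical function name
RETURNS void
LANGUAGE plpgsql
SET work_mem = '64MB'   -- in effect only for the duration of each call
AS $$
BEGIN
    -- a large sort or aggregation would run here with the elevated work_mem
    PERFORM 1;
END;
$$;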
Server Parameter: work_mem
The formula work_mem = Total RAM / Max Connections / 16 is a guideline to help distribute memory effectively without over-committing resources. Refer to the official Microsoft documentation on managing high memory utilization in Azure Database for PostgreSQL.
Breaking Down the Formula
Total RAM:
This is the total physical memory available on your PostgreSQL server. It’s the starting point for calculating memory allocation for various PostgreSQL operations.
Max Connections:
This is the maximum number of concurrent database connections allowed. PostgreSQL needs to ensure that each connection can operate efficiently without causing the system to run out of memory.
Division by 16:
The factor of 16 is a conservative estimate to prevent overallocation of memory. This buffer accounts for other memory needs of PostgreSQL and the operating system.
If your server has a significant amount of RAM and you are confident that other memory requirements (e.g., operating system, cache, other processes) are sufficiently covered, you might reduce the divisor (e.g., to 8 or 4) to allocate more memory per operation.
Analytical workloads often involve complex queries with large sorts and joins. For such workloads, increasing work_mem by reducing the divisor can improve query performance significantly.
Step-by-Step Calculation of work_mem
Total RAM:
The server has 512 GB of RAM.
Convert 512 GB to MB: 512 * 1024 = 524,288 MB
Max Connections:
The server allows up to 2000 maximum connections.
Base Memory Per Connection:
Divide the total RAM by the number of connections: 524,288 / 2000 = 262.144 MB
Apply the Conservative Factor (Divide by 16):
262.144 / 16 = 16.384 MB
Set work_mem to approximately 16 MB (rounded from 16.384 MB).
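As a quick sanity check, the same arithmetic can be run directly against the server's own max_connections setting (a sketch only; the 512 GB of total RAM is supplied manually because PostgreSQL does not expose physical memory):
SELECT pg_size_pretty(
         (512::bigint * 1024 * 1024 * 1024)          -- total RAM in bytes (512 GB)
         / current_setting('max_connections')::int   -- e.g. 2000
         / 16                                        -- conservative divisor
       ) AS suggested_work_mem;                      -- ~16 MB in this example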
If you need help configuring server parameters or require more information, refer to the official documentation: Azure PostgreSQL Flexible Server Server Parameters. This resource provides comprehensive insights into the server parameters and their configuration.
Query Execution with EXPLAIN ANALYZE
Fine-Tune work_mem with EXPLAIN ANALYZE
To determine the optimal work_mem value for your query, you’ll need to analyze the EXPLAIN ANALYZE output to understand how much memory the query is using and where it is spilling to disk. Here’s a step-by-step guide to help you:
Execute the query with EXPLAIN ANALYZE to get detailed execution statistics:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM DataForWorkMem
WHERE time BETWEEN '2006-01-01 05:00:00+00' AND '2006-03-31 05:10:00+00'
ORDER BY name;
Analyze the Output
Look for the following details in the output:
Sort Operation: Check whether there is a Sort operation and whether it mentions “external sort” or “external merge”. Either indicates that the sort needed more memory than work_mem allowed and had to spill to disk.
Buffers Section: The Buffers section shows the amount of data read from and written to disk. High values here may indicate that increasing work_mem could reduce the amount of data spilled to disk.
Here is the output generated by the query above:
Gather Merge  (cost=8130281.85..8849949.13 rows=6168146 width=47) (actual time=2313.021..3848.958 rows=6564864 loops=1)
  Workers Planned: 2
  Workers Launched: 1
  Buffers: shared hit=72278, temp read=97446 written=97605
  ->  Sort  (cost=8129281.82..8136992.01 rows=3084073 width=47) (actual time=2296.884..2726.374 rows=3282432 loops=2)
        Sort Key: name
        Sort Method: external merge  Disk: 193200kB
        Buffers: shared hit=72278, temp read=97446 written=97605
        Worker 0:  Sort Method: external merge  Disk: 196624kB
        ->  Parallel Bitmap Heap Scan on dataforworkmem  (cost=88784.77..7661339.18 rows=3084073 width=47) (actual time=206.138..739.962 rows=3282432 loops=2)
              Recheck Cond: (("time" >= '2006-01-01 05:00:00+00'::timestamp with time zone) AND ("time" <= '2006-03-31 05:10:00+00'::timestamp with time zone))
              Rows Removed by Index Recheck: 62934
              Heap Blocks: exact=15199 lossy=17800
              Buffers: shared hit=72236
              ->  Bitmap Index Scan on dataforworkmem_time_idx  (cost=0.00..86934.32 rows=7401775 width=0) (actual time=203.416..203.417 rows=6564864 loops=1)
                    Index Cond: (("time" >= '2006-01-01 05:00:00+00'::timestamp with time zone) AND ("time" <= '2006-03-31 05:10:00+00'::timestamp with time zone))
                    Buffers: shared hit=5702
Planning:
  Buffers: shared hit=5
Planning Time: 0.129 ms
Execution Time: 4169.774 ms
Let’s break down the details from the execution plan:
Gather Merge
Purpose: Gather Merge is used to combine results from parallel workers. It performs an order-preserving merge of the results produced by each of its child node instances.
Cost and Rows:
Planned Cost: 8130281.85..8849949.13
This is the estimated cost of the operation.
Planned Rows: 6168146
This is the estimated number of rows to be returned.
Actual Time: 2313.021..3848.958
The actual time taken for the Gather Merge operation.
Actual Rows: 6564864
The actual number of rows returned.
Workers:
Planned: 2
The planned number of parallel workers for this operation.
Launched: 1
The number of workers that were actually used.
Buffers
Shared Hit: 72278
The number of pages found in shared buffers (read from memory rather than disk).
Temp Read: 97446
The number of temporary disk blocks read.
Approximately 798.3 MB (97,446 blocks × 8 KB per block)
Temp Written: 97605
The number of temporary disk blocks written.
Approximately 799.6 MB (97,605 blocks × 8 KB per block)
Sort Node
Sort:
Cost: 8129281.82..8136992.01
The estimated cost for the sorting operation includes both the startup cost and the cost of retrieving all available rows from the operator.
The startup cost represents the estimated time required to begin the output phase, such as the time needed to perform the sorting in a sort node.
Rows: 3084073
The estimated number of rows returned.
Actual Time: 2296.884..2726.374
The actual time taken for the sorting operation.
The first number represents the startup time for the operator, i.e., the time it took to begin executing this part of the plan. The second number represents the total time elapsed from the start of the execution of the plan to the completion of this operation. The difference between these two values is the actual duration that this operation took to complete.
Actual Rows: 3282432
The actual number of rows returned.
Sort Method
External Merge:
This indicates that an external merge sort was used, meaning that the sort could not be handled entirely in memory and required temporary files.
Disk:
Main Process: 193200 kB
The amount of disk space used by the main process for sorting.
Worker 0: 196624 kB
The amount of disk space used by the worker process for sorting.
To optimize PostgreSQL query performance and avoid disk spills, set work_mem high enough to cover the total memory usage observed during sorting:
Main Process Memory Usage: 193200 kB
Worker Memory Usage: 196624 kB
Total Memory Required: 389824 kB (approximately 380 MB)
Recommended work_mem Setting: 380 MB
This setting ensures that the sort operation can be performed entirely in memory, improving query performance and avoiding disk spills.
Increasing work_mem to 380 MB at the session level resolved the issue. The execution plan confirms that this memory allocation is now adequate for your sorting operations. The absence of temporary read/write stats in the Buffers section suggests that sorting is being managed entirely in memory, which is a favorable result.
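For example, the change can be applied for the current session only, the same query re-run, and the setting reverted afterwards:
SET work_mem = '380MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM DataForWorkMem
WHERE time BETWEEN '2006-01-01 05:00:00+00' AND '2006-03-31 05:10:00+00'
ORDER BY name;

RESET work_mem;  -- restore the server-level default for this session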
Here is the updated execution plan:
Gather Merge  (cost=4944657.91..5664325.19 rows=6168146 width=47) (actual time=1213.740..2170.445 rows=6564864 loops=1)
  Workers Planned: 2
  Workers Launched: 1
  Buffers: shared hit=72244
  ->  Sort  (cost=4943657.89..4951368.07 rows=3084073 width=47) (actual time=1207.758..1357.753 rows=3282432 loops=2)
        Sort Key: name
        Sort Method: quicksort  Memory: 345741kB
        Buffers: shared hit=72244
        Worker 0:  Sort Method: quicksort  Memory: 327233kB
        ->  Parallel Bitmap Heap Scan on dataforworkmem  (cost=88784.77..4611250.25 rows=3084073 width=47) (actual time=238.881..661.863 rows=3282432 loops=2)
              Recheck Cond: (("time" >= '2006-01-01 05:00:00+00'::timestamp with time zone) AND ("time" <= '2006-03-31 05:10:00+00'::timestamp with time zone))
              Heap Blocks: exact=34572
              Buffers: shared hit=72236
              ->  Bitmap Index Scan on dataforworkmem_time_idx  (cost=0.00..86934.32 rows=7401775 width=0) (actual time=230.774..230.775 rows=6564864 loops=1)
                    Index Cond: (("time" >= '2006-01-01 05:00:00+00'::timestamp with time zone) AND ("time" <= '2006-03-31 05:10:00+00'::timestamp with time zone))
                    Buffers: shared hit=5702
Planning:
  Buffers: shared hit=5
Planning Time: 0.119 ms
Execution Time: 2456.604 ms
It confirms that:
Sort Method: “quicksort” (an in-memory sort) is used instead of “external merge.”
Memory Usage: The sorts fit within the allocated work_mem (380 MB), using 345,741 kB in the leader process and 327,233 kB in Worker 0.
Execution Time: Decreased to 2456.604 ms from 4169.774 ms.
Adjusting work_mem Using pg_stat_statements Data
To estimate the memory needed for a query based on the temp_blks_read column from PostgreSQL’s pg_stat_statements view, you can follow these steps:
Get the Block Size:
PostgreSQL uses a default block size of 8KB. You can verify this by running:
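SHOW block_size;
-- returns 8192 (bytes) on a default build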
Calculate Total Temporary Block Usage:
Sum the temp_blks_read to get the total number of temporary blocks used by the query.
Convert Blocks to Bytes:
Multiply the total temporary blocks by the block size (usually 8192 bytes) to get the total temporary data in bytes.
Convert Bytes to a Human-Readable Format:
Convert the bytes to megabytes (MB) or gigabytes (GB) as needed.
To identify queries that might benefit from an increased work_mem setting, use the following query to retrieve key performance metrics from PostgreSQL’s pg_stat_statements view:
SELECT
query,
calls,
total_exec_time AS total_time,
mean_exec_time AS mean_time,
stddev_exec_time AS stddev_time,
rows,
local_blks_written,
temp_blks_read,
temp_blks_written,
blk_read_time,
blk_write_time
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 10;
Example Calculation
Suppose we have the following values from the pg_stat_statements:
temp_blks_read: 5000
block_size: 8192 bytes
Calculation:
Total Temporary Data (bytes)= 5000 × 8192 = 40,960,000 bytes
Total Temporary Data (MB) = 40,960,000 / (1024 × 1024) = 39.06 MB
This estimate indicates that to keep operations in memory and avoid temporary disk storage, work_mem should ideally be set to a value higher than 39 MB.
Here is a query that provides the total amount of temporary data in megabytes for each query recorded in pg_stat_statements. This information can help identify which queries might benefit from an increase in work_mem to potentially improve performance by reducing temporary disk usage.
SELECT
query,
total_temp_data_bytes / (1024 * 1024) AS total_temp_data_mb
FROM
(
SELECT
query,
temp_blks_read * 8192 AS total_temp_data_bytes
FROM pg_stat_statements
) sub;
Using Query Store to Determine work_mem
PostgreSQL’s Query Store is a powerful feature designed to provide insights into query performance, identify bottlenecks, and monitor execution patterns.
Here is how to use Query Store to analyze query performance and estimate the disk storage space required for temporary blocks read (temp_blks_read).
Analyzing Query Performance with Query Store
To analyze query performance, Query Store offers execution statistics, including temp_blks_read, which indicates the number of temporary disk blocks read by a query. Temporary blocks are used when query results or intermediate results exceed available memory.
Retrieving Average Temporary Blocks Read
Use the following SQL query to get the average temp_blks_read for individual queries:
SELECT
query_id,
AVG(temp_blks_read) AS avg_temp_blks_read
FROM query_store.qs_view
GROUP BY query_id;
This query calculates the average temp_blks_read for each query. For example, if query_id 378722 shows an average temp_blks_read of 87,348, this figure helps understand temporary storage usage.
Estimating Disk Storage Space Required
Estimate disk storage based on temp_blks_read to gauge temporary storage impact:
Know the Block Size: PostgreSQL’s default block size is 8 KB.
Calculate Disk Space in Bytes: Multiply the average temp_blks_read by the block size:
Space (bytes) = avg_temp_blks_read × Block Size (bytes)
Space (bytes) = 87,348 × 8192 = 715,554,816 bytes
Convert Bytes to Megabytes (MB):
Space (MB) = 715,554,816 / (1024 × 1024) ≈ 682 MB
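The same conversion can be folded directly into the Query Store query (a sketch; it assumes the default 8 kB block size):
SELECT
    query_id,
    AVG(temp_blks_read) AS avg_temp_blks_read,
    AVG(temp_blks_read) * 8192 / (1024 * 1024) AS avg_temp_mb
FROM query_store.qs_view
GROUP BY query_id
ORDER BY avg_temp_mb DESC;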
Based on this estimate, consider adjusting work_mem at the session level or within stored procedures/functions so these operations can stay in memory.
Query Store is an invaluable tool for analyzing and optimizing query performance in PostgreSQL. By examining metrics like temp_blks_read, you can gain insights into query behavior and estimate the disk storage required. This knowledge enables better resource management, performance tuning, and cost control, ultimately leading to a more efficient and reliable database environment.
Best Practices for Setting work_mem
Monitor and Adjust: Regularly monitor the database’s performance and memory usage. Tools like pg_stat_statements and pg_stat_activity can provide insights into how queries are using memory.
Incremental Changes: Adjust work_mem incrementally and observe the impact on performance and resource usage. Make small adjustments and evaluate their effects before making further changes.
Set Appropriately for Workloads: Tailor work_mem settings based on the types of queries and workloads running on your database. For example, batch operations or large sorts might need higher settings compared to simple, small queries.
Consider Total Memory: Calculate the total memory usage, considering the number of concurrent connections and operations, to ensure it does not exceed available physical RAM.
Balancing work_mem involves understanding your workload, monitoring performance, and adjusting settings to optimize both memory usage and query performance.
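As a simple starting point for that monitoring, the current setting can be viewed alongside session activity (a sketch using the standard catalog views):
SELECT current_setting('work_mem')                  AS work_mem,
       count(*) FILTER (WHERE state = 'active')     AS active_queries,
       count(*)                                     AS total_sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend';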