You Don't Need Redis

Chapter 18: The PostgreSQL Performance Decision Framework

The Waiter of Gold Lapel · Published Mar 11, 2026 · Updated Mar 30, 2026 · 10 min

A waiter who has attended to seventeen chapters of PostgreSQL optimization owes you a summary. Here it is: the order matters.

Connection pooling before query optimization. Query optimization before indexes. Indexes before materialized views. Materialized views before read replicas. Read replicas before sharding. Each step in this sequence is more expensive than the one before it — in engineering time, in operational complexity, and in the ongoing maintenance burden it imposes on your team for the lifetime of the system. Skip a step, and you will optimize the wrong layer. Add read replicas before fixing N+1 queries, and you are scaling the problem rather than solving it. Shard before adding indexes, and you have distributed your inefficiency across multiple servers at considerable expense.

This chapter organizes everything from the preceding seventeen chapters into a single diagnostic framework. It is structured by what you observe — the symptom — rather than by what you should do. Start with the symptom. Follow it to the diagnosis. The solution, with its chapter reference, will be waiting.

I intend this to be the page you bookmark and return to.

The Diagnostic Tools

Before prescribing solutions, gather evidence. PostgreSQL provides three built-in diagnostic tools that, together, reveal the nature of nearly any performance problem. Run these queries before changing anything.

pg_stat_statements: where the time goes

This extension tracks execution statistics for every query that passes through the server. The pg_stat_statements documentation covers its full capabilities. Enable it if it is not already enabled — it should be enabled on every PostgreSQL instance in existence, and the fact that it is not enabled by default remains, in my view, an oversight.

pg_stat_statements.sql
-- Note: pg_stat_statements must also be listed in shared_preload_libraries
-- (postgresql.conf), which requires a server restart before this takes effect.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT
    query,
    calls,
    round(total_exec_time::numeric, 1) AS total_ms,
    round(mean_exec_time::numeric, 1) AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time, not mean_exec_time. A query called 10,000 times at 50 milliseconds each consumes 500 seconds of database time. A query called once at 5 seconds consumes 5 seconds. The first query is the priority, despite having a lower average execution time. Total time reveals which queries consume the most cumulative resource — and cumulative resource is what determines whether your database is healthy or struggling.

The calls column reveals frequency. The rows column reveals data volume. Together with mean_exec_time, they tell you whether the problem is a few expensive queries or many moderately expensive ones repeated thousands of times.

pg_stat_activity: what the connections are doing

pg_stat_activity.sql
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count DESC;

The results tell you which category of problem to investigate:

If idle connections dominate — dozens or hundreds of connections sitting open with no work — you have a pooling problem. Connections are being held by application instances or frozen serverless containers that are not using them. Chapter 15's connection pooling is the solution.

If active connections dominate — most connections are executing queries — you have slow queries holding connections for too long. Move to the query optimization category below.

If idle in transaction connections appear in significant numbers — connections that started a transaction and have not committed or rolled back — you have an application code bug. Something is opening a transaction, doing work outside the database (an API call, a file operation), and leaving the transaction open. This blocks autovacuum, holds locks, and wastes connection slots. The fix is in the application code, not in PostgreSQL configuration.
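To trace those sessions back to the offending application code, list them with their transaction age. A sketch, using only standard pg_stat_activity columns; the one-minute threshold is an arbitrary example:

find_idle_in_transaction.sql
SELECT
    pid,
    usename,
    application_name,
    now() - xact_start AS txn_age,
    left(query, 60) AS last_query  -- the most recent statement in the stuck transaction
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '1 minute'
ORDER BY xact_start;

The application_name and last_query columns usually identify which service and which code path is leaving transactions open.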

pg_stat_user_tables: table health

pg_stat_user_tables.sql
SELECT
    relname,
    n_dead_tup,
    n_live_tup,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_pct,
    last_autovacuum,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

High dead tuple percentages indicate that autovacuum is not keeping pace with write activity. The table is bloating — indexes are growing, sequential scans are touching dead rows, and the planner may be working with stale statistics. If last_autovacuum is NULL or days old on a heavily written table, autovacuum needs tuning. Chapter 16 covers the specific parameters.

EXPLAIN ANALYZE: the individual query diagnostic

For any specific slow query identified by pg_stat_statements, use EXPLAIN ANALYZE:

explain_analyze.sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ...;

Read the output from bottom to top. Look for:

Sequential scans on large tables — a missing index, or a query that cannot use an existing index because the WHERE clause doesn't match. The fix is an index.

Nested loops with high row counts — a join that is executing row-by-row because the join column is not indexed. The fix is an index on the join column.

Sort nodes that show "Sort Method: external merge" — the sort exceeded work_mem and spilled to disk. The fix is either increasing work_mem for this session or adding an index that provides the data pre-sorted.

Buffers: shared read high relative to shared hit — the BUFFERS output shows how much data came from disk (read) versus PostgreSQL's cache (hit). A high read count relative to hits means the data is not cached in memory. This may indicate that the working set exceeds available memory, or that the query touches more data than necessary.
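As an illustration of the work_mem fix, a disk spill can be confirmed and resolved within a single session. The table and 64MB value are hypothetical examples; tune the value to the size reported in the spill message:

work_mem_session.sql
-- Hypothetical aggregation over an orders table; look for
-- "Sort Method: external merge  Disk: ..." in the output.
SET work_mem = '64MB';  -- session-local; does not affect other connections
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(total)
FROM orders
GROUP BY customer_id
ORDER BY sum(total) DESC;
RESET work_mem;

If the plan now reports "Sort Method: quicksort  Memory: ...", the spill is gone; consider whether the setting belongs in the session, the role, or an index that delivers the rows pre-sorted.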

These four diagnostic tools — pg_stat_statements, pg_stat_activity, pg_stat_user_tables, and EXPLAIN ANALYZE — provide the evidence. The framework below provides the prescription.

The Five Categories

Every PostgreSQL performance problem falls into one of five categories. Each category has a distinct set of symptoms, a specific diagnosis, and an ordered list of solutions. The solutions are drawn from the preceding chapters and are listed in order of priority — cheapest and most effective first, most expensive and complex last.

Category 1: Connection Exhaustion

Symptoms: FATAL: too many connections for role. Connection timeouts during traffic spikes. Error rates that correlate with request volume rather than request complexity. pg_stat_activity showing hundreds of connections, most idle or idle in transaction.

Diagnosis: Application connections are exceeding PostgreSQL's max_connections limit. This is the most common failure mode in serverless deployments, microservice architectures, and applications that create connections without pooling.

Solutions, in order:

First, configure your ORM or driver's pool size. In serverless functions, set the pool size to 1 — each function needs one connection, not the driver's default of 3 or 5. This single configuration change can reduce connection consumption by 60–80%. Chapter 15 covers the details for every major framework.

Second, add a connection pooler — PgBouncer or PgCat in transaction mode. The pooler multiplexes hundreds of client connections onto a small pool of real database connections. Chapter 17 compares the options.

Third, review idle_in_transaction_session_timeout and idle_session_timeout in your PostgreSQL configuration. These server-side settings automatically terminate connections that have been idle or stuck in an open transaction beyond a threshold — a useful safety net against zombie connections and application code that forgets to close transactions.
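Both settings can be applied without a restart. The thresholds below are illustrative; pick values comfortably longer than any legitimate transaction or idle period in your workload:

idle_timeouts.sql
-- Terminate transactions left open beyond five minutes.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';

-- Terminate fully idle sessions after thirty minutes (PostgreSQL 14+).
ALTER SYSTEM SET idle_session_timeout = '30min';

SELECT pg_reload_conf();  -- both take effect on reload, no restart needed

Be deliberate with idle_session_timeout when a pooler holds idle server connections by design; a pooler's own idle settings should govern those.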

Fourth — and only if pooling is already in place and correctly configured — increase max_connections. Each additional connection consumes approximately 5–10MB of memory. Plan accordingly.

If connections are still exhausted after proper pooling, the problem is not connection count — it is connection hold time. Queries are holding pooled connections for too long, preventing turnover. Move to Category 2.

Category 2: Slow Individual Queries

Symptoms: Specific API endpoints are slow while the rest of the application performs normally. Individual page loads take seconds. pg_stat_statements shows high mean_exec_time for specific queries. EXPLAIN ANALYZE reveals sequential scans or nested loops.

Diagnosis: The query itself is inefficient. Missing indexes, poor join strategies, N+1 patterns, or unnecessary data retrieval are causing individual queries to consume excessive time and resources.

Solutions, in order:

First, run EXPLAIN ANALYZE on the slow query. Identify sequential scans on large tables, nested loop joins on unindexed columns, and sorts spilling to disk. The diagnostic tells you exactly where the planner made a suboptimal choice and why.

Second, add the missing index. B-tree for equality and range conditions. GIN for full-text search, JSONB, and array operations. Partial indexes when only a subset of rows is relevant — WHERE status = 'active' indexes only the rows your queries actually touch, not the 90% that are archived. Covering indexes with INCLUDE to enable index-only scans when the query needs columns beyond the indexed ones.
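The index types above, sketched against a hypothetical orders table (the table, columns, and index names are illustrative; INCLUDE requires PostgreSQL 11+):

index_types.sql
-- B-tree: equality and range conditions on customer_id.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Partial: index only the rows your queries actually touch.
CREATE INDEX idx_orders_active ON orders (created_at)
    WHERE status = 'active';

-- Covering: INCLUDE carries total in the index, enabling index-only scans.
CREATE INDEX idx_orders_customer_total ON orders (customer_id)
    INCLUDE (total);

-- GIN: containment queries against a JSONB column.
CREATE INDEX idx_orders_metadata ON orders USING gin (metadata);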

Third, fix N+1 patterns. The ORM is generating 100 queries where 1 would suffice. Chapter 3 covered this for every major framework — eager loading, SELECT ... JOIN, select_related, includes, @EntityGraph. A single endpoint generating 100 queries is a scaling problem masquerading as a traffic problem.

Fourth, rewrite the query itself. The Datadog benchmark from Chapter 1 demonstrated a 100x improvement from changing ANY(ARRAY[...]) to ANY(VALUES ...). Mattermost achieved 1,000x by replacing OR conditions with row constructor comparisons. EXISTS instead of IN (subquery). Explicit CTEs instead of deeply nested subqueries. The SQL itself may be the bottleneck.
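The ANY(VALUES ...) rewrite looks like this. A sketch against a hypothetical events table; the three-element list stands in for a long parameter list, where the effect is dramatic:

any_values_rewrite.sql
-- Before: the array is an opaque filter the planner probes value by value.
SELECT * FROM events
WHERE session_id = ANY (ARRAY[1, 2, 3]);

-- After: VALUES is a relation the planner can hash-join against.
SELECT * FROM events
WHERE session_id = ANY (VALUES (1), (2), (3));

The same shape applies to the other rewrites: give the planner a relation to join rather than a condition to test row by row.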

If the query is inherently expensive — a multi-table join aggregating millions of rows — no index will make it fast because the work itself is expensive. Move to Category 3.

Category 3: Slow Aggregation and Dashboard Queries

Symptoms: Dashboard pages take 3 to 10 seconds or more. Reports time out. Analytics queries block the database. EXPLAIN ANALYZE shows large sequential scans feeding into aggregation nodes — GroupAggregate, HashAggregate, Sort — processing millions of rows.

Diagnosis: The query is correct but expensive. It aggregates millions of rows on every request because the aggregation is computed live. No index can make a full-table aggregation fast — the work is inherently expensive. The query is doing exactly what it was told to do; it was simply told to do too much.

This is where materialized views solve the problem. This is, in fact, the central thesis of this book placed precisely where it belongs in the diagnostic framework.

Solutions, in order:

First, create a materialized view that pre-computes the aggregation. Index it with a UNIQUE index to enable REFRESH CONCURRENTLY. The dashboard query becomes an index scan returning pre-computed results in milliseconds rather than a full aggregation computed in seconds. Chapters 4 through 6 provide the complete implementation. Chapters 7 through 13 provide the framework-specific integration.
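A minimal sketch of the pattern, using a hypothetical daily_revenue view over an orders table; adjust columns to your schema:

dashboard_mv.sql
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    date_trunc('day', created_at) AS day,
    count(*) AS orders,
    sum(total) AS revenue
FROM orders
GROUP BY 1;

-- A UNIQUE index is required before REFRESH ... CONCURRENTLY will work,
-- and it makes the dashboard query an index scan.
CREATE UNIQUE INDEX daily_revenue_day ON daily_revenue (day);

-- Rebuilds the view without blocking readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;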

Second, schedule the refresh. pg_cron for database-level scheduling with no application code. Your framework's task scheduler — SolidQueue in Rails, Celery in Django, @Scheduled in Spring Boot, Laravel's scheduler — for application-level scheduling with error handling and monitoring. Chapter 5 covers the strategies; the framework chapters cover the implementations.
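With pg_cron installed, the database-level version is one statement. The job name, five-minute schedule, and daily_revenue view name are illustrative:

schedule_refresh.sql
-- Requires: CREATE EXTENSION pg_cron; and pg_cron in shared_preload_libraries.
SELECT cron.schedule(
    'refresh-daily-revenue',
    '*/5 * * * *',  -- standard cron syntax: every five minutes
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue$$
);

pg_cron records each run in cron.job_run_details, which is worth monitoring for refresh failures.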

Third, add a caching layer for the hottest results. Materialized views reduce query time from seconds to milliseconds. A Redis or Memcached cache in front of the materialized view reduces it from milliseconds to microseconds for data that is read far more frequently than it changes. The combination of materialized views and an application cache is the most cost-effective performance architecture available for read-heavy dashboards.

Fourth, if the source data is very large and the materialized view's defining query is itself becoming slow to refresh, add table partitioning on the source tables. The MV's defining query will benefit from partition pruning, scanning only the relevant time periods rather than the entire history. Chapter 16 covers the implementation.
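Declarative range partitioning by month is the usual shape for time-series source tables. A sketch with a hypothetical events table; the refresh query then prunes to the partitions its time filter touches:

partition_source.sql
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; create these ahead of time or via automation.
CREATE TABLE events_2026_03 PARTITION OF events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');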

Fifth, if freshness requirements are sub-second and periodic refresh is too stale, consider the pg_ivm extension for incremental materialized view maintenance, or LISTEN/NOTIFY triggers to refresh on specific write events. Chapter 5 covers both approaches.

Category 4: Read Scaling Limits

Symptoms: Database CPU is consistently above 80%. All queries are moderately slow, not just specific ones. Adding more application servers makes database performance worse rather than better. Replication lag on existing replicas is increasing.

Diagnosis: The single server is handling more read traffic than its resources can serve. Individual queries may already be optimized — the problem is not any one query but the total volume of queries exceeding the server's capacity.

Solutions, in order:

First, verify that materialized views are in use for expensive aggregations. Each materialized view that replaces a live aggregation frees CPU cycles for other queries. The compound effect from Chapter 15 applies: a 7ms materialized view query holds a pooled connection for 1/1000th the time of a 7-second live aggregation, which means the same pool of connections can serve a thousand times more queries.

Second, add read replicas. Route reads to replicas, writes to the primary. PostgreSQL's streaming replication provides near-zero-lag copies of your data. PgCat provides built-in read/write splitting with replica failover; HAProxy or Patroni can provide similar routing for PgBouncer-based deployments. For read-after-write consistency, route those specific reads to the primary within a brief safety window.

Third, verify that connection pooling is correctly configured and that pool turnover is healthy. A well-tuned pool with materialized views and replicas can handle extraordinary throughput — OpenAI's architecture demonstrates this at 800 million users.

Fourth, consider vertical scaling. A larger instance with more CPU cores and more memory may be simpler and cheaper than architectural changes. Chapter 16 noted that modern hardware is extraordinary, and vertical scaling is unfashionable but effective.

Fifth, if replicas and materialized views are exhausted, evaluate a dedicated caching layer — Redis or Memcached — to serve the hottest queries without touching PostgreSQL at all.

Category 5: Write Scaling Limits

Symptoms: INSERT and UPDATE latency is increasing. WAL generation rate is high and growing. Replication lag is growing despite replicas not being overloaded. Autovacuum cannot keep up. pg_stat_user_tables shows increasing dead tuple counts on heavily written tables.

Diagnosis: Write volume is approaching the single-server ceiling. This is the rarest of the five categories — most web applications are overwhelmingly read-heavy. But when it occurs, it is the most challenging to address because PostgreSQL's single-primary architecture means all writes funnel through one server.

Solutions, in order:

First, audit application writes. Eliminate redundant writes — OpenAI discovered application bugs that caused unnecessary database writes and fixed them as a scaling strategy. Use lazy writes to smooth traffic spikes rather than hitting the database in bursts. Batch inserts where possible.

Second, use UNLOGGED tables for ephemeral data. Sessions, temporary caches, job queues — anything that is regenerable or disposable. UNLOGGED tables provide approximately 2.9x faster writes by skipping write-ahead log entries. The data survives clean shutdowns but is lost on crash. Chapter 6 covers the implementation.
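The pattern is a single keyword. A sketch with a hypothetical sessions table; remember that the contents do not survive crash recovery and are not replicated:

unlogged_sessions.sql
-- Regenerable data only: rows are truncated after a crash.
CREATE UNLOGGED TABLE sessions (
    token      text PRIMARY KEY,
    user_id    bigint NOT NULL,
    expires_at timestamptz NOT NULL
);

-- An existing table can be converted in place with
-- ALTER TABLE ... SET UNLOGGED (and SET LOGGED to reverse);
-- note that the conversion rewrites the table.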

Third, tune autovacuum aggressively for heavily written tables. autovacuum_vacuum_scale_factor = 0.01 triggers vacuum when 1% of rows are dead, rather than the default 20%. This prevents bloat from accumulating and degrading both read and write performance.
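The setting can be applied per table rather than globally, which keeps quiet tables on the defaults. The table name is hypothetical:

autovacuum_per_table.sql
-- Vacuum this hot table when 1% of its rows are dead, not the default 20%.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.01  -- keep planner statistics fresh too
);

Per-table storage parameters like these override the postgresql.conf defaults only for the table they are set on.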

Fourth, offload shardable writes to specialized systems. This is the OpenAI strategy: move write-heavy workloads that can be horizontally partitioned to dedicated systems (a separate PostgreSQL instance, Redis for counters, Kafka for event streams), keeping the primary lean for the relational workloads that require strong consistency.

Fifth, add table partitioning to reduce the index maintenance overhead per write. Each INSERT into a partitioned table touches only the indexes on the target partition, not indexes spanning the entire table.

Sixth — and only after all of the above has been exhausted — shard the writes. Citus for transparent distributed PostgreSQL, or application-level routing by tenant_id or user_id. This is the only diagnostic category where sharding is the answer, and it is the last step in the most expensive sequence. Chapter 16 covers the tools and the costs.

The Optimization Order

The five categories above share a common priority sequence. Restated as a single ordered list, from least expensive to most expensive:

1. Connection pooling. A configuration change. Minutes to implement. Eliminates the single most common failure mode — connection exhaustion — and improves throughput for every query that follows.

2. Vertical scaling. An operational change. Minutes, zero code changes. Buys immediate headroom while you implement the structural optimizations that follow.

3. Query optimization. A code change. Hours to days. Fixes the specific bottleneck identified by EXPLAIN ANALYZE and pg_stat_statements.

4. Indexes. A schema change. Minutes per index. Accelerates specific access patterns by orders of magnitude. Partial and covering indexes provide surgical precision.

5. Materialized views. A new database object. Hours to implement, including refresh scheduling. Eliminates expensive aggregations entirely by pre-computing the results.

6. Read replicas. New infrastructure. Hours to provision. Scales read capacity linearly by distributing reads across multiple copies of the data.

7. Table partitioning. A table restructuring. Hours to days. Organizes large tables so that queries and maintenance operations touch only the relevant subset.

8. Caching (Redis/Memcached). New infrastructure. Hours to implement. Serves the hottest data without touching PostgreSQL at all.

9. Sharding. A system restructuring. Weeks to months. Permanent complexity for the lifetime of the system. The last resort, reserved for write scaling limits that have survived every preceding optimization.

Each step is more expensive than the one before it — in engineering time, in operational complexity, and in ongoing maintenance. The framework exists to ensure you do not reach for step 9 when step 2 would have sufficed.

One Database, Properly Attended

This framework is not theoretical. Every technique in it has been demonstrated in the preceding chapters, with code examples in seven languages and four frameworks. Every benchmark cited is from a named source. The materialized view that converts a 7-second dashboard query to a 7-millisecond index scan is not hypothetical — it is waiting for a CREATE MATERIALIZED VIEW statement and a UNIQUE index.

The diagnostic queries in this chapter can be run in five minutes. The first optimization — configuring your ORM's connection pool size — can be deployed in five more. The materialized view that eliminates your most expensive dashboard query can be created, indexed, and refreshed before lunch. None of this requires new infrastructure, new services, or new dependencies. It requires PostgreSQL, which you already have, and the willingness to use what it offers.

One chapter remains. It is not technical. It is a case — the case — for simplicity.