You Don't Need Redis

Chapter 1: Good Evening. We Have a Problem.

The Waiter of Gold Lapel · Published Mar 11, 2026 · Updated Mar 30, 2026 · 11 min

Good evening.

I see you've arrived with a PostgreSQL database that takes rather longer than it should. Seven seconds for a dashboard query, if I'm not mistaken. A Redis instance that occasionally serves yesterday's numbers to this morning's executives. And — forgive me, I couldn't help but notice — a PagerDuty alert history suggesting that someone on your team was awake at 3 AM last Thursday because Redis exceeded its memory limit.

Please, do come in.

I should tell you at the outset what this book is about, because the title, while accurate, is only part of the story. This is not a book about Redis. It is a book about PostgreSQL — specifically, about the PostgreSQL you haven't met yet.

Most application developers interact with PostgreSQL through an ORM that exposes SELECT, INSERT, UPDATE, DELETE, basic JOINs, and GROUP BY. This represents perhaps 10–15% of what PostgreSQL actually offers. The remaining 85–90% includes features that eliminate the need for Redis, Elasticsearch, dedicated caching layers, and much of the infrastructure complexity that modern applications carry like old furniture nobody remembers buying. Materialized views that transform seven-second queries into seven-millisecond reads. Built-in pub/sub that replaces Redis for event-driven architectures. UNLOGGED tables purpose-built for cache-like workloads. Connection pooling that makes serverless PostgreSQL viable at scale.

This is not a hardware problem. It is not a budget problem. It is a knowledge problem. And knowledge problems, if you'll permit me, are the very best kind — because they are solved by learning, not by spending.

In this chapter, I shall demonstrate that the performance problems you've accepted as normal are, in fact, optional. We will examine five cases where engineers achieved improvements of 55x to 1,000x without adding a single new service to their infrastructure. Then I shall make the case — gently, but firmly — that much of the infrastructure surrounding your database need not exist at all.

Shall we begin?

Permit Me to Show You Something

I have, over a long career of attending to databases of considerable size and temperament, observed a pattern. A developer encounters a slow query. They add an index. The query is still slow. They add Redis. The query is fast, but now they have two systems to maintain, a cache invalidation strategy to debug, and a new category of 3 AM alerts. Six months later, someone asks why the infrastructure budget doubled, and nobody can quite explain it.

The pattern is so common it has become invisible. Allow me to make it visible again, through five cases that I find particularly instructive.

The unlocked front door

The hosting platform Render analyzed millions of developer deployments in December 2024 and identified the single most common fixable performance problem across their entire fleet: missing indexes on foreign key columns.

This bears repeating, because it is both devastating and embarrassing. PostgreSQL does not automatically create indexes on the referencing side of foreign key constraints. Every developer assumes it does. It does not.

In a simulated application — 1,000 users, 25,000 articles, 13 million comments — deleting a single user triggered cascade operations that required full table scans across every referencing table. The operation took 19 seconds. Nineteen seconds to delete one user.

Adding three indexes on the foreign key columns reduced this to 342 milliseconds. A 55x improvement. Three lines of SQL.

I mention this case first not because it is the most dramatic, but because it is the most common. It is the database equivalent of leaving the front door unlocked — not malicious, merely inattentive. And if you have not checked your foreign key columns for missing indexes, I would respectfully suggest doing so before continuing to the next page. I shall wait.
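
Should you wish to conduct that inspection, permit me a sketch. The table and column names below are my own invention, not Render's; the catalog query merely lists every foreign key constraint so you may compare the referencing columns against what pg_indexes reports.

```sql
-- List every foreign key constraint, to cross-check against pg_indexes:
-- any referencing column without an index is a cascade-delete table scan.
SELECT conrelid::regclass        AS referencing_table,
       conname                   AS constraint_name,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
ORDER BY referencing_table::text;

-- The three-line fix, with hypothetical names. CONCURRENTLY avoids
-- blocking writes while the index builds (note: it cannot run
-- inside a transaction block).
CREATE INDEX CONCURRENTLY idx_comments_user_id    ON comments (user_id);
CREATE INDEX CONCURRENTLY idx_comments_article_id ON comments (article_id);
CREATE INDEX CONCURRENTLY idx_articles_author_id  ON articles (author_id);
```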

The one-line correction

Datadog, a company whose entire business is monitoring the performance of other people's software, discovered that one of their own PostgreSQL queries was performing rather... industrially.

A primary-key lookup on a 10-gigabyte table containing roughly 15 million rows was feeding their graph editor. For accounts with approximately 11,000 tag keys, this lookup took 22 seconds. Twenty-two seconds for what should have been an indexed read.

The query was using ANY(ARRAY[...]) to match against a set of values. PostgreSQL's query planner, faced with this syntax, chose a Bitmap Heap Scan — a reasonable strategy in general, but catastrophic for this particular data distribution. The fix was changing ANY(ARRAY[...]) to ANY(VALUES(...)). A single syntactic alteration that changed the query plan entirely.

Before: 22,000 milliseconds. After: 200 milliseconds.

The database was not slow. It was being asked poorly. And I should note that Datadog's engineers are not amateurs — they build performance monitoring tools for a living. If their queries can hide a 100x improvement behind a syntax choice, so can yours.
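
The shape of the rewrite, with a hypothetical table standing in for Datadog's actual schema:

```sql
-- Before: the planner treats the array as one opaque condition and,
-- for large lists, may choose a Bitmap Heap Scan.
SELECT * FROM tags
WHERE id = ANY (ARRAY[1, 2, 3]);

-- After: VALUES is a small relation the planner can join against,
-- typically as a nested loop of individual index lookups.
SELECT * FROM tags
WHERE id = ANY (VALUES (1), (2), (3));
```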

The misinformed planner

Spacelift, an infrastructure-as-code platform, ran into a problem familiar to anyone who has worked with Aurora Serverless PostgreSQL. A monitoring query was choosing what I can only describe as a catastrophic execution plan — scanning an enormous table first, then filtering the results, when the reverse order would have been dramatically more efficient.

The root cause: PostgreSQL's row count estimation was wildly inaccurate. The planner believed the large table scan would return a small number of rows. It did not.

The fix required no new indexes, no schema changes, and no additional infrastructure. The engineers rewrote the query to scan the small active set first, then perform primary key lookups against the larger table. The planner, given honest information about the work involved, made a sensible plan.

Before: approximately 20 seconds. After: approximately 100 milliseconds. A 200x improvement.

The planner is a remarkably sophisticated instrument. It does, however, require honest information.

The unclear instructions

The collaboration platform Mattermost needed to batch-index approximately 100 million posts for Elasticsearch. The indexing query was designed to walk through the dataset in chunks, advancing a cursor on each batch. It worked beautifully for the first few million rows, then slowed progressively until it hit the 30-second timeout and failed entirely.

The culprit was an OR condition in the WHERE clause: CreateAt > ?1 OR (CreateAt = ?1 AND Id > ?2). This is a common pattern for cursor-based pagination. It is also a pattern that prevents PostgreSQL from using a composite index effectively, because OR conditions force the planner to evaluate each branch independently.

The fix was replacing the OR with PostgreSQL's row constructor comparison: (CreateAt, Id) > (?1, ?2). Semantically identical. Mechanically transformative. The database went from roughly 40,000 shared buffer hits — roughly 312 megabytes of data — per execution, to 9,955 buffer hits, approximately 78 megabytes.

Before: 30-second timeout, unable to complete. After: 34 milliseconds. Full indexing that couldn't finish in 18 hours completed in minutes.
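
In sketch form, with a hypothetical posts table and a composite index on (createat, id) assumed:

```sql
-- Before: the OR forces the planner to evaluate each branch separately,
-- defeating the composite index as the cursor advances.
SELECT * FROM posts
WHERE createat > $1 OR (createat = $1 AND id > $2)
ORDER BY createat, id
LIMIT 1000;

-- After: a single range condition the composite index satisfies directly.
SELECT * FROM posts
WHERE (createat, id) > ($1, $2)
ORDER BY createat, id
LIMIT 1000;
```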

One does not blame the staff when the instructions were unclear.

The pattern

I could continue. A Rails developer who discovered that materialized views made their dashboard 9,000 times faster — from 3.4 seconds to 368 microseconds for a category revenue query. An enterprise team that reduced a 28-second reporting query to 180 milliseconds. A Spring Boot application serving 500 million orders that went from 7.2-second response times to 6 milliseconds.

But the individual cases matter less than what they share. Every one of these improvements was achieved without new hardware, without new services, and without architectural rewrites. They were achieved by understanding how PostgreSQL works — how the query planner makes decisions, where indexes are needed, how to write queries the planner can optimize, and how to use features like materialized views that most developers have never been introduced to.

PostgreSQL's own wiki observes, with characteristic understatement, that in any given week, half of all support questions are requests for help with a slow query.

Half. Every week.

The problems are not rare. The solutions are not exotic. The gap between the two is, if you'll forgive the observation, simply a matter of introduction.

The Matter of Your Infrastructure

Having established that individual queries can be improved by orders of magnitude, allow me to address a broader concern — one that I find rather more troubling than any single slow query.

I should like you to consider, for a moment, not just your database, but everything you have assembled around it. The services, the dependencies, the constellation of technologies that your team deploys, monitors, and maintains. I suspect, if you were to take a careful inventory, you would find that several of those services are solving problems your database already handles — if you know where to look.

That inventory, and its implications, is the subject of the next chapter. I intend to be thorough in my accounting — the dollar costs, the engineering hours, the operational surprises — and I intend to be fair. But I will not pretend the conclusion is uncertain.

The PostgreSQL You Haven't Met

If you'll permit me a brief tour, I should like to introduce you to several features of your database that I suspect have not been part of your acquaintance. This is not an exhaustive catalogue — merely a selection of capabilities that, in my experience, most application developers discover with some surprise and occasionally mild indignation that nobody mentioned them sooner.

Materialized views store the results of a query as a physical table, served at index-scan speed until explicitly refreshed. They transform multi-second aggregation queries — the kind that power dashboards, reports, and analytics endpoints — into sub-millisecond reads. One developer, after applying them to a Rails application analyzing millions of records, reported daily sales queries dropping from 7.1 seconds to 7 milliseconds. Category revenue queries: 3.4 seconds to 368 microseconds. Chapters 4 and 5 are devoted entirely to this feature, for reasons that those numbers should make self-evident.
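
A minimal sketch, with invented table and column names:

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT created_at::date AS day,
       sum(total_cents) AS revenue_cents,
       count(*)         AS order_count
FROM orders
GROUP BY 1;

-- A unique index permits REFRESH ... CONCURRENTLY,
-- which rebuilds the view without blocking readers.
CREATE UNIQUE INDEX ON daily_sales (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```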

LISTEN/NOTIFY provides built-in publish/subscribe messaging with payloads up to 8,000 bytes. It is not as fast as Redis pub/sub for high-throughput scenarios, but it eliminates the need for a separate message broker in most application architectures. Rails ActionCable supports it natively as a Redis alternative.
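
The mechanics, in their entirety (channel name and payload are of course illustrative):

```sql
-- Subscriber session:
LISTEN order_events;

-- Publisher session; the payload is plain text, up to roughly 8,000 bytes:
NOTIFY order_events, '{"order_id": 42, "status": "shipped"}';

-- The function form accepts parameters, convenient from application code:
SELECT pg_notify('order_events', '{"order_id": 42, "status": "shipped"}');
```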

UNLOGGED tables bypass the Write-Ahead Log, achieving 2 to 3 times faster write performance — and up to 6.5 times faster for bulk inserts. Data persists across clean restarts but is truncated after crashes. This is exactly the durability profile you want for cache data and session storage: fast writes, acceptable loss on failure.
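
A sketch of cache-shaped storage, names invented:

```sql
-- Writes skip the WAL; contents survive a clean restart but are
-- truncated after a crash. Acceptable for a cache, never for money.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    jsonb NOT NULL,
    expires_at timestamptz NOT NULL
);

INSERT INTO session_cache (session_id, payload, expires_at)
VALUES ('abc123', '{"user_id": 7}', now() + interval '30 minutes')
ON CONFLICT (session_id) DO UPDATE
SET payload = EXCLUDED.payload, expires_at = EXCLUDED.expires_at;
```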

pg_cron provides standard cron scheduling that runs SQL directly inside the database. Supported on AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL. It pairs naturally with materialized views — a single line schedules automatic refresh at whatever interval your data requires.
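
That single line, sketched; the job name and the daily_sales view are assumptions of mine:

```sql
-- Requires pg_cron in shared_preload_libraries, then:
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Standard cron syntax: refresh every five minutes.
SELECT cron.schedule(
    'refresh-daily-sales',
    '*/5 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales$$
);
```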

Advisory locks replace Redis distributed locks, ZooKeeper, and Consul for application-level coordination. Identified by 64-bit integers, faster than row-level locks, with no table bloat. They are automatically released when sessions end — no orphaned locks, no manual cleanup.
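
Usage, in sketch form; the key 42 is arbitrary, chosen and namespaced by your application:

```sql
-- Non-blocking attempt: returns true if this session now holds the lock.
SELECT pg_try_advisory_lock(42);

-- ... perform the singleton work only if true ...

-- Release explicitly, or simply disconnect: session-level advisory
-- locks are freed automatically when the session ends.
SELECT pg_advisory_unlock(42);
```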

Full-text search has been built into PostgreSQL core since 2008. With GIN indexes, it handles 5 to 10 milliseconds per query on 100,000 documents. The pg_search extension, built on the Tantivy engine, delivers performance competitive with Elasticsearch on datasets of 10 million rows. For most applications, a dedicated search service is unnecessary overhead.
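
A sketch against a hypothetical documents table:

```sql
-- GIN index over a tsvector expression; queries must use the
-- same expression for the index to apply.
CREATE INDEX idx_documents_fts ON documents
USING gin (to_tsvector('english', title || ' ' || body));

-- websearch_to_tsquery accepts ordinary search-box syntax.
SELECT id, title
FROM documents
WHERE to_tsvector('english', title || ' ' || body)
      @@ websearch_to_tsquery('english', 'connection pooling');
```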

JSONB provides document storage with GIN indexes, containment queries, path operations, and full CRUD — available since 2014. It has, as one engineer observed, quietly ended a thousand "should we add MongoDB?" conversations.
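
The flavor of it, with an invented events table:

```sql
CREATE TABLE events (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL
);
CREATE INDEX idx_events_data ON events USING gin (data);

-- Containment (@>) is served by the GIN index.
SELECT id FROM events
WHERE data @> '{"type": "signup", "plan": "pro"}';

-- Path extraction: ->> yields text, #>> walks a nested path.
SELECT data->>'type' AS type, data#>>'{user,email}' AS email FROM events;
```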

pg_stat_statements tracks planning and execution statistics for every SQL statement your application runs. It is the single most important diagnostic tool for PostgreSQL performance, and it is an extension you can enable with one line of configuration.

Table partitioning, declarative since PostgreSQL 10 and significantly improved through PostgreSQL 18, replaces manual sharding and archiving for large datasets. Partition pruning automatically eliminates irrelevant data from queries.
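
Declarative range partitioning, sketched with invented names:

```sql
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    device_id   bigint NOT NULL,
    value       double precision
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2026_03 PARTITION OF measurements
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- A WHERE clause on recorded_at now prunes every other partition.
```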

Generated columns provide stored computed fields that can be indexed — replacing application-layer computed fields and many uses of triggers.
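
For instance (the table and the tax rate are mine, purely illustrative):

```sql
CREATE TABLE products (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    price_cents integer NOT NULL,
    -- Computed on every write, stored on disk, and indexable:
    tax_cents   integer GENERATED ALWAYS AS ((price_cents * 21) / 100) STORED
);
CREATE INDEX ON products (tax_cents);
```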

That is ten features. Ten features that most developers working with PostgreSQL today have either never used or never heard of. Each one replaces or reduces the need for external infrastructure. Several of them, used together, replace entire categories of services.

Why don't developers know about these? Five reinforcing factors: ORM abstractions that teach developers to treat PostgreSQL as dumb storage. Marketing from specialized database vendors promoting the "right tool for the right job" philosophy. A tutorial ecosystem that follows the PostgreSQL-plus-Redis-plus-Elasticsearch pattern without questioning its necessity. An education system where bootcamps and introductory courses teach basic CRUD and nothing else. And a perception that features like advisory locks and UNLOGGED tables are "DBA-level" concerns rather than application developer tools.

It is not your fault. But it is, if you'll permit me, now your responsibility.

By the Numbers: What PostgreSQL Can Actually Do

32,000 TPS per core — Read-only throughput on tuned PostgreSQL (CYBERTEC benchmarks)

598,000 TPS — Maximum read-only throughput on a 16-core AMD 5950X workstation

800 million users — Served by a single PostgreSQL primary at OpenAI, with ~50 read replicas

0.016 ms — Actual execution time for an indexed read (97% of response time is network)

1,000x faster — Rails dashboard query with materialized views (7,100ms → 7ms)

9,252x faster — Category revenue query with materialized views (3,400ms → 0.368ms)

155x faster — Enterprise reporting dashboard with materialized views (28,000ms → 180ms)

20 million jobs/day — Processed by 37signals without Redis, using PostgreSQL-backed SolidQueue

Questions You Ought to Be Asking

Before we proceed further into the manor, allow me to suggest several questions that the preceding material should have raised. If you find yourself able to answer them all with confidence, you are better acquainted with your database than most, and the subsequent chapters will serve as refinement rather than revelation. If not — and I say this with nothing but warmth — the investment of your time in this book will repay itself rather quickly.

How do I know if my PostgreSQL queries are underperforming?

Enable pg_stat_statements if you have not already. It requires adding one line to your PostgreSQL configuration and restarting the server. Once enabled, query the extension sorted by total_exec_time descending. The top five queries by cumulative execution time are where your attention belongs. If any single query execution exceeds 100 milliseconds in an OLTP workload, there is very likely room for improvement — and the improvement is very likely significant.
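
The enabling line and the query, sketched; the column names are as PostgreSQL 13 and later report them:

```sql
-- In postgresql.conf (server restart required):
--   shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- The five statements consuming the most cumulative execution time.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```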

Should I be using materialized views?

If you have any query that runs repeatedly, takes more than 50 milliseconds to execute, and returns data that need not be fresh to the second — yes. This describes every dashboard, every reporting endpoint, every analytics summary, and most API responses that aggregate data from multiple tables. Chapter 4 will teach you everything you need to know. Chapter 5 will teach you how to keep them fresh without thinking about it.

Do I actually need Redis?

If you are using Redis primarily for caching database query results, session storage, or background job queuing, PostgreSQL can very likely handle all three — and do so without the operational overhead, the additional failure modes, or the cache invalidation bugs that come with maintaining a separate service. If you are using Redis for sub-millisecond lookups on a hot path serving tens of thousands of concurrent requests, keep it. Chapter 2 covers this decision framework in detail, with specific numbers to inform your judgment.

Where do I start if I cannot rewrite my entire application?

You need not rewrite anything. Materialized views, connection pooling adjustments, and index optimization work alongside your existing code with zero application changes. The improvements are additive, not architectural. Your ORM continues to work. Your application logic remains untouched. The database simply performs better underneath it.

Is this book going to try to sell me something?

Every technique in this book works with raw PostgreSQL and your framework of choice. We do build a tool that automates several of these patterns — it would be disingenuous to pretend otherwise, and disingenuousness is not a quality I cultivate. But every chapter stands entirely on its own without it. The Waiter's commitment, if I may be so direct: useful first. Always.

Now. I trust we've established that your database is rather more capable than you've been led to believe, and that the infrastructure assembled around it may not be earning its keep. In the next chapter, I should like to examine that infrastructure more closely — the Redis instance, the caching layer, the additional services — and ask, with all due courtesy, whether each one truly belongs in your household.

If you'll follow me, I believe you'll find the conversation illuminating.

If occasionally uncomfortable.