
Join and Aggregation Query Benchmarks

The numbers, if you please. Multi-table joins with GROUP BY on a 10-million-row dataset, with and without materialized view optimization.

The Waiter of Gold Lapel · Published Mar 5, 2026 · Updated Mar 20, 2026
Four tables joined, two columns grouped, and one materialized view to serve them promptly.

Methodology

All benchmarks run on identical infrastructure with a fixed dataset. The "before" measurements use direct PostgreSQL connections. The "after" measurements use Gold Lapel as a transparent proxy, with pattern detection and materialized view creation already complete. Each test runs for 5 minutes at steady state with 16 concurrent clients.

Parameter             Value
PostgreSQL            16.2
Gold Lapel            0.1.0
CPU                   4-core AMD EPYC (c5.xlarge)
RAM                   8 GB
Storage               gp3 SSD, 3000 IOPS
Dataset               10M orders, 500K customers, 50K products
Concurrent clients    16
Duration              5 minutes per test
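The steady-state procedure can be sketched in a few lines: N client threads issue queries back-to-back until a deadline, and the harness reports the median latency. This is an illustrative Python skeleton, not the actual suite; run_query is a placeholder for a real PostgreSQL round trip.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_query():
    # Placeholder for one benchmark query; a real harness would execute
    # SQL against PostgreSQL here. Sleeping keeps the sketch self-contained.
    time.sleep(0.001)

def client_loop(deadline):
    # One client issues queries back-to-back until the deadline.
    latencies_ms = []
    while time.monotonic() < deadline:
        start = time.monotonic()
        run_query()
        latencies_ms.append((time.monotonic() - start) * 1000)
    return latencies_ms

def benchmark(clients=16, duration_s=300):
    deadline = time.monotonic() + duration_s
    with ThreadPoolExecutor(max_workers=clients) as pool:
        per_client = pool.map(client_loop, [deadline] * clients)
    samples = [ms for batch in per_client for ms in batch]
    return statistics.median(samples)  # P50 latency in milliseconds

print(f"P50: {benchmark(clients=4, duration_s=0.25):.2f} ms")
```

The real suite additionally discards a warm-up window before sampling; the shape of the measurement loop is the same.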

Test queries

The benchmark suite exercises five query patterns commonly found in analytics dashboards, reporting modules, and admin panels. Each joins at least two tables and applies an aggregation function.

-- The most expensive pattern: revenue by region and product category
-- Multi-table join, GROUP BY two columns, SUM aggregation
SELECT r.region_name, c.category_name, SUM(o.total) AS revenue
FROM orders o
  JOIN customers cu ON cu.id = o.customer_id
  JOIN regions r ON r.id = cu.region_id
  JOIN products p ON p.id = o.product_id
  JOIN categories c ON c.id = p.category_id
WHERE o.created_at >= NOW() - INTERVAL '90 days'
GROUP BY r.region_name, c.category_name
ORDER BY revenue DESC;

-- Without GL: Seq Scan on orders (10M rows), Hash Joins, Sort
-- Execution: 487ms

-- With GL: Index Scan on gl_matview_revenue_region_category
-- Execution: 1.2ms
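The plan comments above reference gl_matview_revenue_region_category. The post doesn't show the DDL Gold Lapel generates, but a hand-written equivalent of that view might look like the following sketch (illustrative, not the tool's actual output):

```sql
-- Illustrative DDL only; Gold Lapel creates and refreshes this automatically.
CREATE MATERIALIZED VIEW gl_matview_revenue_region_category AS
SELECT r.region_name, c.category_name, SUM(o.total) AS revenue
FROM orders o
  JOIN customers cu ON cu.id = o.customer_id
  JOIN regions r ON r.id = cu.region_id
  JOIN products p ON p.id = o.product_id
  JOIN categories c ON c.id = p.category_id
WHERE o.created_at >= NOW() - INTERVAL '90 days'
GROUP BY r.region_name, c.category_name;

-- An index turns the rewritten query into the index scan seen above.
CREATE INDEX ON gl_matview_revenue_region_category (region_name, category_name);
```

Note that NOW() is evaluated when the view is built or refreshed, so the 90-day window goes stale between refreshes; that staleness is part of the trade-off discussed in the companion piece on materialized view pitfalls.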

Results

Query pattern                                                  Without GL (P50)    With GL (P50)    Improvement
3-table join + GROUP BY (revenue by region)                    487ms               1.2ms            406x
2-table join + COUNT DISTINCT (active users by plan)           234ms               0.8ms            293x
Subquery + SUM (monthly cohort retention)                      612ms               1.5ms            408x
Self-join + window function (running totals)                   891ms               2.3ms            387x
4-table join + GROUP BY + HAVING (top customers by category)   1,240ms             2.8ms            443x

All times are P50 (median) latency. P99 improvements are more pronounced due to elimination of sequential scan variance. The 4-table join pattern shows the highest improvement factor because the original query is the most expensive.

Analysis

The improvement factor is consistent (293x–443x) because the underlying optimization is identical across all patterns: a multi-table join-and-aggregation query is replaced by a direct read from a precomputed materialized view.

The absolute "after" times (0.8–2.8ms) are dominated by three fixed costs: proxy wire protocol overhead (~0.3ms), materialized view index lookup (~0.2–1.0ms), and result serialization (~0.3–1.5ms depending on result set size). The complexity of the original query is irrelevant — the view has already done the work.

This means improvement factors scale with the complexity of the original query. A 100ms query becomes 1ms (100x). A 1,200ms query becomes 3ms (400x). The more expensive the raw query, the larger the speedup.
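The arithmetic is easy to check against the results table. Dividing each measured "before" P50 by its "after" P50 reproduces the published factors, and shows the factor tracking baseline cost while the "after" cost stays nearly flat:

```python
# Measured P50 pairs from the results table: pattern -> (before_ms, after_ms)
measured = {
    "3-table join + GROUP BY": (487, 1.2),
    "2-table join + COUNT DISTINCT": (234, 0.8),
    "subquery + SUM": (612, 1.5),
    "self-join + window function": (891, 2.3),
    "4-table join + GROUP BY + HAVING": (1240, 2.8),
}
for name, (before_ms, after_ms) in measured.items():
    # Improvement factor = baseline latency / near-constant view-read latency.
    print(f"{name}: {before_ms / after_ms:.0f}x")
```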

Throughput impact

At 16 concurrent clients, the database served 33 queries per second for the 3-table join pattern without Gold Lapel. With Gold Lapel, the same setup served 13,300 queries per second — a 403x throughput improvement. Database CPU utilization dropped from 94% to 8%.
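The throughput numbers follow from the latencies via Little's law: with each client issuing queries back-to-back, steady-state throughput is approximately concurrency divided by per-query latency.

```python
def steady_state_qps(clients, p50_latency_s):
    # Little's law approximation: throughput ~= concurrency / latency,
    # assuming clients issue queries back-to-back with no think time.
    return clients / p50_latency_s

print(round(steady_state_qps(16, 0.487)))   # baseline: ~33 qps
print(round(steady_state_qps(16, 0.0012)))  # with the view: ~13,333 qps
```

The small gap between the predicted ~13,333 qps and the measured 13,300 qps is within the proxy's scheduling overhead.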

The freed resources allow PostgreSQL to serve other workloads — writes, transactional queries, ad-hoc reports — without contention from expensive analytical queries.

How to reproduce

All benchmark code, data generators, and infrastructure definitions are open source.

# Clone the benchmark suite
git clone https://github.com/goldlapel/benchmarks
cd benchmarks/join-aggregation

# Start the test environment
docker compose up -d

# Seed the dataset (10M orders, ~2 minutes)
./seed.sh

# Run without Gold Lapel (baseline)
./run.sh --mode baseline --clients 16 --duration 300

# Run with Gold Lapel (wait 60s for pattern detection)
./run.sh --mode goldlapel --clients 16 --duration 300

# Compare results
./compare.sh results/baseline.json results/goldlapel.json

Absolute times vary with hardware, but relative improvement factors should be consistent across environments. The benchmark suite includes a compare.sh script that generates a side-by-side report.
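The comparison step itself is simple arithmetic. As a rough sketch, assuming each results file is a flat JSON map of pattern name to P50 milliseconds (an assumed schema, not necessarily what compare.sh emits), the report reduces to:

```python
import json

def improvement_factors(baseline, candidate):
    """Per-pattern speedup: baseline P50 divided by candidate P50."""
    return {name: baseline[name] / candidate[name] for name in baseline}

def compare_files(baseline_path, candidate_path):
    # Assumed schema per file: {"pattern name": p50_ms, ...}
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(candidate_path) as f:
        candidate = json.load(f)
    for name, factor in sorted(improvement_factors(baseline, candidate).items()):
        print(f"{name}: {factor:.0f}x faster")
```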

Further reading

The materialized view optimization at the heart of these benchmarks has a companion piece that addresses its long-term maintenance. I have written about materialized view pitfalls — refresh costs, bloat accumulation, and the concurrency traps that appear once your views are in production rather than in a benchmark.