# Automatic Index Creation Benchmarks
From sequential scan to sorted. Five index strategies, five query patterns, and the numbers to show for it.
## Methodology
These benchmarks measure the performance impact of Gold Lapel's automatic index creation across five common query patterns. The baseline dataset has no secondary indexes — only primary keys. Gold Lapel observes query patterns during a warmup phase, creates indexes using CREATE INDEX CONCURRENTLY, and the benchmark measures steady-state performance after index creation.
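To make the warmup step concrete, here is an illustrative sketch of how an observed filter pattern maps to a `CREATE INDEX CONCURRENTLY` statement. This is not Gold Lapel's actual code, and the naming scheme and helper function are assumptions for illustration only; the real pattern detection is internal to the tool.

```python
# Illustrative only: mapping an observed filter pattern to an index
# statement. The gl_idx_ naming scheme here is a hypothetical stand-in.
def index_statement(table: str, columns: list[str]) -> str:
    """Build a CREATE INDEX CONCURRENTLY statement for an observed pattern."""
    # Drop sort modifiers like "DESC" when building the index name.
    name = f"gl_idx_{table}_" + "_".join(c.split()[0] for c in columns)
    return (f"CREATE INDEX CONCURRENTLY {name} "
            f"ON {table} ({', '.join(columns)});")

# The composite user_id + created_at pattern from the EXPLAIN example:
print(index_statement("events", ["user_id", "created_at"]))
# CREATE INDEX CONCURRENTLY gl_idx_events_user_id_created_at ON events (user_id, created_at);
```

The CONCURRENTLY keyword is the important part: it lets the build proceed without blocking writes, which is what makes automatic creation safe on a live table.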
| Parameter | Value |
|---|---|
| PostgreSQL | 16.2 |
| Gold Lapel | 0.1.0 |
| CPU | 4 vCPUs, AMD EPYC (c5a.xlarge) |
| RAM | 8 GB |
| Storage | gp3 SSD, 3000 IOPS |
| Dataset | 5M events, 200K users, 50K sessions |
| Concurrent clients | 16 |
| Duration | 5 minutes per test |
## What the EXPLAIN looks like
The most telling measure of index effectiveness is the EXPLAIN plan transition: from sequential scan (reading the entire table) to index scan (reading only matching rows).
```sql
-- Before: sequential scan over 5M rows
EXPLAIN ANALYZE
SELECT * FROM events WHERE user_id = 42 AND created_at > '2025-01-01';

--                     QUERY PLAN
-- -------------------------------------------------------
-- Seq Scan on events  (cost=0.00..198234.00 rows=847 ...)
--   (actual time=234.112..567.891 rows=823 loops=1)
--   Filter: ((user_id = 42) AND (created_at > ...))
--   Rows Removed by Filter: 4999177
-- Planning Time: 0.089 ms
-- Execution Time: 567.934 ms
```
```sql
-- After: Gold Lapel creates the composite index
-- CREATE INDEX CONCURRENTLY gl_idx_events_user_created
--   ON events (user_id, created_at);

--                     QUERY PLAN
-- -------------------------------------------------------
-- Index Scan using gl_idx_events_user_created on events
--   (cost=0.43..42.18 rows=847 ...)
--   (actual time=0.034..4.812 rows=823 loops=1)
--   Index Cond: ((user_id = 42) AND (created_at > ...))
-- Planning Time: 0.112 ms
-- Execution Time: 4.847 ms
```

## Results
| Query pattern | Index created | Without index (P50) | With index (P50) | Improvement |
|---|---|---|---|---|
| Single-column equality (status filter) | B-tree on status | 342ms | 2.1ms | 163x |
| Composite equality + range (user + date) | B-tree on (user_id, created_at) | 567ms | 4.8ms | 118x |
| Partial index (active records only) | B-tree on status WHERE active | 342ms | 0.9ms | 380x |
| Text prefix search (email LIKE) | B-tree on email varchar_pattern_ops | 891ms | 1.4ms | 636x |
| Multi-column sort (ORDER BY region, revenue DESC) | B-tree on (region, revenue DESC) | 234ms | 3.2ms | 73x |
The partial index shows the highest improvement (380x) because it indexes only 6% of rows (active records), making the index 17x smaller and faster to scan than a full B-tree on the same column.
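The arithmetic behind that claim follows directly from the row counts and index sizes reported in the tables in this article; a quick back-of-envelope check:

```python
# Back-of-envelope check of the partial-index numbers above.
total_rows = 5_000_000        # events table
active_rows = 300_000         # rows matching the partial-index predicate
full_index_mb = 107           # single-column B-tree on status
partial_index_mb = 6.4        # partial B-tree on status WHERE active

fraction_indexed = active_rows / total_rows
size_ratio = full_index_mb / partial_index_mb

print(f"fraction of rows indexed: {fraction_indexed:.0%}")   # 6%
print(f"full index is {size_ratio:.0f}x larger")             # 17x
```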
## Analysis
The improvement factor varies by pattern because it depends on selectivity — how many rows the query actually needs out of the total table size.
- High-selectivity filters (equality on a specific user, text prefix match) show the largest improvements because the index eliminates the most rows. The text prefix pattern improves 636x because the LIKE query without an index must scan every row's email column.
- Composite indexes show moderate improvements (73x–118x). For the sort pattern, the index satisfies both the filter and the ORDER BY, eliminating a separate Sort node in the query plan; for the equality-plus-range pattern, it narrows the scan to a single user's date range.
- Partial indexes combine selectivity with a smaller index size. When 94% of rows are in a terminal state, indexing only the active 6% produces an index that fits in a single index page — dramatically faster than scanning the full B-tree.
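Selectivity can be read straight off the EXPLAIN output earlier in the article. A rough model, ignoring I/O and planner overhead:

```python
# Rough selectivity model for the composite user_id + date query.
table_rows = 5_000_000
matching_rows = 823          # "rows=823" in the EXPLAIN output

selectivity = matching_rows / table_rows
rows_skipped = table_rows - matching_rows

print(f"selectivity: {selectivity:.4%}")          # 0.0165% of the table
print(f"rows the index skips: {rows_skipped:,}")  # 4,999,177
```

The skipped-row count matches the "Rows Removed by Filter" line in the sequential-scan plan: every one of those rows was read and discarded before the index existed.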
## Index creation overhead
Gold Lapel creates all indexes using CONCURRENTLY, which does not lock the table for writes during creation. Index build times for this dataset:
| Index type | Build time | Index size |
|---|---|---|
| Single-column B-tree (5M rows) | 3.2s | 107 MB |
| Composite B-tree (5M rows) | 4.8s | 172 MB |
| Partial B-tree (300K rows) | 0.4s | 6.4 MB |
| varchar_pattern_ops (5M rows) | 5.1s | 214 MB |
The one-time build cost is amortized across every subsequent query. For a pattern executing 100 times per hour, a 3-5 second build is paid back after roughly ten queries, about six minutes at that rate.
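To make the amortization concrete, a quick break-even estimate using the single-column numbers from the results tables:

```python
# Break-even estimate for the single-column status index.
build_seconds = 3.2        # one-time CONCURRENTLY build
p50_without = 0.342        # seconds per query, no index
p50_with = 0.0021          # seconds per query, with index
queries_per_hour = 100

saved_per_query = p50_without - p50_with
queries_to_break_even = build_seconds / saved_per_query
minutes_to_break_even = queries_to_break_even / queries_per_hour * 60

print(f"break-even after ~{queries_to_break_even:.0f} queries "
      f"(~{minutes_to_break_even:.0f} minutes at {queries_per_hour}/hour)")
# break-even after ~9 queries (~6 minutes at 100/hour)
```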
## How to reproduce
```bash
# Clone the benchmark suite
git clone https://github.com/goldlapel/benchmarks
cd benchmarks/index-auto-creation

# Start the test environment
docker compose up -d

# Seed the dataset (no indexes beyond primary keys)
./seed.sh --no-indexes

# Run the baseline
./run.sh --mode baseline --clients 16 --duration 300

# Run with Gold Lapel (indexes created during warmup)
./run.sh --mode goldlapel --clients 16 --duration 300

# Compare
./compare.sh results/baseline.json results/goldlapel.json
```

The seed script creates tables with no secondary indexes. Gold Lapel's warmup phase detects patterns and creates indexes before the timed benchmark begins.
If the index strategies behind these numbers interest you, I have written a comprehensive guide to PostgreSQL index types — B-tree, GIN, GiST, BRIN, and hash — that explains when each is selected and why the performance differences are as dramatic as you have just seen.