200 Tenants, One Postgres Instance, and a Rather Tidy Arrangement
A vertical SaaS for property management, shared-schema multi-tenancy, and the noisy-neighbor problem that every multi-tenant database eventually encounters.
The challenge
The platform — a property management SaaS serving 200 tenants — uses shared-schema multi-tenancy. All tenants share the same PostgreSQL tables, isolated by a tenant_id column and row-level security policies. One PostgreSQL instance, one schema, 200 tenants ranging from a 10-unit apartment building to a 4,000-unit commercial portfolio.
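Shared-schema isolation of this kind is typically enforced with a row-level security policy along these lines (a sketch; the actual table, setting, and policy names are not given in the source):

```sql
-- Hypothetical tenant-owned table; every shared table carries tenant_id.
ALTER TABLE work_orders ENABLE ROW LEVEL SECURITY;

-- The app sets the current tenant per connection or transaction, e.g.:
--   SET app.tenant_id = '42';
CREATE POLICY tenant_isolation ON work_orders
  USING (tenant_id = current_setting('app.tenant_id')::bigint);
```

With the policy in place, every query against the table is implicitly filtered to the current tenant, so application code never needs to remember the `WHERE tenant_id = ...` clause.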
The noisy-neighbor problem: the five largest tenants — each with 2,000-4,000 properties — generated query volumes and data sizes that were 50-100x larger than the median tenant. When a large tenant's property manager ran a portfolio-wide maintenance report, the aggregation query consumed enough database resources to slow down every other tenant's experience.
The symptoms were intermittent but severe. Three to four times per day, P95 latency across all tenants would spike from 40ms to 340ms for 2-5 minutes. Customer support received complaints from small tenants who were affected by queries they had nothing to do with.
The team (six engineers, Node.js with Prisma) considered three approaches: shard the large tenants onto separate instances (complex, expensive), add read replicas and route large queries there (moderately complex), or optimize the expensive query patterns so they stopped being noisy.
The setup
Gold Lapel was deployed as a standalone service in the existing Docker Compose infrastructure. The application's Prisma connection string was pointed at the proxy.
# Gold Lapel sits between the app and Postgres
# Tenant routing is handled by the app — GL is tenant-agnostic
goldlapel:
  image: goldlapel/goldlapel:latest
  command: >
    --upstream 'postgresql://platform:password@db:5432/platform_prod'
    --pattern-ttl 3600
  ports:
    - "7932:7932"

Gold Lapel does not need to understand multi-tenancy — it operates at the query pattern level. A slow query is a slow query regardless of which tenant generated it. The tenant_id filter in the WHERE clause is simply part of the pattern signature.
What Gold Lapel found
The noisy-neighbor spikes traced to four query patterns, all generated by the same reporting module:
- Portfolio maintenance summary. A six-table join aggregating open work orders, costs, and completion rates across all properties for a tenant. For a 4,000-property tenant, this scanned 2.3 million work order rows. Execution time: 2.8 seconds.
- Occupancy rate calculation. A three-table join computing vacancy rates by property, unit type, and month. For large tenants, this aggregated across 15,000 lease records. Execution time: 1.4 seconds.
- Rent roll aggregation. A four-table join computing expected vs actual rent collected by property and month. Execution time: 1.1 seconds for large tenants.
- Missing partial indexes. The work_orders table had 8 million rows, but 94% had status = 'completed'. Queries filtering for open work orders (WHERE status IN ('open', 'in_progress')) were scanning the full index. A partial index on the active statuses would be 17x smaller.
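A partial index of the kind described might look like this (index and column names are illustrative; the 17x size reduction follows from only ~6% of rows being in an active status):

```sql
-- The full index covers all 8M rows; 94% are terminal ('completed').
-- Restricting the index to active statuses keeps only ~6% of entries.
CREATE INDEX CONCURRENTLY work_orders_active_idx
  ON work_orders (tenant_id, property_id)
  WHERE status IN ('open', 'in_progress');

-- Matched by queries such as:
--   SELECT * FROM work_orders
--   WHERE tenant_id = $1 AND status IN ('open', 'in_progress');
```

The planner uses a partial index only when the query's WHERE clause provably implies the index predicate, which is why the application's status filter must match the statuses listed in the index definition.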
Gold Lapel created three materialized views for the reporting aggregations and two partial indexes. The materialized views were keyed by tenant: each view held precomputed rows for all tenants, filtered to a single tenant at query time. Refreshes were triggered only by writes to the specific tables involved, not on a fixed schedule.
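A tenant-keyed materialized view for the maintenance summary could be sketched as follows (the source does not show the actual DDL, so the column names and aggregation details are assumptions):

```sql
-- One view holds precomputed rows for all tenants;
-- consumers filter by tenant_id at query time.
CREATE MATERIALIZED VIEW maintenance_summary AS
SELECT tenant_id,
       property_id,
       count(*) FILTER (WHERE status <> 'completed') AS open_work_orders,
       sum(cost)                                     AS total_cost,
       avg((status = 'completed')::int)              AS completion_rate
FROM work_orders
GROUP BY tenant_id, property_id;

-- A unique index is required for REFRESH ... CONCURRENTLY,
-- which rebuilds the view without blocking readers.
CREATE UNIQUE INDEX ON maintenance_summary (tenant_id, property_id);

REFRESH MATERIALIZED VIEW CONCURRENTLY maintenance_summary;
```

The 2.8-second aggregation then becomes an indexed lookup of the form `SELECT ... FROM maintenance_summary WHERE tenant_id = $1`, which is the same cost for a 4,000-property tenant as for a 10-unit one.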
The results
| Query pattern | Before (large tenant) | After | Improvement |
|---|---|---|---|
| Maintenance summary | 2.8s | 18ms | 99.4% |
| Occupancy rate | 1.4s | 9ms | 99.4% |
| Rent roll | 1.1s | 12ms | 98.9% |
| Open work order lookups | 180ms | 3ms | 98.3% |
Cross-tenant impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| P95 latency (all tenants) | 340ms (during spikes) | 12ms | 96% |
| Noisy-neighbor spikes/day | 3-4 episodes | 0 | 100% |
| Database CPU (peak) | 91% | 34% | 63% |
The noisy-neighbor spikes disappeared entirely. The large tenants' reporting queries no longer consumed enough resources to affect other tenants' experience.
Why sharding was not necessary
The team had been evaluating tenant-level sharding — moving the five largest tenants onto their own PostgreSQL instances. This would have required connection routing logic, cross-shard query handling for admin tools, separate backup and migration workflows, and five additional RDS instances (approximately $16,500/month in additional infrastructure).
The noisy-neighbor problem turned out to be a query efficiency problem rather than a data volume problem. The database had sufficient capacity. A handful of expensive aggregations were consuming disproportionate resources during execution. Once those aggregations were precomputed as materialized views, the large tenants' queries were no more expensive than the small tenants' queries.
Sharding addresses data volume challenges. Optimization addresses query efficiency. Understanding which problem you are facing helps determine the right approach.
Key takeaways
- Multi-tenant databases amplify query problems. A slow query in a single-tenant system affects one customer. A slow query in a shared-schema system affects every customer. The incentive to optimize is proportional to the tenant count.
- Partial indexes are underused in multi-tenant systems. When 94% of rows are in a terminal state, a full index is 17x larger than necessary. Partial indexes on active-status rows are dramatically smaller and faster.
- The noisy-neighbor problem is often a query problem rather than a capacity problem. Adding more hardware distributes the expensive queries across more cores but does not make them cheaper. Optimizing the queries makes them efficient enough that distribution is unnecessary.
- Sharding is the right answer — eventually. At some point, data volume will genuinely exceed what a single instance can serve. Optimization extends that runway, often further than expected.
Terms referenced in this article
The connection pooling that made 200 tenants viable on one instance is a subject that repays further study. The PostgreSQL connection pooler comparison examines PgBouncer, Pgpool-II, PgCat, and Odyssey — each of which handles the multi-tenant connection pattern differently.