You Don't Need Redis

Chapter 5: The Art of the Refresh

The Waiter of Gold Lapel · Published Mar 11, 2026 · Updated Mar 30, 2026 · 26 min

Every developer who learns about materialized views asks the same question within thirty seconds: "But how do I keep it up to date?"

It is the right question. A materialized view that is never refreshed is a snapshot of yesterday — useful for historical analysis, useless for a dashboard. The refresh strategy you choose determines the gap between what your data shows and what your data is. Get it right, and your application serves near-real-time data at index-scan speed. Get it wrong, and you've built a very fast mechanism for serving stale numbers with the confidence of fresh ones.

PostgreSQL, in its characteristic fashion, provides the mechanism and leaves the policy to you. There is no built-in auto-refresh. No scheduled maintenance. No event-driven trigger that fires when underlying data changes. You call REFRESH MATERIALIZED VIEW when you want it refreshed, and PostgreSQL does not presume to know when that should be.

This is the correct design philosophy for a database engine. It is also the reason most developers who try materialized views once either refresh too eagerly and waste resources, refresh too rarely and serve embarrassingly stale data, or — most commonly — set up a refresh schedule during development and never revisit whether it's appropriate for production.

This chapter covers seven approaches to materialized view refresh, presented from simplest to most sophisticated. Each has a clear use case. Each has trade-offs I shall describe honestly. When in doubt, start with the first one. Most applications never need to progress beyond it.

pg_cron: The Workhorse

pg_cron_setup.sql
CREATE EXTENSION IF NOT EXISTS pg_cron;

SELECT cron.schedule(
    'refresh-dashboard',
    '*/15 * * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard'
);

That is the entire implementation. Two statements, and your materialized view refreshes every 15 minutes for the rest of its operational life. No application code. No external scheduler. No daemon to configure on the host machine. No additional service to monitor.

pg_cron runs inside PostgreSQL itself — it is a background worker process that wakes on schedule, executes the specified SQL, and returns to sleep. It uses standard cron syntax: */15 * * * * means "every 15 minutes," 0 3 * * * means "daily at 3 AM," and 0 * * * * means "every hour on the hour." The minimum scheduling granularity is one minute.

pg_cron is supported on AWS RDS (PostgreSQL 12.5 and later), Google Cloud SQL, Azure Database for PostgreSQL, Supabase, and every self-managed PostgreSQL installation. If you are running PostgreSQL in production, you can almost certainly use pg_cron.

To monitor your refresh jobs:

monitor_jobs.sql
SELECT jobname, status, return_message,
       start_time, end_time,
       end_time - start_time AS duration
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;

This tells you, for each scheduled refresh: whether it succeeded, how long it took, and if it failed, why. The return_message column contains the error text on failure — invaluable for diagnosing timeout issues, lock contention, or out-of-memory conditions during refresh.

To list all scheduled jobs:

list_jobs.sql
SELECT jobid, jobname, schedule, command, active
FROM cron.job;

To temporarily suspend a job without deleting it:

suspend_job.sql
UPDATE cron.job SET active = false WHERE jobname = 'refresh-dashboard';

To remove a job entirely:

remove_job.sql
SELECT cron.unschedule('refresh-dashboard');

pg_cron is the correct choice for the majority of materialized view refresh needs. It requires no application code, no external dependencies, and no operational overhead beyond the initial three lines of SQL. If you are unsure which approach to use, use this one. Start at 15 minutes. Measure whether your users need fresher data. Adjust the interval accordingly. In my experience, most applications discover that 15-minute freshness is more than adequate for their actual requirements — which are, reliably, less demanding than their assumed requirements.

Application-Level Scheduling

When your refresh needs to integrate with application logic — conditional refresh based on business rules, retry with exponential backoff, error notification through your existing alerting pipeline, or coordination with deployment processes — application-level scheduling gives you programmatic control that pg_cron's SQL interface cannot provide.

The SQL is identical across every language. The scheduling mechanism varies. The retry and error handling are where application scheduling earns its keep.

Python / Celery Beat

tasks.py
from celery import shared_task
from django.db import connection

@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def refresh_dashboard(self):
    try:
        with connection.cursor() as cursor:
            cursor.execute(
                "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard"
            )
    except Exception as exc:
        self.retry(exc=exc)

# In your Celery Beat schedule:
CELERY_BEAT_SCHEDULE = {
    'refresh-dashboard': {
        'task': 'myapp.tasks.refresh_dashboard',
        'schedule': 900.0,  # every 15 minutes
    },
}

Three retries with 60-second delays. If all three fail, Celery's error handling surfaces the failure to your monitoring system. (Django's connection.cursor() operates in autocommit mode by default, which is why the code above works: REFRESH MATERIALIZED VIEW CONCURRENTLY cannot run inside a transaction block.) For SQLAlchemy applications, execute the statement on a connection with the AUTOCOMMIT isolation level — SQLAlchemy begins a transaction implicitly on an ordinary connection or session, and the concurrent refresh will refuse to run inside it.

Ruby / Sidekiq

refresh_dashboard_worker.rb
class RefreshDashboardWorker
  include Sidekiq::Worker
  sidekiq_options retry: 3, queue: :maintenance

  def perform
    ActiveRecord::Base.connection.execute(
      "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard"
    )
  end
end

Schedule it with sidekiq-cron. If your application uses GoodJob instead of Sidekiq, the implementation is even more fitting for this book's thesis:

refresh_dashboard_job.rb
class RefreshDashboardJob < ApplicationJob
  def perform
    ActiveRecord::Base.connection.execute(
      "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard"
    )
  end
end

# config/initializers/good_job.rb — GoodJob reads cron schedules
# from configuration, not from the job class:
Rails.application.configure do
  config.good_job.cron = {
    refresh_dashboard: {
      cron: "*/15 * * * *",
      class: "RefreshDashboardJob"
    }
  }
end

GoodJob is PostgreSQL-backed. It stores jobs in your existing database. No Redis. No additional infrastructure. Given the thesis of this book, I trust the appeal is self-evident.

Node.js / node-cron

refresh.js
import cron from 'node-cron';
import { pool } from './db';

cron.schedule('*/15 * * * *', async () => {
    try {
        await pool.query(
            'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard'
        );
        console.log('Dashboard MV refreshed');
    } catch (err) {
        console.error('Refresh failed:', err);
        // Route to your alerting pipeline
    }
});

I note that the alternative — BullMQ — requires Redis for its job queue. If your only use of Redis is powering the job queue that refreshes materialized views designed to eliminate Redis, you have constructed a dependency loop that node-cron or pg_cron would resolve more gracefully.

Java / Spring @Scheduled

MaterializedViewRefreshService.java
@Service
@Slf4j
public class MaterializedViewRefreshService {

    private final JdbcTemplate jdbcTemplate;

    public MaterializedViewRefreshService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Async
    @Scheduled(fixedRate = 900_000) // 15 minutes
    public void refreshDashboard() {
        try {
            jdbcTemplate.execute(
                "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard"
            );
            log.info("Dashboard MV refreshed");
        } catch (Exception e) {
            log.error("Dashboard MV refresh failed", e);
        }
    }
}

The @Async annotation is important — without it, the refresh blocks the scheduler thread. If the refresh takes 30 seconds, no other scheduled tasks run during that window. With @Async, the refresh executes on a separate thread from the task executor pool. Both capabilities must be switched on: annotate a configuration class with @EnableScheduling for @Scheduled and @EnableAsync for @Async, or neither annotation does anything.

Go / goroutine

refresh.go
func refreshLoop(ctx context.Context, pool *pgxpool.Pool, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            _, err := pool.Exec(ctx,
                "REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard")
            if err != nil {
                log.Printf("refresh failed: %v", err)
            }
        case <-ctx.Done():
            return
        }
    }
}

Context cancellation provides graceful shutdown — when your application receives SIGTERM, the context cancels, the goroutine exits, and no refresh is left mid-flight holding locks against your database. Start this with go refreshLoop(ctx, pool, 15*time.Minute) in your application's initialization.

The Flag Table Pattern: Refresh Only When Stale

A pg_cron job refreshing every 15 minutes is simple and effective. It is also indiscriminate — it refreshes whether or not the underlying data has changed. If your data changes once per hour, that's three unnecessary refreshes per hour, each consuming CPU, generating WAL, and — for CONCURRENTLY — creating dead tuples that VACUUM must eventually reclaim.

For materialized views with expensive refresh queries, this waste adds up. The flag table pattern eliminates it.

The concept: a lightweight trigger marks a flag when source data changes. A periodic job checks the flag and refreshes only when it's set. Three components, all straightforward.

The staleness table:

mv_staleness.sql
CREATE TABLE mv_staleness (
    view_name TEXT PRIMARY KEY,
    is_stale BOOLEAN NOT NULL DEFAULT FALSE,
    stale_since TIMESTAMPTZ
);

INSERT INTO mv_staleness VALUES ('mv_dashboard', FALSE, NULL);

The trigger on your source table:

mark_stale_trigger.sql
CREATE OR REPLACE FUNCTION mark_stale()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE mv_staleness
    SET is_stale = TRUE,
        stale_since = COALESCE(stale_since, NOW())
    WHERE view_name = 'mv_dashboard'
      AND is_stale = FALSE;
    RETURN NULL;
END $$;

CREATE TRIGGER trg_orders_stale
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE FUNCTION mark_stale();

The trigger fires per-STATEMENT, not per-row — a bulk INSERT of 10,000 rows fires the trigger once and updates a single row in a one-row table. The cost is negligible.

The conditional refresh job:

conditional_refresh.sql
-- REFRESH ... CONCURRENTLY cannot run inside a transaction block,
-- and a DO block always runs inside one. dblink issues the refresh
-- over a separate connection, where it executes at top level.
CREATE EXTENSION IF NOT EXISTS dblink;

SELECT cron.schedule('conditional-refresh', '* * * * *', $$
    DO $body$
    BEGIN
        IF (SELECT is_stale FROM mv_staleness
            WHERE view_name = 'mv_dashboard') THEN

            -- Local loopback connection; add credentials here if
            -- your pg_hba.conf requires them
            PERFORM dblink_exec(
                'dbname=' || current_database(),
                'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dashboard'
            );

            UPDATE mv_staleness
            SET is_stale = FALSE, stale_since = NULL
            WHERE view_name = 'mv_dashboard';
        END IF;
    END $body$;
$$);

The job checks every minute. If the flag is not set, it does nothing — a single SELECT on a one-row table, completing in microseconds. If the flag is set, it refreshes and resets. Your materialized view is refreshed within one minute of any data change, and never refreshed when no data has changed.

Use this pattern when refresh is expensive (more than 10 seconds), when data changes are infrequent relative to the acceptable staleness, or when you want to minimize WAL generation and dead tuple accumulation on frequently-refreshed views.

LISTEN/NOTIFY: Event-Driven Refresh

For applications that need near-real-time freshness without polling, PostgreSQL provides LISTEN/NOTIFY — a built-in publish/subscribe mechanism that most developers have never encountered.

Any database connection can listen on a named channel. Any connection can send a notification on that channel. Notifications are transactional — they are held until the sending transaction commits and discarded if it rolls back. Payloads can be up to 8,000 bytes. Latency is sub-millisecond on local connections.

The trigger:

notify_trigger.sql
CREATE OR REPLACE FUNCTION notify_refresh()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_notify('mv_refresh', 'mv_dashboard');
    RETURN NULL;
END $$;

CREATE TRIGGER trg_orders_notify
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE FUNCTION notify_refresh();

The listener with debouncing:

listener.py
import psycopg2
from psycopg2 import sql
import select
import time

conn = psycopg2.connect("dbname=myapp")
conn.autocommit = True  # notifications arrive between transactions; also keeps REFRESH at top level
cur = conn.cursor()
cur.execute("LISTEN mv_refresh;")

DEBOUNCE_SECONDS = 5
last_notify = 0
pending = set()

while True:
    # Non-blocking check for notifications
    if select.select([conn], [], [], 1.0) != ([], [], []):
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            pending.add(notify.payload)
            last_notify = time.time()

    # Refresh after activity settles
    if pending and time.time() - last_notify >= DEBOUNCE_SECONDS:
        for view_name in pending:
            try:
                cur.execute(
                    sql.SQL("REFRESH MATERIALIZED VIEW CONCURRENTLY {}")
                    .format(sql.Identifier(view_name))
                )
            except Exception as e:
                print(f"Refresh failed for {view_name}: {e}")
        pending.clear()

The debounce is essential. Without it, a bulk INSERT that fires the trigger once per statement is manageable. But an application that inserts orders one at a time — say, an e-commerce checkout processing 50 orders per minute — would fire 50 notifications per minute, each potentially triggering a refresh. With a 5-second debounce, the listener waits for write activity to settle, then refreshes once. The latency from data change to refreshed view is the debounce interval plus the refresh duration — typically 7 to 35 seconds, depending on your materialized view's complexity.

For Node.js applications, the pg library provides direct access:

listener.js
import { Client } from 'pg';
import { pool } from './db';

const listener = new Client('postgresql://localhost/myapp');
await listener.connect();
await listener.query('LISTEN mv_refresh');

const ALLOWED_VIEWS = new Set(['mv_dashboard', 'mv_daily_revenue']);
let timer = null;

listener.on('notification', (msg) => {
    // Allow-list the payload before interpolating it into SQL
    if (!ALLOWED_VIEWS.has(msg.payload)) return;
    if (timer) clearTimeout(timer);
    timer = setTimeout(async () => {
        try {
            await pool.query(
                `REFRESH MATERIALIZED VIEW CONCURRENTLY ${msg.payload}`
            );
        } catch (err) {
            console.error('Refresh failed:', err);
            // Route to your alerting pipeline
        }
    }, 5000);
});

LISTEN/NOTIFY has limitations you should know. It does not persist messages — if the listener is disconnected when a notification is sent, that notification is lost. There are no consumer groups, no message replay, no guaranteed delivery. For the specific use case of "tell me when data changes so I can refresh a materialized view," these limitations are irrelevant — the worst case is a missed refresh that the next notification or periodic fallback will catch. For guaranteed delivery of critical events, you want a proper message queue. For triggering materialized view refresh, LISTEN/NOTIFY is more than sufficient and requires zero additional infrastructure.
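The periodic fallback is worth making concrete. A sketch of the listener's decision logic, combining the debounce with a time-based fallback that covers notifications lost during a disconnect — the intervals are illustrative:

```python
def should_refresh(pending, last_notify, last_refresh, now,
                   debounce_s=5, fallback_s=900):
    """Decide whether the listener should refresh now.

    pending      -- True if any notification arrived since the last refresh
    last_notify  -- timestamp of the most recent notification
    last_refresh -- timestamp of the last completed refresh
    """
    # Normal path: notifications arrived and write activity has settled.
    if pending and now - last_notify >= debounce_s:
        return True
    # Fallback: refresh periodically even with no notifications,
    # covering any notification lost while the listener was down.
    return now - last_refresh >= fallback_s

# Debounce window still open: wait.
assert should_refresh(True, last_notify=100, last_refresh=0, now=103) is False
# Activity settled: refresh.
assert should_refresh(True, last_notify=100, last_refresh=0, now=106) is True
# No notifications, but the fallback interval elapsed: refresh anyway.
assert should_refresh(False, last_notify=0, last_refresh=0, now=901) is True
```

Call this once per iteration of the polling loop shown earlier, tracking last_refresh after each successful refresh.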

Trigger-Based Refresh: Usually a Terrible Idea

I include this approach for completeness and as a warning, because developers attempt it frequently and the consequences are severe.

The code looks reasonable:

do_not_use.sql
-- DO NOT USE THIS IN PRODUCTION
CREATE OR REPLACE FUNCTION refresh_on_change()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW mv_dashboard;
    RETURN NULL;
END $$;

CREATE TRIGGER trg_orders_refresh
    AFTER INSERT ON orders
    FOR EACH ROW EXECUTE FUNCTION refresh_on_change();

The REFRESH executes synchronously inside the triggering transaction. Every INSERT into the orders table waits for the entire materialized view to rebuild before the transaction commits. If the refresh takes 5 seconds and your application inserts 100 orders per minute, your insert throughput drops to 12 per minute. The application grinds to a halt within moments.

The secondary effects compound the damage. The REFRESH acquires a lock that blocks other operations. If the materialized view queries multiple source tables, the lock ordering creates deadlock risk. If the refresh fails for any reason — timeout, out of memory, division by zero in the aggregation — the triggering INSERT also fails, and the user's order is lost. Meanwhile, autovacuum cannot run on the materialized view while the refresh lock is held, so dead tuples accumulate. One production system reported dead tuples exceeding live tuples by a factor of 100 before manual intervention arrested the degradation.

If you want refresh in response to data changes, use the flag table pattern or LISTEN/NOTIFY. They achieve the same goal without the catastrophic side effects. The difference is one word: asynchronous. The trigger marks a flag or sends a notification. The refresh happens moments later, outside the triggering transaction, at a time and pace the database can accommodate.

pg_ivm: Incremental View Maintenance

Every approach described so far recomputes the entire materialized view on refresh. Even CONCURRENTLY, with its efficient diff algorithm, must execute the full defining query to produce the fresh dataset before comparing it to the old one.

The pg_ivm extension, maintained by SRA OSS, takes a fundamentally different approach. Instead of recomputing everything, it installs AFTER triggers with Transition Tables that capture the rows affected by each DML statement, compute only the incremental change to the materialized view, and apply that change immediately. An INSERT of one row into a 10-million-row source table updates the materialized view in milliseconds rather than the seconds or minutes a full refresh would require.

pg_ivm.sql
CREATE EXTENSION pg_ivm;

SELECT create_immv('mv_order_totals',
    'SELECT customer_id,
            SUM(total) AS lifetime_value,
            COUNT(*) AS order_count
     FROM orders
     GROUP BY customer_id'
);

-- Now auto-updates on every DML:
INSERT INTO orders (customer_id, total) VALUES (42, 99.95);
-- mv_order_totals is immediately updated — no REFRESH needed

The extension is at version 1.12 as of September 2025, compatible with PostgreSQL 13 through 18. It supports SELECT, WHERE, JOINs, DISTINCT, and common aggregate functions: SUM, COUNT, AVG, MIN, MAX. It has limited subquery support — simple FROM subqueries and WHERE EXISTS with AND conditions — but does not support window functions, CTEs, HAVING clauses, or set operations (UNION, INTERSECT, EXCEPT).

Cloud availability is the primary limitation. pg_ivm is not available on AWS RDS, Aurora, Google Cloud SQL, or Azure Database for PostgreSQL as of early 2026. It is available on BytePlus (TikTok's cloud platform) and on any self-managed PostgreSQL installation where you control the extensions.

A candid assessment: pg_ivm represents where PostgreSQL materialized views are heading. The trajectory from manual refresh to scheduled refresh to event-driven refresh to incremental maintenance is clear, and pg_ivm is the furthest point on that trajectory within the PostgreSQL ecosystem. For self-managed deployments with views that fit within its supported feature set — aggregation queries, join queries, straightforward WHERE clauses — it provides the closest thing to Oracle's incremental refresh that PostgreSQL offers.

For teams on managed cloud services, or with materialized views that require window functions or CTEs, it is not yet an option. The commercial alternative Epsio provides streaming incremental view maintenance as a service alongside PostgreSQL, if the requirement is urgent enough to justify a vendor dependency. Materialize and RisingWave offer streaming SQL as standalone databases, but they are separate systems — not PostgreSQL extensions — and introducing them adds the infrastructure complexity this book argues against.

Watch this space. The trajectory is clear.

Preventing Refresh Storms

When multiple processes attempt to refresh the same materialized view simultaneously — a cron job fires while a manual refresh is running, two application instances both schedule the same refresh, a developer runs a refresh from psql while pg_cron handles the scheduled one — PostgreSQL's locking prevents corruption but not waste. The second REFRESH blocks until the first completes, then refreshes again unnecessarily, doubling the resource consumption and lock duration.

Advisory locks prevent this with zero overhead:

safe_refresh.sql
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION safe_refresh(p_view TEXT)
RETURNS BOOLEAN LANGUAGE plpgsql AS $$
DECLARE
    lock_id BIGINT;
BEGIN
    -- Use the materialized view's OID as a unique lock key
    SELECT oid INTO lock_id
    FROM pg_class
    WHERE relname = p_view AND relkind = 'm';

    IF lock_id IS NULL THEN
        RAISE EXCEPTION 'no materialized view named %', p_view;
    END IF;

    -- Try to acquire the lock — returns immediately
    IF NOT pg_try_advisory_lock(lock_id) THEN
        RAISE NOTICE 'Refresh already in progress for %, skipping', p_view;
        RETURN FALSE;
    END IF;

    -- REFRESH ... CONCURRENTLY cannot run inside a transaction block,
    -- and a function always runs in one — so perform the refresh over
    -- a separate dblink connection, releasing the lock on failure
    BEGIN
        PERFORM dblink_exec(
            'dbname=' || current_database(),
            format('REFRESH MATERIALIZED VIEW CONCURRENTLY %I', p_view)
        );
    EXCEPTION WHEN OTHERS THEN
        PERFORM pg_advisory_unlock(lock_id);
        RAISE;
    END;

    -- Release the lock
    PERFORM pg_advisory_unlock(lock_id);
    RETURN TRUE;
END $$;

pg_try_advisory_lock returns immediately — TRUE if the lock was acquired, FALSE if another session holds it. No blocking. No waiting. The second process simply skips the refresh and moves on. Using the materialized view's OID from pg_class as the lock key guarantees uniqueness across all materialized views in the database.

Advisory locks create no table bloat (unlike row-level locks), require no cleanup, and are automatically released if the session terminates unexpectedly — a safety net against leaked locks from crashed processes.

Replace your direct REFRESH calls with SELECT safe_refresh('mv_dashboard') in pg_cron schedules, application tasks, and manual operations alike. The advisory lock ensures that no matter how many processes attempt a refresh, exactly one executes it.

The Anti-Patterns

Allow me to save you some grief.

Refreshing too frequently. If your underlying data changes once per hour and you refresh every minute, you execute 59 unnecessary refreshes per hour. Each consumes CPU, generates WAL entries that must be shipped to replicas, and — for CONCURRENTLY — creates dead tuples that VACUUM must reclaim. The waste is proportional to the ratio of unnecessary refreshes to necessary ones. Use the flag table pattern to refresh only when data has actually changed.

Not refreshing frequently enough. A materialized view refreshed once daily that powers a "real-time" dashboard is worse than no materialized view at all. It gives the illusion of current data while serving yesterday's numbers. Users make decisions based on stale information and don't know it. Match the refresh interval to the staleness your users can genuinely tolerate — and ask them, because their tolerance is usually more generous than developers assume but less generous than "once a day."

Forgetting the UNIQUE index for CONCURRENTLY. The error message is clear: cannot refresh materialized view concurrently with a hint to create a unique index. Yet it surprises developers every time, usually during the first production deployment. Create the UNIQUE index immediately after creating the materialized view. Make it part of the same migration. Consider it mandatory.

Not indexing the materialized view at all. A materialized view is a table. An unindexed table performs sequential scans on every query. I observe this mistake with distressing regularity — developers create the materialized view, see that the refresh is fast, and assume the reads must also be fast. They are not. The reads are full table scans against a table that may contain millions of rows. Add indexes for every column combination your queries filter, sort, or join by.

Ignoring refresh duration versus schedule interval. If your refresh takes 6 minutes and your pg_cron schedule runs every 5 minutes, the second refresh attempt arrives while the first is still running. Without advisory locks, it blocks until the first completes, then immediately starts another 6-minute refresh — back-to-back refreshes, forever. With advisory locks, the overlapping attempt skips, and the view actually refreshes on the first tick after the previous refresh finishes — every 10 minutes in this example, double what the schedule claims. Rule of thumb: set your interval to at least twice your measured refresh duration under production load. Measure the duration from cron.job_run_details or your refresh log table, not from your development environment.
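A toy simulation makes the arithmetic visible, using the hypothetical numbers above — ticks fire on the schedule, and a tick that arrives while a refresh is still running is skipped, as the advisory lock would do:

```python
def refresh_start_times(schedule_s, duration_s, horizon_s):
    """Simulate skip-on-overlap: return the times at which a refresh
    actually starts, given a tick every schedule_s seconds and a
    refresh lasting duration_s seconds."""
    starts, busy_until, t = [], 0, 0
    while t < horizon_s:
        if t >= busy_until:          # lock free: refresh starts
            starts.append(t)
            busy_until = t + duration_s
        t += schedule_s              # otherwise the tick is skipped
    return starts

# 5-minute schedule, 6-minute refresh, over 30 minutes:
print(refresh_start_times(300, 360, 1800))  # [0, 600, 1200] — every 10 minutes
```

Every second tick lands mid-refresh and is skipped, so the effective interval is two schedule periods, not one.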

Skipping VACUUM on CONCURRENTLY-refreshed views. The diff algorithm used by REFRESH CONCURRENTLY produces dead tuples — rows marked for deletion but not yet reclaimed. Autovacuum is supposed to handle this, but CONCURRENTLY holds an ExclusiveLock that conflicts with autovacuum's ShareUpdateExclusiveLock. If refreshes are frequent, autovacuum may never get a window to run. Dead tuples accumulate. Table bloat grows. Query performance degrades. The fix is in the next section.

Production Operations

Managing CONCURRENTLY bloat

Standard REFRESH produces zero dead tuples — the heap swap mechanism replaces the entire table atomically. CONCURRENTLY produces dead tuples through its INSERT/UPDATE/DELETE diff operations, exactly as regular DML would on any table.

The complication: CONCURRENTLY holds an ExclusiveLock for the duration of the refresh. Autovacuum requires a ShareUpdateExclusiveLock, which conflicts. If your materialized view refreshes every 15 minutes and each refresh takes 2 minutes, autovacuum has a 13-minute window to run. That's usually sufficient. If your refresh takes 10 minutes with a 15-minute interval, the window shrinks to 5 minutes — and autovacuum may not complete its work before the next refresh locks it out again.

Two solutions, both effective:

Aggressive autovacuum settings per materialized view:

autovacuum_settings.sql
ALTER MATERIALIZED VIEW mv_dashboard SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 50,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 0
);

This triggers autovacuum when just 1% of rows are dead (instead of the default 20%), with zero cost delay for the fastest possible execution. Autovacuum will be more aggressive about seizing its window when one becomes available.

Scheduled explicit VACUUM after refresh:

scheduled_vacuum.sql
-- Refresh at :00, :15, :30, :45
SELECT cron.schedule('refresh-dashboard', '0,15,30,45 * * * *',
    'SELECT safe_refresh(''mv_dashboard'')');

-- VACUUM at :03, :18, :33, :48 — three minutes after each refresh
SELECT cron.schedule('vacuum-dashboard', '3,18,33,48 * * * *',
    'VACUUM ANALYZE mv_dashboard');

Three minutes provides enough time for the refresh to complete and release its lock. The VACUUM reclaims dead tuples. The ANALYZE updates statistics so the query planner has accurate information for subsequent reads.

Monitoring freshness

The refresh log table introduced in Chapter 4 is your primary monitoring tool:

monitor_freshness.sql
-- Find materialized views that haven't refreshed on schedule
SELECT view_name,
       last_refreshed,
       NOW() - last_refreshed AS staleness,
       duration AS last_duration
FROM mv_refresh_log
WHERE NOW() - last_refreshed > INTERVAL '30 minutes'
ORDER BY staleness DESC;

For Prometheus and Grafana integration, expose the staleness of each materialized view as a gauge metric and alert when any view exceeds its expected refresh interval by more than twice. A materialized view scheduled for 15-minute refresh that hasn't refreshed in 35 minutes indicates a failed or blocked refresh that requires investigation.
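If you prefer not to pull in a client library, the Prometheus exposition format is plain text. A minimal sketch that renders rows from the staleness query above as a gauge — the metric name is my own invention; serve the resulting string from any HTTP endpoint your scraper can reach:

```python
def render_metrics(rows):
    """rows: iterable of (view_name, staleness_seconds) pairs,
    e.g. from the mv_refresh_log query above."""
    lines = [
        "# HELP mv_staleness_seconds Seconds since the last successful refresh",
        "# TYPE mv_staleness_seconds gauge",
    ]
    for view_name, seconds in rows:
        lines.append(
            f'mv_staleness_seconds{{view_name="{view_name}"}} {seconds:.0f}'
        )
    return "\n".join(lines) + "\n"

print(render_metrics([("mv_dashboard", 412.0)]))
```

Alert when the gauge for any view exceeds twice its scheduled interval.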

pg_dump and pg_restore

pg_dump exports materialized views in two phases: the DDL definition (CREATE MATERIALIZED VIEW ... WITH NO DATA) appears in the schema dump, and a REFRESH MATERIALIZED VIEW command appears in the data dump. It does not use COPY — the data is regenerated by re-executing the defining query during restore.

For large databases where you want to restore the schema quickly and refresh materialized views later — after the base tables are populated and indexed — filter out the materialized view data during restore:

pg_dump_restore.sh
pg_dump -Fc -f backup.dump mydb
pg_restore -l backup.dump | grep -v 'MATERIALIZED VIEW DATA' > restore.list
pg_restore -L restore.list -d mydb backup.dump

Then refresh each materialized view manually after the restore completes, in dependency order.

Dependency ordering for chained views

When materialized views depend on other materialized views — a daily metrics view feeding a weekly rollup feeding a monthly summary — refresh order matters. The base view must refresh before its dependents. If the daily view is stale, the weekly view built from it will also be stale, regardless of its own refresh timing.

In pg_cron, stagger the schedules:

dependency_ordering.sql
-- Base view: refresh at :00
SELECT cron.schedule('refresh-daily', '0 * * * *',
    'SELECT safe_refresh(''mv_metrics_daily'')');

-- Dependent: refresh at :05 (after base completes)
SELECT cron.schedule('refresh-weekly', '5 * * * *',
    'SELECT safe_refresh(''mv_metrics_weekly'')');

In application code, execute refreshes sequentially with error propagation — if the base refresh fails, skip the dependent refresh rather than rebuilding it from stale data.
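That sequential-with-skip policy can be sketched in a few lines. The dependency map, view names, and execute() callback below are illustrative — in production, execute() would call safe_refresh or issue the REFRESH directly:

```python
def refresh_in_order(deps, execute):
    """deps maps each view to the list of views it is built from.
    execute(view) refreshes one view and raises on failure."""
    done, failed = set(), set()

    def visit(view):
        if view in done or view in failed:
            return
        for parent in deps.get(view, ()):       # prerequisites first
            visit(parent)
        if any(p in failed for p in deps.get(view, ())):
            failed.add(view)                    # stale input: skip it
            return
        try:
            execute(view)
            done.add(view)
        except Exception:
            failed.add(view)

    for view in deps:
        visit(view)
    return done, failed

order = []
refresh_in_order(
    {'mv_metrics_daily': [],
     'mv_metrics_weekly': ['mv_metrics_daily'],
     'mv_metrics_monthly': ['mv_metrics_weekly']},
    order.append,
)
print(order)  # ['mv_metrics_daily', 'mv_metrics_weekly', 'mv_metrics_monthly']
```

If the daily refresh raises, the weekly and monthly views are skipped rather than rebuilt from stale data — exactly the error propagation described above.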

Choosing Your Refresh Strategy

I am frequently asked which approach to use. The answer depends on two factors: how fresh the data must be, and how often the underlying data changes.

The Refresh Strategy Decision Framework

Freshness Need            | Data Change Rate                | Recommended Approach                          | Complexity
--------------------------|---------------------------------|-----------------------------------------------|-----------
Hours acceptable          | Any                             | pg_cron at fixed interval                     | Trivial
Minutes acceptable        | Low-moderate writes             | pg_cron at 1–5 minute interval                | Trivial
Minutes acceptable        | High writes, expensive refresh  | Flag table + pg_cron                          | Low
Near-real-time (seconds)  | Moderate writes                 | LISTEN/NOTIFY with debounce                   | Medium
Immediate                 | Low writes, simple aggregates   | pg_ivm (self-managed PG only)                 | Medium
Immediate                 | High writes, complex queries    | External streaming (Materialize, RisingWave)  | High

When in doubt, start with pg_cron at 15 minutes. Measure. Adjust. Most applications discover that 15-minute freshness is more than adequate for their actual requirements, which are often less demanding than their assumed requirements.

Now you know every approach to keeping materialized views fresh, the operational considerations for each, and the mistakes to avoid. The refresh problem — the single biggest objection developers raise against materialized views — is, I trust, thoroughly resolved.

In the next chapter, I shall address the other great objection: cache invalidation. Specifically, why it is a structurally intractable problem when your cache and your database are separate systems — and a structurally trivial one when they are the same system.

The Waiter has opinions about cache invalidation. Rather strong ones, as it happens.