Aurora PostgreSQL at Scale: Replication, Failover & Performance Tuning

Why Aurora Over Standard RDS?

Aurora PostgreSQL is more than managed Postgres: it runs on a fundamentally different storage architecture. The storage layer keeps six copies of your data across three Availability Zones, automatically repairs corrupted blocks, and grows on demand up to 128 TiB. Failover typically completes in under 30 seconds, versus 1–2 minutes for standard Multi-AZ RDS.

<30s automated failover
128 TB max storage (auto)
6 copies across 3 AZs
15 max read replicas

Cluster Architecture

[Diagram: Aurora PostgreSQL cluster with RDS Proxy]

RDS Proxy: Connection Pooling

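Each Lambda execution environment opens its own database connection on cold start, so 500 concurrent executions can mean 500 open connections. That is enough to exhaust PostgreSQL's max_connections on smaller instance classes. RDS Proxy sits between the functions and the cluster: it pools connections and multiplexes many client connections over a much smaller set of database connections.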

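resource "aws_db_proxy" "main" {
  name           = "aurora-proxy"
  engine_family  = "POSTGRESQL"
  role_arn       = aws_iam_role.rds_proxy.arn
  vpc_subnet_ids = var.private_subnet_ids # required by the provider; variable name is an assumption
  require_tls    = true

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "REQUIRED" # clients authenticate to the proxy with IAM tokens, not passwords
    secret_arn  = aws_secretsmanager_secret.db_creds.arn # the proxy uses this secret to reach the database
  }
}

# Targets are registered through separate resources, not a nested block.
resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name
}

resource "aws_db_proxy_target" "main" {
  db_proxy_name         = aws_db_proxy.main.name
  target_group_name     = aws_db_proxy_default_target_group.main.name
  db_cluster_identifier = aws_rds_cluster.main.cluster_identifier
}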

Parameter Group Tuning

Aurora's defaults are conservative. The highest-impact changes for OLTP workloads:

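random_page_cost = 1.1 (default 4.0): Aurora's distributed storage is SSD-backed, so a random read costs about the same as a sequential one and the planner should favor index scans more readily
effective_cache_size = 75% of RAM: a planner hint for how much memory is available for caching; it allocates nothing
work_mem = 4MB globally, raised per session to 256MB for analytical queries: the limit applies to each sort or hash operation, so a high global value multiplies across concurrent queries
log_min_duration_statement = 1000: log every statement that runs longer than 1 second (the value is in milliseconds)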
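
The settings above could be captured in a cluster parameter group along these lines. This is a minimal sketch, not a definitive configuration: the resource name, the "aurora-postgresql15" family, and the 32 GiB instance memory used to compute effective_cache_size are all assumptions, and it assumes these parameters are managed at the cluster level rather than in an instance-level parameter group.

```hcl
# Sketch only: family, names, and the 32 GiB memory figure are assumptions.
resource "aws_rds_cluster_parameter_group" "oltp" {
  name   = "aurora-pg-oltp"      # hypothetical name
  family = "aurora-postgresql15" # match your engine major version

  parameter {
    name         = "random_page_cost"
    value        = "1.1"
    apply_method = "immediate"
  }

  parameter {
    name         = "effective_cache_size"
    value        = "3145728" # 8 kB pages: 24 GiB, i.e. 75% of an assumed 32 GiB of RAM
    apply_method = "immediate"
  }

  parameter {
    name         = "work_mem"
    value        = "4096" # kB, i.e. 4 MB
    apply_method = "immediate"
  }

  parameter {
    name         = "log_min_duration_statement"
    value        = "1000" # milliseconds
    apply_method = "immediate"
  }
}
```

The group only takes effect once the cluster references it through db_cluster_parameter_group_name on aws_rds_cluster. The 256MB work_mem override for analytical queries is not set here; it would be issued in the session itself, for example with SET work_mem = '256MB'.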

Read Replica Auto Scaling

Aurora Auto Scaling adds and removes read replicas using a target-tracking policy on either average reader CPU or average connection count. We target 60% average CPU across replicas, which absorbed a 10× traffic spike during a product launch with zero manual intervention.

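resource "aws_appautoscaling_policy" "aurora_read" {
  name               = "aurora-read-cpu-target" # required by the provider; this name is illustrative
  policy_type        = "TargetTrackingScaling"
  service_namespace  = "rds"
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"

  target_tracking_scaling_policy_configuration {
    target_value       = 60.0 # keep average reader CPU near 60%
    scale_in_cooldown  = 300  # seconds to wait before removing a replica
    scale_out_cooldown = 60   # seconds to wait before adding another
    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }
  }
}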
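
A scaling policy only works against a scalable target that has been registered with Application Auto Scaling. A minimal sketch of that registration follows; the resource label and the min_capacity floor are assumptions, while 15 is Aurora's ceiling on replicas per cluster.

```hcl
# Sketch only: registers the cluster's replica count as a scalable target.
resource "aws_appautoscaling_target" "aurora_read" {
  service_namespace  = "rds"
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  min_capacity       = 1  # assumed floor; always keep at least one reader
  max_capacity       = 15 # Aurora's limit on replicas per cluster
}
```

In practice the policy is often written to read resource_id, scalable_dimension, and service_namespace from this target's attributes, so that Terraform creates the target before the policy.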