cloud · Featured · Sep 1, 2025 · 12 min read

Aurora PostgreSQL at Scale: Replication, Failover & Performance Tuning

Deep dive into Amazon Aurora PostgreSQL — cluster architecture, read replica scaling, automated failover under 30s, RDS Proxy for connection pooling, and parameter group tuning for high-throughput OLTP.

Sabin Joshi
DevOps Engineer
#aws #rds #aurora #postgresql #rds-proxy #performance #high-availability

Why Aurora Over Standard RDS?

Aurora PostgreSQL isn't just managed Postgres: it runs on a fundamentally different storage architecture. Every block is stored as 6 copies spread across 3 AZs, corrupted blocks are repaired automatically, and the volume grows on demand up to 128TB. Failover typically completes in under 30 seconds, versus 1–2 minutes for standard Multi-AZ RDS.

  • <30s: automated failover
  • 128 TB: max storage (auto)
  • 6: copies across 3 AZs
  • 15: max read replicas

Cluster Architecture

[Figure: Aurora PostgreSQL Cluster with RDS Proxy. Components shown: Application; RDS Proxy (connection pool, IAM auth only, multiplexer); Writer Endpoint (cluster.region.rds.aws.com) to the Primary (read/write, AZ: us-east-1a); Reader Endpoint (balances read replicas) to Replica 1 (read-only, AZ: 1b) and Replica 2 (read-only, AZ: 1c); Aurora Shared Distributed Storage (6 copies, 3 AZs, auto-heal, up to 128TB, write quorum 4/6); failover <30s.]
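The topology in the figure could be declared roughly as follows. This is a minimal sketch, not the author's actual configuration: the identifiers, instance class, database name, and the `db_master_password` variable are all assumptions made for illustration, and networking (subnet group, security groups) is left out.

```hcl
# Hypothetical variable; in practice the password would come from a secret store.
variable "db_master_password" {
  type      = string
  sensitive = true
}

locals {
  # AZs as labelled in the figure.
  aurora_azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_rds_cluster" "main" {
  cluster_identifier = "aurora-main" # illustrative name
  engine             = "aurora-postgresql"
  database_name      = "app"         # illustrative name
  master_username    = "postgres"
  master_password    = var.db_master_password
}

# Three instances on the shared storage volume: one becomes the writer,
# the other two serve reads behind the reader endpoint.
resource "aws_rds_cluster_instance" "main" {
  count              = length(local.aurora_azs)
  identifier         = "aurora-main-${count.index}"
  cluster_identifier = aws_rds_cluster.main.id
  engine             = aws_rds_cluster.main.engine
  instance_class     = "db.r6g.large" # assumption
  availability_zone  = local.aurora_azs[count.index]
}

output "writer_endpoint" {
  value = aws_rds_cluster.main.endpoint
}

output "reader_endpoint" {
  value = aws_rds_cluster.main.reader_endpoint
}
```

Applications should use the two cluster endpoints rather than individual instance endpoints, since the writer role can move to another instance after a failover.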

RDS Proxy: Connection Pooling

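Each Lambda execution environment opens its own DB connection on cold start. With 500 concurrent executions that means up to 500 connections, which is enough to exhaust PostgreSQL's connection limit on smaller instance classes. RDS Proxy pools and reuses connections, multiplexing many client connections over a smaller set of database connections.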

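resource "aws_db_proxy" "main" {
  name           = "aurora-proxy"
  engine_family  = "POSTGRESQL"
  role_arn       = aws_iam_role.rds_proxy.arn
  require_tls    = true
  vpc_subnet_ids = var.private_subnet_ids # required argument; variable name is an assumption

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "REQUIRED" # clients connect with IAM tokens, not passwords
    secret_arn  = aws_secretsmanager_secret.db_creds.arn
  }
}

# Targets are attached through separate resources, not a nested block.
resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name
}

resource "aws_db_proxy_target" "main" {
  db_proxy_name         = aws_db_proxy.main.name
  target_group_name     = aws_db_proxy_default_target_group.main.name
  db_cluster_identifier = aws_rds_cluster.main.cluster_identifier
}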
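The proxy configuration refers to `aws_iam_role.rds_proxy` without showing it. A minimal sketch of that role follows, assuming the proxy only needs to read the one secret referenced above; the role and policy names are illustrative, not taken from the original setup.

```hcl
# Trust policy: lets the RDS service assume the role on behalf of the proxy.
data "aws_iam_policy_document" "rds_proxy_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["rds.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "rds_proxy" {
  name               = "aurora-proxy-role" # illustrative name
  assume_role_policy = data.aws_iam_policy_document.rds_proxy_assume.json
}

# Permission to fetch the database credentials the proxy uses
# when it opens connections to the cluster.
data "aws_iam_policy_document" "rds_proxy_secrets" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db_creds.arn]
  }
}

resource "aws_iam_role_policy" "rds_proxy_secrets" {
  name   = "read-db-credentials" # illustrative name
  role   = aws_iam_role.rds_proxy.id
  policy = data.aws_iam_policy_document.rds_proxy_secrets.json
}
```

If the secret is encrypted with a customer-managed KMS key, the role would also need `kms:Decrypt` on that key.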

Parameter Group Tuning

Aurora's defaults are conservative. The highest-impact changes for OLTP workloads:

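  • random_page_cost = 1.1 (default 4.0): Aurora uses SSD-backed distributed storage, so random reads cost about the same as sequential reads and the planner should favor index scans
  • effective_cache_size = 75% of RAM: a planner estimate of how much memory is available for caching; it allocates nothing itself
  • work_mem = 4MB globally, raised per session to 256MB for analytical queries; the limit applies per sort or hash operation, so a high global value multiplies across connections
  • log_min_duration_statement = 1000 (milliseconds): log every query that runs longer than 1 second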
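The settings above could be captured in a cluster parameter group along these lines. Treat it as a sketch under assumptions: the group name and the `aurora-postgresql15` family are illustrative, and it assumes all four parameters are modifiable at the cluster level for your engine version (if one is not, it belongs in an instance-level `aws_db_parameter_group` instead).

```hcl
resource "aws_rds_cluster_parameter_group" "oltp" {
  name   = "aurora-pg-oltp"      # illustrative name
  family = "aurora-postgresql15" # assumption: match your engine major version

  parameter {
    name  = "random_page_cost"
    value = "1.1"
  }

  parameter {
    # Unit is 8 kB pages: memory_bytes * 0.75 / 8192 is roughly memory_bytes / 10923.
    name  = "effective_cache_size"
    value = "{DBInstanceClassMemory/10923}"
  }

  parameter {
    name  = "work_mem"
    value = "4096" # kB, i.e. 4MB
  }

  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # milliseconds
  }
}
```

The group takes effect once the cluster's `db_cluster_parameter_group_name` points at it. The 256MB override is not a parameter group setting; an analytical session would issue `SET work_mem = '256MB'` itself.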

Read Replica Auto Scaling

Aurora Auto Scaling adds and removes read replicas using a target tracking policy on average reader CPU or connection count. We target 60% average CPU across replicas. This absorbed a 10× traffic spike during a product launch with zero manual intervention.

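resource "aws_appautoscaling_policy" "aurora_read" {
  name               = "aurora-read-cpu-target" # required argument; name is illustrative
  service_namespace  = "rds"                    # required argument
  policy_type        = "TargetTrackingScaling"
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"

  target_tracking_scaling_policy_configuration {
    target_value       = 60.0 # average reader CPU, in percent
    scale_in_cooldown  = 300  # seconds to wait before removing another replica
    scale_out_cooldown = 60   # seconds to wait before adding another replica

    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }
  }
}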
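A scaling policy only works against a registered scalable target, which the snippet above does not show. A minimal sketch of that target follows; the floor of 2 replicas is an assumption based on the two replicas in the architecture figure, and 15 is Aurora's replica ceiling mentioned earlier.

```hcl
resource "aws_appautoscaling_target" "aurora_read" {
  service_namespace  = "rds"
  resource_id        = "cluster:${aws_rds_cluster.main.cluster_identifier}"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  min_capacity       = 2  # assumption: never drop below the two baseline replicas
  max_capacity       = 15 # Aurora's maximum number of read replicas
}
```

Because the policy and the target share the same `resource_id` and `scalable_dimension`, a common pattern is to have the policy reference the target's attributes, or to add an explicit `depends_on`, so that Terraform registers the target first.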