Serverless at Scale: AWS Lambda, API Gateway & Event-Driven Patterns
Building production serverless platforms handling 150M+ invocations/month — cold start optimization, concurrency management, SQS/SNS event patterns, and real-world performance tuning.
Why Serverless in 2025?
Serverless eliminates entire categories of operational burden — no servers to patch, no capacity planning, pay only for what you use. But "just use Lambda" isn't a strategy. Serverless at scale requires deliberate architecture or you'll hit walls fast.
This guide covers lessons from running 150M+ Lambda invocations per month in a production fintech platform.
Event-Driven Architecture
The core rule: every Lambda is triggered by an event source, never called directly from another Lambda. Direct Lambda-to-Lambda calls create tight coupling, cascading failures, and idle-wait costs.
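As a minimal sketch of the pattern (the `checkout` source and `OrderPlaced` detail type are illustrative, not from the platform above): instead of one Lambda invoking another, the upstream function publishes an event to a bus such as EventBridge or SNS, and the downstream Lambda subscribes via a rule — so neither side knows the other's name.

```python
import json

def build_order_event(order_id: str, total_cents: int) -> dict:
    """Entry for an EventBridge put_events call: the upstream function
    publishes this instead of invoking the downstream Lambda directly."""
    return {
        "Source": "checkout",        # routing is by source/type, not by callee name
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": order_id, "totalCents": total_cents}),
    }
```

The producer passes this entry to `put_events(Entries=[...])` and moves on; if the consumer is throttled or failing, the producer is never blocked waiting on it.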
Conquering Cold Starts
Cold starts happen when Lambda creates a new execution environment — typically 100ms–2s depending on runtime and package size. Techniques we used to go from ~800ms to ~12ms:
- Provisioned Concurrency for latency-sensitive paths (checkout, auth)
- Slim packages — from 45MB to 3.2MB by removing dev deps and using Lambda Layers
- Right runtime — Node.js and Python start 5–10× faster than JVM runtimes
- Move init outside the handler — DB connections, SDK clients created once per container
```javascript
// ❌ Bad — a new DB connection on every invocation
export const handler = async (event) => {
  const db = await createConnection(); // connection-setup tax on every call, warm or cold
  return db.query('SELECT ...');
};

// ✅ Good — connection created once, reused across warm invocations
let db;
const getDb = async () => db ?? (db = await createConnection());

export const handler = async (event) => {
  const conn = await getDb(); // instant on a warm invocation
  return conn.query('SELECT ...');
};
```
Concurrency & Throttling
Lambda has an account-level concurrency limit (default 1000 per region). Without reserved concurrency, one noisy function can exhaust the entire limit — including your payment and auth flows. Reserve concurrency per critical function and use Provisioned Concurrency for predictable latency.
```hcl
resource "aws_lambda_function" "payment" {
  # ...function_name, role, handler, etc. omitted for brevity
  reserved_concurrent_executions = 200 # hard cap
}

resource "aws_lambda_provisioned_concurrency_config" "warm" {
  function_name                     = aws_lambda_function.payment.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 20 # always warm
}
```
SQS + Dead Letter Queues
Every SQS-triggered Lambda must have a DLQ. Without one, a message that keeps failing is retried until the queue's retention period expires and is then silently deleted. Our setup: 3 receive attempts → DLQ → PagerDuty alert, with a copy in S3 for replay and audit.
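The wiring lives in the source queue's `RedrivePolicy` attribute: after `maxReceiveCount` failed receives, SQS moves the message to the DLQ instead of deleting it. A minimal sketch (queue names and ARN are illustrative):

```python
import json

def redrive_policy(dlq_arn: str, max_receives: int = 3) -> dict:
    """SQS queue attributes that route a message to the DLQ after
    `max_receives` failed processing attempts."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receives),  # SQS expects this stringified
        })
    }

# Applied at queue creation, e.g.:
# sqs.create_queue(QueueName="orders", Attributes=redrive_policy(dlq_arn))
```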
Observability with Lambda Powertools
CloudWatch's default Lambda metrics miss critical signals. We use Lambda Powertools for structured JSON logs, distributed tracing, custom metrics, and idempotency handling.
```python
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    idempotent,
)

logger = Logger()
tracer = Tracer()
metrics = Metrics(namespace="Orders")

# Idempotency records are stored in a DynamoDB table
dynamodb_persistence = DynamoDBPersistenceLayer(table_name="idempotency-store")

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics  # flushes metrics when the handler returns
@idempotent(persistence_store=dynamodb_persistence)
def handler(event, context):
    logger.info("Processing order", order_id=event["orderId"])
    metrics.add_metric(name="OrderProcessed", unit=MetricUnit.Count, value=1)
    return process_order(event)
```
We apply @idempotent to any Lambda handling financial transactions. It records each processed event in DynamoDB, so processing happens effectively exactly once even when SQS's at-least-once delivery hands the same message to the function twice.