Serverless at Scale: AWS Lambda, API Gateway & Event-Driven Patterns
Building production serverless platforms handling 150M+ invocations/month — cold start optimization, concurrency management, SQS/SNS event patterns, and real-world performance tuning.
Why Serverless in 2025?
Serverless eliminates entire categories of operational burden — no servers to patch, no capacity planning, pay only for what you use. But "just use Lambda" isn't a strategy. Serverless at scale requires deliberate architecture or you'll hit walls fast.
This guide covers lessons from running 150M+ Lambda invocations per month in a production fintech platform.
Event-Driven Architecture
The core rule: every Lambda is triggered by an event source, never called directly from another Lambda. Direct Lambda-to-Lambda calls create tight coupling, cascading failures, and idle-wait costs.
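As a minimal sketch of the pattern (the `checkout` source and `OrderPlaced` detail type are illustrative, not from the platform above): instead of one Lambda invoking another, the upstream function publishes an event to a bus such as EventBridge or SNS, and the downstream Lambda subscribes via a rule — so neither side knows the other's name.

```python
import json

def build_order_event(order_id: str, total_cents: int) -> dict:
    """Entry for an EventBridge put_events call: the upstream function
    publishes this instead of invoking the downstream Lambda directly."""
    return {
        "Source": "checkout",        # routing is by source/type, not by callee name
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": order_id, "totalCents": total_cents}),
    }
```

The producer passes this entry to `put_events(Entries=[...])` and moves on; if the consumer is throttled or failing, the producer is never blocked waiting on it.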
Conquering Cold Starts
Cold starts happen when Lambda creates a new execution environment — typically 100ms–2s depending on runtime and package size. Techniques we used to go from ~800ms to ~12ms:
- Provisioned Concurrency for latency-sensitive paths (checkout, auth)
- Slim packages — from 45MB to 3.2MB by removing dev deps and using Lambda Layers
- Right runtime — Node.js and Python start 5–10× faster than JVM runtimes
- Move init outside the handler — DB connections, SDK clients created once per container
```javascript
// ❌ Bad — a new DB connection on every invocation
export const handler = async (event) => {
  const db = await createConnection(); // connection-setup tax on every call, warm or cold
  return db.query('SELECT ...');
};

// ✅ Good — connection created once, reused across warm invocations
let db;
const getDb = async () => db ?? (db = await createConnection());

export const handler = async (event) => {
  const conn = await getDb(); // instant on a warm invocation
  return conn.query('SELECT ...');
};
```
Concurrency & Throttling
Lambda has an account-level concurrency limit (default 1000 per region). Without reserved concurrency, one noisy function can exhaust the entire limit — including your payment and auth flows. Reserve concurrency per critical function and use Provisioned Concurrency for predictable latency.
```hcl
resource "aws_lambda_function" "payment" {
  # ...function_name, role, handler, etc. omitted for brevity
  reserved_concurrent_executions = 200 # hard cap
}

resource "aws_lambda_provisioned_concurrency_config" "warm" {
  function_name                     = aws_lambda_function.payment.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 20 # always warm
}
```
SQS + Dead Letter Queues
Every SQS-triggered Lambda must have a DLQ. Without one, a message that keeps failing is retried until the queue's retention period expires and is then silently deleted. Our setup: 3 receive attempts → DLQ → PagerDuty alert, with a copy in S3 for replay and audit.
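The wiring lives in the source queue's `RedrivePolicy` attribute: after `maxReceiveCount` failed receives, SQS moves the message to the DLQ instead of deleting it. A minimal sketch (queue names and ARN are illustrative):

```python
import json

def redrive_policy(dlq_arn: str, max_receives: int = 3) -> dict:
    """SQS queue attributes that route a message to the DLQ after
    `max_receives` failed processing attempts."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receives),  # SQS expects this stringified
        })
    }

# Applied at queue creation, e.g.:
# sqs.create_queue(QueueName="orders", Attributes=redrive_policy(dlq_arn))
```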
Observability with Lambda Powertools
CloudWatch's default Lambda metrics miss critical signals. We use Lambda Powertools for structured JSON logs, distributed tracing, custom metrics, and idempotency handling.
```python
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    idempotent,
)

logger = Logger()
tracer = Tracer()
metrics = Metrics(namespace="Orders")

# Idempotency records are stored in a DynamoDB table
dynamodb_persistence = DynamoDBPersistenceLayer(table_name="idempotency-store")

@logger.inject_lambda_context
@tracer.capture_lambda_handler
@metrics.log_metrics  # flushes metrics when the handler returns
@idempotent(persistence_store=dynamodb_persistence)
def handler(event, context):
    logger.info("Processing order", order_id=event["orderId"])
    metrics.add_metric(name="OrderProcessed", unit=MetricUnit.Count, value=1)
    return process_order(event)
```
We apply @idempotent to any Lambda handling financial transactions. It records each processed event in DynamoDB, so processing happens effectively exactly once even when SQS's at-least-once delivery hands the same message to the function twice.