Overview
Source: The System Design Newsletter — Neo Kim
AWS Lambda is a serverless compute platform where developers deploy functions without managing servers. Lambda must instantly provision isolated execution environments, handle millions of concurrent invocations, and return to idle efficiently — all while providing strong security isolation.
Key Concepts
Serverless — The cloud provider handles all infrastructure: servers, scaling, patching, and capacity planning. Developers only write and deploy code.
Function — A self-contained unit of code (handler) triggered by events. Functions are stateless; any state must be externalized (S3, DynamoDB, etc.).
Cold Start — The latency penalty for initializing a new execution environment: downloading the function package, starting the runtime, and running initialization code. Typically 100ms–1s+.
Warm Start — Reusing an already-initialized execution environment for subsequent invocations. Near-zero overhead.
Firecracker — AWS-built open-source microVM technology. Creates lightweight VMs in ~125ms with very low memory overhead. Powers Lambda's isolation model.
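The cold/warm distinction shapes how handlers are typically written: anything expensive goes at module scope so it runs once per environment, while warm invocations skip it. A minimal sketch (the `expensive_init` and `handler` names are illustrative, not part of any AWS SDK):

```python
import time

def expensive_init():
    """Stand-in for one-time setup: loading config, opening DB connections."""
    time.sleep(0.05)  # simulate slow initialization work
    return {"db": "connected"}

# Module scope runs once, during the cold start; warm invocations reuse it.
RESOURCES = expensive_init()

def handler(event, context=None):
    # Per-invocation work only; reuses RESOURCES created at cold start.
    return {"status": 200, "db": RESOURCES["db"], "echo": event}
```

This is why cold starts include "function initialization code" as a distinct phase: everything above `handler` runs inside that window.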
Core Components
- Frontend Service — Receives invocation requests (API Gateway, S3 events, SQS, etc.). Routes to Worker Manager.
- Worker Manager — Manages the pool of available execution environments (warm/cold). Assigns invocations to workers.
- Worker (Firecracker MicroVM) — Isolated execution environment. Each function invocation runs inside its own microVM slot.
- Placement Service — Decides which physical host a new microVM should run on (bin-packing by memory/CPU).
- Control Plane — Manages function metadata: code package location, runtime, memory config, concurrency limits.
- Sandbox — The actual process inside the microVM running the function handler.
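A toy model of how the Worker Manager might route an invocation to a warm environment or fall back to the Placement Service. The class and method names below are assumptions for illustration, not AWS internals:

```python
from collections import defaultdict, deque

class WorkerManager:
    """Toy warm-pool router; a sketch, not AWS's actual implementation."""

    def __init__(self, placement):
        self.placement = placement         # callable standing in for the Placement Service
        self.warm = defaultdict(deque)     # function name -> idle environments

    def assign(self, function_name):
        pool = self.warm[function_name]
        if pool:
            return pool.popleft(), "warm"  # reuse an idle environment
        # Cold path: ask the Placement Service for a fresh microVM.
        return self.placement(function_name), "cold"

    def release(self, function_name, env):
        self.warm[function_name].append(env)  # keep the environment warm for reuse

# Usage: the first call is cold; releasing the environment makes the next call warm.
wm = WorkerManager(placement=lambda name: f"microvm-for-{name}")
env, path = wm.assign("resize-image")   # -> ("microvm-for-resize-image", "cold")
wm.release("resize-image", env)
_, path2 = wm.assign("resize-image")    # path2 -> "warm"
```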
Invocation Flow
- Event triggers Lambda invocation (API Gateway, S3 event, SQS message, etc.)
- Frontend Service routes request to Worker Manager
- Worker Manager checks for a warm execution environment for this function:
  - Warm: assign immediately → inject event → run handler
  - Cold: request a new microVM from the Placement Service → provision the Firecracker VM → download the code package → start the runtime → run handler
- Handler executes, returns response
- Execution environment held warm for ~15 minutes for potential reuse
- After timeout, environment is frozen or terminated
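The last two steps (hold warm, then expire) can be sketched as a TTL sweep over idle environments. The ~15-minute constant and the field names below are illustrative:

```python
WARM_TTL_SECONDS = 15 * 60  # ~15 minutes, per the flow above (illustrative)

def reap_idle(pool, now, ttl=WARM_TTL_SECONDS):
    """Split a warm pool into environments to keep and those past the idle TTL."""
    keep = [env for env in pool if now - env["last_used"] <= ttl]
    expired = [env for env in pool if now - env["last_used"] > ttl]
    return keep, expired

# Usage: one environment used 200s ago, one idle for 20 minutes.
pool = [{"id": "a", "last_used": 1000.0}, {"id": "b", "last_used": 0.0}]
keep, expired = reap_idle(pool, now=1200.0)  # keeps "a", expires "b"
```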
Cold Start Breakdown
| Phase | Time |
| --- | --- |
| Firecracker VM boot | ~125 ms |
| Runtime init (JVM, Node.js, Python) | 50–500 ms |
| Function initialization code | Variable |
| Total (worst case) | 100 ms – 2 s+ |
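These phases can be modeled with simple wall-clock timers; in the sketch below, `time.sleep` calls stand in for the real work, and the phase labels mirror the table above:

```python
import time

def timed(label, fn, timings):
    """Run fn, recording its wall-clock duration under label."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result

timings = {}
# Sleeps are stand-ins for the actual cold-start phases.
timed("vm_boot", lambda: time.sleep(0.01), timings)
timed("runtime_init", lambda: time.sleep(0.01), timings)
timed("function_init", lambda: time.sleep(0.01), timings)
total = sum(timings.values())  # the full cold-start latency in this model
```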
Concurrency & Scaling
- Each concurrent invocation gets its own execution environment
- Default limit: 1,000 concurrent executions per account per region (soft limit)
- Provisioned Concurrency — Pre-warms N environments to eliminate cold starts for latency-sensitive workloads
- Scaling is near-instant for event-driven workloads (SQS, Kinesis)
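The per-account concurrency cap behaves like a semaphore: invocations over the limit are throttled (rejected with an HTTP 429) rather than queued. A toy sketch, with the `ConcurrencyLimiter` name as an assumption for illustration:

```python
import threading

class ConcurrencyLimiter:
    """Toy per-account cap, akin to the 1,000-concurrent-execution soft limit."""

    def __init__(self, limit):
        self.sem = threading.BoundedSemaphore(limit)

    def invoke(self, fn, *args):
        # Non-blocking acquire: over-limit requests are rejected (throttled)
        # rather than queued, mirroring Lambda's 429 throttling behavior.
        if not self.sem.acquire(blocking=False):
            return None, "throttled"
        try:
            return fn(*args), "ok"
        finally:
            self.sem.release()

limiter = ConcurrencyLimiter(limit=2)
result, status = limiter.invoke(lambda x: x * 2, 21)  # -> (42, "ok")
```

Provisioned Concurrency, by contrast, pre-warms environments so requests inside the reserved capacity never take the cold path at all.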
Key Trade-offs
| Decision | Reasoning |
| --- | --- |
| Firecracker microVMs | Strong isolation (vs. containers) with low overhead |
| Stateless functions | Enables horizontal scaling with no coordination |
| Warm environment reuse | Amortizes cold-start cost across many invocations |
| 15-min max execution | Enforces the "short-lived function" contract; long jobs belong on ECS/EC2 |