
Overview

Source: The System Design Newsletter — Neo Kim
AWS Lambda is a serverless compute platform where developers deploy functions without managing servers. Lambda must instantly provision isolated execution environments, handle millions of concurrent invocations, and return to idle efficiently — all while providing strong security isolation.

Key Concepts

Serverless — The cloud provider handles all infrastructure: servers, scaling, patching, and capacity planning. Developers only write and deploy code.
Function — A self-contained unit of code (handler) triggered by events. Functions are stateless; any state must be externalized (S3, DynamoDB, etc.).
Cold Start — The latency penalty for initializing a new execution environment: downloading the function package, starting the runtime, and running initialization code. Typically 100ms–1s+.
Warm Start — Reusing an already-initialized execution environment for subsequent invocations. Near-zero overhead.
Firecracker — AWS-built open-source microVM technology. Creates lightweight VMs in ~125ms with very low memory overhead. Powers Lambda's isolation model.
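The stateless-function and cold/warm-start concepts above show up directly in how a handler is written: module-scope code runs once per environment (the cold start), while the handler body runs on every invocation. A minimal sketch with illustrative names (`INIT_COUNT` and the `STATE_TABLE` variable are assumptions for demonstration, not Lambda APIs):

```python
import os
import json

# Cold start: module-scope code runs once, when the execution environment
# is first initialized. Expensive setup (SDK clients, config, connection
# pools) belongs here so warm invocations can reuse it.
INIT_COUNT = {"cold_starts": 0}
INIT_COUNT["cold_starts"] += 1  # increments once per environment, not per call

# Hypothetical externalized-state target; the function itself stays stateless.
TABLE_NAME = os.environ.get("STATE_TABLE", "orders")

def handler(event, context):
    # Warm start: only this body runs on reused environments. Any state the
    # function needs must come from the event or an external store
    # (S3, DynamoDB, etc.), never from local memory.
    return {
        "statusCode": 200,
        "body": json.dumps({"received": event, "table": TABLE_NAME}),
    }
```

Invoking the handler twice in the same environment demonstrates the distinction: the module-scope counter stays at 1 while the handler runs each time.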

Core Components

  • Frontend Service — Receives invocation requests (API Gateway, S3 events, SQS, etc.). Routes to Worker Manager.
  • Worker Manager — Manages the pool of available execution environments (warm/cold). Assigns invocations to workers.
  • Worker (Firecracker MicroVM) — Isolated execution environment. Each function invocation runs inside its own microVM slot.
  • Placement Service — Decides which physical host a new microVM should run on (bin-packing by memory/CPU).
  • Control Plane — Manages function metadata: code package location, runtime, memory config, concurrency limits.
  • Sandbox — The actual process inside the microVM running the function handler.
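The function metadata the Control Plane tracks can be pictured as a small record. A sketch with illustrative field names only, not AWS's actual internal schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FunctionConfig:
    """Toy model of per-function metadata held by the control plane."""
    name: str
    code_location: str       # e.g. an S3 URI for the zipped package
    runtime: str             # e.g. "python3.12", "nodejs20.x"
    memory_mb: int           # memory allocation; CPU share scales with it
    timeout_s: int           # max execution time (capped at 15 minutes)
    reserved_concurrency: Optional[int] = None  # per-function cap, if set

cfg = FunctionConfig(
    name="resize-image",
    code_location="s3://deploy-bucket/resize-image.zip",
    runtime="python3.12",
    memory_mb=512,
    timeout_s=30,
)
```

The Worker Manager and Placement Service consult this kind of record to size and provision a microVM for a cold start.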

Invocation Flow

  1. Event triggers Lambda invocation (API Gateway, S3 event, SQS message, etc.)
  2. Frontend Service routes the request to the Worker Manager
  3. Worker Manager checks for a warm execution environment for this function:
      • Warm: Assign immediately → inject event → run handler
      • Cold: Request new microVM from Placement Service → provision Firecracker VM → download code package → start runtime → run handler
  4. Handler executes and returns the response
  5. Execution environment is held warm for ~15 minutes for potential reuse
  6. After the timeout, the environment is frozen or terminated
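The warm/cold decision at the heart of this flow can be sketched as a toy Worker Manager. This is a simplified model, not AWS's implementation: provisioning, freezing, and per-function environment pools are collapsed into a timestamp map, and the ~15-minute warm window from the steps above is the eviction rule.

```python
import time

WARM_TTL_S = 15 * 60  # environments held warm for ~15 minutes

class WorkerManager:
    """Toy model of the warm/cold routing decision."""

    def __init__(self, now=time.monotonic):
        self._now = now    # injectable clock, so the behavior is testable
        self._warm = {}    # function name -> last-used timestamp

    def invoke(self, function_name):
        now = self._now()
        # Evict environments idle past the TTL (frozen/terminated in reality).
        self._warm = {f: t for f, t in self._warm.items()
                      if now - t < WARM_TTL_S}
        start = "warm" if function_name in self._warm else "cold"
        # On the cold path, the real system asks the Placement Service for a
        # microVM, downloads the package, and starts the runtime; here we
        # only record that a fresh environment now exists.
        self._warm[function_name] = now
        return start
```

A first invocation is cold, a repeat within the window is warm, and a long-idle function goes cold again.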

Cold Start Breakdown

| Phase | Time |
| --- | --- |
| Firecracker VM boot | ~125 ms |
| Runtime init (JVM, Node.js, Python) | 50–500 ms |
| Function initialization code | Variable |
| Total (worst case) | 100 ms – 2 s+ |

Concurrency & Scaling

  • Each concurrent invocation gets its own execution environment
  • Default limit: 1,000 concurrent executions per account per region (soft limit)
  • Provisioned Concurrency — Pre-warms N environments to eliminate cold starts for latency-sensitive workloads
  • Scaling is near-instant for event-driven workloads (SQS, Kinesis)
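Because each concurrent invocation occupies its own execution environment, the account-level soft limit behaves like a shared pool of slots. A toy sketch of the throttling rule, using the default limit from the bullets above (the class and method names are illustrative):

```python
ACCOUNT_LIMIT = 1000  # default soft limit per account per region

class ConcurrencyTracker:
    """Toy model: one execution environment per in-flight invocation."""

    def __init__(self, limit=ACCOUNT_LIMIT):
        self.limit = limit
        self.in_flight = 0

    def start(self):
        # Real Lambda rejects over-limit synchronous invokes with a 429
        # TooManyRequestsException; here we just return a marker string.
        if self.in_flight >= self.limit:
            return "throttled"
        self.in_flight += 1
        return "running"

    def finish(self):
        self.in_flight -= 1
```

Provisioned Concurrency changes where the slots come from (pre-warmed environments) but not this counting rule: it removes cold starts, not the limit.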

Key Trade-offs

| Decision | Reasoning |
| --- | --- |
| Firecracker microVMs | Strong isolation (vs. containers) with low overhead |
| Stateless functions | Enables horizontal scaling with no coordination |
| Warm environment reuse | Amortizes cold start cost across many invocations |
| 15-min max execution | Enforces the "short-lived function" contract; long jobs use ECS/EC2 |