
Overview

Source: The System Design Newsletter — Neo Kim
AWS Lambda is a serverless compute platform where developers deploy functions without managing servers. Lambda must instantly provision isolated execution environments, handle millions of concurrent invocations, and return to idle efficiently — all while providing strong security isolation.

Key Concepts

Serverless — The cloud provider handles all infrastructure: servers, scaling, patching, and capacity planning. Developers only write and deploy code.
Function — A self-contained unit of code (handler) triggered by events. Functions are stateless; any state must be externalized (S3, DynamoDB, etc.).
Cold Start — The latency penalty for initializing a new execution environment: downloading the function package, starting the runtime, and running initialization code. Typically 100ms–1s+.
Warm Start — Reusing an already-initialized execution environment for subsequent invocations. Near-zero overhead.
Firecracker — AWS-built open-source microVM technology. Creates lightweight VMs in ~125ms with very low memory overhead. Powers Lambda's isolation model.
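The stateless-function and cold/warm-start concepts above show up directly in how a handler is written: module-scope code runs once per environment (the cold start), while the handler body runs on every invocation. A minimal sketch with illustrative names (`INIT_COUNT` and the `STATE_TABLE` variable are assumptions for demonstration, not Lambda APIs):

```python
import os
import json

# Cold start: module-scope code runs once, when the execution environment
# is first initialized. Expensive setup (SDK clients, config, connection
# pools) belongs here so warm invocations can reuse it.
INIT_COUNT = {"cold_starts": 0}
INIT_COUNT["cold_starts"] += 1  # increments once per environment, not per call

# Hypothetical externalized-state target; the function itself stays stateless.
TABLE_NAME = os.environ.get("STATE_TABLE", "orders")

def handler(event, context):
    # Warm start: only this body runs on reused environments. Any state the
    # function needs must come from the event or an external store
    # (S3, DynamoDB, etc.), never from local memory.
    return {
        "statusCode": 200,
        "body": json.dumps({"received": event, "table": TABLE_NAME}),
    }
```

Invoking the handler twice in the same environment demonstrates the distinction: the module-scope counter stays at 1 while the handler runs each time.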

Core Components

  • Frontend Service — Receives invocation requests (API Gateway, S3 events, SQS, etc.). Routes to Worker Manager.
  • Worker Manager — Manages the pool of available execution environments (warm/cold). Assigns invocations to workers.
  • Worker (Firecracker MicroVM) — Isolated execution environment. Each function invocation runs inside its own microVM slot.
  • Placement Service — Decides which physical host a new microVM should run on (bin-packing by memory/CPU).
  • Control Plane — Manages function metadata: code package location, runtime, memory config, concurrency limits.
  • Sandbox — The actual process inside the microVM running the function handler.
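The function metadata the Control Plane tracks can be pictured as a small record. A sketch with illustrative field names only, not AWS's actual internal schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FunctionConfig:
    """Toy model of per-function metadata held by the control plane."""
    name: str
    code_location: str       # e.g. an S3 URI for the zipped package
    runtime: str             # e.g. "python3.12", "nodejs20.x"
    memory_mb: int           # memory allocation; CPU share scales with it
    timeout_s: int           # max execution time (capped at 15 minutes)
    reserved_concurrency: Optional[int] = None  # per-function cap, if set

cfg = FunctionConfig(
    name="resize-image",
    code_location="s3://deploy-bucket/resize-image.zip",
    runtime="python3.12",
    memory_mb=512,
    timeout_s=30,
)
```

The Worker Manager and Placement Service consult this kind of record to size and provision a microVM for a cold start.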

Invocation Flow

  1. Event triggers Lambda invocation (API Gateway, S3 event, SQS message, etc.)
  2. Frontend Service routes the request to the Worker Manager
  3. Worker Manager checks for a warm execution environment for this function:
      • Warm: Assign immediately → inject event → run handler
      • Cold: Request new microVM from Placement Service → provision Firecracker VM → download code package → start runtime → run handler
  4. Handler executes and returns the response
  5. Execution environment is held warm for ~15 minutes for potential reuse
  6. After the timeout, the environment is frozen or terminated
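The warm/cold decision at the heart of this flow can be sketched as a toy Worker Manager. This is a simplified model, not AWS's implementation: provisioning, freezing, and per-function environment pools are collapsed into a timestamp map, and the ~15-minute warm window from the steps above is the eviction rule.

```python
import time

WARM_TTL_S = 15 * 60  # environments held warm for ~15 minutes

class WorkerManager:
    """Toy model of the warm/cold routing decision."""

    def __init__(self, now=time.monotonic):
        self._now = now    # injectable clock, so the behavior is testable
        self._warm = {}    # function name -> last-used timestamp

    def invoke(self, function_name):
        now = self._now()
        # Evict environments idle past the TTL (frozen/terminated in reality).
        self._warm = {f: t for f, t in self._warm.items()
                      if now - t < WARM_TTL_S}
        start = "warm" if function_name in self._warm else "cold"
        # On the cold path, the real system asks the Placement Service for a
        # microVM, downloads the package, and starts the runtime; here we
        # only record that a fresh environment now exists.
        self._warm[function_name] = now
        return start
```

A first invocation is cold, a repeat within the window is warm, and a long-idle function goes cold again.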

Cold Start Breakdown

| Phase | Time |
| --- | --- |
| Firecracker VM boot | ~125 ms |
| Runtime init (JVM, Node.js, Python) | 50–500 ms |
| Function initialization code | Variable |
| Total (worst case) | 100 ms – 2 s+ |

Concurrency & Scaling

  • Each concurrent invocation gets its own execution environment
  • Default limit: 1,000 concurrent executions per account per region (soft limit)
  • Provisioned Concurrency — Pre-warms N environments to eliminate cold starts for latency-sensitive workloads
  • Scaling is near-instant for event-driven workloads (SQS, Kinesis)
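Because each concurrent invocation occupies its own execution environment, the account-level soft limit behaves like a shared pool of slots. A toy sketch of the throttling rule, using the default limit from the bullets above (the class and method names are illustrative):

```python
ACCOUNT_LIMIT = 1000  # default soft limit per account per region

class ConcurrencyTracker:
    """Toy model: one execution environment per in-flight invocation."""

    def __init__(self, limit=ACCOUNT_LIMIT):
        self.limit = limit
        self.in_flight = 0

    def start(self):
        # Real Lambda rejects over-limit synchronous invokes with a 429
        # TooManyRequestsException; here we just return a marker string.
        if self.in_flight >= self.limit:
            return "throttled"
        self.in_flight += 1
        return "running"

    def finish(self):
        self.in_flight -= 1
```

Provisioned Concurrency changes where the slots come from (pre-warmed environments) but not this counting rule: it removes cold starts, not the limit.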

Key Trade-offs

| Decision | Reasoning |
| --- | --- |
| Firecracker microVMs | Strong isolation (vs. containers) with low overhead |
| Stateless functions | Enables horizontal scaling with no coordination |
| Warm environment reuse | Amortizes cold start cost across many invocations |
| 15-min max execution | Enforces the "short-lived function" contract; long jobs use ECS/EC2 |