Overview
Source: The System Design Newsletter — Neo Kim
Spotify serves 600M+ monthly active users and a catalog of 100M+ tracks. Its architecture handles audio streaming, personalized recommendations, offline sync, and social features at global scale, while keeping buffering close to zero.
Key Concepts
Audio Transcoding — Tracks are encoded at multiple bitrates (24 kbps, 96 kbps, 160 kbps, 320 kbps) and formats (Ogg Vorbis, AAC, MP3) to serve different devices and network conditions.
Adaptive Streaming — Client requests audio chunks and adjusts quality based on available bandwidth (similar to video ABR).
Collaborative Filtering — Recommendation algorithm that surfaces music based on listening patterns of users with similar taste. Powers Discover Weekly.
Offline Sync — Premium users can download tracks for offline playback. Downloaded tracks are DRM-encrypted.
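Adaptive streaming can be sketched as a throughput-based rule over the bitrate ladder from Audio Transcoding. This is a minimal illustration, not Spotify's client logic: the 0.8 safety factor, the moving-average smoothing, and the `max_kbps` cap are assumptions chosen for the example.

```python
from typing import List

# Bitrate ladder from the Audio Transcoding entry, in kbps (lowest to highest).
BITRATES_KBPS = [24, 96, 160, 320]


def smooth_throughput(samples_kbps: List[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of per-chunk throughput samples.

    Smoothing keeps one unusually fast or slow chunk download from flipping quality back and forth.
    """
    estimate = samples_kbps[0]
    for sample in samples_kbps[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_bitrate(throughput_kbps: float, safety: float = 0.8, max_kbps: int = 320) -> int:
    """Return the highest rung that fits inside a safety fraction of estimated throughput.

    `safety` leaves headroom for bandwidth dips; `max_kbps` models a quality-setting cap.
    Both defaults are assumptions for this sketch.
    """
    budget = throughput_kbps * safety
    fitting = [b for b in BITRATES_KBPS if b <= budget and b <= max_kbps]
    # Fall back to the lowest rung so playback can still start on a very poor link.
    return max(fitting) if fitting else BITRATES_KBPS[0]


if __name__ == "__main__":
    estimate = smooth_throughput([900, 400, 150, 140])
    print(round(estimate), pick_bitrate(estimate))
```

In the usage example, two slow samples pull the smoothed estimate down to roughly 441 kbps, which still leaves the 320 kbps rung within budget; a sustained drop would eventually step quality down.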
Core Components
- Audio Storage — Tracks stored in GCS/object storage. Multiple encoded variants per track.
- CDN — Audio chunks cached at edge nodes globally. Most playback served from CDN, not origin.
- Streaming Service — Handles audio chunk delivery. Generates time-limited signed URLs for CDN chunks.
- Catalog Service — Stores track metadata: title, artist, album, duration, genre, lyrics.
- Recommendation Engine: combines collaborative filtering, NLP over text about music (blogs, playlists, lyrics), and ML models over audio features. Runs batch ML jobs (on Hadoop/Spark) plus real-time inference.
- Search Service — Elasticsearch-backed search over tracks, artists, albums, playlists.
- Playlist Service — Manages user-created and algorithmic playlists (Discover Weekly, Daily Mixes).
- Social Service — Friend activity feed, shared playlists, collaborative playlists.
- Podcast Service — Separate pipeline for podcast ingestion, hosting, and analytics.
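The Streaming Service's time-limited signed URLs can be illustrated with an HMAC scheme in which the Streaming Service and the CDN edges share a secret key. The host, key, path layout, and the `expires`/`sig` parameter names are hypothetical; each CDN provider defines its own signed-URL format, so treat this as a sketch of the idea rather than any provider's API.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: a key shared by the Streaming Service and CDN edges, and a placeholder host.
SECRET_KEY = b"example-shared-secret"
CDN_BASE = "https://cdn.example.com"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the chunk path and expiry time, hex-encoded."""
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_chunk_url(path: str, ttl_s: int = 300, now: Optional[int] = None) -> str:
    """Streaming Service side: issue a URL for one audio chunk that expires after ttl_s seconds."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_s
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{CDN_BASE}{path}?{query}"


def verify_chunk_url(url: str, now: Optional[int] = None) -> bool:
    """CDN edge side: reject expired, malformed, or tampered URLs without contacting origin."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # compare_digest runs in constant time, so timing does not reveal partial matches.
    return hmac.compare_digest(sig, _signature(parsed.path, expires))
```

Because the edge checks the signature locally, it can serve or refuse a chunk without a round trip to the Streaming Service, and audio bytes never need to pass through the origin servers.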
Audio Streaming Flow
- User presses play
- Client requests track metadata from Catalog Service
- Streaming Service generates signed CDN URL for audio chunks
- Client fetches first chunk from nearest CDN edge node
- Client pre-buffers next N chunks in background
- Playback continues seamlessly while buffer refills
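The play-and-prefetch steps above can be simulated in a few lines. The sketch assumes a synchronous `fetch` callable and a fixed lookahead of N chunks (both hypothetical); a real client fetches concurrently with playback, and a "stall" here stands for a rebuffering pause.

```python
from typing import Callable, Dict, List, Tuple


def chunks_to_prefetch(playing: int, buffered: Dict[int, bytes], lookahead: int, total: int) -> List[int]:
    """Chunk indices still needed so the `lookahead` chunks after `playing` are buffered."""
    last = min(playing + lookahead, total - 1)
    return [i for i in range(playing + 1, last + 1) if i not in buffered]


def simulate_playback(total: int, lookahead: int, fetch: Callable[[int], bytes]) -> Tuple[List[bytes], int]:
    """Play chunks in order while topping up the buffer; returns (played chunks, stall count)."""
    buffered: Dict[int, bytes] = {0: fetch(0)}  # first chunk is fetched before playback starts
    played: List[bytes] = []
    stalls = 0
    for playing in range(total):
        if playing not in buffered:
            stalls += 1  # buffer underrun: a real player would pause (rebuffer) here
            buffered[playing] = fetch(playing)
        # Top up the buffer; a real client does this in the background during playback.
        for i in chunks_to_prefetch(playing, buffered, lookahead, total):
            buffered[i] = fetch(i)
        played.append(buffered.pop(playing))  # hand the chunk to the decoder and free it
    return played, stalls
```

With `lookahead=0`, every chunk after the first is fetched on demand and counted as a stall, which is exactly the behavior pre-buffering exists to avoid.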
Recommendation System (Discover Weekly)
- Weekly batch job analyzing 600M+ user listening histories
- Collaborative filtering: "users who like X also like Y"
- Audio feature analysis (tempo, key, danceability) using ML models
- Natural Language Processing on blogs, playlists, and lyrics
- Results materialized as personalized playlist every Monday morning
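The "users who like X also like Y" step can be illustrated with item-item collaborative filtering over play counts, scored by cosine similarity. This neighborhood-based variant is chosen for readability; the notes do not say which collaborative filtering formulation Discover Weekly uses, and the toy data is invented.

```python
import math
from collections import defaultdict
from typing import Dict, List

PlayCounts = Dict[str, Dict[str, int]]  # user -> {track: play count}

# Invented toy data: ana, ben, and cy share taste; dee and eve mostly play a different pair of tracks.
PLAYS: PlayCounts = {
    "ana": {"A": 5, "B": 3},
    "ben": {"A": 4, "B": 2, "C": 5},
    "cy": {"A": 3, "C": 4},
    "dee": {"D": 5, "E": 4},
    "eve": {"B": 1, "D": 3, "E": 5},
}


def track_vectors(plays: PlayCounts) -> Dict[str, Dict[str, int]]:
    """Pivot to track -> {user: play count}, keeping only positive counts."""
    vectors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for user, tracks in plays.items():
        for track, count in tracks.items():
            if count > 0:
                vectors[track][user] = count
    return vectors


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(plays: PlayCounts, user: str, k: int = 2) -> List[str]:
    """Rank tracks the user has not played by similarity to tracks they have, weighted by play count."""
    vectors = track_vectors(plays)
    heard = {t for t, c in plays[user].items() if c > 0}
    scores = {
        candidate: sum(cosine(vectors[candidate], vectors[t]) * plays[user][t] for t in heard)
        for candidate in vectors
        if candidate not in heard
    }
    # Sort by descending score, breaking ties alphabetically for deterministic output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Here `recommend(PLAYS, "ana", k=1)` surfaces track C because ben and cy, who share ana's taste for A, play it heavily. At hundreds of millions of users, systems generally avoid the all-pairs comparison used here (for example via matrix factorization or approximate nearest-neighbor search), but the ranking intuition is the same.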
Scale Characteristics
- 600M+ monthly active users, 100M+ tracks
- Audio chunks pre-cached at CDN edge — origin rarely hit for popular tracks
- Recommendation batch jobs run on Hadoop/Spark clusters
- Event streaming via Kafka for listening event analytics
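Why origin is rarely hit for popular tracks can be shown with a toy LRU cache standing in for one CDN edge node. The capacity, chunk IDs, and LRU eviction policy are assumptions for illustration; production CDNs use their own, often more elaborate, admission and eviction policies.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache standing in for one CDN edge node; misses are served from origin."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.origin_fetches = 0

    def _fetch_from_origin(self, chunk_id: str) -> bytes:
        # Placeholder for a read from object storage (origin).
        self.origin_fetches += 1
        return f"bytes-of-{chunk_id}".encode()

    def get(self, chunk_id: str) -> bytes:
        if chunk_id in self.store:
            self.store.move_to_end(chunk_id)  # mark as most recently used
            self.hits += 1
            return self.store[chunk_id]
        data = self._fetch_from_origin(chunk_id)
        self.store[chunk_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used chunk
        return data


if __name__ == "__main__":
    edge = EdgeCache(capacity=2)
    # A popular chunk interleaved with one-off requests for niche chunks.
    for chunk in ["hit", "a", "hit", "b", "hit", "c", "hit"]:
        edge.get(chunk)
    print(edge.hits, edge.origin_fetches)
```

The frequently requested chunk stays resident while one-off chunks are evicted, so after its first miss every request for it is an edge hit; only the long tail reaches origin.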
Key Trade-offs
| Decision | Reasoning |
|---|---|
| Multiple bitrate variants | Lets each client adapt quality to its device and network conditions |
| Signed CDN URLs | Access control without routing audio through origin servers |
| Batch recommendations | Weekly freshness is acceptable for Discover Weekly; regenerating every user's playlist in real time would cost far more |
| DRM for offline | Satisfies label licensing terms while enabling a Premium feature |