Overview

Source: The System Design Newsletter — Neo Kim
Spotify serves 600M+ monthly active users and a catalog of 100M+ tracks. Its architecture covers audio streaming, personalized recommendations, offline sync, and social features at global scale while keeping buffering close to zero.

Key Concepts

Audio Transcoding — Tracks are encoded at multiple bitrates (24 kbps, 96 kbps, 160 kbps, 320 kbps) and formats (Ogg Vorbis, AAC, MP3) to serve different devices and network conditions.
Adaptive Streaming — Client requests audio chunks and adjusts quality based on available bandwidth (similar to video ABR).
Collaborative Filtering — Recommendation algorithm that surfaces music based on listening patterns of users with similar taste. Powers Discover Weekly.
Offline Sync — Premium users can download tracks for offline playback. Downloaded tracks are DRM-encrypted.
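
As a rough illustration of the adaptive streaming idea, the sketch below picks the highest bitrate from the ladder listed above that fits within the measured bandwidth. The function name pick_bitrate and the safety_factor headroom are assumptions made for this example, not details from the source.

```python
# Bitrate ladder taken from the Audio Transcoding entry above (kbps).
BITRATES_KBPS = [24, 96, 160, 320]


def pick_bitrate(measured_kbps: float, safety_factor: float = 0.8) -> int:
    """Return the highest bitrate that fits within a fraction of the bandwidth.

    safety_factor is a hypothetical headroom parameter for this sketch.
    """
    budget = measured_kbps * safety_factor
    candidates = [b for b in BITRATES_KBPS if b <= budget]
    # Fall back to the lowest variant when even that exceeds the budget.
    return max(candidates) if candidates else BITRATES_KBPS[0]


if __name__ == "__main__":
    for bandwidth in (10, 150, 250, 500):
        print(bandwidth, "kbps measured ->", pick_bitrate(bandwidth), "kbps stream")
```

A client would typically re-run a choice like this as its bandwidth estimate changes, so quality can step down on a weak connection and back up when it recovers.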

Core Components

  • Audio Storage — Tracks stored in object storage (GCS, Google Cloud Storage), with multiple encoded variants per track.
  • CDN — Audio chunks cached at edge nodes globally. Most playback served from CDN, not origin.
  • Streaming Service — Handles audio chunk delivery. Generates time-limited signed URLs for CDN chunks.
  • Catalog Service — Stores track metadata: title, artist, album, duration, genre, lyrics.
  • Recommendation Engine — Collaborative filtering, NLP on text (blogs, playlists, lyrics), and ML models on audio features. Runs batch ML jobs (on Hadoop/Spark) plus real-time inference.
  • Search Service — Elasticsearch-backed search over tracks, artists, albums, playlists.
  • Playlist Service — Manages user-created and algorithmic playlists (Discover Weekly, Daily Mixes).
  • Social Service — Friend activity feed, shared playlists, collaborative playlists.
  • Podcast Service — Separate pipeline for podcast ingestion, hosting, and analytics.
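
The time-limited signed URLs issued by the Streaming Service can be sketched as follows. This is a minimal example assuming an HMAC-SHA256 scheme with a shared secret. The host cdn.example.com, the SECRET_KEY value, and the expires/sig query parameters are all hypothetical, and real CDNs each define their own signing format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical secret shared by signer and CDN edge


def _signature(path: str, expires: int) -> str:
    # Sign the path together with the expiry so neither can be altered.
    return hmac.new(SECRET_KEY, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a URL that is valid for ttl_seconds from `now`."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"https://cdn.example.com{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Check the expiry and signature, as an edge node might do."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    return hmac.compare_digest(signature, _signature(parsed.path, expires))


if __name__ == "__main__":
    url = sign_url("/audio/track123/chunk0.ogg", ttl_seconds=60)
    print(url, verify_url(url))
```

Because the edge can verify the signature on its own, audio bytes never need to pass through the origin servers just to enforce access control.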

Audio Streaming Flow

  1. User presses play
  2. Client requests track metadata from Catalog Service
  3. Streaming Service generates signed CDN URL for audio chunks
  4. Client fetches first chunk from nearest CDN edge node
  5. Client pre-buffers next N chunks in background
  6. Playback continues seamlessly while buffer refills
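
The last three steps of the flow above (fetch, pre-buffer, refill while playing) can be sketched as a small simulation. The play_track function and the fetch_chunk callback are hypothetical names. A real client would fetch in the background, whereas this sketch runs sequentially to keep the logic visible.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def play_track(
    total_chunks: int,
    prebuffer: int,
    fetch_chunk: Callable[[int], bytes],
) -> List[int]:
    """Simulate playback, keeping up to `prebuffer` chunks queued ahead."""
    prebuffer = max(1, prebuffer)  # always hold at least the chunk being played
    buffer: Deque[Tuple[int, bytes]] = deque()
    next_to_fetch = 0
    played: List[int] = []
    while len(played) < total_chunks:
        # Refill until the buffer is full or the track has no more chunks.
        while len(buffer) < prebuffer and next_to_fetch < total_chunks:
            buffer.append((next_to_fetch, fetch_chunk(next_to_fetch)))
            next_to_fetch += 1
        index, _data = buffer.popleft()  # "play" the oldest buffered chunk
        played.append(index)
    return played


if __name__ == "__main__":
    order = play_track(5, prebuffer=3, fetch_chunk=lambda i: b"audio")
    print(order)
```

The buffer depth is the main tuning knob here: a deeper buffer tolerates longer network stalls at the cost of fetching data the user may skip.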

Recommendation System (Discover Weekly)

  • Weekly batch job analyzing 600M+ user listening histories
  • Collaborative filtering: "users who like X also like Y"
  • Audio feature analysis (tempo, key, danceability) using ML models
  • Natural Language Processing on blogs, playlists, and lyrics
  • Results materialized as personalized playlist every Monday morning
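
The "users who like X also like Y" idea can be shown with a toy item co-occurrence count. This is a deliberately simplified stand-in for collaborative filtering, swapped in for brevity, and is not Spotify's actual model. The function names and the sample listening histories are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def build_cooccurrence(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of tracks appears in the same user's history."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for tracks in histories.values():
        for a, b in combinations(sorted(tracks), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user: str, histories: Dict[str, Set[str]],
              co: Dict[str, Counter], k: int = 3) -> List[str]:
    """Score unseen tracks by co-occurrence with the user's own tracks."""
    seen = histories[user]
    scores: Counter = Counter()
    for track in seen:
        for other, count in co.get(track, Counter()).items():
            if other not in seen:
                scores[other] += count
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


if __name__ == "__main__":
    histories = {  # hypothetical sample data
        "ana": {"A", "B", "C"},
        "ben": {"A", "B", "D"},
        "cy": {"A", "B"},
    }
    co = build_cooccurrence(histories)
    print(recommend("cy", histories, co))
```

A weekly batch job would compute something like the co-occurrence table once over all histories, then materialize the top results per user as a playlist.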

Scale Characteristics

  • 600M+ monthly active users, 100M+ tracks
  • Audio chunks pre-cached at CDN edge — origin rarely hit for popular tracks
  • Recommendation batch jobs run on Hadoop/Spark clusters
  • Event streaming via Kafka for listening event analytics
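
To see why the origin is rarely hit for popular tracks, the sketch below simulates a single edge node as a small cache in front of the origin. The EdgeCache class and the LRU eviction policy are assumptions made for this example. The source describes chunks as pre-cached and does not say which eviction policy the CDN uses.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy edge node: serves from cache, falls back to origin on a miss."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, chunk_id: str, fetch_from_origin: Callable[[str], bytes]) -> bytes:
        if chunk_id in self.store:
            self.store.move_to_end(chunk_id)  # mark as most recently used
            self.hits += 1
            return self.store[chunk_id]
        self.misses += 1
        data = fetch_from_origin(chunk_id)
        self.store[chunk_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used chunk
        return data


if __name__ == "__main__":
    cache = EdgeCache(capacity=2)
    requests = ["hit", "hit", "hit", "rare1", "hit", "rare2", "hit"]
    for chunk in requests:
        cache.get(chunk, lambda c: c.encode())
    print("hits:", cache.hits, "misses:", cache.misses)
```

In this toy request sequence the popular chunk reaches the origin only once, while each rarely played chunk costs one origin fetch.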

Key Trade-offs

Decision                     Reasoning
Multiple bitrate variants    Adaptive quality across all network conditions
Signed CDN URLs              Security without routing audio through origin servers
Batch recommendations        Weekly freshness acceptable; real-time ML too expensive
DRM for offline              Protects label agreements while enabling premium feature