Overview

Source: The System Design Newsletter — Neo Kim
Google Docs lets multiple users edit the same document simultaneously, with changes synchronized in real time. Doing this at scale means solving three problems: resolving conflicting concurrent edits, keeping every client's copy consistent, and propagating changes with low latency.

Key Concepts

Operational Transformation (OT) — The original algorithm Google Docs uses to merge concurrent edits. Every operation is transformed relative to previously applied operations so all clients converge on the same document state.
Conflict-Free Replicated Data Type (CRDT) — A newer approach where data structures are designed so concurrent edits can be merged automatically without conflict. No central coordination needed.
WebSocket — Persistent bidirectional connection between client and server enabling low-latency real-time collaboration.
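
To make the CRDT idea concrete, here is a minimal sketch of a state-based design, not a description of what any production editor does. It assumes each character is stored as a hypothetical (position, site_id, char) tuple, where the position is a fraction between its neighbours and site_id breaks ties. Merging is plain set union, so replicas can exchange state in any order and still render the same text. Deletions are left out to keep the sketch short.

```python
from fractions import Fraction

# Hypothetical element layout: (position, site_id, char).

def insert_between(left, right, site_id, char):
    """Create an element positioned midway between two neighbouring positions."""
    return ((left + right) / 2, site_id, char)

def merge(replica_a, replica_b):
    """State-based merge: set union is commutative, associative, and idempotent."""
    return replica_a | replica_b

def render(replica):
    """Sort by (position, site_id) to obtain the visible text."""
    return "".join(char for _, _, char in sorted(replica))

base = {(Fraction(1), "s0", "a"), (Fraction(2), "s0", "b")}

# Two sites insert concurrently into the same gap between "a" and "b".
replica_a = base | {insert_between(Fraction(1), Fraction(2), "A", "X")}
replica_b = base | {insert_between(Fraction(1), Fraction(2), "B", "Y")}

# Merging in either direction yields the same text.
merged_ab = render(merge(replica_a, replica_b))
merged_ba = render(merge(replica_b, replica_a))
print(merged_ab, merged_ba)
```

Because the two inserts share a position, the site_id decides their order, which is why no server is needed to sequence them.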

Core Components

  • Client Editor — Rich-text editor running in the browser. Captures user operations and sends them to the server.
  • Collaboration Service — Receives operations, applies OT, broadcasts transformed operations to all connected clients.
  • Document Storage — Persistent store for document snapshots and operation logs.
  • Presence Service — Tracks which users are in the document, their cursor positions, and selection ranges.
  • Revision History Service — Records all operations for full undo/redo and version history.

Collaborative Editing Flow

  1. User A types a character → operation created (e.g., insert('h', position=5))
  2. Operation sent to Collaboration Service via WebSocket
  3. Server applies OT: transforms op against any concurrent ops from other users
  4. Transformed op broadcast to all other connected clients
  5. Each client applies the op to their local document state
  6. All clients converge to the same document state
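
The server side of this flow can be sketched as follows. This is an illustrative model under stated assumptions, not Google's implementation: it handles inserts only, each client is assumed to send the revision it last saw (base_revision), and the names Insert, CollaborationService, receive, and client_id are hypothetical. Networking is omitted; receive returns the op that would be broadcast.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Insert:
    char: str
    position: int
    client_id: str  # hypothetical field, used only to break position ties

def transform(op, applied):
    """Shift `op` right if `applied` landed before it (ties go to the lower client_id)."""
    if applied.position < op.position or (
        applied.position == op.position and applied.client_id < op.client_id
    ):
        return Insert(op.char, op.position + 1, op.client_id)
    return op

@dataclass
class CollaborationService:
    text: str = ""
    log: list = field(default_factory=list)  # append-only; len(log) is the revision

    def receive(self, op, base_revision):
        # Step 3: transform against every op the sender had not yet seen.
        for applied in self.log[base_revision:]:
            op = transform(op, applied)
        self.text = self.text[:op.position] + op.char + self.text[op.position:]
        self.log.append(op)
        # Step 4: the returned op is what would be broadcast to other clients.
        return op, len(self.log)

server = CollaborationService(text="hello")
op_a, rev_a = server.receive(Insert("X", 2, "A"), base_revision=0)
op_b, rev_b = server.receive(Insert("Y", 2, "B"), base_revision=0)
print(server.text, op_b.position, rev_b)
```

Both clients based their edit on revision 0, so the second op to arrive is transformed against the first before it is applied and logged.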

Conflict Scenario Example

  • User A: insert('X', position=2) at time T
  • User B: insert('Y', position=2) at time T
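  • Without OT: each client applies the two inserts in a different order, so the copies diverge (A ends up with YX, B with XY); under a last-write-wins policy, one insert is lost instead
  • With OT: the server orders A's op first and transforms B's op to insert('Y', position=3) → both characters appear, in the same order on every client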
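
The scenario above can be replayed in a few lines. This sketch assumes a starting document of "abcd" and a tie-breaking rule where the lower site name goes first; both are illustrative choices, and transform_insert is a hypothetical helper that covers only insert-versus-insert.

```python
def apply_insert(text, char, position):
    """Return `text` with `char` inserted at `position`."""
    return text[:position] + char + text[position:]

def transform_insert(position, site, other_position, other_site):
    """Adjust `position` for a concurrent insert made at `other_position`."""
    if other_position < position or (other_position == position and other_site < site):
        return position + 1
    return position

doc = "abcd"

# Without transformation: each client applies its own op first, then the raw remote op.
naive_a = apply_insert(apply_insert(doc, "X", 2), "Y", 2)
naive_b = apply_insert(apply_insert(doc, "Y", 2), "X", 2)

# With transformation: the remote op is adjusted against the local one before applying.
y_seen_by_a = transform_insert(2, "B", 2, "A")  # B's insert, shifted past A's
x_seen_by_b = transform_insert(2, "A", 2, "B")  # A's insert, unchanged
ot_a = apply_insert(apply_insert(doc, "X", 2), "Y", y_seen_by_a)
ot_b = apply_insert(apply_insert(doc, "Y", 2), "X", x_seen_by_b)

print(naive_a, naive_b)  # the two copies differ
print(ot_a, ot_b)        # the two copies match
```

The tie-break must be the same on every participant; otherwise both sides would shift (or neither would) and the copies would diverge again.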

Scale & Reliability Patterns

  • Optimistic local application — Client applies op locally before server confirmation for zero perceived latency
  • Operation log — Every change is an append-only log entry; documents can be reconstructed from scratch
  • Checkpointing — Periodic snapshots prevent replaying the entire operation log on load
  • Cursor broadcasting — Presence positions sent as lightweight deltas, not full state
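
The operation log and checkpointing patterns fit together as in the sketch below. It is an in-memory illustration under assumptions: DocumentStore, snapshot_every, and the (char, position) op format are hypothetical, and a real store would persist both the log and the snapshots.

```python
from dataclasses import dataclass, field

@dataclass
class DocumentStore:
    log: list = field(default_factory=list)  # append-only list of (char, position)
    snapshots: dict = field(default_factory=lambda: {0: ""})  # revision -> text
    snapshot_every: int = 3  # hypothetical checkpoint interval

    def append(self, char, position):
        """Record one insert; checkpoint when the interval is reached."""
        self.log.append((char, position))
        revision = len(self.log)
        if revision % self.snapshot_every == 0:
            self.snapshots[revision] = self.load(revision)

    def load(self, revision=None):
        """Rebuild the text at `revision` from the nearest earlier snapshot."""
        revision = len(self.log) if revision is None else revision
        base = max(r for r in self.snapshots if r <= revision)
        text = self.snapshots[base]
        for char, position in self.log[base:revision]:  # replay only the tail
            text = text[:position] + char + text[position:]
        return text

store = DocumentStore()
for index, char in enumerate("hello"):
    store.append(char, index)

print(store.load())   # latest revision: snapshot at 3 plus two replayed ops
print(store.load(2))  # older revision: replayed from the empty snapshot
```

Keeping the full log alongside the snapshots is what allows any past revision to be rebuilt, while the snapshots bound how many operations a load has to replay.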

Key Trade-offs

Decision | Reasoning
WebSocket over HTTP polling | Real-time and bidirectional, with no polling delay
OT over last-write-wins | Preserves all concurrent edits
Optimistic UI updates | Instant feedback even on slow connections
Server-side transformation | Single source of truth prevents divergence