Overview
Source: The System Design Newsletter — Neo Kim
Google Docs enables multiple users to edit the same document simultaneously with real-time synchronization. Collaborative editing at scale has to solve three problems: resolving conflicting edits, keeping every replica consistent, and propagating changes with low latency.
Key Concepts
Operational Transformation (OT) — The algorithm Google Docs uses to merge concurrent edits. Each incoming operation is transformed against the concurrent operations that were applied before it, so all clients converge on the same document state.
Conflict-Free Replicated Data Type (CRDT) — A newer approach where data structures are designed so concurrent edits can be merged automatically without conflict. No central coordination needed.
WebSocket — Persistent bidirectional connection between client and server enabling low-latency real-time collaboration.
Core Components
- Client Editor — Rich-text editor running in the browser. Captures user operations and sends them to the server.
- Collaboration Service — Receives operations, applies OT, broadcasts transformed operations to all connected clients.
- Document Storage — Persistent store for document snapshots and operation logs.
- Presence Service — Tracks which users are in the document, their cursor positions, and selection ranges.
- Revision History Service — Records all operations for full undo/redo and version history.
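As an illustration of the Presence Service, here is a minimal in-memory sketch. The class name `PresenceService`, its `update` and `leave` methods, and the message shape are hypothetical; the source only says that the service tracks users, cursor positions, and selection ranges.

```python
class PresenceService:
    """Hypothetical in-memory presence tracker for one document."""

    def __init__(self):
        # user_id -> last known fields, e.g. {"cursor": 5, "selection": (2, 5)}
        self.state = {}

    def update(self, user_id, **fields):
        """Store the new fields and return only what changed (or None)."""
        current = self.state.setdefault(user_id, {})
        delta = {k: v for k, v in fields.items() if current.get(k) != v}
        current.update(delta)
        return {"user": user_id, **delta} if delta else None

    def leave(self, user_id):
        """Forget the user and return a message other clients can react to."""
        self.state.pop(user_id, None)
        return {"user": user_id, "left": True}


presence = PresenceService()
first = presence.update("A", cursor=5)
second = presence.update("A", cursor=5, selection=(2, 5))
third = presence.update("A", cursor=5, selection=(2, 5))
```

Here `second` carries only the new selection because the cursor did not move, and `third` is `None` because nothing changed, so there is nothing to broadcast.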
Collaborative Editing Flow
- User A types a character → operation created (e.g., insert('h', position=5))
- Operation sent to Collaboration Service via WebSocket
- Server applies OT: transforms op against any concurrent ops from other users
- Transformed op broadcast to all other connected clients
- Each client applies the op to their local document state
- All clients converge to the same document state
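The flow above can be sketched as a small in-memory server. This is an illustration under stated assumptions, not Google's implementation: it handles single-character inserts only, `CollaborationService`, `Insert`, `base_revision`, and the per-client `outboxes` (standing in for WebSocket connections) are hypothetical names, and each client is assumed to have at most one unacknowledged operation in flight.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    char: str
    position: int
    client: str  # hypothetical client id, also used to break position ties


def transform(op: Insert, applied: Insert) -> Insert:
    """Shift `op` right if `applied` inserted at or before its position."""
    shifts = applied.position < op.position or (
        applied.position == op.position and applied.client < op.client
    )
    return Insert(op.char, op.position + 1, op.client) if shifts else op


@dataclass
class CollaborationService:
    text: str = ""
    log: list = field(default_factory=list)       # ops in server order
    outboxes: dict = field(default_factory=dict)  # client id -> ops to deliver

    def connect(self, client_id: str) -> None:
        self.outboxes[client_id] = []

    def receive(self, op: Insert, base_revision: int) -> int:
        # Transform against every op the sender had not seen yet.
        for applied in self.log[base_revision:]:
            op = transform(op, applied)
        self.text = self.text[: op.position] + op.char + self.text[op.position :]
        self.log.append(op)
        # Broadcast the transformed op to every other connected client.
        for client_id, outbox in self.outboxes.items():
            if client_id != op.client:
                outbox.append(op)
        return len(self.log)  # new revision number


service = CollaborationService(text="hello")
service.connect("A")
service.connect("B")
service.receive(Insert(">", 0, "B"), base_revision=0)
service.receive(Insert("h", 5, "A"), base_revision=0)
```

User A created its insert at position 5 against revision 0, but B's insert at position 0 reached the server first, so the server shifts A's operation to position 6 before applying and broadcasting it.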
Conflict Scenario Example
- User A: insert('X', position=2) at time T
- User B: insert('Y', position=2) at time T
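- Without OT: if each client applies the other's op unchanged, the clients diverge (one shows XY, the other YX); under last-write-wins, one insert is lost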
- With OT: server transforms B's op to insert('Y', position=3) → both characters appear
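The scenario above can be worked through with a small transform function for concurrent inserts. This is a sketch under assumptions, not the production algorithm: it covers inserts only, and the `client` field used to break ties when two inserts target the same position is a hypothetical addition (the source does not say how ties are ordered).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    char: str
    position: int
    client: str  # hypothetical id; the lower id wins a same-position tie


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if against.position < op.position or (
        against.position == op.position and against.client < op.client
    ):
        return Insert(op.char, op.position + 1, op.client)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[: op.position] + op.char + doc[op.position :]


doc = "abcd"
a = Insert("X", 2, client="A")
b = Insert("Y", 2, client="B")

# Server (and client A): A's op first, then B's op transformed against it.
server = apply(apply(doc, a), transform(b, a))

# Client B: its own op was already applied locally, then A's op arrives.
client_b = apply(apply(doc, b), transform(a, b))
```

Both sides end with the text `abXYcd`: B's insert is shifted to position 3 on the server, while on client B the incoming insert from A keeps position 2 because A wins the tie.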
Scale & Reliability Patterns
- Optimistic local application — Client applies op locally before server confirmation for zero perceived latency
- Operation log — Every change is an append-only log entry; documents can be reconstructed from scratch
- Checkpointing — Periodic snapshots prevent replaying the entire operation log on load
- Cursor broadcasting — Presence positions sent as lightweight deltas, not full state
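The operation log and checkpointing patterns can be sketched together. The `DocumentStore` class, its `checkpoint_every` interval, and the `(char, position)` log entry shape are hypothetical choices for illustration; a real store would persist both structures rather than keep them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    checkpoint_every: int = 3                      # hypothetical snapshot interval
    log: list = field(default_factory=list)        # append-only (char, position)
    snapshots: dict = field(default_factory=dict)  # revision -> full text

    def __post_init__(self):
        self.snapshots[0] = ""  # revision 0 is the empty document

    def append(self, char: str, position: int) -> int:
        """Append one insert to the log and snapshot every N revisions."""
        self.log.append((char, position))
        revision = len(self.log)
        if revision % self.checkpoint_every == 0:
            self.snapshots[revision] = self.load(revision)
        return revision

    def load(self, revision=None) -> str:
        """Rebuild the text from the nearest earlier snapshot plus the log tail."""
        revision = len(self.log) if revision is None else revision
        base = max(r for r in self.snapshots if r <= revision)
        text = self.snapshots[base]
        for char, position in self.log[base:revision]:
            text = text[:position] + char + text[position:]
        return text


store = DocumentStore(checkpoint_every=3)
for i, ch in enumerate("hello"):
    store.append(ch, i)
```

Loading the latest revision starts from the snapshot at revision 3 and replays only two log entries, and because the log is never rewritten, `store.load(2)` can still reconstruct an older version for revision history.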
Key Trade-offs
| Decision | Reasoning |
| --- | --- |
| WebSocket over HTTP polling | Real-time and bidirectional; no polling delay |
| OT over last-write-wins | Preserves all concurrent edits |
| Optimistic UI updates | Instant feedback even on slow connections |
| Server-side transformation | Single source of truth prevents divergence |