Source: Scalable System Design Patterns (Ricky Ho)

These notes condense the eight patterns described by Ricky Ho for building highly scalable distributed systems. Each section captures the core idea, workflow, trade-offs, and real-world examples.
1. Load Balancer
Idea: A dispatcher chooses one worker (via round‑robin, least‑busy, sticky‑session, etc.) and forwards the client request.
Workers are stateless, so any instance can handle the request.
Use When
- Horizontal request fan‑out is required.
- Reads/writes are evenly distributed across identical nodes.
| Pros | Cons |
|---|---|
| Easy to add capacity by adding workers | Dispatcher can become a bottleneck / SPOF (use multiple LBs + health‑checks) |
| Supports zero‑downtime rolling deploys | Requires sticky sessions or external session store for stateful apps |
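A minimal single-process sketch of the round-robin variant in Python; the worker pool here is just a list of callables standing in for backend instances, and the name `RoundRobinBalancer` is illustrative rather than anything from the original post.

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin dispatcher over a fixed pool of stateless workers."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(list(workers))

    def dispatch(self, request):
        # Pick the next worker in rotation and forward the request to it.
        worker = next(self._cycle)
        return worker(request)

# Example usage: "workers" are plain callables standing in for real backend instances.
workers = [lambda req, i=i: f"worker-{i} handled {req!r}" for i in range(3)]
lb = RoundRobinBalancer(workers)
for req in ["a", "b", "c", "d"]:
    print(lb.dispatch(req))   # requests rotate across worker-0, worker-1, worker-2
```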
2. Scatter & Gather
Idea: Dispatcher broadcasts a request to all workers, waits for their partial results, then merges them into one response.
Use When
- Query needs data that is sharded across nodes (e.g., search indexes, sharded DBs).
- Workers run in parallel, so overall latency is bounded by the slowest shard.
| Pros | Cons |
|---|---|
| Linear speed‑up with number of shards | Tail‑latency of slowest shard dictates overall latency |
| Simple concurrency model | Requires aggregation logic & partial‑result schema |
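A sketch of the fan-out/merge step, assuming each shard exposes a synchronous query callable; a thread pool stands in for real network calls, and the shard/merge functions are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, query, merge):
    """Broadcast `query` to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard(query), shards))
    return merge(partials)

# Example: each "shard" searches its local slice of documents.
shard_data = [["alpha", "beta"], ["beta", "gamma"], ["alpha", "delta"]]
shards = [lambda q, docs=docs: [d for d in docs if q in d] for docs in shard_data]
print(scatter_gather(shards, "a", merge=lambda parts: sorted(set(sum(parts, [])))))
```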
3. Result Cache
Idea: The dispatcher first checks the cache; on a hit it returns the cached value, otherwise it asks a worker to compute the result and stores it in the cache.
Use When
- High read‑to‑write ratio with repeated queries.
- Tolerable data staleness (use TTL, explicit invalidation).
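A minimal sketch of a TTL-based result cache wrapped around a worker; `ResultCache` and its parameters are illustrative, and a real deployment would usually put this behind an external cache such as Redis or memcached.

```python
import time

class ResultCache:
    """Check the cache first; on a miss, compute via the worker and store with a TTL."""

    def __init__(self, worker, ttl_seconds=60.0):
        self._worker = worker
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                        # cache hit, still fresh
        value = self._worker(key)                  # cache miss: compute via worker
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value

cache = ResultCache(worker=lambda k: k.upper(), ttl_seconds=5.0)
print(cache.get("hello"))  # computed by the worker
print(cache.get("hello"))  # served from the cache until the TTL expires
```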
4. Shared Space (Blackboard)
Idea: Workers collaborate via a tuple space, continuously enriching data until a final solution emerges.
Use When
- Problem can be solved by incremental knowledge accumulation (e.g., expert systems).
- Loose coupling between producers & consumers.
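A toy in-process blackboard, assuming the shared space is just a dict and the "specialists" are small functions that each add one piece of knowledge when their inputs are available; the names and the word-count problem are purely illustrative.

```python
# Minimal blackboard: specialists keep enriching a shared dict until one of
# them can produce the final answer.
def tokenizer(board):
    if "text" in board and "tokens" not in board:
        board["tokens"] = board["text"].split()

def counter(board):
    if "tokens" in board and "word_count" not in board:
        board["word_count"] = len(board["tokens"])

def solve(board, specialists):
    while "word_count" not in board:
        progressed = False
        for step in specialists:
            before = dict(board)
            step(board)
            progressed |= board != before
        if not progressed:          # nobody could add anything new
            raise RuntimeError("blackboard stuck")
    return board["word_count"]

print(solve({"text": "shared space in action"}, [tokenizer, counter]))  # -> 4
```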
5. Pipe & Filter
Idea: Data flows through a series of processing stages (filters) connected by queues (pipes). Each filter is independent and stateless.
Use When
- Multi‑step ETL pipelines, media transcoding, log processing.
- Need back‑pressure & decoupled stages.
| Pros | Cons |
|---|---|
| Easy parallelism per stage | End‑to‑end latency = Σ stage latencies |
| Replace/scale individual filters independently | Queue size tuning & ordering guarantees needed |
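A minimal threaded sketch where the pipes are bounded queues and the filters are stateless functions; the bounded queues also give crude back-pressure. The two-stage string pipeline is only an example.

```python
import queue
import threading

def filter_stage(transform, inbox, outbox):
    """Run one filter: read from the inbound pipe, transform, write to the outbound pipe."""
    while True:
        item = inbox.get()
        if item is None:            # sentinel: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(transform(item))

# Pipes are bounded queues; filters are plain functions wired up between them.
pipes = [queue.Queue(maxsize=10) for _ in range(3)]
stages = [str.strip, str.lower]     # two simple stateless filters
for transform, inbox, outbox in zip(stages, pipes, pipes[1:]):
    threading.Thread(target=filter_stage, args=(transform, inbox, outbox), daemon=True).start()

for line in ["  Hello ", " WORLD "]:
    pipes[0].put(line)
pipes[0].put(None)

while (out := pipes[-1].get()) is not None:
    print(out)                      # -> "hello", "world"
```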
6. Map‑Reduce
Idea: A batch job splits input data across distributed file-system blocks, runs Mappers in parallel, shuffles the intermediate keys, then Reducers consolidate the output.
Use When
- Dataset ≫ memory; throughput more important than latency.
- Embarrassingly parallel transformations, log analytics, offline ML.
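A single-process sketch of the two phases on a word-count example; a real deployment would run mappers and reducers across a cluster on a distributed file system, which this deliberately omits.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process sketch of the phases: map, shuffle by key, then reduce."""
    # Map: each record emits (key, value) pairs.
    intermediate = [pair for record in records for pair in mapper(record)]
    # Shuffle: group all values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: consolidate each key's values into one output.
    return {key: reducer(key, values) for key, values in groups.items()}

lines = ["to be or not to be", "to do"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)   # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'do': 1}
```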
7. Bulk Synchronous Parallel (BSP)
Idea: A master coordinates lock-step “supersteps”: each worker reads local data, processes it, sends messages, then waits at a global barrier; the cycle repeats until convergence.
Use When
- Graph algorithms (PageRank, BFS) requiring iterative, message‑passing computation.
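A toy in-process BSP round, assuming threads as workers, a `threading.Barrier` as the global sync point, and shared lists as mailboxes (safe here only because of CPython's GIL); real systems exchange messages over the network, and the "global average" computation is just an example.

```python
import threading

def bsp_global_average(values, supersteps=5):
    """Lock-step supersteps: each worker sends its value to every worker,
    waits at the global barrier, then averages the messages it received."""
    n = len(values)
    barrier = threading.Barrier(n)
    state = list(values)
    mailboxes = [[] for _ in range(n)]

    def worker(i):
        for _ in range(supersteps):
            for box in mailboxes:                               # send phase
                box.append(state[i])
            barrier.wait()                                      # barrier: all messages delivered
            state[i] = sum(mailboxes[i]) / len(mailboxes[i])    # compute phase
            mailboxes[i].clear()
            barrier.wait()                                      # barrier: ready for next superstep

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state

print(bsp_global_average([1.0, 5.0, 9.0]))   # every worker converges to 5.0
```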
8. Execution Orchestrator
Idea: Smart scheduler translates a directed acyclic task graph into runnable tasks dispatched across dumb workers, handling dependencies & retries.
Use When
- Complex DAG workflows (e.g., machine‑learning pipelines, video encoding trees).
- Mix of CPU & I/O tasks with varied runtimes.
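A simplified sketch that dispatches tasks wave by wave once all of their dependencies have completed; a real orchestrator also handles retries, priorities, and per-task readiness rather than whole waves. The task graph below is a hypothetical ML-pipeline example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps, max_workers=4):
    """Dispatch each task once all of its prerequisites have finished.
    `tasks` maps name -> callable, `deps` maps name -> set of prerequisite names."""
    done, results = set(), {}
    pending = dict(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Find every task whose prerequisites are all satisfied.
            ready = [name for name, d in pending.items() if d <= done]
            if not ready:
                raise ValueError("cycle or missing dependency in task graph")
            futures = {name: pool.submit(tasks[name]) for name in ready}
            for name, future in futures.items():   # wait for this wave, record results
                results[name] = future.result()
                done.add(name)
                del pending[name]
    return results

tasks = {
    "extract": lambda: "raw",
    "clean":   lambda: "cleaned",
    "train":   lambda: "model",
    "report":  lambda: "report",
}
deps = {"extract": set(), "clean": {"extract"}, "train": {"clean"}, "report": {"clean", "train"}}
print(run_dag(tasks, deps))
```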
Quick Comparison
| Pattern | Concurrency Model | Typical Latency | Best For |
|---|---|---|---|
| Load Balancer | One worker per request | Low | Stateless microservices |
| Scatter‑Gather | Fan‑out, aggregate | Medium | Sharded queries |
| Result Cache | Lookup first | Sub‑ms (hit) | Read‑heavy workloads |
| Shared Space | Shared mutable store | High / iterative | Collaborative reasoning |
| Pipe‑Filter | Sequential stages | Per‑stage | ETL pipelines |
| Map‑Reduce | Batch, two‑phase | Minutes+ | Massive offline jobs |
| BSP | Iterative barriers | High | Graph processing |
| Execution Orchestrator | DAG scheduling | Depends | Heterogeneous workflows |
Further Reading
- Ricky Ho, Scalable System Design Patterns (original blog)
- Dean & Ghemawat, MapReduce: Simplified Data Processing on Large Clusters
- Malewicz et al., Pregel: A System for Large‑Scale Graph Processing