Source: Scalable System Design Patterns
These notes condense the eight patterns described by Ricky Ho for building highly scalable distributed systems. Each section captures the core idea, workflow, trade‑offs, and real‑world examples.

1. Load Balancer

Idea: A dispatcher chooses one worker (via round‑robin, least‑busy, sticky‑session, etc.) and forwards the client request. Workers are stateless, so any instance can handle the request.
Use When
  • Horizontal request fan‑out is required.
  • Reads/writes are evenly distributed across identical nodes.
| Pros | Cons |
|------|------|
| Easy to add capacity by adding workers | Dispatcher can become a bottleneck / SPOF (use multiple LBs + health‑checks) |
| Supports zero‑downtime rolling deploys | Requires sticky sessions or an external session store for stateful apps |
Examples: NGINX/Envoy L7 LB, HAProxy in front of microservices, AWS ALB/ELB.
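As a minimal sketch (in Python, with illustrative names not taken from the original post), a round‑robin dispatcher simply rotates through a pool of interchangeable, stateless workers:

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin dispatcher over a pool of stateless workers."""

    def __init__(self, workers):
        # itertools.cycle yields the workers in rotation, forever.
        self._cycle = itertools.cycle(workers)

    def dispatch(self, request):
        # Pick the next worker in rotation and forward the request to it.
        worker = next(self._cycle)
        return worker(request)

# Stateless workers: any instance can serve any request.
workers = [lambda req, i=i: f"worker-{i} handled {req}" for i in range(3)]
lb = RoundRobinBalancer(workers)
```

A real load balancer would add health checks and remove failed workers from the rotation; this sketch only shows the dispatch decision itself.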

2. Scatter & Gather

Idea: The dispatcher broadcasts a request to all workers, waits for their partial results, then merges them into one response.
Use When
  • Query needs data that is sharded across nodes (e.g., search indexes, sharded DBs).
  • Workers run in parallel, so overall latency is bounded by the slowest shard.
| Pros | Cons |
|------|------|
| Linear speed‑up with the number of shards | Tail latency of the slowest shard dictates overall latency |
| Simple concurrency model | Requires aggregation logic & a partial‑result schema |
Examples: Google Web Search fan‑out, ElasticSearch / Solr distributed search.
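A sketch of the fan‑out/merge step, assuming thread‑based parallelism and a caller‑supplied merge function (names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(query, shards, merge):
    """Broadcast `query` to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Scatter: each shard answers the query over its own slice of the data.
        partials = list(pool.map(lambda shard: shard(query), shards))
    # Gather: combine the partial results into one response.
    return merge(partials)

# Toy sharded "search index": each shard holds part of the corpus.
shards = [
    lambda q: [doc for doc in ["apple", "avocado"] if q in doc],
    lambda q: [doc for doc in ["banana", "apricot"] if q in doc],
]
hits = scatter_gather("ap", shards, merge=lambda parts: sorted(sum(parts, [])))
```

Note that `pool.map` blocks until every shard replies, which is exactly why the slowest shard sets the overall latency; production systems often return after a quorum or a deadline instead.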

3. Result Cache

Idea: The dispatcher checks the cache first; on a hit it returns the cached value, otherwise it computes the result via a worker and stores it.
Use When
  • High read‑to‑write ratio with repeated queries.
  • Tolerable data staleness (use TTL, explicit invalidation).
Tech: Memcached, Redis, CDN edge caches.
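The check‑then‑compute‑then‑store flow is the cache‑aside idiom; a minimal in‑process sketch with a TTL (the class and parameter names are illustrative):

```python
import time

class ResultCache:
    """Cache-aside lookup: return a cached value on a hit,
    otherwise compute it via the worker and store it with a TTL."""

    def __init__(self, compute, ttl=60.0):
        self._compute = compute
        self._ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                    # cache hit, still fresh
        value = self._compute(key)             # cache miss: delegate to a worker
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

calls = []  # record every worker invocation to show hits skip the worker
cache = ResultCache(compute=lambda k: calls.append(k) or k.upper(), ttl=60.0)
```

The TTL bounds staleness; explicit invalidation (deleting the key on write) tightens it further at the cost of extra coordination.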

4. Shared Space (Blackboard)

Idea: Workers collaborate via a tuple‑space, continuously enriching data until a final solution emerges.
Use When
  • Problem can be solved by incremental knowledge accumulation (e.g., expert systems).
  • Loose coupling between producers & consumers.
Examples: JavaSpaces, GigaSpaces, Linda coordination model.
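A toy blackboard run, assuming a shared dict as the "space" and two specialist workers (both invented for illustration) that each fire only when their input is present:

```python
# Blackboard sketch: workers repeatedly enrich a shared space until
# the final fact ("word_count") appears. Worker order doesn't matter.
space = {"text": "scalable system design patterns"}

def tokenizer(space):
    # Contributes "tokens" once raw text is available.
    if "tokens" not in space and "text" in space:
        space["tokens"] = space["text"].split()

def counter(space):
    # Contributes "word_count" once tokens are available.
    if "word_count" not in space and "tokens" in space:
        space["word_count"] = len(space["tokens"])

workers = [counter, tokenizer]  # deliberately "wrong" order: coupling is loose
while "word_count" not in space:
    for work in workers:
        work(space)
```

Each worker only reads and writes the shared space, never calls another worker directly; that is the loose coupling the pattern promises. A real tuple‑space (JavaSpaces, Linda) adds atomic take/write operations and blocking reads.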

5. Pipe & Filter

Idea: Data flows through a series of processing stages (filters) connected by queues (pipes). Each filter is independent and stateless.
Use When
  • Multi‑step ETL pipelines, media transcoding, log processing.
  • Need back‑pressure & decoupled stages.
| Pros | Cons |
|------|------|
| Easy parallelism per stage | End‑to‑end latency = Σ stage latencies |
| Replace/scale individual filters independently | Queue‑size tuning & ordering guarantees needed |
Tech: Unix pipes, Apache Beam, Kafka Streams.
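In the Unix‑pipe spirit, the pattern can be sketched with Python generators, where each generator is one filter and lazy iteration plays the role of the pipe (the log format here is made up for the example):

```python
def parse(lines):
    """Filter 1: split raw log lines into (level, message) records."""
    for line in lines:
        level, _, msg = line.partition(": ")
        yield level, msg

def keep_errors(records):
    """Filter 2: pass through only ERROR records."""
    for level, msg in records:
        if level == "ERROR":
            yield msg

def upcase(messages):
    """Filter 3: normalise messages."""
    for msg in messages:
        yield msg.upper()

logs = ["INFO: started", "ERROR: disk full", "ERROR: timeout"]
# Composing the generators wires the stages together, like `cat | grep | tr`.
result = list(upcase(keep_errors(parse(logs))))
```

Because each filter only sees an iterator, any stage can be replaced or moved behind a real queue (e.g., a Kafka topic) without touching the others.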

6. Map‑Reduce

Idea: A batch job splits data across distributed‑file‑system blocks, runs Mappers, shuffles intermediate keys, then Reducers consolidate the output.
Use When
  • Dataset ≫ memory; throughput more important than latency.
  • Embarrassingly parallel transformations, log analytics, offline ML.
Tech: Hadoop, Spark (RDD lineage follows MR spirit), Google MapReduce.
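The three phases are easiest to see in the classic word‑count example, sketched here single‑process (a real framework distributes each phase across machines):

```python
from collections import defaultdict

def map_phase(doc):
    # Mapper: emit (key, value) pairs -- here, (word, 1) per occurrence.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key, so each reducer
    # sees every value for the keys assigned to it.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: consolidate one key's values into the final output.
    return key, sum(values)

docs = ["a b a", "b c"]  # stand-ins for file-system blocks
intermediate = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```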

7. Bulk Synchronous Parallel (BSP)

Idea: A master coordinates lock‑step “supersteps”: each worker reads local data, processes it, sends messages, then waits at a global barrier before the next superstep. Repeat until convergence.
Use When
  • Graph algorithms (PageRank, BFS) requiring iterative, message‑passing computation.
Tech: Google Pregel, Apache Giraph, Apache Hama.
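A single‑process sketch of the superstep loop, using a toy vertex program (propagating the maximum value through a graph; the graph and values are invented for illustration):

```python
# BSP sketch: every vertex holds a value; in each superstep it sends its
# value to its neighbours, and all vertices update only after the barrier.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
values = {"a": 3, "b": 1, "c": 2}   # goal: every vertex learns the global max

changed = True
while changed:                       # repeat supersteps until convergence
    # Compute phase: each vertex sends its current value along its edges.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    # Barrier: all messages are delivered before any vertex updates,
    # so every superstep reads a consistent snapshot.
    new_values = {v: max([values[v]] + inbox[v]) for v in graph}
    changed = new_values != values
    values = new_values
```

Replacing `max` with a weighted sum over incoming messages turns this same loop into the skeleton of PageRank, which is the canonical Pregel example.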

8. Execution Orchestrator

Idea: A smart scheduler translates a directed acyclic task graph into runnable tasks dispatched across dumb workers, handling dependencies & retries.
Use When
  • Complex DAG workflows (e.g., machine‑learning pipelines, video encoding trees).
  • Mix of CPU & I/O tasks with varied runtimes.
Tech: Microsoft Dryad, Apache Airflow, Google Cloud Dataflow.
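The core of the orchestrator is repeatedly finding tasks whose dependencies are all done and dispatching them; a sequential sketch (task names and the `run_dag` helper are illustrative):

```python
def run_dag(tasks, deps):
    """Run `tasks` (name -> callable) respecting `deps` (name -> set of
    prerequisite names). A task runs only after all its prerequisites finish."""
    done, order = set(), []
    while len(done) < len(tasks):
        # Ready set: not yet run, and every dependency already completed.
        ready = [t for t in tasks if t not in done and deps.get(t, set()) <= done]
        if not ready:
            raise ValueError("cycle detected in task graph")
        for task in ready:  # a real orchestrator dispatches these to workers in parallel
            tasks[task]()
            done.add(task)
            order.append(task)
    return order

log = []
tasks = {name: (lambda n=name: log.append(n)) for name in ["extract", "transform", "load"]}
deps = {"transform": {"extract"}, "load": {"transform"}}
order = run_dag(tasks, deps)
```

Retries, per‑task timeouts, and persistence of the "done" set are what systems like Airflow layer on top of this same ready‑set loop.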

Quick Comparison

| Pattern | Concurrency Model | Typical Latency | Best For |
|---------|-------------------|-----------------|----------|
| Load Balancer | Parallel, single response | Low | Stateless microservices |
| Scatter‑Gather | Fan‑out, aggregate | Medium | Sharded queries |
| Result Cache | Lookup first | Sub‑ms (hit) | Read‑heavy workloads |
| Shared Space | Shared mutable store | High / iterative | Collaborative reasoning |
| Pipe‑Filter | Sequential stages | Per‑stage | ETL pipelines |
| Map‑Reduce | Batch, two‑phase | Minutes+ | Massive offline jobs |
| BSP | Iterative barriers | High | Graph processing |
| Execution Orchestrator | DAG scheduling | Depends | Heterogeneous workflows |

Further Reading

  • Ricky Ho, Scalable System Design Patterns (original blog)
  • Dean & Ghemawat, MapReduce: Simplified Data Processing on Large Clusters
  • Malewicz et al., Pregel: A System for Large‑Scale Graph Processing