In a system‑design interview, “back‑of‑the‑envelope” (BoE) calculations are quick, order‑of‑magnitude estimates that ground your high‑level architecture in reality. They answer one overarching question:
“How big does each part of my system need to be so it can handle the expected workload with acceptable performance and cost?”
To do that, you typically estimate five classes of numbers:
| # | What you estimate | Why it matters | Typical outputs |
|---|---|---|---|
| 1 | Traffic volumes | Drives capacity decisions everywhere else | Daily active users (DAU); requests per second (RPS/QPS); peak vs. average load factors |
| 2 | Data size & growth | Guides storage engines, sharding strategy, and retention policies | Bytes per row/object; total rows/objects per day; storage after 1 year, 3 years |
| 3 | Throughput & bandwidth | Determines network links, replication costs, CDN usage | Ingest MB/s; egress MB/s to clients; replication traffic between DCs |
| 4 | Latency budgets | Shapes caching layers, queue depths, timeouts | Client-visible SLA (e.g., p99 < 200 ms); per-hop allocation (frontend, cache, DB) |
| 5 | Hardware / cost footprint | Justifies design trade-offs and shows business awareness | Number of app servers, DB shards, cache nodes; monthly cloud bill rough-cut |

How those numbers are used during the interview

  1. Validate feasibility: show that your design can actually handle 10M QPS without a single-node bottleneck.
  2. Justify component choices: • "A Redis cache reduces DB load from 30k to 3k QPS, so we only need 4 DB shards instead of 40." • "A CDN cuts egress from origin by 95%, saving ≈ $X/month." (See the sketch after this list.)
  3. Expose trade-offs: invite discussion about why you chose SSDs over HDDs, or DynamoDB over PostgreSQL, in light of the numbers.
  4. Guide prioritisation: huge storage growth? Plan for lifecycle policies and cold storage first. Tight latency SLO? Focus on cache and request fan-out next.
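
To make the second point concrete, here is a minimal sketch (in Python) of the cache trade-off arithmetic; the 750-QPS-per-shard capacity is a hypothetical figure implied by the 30k-QPS/40-shard example, not a real benchmark.

```python
import math

def db_shards_needed(read_qps: float, cache_hit_ratio: float,
                     per_shard_qps: float = 750) -> int:
    """Shards required once a cache absorbs `cache_hit_ratio` of reads."""
    db_qps = read_qps * (1 - cache_hit_ratio)  # reads that miss the cache
    return math.ceil(db_qps / per_shard_qps)

print(db_shards_needed(30_000, 0.0))  # no cache      -> 40 shards
print(db_shards_needed(30_000, 0.9))  # 90% hit ratio -> 4 shards
```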

Typical BoE workflow (2–3 min)

  1. Start with user activity: "Suppose we have 50M DAU; on average, each opens the app 5 times/day ⇒ 250M sessions/day …"
  2. Derive peak traffic: use a peak/average ratio (often 5–10×). "… peak ≈ 15k requests/s."
  3. Compute per-request data: payload, metadata, DB rows touched. "… profile pic upload: 4 MB * 2M/day = 8 TB/day."
  4. Roll up totals and apply headroom: add a 50–100% safety margin to anticipate growth and bursts (steps 1–4 are rolled up in the sketch after this list).
  5. Sanity-check: compare numbers with known reference points (Twitter, Instagram) so the interviewer sees you're calibrated.
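
The workflow rolls up into a few lines of arithmetic. Below is a minimal sketch (in Python) using the illustrative numbers from steps 1–4; every input is an assumption from this section, not real product data.

```python
DAU = 50_000_000             # step 1: 50M daily active users (assumed)
SESSIONS_PER_USER = 5        # step 1: app opens per user per day (assumed)
PEAK_FACTOR = 5              # step 2: peak/average ratio (often 5-10x)
UPLOAD_SIZE_MB = 4           # step 3: profile-pic payload (assumed)
UPLOADS_PER_DAY = 2_000_000  # step 3: uploads per day (assumed)
HEADROOM = 2.0               # step 4: 100% safety margin

sessions_per_day = DAU * SESSIONS_PER_USER   # 250M sessions/day
avg_rps = sessions_per_day / 86_400          # ~2,900 requests/s
peak_rps = avg_rps * PEAK_FACTOR             # ~15k requests/s
ingest_tb_per_day = UPLOAD_SIZE_MB * UPLOADS_PER_DAY / 1_000_000  # 8 TB/day

print(f"peak ~{peak_rps:,.0f} req/s; provision for {peak_rps * HEADROOM:,.0f}")
print(f"ingest ~{ingest_tb_per_day:.0f} TB/day; "
      f"provision for {ingest_tb_per_day * HEADROOM:.0f} TB/day")
```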

Key interview tips

  • Round aggressively – 1.6 TB → "about 2 TB". Precision is less important than reasoning speed.
  • Narrate assumptions – the interviewer can correct unrealistic ones, giving you richer guidance.
  • Keep a small "cheat sheet" of latency reference numbers (L1 cache, SSD read, cross-DC RTT) to justify claims.
  • Use the numbers – don't let them sit on the whiteboard; drive design decisions with them.

Powers of two table

| Power | Exact value | Approx. value | Bytes |
|---|---|---|---|
| 7 | 128 | | |
| 8 | 256 | | |
| 10 | 1,024 | 1 thousand | 1 KB |
| 16 | 65,536 | | 64 KB |
| 20 | 1,048,576 | 1 million | 1 MB |
| 30 | 1,073,741,824 | 1 billion | 1 GB |
| 32 | 4,294,967,296 | | 4 GB |
| 40 | 1,099,511,627,776 | 1 trillion | 1 TB |
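
The table leans on the rule of thumb that 2¹⁰ ≈ 10³. A quick sketch (in Python) shows the approximation drifts by only a few percent per step, well within back-of-the-envelope tolerance:

```python
# Compare each power of two against its decimal approximation.
for power, unit in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB")]:
    exact = 2 ** power
    approx = 10 ** (3 * power // 10)   # the 2^10 ~ 10^3 rule of thumb
    error = (exact / approx - 1) * 100
    print(f"2^{power} = {exact:>16,} ~ {approx:>16,} (1 {unit}, +{error:.1f}%)")
```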

Latency numbers every programmer should know

| Operation | Latency (ns) | Latency (µs) | Latency (ms) | Notes |
|---|---|---|---|---|
| L1 cache reference | 0.5 | | | |
| Branch mispredict | 5 | | | |
| L2 cache reference | 7 | | | 14x L1 cache |
| Mutex lock/unlock | 25 | | | |
| Main memory reference | 100 | | | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Zippy | 10,000 | 10 | | |
| Send 1 KB over 1 Gbps network | 10,000 | 10 | | |
| Read 4 KB randomly from SSD* | 150,000 | 150 | | ~1 GB/sec SSD |
| Read 1 MB sequentially from memory | 250,000 | 250 | | |
| Round trip within same datacenter | 500,000 | 500 | | |
| Read 1 MB sequentially from SSD* | 1,000,000 | 1,000 | 1 | ~1 GB/sec SSD, 4x memory |
| HDD seek | 10,000,000 | 10,000 | 10 | 20x datacenter roundtrip |
| Read 1 MB sequentially from 1 Gbps network | 10,000,000 | 10,000 | 10 | 40x memory, 10x SSD |
| Read 1 MB sequentially from HDD | 30,000,000 | 30,000 | 30 | 120x memory, 30x SSD |
| Send packet CA → Netherlands → CA | 150,000,000 | 150,000 | 150 | |

Notes

  • 1 ns = 10⁻⁹ seconds
  • 1 µs = 10⁻⁶ seconds = 1,000 ns
  • 1 ms = 10⁻³ seconds = 1,000 µs = 1,000,000 ns
Handy metrics based on the numbers above:
  • Read sequentially from HDD at 30 MB/s
  • Read sequentially from 1 Gbps Ethernet at 100 MB/s
  • Read sequentially from SSD at 1 GB/s
  • Read sequentially from the main memory at 4 GB/s
  • 6–7 worldwide round trips per second
  • 2,000 round trips per second within a data center
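
These handy metrics fall straight out of the table: throughput is payload divided by latency, and round trips per second are the reciprocal of the round-trip time. A minimal sketch (in Python) using the table's decimal units; small differences (e.g., 33 vs. 30 MB/s for HDD) are just rounding.

```python
MB = 1_000_000  # bytes, decimal units as in the table above

# (payload in bytes, latency in seconds), taken from the latency table
sequential_reads = {
    "HDD":         (1 * MB, 30e-3),
    "1 Gbps net":  (1 * MB, 10e-3),
    "SSD":         (1 * MB, 1e-3),
    "main memory": (1 * MB, 250e-6),
}
for name, (payload, latency) in sequential_reads.items():
    print(f"{name}: {payload / latency / MB:,.0f} MB/s sequential")

print(f"intra-DC round trips/s: {1 / 500e-6:,.0f}")   # 2,000
print(f"worldwide round trips/s: {1 / 150e-3:.1f}")   # ~6.7
```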

Availability numbers

High availability is the ability of a system to be continuously operational for a desirably long period of time. High availability is measured as a percentage, with 100% meaning a service that has zero downtime. Most services fall between 99% and 100%. A service level agreement (SLA) is a commonly used term for service providers. This is an agreement between you (the service provider) and your customer, and it formally defines the level of uptime your service will deliver. Cloud providers Amazon [4], Google [5] and Microsoft [6] set their SLAs at 99.9% or above. Uptime is traditionally measured in nines: the more nines, the better. As the table below shows, the number of nines correlates with the expected system downtime.
| Availability % | Downtime per day | Downtime per week | Downtime per month | Downtime per year |
|---|---|---|---|---|
| 99% | 14.40 minutes | 1.68 hours | 7.31 hours | 3.65 days |
| 99.99% | 8.64 seconds | 1.01 minutes | 4.38 minutes | 52.60 minutes |
| 99.999% | 864.00 milliseconds | 6.05 seconds | 26.30 seconds | 5.26 minutes |
| 99.9999% | 86.40 milliseconds | 604.80 milliseconds | 2.63 seconds | 31.56 seconds |
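
The table is just (1 − availability) multiplied by the length of each period. A minimal sketch (in Python) that reproduces the rows, assuming a 365.25-day year:

```python
SECONDS = {"day": 86_400, "week": 604_800,
           "month": 2_629_800, "year": 31_557_600}  # 365.25-day year

def downtime_seconds(availability_pct: float, period: str) -> float:
    """Allowed downtime per period at the given availability."""
    return (1 - availability_pct / 100) * SECONDS[period]

for nines in (99.0, 99.99, 99.999, 99.9999):
    print(f"{nines}%: {downtime_seconds(nines, 'day'):.2f} s/day, "
          f"{downtime_seconds(nines, 'year') / 60:.2f} min/year")
```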

Example: Estimate Twitter QPS and storage requirements

Please note that the following numbers are for this exercise only; they are not real numbers from Twitter. Assumptions:
  • 300 million monthly active users.
  • 50% of users use Twitter daily.
  • Users post 2 tweets per day on average.
  • 10% of tweets contain media.
  • Data is stored for 5 years.
Estimations: Query per second (QPS) estimate:
  • Daily active users (DAU) = 300 million * 50% = 150 million
  • Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds = ~3,500
  • Peak QPS = 2 * QPS = ~7,000
We will only estimate media storage here.
  • Average tweet size:
    • tweet_id: 64 bytes
    • text: 140 bytes
    • media: 1 MB
  • Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day
  • 5-year media storage: 30 TB * 365 * 5 = ~55 PB
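
The whole estimate fits in a few lines. Here is a minimal sketch (in Python) of the arithmetic above; the inputs are the exercise's assumptions, not real Twitter numbers.

```python
MAU = 300_000_000                # monthly active users (assumed)
DAU = MAU * 0.50                 # 50% use Twitter daily -> 150 million
TWEETS_PER_DAY_PER_USER = 2
MEDIA_RATIO = 0.10               # 10% of tweets contain media
MEDIA_SIZE_MB = 1
RETENTION_YEARS = 5

tweets_per_day = DAU * TWEETS_PER_DAY_PER_USER
avg_qps = tweets_per_day / (24 * 3600)       # ~3,500
peak_qps = 2 * avg_qps                       # ~7,000
media_tb_per_day = tweets_per_day * MEDIA_RATIO * MEDIA_SIZE_MB / 1_000_000
media_pb_5yr = media_tb_per_day * 365 * RETENTION_YEARS / 1_000

print(f"QPS ~{avg_qps:,.0f}, peak ~{peak_qps:,.0f}")
print(f"media: {media_tb_per_day:.0f} TB/day, ~{media_pb_5yr:.0f} PB over 5 years")
```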

Tips

Back-of-the-envelope estimation is all about the process: following a sound process matters more than obtaining exact results, and interviewers may use it to test your problem-solving skills. Here are a few tips to follow:
  • Rounding and approximation. It is difficult to perform complicated math operations during the interview. For example, what is the result of "99987 / 9.1"? There is no need to spend valuable time on complicated math; precision is not expected. Use round numbers and approximation to your advantage: the division can be simplified to "100,000 / 10".
  • Write down your assumptions. Writing assumptions down lets you (and the interviewer) reference them later.
  • Label your units. When you write down "5", does it mean 5 KB or 5 MB? Writing "5 MB" removes the ambiguity.
  • Practice commonly asked estimations: QPS, peak QPS, storage, cache, number of servers, etc. Practice makes perfect.
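
As a sanity check on the rounding tip, a two-line sketch (in Python) confirms the simplified division lands within roughly 10% of the exact answer, close enough for an order-of-magnitude estimate:

```python
exact = 99987 / 9.1      # ~10,988
rounded = 100_000 / 10   # 10,000
print(f"exact ~{exact:,.0f}, rounded {rounded:,.0f}, "
      f"off by {(1 - rounded / exact) * 100:.0f}%")  # ~9%
```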

Source(s) and further reading