The 4-Step Process for an Effective System Design Interview
A methodical, step-by-step approach is key to tackling any high-level design (HLD) question. Start by gathering requirements.
Ask clarifying questions to fully understand the functional requirements (core features and use cases) and non-functional
requirements (scale, latency, availability). For example, clarify user actions, expected traffic, and performance goals
up front. Write down what must be built and any constraints. You may even explicitly mark some advanced features or edge
cases as “out of scope” if time is limited. This ensures focus on the core problem. In practice, interviewers expect you
to treat the problem as a black box first and list what needs to be done, not how, before diving into architecture.
Functional Requirements: List the main user-facing features (e.g. user signup, posting content, messaging) and key use cases. Ask about workflow
details (authentication, error cases) to avoid assumptions.
Non-Functional Requirements: Identify targets for scale (DAU, QPS), latency (e.g. “sub-100ms responses”), consistency (strong vs eventual), availability
(“five nines”, i.e. 99.999%), and data retention. These shape the architecture.
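To make an availability target concrete, it helps to convert it into a downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name `downtime_budget_seconds` and the list of targets are illustrative assumptions, and a 365-day year is assumed.

```python
# Convert an availability target (in percent) into allowed downtime per year.
# Assumption: a 365-day year; the function name is illustrative.
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_budget_seconds(availability_percent: float) -> float:
    """Seconds per year the system may be unavailable at the given target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(target) / 60
        print(f"{target}% availability -> {minutes:,.1f} minutes of downtime per year")
```

At five nines the budget is only about five minutes per year, which is why the availability target should be agreed on before choosing a replication or failover strategy.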
Next, do back-of-the-envelope calculations to size the system. Use the requirements to estimate traffic and data volumes.
Compute metrics like daily active users (DAU), requests per second (RPS/QPS), and data growth. For example, if you expect
1 million daily users making 10 requests each per day, that’s roughly 1M × 10 / 86,400 ≈ 116 QPS on average. Similarly, estimate average
payload sizes and multiply by QPS to get bandwidth. Determine how much disk storage is needed (number of objects × size per object)
and how much RAM is needed to cache the hottest items.
Traffic Calculation: Use DAU and user behavior to get RPS. Example: 1M DAU × 10 ops/day → ~116 QPS on average. Factor in the read/write ratio: e.g. 100:1 means read-heavy, justifying more caching.
Data Storage: Estimate the number and size of database records. E.g. 100M users × 1 KB per profile = ~100 GB of data. Factor in indexes and backups.
Memory & Cache: Calculate the RAM needed for caching hot data (e.g. user sessions, popular objects). If each cached item is 500 B and 1M items are hot,
that’s ~500 MB of RAM.
Bandwidth: Multiply data size per request by QPS to ensure network capacity.
These approximations will guide your architectural choices (e.g. horizontal scaling, sharding strategy) and demonstrate you can handle real-world scale.
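The estimates above can be reproduced in a few lines. This is a rough sketch rather than a sizing tool: the function names are illustrative, decimal units are assumed (1 KB = 1,000 bytes), and the 2 KB average payload used for bandwidth is an assumed figure that does not come from the text.

```python
# Back-of-the-envelope capacity estimates (decimal units: 1 KB = 1,000 bytes).
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average requests per second across a whole day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def split_reads_writes(qps: float, read_ratio: float, write_ratio: float = 1.0):
    """Split total QPS into (read_qps, write_qps) for a read:write ratio."""
    total = read_ratio + write_ratio
    return qps * read_ratio / total, qps * write_ratio / total


def total_bytes(item_count: int, bytes_per_item: int) -> int:
    """Storage or cache footprint: number of items times size per item."""
    return item_count * bytes_per_item


def bandwidth_bytes_per_second(qps: float, payload_bytes: int) -> float:
    """Network throughput needed to serve the given QPS."""
    return qps * payload_bytes


if __name__ == "__main__":
    qps = average_qps(dau=1_000_000, requests_per_user_per_day=10)
    reads, writes = split_reads_writes(qps, read_ratio=100)
    storage = total_bytes(100_000_000, 1_000)  # 100M profiles at 1 KB each
    cache = total_bytes(1_000_000, 500)        # 1M hot items at 500 B each
    # Assumption: 2 KB average payload, chosen only for illustration.
    bandwidth = bandwidth_bytes_per_second(qps, payload_bytes=2_000)

    print(f"average QPS: {qps:.1f} (reads {reads:.1f}, writes {writes:.1f})")
    print(f"storage:     {storage / 1e9:.0f} GB before indexes and backups")
    print(f"cache RAM:   {cache / 1e6:.0f} MB")
    print(f"bandwidth:   {bandwidth / 1e3:.0f} KB/s")
```

These are daily averages; peak traffic is usually higher, so it is worth stating a peak multiplier as an explicit assumption during the interview.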
Now sketch the overall system, focusing on major components and their interactions (a block diagram). Include elements
like API gateways or load balancers, stateless application servers/services, databases, caches, and external interfaces
(e.g. CDNs). For example, in a social network you might draw blocks for “User Service”, “Post Service”, and “Notification
Service”, with arrows showing how they communicate. Use common patterns: load balancers in front of server fleets, CDN
in front of static content, message queues between services, etc. The goal is to convey the big picture before diving
into details.
Specify the API layer and service interfaces. Define the main endpoints or RPC calls that satisfy the functional requirements.
Use clear RESTful naming or RPC conventions. For instance, you might have POST /users, GET /users/{id}/feed, POST /posts, etc.
Describe the request/response formats in brief. Ensure each service is stateless if possible (this makes horizontal scaling easier).
Endpoints: Enumerate key APIs and what they do. E.g. “CreatePost(userID, content)”, “GetNewsFeed(userID, pageToken)”, “LikePost(userID, postID)”. Mention input parameters and output (ID, success flag, etc.).
Communication: For microservices, define how they call each other (synchronous REST/gRPC or asynchronous queues). Mention any service discovery or gateway if used.
Idempotency & Versioning: Note which operations must be idempotent (safe to retry without side effects being applied twice) and how you’d handle API versioning.
RESTful/GraphQL: Use REST principles (noun-based resources) or GraphQL if many client-specific queries are needed. The key is clarity and consistency.
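As an illustration of a retry-safe endpoint, here is a minimal in-memory sketch of a CreatePost call that takes a client-supplied idempotency key. The `PostService` class, the `idempotency_key` parameter, and the response fields are hypothetical names chosen for this example; a real service would keep both maps in durable storage.

```python
import uuid


class PostService:
    """In-memory sketch of an idempotent CreatePost API (illustrative only)."""

    def __init__(self) -> None:
        self.posts = {}        # post_id -> post record
        self._seen_keys = {}   # idempotency_key -> post_id already created

    def create_post(self, user_id: str, content: str, idempotency_key: str) -> dict:
        # A retried request carries the same key, so return the original
        # result instead of creating a duplicate post.
        if idempotency_key in self._seen_keys:
            return {"post_id": self._seen_keys[idempotency_key], "created": False}

        post_id = str(uuid.uuid4())
        self.posts[post_id] = {"user_id": user_id, "content": content}
        self._seen_keys[idempotency_key] = post_id
        return {"post_id": post_id, "created": True}


if __name__ == "__main__":
    service = PostService()
    first = service.create_post("u1", "hello", idempotency_key="req-123")
    retry = service.create_post("u1", "hello", idempotency_key="req-123")
    print(first["post_id"] == retry["post_id"], len(service.posts))
```

The client generates the key once per logical request and reuses it on every retry, so a timeout followed by a retry cannot produce two posts.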
Design the data layer. Decide on storage technologies (SQL vs NoSQL vs specialized stores) based on the requirements. For
example, user/account data with complex relationships might use a relational database, whereas high-volume feed or session
data might use a NoSQL store. Outline key entities and their fields. Sketch an ER diagram or collection schema for core
entities (User, Post, Comment, etc.) and their relations.
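One way to sketch the core entities is as a small relational schema. The example below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table names, column names, and the `idx_posts_user_time` index are assumptions, not a prescribed design.

```python
import sqlite3

# Illustrative schema: two core entities plus an index for the feed query.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    content    TEXT NOT NULL,
    created_at INTEGER NOT NULL  -- Unix timestamp
);
-- Supports "latest posts by user" without scanning the whole table.
CREATE INDEX idx_posts_user_time ON posts (user_id, created_at DESC);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    (1, "Alice", "alice@example.com"),
)
conn.executemany(
    "INSERT INTO posts (user_id, content, created_at) VALUES (?, ?, ?)",
    [(1, "first", 100), (1, "second", 200)],
)

# Newest-first posts for one user, the access pattern the index is built for.
recent = conn.execute(
    "SELECT content FROM posts WHERE user_id = ? ORDER BY created_at DESC LIMIT 10",
    (1,),
).fetchall()
print(recent)
```

Writing out the main query next to the schema makes it easier to justify each index by the access pattern it serves.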
Database Choice: Explain your choice: e.g. “We’ll use a relational DB for transactional data (strong consistency), and
a distributed NoSQL or search index for the feed (high scalability)”. Mention trade-offs like SQL’s ACID vs NoSQL’s scalability.
Schema Design: List major tables/collections and attributes. E.g. User(id, name, email), Post(id, user_id, content, timestamp),
etc. Define primary keys and indexes.
Partitioning: Plan how data is partitioned (sharded). For instance, shard by user ID, or by time range if queries are mostly time-bounded.
Check against your capacity estimates that each shard fits comfortably on one machine.
Replication: Describe the replication strategy (primary-replica, multi-primary, or multi-region) you would use to ensure availability.
Caching: Identify cacheable data and TTLs. For example, use a key-value cache (Redis/Memcached) for user sessions or popular post objects to reduce DB reads.
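The partitioning and caching points can be combined into a single read path. The sketch below is a simplified in-memory model under stated assumptions: `NUM_SHARDS`, `shard_for`, `TTLCache`, and `UserStore` are hypothetical names, plain dictionaries stand in for database shards, and a small TTL cache stands in for Redis or Memcached.

```python
import hashlib
import time

NUM_SHARDS = 4  # Assumption: a fixed shard count chosen for illustration.


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard using a stable hash (not Python's hash())."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class TTLCache:
    """Tiny key-value cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key) -> None:
        self._entries.pop(key, None)


class UserStore:
    """Cache-aside reads over sharded storage (dicts stand in for databases)."""

    def __init__(self, cache: TTLCache, num_shards: int = NUM_SHARDS) -> None:
        self.cache = cache
        self.shards = [dict() for _ in range(num_shards)]
        self.db_reads = 0  # counts reads that reached a shard

    def put_user(self, user_id: str, profile: dict) -> None:
        self.shards[shard_for(user_id, len(self.shards))][user_id] = profile
        self.cache.invalidate(user_id)  # avoid serving a stale profile

    def get_user(self, user_id: str):
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached  # cache hit: no shard access
        self.db_reads += 1
        profile = self.shards[shard_for(user_id, len(self.shards))].get(user_id)
        if profile is not None:
            self.cache.set(user_id, profile)
        return profile


if __name__ == "__main__":
    store = UserStore(TTLCache(ttl_seconds=60))
    store.put_user("alice", {"name": "Alice"})
    store.get_user("alice")  # miss: reads the shard and fills the cache
    store.get_user("alice")  # hit: served from the cache
    print("shard:", shard_for("alice"), "db reads:", store.db_reads)
```

Hash-modulo routing is simple, but changing the shard count remaps most keys; consistent hashing is the usual alternative when shards are added or removed often.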