System Design Interview Questions

Mixed

Tackle system design interviews with questions on scalability, databases, caching, load balancing, microservices, and distributed systems.

15 questions
4 easy
7 medium
4 hard

Vertical scaling (scaling up) adds more CPU, RAM, or storage to an existing server, which is simpler but has hardware limits. Horizontal scaling (scaling out) adds more servers to distribute the load, providing better fault tolerance and near-unlimited growth. Most large-scale systems use horizontal scaling with load balancers to distribute traffic across multiple machines.

Tags: scalability, basics

Use a hash or base62-encoded auto-incrementing ID to generate short codes, storing the mapping in a database with the short code as the primary key. Reads are served from a cache layer (Redis) since URL lookups are far more frequent than writes. For scale, partition the database by short code prefix and use multiple application servers behind a load balancer. Track analytics asynchronously via a message queue to avoid slowing down redirects.

Tags: design, web-services
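The base62 step above can be sketched as follows. This is a minimal illustration, not a prescribed scheme; the alphabet ordering and function names are assumptions:

```python
import string

# Base62 alphabet: 0-9, a-z, A-Z (ordering is an arbitrary choice)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode(n: int) -> str:
    """Encode a non-negative auto-incrementing ID as a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(code: str) -> int:
    """Recover the numeric ID from a short code for the database lookup."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the code is derived from an auto-incrementing ID, it is unique by construction; a hash-based scheme would instead need collision handling.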

The CAP theorem states that a distributed system can only guarantee two of three properties simultaneously: Consistency (all nodes see the same data), Availability (every request receives a response), and Partition Tolerance (the system continues operating despite network failures). Since network partitions are unavoidable in distributed systems, the practical choice is between CP (consistency during partitions) and AP (availability during partitions).

Tags: distributed-systems, theory

A load balancer distributes incoming traffic across multiple servers to prevent any single server from becoming overwhelmed. Common algorithms include round-robin (sequential rotation), least connections (routes to the server with fewest active connections), weighted round-robin (accounts for server capacity), and consistent hashing (maps requests to servers based on key hash). Health checks ensure traffic is only sent to healthy servers.

Tags: load-balancing, infrastructure
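Two of the algorithms above can be sketched in a few lines (an in-memory illustration; class names and the connection-tracking interface are assumptions):

```python
import itertools

class RoundRobinBalancer:
    """Round-robin: rotate through the server list sequentially."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Least connections: route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # caller must release() when the request finishes
        return server

    def release(self, server):
        self.active[server] -= 1
```

A production balancer would additionally skip servers that fail health checks before applying either policy.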

SQL databases use structured schemas with tables and relations, support ACID transactions, and are ideal for complex queries and data integrity requirements. NoSQL databases (document, key-value, column-family, graph) offer flexible schemas, horizontal scalability, and high throughput for specific access patterns. Choose SQL when you need strong consistency and complex joins; choose NoSQL when you need flexible schemas, massive scale, or specialized data models.

Tags: databases, storage

Common algorithms include token bucket (tokens are added at a fixed rate and consumed per request), sliding window counter (tracks request counts in time windows), and leaky bucket (processes requests at a constant rate). Store counters in Redis for distributed rate limiting across multiple servers. Return HTTP 429 when limits are exceeded, and include rate limit headers so clients can self-throttle.

Tags: design, reliability
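The token bucket variant can be sketched as follows. This single-process version uses a local counter; the `now` parameter (an assumption for testability) stands in for the clock, and a distributed deployment would keep the state in Redis as noted above:

```python
import time

class TokenBucket:
    """Token bucket: refill at `rate` tokens/sec, allow bursts up to `capacity`."""
    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would respond with HTTP 429 here
```

The refill-on-read approach avoids a background timer: tokens are credited lazily whenever a request arrives.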

Sharding is the practice of splitting a database into smaller partitions (shards) distributed across multiple servers, each holding a subset of the data. Common strategies include range-based (by ID ranges), hash-based (by hash of a key), and directory-based sharding. Challenges include cross-shard queries, rebalancing data when adding shards, maintaining referential integrity, and increased operational complexity.

Tags: databases, scalability
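Hash-based sharding reduces to one routing function (a sketch; the MD5 choice and function name are assumptions, picked because Python's built-in `hash()` is salted per process and so is not stable across servers):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key deterministically to a shard index in [0, num_shards)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The rebalancing challenge mentioned above follows directly: changing `num_shards` changes the modulus, remapping most keys, which is why growing clusters often move to consistent hashing instead.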

Eventual consistency is a model where distributed replicas may temporarily have different data, but all replicas will converge to the same state given enough time without new updates. It trades immediate consistency for higher availability and lower latency. Systems like DynamoDB, Cassandra, and DNS use eventual consistency, often with configurable consistency levels to balance between latency and freshness.

Tags: distributed-systems, consistency

Use a message queue (Kafka or RabbitMQ) to decouple notification creation from delivery, with separate workers for each channel (push, email, SMS). Store notification templates and user preferences in a database to handle routing and personalization. Implement idempotency keys to prevent duplicate sends, and use exponential backoff with dead-letter queues for failed deliveries. Priority queues ensure time-sensitive notifications are processed first.

Tags: design, messaging
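The idempotency-key and backoff parts of a delivery worker can be sketched as below. The in-memory set stands in for a Redis/database dedup store, and all names are illustrative assumptions:

```python
import time

# Stand-in for a durable dedup store (Redis SET / DB table in production).
processed_keys = set()

def send_with_retry(notification, deliver, max_attempts=3):
    """Deliver at most once per idempotency key, retrying with exponential
    backoff; returns False when the message should go to a dead-letter queue."""
    key = notification["idempotency_key"]
    if key in processed_keys:
        return True  # duplicate redelivery from the queue: already sent
    for attempt in range(max_attempts):
        try:
            deliver(notification)
            processed_keys.add(key)
            return True
        except Exception:
            time.sleep(2 ** attempt * 0.01)  # exponential backoff (scaled down here)
    return False  # caller would publish to the dead-letter queue
```

One worker like this would run per channel (push, email, SMS), each consuming its own queue.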

A Content Delivery Network (CDN) is a geographically distributed network of servers that caches content closer to users, reducing latency and offloading traffic from origin servers. CDNs are ideal for static assets (images, CSS, JavaScript), video streaming, and any content that is read-heavy and infrequently updated. They also provide DDoS protection and SSL termination at the edge.

Tags: infrastructure, performance

Microservices are an architectural pattern where an application is composed of small, independently deployable services that communicate over APIs. Each service owns its data and business logic, enabling independent scaling, technology choices, and team autonomy. They are appropriate for large, complex applications with multiple teams, but add overhead in terms of networking, data consistency, and operational complexity that may not be justified for smaller applications.

Tags: architecture, microservices

Use consistent hashing to distribute keys across cache nodes, minimizing redistribution when nodes are added or removed. Implement replication for fault tolerance, with each key stored on multiple nodes. Use TTL-based expiration combined with cache invalidation events for freshness. Consider cache-aside (application manages cache), write-through (cache updates synchronously with database), or write-behind (cache updates asynchronously) strategies based on consistency needs.

Tags: caching, distributed-systems
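A minimal consistent-hashing ring with virtual nodes can be sketched as below (an illustration; the vnode count, MD5 hash, and class name are assumptions):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: removing a node remaps only the keys it owned."""
    def __init__(self, nodes=(), vnodes=100):
        self._ring = []  # sorted list of (hash, node) pairs
        self.vnodes = vnodes
        for n in nodes:
            self.add(n)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node gets `vnodes` positions for smoother balance.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        # First ring position clockwise of the key's hash, wrapping to the start.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

For replication, a key would also be stored on the next N distinct nodes clockwise from its primary position.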

An API gateway is a single entry point that sits between clients and backend services, handling cross-cutting concerns like authentication, rate limiting, request routing, protocol translation, and response aggregation. It simplifies client interactions by providing a unified interface to multiple microservices. Popular implementations include Kong, AWS API Gateway, and Netflix Zuul.

Tags: architecture, api

The Saga pattern coordinates transactions across services using a sequence of local transactions with compensating actions for rollback. Event sourcing stores all state changes as immutable events, allowing services to reconstruct state and stay synchronized. The outbox pattern ensures reliable event publishing by writing events to a local database table and asynchronously dispatching them. Choose the approach based on consistency requirements and complexity tolerance.

Tags: microservices, consistency

Read replicas distribute read queries across multiple database copies, with the primary handling writes and replicating to secondaries. Caching layers (Redis, Memcached) serve frequently accessed data without hitting the database. Materialized views precompute complex queries for fast reads. For extreme scale, CQRS (Command Query Responsibility Segregation) separates read and write models entirely, optimizing each for its specific access patterns.

Tags: databases, scalability
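The cache-layer read path above is the cache-aside pattern, sketched here with plain dicts standing in for Redis and the database (all names are illustrative assumptions):

```python
# In-memory stand-ins: `cache` for Redis, `db` for the primary/read replica.
cache = {}
db = {"user:1": {"name": "Ada"}}

def get_user(key):
    """Cache-aside read: check the cache first; on a miss, read the
    database and populate the cache before returning."""
    if key in cache:
        return cache[key]
    value = db.get(key)  # would go to a read replica in production
    if value is not None:
        cache[key] = value
    return value
```

The trade-off is staleness: after a write, the cached copy is outdated until its TTL expires or an invalidation event evicts it, which is why cache-aside is usually paired with explicit invalidation on the write path.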