- Ask before design
- Keep it simple (people prefer a simple system over a complex one)
- Working solution, not perfect solution
- Analysis > Solution: discuss tradeoffs and bottlenecks
The entire interview is around 45 minutes. Setting aside 10 minutes for questions leaves ~35 minutes for design.
- Functional requirements (ask questions; clarify the workflow)
- Non-functional requirements
- CAP (AP or CP?)
- Partition tolerant
- Availability (system downtime)
- Five nines = 99.999% uptime = 5.26 minutes of downtime / year (AWS standard; highly available)
- Data Replication (for disaster recovery)
- Load balancing (global → different region; local → exclude dead servers)
- Consistency (eventual consistency; read your write; transaction)
- Latency
- Good DB choice (e.g. Bigtable for thumbnails)
- Cache
- CDN
- Geographically distributed servers
- Scalability
- Horizontal Scaling
- Load Balancer
- Data Sharding
- Reliability / Durability (data loss; redundancy)
- Data Replication
- Message Queue (producer and consumer can fail independently without affecting each other)
- Fault-tolerance
- Failover
- Data Replication
- Read-heavy or Write-heavy?
- Observability
- Maintainability
- Microservices (scale independently)
- Message Queue
- API Design
- No need to go into the details of the API signatures.
- DB Design (Tradeoff)
- Entity
- Relationship
- 1:1: single table OR 2 tables with the same primary key
- 1:m: foreign key
- m:n: middle (relationship) table
- DB Choice
- Estimate (Account for spikes, growth rate and redundancy)
- Assumptions:
- Total Users (for total storage, etc), DAU (for QPS)
- Rate (10 photo uploads / s)
- File size
- Considerations:
- QPS (Read; Write)
- Storage
- Bandwidth
Limit the above to 10-15 minutes. Save time for the high-level design.
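The estimation step can be sketched as a few lines of arithmetic. The snippet below is a rough sketch, not a definitive method: only the upload rate (10 photos / s) comes from these notes, while the file size, read/write ratio, peak factor, replication factor, and retention period are hypothetical placeholders to replace with whatever the interviewer agrees to.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass
class Assumptions:
    uploads_per_second: float = 10.0   # from the notes: 10 photo uploads / s
    avg_file_size_mb: float = 2.0      # hypothetical average photo size
    read_write_ratio: float = 100.0    # hypothetical: read-heavy workload
    peak_factor: float = 2.0           # hypothetical headroom for spikes
    replication_factor: int = 3        # hypothetical redundancy
    retention_years: int = 5           # hypothetical planning horizon


def estimate(a: Assumptions) -> dict:
    """Back-of-the-envelope QPS, storage, and bandwidth figures."""
    write_qps = a.uploads_per_second
    read_qps = write_qps * a.read_write_ratio

    # Storage: bytes written per day, then scaled by retention and replicas.
    daily_storage_gb = write_qps * SECONDS_PER_DAY * a.avg_file_size_mb / 1000
    total_storage_tb = (
        daily_storage_gb * 365 * a.retention_years * a.replication_factor / 1000
    )

    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "peak_write_qps": write_qps * a.peak_factor,
        "peak_read_qps": read_qps * a.peak_factor,
        "daily_storage_gb": daily_storage_gb,
        "total_storage_tb": total_storage_tb,
        "ingress_mb_per_s": write_qps * a.avg_file_size_mb,
        "egress_mb_per_s": read_qps * a.avg_file_size_mb,
    }


if __name__ == "__main__":
    for name, value in estimate(Assumptions()).items():
        print(f"{name}: {value:,.1f}")
```

In an interview the same numbers are usually rounded aggressively (for example, treating a day as ~100,000 seconds); the point is the order of magnitude, which drives the choice of DB, cache, and sharding.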
- High-level design
- Detailed design
- Ask the interviewer which part they are interested in
- Check if we achieved the non-functional requirements
- Fault-tolerance:
- Failover (active-active; active-passive)
- Data Replication (Single-leader; Multi-leader; Leaderless)
- Scaling:
- Load balancer (round-robin; least connections; least load)
- Data Sharding
- Reliability/Redundancy/Availability: Data Replication (Single-leader; Multi-leader; Leaderless)
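To make the load-balancer strategies concrete, here is a minimal sketch of round-robin and least-connections selection that also excludes dead servers. The `Server` class, its `healthy` flag, and the `pick` method are hypothetical names chosen for illustration; a real load balancer would update health and connection counts from health checks and live traffic.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True           # would be set by health checks in practice
    active_connections: int = 0    # would be tracked per open connection


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, skipping unhealthy ones."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def pick(self) -> Server:
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers")


class LeastConnectionsBalancer:
    """Picks the healthy server with the fewest active connections."""

    def __init__(self, servers):
        self._servers = list(servers)

    def pick(self) -> Server:
        healthy = [s for s in self._servers if s.healthy]
        if not healthy:
            raise RuntimeError("no healthy servers")
        return min(healthy, key=lambda s: s.active_connections)


if __name__ == "__main__":
    pool = [
        Server("a", active_connections=5),
        Server("b", healthy=False),
        Server("c", active_connections=2),
    ]
    rr = RoundRobinBalancer(pool)
    print([rr.pick().name for _ in range(3)])
    print(LeastConnectionsBalancer(pool).pick().name)
```

Round-robin is the simplest choice when requests cost roughly the same; least connections tends to behave better when request durations vary widely, at the cost of tracking per-server state.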