HLD (High-Level Design) fundamentals
DNS (Domain Name System) is the internet’s phone book. It translates human-friendly domain names into IP addresses, allowing clients to find servers on the internet.
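The lookup-plus-cache behavior can be sketched with a toy stub resolver. The `CachingResolver` class and its fixed TTL are illustrative assumptions, not a real DNS implementation; real resolvers honor the per-record TTL returned by the authoritative server.

```python
import socket
import time

class CachingResolver:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.cache = {}  # hostname -> (ip, expiry timestamp)

    def resolve(self, hostname):
        entry = self.cache.get(hostname)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # cache hit: no network query
        ip = socket.gethostbyname(hostname)      # recursive lookup via the OS
        self.cache[hostname] = (ip, time.monotonic() + self.ttl)
        return ip

resolver = CachingResolver()
print(resolver.resolve("localhost"))  # first call queries; repeats hit the cache
```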
[Diagram: DNS resolution flow; client → browser cache → OS cache → ISP resolver → root server → TLD server → authoritative server, with results cached at each level for the record's TTL.]

An API (Application Programming Interface) is a contract or set of rules that defines how one piece of software can request services from another.
REST (Representational State Transfer)
Treats everything as resources (e.g., a user profile), with each resource having a unique URL. REST is stateless: each request contains all the information the server needs to fulfill it, which enhances scalability. Commonly used for public web APIs.
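A minimal sketch of stateless REST-style routing, assuming a hypothetical in-memory "users" table; a real service would use a web framework and a database.

```python
USERS = {123: {"id": 123, "name": "John"}}

ROUTES = {}

def route(method, path):
    """Register a handler for (HTTP method, URL path template)."""
    def register(handler):
        ROUTES[(method, path)] = handler
        return handler
    return register

@route("GET", "/users/{id}")
def get_user(user_id):
    user = USERS.get(user_id)
    return (200, user) if user else (404, None)

@route("PUT", "/users/{id}")
def update_user(user_id, body=None):
    USERS[user_id] = {"id": user_id, **body}
    return (200, USERS[user_id])

# Each request carries everything needed (method, path, body): no session state.
status, body = ROUTES[("GET", "/users/{id}")](123)
```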
[Diagram: REST API pattern; a client sends stateless GET/PUT requests to an API server backed by a database, manipulating resources via HTTP methods (GET read, POST create, PUT update, DELETE remove).]

RPC (Remote Procedure Call)
Enables calling functions on a remote server as if they were local. Focuses on actions and operations rather than resources. Typically used for internal service communication where tight coupling is acceptable or desirable.
How it works (gRPC example): the client uses generated code (a stub) that serializes the function call and its parameters into a binary format (Protocol Buffers), sends it over HTTP/2, and the server deserializes it to execute the function.
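The stub-and-dispatcher round trip can be sketched in miniature. Here JSON stands in for Protocol Buffers and a direct function call stands in for the HTTP/2 transport; the `add` method and handler table are invented for illustration.

```python
import json

def add(a, b):
    return a + b

HANDLERS = {"add": add}

def server_dispatch(raw_request):
    call = json.loads(raw_request)                 # deserialize the request
    result = HANDLERS[call["method"]](*call["args"])
    return json.dumps({"result": result})          # serialize the response

def client_stub(method, *args):
    request = json.dumps({"method": method, "args": args})  # serialize call
    response = server_dispatch(request)                     # "network" hop
    return json.loads(response)["result"]                   # deserialize

print(client_stub("add", 2, 3))  # looks like a local call: 5
```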
[Diagram: RPC pattern (gRPC); the client stub serializes getUserById(id=123) to Protocol Buffers and sends it over HTTP/2; the server deserializes, executes, and returns a binary response; services call each other the same way.]

GraphQL
Gives clients fine-grained control over data retrieval. Instead of fixed REST endpoints, clients specify exactly what data they need in a single request. Uses a strong type system, making it ideal for mobile apps where minimizing data transfer is crucial.
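Field selection is the core idea, and it can be mimicked with a toy resolver. The dict-based "query" format and the sample user record are invented stand-ins; real GraphQL has its own query syntax and a schema-driven server.

```python
USER = {"id": 123, "name": "John", "email": "john@example.com",
        "posts": [{"title": "Hello", "body": "..."}]}

def resolve(record, selection):
    """Return only the fields the client asked for (no over-fetching)."""
    result = {}
    for field, sub in selection.items():
        value = record[field]
        if sub and isinstance(value, list):
            # Nested selection: recurse into each item of the list field.
            value = [resolve(item, sub) for item in value]
        result[field] = value
    return result

query = {"name": None, "posts": {"title": None}}
print(resolve(USER, query))  # {'name': 'John', 'posts': [{'title': 'Hello'}]}
```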
[Diagram: GraphQL query pattern; the mobile app POSTs a query listing exact fields, the GraphQL server resolves each field against the database, and only the requested data is returned.]

Summary
[Diagram: API styles overview; REST (resources + URLs + stateless), RPC (procedures + tight coupling), GraphQL (client-specified data).]

Database
Section titled “Database”Databases store the data that APIs use and serve. There are two main categories: SQL (relational) and NoSQL (non-relational).
SQL (Relational DB)
SQL databases are like spreadsheets on steroids. They store data in tables with rows and columns using a predefined schema. They guarantee ACID properties:
- Atomicity: Ensures all transactions succeed or none do
- Consistency: Ensures every transaction moves the database from one valid state to another by enforcing constraints
- Isolation: Prevents concurrent transactions from interfering with each other
- Durability: Guarantees committed transactions persist even after crashes
Examples: MySQL, PostgreSQL, Oracle DB, MS SQL Server (MSSQL)
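Atomicity can be demonstrated with SQLite from the Python standard library. The in-memory database and `accounts` schema are illustrative: either the whole transfer commits, or a failure rolls it all back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name='alice'")
        raise RuntimeError("simulated crash before crediting bob")
except RuntimeError:
    pass

# The debit was rolled back: alice still has 100, bob still has 0.
balance = conn.execute("SELECT balance FROM accounts WHERE name='alice'").fetchone()[0]
print(balance)  # 100
```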
NoSQL (Non-Relational DB)

NoSQL databases offer flexibility beyond rigid table structures. They handle semi-structured or unstructured data efficiently.
Document Databases (MongoDB, CouchDB)
- Store data in JSON-like documents
- Flexible schema for evolving data models
Key-Value Stores (Redis, DynamoDB, Memcached)
- Simple key-value pair storage
- Ideal for caching, session management, and high-speed operations
Column-Family Stores (Cassandra, HBase)
- Store data in columns rather than rows
- Optimized for massive write/read workloads
- Use cases: activity feeds, time-series data, big data analytics
Graph Databases (Neo4j, ArangoDB)
- Store data as nodes (entities) and edges (relationships)
- Perfect when relationships are as important as the data itself
- Use cases: social networks, recommendation systems, fraud detection
Summary
[Diagram: databases overview; SQL/relational (ACID) versus NoSQL (document, key-value, column-family, graph).]

Key-Value Stores
Section titled “Key-Value Stores”Key-value databases like Redis and Memcached operate primarily in main memory (RAM), enabling extremely fast read and write operations.
Primary Use Cases:
- Caching: Keep frequently accessed data in memory
- Session Management: Store user session data for quick retrieval
- Real-time Analytics: Process high-velocity data streams
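A minimal in-memory key-value store with per-key TTL sketches how a cache like Redis expires entries. This is illustrative only (lazy expiration on read), not the Redis protocol or its actual eviction machinery.

```python
import time

class KVStore:
    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        expiry = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expiry)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expiry = item
        if expiry is not None and time.monotonic() >= expiry:
            del self._data[key]  # lazy expiration on read
            return None
        return value

store = KVStore()
store.set("session:42", {"user": "alice"}, ttl=0.05)
print(store.get("session:42"))  # served from RAM until the TTL lapses
```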
[Diagram: key-value stores; the application reads and writes against Redis/Memcached held in RAM.]

Scalability
Section titled “Scalability”Approaches
As applications grow in popularity, they need to scale to handle increased load. There are two fundamental approaches:
Vertical Scaling (Scale Up)
Upgrade existing hardware: add more CPU, RAM, or faster storage.
Pros:
- Simpler to implement
- No application changes needed
Cons:
- Physical hardware limits
- Single point of failure
- Limited high availability
Horizontal Scaling (Scale Out)
Add more machines to distribute the load across multiple servers.
Pros:
- Nearly unlimited scaling potential
- Better fault tolerance
- High availability (if one server fails, others continue)
Cons:
- Increased complexity
- Data consistency challenges
- Requires load balancing and coordination
[Diagram: vertical vs horizontal scaling; upgrading one small server (2 CPU, 4 GB RAM) to a larger machine (16 CPU, 64 GB RAM), versus adding more identical servers behind a load balancer.]

[Diagram: scalability approaches; vertical scaling hits single-machine limits, while horizontal scaling relies on a load balancer and sharding.]

Load Balancers
Algorithms
Load balancers use various algorithms to distribute traffic:
Round Robin
- Distributes requests in circular sequence: Server 1 → Server 2 → Server 3 → Server 1…
- Simple and fair distribution
Least Connections
- Routes to the server with fewest active connections
- Keeps all servers equally busy
IP Hash
- Uses client IP address to consistently route to the same server
- Enables session stickiness for stateful applications
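The three algorithms can each be sketched in a few lines. The server names and the connection-count dict are illustrative; a real balancer tracks connections and health itself.

```python
import hashlib
import itertools

servers = ["server1", "server2", "server3"]

# Round robin: cycle through servers in a fixed order.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections.
active = {"server1": 4, "server2": 1, "server3": 7}
def least_connections():
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin())  # server1 server2
print(least_connections())           # server2
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))  # True (sticky)
```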
High Availability
Self-Managed:
- HAProxy
- NGINX
Cloud-Managed:
- AWS Elastic Load Balancing (ELB)
- Google Cloud Load Balancing
- Azure Load Balancer
[Diagram: load balancer algorithms; incoming requests pass through the load balancer, which distributes them across servers by round robin, least connections, or IP hash.]

Caching
Caching creates a high-speed storage layer for frequently accessed data, reducing latency and improving performance, like keeping your most-used tools within arm's reach.
Caching Levels
Caching occurs at multiple layers:
Browser Cache
- Stores static assets (images, CSS, JavaScript)
- Reduces load times on repeat visits
DNS Cache
- Caches IP address mappings for domain names
- Speeds up DNS resolution
Application Cache
- In-memory storage for frequently accessed data
- Reduces database queries
Database Cache
- Caches query results
- Speeds up repeated queries
CDN (Content Delivery Network)
- Caches static assets at edge locations globally
- Delivers content from geographically closest servers
Caching Strategies
Cache Aside (Lazy Loading)
Application checks cache first. On miss, fetch from database and populate cache.
Pros: Simple; only caches what's needed.
Cons: Cache-miss penalty; potential stale data.
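The read path above is short enough to sketch directly; plain dicts stand in for the cache and the database.

```python
database = {"user:1": {"name": "Ada"}}
cache = {}

def get(key):
    if key in cache:            # 1. check the cache
        return cache[key]
    value = database.get(key)   # 2. on a miss, read the database
    if value is not None:
        cache[key] = value      # 3. populate the cache for next time
    return value

get("user:1")          # miss: fetched from the database, now cached
print(get("user:1"))   # hit: served from the cache
```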
Write Through
Writes go to both cache and database simultaneously.
Pros: Strong consistency between cache and database.
Cons: Slower writes (waiting for both operations).
Write Back (Write Behind)
Writes go to cache first, then asynchronously to database.
Pros: Fast writes; reduced database load.
Cons: Risk of data loss if cache fails before database sync.
Write Around
Writes bypass cache, going directly to database. Cache populated only on reads.
Pros: Prevents cache pollution with infrequently accessed data.
Cons: Cache misses on recently written data.
[Diagram: cache aside; read the cache first, and on a miss query the database and populate the cache.]

[Diagram: write through; writes go to cache and database synchronously, guaranteeing consistency at the cost of slower writes.]

[Diagram: write back; writes hit the cache immediately and are flushed to the database asynchronously, risking loss if the cache fails before the sync.]

[Diagram: write around; writes go straight to the database, and the cache is populated only on later reads.]

Cache Eviction Policies
When cache is full, eviction policies determine what gets removed:
LRU (Least Recently Used)
- Removes least recently accessed items
- Good general-purpose strategy
- Assumes recent access predicts future access
FIFO (First In First Out)
- Removes oldest items regardless of access frequency
- Simple but not always optimal
- Best for sequential access patterns (video streaming, live sports)
LFU (Least Frequently Used)
- Removes items with lowest access count
- Keeps “hottest” data in cache
- Good for workloads with clear access patterns
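LRU, the most common of the three, can be sketched on top of `OrderedDict`; the capacity of 2 is chosen only to make the eviction visible.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # touch "a", so "b" is now least recently used
cache.put("c", 3)         # cache full: "b" is evicted
print(list(cache.items))  # ['a', 'c']
```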
[Diagram: eviction policies; when the cache is full, LRU evicts the least recently used entry, FIFO the oldest inserted entry, and LFU the entry with the lowest access count.]

CDN (Content Delivery Network)
A geographically distributed network of servers that cache static content (images, videos, web assets) at edge locations worldwide.
Benefits:
- Reduced latency by serving from geographically closest server
- Improved load times
- Reduced load on origin server
- Better global user experience
[Diagram: CDN; user requests hit the nearest edge server, cache hits are served directly, and misses are fetched from the origin, cached at the edge, and returned.]

Data Partitioning and Sharding
When databases grow to terabytes, a single database can’t handle it all. Storage limits are reached and query performance degrades.
Solution: Divide data into smaller independent chunks (shards) distributed across multiple servers.
Partitioning Strategies
Horizontal Partitioning (Sharding)
- Splits data by rows
- Example: Users A-M in Shard 1, N-Z in Shard 2
- Uses partition key (user ID, username)
- Most common scaling approach
Vertical Partitioning
- Splits data by columns
- Example: User profiles in Shard 1, user activity logs in Shard 2
- Separates frequently vs infrequently accessed data
- Optimizes for different access patterns
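Hash-based routing of a partition key to a shard can be sketched as follows; the shard count, key names, and in-memory shard dicts are all illustrative.

```python
import hashlib

NUM_SHARDS = 3
shards = {0: {}, 1: {}, 2: {}}  # stand-ins for three database servers

def shard_for(partition_key):
    """Deterministically map a partition key to a shard."""
    digest = hashlib.sha256(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put_user(user_id, record):
    shards[shard_for(user_id)][user_id] = record

def get_user(user_id):
    # Same key, same hash, same shard: reads find the data without scanning.
    return shards[shard_for(user_id)].get(user_id)

put_user("user:123", {"name": "Ada"})
print(get_user("user:123"))  # {'name': 'Ada'}
```

Note that naive `hash % N` routing reshuffles most keys when `NUM_SHARDS` changes, which is why production systems often use consistent hashing instead.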
[Diagram: hash-based sharding; a key/user ID passes through a hash function that routes it to shard A, B, or C.]

Replication
Sharding handles large datasets, but replication ensures availability and reliability by maintaining multiple identical copies of data. If one server fails, others take over without data loss or downtime.
Benefits:
- Improved system reliability and availability
- Better read performance (distribute reads across replicas)
- Fault tolerance
Replication Strategies
Single Leader (Master-Slave)
Architecture:
- One primary (master) handles all writes
- Multiple replicas (slaves) handle reads
- Changes replicated asynchronously to replicas
Pros: Simple; writes centralized.
Cons: Write bottleneck at the primary.
Multi-Leader
Architecture:
- Writes accepted by any replica
- Changes replicated to other replicas
Pros: Higher write availability.
Cons: Complex conflict resolution for concurrent writes.
Synchronization Modes
Synchronous
- Primary waits for all replicas to confirm
- Very safe, but slower
Asynchronous
- Primary doesn’t wait for confirmation
- Fast, but small window of potential inconsistency
Semi-Synchronous
- Wait for at least one replica
- Balances safety and performance
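The semi-synchronous trade-off can be simulated with threads: the client is unblocked as soon as one replica acknowledges, while slower replicas catch up in the background. The replica dicts and sleep-based delays are simulated stand-ins for real network replication.

```python
import threading
import time

replicas = [{}, {}]

def write_semi_sync(key, value, primary):
    primary[key] = value
    first_ack = threading.Event()

    def replicate(replica, delay):
        time.sleep(delay)   # simulated network/apply latency
        replica[key] = value
        first_ack.set()     # any replica's ack releases the waiting client

    threads = [
        threading.Thread(target=replicate, args=(replicas[0], 0.01)),
        threading.Thread(target=replicate, args=(replicas[1], 0.05)),
    ]
    for t in threads:
        t.start()
    first_ack.wait()        # block only until the fastest replica acks
    return threads          # slower replicas finish asynchronously

primary = {}
pending = write_semi_sync("x", 42, primary)
print(primary["x"])         # confirmed after a single replica ack
for t in pending:
    t.join()                # only so this demo exits with all replicas synced
```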
[Diagram: replication synchronization modes; synchronous waits for all replica ACKs (safe, slow), asynchronous confirms immediately (fast, risky), semi-synchronous waits for one replica ACK (balanced).]

[Diagram: single-leader replication; writes go to the primary, which replicates asynchronously to replicas that serve reads.]

Reverse Proxy
A reverse proxy acts as an intermediary gateway between clients and backend servers.
Key Functions
- Load Balancing: Distribute requests across servers
- SSL/TLS Termination: Handle encryption/decryption
- Caching: Cache static content
- Security: Filter malicious requests, hide infrastructure
- Compression: Compress responses before sending to clients
Popular Tools
- NGINX
- HAProxy
- Apache HTTP Server
- Envoy
[Diagram: reverse proxy; clients connect to the reverse proxy (TLS termination), which forwards requests to the backend services.]

Distributed Messaging
Enables asynchronous communication between services in microservices architectures when immediate responses aren’t required.
Example: Sending order confirmation emails
How It Works
Services publish messages (e.g., order 123 confirmed) to queues/topics without knowing which service consumes them. This decouples services, allowing independent scaling.
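An in-process pub/sub broker illustrates the decoupling: the publisher never waits for, or even knows about, its subscribers. The `Broker` class, topic, and subscriber names are illustrative; a real deployment would use a broker such as Kafka or RabbitMQ.

```python
import queue

class Broker:
    def __init__(self):
        self.topics = {}  # topic name -> list of subscriber queues

    def subscribe(self, topic):
        q = queue.Queue()
        self.topics.setdefault(topic, []).append(q)
        return q

    def publish(self, topic, message):
        # Every subscriber gets its own copy of the message.
        for q in self.topics.get(topic, []):
            q.put(message)

broker = Broker()
email_service = broker.subscribe("orders")
analytics_service = broker.subscribe("orders")

broker.publish("orders", "order 123 confirmed")

print(email_service.get())      # order 123 confirmed
print(analytics_service.get())  # order 123 confirmed
```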
Benefits
Resilience
- If consumer service is down, producer continues publishing
- Messages processed when consumer recovers
Asynchronous Processing
- Producer doesn’t wait for consumer
- Handles different processing speeds
Scalability
- Add more consumers to handle load
- Services scale independently
Popular Tools
- Apache Kafka: High-throughput, distributed streaming
- RabbitMQ: Feature-rich message broker
- AWS SQS: Managed cloud queue service
- Redis Streams: Lightweight pub/sub
[Diagram: message queue patterns; point-to-point: each message is consumed once by one of the competing consumers (load balancing); pub/sub: every subscriber receives its own copy (event broadcast).]

[Diagram: pub/sub flow; ServiceA publishes "order 123 confirmed" to a Kafka topic, and ServiceB later consumes it and sends the email.]

Microservices
An architectural style that breaks large applications into smaller, independent services, each focused on a specific business capability.
Examples: Inventory service, user management service, payment service
Communication
Synchronous: REST APIs, gRPC
Asynchronous: Message queues (Kafka, RabbitMQ)
Advantages
Modularity
- Develop, deploy, and scale services independently
- Changes to one service don’t require redeploying others
Fault Isolation
- Bugs in one service don’t crash the entire application
- Graceful degradation of functionality
Technology Diversity
- Each service can use different tech stacks
- Choose best tool for each job
Trade-offs
```plantuml
@startuml
title Microservices architecture
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam rectangleBorderColor #10b981
skinparam rectangleBackgroundColor #ecfdf5
skinparam databaseBackgroundColor #f8fafc
skinparam databaseBorderColor #94a3b8
rectangle "API Gateway" as APIGW
package "Microservices" {
  [User Service]
  [Inventory Service]
  [Payment Service]
}
database "User DB" as UDB
database "Inventory DB" as IDB
database "Payment DB" as PDB
APIGW --> [User Service]
APIGW --> [Inventory Service]
APIGW --> [Payment Service]
[User Service] --> UDB
[Inventory Service] --> IDB
[Payment Service] --> PDB
@enduml
```
Related Building Blocks and Operational Concepts
Notification Systems
Systems for sending alerts to users or other systems.
Types: Push notifications, emails, SMS
Implementation: Often built using message queues internally
Tools: Firebase Cloud Messaging, AWS SNS, Twilio
Full-Text Search
Specialized search engines for querying large text datasets.
Purpose: Fast, flexible search through product descriptions, articles, logs
Advantage: Much faster than SQL LIKE queries
Tools: Elasticsearch, Apache Solr, Algolia
Distributed Coordination Services
Critical for managing state and agreement in distributed systems.
Tools: Apache ZooKeeper, etcd, Consul
Key Functions
Service Discovery
- Track which services are running and where
Distributed Locking
- Ensure only one service performs critical operations
- Prevent race conditions
Leader Election
- Determine which server is primary/coordinator
Configuration Management
- Centralized, reliable source of truth
- Distribute config changes to all services
Heartbeats
- Periodic health checks
- Detect service failures
- Trigger alerts or failover
Checksums
- Digital fingerprints for data integrity
- Verify data hasn’t been corrupted during transmission
- Compare checksums before and after transfer
CAP Theorem
States that distributed systems can guarantee only two of three properties when network partitions occur:
C — Consistency
- Every read gets the most recent write
- All nodes see the same data simultaneously
A — Availability
- Every request receives a response (success or failure)
- System remains operational even with stale data
P — Partition Tolerance
- System continues operating despite network failures
- Some servers can’t communicate, but system functions
CP Systems (Consistency + Partition Tolerance)
Prioritize: Strong consistency
Trade-off: May become unavailable during partitions
Examples: Traditional RDBMS, MongoDB (with strong consistency settings), HBase
Use case: Financial transactions, inventory management
AP Systems (Availability + Partition Tolerance)
Prioritize: High availability
Trade-off: May serve stale data during partitions
Examples: Cassandra, DynamoDB, Couchbase
Concept: Eventual Consistency — all replicas converge to the same state once updates stop
Use case: Social media feeds, product catalogs, user profiles
```plantuml
@startuml
title CAP Theorem
skinparam backgroundColor #ffffff
skinparam ArrowColor #475569
skinparam rectangleBorderColor #64748b
skinparam rectangleBackgroundColor #f8fafc
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
left to right direction
rectangle C as "Consistency"
rectangle A as "Availability"
rectangle P as "Partition Tolerance"
C --- A
A --- P
P --- C
note right of C
  CP: Consistency + Partition tolerance
  Strong consistency, reduced availability
  on partitions.
end note
note bottom of A
  AP: Availability + Partition tolerance
  High availability, eventual consistency
  under partitions.
end note
legend right
  Triangle shows the three properties.
  In a partition, you can guarantee only
  two: CP or AP (not CA).
endlegend
@enduml
```
PACELC Theorem
An extension of the CAP theorem that addresses what happens during normal operation (no partition). CAP only explains behavior during partitions, but PACELC provides a more complete picture.
PACELC stands for:
- P (Partition): If there is a partition…
- A (Availability) vs C (Consistency): choose between Availability or Consistency
- E (Else): Otherwise (no partition)…
- L (Latency) vs C (Consistency): choose between Latency or Consistency
Understanding PACELC
During Partition (PA or PC):
- Same as CAP theorem
- Choose between Availability (serve possibly stale data) or Consistency (refuse requests)
During Normal Operation (EL or EC):
- Choose between Lower Latency (faster responses with relaxed consistency) or stronger Consistency (slower responses, wait for all replicas)
PACELC Categories
PA/EL Systems (Availability + Latency)
- Partition: Prioritize Availability (AP)
- Normal: Prioritize Latency (fast responses)
- Trade-off: Eventual consistency
- Examples: Cassandra, DynamoDB, Riak
- Use case: Social media, content delivery, shopping carts
PA/EC Systems (Availability + Consistency)
- Partition: Prioritize Availability (AP)
- Normal: Prioritize Consistency (slower but consistent)
- Examples: MongoDB (with eventual consistency mode)
- Use case: Less common, but useful for systems that need consistency when stable
PC/EL Systems (Consistency + Latency)
- Partition: Prioritize Consistency (CP)
- Normal: Prioritize Latency (caching, read replicas)
- Examples: Memcached, some caching systems
- Use case: Systems that can tolerate unavailability during partitions but need speed normally
PC/EC Systems (Consistency + Consistency)
- Partition: Prioritize Consistency (CP)
- Normal: Prioritize Consistency (strong guarantees)
- Trade-off: Higher latency
- Examples: Traditional RDBMS (MySQL, PostgreSQL), HBase, VoltDB
- Use case: Banking, financial transactions, inventory systems
```plantuml
@startuml
title PACELC Theorem
skinparam backgroundColor #ffffff
skinparam ArrowColor #475569
skinparam rectangleBorderColor #64748b
skinparam rectangleBackgroundColor #f8fafc
skinparam noteBorderColor #94a3b8
skinparam noteBackgroundColor #fef3c7
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
package "Network Partition?" {
  rectangle "YES\n(Partition exists)" as P {
    rectangle "Choose:\nAvailability (A)\nvs\nConsistency (C)" as PAC
  }
  rectangle "NO\n(Normal operation)" as E {
    rectangle "Choose:\nLatency (L)\nvs\nConsistency (C)" as ELC
  }
}
note right of PAC
  PA: High availability, eventual consistency
  PC: Strong consistency, may be unavailable
end note
note right of ELC
  EL: Fast responses, relaxed consistency
  EC: Strong consistency, higher latency
end note
rectangle "PA/EL" as PAEL #e0f2fe
rectangle "PA/EC" as PAEC #dbeafe
rectangle "PC/EL" as PCEL #fce7f3
rectangle "PC/EC" as PCEC #fce7f3
PAC --> PAEL : Choose PA
PAC --> PAEC : Choose PA
PAC --> PCEL : Choose PC
PAC --> PCEC : Choose PC
ELC --> PAEL : Choose EL
ELC --> PAEC : Choose EC
ELC --> PCEL : Choose EL
ELC --> PCEC : Choose EC
note bottom of PAEL
  Examples: Cassandra, DynamoDB
  Fast & Available, Eventual Consistency
end note
note bottom of PCEC
  Examples: MySQL, PostgreSQL
  Strong Consistency, Higher Latency
end note
legend right
  PACELC extends CAP to cover normal operation:
  - P/A vs C: During partition (CAP theorem)
  - E/L vs C: During normal operation (new insight)
endlegend
@enduml
```
Real-World Examples
Cassandra (PA/EL):
- Partition: Remains available, accepts writes
- Normal: Fast reads/writes with tunable consistency (eventual by default)
- Result: High performance, eventually consistent
PostgreSQL (PC/EC):
- Partition: May become unavailable to maintain consistency
- Normal: Synchronous replication for strong consistency
- Result: Strong guarantees, higher latency
DynamoDB (PA/EL by default):
- Partition: Stays available
- Normal: Low latency reads (eventually consistent reads by default)
- Optional: Can request strongly consistent reads (PA/EC behavior) with higher latency
Failover
Automatic switching to redundant systems when primary components fail.
Examples:
- Switch to standby database replica
- Redirect traffic to healthy servers via load balancer
- Promote backup to primary role
Goal: Minimize downtime and maintain service availability
Types:
- Active-Passive: Standby servers stay idle until the primary fails, then take over its role.
- Active-Active: All servers handle requests simultaneously; if one goes down, the load balancer redirects its traffic to the remaining servers. This requires more complex consistency strategies.
Circuit Breaker Pattern
A fault tolerance mechanism that prevents applications from repeatedly attempting operations likely to fail, protecting systems from cascading failures.
Analogy: Like an electrical circuit breaker in your home—when something goes wrong, it “trips” to prevent further damage.
How It Works
The Circuit Breaker operates in three states:
1. Closed (Normal Operation)
- All requests flow through normally
- System monitors for failures (error rates, timeouts, response times)
- Tracks failure metrics against threshold
2. Open (Failure Mode)
- Threshold exceeded (e.g., 50% failure rate or 5 consecutive failures)
- Circuit “trips open”
- Requests fail fast without calling the downstream service
- Prevents resource exhaustion (no blocked threads or connections)
- Waits for timeout period (e.g., 30 seconds) before testing recovery
3. Half-Open (Testing Recovery)
- After timeout expires, circuit enters half-open state
- Allows limited test requests through (e.g., 3 requests)
- If successful → return to Closed state
- If still failing → return to Open state
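The three states above can be captured in a few dozen lines. The sketch below is a minimal, single-threaded illustration; class and parameter names such as `failure_threshold` are chosen here for clarity, not taken from any particular library:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: Closed -> Open -> Half-Open -> Closed."""

    def __init__(self, failure_threshold=5, timeout=30.0, success_threshold=3):
        self.failure_threshold = failure_threshold    # failures before tripping open
        self.timeout = timeout                        # seconds to stay open
        self.success_threshold = success_threshold    # successes needed to close again
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.timeout:
                self.state = "half_open"              # timeout expired: test recovery
                self.successes = 0
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"                   # trip (or re-trip) the breaker
                self.opened_at = time.monotonic()
            raise
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"                 # downstream service recovered
                self.failures = 0
        else:
            self.failures = 0                         # any success resets the count
        return result
```

A production breaker would also need thread safety, per-dependency instances, and metrics; the libraries listed later in this section handle those details.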
```plantuml
@startuml
title Circuit Breaker Pattern
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam stateBorderColor #334155
skinparam stateBackgroundColor #f8fafc
skinparam stateArrowColor #334155
[*] --> Closed : Initial state
Closed : Requests flow normally
Closed : Monitor failures
Closed --> Open : Threshold exceeded\n(e.g., 5 failures)
Open : Requests fail fast
Open : No calls to service
Open : Wait timeout period
Open --> HalfOpen : After timeout\n(e.g., 30 seconds)
HalfOpen : Allow limited requests
HalfOpen : Test if service recovered
HalfOpen --> Closed : Success threshold met\n(e.g., 3 successes)
HalfOpen --> Open : Still failing
note right of Open
  Prevents cascading failures
  Fast fail to protect system
end note
legend right
  Circuit Breaker protects against:
  - Cascading failures
  - Resource exhaustion
  - Slow degrading services
endlegend
@enduml
```
Real-World Example
Scenario: E-commerce application calling Payment Service
Without Circuit Breaker:
```
User → Service A → Payment Service (slow/failing)
        ↓ Threads blocked
        ↓ Resource exhaustion
        ↓ Service A crashes too
```
With Circuit Breaker:
```
User → Service A → Circuit Breaker → Payment Service
        ↓ After 5 failures
        ↓ Circuit OPENS
        ↓ Service A stays healthy
        Returns fallback response
```
Key Benefits
Prevents Cascading Failures
- Isolates failing services from the rest of the system
- Stops the domino effect across microservices
Fail Fast
- Immediate error response instead of waiting for timeouts
- Better user experience (quick feedback vs hanging requests)
Resource Protection
- Frees up threads, connections, and memory
- Prevents thread pool exhaustion
Automatic Recovery Testing
- Periodically checks if downstream service recovered
- No manual intervention needed
Graceful Degradation
- System continues operating with reduced functionality
- Can return cached data or default responses
Configuration Parameters
Failure Threshold: How many failures trigger opening?
- Example: 5 consecutive failures or 50% error rate within a time window
Timeout Period: How long to wait before testing recovery?
- Example: 30 seconds, 1 minute
- Gives downstream service time to recover
Success Threshold: How many successes in half-open to close?
- Example: 3 consecutive successful requests
Request Volume Threshold: Minimum requests before calculating error rate
- Example: Need at least 10 requests in window before opening circuit
Common Use Cases
External API Calls
- Third-party payment gateways
- Shipping APIs
- Authentication services
- Weather APIs
Database Connections
- Protecting against database outages
- Preventing connection pool exhaustion
Microservice Communication
- Service-to-service calls
- Preventing cascading failures across distributed systems
Fallback Strategies
When circuit is open, implement graceful degradation:
Cached Data: Return last known good value
```java
return circuitBreaker.execute(
    () -> paymentService.getBalance(),
    () -> cache.getLastBalance());   // fallback supplier
```
Default Response: Return sensible default
```java
return DEFAULT_SHIPPING_ESTIMATE; // "3-5 business days"
```
Queue Request: Store for later processing
```java
messageQueue.enqueue(paymentRequest);
return "Your payment is being processed";
```
Alternative Service: Route to backup service
```java
if (primaryCircuit.isOpen()) {
    return secondaryPaymentService.process();
}
```
User Message: Clear error explaining temporary unavailability
```java
return "Payment service temporarily unavailable. Please try again in a moment.";
```
Implementation
Popular libraries:
Java: Resilience4j, Hystrix (maintenance mode)
.NET: Polly
Node.js: Opossum
Python: pybreaker
```python
import pybreaker

breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=30)

@breaker
def call_payment_service():
    return payment_service.process()
```
Consistent Hashing
A technique for distributing data across servers that minimizes redistribution when servers are added or removed.
Problem with Simple Hashing:
- Adding/removing one server → rehash all keys → massive data movement
Consistent Hashing Solution:
- Only a small fraction of keys need rebalancing
- Smoother scaling operations
Use Cases:
- Distributed caches (Memcached, Redis clusters)
- Distributed databases (Cassandra, DynamoDB)
- Load balancing
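A minimal sketch of the hash ring, assuming MD5 as the hash function and using virtual nodes (`vnodes`) to spread each server around the ring; all names here are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring sketch: each key maps to the next server clockwise."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes per server smooth the load
        self._ring = []        # sorted list of (hash, server) points
        for s in servers:
            self.add(s)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key):
        if not self._ring:
            raise KeyError("empty ring")
        # First ring point at or after the key's hash, wrapping around
        i = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

With this scheme, adding a fourth server to a three-server ring moves roughly a quarter of the keys, instead of nearly all of them as with naive `hash(key) % n_servers`.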
```plantuml
@startuml
title Consistent Hashing Ring
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam noteBorderColor #94a3b8
skinparam noteBackgroundColor #f8fafc
circle "Hash Ring\n(0-360°)" as Ring
rectangle "Server A\n(90°)" as SA #dbeafe
rectangle "Server B\n(180°)" as SB #fce7f3
rectangle "Server C\n(270°)" as SC #fef3c7
rectangle "Key1 (45°)" as K1 #ecfdf5
rectangle "Key2 (120°)" as K2 #ecfdf5
rectangle "Key3 (200°)" as K3 #ecfdf5
rectangle "Key4 (300°)" as K4 #ecfdf5
Ring --> SA
Ring --> SB
Ring --> SC
K1 --> SA : Clockwise\nnext server
K2 --> SB
K3 --> SC
K4 --> SA
note bottom of Ring
  When adding Server D at 315°:
  Only Key4 moves to D
  Other keys stay on same servers
end note
legend right
  Consistent Hashing Benefits:
  - Minimal redistribution on add/remove
  - Balanced load distribution
  - Virtual nodes for uniformity
endlegend
@enduml
```
API Idempotency
Ensures performing an operation multiple times has the same effect as performing it once.
Why It Matters: Network failures may cause automatic request retries. Without idempotency, duplicate requests could cause unintended effects.
Example: Payment Processing
- User clicks “Pay”
- Network hiccup causes retry
- Without idempotency: double charge ❌
- With idempotency: same result ✅
Implementation:
- Use unique identifiers (transaction ID, idempotency key)
- Server checks if request already processed
- Return original response for duplicate requests
HTTP Method Idempotency:
- Naturally idempotent: `GET`, `PUT`, `DELETE`
- Needs design: `POST` (use idempotency keys)
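The implementation steps above can be sketched as follows; the in-memory dict stands in for what would normally be a database row or Redis entry with a TTL, and all names are hypothetical:

```python
import uuid

class IdempotentPaymentHandler:
    """Illustrative server-side dedupe of retried requests by idempotency key."""

    def __init__(self):
        self._responses = {}   # idempotency_key -> first response (stand-in store)
        self.charges = 0       # side effect we must not repeat

    def charge(self, idempotency_key, amount):
        # Duplicate request: replay the original response, no new side effect
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges += 1                     # perform the real work exactly once
        response = {"status": "charged", "amount": amount}
        self._responses[idempotency_key] = response
        return response

handler = IdempotentPaymentHandler()
key = str(uuid.uuid4())                       # client generates one key per logical request
first = handler.charge(key, 100)
retry = handler.charge(key, 100)              # network retry reuses the same key
```

The client keeps the same key across retries of one logical operation; a new operation gets a new key.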
Rate Limiting
Controls incoming request volume to prevent abuse and overload.
Benefits:
- Prevents DDoS attacks
- Ensures fair usage
- Protects backend resources
- Maintains service quality
Implementation Level: API Gateway or Load Balancer
Algorithms
Token Bucket
- Bucket holds tokens (request capacity)
- Tokens refill at constant rate
- Request consumes token; rejected if bucket empty
- Allows bursts up to bucket capacity
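A minimal token bucket sketch (parameter names are illustrative), refilling lazily on each call rather than on a timer:

```python
import time

class TokenBucket:
    """Token bucket sketch: steady refill rate, bursts up to capacity."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # max tokens (burst size)
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                      # caller should respond 429
```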
```plantuml
@startuml
title Token Bucket Algorithm
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam sequenceArrowColor #334155
skinparam sequenceParticipantBorderColor #94a3b8
skinparam sequenceParticipantBackgroundColor #f8fafc
skinparam noteBackgroundColor #f8fafc
skinparam noteBorderColor #94a3b8
hide footbox
participant Client
participant "Token Bucket" as Bucket
participant Service
note over Bucket
  Bucket State:
  Capacity: 10 tokens
  Current: 7 tokens
  Refill: 2 tokens/sec
end note
Client -> Bucket: Request 1 (costs 1 token)
Bucket -> Bucket: Check tokens: 7 available
note right of Bucket: Token available ✓
Bucket -> Service: Forward request
Service --> Bucket: 200 OK
Bucket --> Client: 200 OK
note over Bucket: Tokens: 7 → 6
...Time passes (0.5 sec)...
note over Bucket: Refill: +1 token\nTokens: 6 → 7
Client -> Bucket: Burst of 8 requests
Bucket -> Bucket: Check tokens: 7 available
note right of Bucket
  First 7 requests: Pass ✓
  8th request: REJECT ❌
end note
Bucket -> Service: Forward 7 requests
Service --> Bucket: 200 OK
Bucket --> Client: 7× 200 OK
Bucket --> Client: 1× 429 Too Many Requests
note over Bucket: Tokens: 7 → 0
legend right
  Token Bucket allows bursts up to capacity
  while maintaining average rate via refill.
  Good for: API rate limiting, traffic shaping
endlegend
@enduml
```
Leaky Bucket
- Requests added to queue (bucket)
- Processed at constant rate (leak)
- Overflow requests dropped
- Smooths traffic spikes
```plantuml
@startuml
title Leaky Bucket Algorithm
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam sequenceArrowColor #334155
skinparam sequenceParticipantBorderColor #94a3b8
skinparam sequenceParticipantBackgroundColor #f8fafc
skinparam noteBackgroundColor #f8fafc
skinparam noteBorderColor #94a3b8
hide footbox
participant Client
participant "Leaky Bucket\nQueue" as Queue
participant "Rate Limiter" as Limiter
participant Service
note over Queue
  Queue State:
  Capacity: 5 requests
  Current: 2 queued
  Leak rate: 1 req/sec
end note
Client -> Queue: Request 1
Queue -> Queue: Add to queue (2→3)
note right of Queue: Queue size: 3/5 ✓
Queue --> Client: Request queued
Client -> Queue: Request 2
Queue -> Queue: Add to queue (3→4)
note right of Queue: Queue size: 4/5 ✓
Queue --> Client: Request queued
Client -> Queue: Request 3
Queue -> Queue: Add to queue (4→5)
note right of Queue: Queue size: 5/5 (FULL) ✓
Queue --> Client: Request queued
Client -> Queue: Request 4 (burst)
Queue -> Queue: Queue full!
note right of Queue: OVERFLOW ❌
Queue --> Client: 429 Too Many Requests
...Time: 1 second passes...
Queue -> Limiter: Leak 1 request
note right of Queue
  Process at constant rate
  Queue: 5 → 4
end note
Limiter -> Service: Forward request
Service --> Limiter: 200 OK
Limiter --> Queue: Processing complete
...Time: 1 second passes...
Queue -> Limiter: Leak 1 request
note right of Queue: Queue: 4 → 3
Limiter -> Service: Forward request
Service --> Limiter: 200 OK
legend right
  Leaky Bucket processes requests at constant rate.
  Smooths traffic bursts, enforces steady flow.
  Good for: Network traffic shaping, video streaming
endlegend
@enduml
```
Fixed Window
- Count requests in fixed time windows (e.g., per minute)
- Reset counter at window boundary
- Simple but can allow traffic spikes at boundaries
```plantuml
@startuml
title Fixed Window Algorithm
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam sequenceArrowColor #334155
skinparam sequenceParticipantBorderColor #94a3b8
skinparam sequenceParticipantBackgroundColor #f8fafc
skinparam noteBackgroundColor #f8fafc
skinparam noteBorderColor #94a3b8
hide footbox
participant Client
participant "Fixed Window\nCounter" as Counter
participant Service
note over Counter
  Window: 00:00:00 - 00:00:59
  Limit: 5 requests/minute
  Current count: 0
end note
== Window 1: 00:00:00 - 00:00:59 ==
Client -> Counter: Request @ 00:00:10
Counter -> Counter: Count: 0 → 1
note right of Counter: 1/5 ✓
Counter -> Service: Forward
Service --> Counter: 200 OK
Counter --> Client: 200 OK
Client -> Counter: Request @ 00:00:20
Counter -> Counter: Count: 1 → 2
note right of Counter: 2/5 ✓
Counter -> Service: Forward
Service --> Counter: 200 OK
Counter --> Client: 200 OK
Client -> Counter: 3 more requests @ 00:00:30
Counter -> Counter: Count: 2 → 5
note right of Counter: 5/5 ✓ (LIMIT REACHED)
Counter -> Service: Forward 3 requests
Service --> Counter: 200 OK
Counter --> Client: 3× 200 OK
Client -> Counter: Request @ 00:00:50
Counter -> Counter: Count: 5 (at limit)
note right of Counter: LIMIT EXCEEDED ❌
Counter --> Client: 429 Too Many Requests
== Window 2: 00:01:00 - 00:01:59 ==
note over Counter
  Window resets!
  Count: 5 → 0
  New window starts
end note
Client -> Counter: Request @ 00:01:00
Counter -> Counter: Count: 0 → 1
note right of Counter
  1/5 ✓
  Fresh window
end note
Counter -> Service: Forward
Service --> Counter: 200 OK
Counter --> Client: 200 OK
note over Counter
  ⚠️ Boundary Issue:
  5 requests @ 00:00:59
  + 5 requests @ 00:01:00
  = 10 requests in 1 second!
end note
legend right
  Fixed Window: Simple but allows boundary bursts.
  Window resets at fixed intervals.
  Good for: Simple rate limiting, low complexity
endlegend
@enduml
```
Sliding Window
- Tracks requests over sliding time window
- More accurate than fixed window
- Prevents boundary spike issues
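A sliding-window log can be sketched with a deque of timestamps; `now` is injectable here purely to make the behavior easy to demonstrate, and all names are illustrative:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding window log sketch: at most `limit` requests in any `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()    # timestamps of accepted requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the rolling window
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

Because the window rolls continuously, ten requests can never squeeze through a window boundary the way they can with a fixed window; the cost is storing one timestamp per accepted request.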
```plantuml
@startuml
title Sliding Window Algorithm
skinparam backgroundColor #ffffff
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam sequenceArrowColor #334155
skinparam sequenceParticipantBorderColor #94a3b8
skinparam sequenceParticipantBackgroundColor #f8fafc
skinparam noteBackgroundColor #f8fafc
skinparam noteBorderColor #94a3b8
hide footbox
participant Client
participant "Sliding Window\nLog" as Window
participant Service
note over Window
  Current time: 00:00:30
  Window size: 60 seconds
  Limit: 5 requests/minute
  Log: [timestamps of requests]
end note
Client -> Window: Request @ 00:00:30
Window -> Window: Check last 60s\n(00:00:00 - 00:00:30)
note right of Window
  Count in window: 0
  0 < 5 ✓
end note
Window -> Window: Add timestamp: 00:00:30
Window -> Service: Forward request
Service --> Window: 200 OK
Window --> Client: 200 OK
Client -> Window: 4 more requests\n@ 00:00:35, 00:00:40\n00:00:45, 00:00:50
Window -> Window: Add timestamps
note right of Window
  Requests in last 60s: 5
  5/5 ✓ (LIMIT REACHED)
end note
Window -> Service: Forward 4 requests
Service --> Window: 200 OK
Window --> Client: 4× 200 OK
Client -> Window: Request @ 00:00:55
Window -> Window: Check window\n(00:00:00 - 00:00:55)
note right of Window
  Count: 5 requests
  00:00:30, 00:00:35,
  00:00:40, 00:00:45,
  00:00:50
  5 ≥ 5 ❌ LIMIT EXCEEDED
end note
Window --> Client: 429 Too Many Requests
...Time passes: 37 seconds...
' timestamps adjusted so the 00:00:30 entry has actually left the 60 s window
Client -> Window: Request @ 00:01:32
Window -> Window: Check window\n(00:00:32 - 00:01:32)
note right of Window
  Old requests expired:
  × 00:00:30 (outside window)
  ✓ 00:00:35, 00:00:40, 00:00:45, 00:00:50
  Count: 4/5 ✓
end note
Window -> Window: Add timestamp: 00:01:32
Window -> Service: Forward request
Service --> Window: 200 OK
Window --> Client: 200 OK
note over Window
  ✓ No boundary issue
  Window slides continuously
  More accurate rate limiting
end note
legend right
  Sliding Window: Most accurate, prevents boundary bursts.
  Tracks exact timestamps within rolling window.
  Good for: Strict rate limiting, API quotas
  Trade-off: Higher memory (stores timestamps)
endlegend
@enduml
```
Monitoring and Logging
Monitoring
Collects metrics about system health and performance.
Key Metrics:
- CPU and memory usage
- Request latency (P50, P95, P99)
- Error rates and types
- Throughput (requests/sec)
- Database connection pool stats
Popular Stack:
- Prometheus: Metrics collection and storage
- Grafana: Visualization and dashboards
- AlertManager: Alert routing and notification
Logging
Records events, errors, and system activities for debugging and auditing.
Log Levels: DEBUG, INFO, WARN, ERROR, FATAL
Structured Logging: Use JSON format for machine-readable logs
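As a sketch, structured logging in Python can be done with a custom formatter that serializes each record to one JSON object per line; the field selection here is illustrative:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Emit each log record as a single machine-readable JSON line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("order confirmed")   # prints a JSON line instead of free text
```

JSON lines like these are what shippers such as Logstash or Fluentd parse and index without fragile regexes.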
Popular Stack:
- ELK Stack: Elasticsearch, Logstash, Kibana
- EFK Stack: Elasticsearch, Fluentd, Kibana
- OpenSearch: Open-source alternative to Elasticsearch
```plantuml
@startuml
title Monitoring and logging
skinparam backgroundColor #ffffff
skinparam componentStyle rectangle
skinparam ArrowColor #334155
skinparam Shadowing false
skinparam DefaultFontName Arial
skinparam DefaultFontSize 13
skinparam componentBorderColor #94a3b8
skinparam componentBackgroundColor #f8fafc
skinparam databaseBorderColor #94a3b8
skinparam databaseBackgroundColor #f8fafc
left to right direction
package "Services" {
  component "Service A" as S1
  component "Service B" as S2
}
package "Observability" {
  component Prom as "Prometheus (metrics)"
  component Graf as "Grafana (dashboards)"
  database Store as "Log Store (ELK/OpenSearch)"
  component Alerts as "Alerts"
}
S1 --> Prom
S2 --> Prom
Prom --> Graf
' services ship their logs to the log store (original referenced an undeclared "Logs" component)
S1 --> Store
S2 --> Store
Store --> Alerts
note bottom of Graf
  Visualization of metrics
end note
note right of Alerts
  Notifications based on log rules
  and thresholds
end note
legend right
  Services emit metrics and logs. Prometheus scrapes metrics,
  Grafana visualizes, and the log store powers alerting.
endlegend
@enduml
```
Principles Guiding Design Choices
Key principles that guide distributed system design:
Core Abilities
Scalability
- Can the system grow to handle increased load?
- Achieved through horizontal scaling, load balancing, partitioning
Maintainability
- How easy to understand, modify, and operate over time?
- Clear code, documentation, modular architecture
Efficiency
- Optimal use of resources (CPU, memory, network, storage)
- Cost-effectiveness at scale
Resilience
- Gracefully handle failures
- Design for failures, not against them
Measurement Metrics
Key metrics to evaluate system design:
Availability
- Percentage of time system is operational
- Measured in “nines”: 99.9% (“three nines”), 99.99% (“four nines”)
- 99.9% = ~8.76 hours downtime/year
- 99.99% = ~52.56 minutes downtime/year
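The downtime figures follow directly from the definition; a one-line helper makes the arithmetic explicit:

```python
def downtime_per_year(availability_pct):
    """Hours of allowed downtime per year at a given availability percentage."""
    hours_per_year = 365 * 24  # 8760
    return hours_per_year * (1 - availability_pct / 100)

# 99.9%  -> 8760 * 0.001  = 8.76 hours/year
# 99.99% -> 8760 * 0.0001 = 0.876 hours ≈ 52.56 minutes/year
```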
Reliability
- System consistently performs as expected without failure
- Mean Time Between Failures (MTBF)
Fault Tolerance
- System’s ability to continue operating despite component failures
Redundancy
- Availability of backup components when needed
Throughput
- Amount of work processed per unit time (requests/sec, transactions/sec)
Latency
- Time to process a single request
- Measured as percentiles: P50 (median), P95, P99
API Design Principles
Core Principles
Operation Clarity
- Use clear, intent-revealing names over generic CRUD
- Example:
`POST /orders/123/cancel` vs `DELETE /orders/123`
Protocol Selection
- HTTP/REST: Browser-based, public APIs
- gRPC: High-performance internal services
- GraphQL: Flexible client-driven queries
Data Formats
- JSON: Human-readable, universal support
- Protocol Buffers: Compact, efficient (gRPC)
- XML: Legacy systems
Best Practices
Pagination
- Limit result sets for performance
- Patterns: offset/limit, cursor-based, page numbers
Filtering and Sorting
- Allow clients to query specific data
- Example:
`/users?status=active&sort=created_at`
Idempotency
- Ensure safe request retries
- `GET`, `PUT`, `DELETE` are naturally idempotent
- Add idempotency keys for `POST`
Versioning
- Maintain backward compatibility
- Strategies: URL versioning (`/v1/users`), header versioning
Security
- Authentication (OAuth 2.0, JWT)
- Rate limiting
- Input validation
- HTTPS only
Documentation
- OpenAPI/Swagger specifications
- Interactive API explorers
- Code examples in multiple languages
Common Trade-offs
System design involves balancing competing concerns:
Consistency vs Availability
CAP Theorem: Choose 2 of 3 (Consistency, Availability, Partition Tolerance)
- CP Systems: Strong consistency, possible downtime
- AP Systems: High availability, eventual consistency
SQL vs NoSQL
| SQL | NoSQL |
|---|---|
| Strong consistency | Flexibility |
| ACID guarantees | Horizontal scalability |
| Complex joins | Simpler data models |
| Structured schema | Schema-less |
Caching Strategies
| Strategy | Performance | Consistency | Complexity | Risk |
|---|---|---|---|---|
| Cache Aside | Medium | Medium | Low | Low |
| Write Through | Low (writes) | High | Medium | Low |
| Write Back | High | Low | High | Data loss |
| Write Around | Low (first read) | Medium | Low | Low |
Microservices vs Monolith
Microservices:
- ✅ Modularity, independent scaling
- ❌ Operational complexity, distributed debugging
Monolith:
- ✅ Simple deployment, easier debugging
- ❌ Tight coupling, harder to scale
Security vs Usability
More Security → More friction for users
Less Security → Better UX but higher risk
Balance: Multi-factor authentication, rate limiting, clear error messages
Performance vs Cost
Higher Performance → More expensive infrastructure
Cost Optimization → Potential performance trade-offs
Balance: Caching, efficient algorithms, right-sized resources
Conclusion
The “right” system design depends entirely on your specific context:
- Requirements: What must the system do?
- Constraints: Budget, team size, timeline, regulations
- Priorities: Performance, reliability, cost, time-to-market
Remember: Context is everything. A design that works for a startup with 1,000 users will differ vastly from one serving 100 million users.
Start simple, evolve as needed. Premature optimization is the root of much wasted effort.