Distributed Cache - System Design Document

1. Overview

A distributed, in-memory key-value cache system built in Go for learning distributed systems concepts. Designed to run locally for development and testing, with production-grade patterns.

Target Use Case: Developers can spin up a multi-node cache cluster locally to practice distributed systems concepts.

2. Requirements

Functional Requirements

Store and retrieve key-value pairs (bytes)
Support TTL (time-to-live) for automatic expiration
Distribute data across multiple nodes using consistent hashing
Replicate data for fault tolerance
Handle node failures gracefully
Dynamic node discovery and membership

Non-Functional Requirements

Latency: < 10ms for local deployments
Availability: Survive single node failures (with replication)
Consistency: Configurable (eventual or quorum-based)
Scalability: Horizontal scaling by adding nodes
Operability: Easy to run locally with Docker Compose

Out of Scope (v1)

Persistence to disk
Authentication/Authorization
Multi-datacenter replication
Transactions or CAS (Compare-And-Swap)

3. High-Level Architecture

┌──────────────────────────────────────────────────┐
│             Service Registry (etcd)               │
│  • Node membership & health                       │
│  • Heartbeat monitoring                           │
└───────────┬──────────────────────────────────────┘
            │ (register, heartbeat, query)
            │
    ┌───────┴────────┬─────────────┬──────────┐
    │                │             │          │
┌───▼────┐      ┌───▼────┐   ┌───▼────┐  ┌──▼─────┐
│ Node 1 │◄────►│ Node 2 │◄─►│ Node 3 │  │ Client │
│ :8001  │ gRPC │ :8002  │   │ :8003  │  │        │
│        │      │        │   │        │  │        │
│ [LRU]  │      │ [LRU]  │   │ [LRU]  │  └────────┘
└────────┘      └────────┘   └────────┘
   Replication via gRPC

Components

Cache Node: gRPC server with in-memory LRU cache
Service Registry: Tracks active nodes (in-memory or etcd)
Client Library: Routes requests using consistent hashing
Replication Layer: Ensures data redundancy

4. Key Design Decisions

4.1 Client-Side vs Proxy-Based Routing

Decision: Client-side sharding (with optional proxy in v2)

Rationale:

Lower latency (no extra hop)
No single point of failure
Better scalability
Standard in production systems (Redis Cluster, Memcached)

Trade-off: Clients must maintain hash ring and node list

4.2 Data Distribution: Consistent Hashing

Algorithm: Consistent hashing with virtual nodes

Hash Ring (0 to 2^32):
Node1: [vnode-1-1, vnode-1-2, ..., vnode-1-100] at positions [1234, 5678, ...]
Node2: [vnode-2-1, vnode-2-2, ..., vnode-2-100] at positions [2345, 6789, ...]
Node3: [vnode-3-1, vnode-3-2, ..., vnode-3-100] at positions [3456, 7890, ...]

Key "user:123" → hash(key) = 4000 → walks clockwise → finds vnode-1-50 → Node1

Parameters:

Virtual nodes per physical node: 150 (balance between distribution and overhead)
Hash function: FNV-1a (fast, good distribution)

Benefits:

Adding/removing nodes only affects ~1/N of keys
Better load distribution with few physical nodes
Deterministic placement (all clients agree)

4.3 Replication Strategy

Decision: Configurable async (default) or sync replication

Async Replication (Default)

Write Flow:
1. Client → Primary node (Set key-value)
2. Primary writes to local cache
3. Primary returns SUCCESS to client immediately
4. Primary replicates to replicas in background goroutine

Configuration:

replication:
  strategy: "async"
  factor: 3          # Total copies (1 primary + 2 replicas)

Trade-offs:

✅ Low latency (~1-2ms)
✅ High throughput
❌ Risk of data loss if primary crashes before replication
❌ Eventual consistency (replicas lag)

Sync Replication (Optional)

Write Flow:
1. Client → Primary node
2. Primary writes locally
3. Primary replicates to replicas (parallel)
4. Wait for write_quorum acknowledgments
5. Return SUCCESS if quorum met

Configuration:

replication:
  strategy: "sync"
  factor: 3
  write_quorum: 2    # Must succeed on 2 nodes (including primary)
  timeout_ms: 100

Quorum Math: W + R > N ensures consistency

N=3, W=2, R=2 → Strong consistency (can tolerate 1 failure)

Trade-offs:

✅ Stronger consistency
✅ Less data loss risk
❌ Higher latency (~5-10ms)
❌ Lower throughput

4.4 Replica Placement

Strategy: Next N-1 nodes clockwise on the hash ring

Hash Ring:
... → Node1 → Node2 → Node3 → Node1 → ...

Key "user:123" → Primary: Node1
              → Replicas: [Node2, Node3]

Rationale:

Simple to implement
Deterministic (all clients agree)
Automatic rebalancing when nodes join/leave

4.5 Cache Eviction Policy

Algorithm: LRU (Least Recently Used)

Implementation:

Doubly-linked list for access order
Hash map for O(1) lookups
Move to front on Get/Set

Memory Management:

cache:
  max_size_mb: 512
  eviction_policy: "LRU"

Why LRU?

Simple and predictable
Works well for general workloads
O(1) eviction

(v2 could add LFU, FIFO, or TTL-only)

4.6 Service Discovery

Decision: Registry interface with pluggable implementations

type Registry interface {
    Register(node NodeInfo) error
    Heartbeat(nodeID string) error
    ListNodes() ([]NodeInfo, error)
    Deregister(nodeID string) error
}

Implementations:

In-Memory Registry (v1, default)

No external dependencies
Perfect for local development
Single process (not distributed)

etcd Registry (v2)

Production-ready
Distributed consensus
Watch API for membership changes
TTL-based leases

Configuration:

registry:
  type: "memory"              # or "etcd"
  endpoints: ["localhost:2379"]
  heartbeat_interval_s: 10
  node_ttl_s: 30

4.7 Failure Detection & Recovery

Health Checks

Nodes send heartbeat every 10 seconds
Registry marks node UNHEALTHY if no heartbeat for 30 seconds
Clients refresh node list every 5 seconds

Recovery Mechanisms

v1: Read Repair (Lazy)

Client reads key → Primary miss → Query replica → Hit
→ Primary learns key during read

v2: Anti-Entropy (Proactive)

Nodes periodically sync with replicas
Use Merkle trees for efficient comparison

v3: Hinted Handoff

When node is down, writes go to temporary node with "hint"
Forwarded when node recovers

5. API Design

gRPC Services

CacheService (cache.proto)

service CacheService {
  rpc Get(GetRequest) returns (GetResponse);
  rpc Set(SetRequest) returns (SetResponse);
  rpc Delete(DeleteRequest) returns (DeleteResponse);
  rpc HealthCheck(HealthCheckRequest) returns (HealthCheckResponse);
}

NodeRegistryService (registry.proto)

service NodeRegistryService {
  rpc RegisterNode(RegisterNodeRequest) returns (RegisterNodeResponse);
  rpc Heartbeat(HeartbeatRequest) returns (HeartbeatResponse);
  rpc ListNodes(ListNodesRequest) returns (ListNodesResponse);
}

6. Data Flows

Write Path (Async Replication)

1. Client computes hash(key) → identifies primary & replicas
2. Client → gRPC Set(key, value) → Primary
3. Primary:
   a. Writes to local LRU cache
   b. Returns SUCCESS to client
   c. Spawns goroutine:
      - gRPC Replicate(key, value) → Replica1
      - gRPC Replicate(key, value) → Replica2

Latency: ~1-2ms (local network)

Read Path

1. Client computes hash(key) → identifies primary
2. Client → gRPC Get(key) → Primary
3. Primary:
   a. Lookup in local cache
   b. If found → return value
   c. If miss → [v2: query replicas for read repair]

Latency: ~1ms (cache hit)

Node Join

1. New node starts
2. Node → gRPC RegisterNode() → Registry
3. Registry:
   a. Adds node to membership list
   b. Returns current node list
4. New node:
   a. Builds hash ring from node list
   b. Starts serving requests
5. Existing clients:
   a. Refresh node list (next poll)
   b. Rebuild hash ring
   c. Keys rehash to new topology

Note: In v1, no data migration. Keys naturally migrate as they're written.

Node Failure

1. Node crashes (stops sending heartbeats)
2. Registry marks node UNHEALTHY after 30s
3. Clients refresh node list
4. Clients rebuild hash ring (node removed)
5. Reads/writes go to replicas
6. [v2: Read repair fills gaps]

7. Configuration

Node Configuration (config.yaml)

node:
  id: "node-1"                    # Unique identifier
  address: "localhost:8001"       # gRPC listen address
  
cache:
  max_size_mb: 512
  eviction_policy: "LRU"
  default_ttl_s: 3600             # 1 hour
  
replication:
  strategy: "async"               # "async" or "sync"
  factor: 3
  write_quorum: 2                 # For sync only
  timeout_ms: 100
  
registry:
  type: "memory"                  # "memory" or "etcd"
  endpoints: ["localhost:2379"]
  heartbeat_interval_s: 10
  node_ttl_s: 30

metrics:
  enabled: true
  port: 9090                      # Prometheus metrics

logging:
  level: "info"                   # debug, info, warn, error
  format: "json"

8. Observability

Metrics (Prometheus)

# Cache metrics
cache_hits_total
cache_misses_total
cache_evictions_total
cache_size_bytes
cache_keys_total

# Replication metrics
replication_lag_seconds
replication_failures_total

# Node metrics
node_status (healthy=1, unhealthy=0)
grpc_requests_total{method, status}
grpc_request_duration_seconds{method}

Logging

Structured JSON logs (zap)
Request IDs for tracing
Key operations logged: Set, Get, Eviction, Replication

9. Testing Strategy

Unit Tests

Cache operations (LRU, TTL)
Consistent hashing (distribution, node add/remove)
Replication strategies

Integration Tests

Multi-node cluster setup
Node failure scenarios
Data distribution verification

Performance Tests

Throughput benchmarks
Latency percentiles (p50, p95, p99)
Memory usage under load

10. Deployment

Local Development (Docker Compose)

version: '3'
services:
  node1:
    build: .
    ports: ["8001:8001", "9091:9090"]
    environment:
      - NODE_ID=node-1
  node2:
    build: .
    ports: ["8002:8001", "9092:9090"]
  node3:
    build: .
    ports: ["8003:8001", "9093:9090"]

Start cluster:

docker-compose up -d

11. Future Enhancements (v2+)

Sync replication with quorum
Read repair / anti-entropy
etcd-based service discovery
Kubernetes deployment (StatefulSet)
Persistence (RocksDB backend)
Client connection pooling
Hot key detection
Admin API (metrics, key distribution)
Proxy mode (optional coordinator)

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
cmd/cache-node		cmd/cache-node
pkg		pkg
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

Distributed Cache - System Design Document

1. Overview

2. Requirements

Functional Requirements

Non-Functional Requirements

Out of Scope (v1)

3. High-Level Architecture

Components

4. Key Design Decisions

4.1 Client-Side vs Proxy-Based Routing

4.2 Data Distribution: Consistent Hashing

4.3 Replication Strategy

Async Replication (Default)

Sync Replication (Optional)

4.4 Replica Placement

4.5 Cache Eviction Policy

4.6 Service Discovery

In-Memory Registry (v1, default)

etcd Registry (v2)

4.7 Failure Detection & Recovery

Health Checks

Recovery Mechanisms

5. API Design

gRPC Services

CacheService (cache.proto)

NodeRegistryService (registry.proto)

6. Data Flows

Write Path (Async Replication)

Read Path

Node Join

Node Failure

7. Configuration

Node Configuration (config.yaml)

8. Observability

Metrics (Prometheus)

Logging

9. Testing Strategy

Unit Tests

Integration Tests

Performance Tests

10. Deployment

Local Development (Docker Compose)

11. Future Enhancements (v2+)

12. References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages