# Scalability Guide

This guide covers scaling NFYio under high load: horizontal scaling, database scaling, caching strategies, and multi-region deployment.
## Horizontal Scaling

### API Gateway

Scale the gateway horizontally behind a load balancer:
```yaml
# Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfyio-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nfyio-gateway
  template:
    metadata:
      labels:
        app: nfyio-gateway  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: gateway
          image: nfyio/gateway:latest
          ports:
            - containerPort: 3000
```
| Component | Scaling Strategy |
|---|---|
| API Gateway | Add replicas; stateless, scales linearly |
| Load Balancer | Round-robin or least-connections |
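The least-connections strategy from the table above routes each request to the backend with the fewest in-flight requests. A minimal sketch of the selection logic (`Backend` and `leastConnections` are hypothetical names; a real load balancer such as NGINX or HAProxy implements this for you):

```typescript
interface Backend {
  host: string;
  activeConnections: number;
}

// Pick the backend with the fewest in-flight connections.
function leastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("no backends available");
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best
  );
}
```

Least-connections adapts better than round-robin when request durations vary widely, which is common for gateway workloads mixing small metadata calls with large object transfers.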
### Storage Nodes

SeaweedFS scales by adding volume nodes:
```yaml
# Add more volume nodes
seaweedfs-volume-1:
  image: chrislusf/seaweedfs
  command: volume -mserver=seaweedfs-master:9333 -port=8080
seaweedfs-volume-2:
  image: chrislusf/seaweedfs
  command: volume -mserver=seaweedfs-master:9333 -port=8080
```
Each volume node adds capacity and throughput. The master distributes writes across volumes.
### Embedding Workers

For high-volume embedding pipelines, scale worker replicas:
```yaml
nfyio-embedding-worker:
  deploy:
    replicas: 4
  environment:
    - OPENAI_API_KEY=${OPENAI_API_KEY}
    - BATCH_SIZE=32
```
| Setting | Impact |
|---|---|
| Replicas | Throughput (scales roughly linearly) |
| `BATCH_SIZE` | API efficiency (larger batches = fewer calls, more memory per call) |
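The `BATCH_SIZE` trade-off comes down to how documents are grouped before each API call. A sketch of the batching step a worker might perform (`chunk` is a hypothetical helper, not part of NFYio's API):

```typescript
// Split a list of documents into batches of at most `size` items,
// mirroring the BATCH_SIZE setting above.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 70 documents at BATCH_SIZE=32 → 3 API calls instead of 70.
const batches = chunk(Array.from({ length: 70 }, (_, i) => `doc-${i}`), 32);
```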
## Database Scaling

### Read Replicas

Offload read traffic to replicas. Use PostgreSQL streaming replication:
```yaml
# Primary
postgres-primary:
  image: pgvector/pgvector:pg16
  environment:
    # Note: these replication variables follow the Bitnami PostgreSQL image
    # convention; the stock pgvector image requires setting up streaming
    # replication manually (wal_level, replication user, pg_basebackup).
    - POSTGRES_REPLICATION_MODE=master

# Read replica
postgres-replica:
  image: pgvector/pgvector:pg16
  environment:
    - POSTGRES_REPLICATION_MODE=slave
    - POSTGRES_MASTER_HOST=postgres-primary
```
Route read queries (SELECT, list operations) to replicas. Writes go to primary.
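The read/write split above needs a routing decision per query. A naive sketch (`pickPool` is a hypothetical name; a production router must also keep transactions and writing CTEs pinned to the primary):

```typescript
// Send plain SELECTs to a replica, everything else to the primary.
function pickPool(sql: string): "primary" | "replica" {
  const stmt = sql.trimStart().toUpperCase();
  return stmt.startsWith("SELECT") ? "replica" : "primary";
}
```

Because replication is asynchronous, reads routed to a replica may briefly see stale data; read-your-own-writes flows should stay on the primary.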
### Connection Pooling

Use PgBouncer to handle connection spikes:
```ini
[databases]
nfyio = host=postgres port=5432 dbname=nfyio

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
reserve_pool_size = 25
```
| Parameter | Purpose |
|---|---|
| `max_client_conn` | Maximum client connections PgBouncer accepts |
| `default_pool_size` | Server connections per database/user pair |
| `reserve_pool_size` | Additional server connections for bursts |
### pgvector Optimization

For vector similarity search at scale:
```sql
-- IVFFlat index (faster build, good for < 1M vectors)
CREATE INDEX idx_embeddings_ivfflat ON embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- HNSW index (faster query, better recall, slower build)
CREATE INDEX idx_embeddings_hnsw ON embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
| Index | Build Time | Query Speed | Recall |
|---|---|---|---|
| IVFFlat | Fast | Good | Good |
| HNSW | Slower | Faster | Better |
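Both index types above use the `vector_cosine_ops` operator class, which orders results by cosine distance (1 − cosine similarity), the metric pgvector's `<=>` operator computes. A minimal sketch of the metric itself, for intuition about what the index is approximating:

```typescript
// Cosine distance: 1 - (a·b) / (|a| * |b|).
// 0 = identical direction, 1 = orthogonal, 2 = opposite.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because both indexes are approximate, recall is a tuning target, not a guarantee; benchmark with your own embeddings before choosing index parameters.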
## Caching Strategies

### Redis

Use Redis for session, rate-limit, and query caching:
```yaml
redis:
  image: redis:7-alpine
  command: redis-server --maxmemory 2gb --maxmemory-policy allkeys-lru
```
| Use Case | TTL | Key Pattern |
|---|---|---|
| Session | 24h | `session:{id}` |
| Rate limit | 1m | `ratelimit:{key}:{window}` |
| Query cache | 5m | `query:{hash}` |
| Embedding cache | 24h | `emb:{hash}` |
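For the hash-based patterns, hashing keeps keys short and uniform regardless of input length. A sketch of a key builder following the `query:{hash}` pattern (`queryCacheKey` and the normalization choices are illustrative, not NFYio's actual implementation):

```typescript
import { createHash } from "crypto";

// Build a cache key from a normalized query; the truncated SHA-256
// keeps keys compact while making collisions vanishingly unlikely.
function queryCacheKey(query: string): string {
  const hash = createHash("sha256")
    .update(query.trim().toLowerCase())
    .digest("hex")
    .slice(0, 16);
  return `query:${hash}`;
}
```

Normalizing before hashing (trimming, lowercasing) lets trivially different spellings of the same query share one cache entry.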
### CDN Caching

Cache public objects at the edge:
| Header | Effect |
|---|---|
| `Cache-Control: public, max-age=3600` | Cache 1 hour |
| `Cache-Control: s-maxage=86400` | CDN cache 24h |
| `Vary: Authorization` | Separate cache per auth |
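A gateway typically picks these headers per object based on visibility. A hypothetical sketch of that policy (the header values match the table; the function and the private-object policy are assumptions, not NFYio's shipped behavior):

```typescript
// Public objects: cache in browsers for 1h, at the CDN for 24h.
// Private objects: never cache at the edge.
function cacheControlFor(isPublic: boolean): string {
  return isPublic
    ? "public, max-age=3600, s-maxage=86400"
    : "private, no-store";
}
```

`s-maxage` applies only to shared caches (the CDN), so the edge can hold objects far longer than browsers without delaying updates for end users.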
### Query Result Caching

Cache expensive RAG or list operations:
```typescript
const cacheKey = `list:${bucket}:${prefix}:${page}`;

// Return the cached result if present.
const cached = await redis.get(cacheKey);
if (cached) {
  return JSON.parse(cached);
}

// Cache miss: query storage and cache the serialized result for 60s.
const result = await s3.listObjectsV2({ Bucket: bucket, Prefix: prefix });
await redis.setex(cacheKey, 60, JSON.stringify(result));
return result;
```
## Multi-Region Deployment

### Architecture
```text
Region A (Primary)            Region B (DR/Read)
┌─────────────────────┐       ┌─────────────────────┐
│ Gateway             │       │ Gateway (read)      │
│ Storage (primary)   │──────▶│ Storage (replica)   │
│ PostgreSQL (primary)│──────▶│ PostgreSQL (replica)│
│ Redis (primary)     │       │ Redis (replica)     │
└─────────────────────┘       └─────────────────────┘
```
### Considerations
| Aspect | Strategy |
|---|---|
| Data replication | Async replication (PostgreSQL, SeaweedFS) |
| Routing | GeoDNS or latency-based routing |
| Consistency | Eventually consistent for cross-region reads |
| Failover | Manual or automated (RTO/RPO defined) |
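The routing row above usually means latency-based region selection with health filtering. A sketch of that decision (`Region` and `pickRegion` are hypothetical; in practice GeoDNS or your cloud provider's routing policy does this at the DNS layer):

```typescript
interface Region {
  name: string;
  latencyMs: number; // measured from the client's vantage point
  healthy: boolean;
}

// Route to the lowest-latency healthy region; unhealthy regions are
// skipped entirely, which doubles as simple failover.
function pickRegion(regions: Region[]): Region {
  const healthy = regions.filter((r) => r.healthy);
  if (healthy.length === 0) throw new Error("no healthy regions");
  return healthy.reduce((best, r) => (r.latencyMs < best.latencyMs ? r : best));
}
```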
### Cross-Region Object Replication

For object storage, use replication rules:
```json
{
  "replication": {
    "role": "source",
    "rules": [
      {
        "id": "replicate-to-region-b",
        "status": "enabled",
        "destination": {
          "bucket": "arn:nfyio:storage:region-b::my-bucket",
          "storage_class": "STANDARD"
        },
        "filter": { "prefix": "critical/" }
      }
    ]
  }
}
```
## Scaling Checklist
- API Gateway replicas behind load balancer
- Storage volume nodes scaled for capacity
- Embedding workers scaled for throughput
- Read replicas for database
- PgBouncer for connection pooling
- pgvector index tuned (IVFFlat or HNSW)
- Redis for sessions and caching
- CDN for public objects
- Multi-region plan if required
## Next Steps
- Performance Optimization — Tuning for efficiency
- Cost Optimization — Scaling cost-effectively
- Architecture — System overview
- Storage Overview — Storage scaling