# Performance Optimization

Storage performance, agent tuning, database optimization, and network efficiency for NFYio.

This guide covers performance optimization strategies for the NFYio storage, AI agent, database, and network layers.
## Storage Performance

### Multipart Uploads
For objects larger than 100 MB, use multipart uploads to improve throughput and resilience:
```bash
# AWS CLI multipart settings (S3-compatible); aws s3 cp applies them automatically
aws configure set s3.multipart_threshold 100MB
aws configure set s3.multipart_chunksize 64MB
aws --endpoint-url https://storage.yourdomain.com s3 cp large-file.bin s3://my-bucket/
```
```javascript
// JavaScript SDK - multipart upload
import fs from 'node:fs';
import { S3Client } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';

const s3Client = new S3Client({ endpoint: 'https://storage.yourdomain.com' });

const upload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'my-bucket',
    Key: 'large-file.bin',
    Body: fs.createReadStream('large-file.bin'),
  },
  partSize: 64 * 1024 * 1024, // 64 MB
  queueSize: 4, // Concurrent parts
});
await upload.done();
```
| Object Size | Strategy |
|---|---|
| < 5 MB | Single PUT |
| 5–100 MB | Single PUT or multipart |
| > 100 MB | Multipart (64 MB parts recommended) |
### Connection Pooling
Reuse HTTP connections for S3 operations. Most SDKs do this by default; ensure you use a single client instance:
```javascript
// Good: Single client, connection pooling
const s3 = new S3Client({ endpoint: STORAGE_URL, maxAttempts: 3 });
for (const key of keys) {
  await s3.send(new GetObjectCommand({ Bucket, Key: key }));
}

// Avoid: New client per request
// const s3 = new S3Client(); // per request - no pooling
```
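A single shared client also pairs well with bounded concurrency, so parallel requests reuse pooled connections without flooding them. A minimal sketch (the `mapLimit` helper is not part of any SDK; the concurrency limit of 8 is an example value):

```javascript
// Sketch: run async tasks with bounded concurrency against one shared client.
async function mapLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Usage with the shared client above:
// const objects = await mapLimit(keys, 8, (key) =>
//   s3.send(new GetObjectCommand({ Bucket, Key: key })));
```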
### CDN Caching
For public or semi-public objects, put a CDN in front of the storage endpoint:
| Header | Purpose |
|---|---|
| `Cache-Control: public, max-age=3600` | Cache for 1 hour |
| `Cache-Control: private, no-cache` | No caching (sensitive data) |
| `X-Amz-Meta-Cache-TTL: 86400` | Custom cache TTL (if supported) |
```bash
# Upload with cache headers (--metadata would set x-amz-meta-*, which CDNs ignore)
aws s3 cp index.html s3://my-bucket/ \
  --cache-control "public, max-age=3600"
```
## Agent Performance

### Chunk Size Tuning
Chunk size trades retrieval precision against latency and token cost: smaller chunks match queries more precisely but produce more chunks to embed, store, and retrieve; larger chunks process faster but can bury relevant details.
| Chunk Size | Use Case | Trade-off |
|---|---|---|
| 256 tokens | Factual Q&A, exact matches | Higher precision, more chunks |
| 512 tokens | General RAG | Balanced |
| 1024 tokens | Long documents, summarization | Faster, may miss details |
```json
{
  "chunk_size": 512,
  "chunk_overlap": 50,
  "embedding_model": "text-embedding-3-small"
}
```
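To illustrate how `chunk_size` and `chunk_overlap` interact, here is a rough word-based chunker (a sketch only: real chunkers count model tokens rather than words, and the function name is illustrative):

```javascript
// Sketch: split text into overlapping chunks; "tokens" approximated by words.
function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const step = chunkSize - overlap; // each chunk starts `step` words after the last
  const chunks = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With `chunk_overlap` greater than zero, consecutive chunks share their boundary words, so a fact straddling a chunk boundary still lands whole in at least one chunk.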
### Embedding Model Selection
| Model | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | Low |
| text-embedding-3-large | 3072 | Slower | Best | High |
| voyage-2 | 1024 | Fast | Excellent | Medium |
Choose based on latency vs. quality requirements. For high-throughput workloads, prefer smaller models.
### Caching Frequent Queries
Cache embedding and retrieval results for repeated queries:
```javascript
// Cache retrieval results by query hash (assumes a connected redis client)
const cacheKey = `rag:${hash(query)}`;
const cached = await redis.get(cacheKey);
let result;
if (cached) {
  result = JSON.parse(cached); // cache hit: stored as JSON
} else {
  result = await agent.query(query);
  await redis.setex(cacheKey, 300, JSON.stringify(result)); // 5 min TTL
}
```
## Database Optimization

### Connection Pooling
Use PgBouncer or similar for connection pooling:
```yaml
# docker-compose - PgBouncer
pgbouncer:
  image: pgbouncer/pgbouncer
  environment:
    - DATABASES_HOST=postgres
    - DATABASES_PORT=5432
    - PGBOUNCER_POOL_MODE=transaction
    - PGBOUNCER_MAX_CLIENT_CONN=1000
    - PGBOUNCER_DEFAULT_POOL_SIZE=25
```
| Setting | Recommended |
|---|---|
| Pool mode | transaction |
| Pool size | 20–50 per service |
| Max client conn | 2–3× pool size |
### Indexing
Add indexes for common query patterns:
```sql
-- Bucket lookups by owner
CREATE INDEX idx_buckets_owner ON buckets(owner_id);

-- Object listing by prefix
CREATE INDEX idx_objects_bucket_prefix ON objects(bucket_id, name text_pattern_ops);

-- pgvector similarity search
CREATE INDEX idx_embeddings_vector ON embeddings
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```
### Query Optimization

- Use `EXPLAIN ANALYZE` for slow queries
- Avoid N+1 queries: batch loads with `IN` or joins
- Use pagination (`LIMIT`/`OFFSET` or cursor-based) for large result sets
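Cursor-based (keyset) pagination avoids the cost of scanning past deep `OFFSET`s. A minimal sketch of the paging logic over rows sorted by `id` (in SQL this corresponds to `WHERE id > $cursor ORDER BY id LIMIT $n`; the helper name is illustrative):

```javascript
// Sketch: keyset (cursor-based) pagination over rows sorted by id.
function pageAfter(rows, cursor, limit) {
  const start = cursor == null ? 0 : rows.findIndex((r) => r.id > cursor);
  const page = start < 0 ? [] : rows.slice(start, start + limit);
  // A full page may have more rows after it; hand back the last id as the cursor.
  const nextCursor = page.length === limit ? page[page.length - 1].id : null;
  return { page, nextCursor };
}
```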
## Network Optimization

### VPC Endpoint Routing
Route traffic through VPC endpoints to avoid public internet:
```text
Client → VPC Endpoint → NFYio (private)
         (no egress to internet)
```
This reduces latency and egress costs.
### Minimize Egress
| Strategy | Benefit |
|---|---|
| Same-region clients | Lower latency, no cross-region egress |
| VPC peering | Private traffic, no egress charges |
| CDN at edge | Reduce origin egress |
### Compression
Enable gzip for API responses when supported:
```bash
curl -H "Accept-Encoding: gzip" https://api.yourdomain.com/v1/buckets
```
## Performance Checklist
- Multipart uploads for objects > 100 MB
- Connection pooling for S3 and DB
- CDN for public objects
- Chunk size tuned for RAG use case
- Embedding model selected for cost/quality
- Query result caching where appropriate
- Database indexes on hot paths
- VPC endpoints for private traffic
## Next Steps
- Objects — Upload and download patterns
- Embeddings — Chunking and indexing
- RAG Agents — Agent configuration
- Cost Optimization — Reduce spend