Performance Optimization

This guide covers performance optimization strategies for NFYio storage, AI agents, database, and network layers.

Storage Performance

Multipart Uploads

For objects larger than 100 MB, use multipart uploads to improve throughput and resilience:

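# AWS CLI multipart upload (S3-compatible)
# Multipart settings are CLI configuration values, not flags on `s3 cp`
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 64MB
aws --endpoint-url https://storage.yourdomain.com s3 cp large-file.bin s3://my-bucket/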

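// JavaScript SDK - multipart upload
const fs = require('fs');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

// Region and credentials are assumed to come from the environment
const s3Client = new S3Client({ endpoint: 'https://storage.yourdomain.com' });
const upload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'my-bucket',
    Key: 'large-file.bin',
    Body: fs.createReadStream('large-file.bin'),
  },
  partSize: 64 * 1024 * 1024, // 64 MB
  queueSize: 4, // Concurrent parts
});
await upload.done(); // Run inside an async function or an ES module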

Object Size	Strategy
< 5 MB	Single PUT
5–100 MB	Single PUT or multipart
> 100 MB	Multipart (64 MB parts recommended)
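The thresholds in the table can be expressed as a small decision function. This is a minimal sketch: `planUpload` is a hypothetical helper, not part of any SDK, and the part count assumes the 64 MB part size recommended above.

```javascript
const MB = 1024 * 1024;

// Hypothetical helper: picks a strategy from the size thresholds in the table.
function planUpload(sizeBytes, partSize = 64 * MB) {
  if (sizeBytes < 5 * MB) {
    return { strategy: 'single-put', parts: 1 };
  }
  if (sizeBytes <= 100 * MB) {
    // Either approach works in this range.
    return { strategy: 'single-put-or-multipart', parts: 1 };
  }
  // Multipart: the last part may be smaller than partSize.
  return { strategy: 'multipart', parts: Math.ceil(sizeBytes / partSize) };
}

// A 1 GiB object split into 64 MB parts.
console.log(planUpload(1024 * MB));
```

Knowing the part count up front makes it easier to estimate how long an upload will take at a given `queueSize`.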

Connection Pooling

Reuse HTTP connections for S3 operations. Most SDKs do this by default; ensure you use a single client instance:

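// Good: Single client, connection pooling
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({ endpoint: STORAGE_URL, maxAttempts: 3 });
for (const key of keys) {
  const response = await s3.send(new GetObjectCommand({ Bucket, Key: key }));
  // Read the body so the socket is released back to the pool
  await response.Body.transformToByteArray();
}

// Avoid: New client per request
// const s3 = new S3Client(); // per request - no pooling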
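A shared client also lets several requests use the pool at once instead of one at a time. The sketch below shows one way to bound that concurrency. `mapWithConcurrency` is a hypothetical helper written for this guide, and the limit of 4 in the usage comment is an arbitrary starting point, not a measured recommendation.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, and returns results in the original order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // no await between check and increment, so no race
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results;
}

// Usage with the shared client from the example above (assumed in scope):
// const objects = await mapWithConcurrency(keys, 4, (key) =>
//   s3.send(new GetObjectCommand({ Bucket, Key: key })));
```

Keeping the limit below the client's socket limit avoids requests queueing inside the HTTP agent.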

CDN Caching

For public or semi-public objects, put a CDN in front of the storage endpoint:

Header	Purpose
`Cache-Control: public, max-age=3600`	Cache for 1 hour
`Cache-Control: private, no-cache`	Stored only by private caches and revalidated before reuse (use `no-store` to forbid caching of sensitive data)
`X-Amz-Meta-Cache-TTL: 86400`	Custom cache TTL (if supported)

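# Upload with cache headers
# --cache-control sets the Cache-Control header; --metadata would only add x-amz-meta-* user metadata
aws --endpoint-url https://storage.yourdomain.com s3 cp index.html s3://my-bucket/ \
  --cache-control "public, max-age=3600"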
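When uploading from code, the header value can be chosen per object. The following is a sketch only: `cacheControlFor` and the `visibility` labels are hypothetical, and the returned strings are the two `Cache-Control` values from the table above.

```javascript
// Hypothetical helper: maps an object's visibility to a Cache-Control value.
function cacheControlFor(visibility, maxAgeSeconds = 3600) {
  if (visibility === 'public') {
    return `public, max-age=${maxAgeSeconds}`;
  }
  // Anything not explicitly public is treated as sensitive.
  return 'private, no-cache';
}

console.log(cacheControlFor('public'));
console.log(cacheControlFor('internal'));
```

With the AWS SDK for JavaScript, the result would be passed as the `CacheControl` parameter of `PutObjectCommand`.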

Agent Performance

Chunk Size Tuning

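Chunk size affects retrieval quality and latency. Smaller chunks match queries more precisely but produce more chunks to embed, store, and search. Larger chunks mean fewer chunks and faster retrieval, but each match is less precise.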

Chunk Size	Use Case	Trade-off
256 tokens	Factual Q&A, exact matches	Higher precision, more chunks
512 tokens	General RAG	Balanced
1024 tokens	Long documents, summarization	Faster, may miss details

{
  "chunk_size": 512,
  "chunk_overlap": 50,
  "embedding_model": "text-embedding-3-small"
}
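To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is an illustration of the general technique, not NFYio's actual chunker: `chunkTokens` is a hypothetical helper, and whitespace splitting stands in for a real tokenizer.

```javascript
// Hypothetical helper: splits a token array into windows of `chunkSize`
// tokens, each sharing `overlap` tokens with the previous window.
function chunkTokens(tokens, chunkSize = 512, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Whitespace splitting stands in for a real tokenizer here.
const sampleTokens = 'a b c d e f g h i j'.split(' ');
console.log(chunkTokens(sampleTokens, 4, 1));
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some tokens twice.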

Embedding Model Selection

Model	Dimensions	Speed	Quality	Cost
text-embedding-3-small	1536	Fast	Good	Low
text-embedding-3-large	3072	Slower	Best	High
voyage-2	1024	Fast	Excellent	Medium

Choose based on your latency and quality requirements. For high-throughput workloads, prefer the smaller models.
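Dimensions also drive storage. As a rough worked example, assuming vectors are stored as 4-byte floats and ignoring index and row overhead, the raw size per million vectors can be estimated as follows. `vectorStorageMiB` is a hypothetical helper.

```javascript
// Rough estimate: float32 vectors take 4 bytes per dimension.
// Index and per-row overhead are ignored.
function vectorStorageMiB(dimensions, vectorCount) {
  return (dimensions * 4 * vectorCount) / (1024 * 1024);
}

// Dimensions from the table above.
const modelDimensions = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'voyage-2': 1024,
};

for (const [model, dims] of Object.entries(modelDimensions)) {
  const mib = vectorStorageMiB(dims, 1_000_000).toFixed(0);
  console.log(`${model}: about ${mib} MiB per million vectors`);
}
```

Doubling the dimensions doubles both the storage and the work per similarity comparison, which is worth weighing against the quality gain.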

Caching Frequent Queries

Cache embedding and retrieval results for repeated queries:

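// Sketch: cache by query hash (assumes an ioredis client `redis` and an `agent` in scope)
const crypto = require('crypto');

const cacheKey = `rag:${crypto.createHash('sha256').update(query).digest('hex')}`;
const cached = await redis.get(cacheKey);
let result;
if (cached) {
  result = JSON.parse(cached); // Cached values are stored as JSON strings
} else {
  result = await agent.query(query);
  await redis.setex(cacheKey, 300, JSON.stringify(result)); // 5 min TTL
}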
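The same cache-aside pattern can be tried without Redis, for example in local development. This is a sketch under stated assumptions: `createQueryCache` is a hypothetical wrapper, `runQuery` stands in for `agent.query`, and the normalization step is optional (case folding may not be appropriate for every workload).

```javascript
// Hypothetical wrapper: in-memory cache-aside with a TTL.
// `now` is injectable so expiry can be tested without waiting.
function createQueryCache(runQuery, ttlMs = 300_000, now = Date.now) {
  const entries = new Map(); // normalized query -> { value, expiresAt }

  return async function cachedQuery(query) {
    // Normalize so trivially different spellings share one entry.
    const key = query.trim().toLowerCase().replace(/\s+/g, ' ');
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value;
    }
    const value = await runQuery(query);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Usage (agent assumed in scope):
// const cachedQuery = createQueryCache((q) => agent.query(q));
// const answer = await cachedQuery('What is NFYio?');
```

An in-process cache is not shared between instances, so a shared store such as Redis remains the better fit once the service runs on more than one node.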

Database Optimization

Connection Pooling

Use PgBouncer or similar for connection pooling:

# docker-compose - PgBouncer
pgbouncer:
  image: pgbouncer/pgbouncer
  environment:
    - DATABASES_HOST=postgres
    - DATABASES_PORT=5432
    - PGBOUNCER_POOL_MODE=transaction
    - PGBOUNCER_MAX_CLIENT_CONN=1000
    - PGBOUNCER_DEFAULT_POOL_SIZE=25

Setting	Recommended
Pool mode	transaction
Pool size	20–50 per service
Max client conn	2–3× pool size
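Pool sizes add up across services, so it helps to check them against the PostgreSQL connection limit. The sketch below assumes each service gets its own pool and that a few connections are reserved for maintenance. `checkPoolBudget`, its parameters, and the example values (4 services, a `max_connections` of 100) are hypothetical.

```javascript
// Hypothetical helper: checks whether per-service pools fit within
// PostgreSQL's max_connections, and derives the client limit range
// from the "2-3x pool size" guidance in the table above.
function checkPoolBudget({ services, poolSizePerService, postgresMaxConnections, reserved = 5 }) {
  const serverConnections = services * poolSizePerService;
  const available = postgresMaxConnections - reserved;
  return {
    serverConnections,
    available,
    fits: serverConnections <= available,
    maxClientConnRange: [2 * poolSizePerService, 3 * poolSizePerService],
  };
}

console.log(checkPoolBudget({
  services: 4,
  poolSizePerService: 25,
  postgresMaxConnections: 100,
}));
```

If `fits` is false, either lower the pool size per service or raise `max_connections` on the PostgreSQL side.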

Indexing

Add indexes for common query patterns:

-- Bucket lookups by owner
CREATE INDEX idx_buckets_owner ON buckets(owner_id);

-- Object listing by prefix
CREATE INDEX idx_objects_bucket_prefix ON objects(bucket_id, name text_pattern_ops);

-- pgvector similarity search
CREATE INDEX idx_embeddings_vector ON embeddings 
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
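The `lists = 100` value suits a table of roughly 100,000 rows. The pgvector documentation suggests starting with `rows / 1000` lists for up to 1M rows and `sqrt(rows)` above that, with `sqrt(lists)` probes at query time. A small sketch of that rule of thumb follows; `ivfflatStartingParams` is a hypothetical helper, and the results are starting points to benchmark, not tuned values.

```javascript
// Hypothetical helper: starting values for IVFFlat, following the
// rule of thumb in the pgvector documentation.
function ivfflatStartingParams(rowCount) {
  const lists = rowCount <= 1_000_000
    ? Math.max(1, Math.round(rowCount / 1000))
    : Math.round(Math.sqrt(rowCount));
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

console.log(ivfflatStartingParams(100_000));
console.log(ivfflatStartingParams(4_000_000));
```

The probe count is applied per session with `SET ivfflat.probes = <n>;`. More probes improve recall and slow the query down.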

Query Optimization

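Use EXPLAIN ANALYZE to find where slow queries spend their time
Avoid N+1 queries: batch loads with IN or joins
Paginate large result sets with LIMIT; prefer cursor-based pagination over OFFSET for deep pages, since OFFSET still scans the skipped rows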
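Cursor-based pagination can be sketched as a query builder. This example assumes the `objects(bucket_id, name)` columns from the index example, that `name` is unique within a bucket, and that an index supports ordering by those columns. `buildObjectPageQuery` is a hypothetical helper; the `{ text, values }` shape matches what node-postgres accepts in `client.query`.

```javascript
// Hypothetical helper: builds a parameterized keyset-pagination query.
// Pass the last `name` of the previous page as `afterName`.
function buildObjectPageQuery(bucketId, afterName = null, pageSize = 100) {
  if (afterName === null) {
    return {
      text: 'SELECT name FROM objects WHERE bucket_id = $1 ORDER BY name LIMIT $2',
      values: [bucketId, pageSize],
    };
  }
  return {
    text: 'SELECT name FROM objects WHERE bucket_id = $1 AND name > $2 ORDER BY name LIMIT $3',
    values: [bucketId, afterName, pageSize],
  };
}

console.log(buildObjectPageQuery(42));
console.log(buildObjectPageQuery(42, 'logs/2024-01-31.gz'));
```

Because each page starts from the last seen key, the cost of fetching a page stays roughly constant however deep the client has paged.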

Network Optimization

VPC Endpoint Routing

Route traffic through VPC endpoints to keep it off the public internet:

Client → VPC Endpoint → NFYio (private)
         (no egress to internet)

This reduces latency and egress costs.

Minimize Egress

Strategy	Benefit
Same-region clients	Lower latency, no cross-region egress
VPC peering	Private traffic, no egress charges
CDN at edge	Reduce origin egress

Compression

Enable gzip for API responses when supported:

# --compressed sends an Accept-Encoding header and decompresses the response
curl --compressed https://api.yourdomain.com/v1/buckets
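To get a feel for the savings, the following sketch compresses a JSON payload with Node's built-in `zlib` module and compares sizes. The payload is made up for illustration, and its field names are not NFYio's actual response schema.

```javascript
const zlib = require('zlib');

// A made-up bucket listing; field names are illustrative only.
const payload = JSON.stringify(
  Array.from({ length: 500 }, (_, i) => ({
    name: `bucket-${i}`,
    region: 'us-east-1',
    objects: i * 10,
  }))
);

const compressed = zlib.gzipSync(payload);
const restored = zlib.gunzipSync(compressed).toString('utf8');

console.log('raw bytes: ', Buffer.byteLength(payload));
console.log('gzip bytes:', compressed.length);
console.log('round trip ok:', restored === payload);
```

Repetitive JSON such as listings tends to compress well, while already-compressed objects (images, archives) gain little and are not worth the CPU time.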

Performance Checklist

Multipart uploads for objects > 100 MB
Connection pooling for S3 and DB
CDN for public objects
Chunk size tuned for RAG use case
Embedding model selected for cost/quality
Query result caching where appropriate
Database indexes on hot paths
VPC endpoints for private traffic

Next Steps

Objects — Upload and download patterns
Embeddings — Chunking and indexing
RAG Agents — Agent configuration
Cost Optimization — Reduce spend