Performance Optimization

Storage performance, agent tuning, database optimization, and network efficiency for NFYio.

This guide covers performance optimization strategies for NFYio storage, AI agents, database, and network layers.

Storage Performance

Multipart Uploads

For objects larger than 100 MB, use multipart uploads to improve throughput and resilience:

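# AWS CLI (S3-compatible). Multipart thresholds are CLI configuration values,
# not flags of `aws s3 cp`, so set them before copying.
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 64MB
aws --endpoint-url https://storage.yourdomain.com s3 cp large-file.bin s3://my-bucket/

// JavaScript SDK - multipart upload
const fs = require('fs');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

// Placeholder endpoint; supply region and credentials as your deployment requires
const s3Client = new S3Client({ endpoint: 'https://storage.yourdomain.com' });

const upload = new Upload({
  client: s3Client,
  params: {
    Bucket: 'my-bucket',
    Key: 'large-file.bin',
    Body: fs.createReadStream('large-file.bin'),
  },
  partSize: 64 * 1024 * 1024, // 64 MB per part
  queueSize: 4, // parts uploaded concurrently
});

// done() resolves once every part is uploaded and the object is assembled
upload.done()
  .then(() => console.log('upload complete'))
  .catch((err) => console.error('upload failed', err));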

| Object Size | Strategy |
|---|---|
| < 5 MB | Single PUT |
| 5–100 MB | Single PUT or multipart |
| > 100 MB | Multipart (64 MB parts recommended) |
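
The size thresholds above can be turned into a small planning helper. This is a minimal sketch, not part of NFYio: the function name `planUpload` and the returned field names are hypothetical, and "MB" is treated as 1024 × 1024 bytes to match the `partSize` used in the SDK example.

```javascript
// Sketch: pick an upload strategy from the object size, following the thresholds above.
const MB = 1024 * 1024;

function planUpload(sizeBytes, partSizeBytes = 64 * MB) {
  if (sizeBytes < 5 * MB) {
    return { strategy: 'single-put', parts: 1 };
  }
  if (sizeBytes <= 100 * MB) {
    // Either approach works in this range; default to the simpler one.
    return { strategy: 'single-put-or-multipart', parts: 1 };
  }
  // Multipart: the last part may be smaller than partSizeBytes.
  return { strategy: 'multipart', parts: Math.ceil(sizeBytes / partSizeBytes) };
}

console.log(planUpload(1024 * MB)); // { strategy: 'multipart', parts: 16 }
```

A 1 GiB object with 64 MB parts needs 16 parts, which with `queueSize: 4` would upload in roughly four waves of concurrent requests.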

Connection Pooling

Reuse HTTP connections for S3 operations. Most SDKs do this by default; ensure you use a single client instance:

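// Good: single client, so connections are pooled and reused
// (STORAGE_URL, Bucket, and keys are assumed to be defined elsewhere)
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const s3 = new S3Client({ endpoint: STORAGE_URL, maxAttempts: 3 });
for (const key of keys) {
  const { Body } = await s3.send(new GetObjectCommand({ Bucket, Key: key }));
  await Body.transformToByteArray(); // consume the body so the socket returns to the pool
}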

// Avoid: New client per request
// const s3 = new S3Client(); // per request - no pooling
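
The same single-instance idea applies one layer down, at the HTTP agent. The sketch below uses only Node's built-in `https` module; the helper name `getSharedAgent` and the `maxSockets` value of 50 are illustrative assumptions, not NFYio or SDK defaults.

```javascript
const https = require('https');

let sharedAgent = null;

// Returns one process-wide agent so every request draws from the same socket pool.
function getSharedAgent() {
  if (!sharedAgent) {
    sharedAgent = new https.Agent({
      keepAlive: true, // keep idle sockets open for reuse
      maxSockets: 50,  // illustrative cap on concurrent sockets per host
    });
  }
  return sharedAgent;
}

// Both calls return the same object, so callers share the pool.
console.log(getSharedAgent() === getSharedAgent()); // true
```

With plain Node requests the agent is passed as the `agent` option of `https.request`. In the AWS SDK for JavaScript v3 an agent like this is usually supplied through the client's `requestHandler` option; check the SDK documentation for the exact handler configuration.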

CDN Caching

For public or semi-public objects, put a CDN in front of the storage endpoint:

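| Header | Purpose |
|---|---|
| Cache-Control: public, max-age=3600 | Cache for 1 hour |
| Cache-Control: private, no-store | No caching (sensitive data) |
| X-Amz-Meta-Cache-TTL: 86400 | Custom cache TTL (if supported) |

# Upload with cache headers
# (--cache-control sets the Cache-Control header; --metadata would only set user metadata)
aws --endpoint-url https://storage.yourdomain.com s3 cp index.html s3://my-bucket/ \
  --cache-control "public, max-age=3600"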
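
When uploading from code, the header can be chosen per object. The following is a minimal sketch: `cacheControlFor` and `withCacheControl` are hypothetical helper names, while `CacheControl` is the S3 PutObject field that sets the `Cache-Control` response header. It assumes sensitive objects should never be stored by caches (`no-store`).

```javascript
// Sketch: choose a Cache-Control value per object and attach it to upload params.
function cacheControlFor({ isPublic, maxAgeSeconds = 3600 }) {
  // no-store tells browsers and CDNs not to keep a copy at all.
  return isPublic ? `public, max-age=${maxAgeSeconds}` : 'private, no-store';
}

function withCacheControl(params, policy) {
  // Returns a new params object; the input is left untouched.
  return { ...params, CacheControl: cacheControlFor(policy) };
}

const params = withCacheControl(
  { Bucket: 'my-bucket', Key: 'index.html' },
  { isPublic: true },
);
console.log(params.CacheControl); // public, max-age=3600
```

The resulting object can be passed as the `params` of an upload or PutObject call.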

Agent Performance

Chunk Size Tuning

Chunk size affects retrieval quality and latency. Smaller chunks give more precise matches but produce more chunks to embed, store, and search. Larger chunks mean fewer chunks and faster retrieval, at the cost of precision.

| Chunk Size | Use Case | Trade-off |
|---|---|---|
| 256 tokens | Factual Q&A, exact matches | Higher precision, more chunks |
| 512 tokens | General RAG | Balanced |
| 1024 tokens | Long documents, summarization | Faster, may miss details |

{
  "chunk_size": 512,
  "chunk_overlap": 50,
  "embedding_model": "text-embedding-3-small"
}
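
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is not NFYio's chunker: the function name `chunkTokens` is hypothetical, and the tokens are stand-in strings, whereas a real pipeline would use the embedding model's tokenizer.

```javascript
// Sketch: sliding-window chunking. Each chunk starts (chunkSize - chunkOverlap)
// tokens after the previous one, so neighbours share chunkOverlap tokens.
function chunkTokens(tokens, chunkSize = 512, chunkOverlap = 50) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// Stand-in tokens t0..t1199 for a 1200-token document.
const tokens = Array.from({ length: 1200 }, (_, i) => `t${i}`);
const chunks = chunkTokens(tokens);
console.log(chunks.map((c) => c.length)); // [ 512, 512, 276 ]
```

With the settings above, a 1200-token document yields three chunks, and the second chunk begins at token 462 so that it repeats the last 50 tokens of the first.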

Embedding Model Selection

| Model | Dimensions | Speed | Quality | Cost |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | Fast | Good | Low |
| text-embedding-3-large | 3072 | Slower | Best | High |
| voyage-2 | 1024 | Fast | Excellent | Medium |

Choose based on your latency and quality requirements. For high-throughput workloads, prefer the smaller models: fewer dimensions also mean less storage and faster similarity search.
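
Dimensions also drive storage. As a rough worked example, assuming embeddings are stored as 32-bit floats (4 bytes per dimension) and ignoring index and row overhead, the raw size per million vectors can be estimated as follows. The helper name `rawVectorBytes` is hypothetical.

```javascript
// Sketch: raw storage for float32 embeddings (4 bytes per dimension), excluding index overhead.
const MODEL_DIMENSIONS = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
  'voyage-2': 1024,
};

function rawVectorBytes(model, vectorCount) {
  const dims = MODEL_DIMENSIONS[model];
  if (dims === undefined) throw new Error(`unknown model: ${model}`);
  return dims * 4 * vectorCount;
}

for (const model of Object.keys(MODEL_DIMENSIONS)) {
  const gb = rawVectorBytes(model, 1_000_000) / 1e9;
  console.log(`${model}: ${gb.toFixed(2)} GB per million vectors`);
}
// text-embedding-3-small: 6.14 GB per million vectors
// text-embedding-3-large: 12.29 GB per million vectors
// voyage-2: 4.10 GB per million vectors
```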

Caching Frequent Queries

Cache embedding and retrieval results for repeated queries:

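// Cache by query hash (assumes an ioredis client `redis` and an `agent` exposing query())
const { createHash } = require('crypto');

async function cachedQuery(query) {
  const cacheKey = `rag:${createHash('sha256').update(query).digest('hex')}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached); // hit: values are stored as JSON strings
  const result = await agent.query(query);
  await redis.setex(cacheKey, 300, JSON.stringify(result)); // 5 min TTL
  return result;
}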
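
Hit rates usually improve if queries are normalized before hashing, so that differences in case or spacing do not create separate cache entries. This is a minimal sketch under that assumption; `ragCacheKey` is a hypothetical helper, and lowercasing may be wrong for workloads where case carries meaning.

```javascript
const { createHash } = require('crypto');

// Sketch: normalize before hashing so trivially different spellings share one cache entry.
function ragCacheKey(query) {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, ' ');
  return `rag:${createHash('sha256').update(normalized).digest('hex')}`;
}

console.log(ragCacheKey('What is  NFYio?') === ragCacheKey('  what is nfyio? ')); // true
```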

Database Optimization

Connection Pooling

Use PgBouncer or similar for connection pooling:

# docker-compose - PgBouncer
pgbouncer:
  image: pgbouncer/pgbouncer
  environment:
    - DATABASES_HOST=postgres
    - DATABASES_PORT=5432
    - PGBOUNCER_POOL_MODE=transaction
    - PGBOUNCER_MAX_CLIENT_CONN=1000
    - PGBOUNCER_DEFAULT_POOL_SIZE=25

| Setting | Recommended |
|---|---|
| Pool mode | transaction |
| Pool size | 20–50 per service |
| Max client conn | 2–3× pool size |
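
Pool sizes add up across services, and the total has to fit under PostgreSQL's `max_connections` (100 by default). The sketch below checks that budget. The function name `poolBudget`, its parameters, and the reserve of 10 connections for admin sessions are illustrative assumptions, and it assumes each service maps to its own pool.

```javascript
// Sketch: check that pooled server connections fit under PostgreSQL's max_connections.
function poolBudget({ services, poolSizePerService, maxConnections, reserved = 10 }) {
  const serverConnections = services * poolSizePerService;
  const available = maxConnections - reserved; // headroom for admin and maintenance sessions
  return {
    serverConnections,
    available,
    fits: serverConnections <= available,
  };
}

console.log(poolBudget({ services: 3, poolSizePerService: 25, maxConnections: 100 }));
// { serverConnections: 75, available: 90, fits: true }
```

If `fits` is false, either lower the per-service pool size or raise `max_connections` on the database.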

Indexing

Add indexes for common query patterns:

-- Bucket lookups by owner
CREATE INDEX idx_buckets_owner ON buckets(owner_id);

-- Object listing by prefix
CREATE INDEX idx_objects_bucket_prefix ON objects(bucket_id, name text_pattern_ops);

-- pgvector similarity search
CREATE INDEX idx_embeddings_vector ON embeddings 
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
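
The `lists` value should scale with table size. The pgvector documentation suggests starting with rows / 1000 for up to 1M rows and sqrt(rows) above that, and with sqrt(lists) probes at query time. The helper below is a sketch of that rule of thumb; the name `ivfflatStartingParams` is hypothetical, and the results are starting points to benchmark, not tuned values.

```javascript
// Sketch: starting values for ivfflat, following pgvector's rule of thumb.
function ivfflatStartingParams(rowCount) {
  const lists = rowCount <= 1_000_000
    ? Math.max(1, Math.round(rowCount / 1000))
    : Math.round(Math.sqrt(rowCount));
  // More probes improve recall at the cost of query speed.
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

console.log(ivfflatStartingParams(100_000)); // { lists: 100, probes: 10 }
```

So `lists = 100` suits a table of about 100,000 embeddings. Probes are set per session in PostgreSQL with `SET ivfflat.probes = 10;`.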

Query Optimization

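  • Run EXPLAIN ANALYZE on slow queries to see the actual plan and timings (it executes the query, so take care with writes)
  • Avoid N+1 queries: load related rows in one batch with IN or a join
  • Paginate large result sets with LIMIT; prefer cursor-based (keyset) pagination over OFFSET for deep pages, because OFFSET still scans the rows it skips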
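
Cursor-based pagination can be sketched as a small query builder. This is an illustration, not NFYio's API: `buildObjectPageQuery` is a hypothetical helper, the `objects(bucket_id, name)` columns are taken from the index example, and it assumes a btree index on those columns that supports ordinary comparisons. The `{ text, values }` shape matches what node-postgres accepts in `client.query`.

```javascript
// Sketch: keyset (cursor-based) pagination over objects in one bucket.
// The cursor is the last name seen on the previous page.
function buildObjectPageQuery({ bucketId, afterName = null, limit = 100 }) {
  const values = [bucketId];
  let where = 'bucket_id = $1';
  if (afterName !== null) {
    values.push(afterName);
    where += ` AND name > $${values.length}`;
  }
  values.push(limit);
  const text =
    `SELECT name FROM objects WHERE ${where} ` +
    `ORDER BY name LIMIT $${values.length}`;
  return { text, values };
}

console.log(buildObjectPageQuery({ bucketId: 42, afterName: 'logs/2024-01.gz', limit: 50 }).text);
// SELECT name FROM objects WHERE bucket_id = $1 AND name > $2 ORDER BY name LIMIT $3
```

Each page costs about the same regardless of depth, because the database seeks directly to the cursor instead of counting past skipped rows.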

Network Optimization

VPC Endpoint Routing

Route traffic through VPC endpoints to keep it off the public internet:

Client → VPC Endpoint → NFYio (private)
         (no egress to internet)

This reduces latency and egress costs.

Minimize Egress

StrategyBenefit
Same-region clientsLower latency, no cross-region egress
VPC peeringPrivate traffic, no egress charges
CDN at edgeReduce origin egress

Compression

Request gzip-compressed API responses where the server supports them:

# --compressed sends Accept-Encoding and decompresses the response body
curl --compressed https://api.yourdomain.com/v1/buckets
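
To get a feel for the savings, the sketch below gzips a repetitive JSON payload with Node's built-in `zlib`. The payload is made-up sample data shaped like a bucket listing, not real NFYio output, and actual ratios depend on the response content.

```javascript
const zlib = require('zlib');

// Sketch: measure how much gzip shrinks a repetitive JSON payload (hypothetical sample data).
const payload = JSON.stringify(
  Array.from({ length: 500 }, (_, i) => ({
    name: `bucket-${i}`,
    region: 'us-east-1',
    versioning: false,
  })),
);
const raw = Buffer.from(payload, 'utf8');
const compressed = zlib.gzipSync(raw);

console.log(`raw=${raw.length} bytes, gzip=${compressed.length} bytes`);

// Round-trip check: decompressing must give back the original bytes.
console.log(zlib.gunzipSync(compressed).equals(raw)); // true
```

JSON listings with repeated keys tend to compress well, which is why compression pays off most on large list responses.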

Performance Checklist

  • Multipart uploads for objects > 100 MB
  • Connection pooling for S3 and DB
  • CDN for public objects
  • Chunk size tuned for RAG use case
  • Embedding model selected for cost/quality
  • Query result caching where appropriate
  • Database indexes on hot paths
  • VPC endpoints for private traffic
