Cost Optimization

Storage lifecycle, compute sizing, AI cost control, and network cost reduction for NFYio.

This guide walks through each layer of an NFYio deployment, with configuration examples for reducing storage, compute, AI, and network spend.

Storage Cost Optimization

Lifecycle Rules

Automatically transition or expire objects to reduce storage costs:

{
  "rules": [
    {
      "id": "archive-old-logs",
      "status": "enabled",
      "filter": { "prefix": "logs/" },
      "transitions": [
        { "days": 30, "storage_class": "STANDARD_IA" },
        { "days": 90, "storage_class": "GLACIER" }
      ],
      "expiration": { "days": 365 }
    }
  ]
}

| Transition | Use Case |
|---|---|
| STANDARD → STANDARD_IA | After 30 days, infrequently accessed |
| STANDARD_IA → GLACIER | After 90 days, archive |
| Expiration | Delete after retention period |
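The transition table maps directly onto the lifecycle JSON above. As a rough sketch of the evaluation logic (the rule shape is copied from the example; `classForAge` is an illustrative name, not an NFYio API):

```javascript
// Sketch: evaluate the lifecycle rule above for an object of a given age.
// The rule object mirrors the JSON example; classForAge is illustrative.
const rule = {
  transitions: [
    { days: 30, storage_class: 'STANDARD_IA' },
    { days: 90, storage_class: 'GLACIER' },
  ],
  expiration: { days: 365 },
};

function classForAge(rule, ageDays) {
  if (ageDays >= rule.expiration.days) return 'EXPIRED';
  let cls = 'STANDARD';
  for (const t of rule.transitions) {
    if (ageDays >= t.days) cls = t.storage_class; // latest matching transition wins
  }
  return cls;
}

console.log(classForAge(rule, 45));  // STANDARD_IA
console.log(classForAge(rule, 400)); // EXPIRED
```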

Storage Classes

Choose the right storage class for each workload:

| Class | Cost | Access | Use Case |
|---|---|---|---|
| STANDARD | Highest | Instant | Hot data, active workloads |
| STANDARD_IA | Medium | Instant | Infrequent access |
| GLACIER | Lowest | Hours | Archives, compliance |

# Upload directly to STANDARD_IA
aws s3 cp backup.tar.gz s3://my-bucket/ \
  --storage-class STANDARD_IA

Data Tiering

Tier data by access pattern:

| Tier | Data Type | Storage Class |
|---|---|---|
| Hot | Active app data, recent uploads | STANDARD |
| Warm | Logs, backups 30–90 days old | STANDARD_IA |
| Cold | Archives, compliance | GLACIER |

Compute Cost Optimization

Right-Size Containers

Avoid over-provisioning. Start with minimum viable resources and scale up:

# docker-compose - resource limits
nfyio-gateway:
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 2G
      reservations:
        cpus: '0.5'
        memory: 512M

| Service | Min (dev) | Recommended (prod) |
|---|---|---|
| Gateway | 0.5 CPU, 512M | 2 CPU, 2G |
| Storage proxy | 0.5 CPU, 512M | 1 CPU, 1G |
| Agent | 1 CPU, 2G | 2 CPU, 4G |

Auto-Scaling

Scale based on load to avoid paying for idle capacity:

# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nfyio-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nfyio-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

AI Cost Optimization

Model Selection

Balance cost vs. quality when choosing models:

| Task | Low Cost | Balanced | High Quality |
|---|---|---|---|
| Embeddings | text-embedding-3-small | voyage-2 | text-embedding-3-large |
| Chat/Completion | gpt-4o-mini | gpt-4o | gpt-4-turbo |

# Use smaller model for embeddings
EMBEDDING_MODEL=text-embedding-3-small
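If you select models in application code rather than via environment variables, the table above can be expressed as a lookup. The model names come from the table; the tier keys and the `balanced` default are illustrative assumptions:

```javascript
// Sketch: choose a model per task and cost tier, mirroring the table above.
const MODELS = {
  embeddings: {
    low: 'text-embedding-3-small',
    balanced: 'voyage-2',
    high: 'text-embedding-3-large',
  },
  chat: {
    low: 'gpt-4o-mini',
    balanced: 'gpt-4o',
    high: 'gpt-4-turbo',
  },
};

function pickModel(task, tier = 'balanced') {
  const model = MODELS[task] && MODELS[task][tier];
  if (!model) throw new Error(`no model for task=${task} tier=${tier}`);
  return model;
}

console.log(pickModel('embeddings', 'low')); // text-embedding-3-small
console.log(pickModel('chat'));              // gpt-4o
```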

Token Usage Optimization

  • Chunk size: Larger chunks mean fewer embedding calls and lower cost, but can reduce retrieval precision
  • Context window: Limit the context sent to the LLM; summarize when possible
  • Caching: Cache embeddings and the results of frequent queries
  • Batch processing: Batch embedding requests to reduce per-request API overhead

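The chunk-size point is simple arithmetic: for a fixed corpus, the number of embedding calls scales inversely with chunk size (token counts below are illustrative):

```javascript
// Sketch: embedding call count as a function of chunk size.
function embeddingCalls(totalTokens, chunkTokens) {
  return Math.ceil(totalTokens / chunkTokens);
}

console.log(embeddingCalls(100000, 500));  // 200 calls
console.log(embeddingCalls(100000, 1000)); // 100 calls
```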
// Reduce tokens: summarize before sending to LLM
const summary = await summarize(chunks); // Shorter context
const response = await llm.chat([{ role: 'user', content: summary }]);

Embedding Reuse

Re-embed only when documents change. Use content hashes to detect changes:

// Skip re-embedding when the content hash is unchanged
const { createHash } = require('crypto');

const contentHash = createHash('sha256').update(documentContent).digest('hex');
if (await db.getEmbeddingHash(docId) === contentHash) {
  return; // Content unchanged: skip re-embedding
}

Network Cost Optimization

Minimize Inter-Region Transfer

Keep data and compute in the same region. Cross-region transfer is typically more expensive.

| Traffic | Cost |
|---|---|
| Same region | Low / free |
| Cross-region | Higher |
| Internet egress | Highest |

VPC Peering vs. Endpoints

| Option | Use Case | Cost |
|---|---|---|
| VPC Peering | Connect multiple VPCs | No egress for peered traffic |
| VPC Endpoints | Access NFYio privately | Endpoint hourly + no egress |
| Public internet | Dev/testing | Egress charges |

Use VPC Peering when connecting your VPC to NFYio’s VPC. Use endpoints for single-VPC private access.

CDN for Public Content

Serve static/public objects via CDN to reduce origin egress:

User → CDN (edge) → Cache HIT  → No origin request
                  → Cache MISS → Origin (NFYio) → One-time egress
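The flow above is the classic cache-aside pattern. A minimal sketch, with a Map standing in for the edge cache and `fetchFromOrigin` as a hypothetical origin request:

```javascript
// Sketch: CDN edge as cache-aside. A Map stands in for the edge cache;
// fetchFromOrigin represents the origin request that incurs egress.
const edgeCache = new Map();
let originRequests = 0;

async function serve(path, fetchFromOrigin) {
  if (edgeCache.has(path)) return edgeCache.get(path); // HIT: no origin egress
  const body = await fetchFromOrigin(path); // MISS: one-time egress
  originRequests += 1;
  edgeCache.set(path, body);
  return body;
}

// Usage: the second request for the same path never reaches the origin.
(async () => {
  const origin = async (path) => `bytes of ${path}`;
  await serve('/logo.png', origin);
  await serve('/logo.png', origin); // served from edge
  console.log(originRequests); // 1
})();
```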

Cost Optimization Checklist

  • Lifecycle rules for old data
  • Storage classes matched to access patterns
  • Container resources right-sized
  • Auto-scaling configured
  • Embedding model chosen for cost/quality
  • Token usage optimized (chunk size, caching)
  • Same-region deployment
  • VPC peering/endpoints for private traffic

Next Steps