Cost Optimization

Storage lifecycle, compute sizing, AI cost control, and network cost reduction for NFYio.

This guide walks through each layer of an NFYio deployment, with configuration examples for reducing storage, compute, AI, and network spend.

Storage Cost Optimization

Lifecycle Rules

Automatically transition or expire objects to reduce storage costs:

{
  "rules": [
    {
      "id": "archive-old-logs",
      "status": "enabled",
      "filter": { "prefix": "logs/" },
      "transitions": [
        { "days": 30, "storage_class": "STANDARD_IA" },
        { "days": 90, "storage_class": "GLACIER" }
      ],
      "expiration": { "days": 365 }
    }
  ]
}

| Transition | Use Case |
|---|---|
| STANDARD → STANDARD_IA | After 30 days, infrequently accessed |
| STANDARD_IA → GLACIER | After 90 days, archive |
| Expiration | Delete after retention period |
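The transition table maps directly onto the lifecycle JSON above. As a rough sketch of the evaluation logic (the rule shape is copied from the example; `classForAge` is an illustrative name, not an NFYio API):

```javascript
// Sketch: evaluate the lifecycle rule above for an object of a given age.
// The rule object mirrors the JSON example; classForAge is illustrative.
const rule = {
  transitions: [
    { days: 30, storage_class: 'STANDARD_IA' },
    { days: 90, storage_class: 'GLACIER' },
  ],
  expiration: { days: 365 },
};

function classForAge(rule, ageDays) {
  if (ageDays >= rule.expiration.days) return 'EXPIRED';
  let cls = 'STANDARD';
  for (const t of rule.transitions) {
    if (ageDays >= t.days) cls = t.storage_class; // latest matching transition wins
  }
  return cls;
}

console.log(classForAge(rule, 45));  // STANDARD_IA
console.log(classForAge(rule, 400)); // EXPIRED
```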

Storage Classes

Choose the right storage class for each workload:

| Class | Cost | Access | Use Case |
|---|---|---|---|
| STANDARD | Highest | Instant | Hot data, active workloads |
| STANDARD_IA | Medium | Instant | Infrequent access |
| GLACIER | Lowest | Hours | Archives, compliance |

# Upload directly to STANDARD_IA
aws s3 cp backup.tar.gz s3://my-bucket/ \
  --storage-class STANDARD_IA

Data Tiering

Tier data by access pattern:

| Tier | Data Type | Storage Class |
|---|---|---|
| Hot | Active app data, recent uploads | STANDARD |
| Warm | Logs, backups 30–90 days old | STANDARD_IA |
| Cold | Archives, compliance | GLACIER |

Compute Cost Optimization

Right-Size Containers

Avoid over-provisioning. Start with minimum viable resources and scale up:

# docker-compose - resource limits
nfyio-gateway:
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 2G
      reservations:
        cpus: '0.5'
        memory: 512M

| Service | Min (dev) | Recommended (prod) |
|---|---|---|
| Gateway | 0.5 CPU, 512M | 2 CPU, 2G |
| Storage proxy | 0.5 CPU, 512M | 1 CPU, 1G |
| Agent | 1 CPU, 2G | 2 CPU, 4G |

Auto-Scaling

Scale based on load to avoid paying for idle capacity:

# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nfyio-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nfyio-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

AI Cost Optimization

Model Selection

Balance cost vs. quality when choosing models:

| Task | Low Cost | Balanced | High Quality |
|---|---|---|---|
| Embeddings | text-embedding-3-small | voyage-2 | text-embedding-3-large |
| Chat/Completion | gpt-4o-mini | gpt-4o | gpt-4-turbo |

# Use smaller model for embeddings
EMBEDDING_MODEL=text-embedding-3-small
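If you select models in application code rather than via environment variables, the table above can be expressed as a lookup. The model names come from the table; the tier keys and the `balanced` default are illustrative assumptions:

```javascript
// Sketch: choose a model per task and cost tier, mirroring the table above.
const MODELS = {
  embeddings: {
    low: 'text-embedding-3-small',
    balanced: 'voyage-2',
    high: 'text-embedding-3-large',
  },
  chat: {
    low: 'gpt-4o-mini',
    balanced: 'gpt-4o',
    high: 'gpt-4-turbo',
  },
};

function pickModel(task, tier = 'balanced') {
  const model = MODELS[task] && MODELS[task][tier];
  if (!model) throw new Error(`no model for task=${task} tier=${tier}`);
  return model;
}

console.log(pickModel('embeddings', 'low')); // text-embedding-3-small
console.log(pickModel('chat'));              // gpt-4o
```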

Token Usage Optimization

  • Chunk size: Larger chunks mean fewer embedding calls and lower cost, but can reduce retrieval precision
  • Context window: Limit the context sent to the LLM; summarize when possible
  • Caching: Cache embeddings and the results of frequent queries
  • Batch processing: Batch embedding requests to reduce per-request API overhead

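The chunk-size point is simple arithmetic: for a fixed corpus, the number of embedding calls scales inversely with chunk size (token counts below are illustrative):

```javascript
// Sketch: embedding call count as a function of chunk size.
function embeddingCalls(totalTokens, chunkTokens) {
  return Math.ceil(totalTokens / chunkTokens);
}

console.log(embeddingCalls(100000, 500));  // 200 calls
console.log(embeddingCalls(100000, 1000)); // 100 calls
```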
// Reduce tokens: summarize before sending to LLM
const summary = await summarize(chunks); // Shorter context
const response = await llm.chat([{ role: 'user', content: summary }]);

Embedding Reuse

Re-embed only when documents change. Use content hashes to detect changes:

// Skip re-embedding when the content hash is unchanged
const { createHash } = require('crypto');

const contentHash = createHash('sha256').update(documentContent).digest('hex');
if (await db.getEmbeddingHash(docId) === contentHash) {
  return; // Content unchanged: skip re-embedding
}

Network Cost Optimization

Minimize Inter-Region Transfer

Keep data and compute in the same region. Cross-region transfer is typically more expensive.

| Traffic | Cost |
|---|---|
| Same region | Low / free |
| Cross-region | Higher |
| Internet egress | Highest |

VPC Peering vs. Endpoints

| Option | Use Case | Cost |
|---|---|---|
| VPC Peering | Connect multiple VPCs | No egress for peered traffic |
| VPC Endpoints | Access NFYio privately | Endpoint hourly + no egress |
| Public internet | Dev/testing | Egress charges |

Use VPC Peering when connecting your VPC to NFYio’s VPC. Use endpoints for single-VPC private access.

CDN for Public Content

Serve static/public objects via CDN to reduce origin egress:

User → CDN (edge) → Cache HIT  → No origin request
                  → Cache MISS → Origin (NFYio) → One-time egress
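The flow above is the classic cache-aside pattern. A minimal sketch, with a Map standing in for the edge cache and `fetchFromOrigin` as a hypothetical origin request:

```javascript
// Sketch: CDN edge as cache-aside. A Map stands in for the edge cache;
// fetchFromOrigin represents the origin request that incurs egress.
const edgeCache = new Map();
let originRequests = 0;

async function serve(path, fetchFromOrigin) {
  if (edgeCache.has(path)) return edgeCache.get(path); // HIT: no origin egress
  const body = await fetchFromOrigin(path); // MISS: one-time egress
  originRequests += 1;
  edgeCache.set(path, body);
  return body;
}

// Usage: the second request for the same path never reaches the origin.
(async () => {
  const origin = async (path) => `bytes of ${path}`;
  await serve('/logo.png', origin);
  await serve('/logo.png', origin); // served from edge
  console.log(originRequests); // 1
})();
```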

Cost Optimization Checklist

  • Lifecycle rules for old data
  • Storage classes matched to access patterns
  • Container resources right-sized
  • Auto-scaling configured
  • Embedding model chosen for cost/quality
  • Token usage optimized (chunk size, caching)
  • Same-region deployment
  • VPC peering/endpoints for private traffic

Next Steps