Embeddings & Vector Search

This page covers what embeddings are, the supported models (OpenAI, Voyage AI), auto-indexing on object upload, pgvector similarity search, chunk strategies, re-indexing, and usage metering.

Embeddings are dense vector representations of text. NFYio uses them to power semantic search: similar texts have similar vectors, so you can find relevant documents by vector similarity instead of keyword matching. Embeddings are stored in pgvector and support auto-indexing, configurable chunk strategies, and usage metering.

What are Embeddings?

An embedding is a fixed-size vector of numbers that captures semantic meaning. For example:

  • “refund policy” and “return policy” → similar vectors (close in vector space)
  • “refund policy” and “weather forecast” → dissimilar vectors (far apart)

Models like OpenAI’s text-embedding-3-small convert text into these vectors. NFYio stores them in PostgreSQL with the pgvector extension for fast similarity search.
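The "close in vector space" idea above is ordinary cosine similarity. A minimal sketch in Python (the toy 3-dimensional vectors are illustrative only; real model output has 1536+ dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedded phrases.
refund = [0.9, 0.1, 0.0]    # "refund policy"
returns = [0.85, 0.15, 0.05]  # "return policy"
weather = [0.0, 0.2, 0.95]  # "weather forecast"

# The related phrases score closer together than the unrelated pair.
print(cosine_similarity(refund, returns) > cosine_similarity(refund, weather))  # True
```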

Supported Models

Model                   Provider   Dimensions  Max Tokens  Use Case
text-embedding-3-small  OpenAI     1536        8191        Fast, cost-effective
text-embedding-3-large  OpenAI     3072        8191        Higher quality
voyage-3.5-lite         Voyage AI  1024        16000       Long documents

Configuring the Embedding Model

{
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  }
}

For text-embedding-3-large, you can optionally reduce dimensions (e.g., 256, 1024) for smaller indexes and faster search.
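For example, a config selecting the large model with dimensions reduced from the native 3072 to 1024 (a sketch; values follow the supported-models table above):

```json
{
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 1024
  }
}
```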

Auto-Indexing on Object Upload

When you upload objects to a configured bucket, NFYio automatically:

  1. Detects new or updated objects (via S3 events or polling)
  2. Loads the document (PDF, DOCX, TXT, Markdown, images with OCR)
  3. Chunks the content according to your chunk strategy
  4. Embeds each chunk with the configured model
  5. Stores embeddings in pgvector with metadata (bucket, key, chunk index)
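Steps 3–4 above can be sketched as a fixed-size chunker with overlap. This sketch splits on whitespace tokens for brevity; a production indexer would count model tokens (e.g., with a tokenizer library) instead:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Adjacent chunks share `overlap` tokens so context isn't cut
    at a chunk boundary.
    """
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 1000-token document with the default 512/64 settings yields
# three chunks; each pair of neighbors shares 64 tokens.
doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=512, overlap=64)
```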

Enabling Auto-Indexing

{
  "bucket": "my-docs",
  "prefix": "knowledge/",
  "autoIndex": true,
  "embedding": {
    "model": "text-embedding-3-small",
    "chunkSize": 512,
    "chunkOverlap": 64
  }
}

Only objects under the specified prefix are indexed. Use prefix: "" to index the entire bucket.

pgvector Similarity Search

NFYio uses pgvector for vector storage and search. Supported distance metrics:

Metric          Operator  Use Case
Cosine          <=>       Default, normalized vectors
L2 (Euclidean)  <->       When magnitude matters
Inner product   <#>       Pre-normalized vectors

Query Example

-- Cosine similarity (NFYio default)
SELECT chunk_id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM document_embeddings
WHERE workspace_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 5;

Indexing for Scale

For large corpora, create an HNSW or IVFFlat index:

CREATE INDEX ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

NFYio creates these indexes automatically when you configure a bucket for embedding.
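If you manage indexes yourself (e.g., a self-hosted pgvector instance), the IVFFlat alternative mentioned above looks like this; the list count is a tuning assumption, with rows / 1000 a common starting point:

```sql
CREATE INDEX ON document_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

HNSW generally gives better recall at higher build cost; IVFFlat builds faster and suits corpora that are re-indexed often.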

Chunk Strategies

How you chunk documents affects retrieval quality:

Fixed Token Chunks

Strategy  Chunk Size  Overlap  Best For
Small     256         32       Fine-grained retrieval, FAQs
Medium    512         64       General purpose (default)
Large     1024        128      Long-form context, narratives

Semantic Chunking (Experimental)

Split on sentence or paragraph boundaries instead of fixed tokens. Preserves logical units and can improve retrieval for structured documents.

{
  "chunkStrategy": "semantic",
  "splitOn": "paragraph",
  "minChunkSize": 200,
  "maxChunkSize": 512
}
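A paragraph-boundary splitter along these lines might look like the following. This is a sketch of the strategy, not NFYio's implementation, and it counts characters rather than tokens for simplicity:

```python
def semantic_chunks(text: str, min_size: int = 200, max_size: int = 512) -> list[str]:
    """Split on paragraph boundaries, merging consecutive paragraphs
    until a chunk would exceed max_size characters.

    Oversized single paragraphs pass through unsplit; a fuller
    implementation would fall back to sentence splitting.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para).strip() if current else para
        if len(candidate) <= max_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```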

Overlap

Overlap between chunks prevents important context from being split across a boundary. Typical overlap is 10–20% of the chunk size; the default 64-of-512 works out to 12.5%.

Re-indexing

Re-index when you:

  • Change the embedding model
  • Change chunk size or strategy
  • Fix corrupted or missing embeddings
  • Add new document types

Trigger Re-index via API

curl -X POST "https://api.yourdomain.com/v1/buckets/my-docs/reindex" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prefix": "knowledge/",
    "full": true
  }'

Option       Description
full: true   Re-embed all documents (destructive)
full: false  Only process new/updated objects since last index
prefix       Limit to objects under this prefix

Incremental Updates

By default, NFYio performs incremental indexing: only new or modified objects are processed. Deleted objects have their embeddings removed.

Analytics and Usage Metering

NFYio tracks embedding usage for billing and analytics:

Metric              Description
embedding_tokens    Total tokens embedded
embedding_requests  Number of embedding API calls
search_queries      Number of similarity searches
indexed_documents   Documents in the vector store
indexed_chunks      Total chunks stored

Usage API

curl "https://api.yourdomain.com/v1/usage/embeddings?workspaceId=ws_123&from=2026-03-01&to=2026-03-31" \
  -H "Authorization: Bearer $TOKEN"

Example response:

{
  "workspaceId": "ws_123",
  "period": "2026-03-01 to 2026-03-31",
  "embeddingTokens": 1250000,
  "embeddingRequests": 4200,
  "searchQueries": 8500,
  "indexedDocuments": 1200,
  "indexedChunks": 45000
}

Best Practices

Chunk Size

  • Start with 512 tokens and 64 overlap
  • Use smaller chunks (256) for precise retrieval; larger (1024) for narrative context

Model Selection

  • text-embedding-3-small for most use cases
  • text-embedding-3-large when quality is critical
  • voyage-3.5-lite for very long documents (16K context)

Index Maintenance

  • Run incremental re-index regularly if documents change often
  • Monitor indexed_chunks growth; consider archiving old documents

Similarity Threshold

  • Filter low-similarity results (e.g., similarity < 0.7) to reduce noise
  • Tune per use case; support chatbots may need lower thresholds than strict Q&A
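Threshold filtering can be applied client-side on top of the SQL query's `1 - distance` similarity score. A sketch (the result rows are illustrative):

```python
THRESHOLD = 0.7  # tune per use case

# (chunk_id, content, similarity) rows as returned by the search query.
rows = [
    ("chunk_1", "Our refund window is 30 days.", 0.86),
    ("chunk_2", "Shipping times vary by region.", 0.64),
    ("chunk_3", "Refunds are issued to the original card.", 0.79),
]

# Keep only rows at or above the threshold; drops the off-topic chunk.
relevant = [(cid, text) for cid, text, sim in rows if sim >= THRESHOLD]
```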

Next Steps