Embeddings & Vector Search

This page covers what embeddings are, the supported models (OpenAI, Voyage AI), auto-indexing on object upload, pgvector similarity search, chunk strategies, re-indexing, and usage metering.

Embeddings are dense vector representations of text. NFYio uses them to power semantic search: similar texts have similar vectors, so you can find relevant documents by vector similarity instead of keyword matching. Embeddings are stored in pgvector and support auto-indexing, configurable chunk strategies, and usage metering.

What are Embeddings?

An embedding is a fixed-size vector of numbers that captures semantic meaning. For example:

  • “refund policy” and “return policy” → similar vectors (close in vector space)
  • “refund policy” and “weather forecast” → dissimilar vectors (far apart)

Models like OpenAI’s text-embedding-3-small convert text into these vectors. NFYio stores them in PostgreSQL with the pgvector extension for fast similarity search.
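The "close in vector space" idea above is ordinary cosine similarity. A minimal sketch in Python (the toy 3-dimensional vectors are illustrative only; real model output has 1536+ dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedded phrases.
refund = [0.9, 0.1, 0.0]    # "refund policy"
returns = [0.85, 0.15, 0.05]  # "return policy"
weather = [0.0, 0.2, 0.95]  # "weather forecast"

# The related phrases score closer together than the unrelated pair.
print(cosine_similarity(refund, returns) > cosine_similarity(refund, weather))  # True
```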

Supported Models

Model                   Provider   Dimensions  Max Tokens  Use Case
text-embedding-3-small  OpenAI     1536        8191        Fast, cost-effective
text-embedding-3-large  OpenAI     3072        8191        Higher quality
voyage-3.5-lite         Voyage AI  1024        16000       Long documents

Configuring the Embedding Model

{
  "embedding": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  }
}

For text-embedding-3-large, you can optionally reduce dimensions (e.g., 256, 1024) for smaller indexes and faster search.
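For example, a config selecting the large model with dimensions reduced from the native 3072 to 1024 (a sketch; values follow the supported-models table above):

```json
{
  "embedding": {
    "model": "text-embedding-3-large",
    "dimensions": 1024
  }
}
```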

Auto-Indexing on Object Upload

When you upload objects to a configured bucket, NFYio automatically:

  1. Detects new or updated objects (via S3 events or polling)
  2. Loads the document (PDF, DOCX, TXT, Markdown, images with OCR)
  3. Chunks the content according to your chunk strategy
  4. Embeds each chunk with the configured model
  5. Stores embeddings in pgvector with metadata (bucket, key, chunk index)
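Steps 3–4 above can be sketched as a fixed-size chunker with overlap. This sketch splits on whitespace tokens for brevity; a production indexer would count model tokens (e.g., with a tokenizer library) instead:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Adjacent chunks share `overlap` tokens so context isn't cut
    at a chunk boundary.
    """
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 1000-token document with the default 512/64 settings yields
# three chunks; each pair of neighbors shares 64 tokens.
doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_text(doc, chunk_size=512, overlap=64)
```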

Enabling Auto-Indexing

{
  "bucket": "my-docs",
  "prefix": "knowledge/",
  "autoIndex": true,
  "embedding": {
    "model": "text-embedding-3-small",
    "chunkSize": 512,
    "chunkOverlap": 64
  }
}

Only objects under the specified prefix are indexed. Use prefix: "" to index the entire bucket.

pgvector Similarity Search

NFYio uses pgvector for vector storage and search. Supported distance metrics:

Metric          Operator  Use Case
Cosine          <=>       Default, normalized vectors
L2 (Euclidean)  <->       When magnitude matters
Inner product   <#>       Pre-normalized vectors

Query Example

-- Cosine similarity (NFYio default)
SELECT chunk_id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM document_embeddings
WHERE workspace_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 5;

Indexing for Scale

For large corpora, create an HNSW or IVFFlat index:

CREATE INDEX ON document_embeddings
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

NFYio creates these indexes automatically when you configure a bucket for embedding.
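If you manage indexes yourself (e.g., a self-hosted pgvector instance), the IVFFlat alternative mentioned above looks like this; the list count is a tuning assumption, with rows / 1000 a common starting point:

```sql
CREATE INDEX ON document_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```

HNSW generally gives better recall at higher build cost; IVFFlat builds faster and suits corpora that are re-indexed often.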

Chunk Strategies

How you chunk documents affects retrieval quality:

Fixed Token Chunks

Strategy  Chunk Size  Overlap  Best For
Small     256         32       Fine-grained retrieval, FAQs
Medium    512         64       General purpose (default)
Large     1024        128      Long-form context, narratives

Semantic Chunking (Experimental)

Split on sentence or paragraph boundaries instead of fixed tokens. Preserves logical units and can improve retrieval for structured documents.

{
  "chunkStrategy": "semantic",
  "splitOn": "paragraph",
  "minChunkSize": 200,
  "maxChunkSize": 512
}
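A paragraph-boundary splitter along these lines might look like the following. This is a sketch of the strategy, not NFYio's implementation, and it counts characters rather than tokens for simplicity:

```python
def semantic_chunks(text: str, min_size: int = 200, max_size: int = 512) -> list[str]:
    """Split on paragraph boundaries, merging consecutive paragraphs
    until a chunk would exceed max_size characters.

    Oversized single paragraphs pass through unsplit; a fuller
    implementation would fall back to sentence splitting.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para).strip() if current else para
        if len(candidate) <= max_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```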

Overlap

Overlap between chunks prevents important context from being split across a boundary. Typical overlap is 10–20% of the chunk size; the default 64-of-512 works out to 12.5%.

Re-indexing

Re-index when you:

  • Change the embedding model
  • Change chunk size or strategy
  • Fix corrupted or missing embeddings
  • Add new document types

Trigger Re-index via API

curl -X POST "https://api.yourdomain.com/v1/buckets/my-docs/reindex" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prefix": "knowledge/",
    "full": true
  }'

Option       Description
full: true   Re-embed all documents (destructive)
full: false  Only process new/updated objects since last index
prefix       Limit to objects under this prefix

Incremental Updates

By default, NFYio performs incremental indexing: only new or modified objects are processed. Deleted objects have their embeddings removed.

Analytics and Usage Metering

NFYio tracks embedding usage for billing and analytics:

Metric              Description
embedding_tokens    Total tokens embedded
embedding_requests  Number of embedding API calls
search_queries      Number of similarity searches
indexed_documents   Documents in the vector store
indexed_chunks      Total chunks stored

Usage API

curl "https://api.yourdomain.com/v1/usage/embeddings?workspaceId=ws_123&from=2026-03-01&to=2026-03-31" \
  -H "Authorization: Bearer $TOKEN"

Example response:

{
  "workspaceId": "ws_123",
  "period": "2026-03-01 to 2026-03-31",
  "embeddingTokens": 1250000,
  "embeddingRequests": 4200,
  "searchQueries": 8500,
  "indexedDocuments": 1200,
  "indexedChunks": 45000
}

Best Practices

Chunk Size

  • Start with 512 tokens and 64 overlap
  • Use smaller chunks (256) for precise retrieval; larger (1024) for narrative context

Model Selection

  • text-embedding-3-small for most use cases
  • text-embedding-3-large when quality is critical
  • voyage-3.5-lite for very long documents (16K context)

Index Maintenance

  • Run incremental re-index regularly if documents change often
  • Monitor indexed_chunks growth; consider archiving old documents

Similarity Threshold

  • Filter low-similarity results (e.g., similarity < 0.7) to reduce noise
  • Tune per use case; support chatbots may need lower thresholds than strict Q&A
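Threshold filtering can be applied client-side on top of the SQL query's `1 - distance` similarity score. A sketch (the result rows are illustrative):

```python
THRESHOLD = 0.7  # tune per use case

# (chunk_id, content, similarity) rows as returned by the search query.
rows = [
    ("chunk_1", "Our refund window is 30 days.", 0.86),
    ("chunk_2", "Shipping times vary by region.", 0.64),
    ("chunk_3", "Refunds are issued to the original card.", 0.79),
]

# Keep only rows at or above the threshold; drops the off-topic chunk.
relevant = [(cid, text) for cid, text, sim in rows if sim >= THRESHOLD]
```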

Next Steps