Core Concepts

This document introduces the key concepts in NFYio: storage primitives, AI agents, embeddings, networking, and access control.

Buckets

A bucket is a top-level container for objects. It is the S3-compatible equivalent of a folder or namespace.

Bucket names must be unique across your NFYio deployment
You can create buckets via the API, AWS CLI, or dashboard
Buckets support versioning, lifecycle policies, and access control

# Create a bucket
aws --endpoint-url http://localhost:7007 s3 mb s3://my-documents

# List buckets
aws --endpoint-url http://localhost:7007 s3 ls

Objects

An object is a file stored in a bucket. Each object has:

Key — Path-like identifier (e.g., documents/report.pdf)
Size — Size in bytes
Content-Type — MIME type
Metadata — Custom key-value pairs
Version ID — If versioning is enabled

Object data is stored in SeaweedFS; object metadata is stored in PostgreSQL.

# Upload an object
aws --endpoint-url http://localhost:7007 s3 cp report.pdf s3://my-documents/reports/

# Download an object
aws --endpoint-url http://localhost:7007 s3 cp s3://my-documents/reports/report.pdf ./

Access Keys

Access keys are credentials for programmatic access to NFYio storage. Each key has:

Access Key ID — Public identifier
Secret Access Key — Private secret (store securely)

Use them with AWS SDKs or the aws CLI by configuring the endpoint and credentials. Each key is scoped to a workspace or project, and requests made with it are subject to that scope's role-based access control (RBAC).

# Configure AWS CLI for NFYio
aws configure set aws_access_key_id YOUR_ACCESS_KEY
aws configure set aws_secret_access_key YOUR_SECRET_KEY
aws configure set default.region us-east-1

# Use with endpoint
aws --endpoint-url http://localhost:7007 s3 ls
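
The same settings carry over to an AWS SDK. The following is a minimal sketch, assuming boto3 is installed and the endpoint is the http://localhost:7007 address used in the CLI examples; the helper names nfyio_client_kwargs and make_client are hypothetical and not part of NFYio.

```python
from typing import Any, Dict


def nfyio_client_kwargs(
    access_key: str,
    secret_key: str,
    endpoint_url: str = "http://localhost:7007",  # endpoint from the CLI examples
    region: str = "us-east-1",
) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client() (hypothetical helper)."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }


def make_client(access_key: str, secret_key: str):
    """Create an S3 client pointed at NFYio; requires `pip install boto3`."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client(**nfyio_client_kwargs(access_key, secret_key))
```

Assuming the deployment's S3 compatibility covers the call, make_client(key, secret).list_buckets() should return the same buckets as the aws s3 ls command above.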

Agents

Agents are AI-powered components that process data and respond to queries.

RAG Agent

Retrieval-Augmented Generation — Ingest documents, generate embeddings, and answer questions using your data.

Documents are chunked and embedded
Queries are embedded and matched via vector similarity
Relevant chunks are passed to the LLM as context
The LLM generates answers grounded in your corpus
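
The four steps above can be sketched in a few lines of Python. This is an illustration only: the embed function is a toy bag-of-words vector standing in for a real embedding model, the chunker is deliberately naive, and all function names are hypothetical rather than part of the NFYio API.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def chunk(text: str, max_words: int = 8) -> List[str]:
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Vector:
    """Toy embedding: unit-length bag-of-words vector, standing in for a real model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; both vectors are already unit length."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def build_index(documents: List[str]) -> List[Tuple[str, Vector]]:
    """Step 1: chunk every document and embed each chunk."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(query: str, index: List[Tuple[str, Vector]], k: int = 1) -> List[str]:
    """Step 2: embed the query and rank chunks by similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, context: List[str]) -> str:
    """Step 3: hand the retrieved chunks to the LLM as context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


docs = [
    "Invoices are due within thirty days of receipt.",
    "The office cat is named Biscuit and sleeps all day.",
]
index = build_index(docs)
question = "When are invoices due?"
prompt = build_prompt(question, retrieve(question, index))
print(prompt)  # step 4 would send this prompt to the LLM
```

Because the answer is generated from the retrieved chunks rather than from the model's general knowledge, retrieval quality largely determines answer quality.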

LLM Agent

Large Language Model — Direct interaction with models (e.g., GPT-4o) for chat, summarization, or generation. Can be combined with RAG for knowledge-grounded responses.

Workflow Agent

Multi-step workflows — Chain multiple steps (retrieve → reason → act) using LangChain. Supports tools, policy checks, and branching logic.
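
To make the retrieve → reason → act pattern concrete, here is a plain-Python sketch of a step chain with a policy check and a branch. It intentionally does not use LangChain; the step functions, the refund scenario, and the state keys are all hypothetical stand-ins for real retrieval, LLM, and tool calls.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]

ALLOWED_ACTIONS = {"refund", "escalate"}  # hypothetical policy


def retrieve(state: State) -> State:
    # Stand-in for a vector search: returns canned context.
    return {**state, "context": ["Refunds over 100 USD need manager approval."]}


def reason(state: State) -> State:
    # Stand-in for an LLM call: choose an action from the request.
    action = "escalate" if state["amount"] > 100 else "refund"
    return {**state, "action": action}


def policy_check(state: State) -> State:
    # Stop the workflow before acting if the chosen action is not permitted.
    if state["action"] not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {state['action']!r} is not allowed")
    return state


def act(state: State) -> State:
    # Branch on the decided action; each handler stands in for a tool call.
    handlers: Dict[str, Callable[[State], str]] = {
        "refund": lambda s: f"refunded {s['amount']} USD",
        "escalate": lambda s: "sent to a manager for approval",
    }
    return {**state, "result": handlers[state["action"]](state)}


def run_workflow(steps: List[Step], state: State) -> State:
    """Run each step in order, threading the state through the chain."""
    for step in steps:
        state = step(state)
    return state


pipeline = [retrieve, reason, policy_check, act]
print(run_workflow(pipeline, {"amount": 250})["result"])
print(run_workflow(pipeline, {"amount": 20})["result"])
```

Keeping each step a function from state to state makes it straightforward to insert, reorder, or test steps in isolation.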

Embeddings and Vector Search

Embeddings

Embeddings are dense vector representations of text. NFYio uses embedding models from OpenAI or Voyage AI to convert:

Document chunks → vectors stored in pgvector
User queries → vectors used for similarity search

Vector Search

Vector search finds the chunks most similar to a query vector, using cosine similarity or L2 distance in pgvector. This enables semantic search: “find documents about X” without exact keyword matches.

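-- Conceptual: find the 10 chunks closest to a query vector (actual API differs).
-- <=> is pgvector's cosine distance operator; use <-> for L2 distance.
-- query_embedding stands for the embedded user query, e.g. a bound parameter.
SELECT id, content, embedding <=> query_embedding AS distance
FROM document_chunks
ORDER BY distance
LIMIT 10;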
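
The two metrics can rank the same chunks differently, which the following sketch illustrates. It reimplements cosine distance (1 minus cosine similarity, which is what pgvector's <=> operator returns) and L2 distance in plain Python; the chunk IDs and vectors are made up for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: sensitive to vector length as well as direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int,
    metric: Metric = cosine_distance,
) -> List[Tuple[str, float]]:
    """Return the k chunk IDs with the smallest distance to the query."""
    scored = [(chunk_id, metric(query, vec)) for chunk_id, vec in chunks]
    return sorted(scored, key=lambda item: item[1])[:k]


chunks = [
    ("long", [10.0, 10.0]),  # same direction as the query, much larger magnitude
    ("near", [1.0, 0.5]),    # close to the query in space, slightly different direction
]
query = [1.0, 1.0]

print(nearest(query, chunks, k=1))                      # cosine picks "long"
print(nearest(query, chunks, k=1, metric=l2_distance))  # L2 picks "near"
```

When embeddings are normalized to unit length, the two metrics produce the same ordering, so the choice matters mainly for unnormalized vectors.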

VPCs, Subnets, Security Groups

NFYio supports virtual networking for multi-tenant isolation:

VPC (Virtual Private Cloud)

A VPC is an isolated network segment. Resources in a VPC can communicate privately; traffic to/from the internet is controlled.

Subnets

Subnets are subdivisions of a VPC. They define IP ranges and can be public or private.
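
As an illustration of how subnets partition a VPC's address space, the sketch below uses Python's standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size, and the public/private labels are assumptions chosen for the example, not NFYio defaults.

```python
import ipaddress
from itertools import islice
from typing import Dict, Optional

# Hypothetical VPC address range.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the first two /24 blocks out of the VPC range.
public_net, private_net = islice(vpc.subnets(new_prefix=24), 2)
subnets: Dict[str, ipaddress.IPv4Network] = {
    "public": public_net,    # 10.0.0.0/24
    "private": private_net,  # 10.0.1.0/24
}


def subnet_for(ip: str) -> Optional[str]:
    """Return the name of the subnet containing the address, if any."""
    addr = ipaddress.ip_address(ip)
    for name, net in subnets.items():
        if addr in net:
            return name
    return None


print(subnet_for("10.0.1.25"))  # address inside the private range
print(subnet_for("10.0.9.1"))   # inside the VPC but in no defined subnet
```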

Security Groups

Security groups act as firewalls. They define which inbound and outbound traffic is allowed for resources (e.g., agent runtimes, storage access).
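
A security group can be thought of as a list of allow rules evaluated against each connection. The sketch below models inbound rules only and assumes default-deny semantics (traffic passes only when some rule matches), which is how security groups commonly behave but is not confirmed here for NFYio; the Rule fields and example ports are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp" or "udp"
    port_min: int  # first allowed port (inclusive)
    port_max: int  # last allowed port (inclusive)
    cidr: str      # allowed source range


def is_allowed(rules: List[Rule], protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: inbound traffic passes only if at least one rule matches."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port_min <= port <= rule.port_max
        and addr in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


inbound = [
    Rule("tcp", 443, 443, "0.0.0.0/0"),      # HTTPS from anywhere
    Rule("tcp", 5432, 5432, "10.0.1.0/24"),  # PostgreSQL from one subnet only
]

print(is_allowed(inbound, "tcp", 443, "203.0.113.7"))   # matches the HTTPS rule
print(is_allowed(inbound, "tcp", 5432, "203.0.113.7"))  # no rule matches
```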

Workspaces, Projects, Teams

NFYio organizes resources in a hierarchy:

Workspace

A workspace is the top-level tenant. It typically represents an organization or customer. All projects and teams belong to a workspace.

Project

A project groups related resources (buckets, agents, datasets) within a workspace. Use projects to separate environments (e.g., dev, staging, prod) or product lines.

Team

A team is a group of users with shared access to projects. Teams have roles that determine what members can do (read, write, admin).

Workspace (Acme Corp)
├── Project (Document AI)
│   ├── Bucket: documents
│   ├── RAG Agent: doc-qa
│   └── Team: doc-team (read/write)
└── Project (Analytics)
    ├── Bucket: raw-data
    └── Team: analytics-team (read)

Roles and Permissions

Roles

Roles define what a user or team can do:

Role	Typical Permissions
`viewer`	Read buckets, objects, agents; run queries
`editor`	Everything in `viewer`, plus upload and delete objects; create agents
`admin`	Everything in `editor`, plus manage buckets, teams, settings
`owner`	Full control over workspace/project
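
Because each role in the table builds on the previous one, the role set can be modeled as cumulative permission sets. The following sketch shows one way to express that; the permission strings such as object:read and the wildcard used for owner are hypothetical names invented for the example, not NFYio identifiers.

```python
from typing import Dict, FrozenSet

# Each role extends the one before it, mirroring the table above.
VIEWER: FrozenSet[str] = frozenset({"bucket:read", "object:read", "agent:read", "query:run"})
EDITOR: FrozenSet[str] = VIEWER | {"object:upload", "object:delete", "agent:create"}
ADMIN: FrozenSet[str] = EDITOR | {"bucket:manage", "team:manage", "settings:manage"}
OWNER: FrozenSet[str] = ADMIN | {"*"}  # wildcard stands for full control

ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "viewer": VIEWER,
    "editor": EDITOR,
    "admin": ADMIN,
    "owner": OWNER,
}


def can(role: str, permission: str) -> bool:
    """Return True if the role grants the permission; unknown roles grant nothing."""
    granted = ROLE_PERMISSIONS.get(role, frozenset())
    return "*" in granted or permission in granted


print(can("viewer", "object:upload"))  # viewers cannot upload
print(can("editor", "object:upload"))  # editors can
```

Treating unknown roles as having no permissions keeps the check consistent with least privilege.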

Permissions

Permissions are enforced at multiple layers:

Keycloak — Authentication and user attributes
NFYio Gateway — JWT validation, workspace/project membership
Storage Proxy — Bucket and object access checks
Agent Service — Workspace-scoped agent access
PostgreSQL RLS — Row-level security for metadata

Best Practices

Use the principle of least privilege: assign the minimum role needed
Prefer team-based access over individual grants
Rotate access keys periodically
Enable audit logging for sensitive operations