Source: docs/manual/embedding-workers.md

This page is generated by site/scripts/sync-manual-docs.mjs.

Embedding Workers Guide

embed-worker processes asynchronous embedding requests and writes vectors into the configured vector store.

Source: cmd/embed-worker/*, internal/memory/embedding.go, internal/memory/embed_worker_handler.go, internal/embedding/*, internal/vectorstore/*, internal/config/config_llm.go

Runtime Architecture

embed-worker startup flow:

1. Load config and require CRUVERO_EVENTS_BACKEND=nats.
2. Connect to Postgres (CRUVERO_POSTGRES_URL).
3. Initialize the embedding provider (CRUVERO_EMBEDDING_PROVIDER).
4. Initialize the vector store (CRUVERO_VECTOR_STORE).
5. Ensure the vector collection `facts` exists and is sized for the provider's dimensions.
6. Consume from the CRUVERO_EMBED stream on subject <prefix>.embed.requests.
7. Publish results and DLQ events.
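
The first two startup steps can be sketched as a preflight check. This is a minimal illustration, not the worker's actual code: the `Preflight` function is hypothetical, and treating an empty CRUVERO_POSTGRES_URL as a hard error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Preflight is a hypothetical helper that checks the two settings the
// startup flow depends on first. It takes a lookup function so it can be
// exercised without touching the real environment.
func Preflight(getenv func(string) string) error {
	if backend := getenv("CRUVERO_EVENTS_BACKEND"); backend != "nats" {
		return fmt.Errorf("CRUVERO_EVENTS_BACKEND must be nats, got %q", backend)
	}
	// Assumption: a missing Postgres URL is treated as fatal.
	if getenv("CRUVERO_POSTGRES_URL") == "" {
		return errors.New("CRUVERO_POSTGRES_URL is not set")
	}
	return nil
}

func main() {
	if err := Preflight(os.Getenv); err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("preflight ok")
}
```

Failing fast here keeps later steps (provider, vector store, stream consumption) from starting against an incomplete configuration.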

Async Embedding Subject Contract

Purpose	Subject
Request queue	`<prefix>.embed.requests`
Result (per-request)	`<prefix>.embed.results.<request_id>`
Result (broadcast)	`<prefix>.embed.results`
Dead-letter queue	`<prefix>.embed.dlq`

<prefix> is CRUVERO_EVENTS_SUBJECT_PREFIX (default cruvero).
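
As an illustration of the subject contract, the sketch below derives the four subject names from a prefix. The `Subjects` type and its methods are hypothetical names chosen for this example; only the subject patterns and the default prefix come from the table above.

```go
package main

import "fmt"

// Subjects holds the async embedding subject names derived from a prefix.
// The type itself is illustrative, not part of the worker's API.
type Subjects struct {
	Requests string
	Results  string
	DLQ      string
}

// NewSubjects builds the subject names for a prefix, falling back to the
// documented default "cruvero" when the prefix is empty.
func NewSubjects(prefix string) Subjects {
	if prefix == "" {
		prefix = "cruvero"
	}
	return Subjects{
		Requests: prefix + ".embed.requests",
		Results:  prefix + ".embed.results",
		DLQ:      prefix + ".embed.dlq",
	}
}

// ResultFor returns the per-request result subject for a request ID.
func (s Subjects) ResultFor(requestID string) string {
	return s.Results + "." + requestID
}

func main() {
	s := NewSubjects("")
	fmt.Println(s.Requests)
	fmt.Println(s.ResultFor("req-123"))
	fmt.Println(s.Results)
	fmt.Println(s.DLQ)
}
```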

Embedding Providers

CRUVERO_EMBEDDING_PROVIDER supports:

none (no external embedding calls)
openai
google
ollama

Core provider variables

Variable	Purpose
`CRUVERO_EMBEDDING_PROVIDER`	Provider selection
`CRUVERO_EMBEDDING_MODEL`	Provider model name
`CRUVERO_EMBEDDING_DIMENSIONS`	Explicit vector dimension (optional)
`CRUVERO_EMBEDDING_BATCH_SIZE`	Batch size for provider requests
`CRUVERO_EMBEDDING_TIMEOUT`	Per-request timeout
`CRUVERO_EMBEDDING_MAX_RETRIES`	Provider retry count

Provider-specific credentials/endpoints:

Provider	Variables
`openai`	`CRUVERO_OPENAI_API_KEY`, optional `CRUVERO_OPENAI_EMBEDDING_BASE_URL`
`google`	`CRUVERO_GOOGLE_API_KEY`, `CRUVERO_GOOGLE_PROJECT_ID`, `CRUVERO_GOOGLE_LOCATION`
`ollama`	`CRUVERO_OLLAMA_BASE_URL`
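
A configuration check for the provider table might look like the sketch below. It assumes every variable listed for a provider is required unless the table marks it optional; the `MissingVars` helper and the `requiredVars` map are hypothetical and not taken from internal/config.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars mirrors the provider table above. Treating all listed
// variables (other than those marked optional) as required is an assumption.
var requiredVars = map[string][]string{
	"none":   {},
	"openai": {"CRUVERO_OPENAI_API_KEY"},
	"google": {"CRUVERO_GOOGLE_API_KEY", "CRUVERO_GOOGLE_PROJECT_ID", "CRUVERO_GOOGLE_LOCATION"},
	"ollama": {"CRUVERO_OLLAMA_BASE_URL"},
}

// MissingVars reports which required variables are unset for a provider.
// It returns an error for a provider name that is not in the supported list.
func MissingVars(provider string, getenv func(string) string) ([]string, error) {
	required, ok := requiredVars[provider]
	if !ok {
		return nil, fmt.Errorf("unsupported embedding provider %q", provider)
	}
	var missing []string
	for _, name := range required {
		if getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	provider := os.Getenv("CRUVERO_EMBEDDING_PROVIDER")
	missing, err := MissingVars(provider, os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("provider=%s missing=%v\n", provider, missing)
}
```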

Vector Store Backends

embed-worker supports:

CRUVERO_VECTOR_STORE=pgvector -> Postgres pgvector store
CRUVERO_VECTOR_STORE=qdrant -> Qdrant primary with pgvector fallback (composite)
CRUVERO_VECTOR_STORE=composite -> same as above

Qdrant variables

Variable	Purpose
`CRUVERO_QDRANT_URL`	Qdrant endpoint
`CRUVERO_QDRANT_API_KEY`	Optional API key
`CRUVERO_QDRANT_COLLECTION_PREFIX`	Collection name prefix
`CRUVERO_QDRANT_ON_DISK`	Persist vectors on disk
`CRUVERO_QDRANT_GRPC_POOL_SIZE`	gRPC client pool size
`CRUVERO_QDRANT_UPSERT_BATCH_SIZE`	Upsert batch sizing
`CRUVERO_QDRANT_TLS_CA_CERT` / `CRUVERO_QDRANT_TLS_INSECURE`	TLS controls

Validation note: CRUVERO_VECTOR_STORE=qdrant requires a non-none embedding provider.
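
The backend mapping and the validation note can be expressed as a small resolver. This is a sketch under assumptions: the `ResolveVectorStore` function and its return strings are hypothetical, and applying the provider check to `composite` as well as `qdrant` is inferred from the two values being described as equivalent.

```go
package main

import (
	"errors"
	"fmt"
)

// ResolveVectorStore maps a CRUVERO_VECTOR_STORE value to the effective
// backend described above and enforces the validation note.
func ResolveVectorStore(store, provider string) (string, error) {
	switch store {
	case "pgvector":
		return "pgvector", nil
	case "qdrant", "composite":
		// Assumption: composite shares the qdrant rule because the two
		// values select the same backend.
		if provider == "none" {
			return "", errors.New("qdrant requires a non-none embedding provider")
		}
		return "qdrant primary, pgvector fallback", nil
	default:
		return "", fmt.Errorf("unsupported vector store %q", store)
	}
}

func main() {
	for _, c := range [][2]string{{"pgvector", "none"}, {"qdrant", "openai"}, {"qdrant", "none"}} {
		backend, err := ResolveVectorStore(c[0], c[1])
		fmt.Printf("store=%s provider=%s backend=%q err=%v\n", c[0], c[1], backend, err)
	}
}
```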

Worker Throughput and Retry Controls

Variable	Purpose	Default
`CRUVERO_EMBED_BATCH_SIZE`	Consumer batch size	`32`
`CRUVERO_EMBED_FLUSH_MS`	Batch flush interval (ms)	`500`
`CRUVERO_EMBED_DLQ_MAX_RETRIES`	Max retries before DLQ	`3`
`CRUVERO_EMBED_WORKER_CONCURRENCY`	Configured worker concurrency	`4`
`CRUVERO_EMBED_SYNC_TIMEOUT`	Sync embedding timeout	`10s`
`CRUVERO_EMBEDDING_FAILURE_MODE`	Failure handling mode; values include `fail` and `warn`	

Pending Reconciler (Backlog Recovery)

When enabled, the worker periodically reconciles embeddings that are still marked pending in Postgres metadata.

Variable	Purpose
`CRUVERO_EMBED_RECONCILE_ENABLED`	Enable reconciler
`CRUVERO_EMBED_RECONCILE_INTERVAL`	Pass interval
`CRUVERO_EMBED_RECONCILE_BATCH_SIZE`	Records per worker pass
`CRUVERO_EMBED_RECONCILE_MAX_ATTEMPTS`	Max attempts before failed status
`CRUVERO_EMBED_RECONCILE_WORKERS`	Parallel reconcile workers
`CRUVERO_EMBED_RECONCILE_STALE_AFTER`	Stale backlog threshold

Metrics emitted by reconciler include:

embed_pending_reconcile
embed_pending_backlog_stale

Caching

Embedding response caching can be enabled via a Postgres-backed cache:

Variable	Purpose
`CRUVERO_EMBEDDING_CACHE_ENABLED`	Enable cache
`CRUVERO_EMBEDDING_CACHE_TTL`	Cache TTL
`CRUVERO_EMBEDDING_CACHE_EPOCH`	Epoch salt for invalidation
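
An epoch salt typically invalidates a cache by changing every key at once. The sketch below illustrates that idea with a hashed key; the `CacheKey` function and the choice of inputs (epoch, provider, model, text) are assumptions for illustration and may not match how the actual cache derives its keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CacheKey derives a cache key from the epoch salt and the request inputs.
// Changing the epoch changes every key, so older entries are never hit again.
func CacheKey(epoch, provider, model, text string) string {
	h := sha256.New()
	for _, part := range []string{epoch, provider, model, text} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" hash differently
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The model name here is only an example value.
	k1 := CacheKey("1", "openai", "example-model", "hello")
	k2 := CacheKey("2", "openai", "example-model", "hello")
	fmt.Println(k1 == k2) // false: bumping the epoch yields a different key
}
```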

Running the Worker

CRUVERO_EVENTS_BACKEND=nats \
CRUVERO_POSTGRES_URL=postgres://... \
CRUVERO_VECTOR_STORE=qdrant \
CRUVERO_EMBEDDING_PROVIDER=openai \
CRUVERO_OPENAI_API_KEY=... \
go run ./cmd/embed-worker

Monitoring and Troubleshooting

Confirm that the worker's startup log shows the subject, stream, and batch values.
Verify request traffic:

go run ./cmd/event-bus subscribe 'cruvero.embed.requests'

Verify results and DLQ activity:

go run ./cmd/event-bus subscribe 'cruvero.embed.results.>'
go run ./cmd/event-bus subscribe 'cruvero.embed.dlq'

If Qdrant is configured, validate the endpoint and TLS settings and confirm the provider dimensions match the collection.
If the backlog grows, tune the reconcile and batch settings before increasing retry caps.
