Skip to main content

Source: docs/manual/embedding-workers.md

This page is generated by site/scripts/sync-manual-docs.mjs.

Embedding Workers Guide

embed-worker processes asynchronous embedding requests and writes vectors into the configured vector store.

Source: cmd/embed-worker/*, internal/memory/embedding.go, internal/memory/embed_worker_handler.go, internal/embedding/*, internal/vectorstore/*, internal/config/config_llm.go

Runtime Architecture

embed-worker startup flow:

  1. Load config and require CRUVERO_EVENTS_BACKEND=nats.
  2. Connect to Postgres (CRUVERO_POSTGRES_URL).
  3. Initialize embedding provider (CRUVERO_EMBEDDING_PROVIDER).
  4. Initialize vector store (CRUVERO_VECTOR_STORE).
  5. Ensure vector collection facts exists for provider dimensions.
  6. Consume from CRUVERO_EMBED stream subject <prefix>.embed.requests.
  7. Publish results and DLQ events.

Async Embedding Subject Contract

PurposeSubject
Request queue<prefix>.embed.requests
Result (per-request)<prefix>.embed.results.<request_id>
Result (broadcast)<prefix>.embed.results
Dead-letter queue<prefix>.embed.dlq

<prefix> is CRUVERO_EVENTS_SUBJECT_PREFIX (default cruvero).

Embedding Providers

CRUVERO_EMBEDDING_PROVIDER supports:

  • none (no external embedding calls)
  • openai
  • google
  • ollama

Core provider variables

VariablePurpose
CRUVERO_EMBEDDING_PROVIDERProvider selection
CRUVERO_EMBEDDING_MODELProvider model name
CRUVERO_EMBEDDING_DIMENSIONSExplicit vector dimension (optional)
CRUVERO_EMBEDDING_BATCH_SIZEBatch size for provider requests
CRUVERO_EMBEDDING_TIMEOUTPer-request timeout
CRUVERO_EMBEDDING_MAX_RETRIESProvider retry count

Provider-specific credentials/endpoints:

ProviderVariables
openaiCRUVERO_OPENAI_API_KEY, optional CRUVERO_OPENAI_EMBEDDING_BASE_URL
googleCRUVERO_GOOGLE_API_KEY, CRUVERO_GOOGLE_PROJECT_ID, CRUVERO_GOOGLE_LOCATION
ollamaCRUVERO_OLLAMA_BASE_URL

Vector Store Backends

embed-worker supports:

  • CRUVERO_VECTOR_STORE=pgvector -> Postgres pgvector store
  • CRUVERO_VECTOR_STORE=qdrant -> Qdrant primary with pgvector fallback (composite)
  • CRUVERO_VECTOR_STORE=composite -> same as above

Qdrant variables

VariablePurpose
CRUVERO_QDRANT_URLQdrant endpoint
CRUVERO_QDRANT_API_KEYOptional API key
CRUVERO_QDRANT_COLLECTION_PREFIXCollection name prefix
CRUVERO_QDRANT_ON_DISKPersist vectors on disk
CRUVERO_QDRANT_GRPC_POOL_SIZEgRPC client pool size
CRUVERO_QDRANT_UPSERT_BATCH_SIZEUpsert batch sizing
CRUVERO_QDRANT_TLS_CA_CERT / CRUVERO_QDRANT_TLS_INSECURETLS controls

Validation note: CRUVERO_VECTOR_STORE=qdrant requires a non-none embedding provider.

Worker Throughput and Retry Controls

VariablePurposeDefault
CRUVERO_EMBED_BATCH_SIZEConsumer batch size32
CRUVERO_EMBED_FLUSH_MSBatch flush interval (ms)500
CRUVERO_EMBED_DLQ_MAX_RETRIESMax retries before DLQ3
CRUVERO_EMBED_WORKER_CONCURRENCYConfigured worker concurrency4
CRUVERO_EMBED_SYNC_TIMEOUTSync embedding timeout10s
CRUVERO_EMBEDDING_FAILURE_MODE`failwarn

Pending Reconciler (Backlog Recovery)

When enabled, worker periodically reconciles pending embeddings in Postgres metadata.

VariablePurpose
CRUVERO_EMBED_RECONCILE_ENABLEDEnable reconciler
CRUVERO_EMBED_RECONCILE_INTERVALPass interval
CRUVERO_EMBED_RECONCILE_BATCH_SIZERecords per worker pass
CRUVERO_EMBED_RECONCILE_MAX_ATTEMPTSMax attempts before failed status
CRUVERO_EMBED_RECONCILE_WORKERSParallel reconcile workers
CRUVERO_EMBED_RECONCILE_STALE_AFTERStale backlog threshold

Metrics emitted by reconciler include:

  • embed_pending_reconcile
  • embed_pending_backlog_stale

Caching

Embedding response caching can be enabled via Postgres-backed cache:

VariablePurpose
CRUVERO_EMBEDDING_CACHE_ENABLEDEnable cache
CRUVERO_EMBEDDING_CACHE_TTLCache TTL
CRUVERO_EMBEDDING_CACHE_EPOCHEpoch salt for invalidation

Running the Worker

CRUVERO_EVENTS_BACKEND=nats \
CRUVERO_POSTGRES_URL=postgres://... \
CRUVERO_VECTOR_STORE=qdrant \
CRUVERO_EMBEDDING_PROVIDER=openai \
CRUVERO_OPENAI_API_KEY=... \
go run ./cmd/embed-worker

Monitoring and Troubleshooting

  1. Confirm worker start log shows subject/stream/batch values.
  2. Verify request traffic:
go run ./cmd/event-bus subscribe 'cruvero.embed.requests'
  1. Verify results and DLQ activity:
go run ./cmd/event-bus subscribe 'cruvero.embed.results.>'
go run ./cmd/event-bus subscribe 'cruvero.embed.dlq'
  1. If Qdrant is configured, validate endpoint/TLS settings and provider dimensions.
  2. If backlog grows, tune reconcile and batch settings before increasing retry caps.