Phase 18 — Prompt Library
Adds a versioned, searchable prompt library that lets agents discover task-specific prompts before constructing them from scratch, and contribute effective prompts back to a shared catalog. Modeled after the tool registry's immutable-versioned pattern (internal/registry/) with embedding-based semantic search via the existing vector infrastructure and a salience-inspired ranking formula.
Status: Completed (2026-02-09)
Depends on: Phases 1-14 complete
Migrations: 0025_prompt_library (Phase 18A)
Branch: dev
Why Now
With Phases 1-14 complete, Cruvero agents have a mature tool registry, memory system, and embedding infrastructure — but prompt construction remains entirely hardcoded:
- No prompt reuse — Every agent run constructs prompts from scratch in internal/agent/activities.go (LLMDecideActivity). There is no mechanism to search for previously successful prompts for similar tasks.
- Hardcoded prompt builders — System prompts, repair prompts, and routing prompts are built inline with string concatenation. Variations require code changes, not catalog updates.
- No quality tracking — There is no feedback loop recording which prompts produce good outcomes. Agents cannot learn from past prompt effectiveness across runs.
- No parameterization — Prompt templates with variable interpolation would allow a single prompt definition to serve multiple contexts (e.g., different tool sets, domains, or agent personalities).
Phase 18 solves all four by introducing internal/promptlib/ as a prompt catalog with content-hashed versioning, embedding-based search, quality metrics, and text/template rendering.
Architecture
New package: internal/promptlib/
All prompt storage, search, ranking, and rendering consolidates here. Agent tool executors (prompt_search, prompt_create) provide the agent-facing interface.
┌───────────────────────────────────────────────────────────────────┐
│ promptlib.Store │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ PostgresStore │ │ MetricsStore │ │ Renderer │ │
│ │ (prompt CRUD, │ │ (usage counts, │ │ (text/template│ │
│ │ immutable hash) │ │ success rate, │ │ interpolation│ │
│ │ │ │ LLM ratings) │ │ + validation)│ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬───────┘ │
│ │ │ │ │
│ └─────────┬───────────┘ │ │
│ │ │ │
│ ┌──────▼──────┐ │ │
│ │ Searcher │ │ │
│ │ (3-stage) │ │ │
│ └──────┬──────┘ │ │
│ │ │ │
│ ┌────────────┼────────────┐ │ │
│ │ │ │ │ │
│ ┌─────▼─────┐ ┌────▼────┐ ┌────▼─────┐ │ │
│ │ Vector │ │ Ranking │ │ Result │ │ │
│ │ Retrieval │ │ Scorer │ │ Assembly │ │ │
│ │ (embed + │ │ (quality│ │ (render, │ │ │
│ │ search) │ │ +recen-│ │ format) │ │ │
│ │ │ │ cy+use)│ │ │ │ │
│ └───────────┘ └─────────┘ └──────────┘ │ │
│ │ │
│ External deps (reused, not owned): │ │
│ ├─ internal/embedding/Embedder │ │
│ ├─ internal/vectorstore/VectorStore (collection: │ │
│ │ "prompt_library") │ │
│ ├─ internal/memory/salience.go (ComputeRecency, │ │
│ │ ComputeUsageFrequency) │ │
│ └─ internal/tenant/ (multi-tenant isolation) │ │
└───────────────────────────────────────────────────────────────────┘
Core API
// Store manages prompt CRUD with content-hash immutability.
type Store interface {
	Get(ctx context.Context, id string, version int) (Prompt, error)
	GetByHash(ctx context.Context, hash string) (Prompt, error)
	GetLatest(ctx context.Context, id string) (Prompt, error)
	Put(ctx context.Context, prompt Prompt) error
	List(ctx context.Context, filter ListFilter) ([]Prompt, error)
}

// Searcher finds prompts by semantic similarity + quality ranking.
type Searcher interface {
	Search(ctx context.Context, query SearchQuery) ([]ScoredPrompt, error)
}

// Renderer applies text/template interpolation to prompt content.
type Renderer interface {
	Render(prompt Prompt, params map[string]interface{}) (string, error)
	ValidateParams(prompt Prompt, params map[string]interface{}) error
}

// MetricsStore tracks mutable quality signals separately from immutable prompts.
type MetricsStore interface {
	RecordUsage(ctx context.Context, promptHash string, outcome UsageOutcome) error
	RecordFeedback(ctx context.Context, promptHash string, feedback Feedback) error
	GetMetrics(ctx context.Context, promptHash string) (PromptMetrics, error)
}
Key Types
type Prompt struct {
	ID          string          `json:"id"`
	Version     int             `json:"version"`
	Hash        string          `json:"hash"`
	Type        PromptType      `json:"type"`
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Content     string          `json:"content"`
	Parameters  []ParamDef      `json:"parameters,omitempty"`
	Tags        []string        `json:"tags,omitempty"`
	Author      string          `json:"author"`
	TenantID    string          `json:"tenant_id"`
	CreatedAt   time.Time       `json:"created_at"`
	Metadata    json.RawMessage `json:"metadata,omitempty"`
}

type PromptType string

const (
	PromptTypeSystem         PromptType = "system"
	PromptTypeUser           PromptType = "user"
	PromptTypeTask           PromptType = "task"
	PromptTypeRepair         PromptType = "repair"
	PromptTypeRouting        PromptType = "routing"
	PromptTypeToolDesc       PromptType = "tool_description"
	PromptTypeChainOfThought PromptType = "chain_of_thought"
	PromptTypeCustom         PromptType = "custom"
)

type ParamDef struct {
	Name        string `json:"name"`
	Type        string `json:"type"`
	Required    bool   `json:"required"`
	Default     string `json:"default,omitempty"`
	Description string `json:"description,omitempty"`
}

type PromptMetrics struct {
	PromptHash   string    `json:"prompt_hash"`
	UsageCount   int       `json:"usage_count"`
	SuccessCount int       `json:"success_count"`
	FailureCount int       `json:"failure_count"`
	AvgLLMRating float64   `json:"avg_llm_rating"`
	LastUsedAt   time.Time `json:"last_used_at"`
}

type ScoredPrompt struct {
	Prompt     Prompt          `json:"prompt"`
	Score      float64         `json:"score"`
	Components ScoreComponents `json:"components"`
}

type ScoreComponents struct {
	Similarity float64 `json:"similarity"`
	Quality    float64 `json:"quality"`
	Recency    float64 `json:"recency"`
	Usage      float64 `json:"usage"`
}
Content Hashing (Immutability)
Mirrors registry.ComputeHash (internal/registry/types.go:65-78):
func ComputeHash(id string, version int, content string, promptType PromptType) (string, error) {
	payload := hashInput{ID: id, Version: version, Content: content, Type: string(promptType)}
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}
Store uses INSERT ... ON CONFLICT DO NOTHING + hash verification — same pattern as registry.PostgresStore.Put (internal/registry/store.go:108-144).
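The immutability guarantee rests on the hash being deterministic for identical input and changing whenever content changes. A self-contained demonstration (the hashInput JSON tags here and the plain-string promptType parameter are assumptions; the real struct lives in internal/promptlib/hash.go):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashInput mirrors the payload struct above; field tags are assumed.
type hashInput struct {
	ID      string `json:"id"`
	Version int    `json:"version"`
	Content string `json:"content"`
	Type    string `json:"type"`
}

// ComputeHash: simplified copy of the function above for demonstration.
func ComputeHash(id string, version int, content, promptType string) (string, error) {
	b, err := json.Marshal(hashInput{ID: id, Version: version, Content: content, Type: promptType})
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}

func main() {
	h1, _ := ComputeHash("greet", 1, "Hello {{.Name}}", "system")
	h2, _ := ComputeHash("greet", 1, "Hello {{.Name}}", "system")
	h3, _ := ComputeHash("greet", 1, "Hi {{.Name}}", "system")
	// Same input yields the same hash; edited content yields a new one.
	fmt.Println(h1 == h2, h1 == h3) // prints: true false
}
```

This is what lets Put treat a re-insert of identical content as a no-op while rejecting a different body under the same (id, version).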
Search Pipeline
Three-stage pipeline:
Stage 1: Vector Retrieval
- Embed query text using embedding.Embedder.Embed() (internal/embedding/embedder.go:23)
- Search the prompt_library collection via vectorstore.VectorStore.Search() (internal/vectorstore/store.go:35)
- Apply tenant isolation filter (internal/tenant/)
- Retrieve top-K candidates (default K=20)
Stage 2: Re-Ranking
Score each candidate using a weighted formula adapted from memory.SalienceScorer (internal/memory/salience.go:51-65):
score = W_sim * similarity + W_qual * quality + W_rec * recency + W_use * usage
| Weight | Default | Source |
|---|---|---|
| W_sim (similarity) | 0.4 | Vector cosine similarity from Stage 1 |
| W_qual (quality) | 0.3 | success_rate * avg_llm_rating from prompt_metrics |
| W_rec (recency) | 0.2 | ComputeRecency(created_at, now, half_life) from memory/salience.go:155 |
| W_use (usage) | 0.1 | ComputeUsageFrequency(usage_count, max_count) from memory/salience.go:187 |
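With the default weights from the table, the formula can be computed directly; all names in this sketch are illustrative, not the actual promptlib/scorer.go API:

```go
package main

import "fmt"

// ScoreWeights carries the four ranking weights (defaults from the table above).
type ScoreWeights struct {
	Similarity, Quality, Recency, Usage float64
}

var DefaultWeights = ScoreWeights{Similarity: 0.4, Quality: 0.3, Recency: 0.2, Usage: 0.1}

// CompositeScore applies the Stage 2 formula to four signals,
// each normalized to [0, 1].
func CompositeScore(w ScoreWeights, sim, qual, rec, use float64) float64 {
	return w.Similarity*sim + w.Quality*qual + w.Recency*rec + w.Usage*use
}

func main() {
	// High similarity but a weak quality track record: similarity dominates
	// (weight 0.4), but quality still pulls the composite down.
	s := CompositeScore(DefaultWeights, 0.9, 0.2, 0.5, 0.1)
	fmt.Printf("%.2f\n", s) // prints 0.53
}
```

Since the weights sum to 1.0 and each signal is in [0, 1], the composite stays in [0, 1] as well.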
Stage 3: Result Assembly
- Sort by composite score
- Truncate to requested limit (default 5)
- Optionally render templates with provided parameters
- Return []ScoredPrompt with score components for transparency
Feedback System
LLM Auto-Feedback
After each agent run that used a library prompt, the LLM self-assesses prompt effectiveness:
type UsageOutcome struct {
	PromptHash string  `json:"prompt_hash"`
	RunID      string  `json:"run_id"`
	StepIdx    int     `json:"step_idx"`
	Success    bool    `json:"success"`
	LLMRating  float64 `json:"llm_rating"` // 0.0-1.0
	TenantID   string  `json:"tenant_id"`
}
Recorded as a Temporal activity (non-blocking, fire-and-forget). Updates prompt_metrics table via MetricsStore.RecordUsage().
Optional User Feedback
Non-blocking user feedback via CLI/API signal — never blocks workflow:
type Feedback struct {
	PromptHash string  `json:"prompt_hash"`
	UserID     string  `json:"user_id"`
	Rating     float64 `json:"rating"` // 0.0-1.0
	Comment    string  `json:"comment,omitempty"`
	TenantID   string  `json:"tenant_id"`
}
Recorded via MetricsStore.RecordFeedback(). Feedback is additive — it adjusts the running average but cannot delete or modify prompt content (immutable).
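Given the total_rating and rating_count columns in prompt_metrics, the additive update reduces to running-total arithmetic: each feedback event bumps the totals and the average is derived on read. A sketch of that arithmetic (the in-memory type here is a stand-in, not the stored schema):

```go
package main

import "fmt"

// ratingTotals mirrors the mutable counters behind avg_llm_rating.
type ratingTotals struct {
	TotalRating float64
	RatingCount int
}

// AddRating folds one new rating into the running totals. The average is
// derived, so feedback can only shift the mean — never rewrite history.
func (m *ratingTotals) AddRating(r float64) {
	m.TotalRating += r
	m.RatingCount++
}

func (m *ratingTotals) Avg() float64 {
	if m.RatingCount == 0 {
		return 0
	}
	return m.TotalRating / float64(m.RatingCount)
}

func main() {
	var m ratingTotals
	for _, r := range []float64{0.8, 0.6, 1.0} {
		m.AddRating(r)
	}
	fmt.Printf("%.2f\n", m.Avg()) // prints 0.80
}
```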
Template Rendering
Prompts use Go text/template for parameterized content:
// Example prompt content:
// "You are a {{.Role}} agent. Your task is to {{.Task}}. Available tools: {{range .Tools}}{{.Name}}, {{end}}"
func (r *TemplateRenderer) Render(prompt Prompt, params map[string]interface{}) (string, error) {
	tmpl, err := template.New(prompt.ID).Parse(prompt.Content)
	if err != nil {
		return "", fmt.Errorf("invalid template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, params); err != nil {
		return "", fmt.Errorf("template execution failed: %w", err)
	}
	return buf.String(), nil
}
Parameter validation checks required params are present and types match ParamDef definitions before rendering.
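A sketch of that validation step, assuming ParamDef.Type takes values like "string", "number", and "bool" (the document does not enumerate them, so the accepted names are an assumption):

```go
package main

import "fmt"

// ParamDef mirrors the type from Key Types above (JSON tags omitted).
type ParamDef struct {
	Name     string
	Type     string // assumed values: "string", "number", "bool"
	Required bool
	Default  string
}

// ValidateParams checks required params are present and values match the
// declared type before rendering.
func ValidateParams(defs []ParamDef, params map[string]interface{}) error {
	for _, d := range defs {
		v, ok := params[d.Name]
		if !ok {
			if d.Required && d.Default == "" {
				return fmt.Errorf("missing required parameter %q", d.Name)
			}
			continue
		}
		switch d.Type {
		case "string":
			if _, ok := v.(string); !ok {
				return fmt.Errorf("parameter %q: expected string, got %T", d.Name, v)
			}
		case "number":
			if _, ok := v.(float64); !ok { // JSON numbers decode to float64
				return fmt.Errorf("parameter %q: expected number, got %T", d.Name, v)
			}
		case "bool":
			if _, ok := v.(bool); !ok {
				return fmt.Errorf("parameter %q: expected bool, got %T", d.Name, v)
			}
		}
	}
	return nil
}

func main() {
	defs := []ParamDef{{Name: "Role", Type: "string", Required: true}}
	fmt.Println(ValidateParams(defs, map[string]interface{}{"Role": "planner"})) // prints <nil>
	fmt.Println(ValidateParams(defs, map[string]interface{}{}))                  // prints the missing-parameter error
}
```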
Agent Access via Tools
Two tool executors following the memory_read/memory_write pattern (internal/tools/memory_read.go, internal/tools/memory_write.go):
prompt_search Tool
type PromptSearchTool struct {
	searcher Searcher
	renderer Renderer
}
func (t *PromptSearchTool) Name() string { return "prompt_search" }
// Schema:
// {
//   "type": "object",
//   "properties": {
//     "query": {"type": "string"},
//     "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
//     "tags": {"type": "array", "items": {"type": "string"}},
//     "params": {"type": "object"},
//     "limit": {"type": "integer"}
//   },
//   "required": ["query"]
// }
prompt_create Tool
type PromptCreateTool struct {
	store    Store
	embedder embedding.Embedder
	vs       vectorstore.VectorStore
}
func (t *PromptCreateTool) Name() string { return "prompt_create" }
// Schema:
// {
//   "type": "object",
//   "properties": {
//     "name": {"type": "string"},
//     "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
//     "description": {"type": "string"},
//     "content": {"type": "string"},
//     "parameters": {"type": "array", "items": {"type": "object"}},
//     "tags": {"type": "array", "items": {"type": "string"}}
//   },
//   "required": ["name", "type", "content"]
// }
Both tools are explicitly invoked by the agent — not auto-injected. Every access appears in the decision log via normal tool execution audit trail.
Sub-Phases
| Sub-Phase | Name | Prompts | Depends On |
|---|---|---|---|
| 18A | Foundation: Types, Store, Hash, Renderer, Migration | 5 | — |
| 18B | Search + Ranking: Embedder Wiring, Searcher, Scorer | 5 | 18A |
| 18C | Agent Integration + Feedback: Tools, Activities, Wiring | 4 | 18B |
| 18D | CLI, Testing & Ops: Seed, Query, Feedback CLIs, Tests | 4 | 18C |
Total: 4 sub-phases, 18 prompts, 9 documentation files
Dependency Graph
18A (Foundation) → 18B (Search/Ranking) → 18C (Agent Integration) → 18D (CLI/Testing)
Strictly sequential: each sub-phase builds on the previous.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CRUVERO_PROMPTLIB_ENABLED | true | Enable prompt library |
| CRUVERO_PROMPTLIB_COLLECTION | prompt_library | Vector store collection name |
| CRUVERO_PROMPTLIB_SEARCH_K | 20 | Vector retrieval candidates (Stage 1) |
| CRUVERO_PROMPTLIB_RESULT_LIMIT | 5 | Max results returned to agent |
| CRUVERO_PROMPTLIB_W_SIMILARITY | 0.4 | Ranking weight: vector similarity |
| CRUVERO_PROMPTLIB_W_QUALITY | 0.3 | Ranking weight: quality score |
| CRUVERO_PROMPTLIB_W_RECENCY | 0.2 | Ranking weight: recency decay |
| CRUVERO_PROMPTLIB_W_USAGE | 0.1 | Ranking weight: usage frequency |
| CRUVERO_PROMPTLIB_HALF_LIFE | 168h | Recency decay half-life (7 days) |
| CRUVERO_PROMPTLIB_FEEDBACK_ENABLED | true | Enable user feedback recording |
| CRUVERO_PROMPTLIB_AUTO_FEEDBACK | true | Enable LLM self-assessment after prompt use |
Files Overview
New Files
| File | Sub-Phase | Description |
|---|---|---|
| internal/promptlib/types.go | 18A | Prompt, PromptType, ParamDef, PromptMetrics, ScoredPrompt, ScoreComponents |
| internal/promptlib/store.go | 18A | Store interface + PostgresStore (CRUD, hash immutability) |
| internal/promptlib/metrics_store.go | 18A | MetricsStore interface + PostgresMetricsStore |
| internal/promptlib/hash.go | 18A | ComputeHash (SHA256, mirrors registry pattern) |
| internal/promptlib/renderer.go | 18A | Renderer interface + TemplateRenderer (text/template) |
| internal/promptlib/searcher.go | 18B | Searcher interface + DefaultSearcher (3-stage pipeline) |
| internal/promptlib/scorer.go | 18B | PromptScorer (ranking formula, weight config) |
| internal/promptlib/indexer.go | 18B | Indexer (embed + upsert to vector store on Put) |
| internal/promptlib/config.go | 18B | Config wiring + component assembly from env vars |
| internal/tools/prompt_search.go | 18C | PromptSearchTool executor |
| internal/tools/prompt_create.go | 18C | PromptCreateTool executor |
| internal/promptlib/feedback.go | 18C | Feedback types + RecordUsageActivity (Temporal) |
| cmd/prompt-seed/main.go | 18D | CLI to seed prompt library from YAML/JSON files |
| cmd/prompt-query/main.go | 18D | CLI to search prompt library |
| cmd/prompt-feedback/main.go | 18D | CLI to submit user feedback |
| migrations/0025_prompt_library.up.sql | 18A | Create prompts + prompt_metrics tables |
| migrations/0025_prompt_library.down.sql | 18A | Drop tables |
| internal/promptlib/types_test.go | 18D | Type validation and JSON round-trip tests |
| internal/promptlib/hash_test.go | 18D | Hash determinism and uniqueness tests |
| internal/promptlib/store_test.go | 18D | PostgresStore tests (sqlmock) |
| internal/promptlib/metrics_store_test.go | 18D | PostgresMetricsStore tests (sqlmock) |
| internal/promptlib/renderer_test.go | 18D | TemplateRenderer tests |
| internal/promptlib/indexer_test.go | 18D | Indexer tests (mock embedder + vector store) |
| internal/promptlib/scorer_test.go | 18D | PromptScorer tests |
| internal/promptlib/searcher_test.go | 18D | DefaultSearcher pipeline tests |
| internal/promptlib/config_test.go | 18D | Config loading and validation tests |
| internal/promptlib/feedback_test.go | 18D | Feedback activity tests |
| docs/manual/prompt-library.md | 18D | Feature manual page |
Modified Files
| File | Sub-Phase | Change |
|---|---|---|
| internal/tools/manager.go | 18C | Register prompt_search and prompt_create executors |
| internal/agent/activities.go | 18C | Wire optional prompt library lookup before LLM prompt construction |
| internal/config/config.go | 18B | Add promptlib config fields + env var loading |
Migration: 0025_prompt_library
-- 0025_prompt_library.up.sql
CREATE TABLE IF NOT EXISTS prompts (
    tenant_id TEXT NOT NULL DEFAULT '_global',
    id TEXT NOT NULL,
    version INTEGER NOT NULL,
    hash TEXT NOT NULL,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    content TEXT NOT NULL,
    parameters JSONB,
    tags TEXT[] DEFAULT '{}',
    author TEXT NOT NULL DEFAULT '',
    metadata JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (tenant_id, id, version),
    UNIQUE (tenant_id, hash)
);

CREATE INDEX idx_prompts_type ON prompts (tenant_id, type);
CREATE INDEX idx_prompts_tags ON prompts USING GIN (tags);
CREATE INDEX idx_prompts_hash ON prompts (hash);

CREATE TABLE IF NOT EXISTS prompt_metrics (
    prompt_hash TEXT NOT NULL PRIMARY KEY,
    tenant_id TEXT NOT NULL DEFAULT '_global',
    usage_count INTEGER NOT NULL DEFAULT 0,
    success_count INTEGER NOT NULL DEFAULT 0,
    failure_count INTEGER NOT NULL DEFAULT 0,
    total_rating DOUBLE PRECISION NOT NULL DEFAULT 0,
    rating_count INTEGER NOT NULL DEFAULT 0,
    last_used_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_prompt_metrics_tenant ON prompt_metrics (tenant_id);
Success Metrics
| Metric | Target |
|---|---|
| Prompt type coverage | 8 types (system, user, task, repair, routing, tool_description, chain_of_thought, custom) |
| Search latency (vector + re-rank) | < 50ms p99 |
| Store immutability | Hash verification on every Put (0 content mutations) |
| Template rendering | < 1ms p99 |
| Feedback recording | Non-blocking, < 5ms fire-and-forget |
| Agent tool integration | prompt_search + prompt_create registered and functional |
| Multi-tenant isolation | All queries scoped by tenant_id |
| Quality signal accuracy | LLM auto-rating within 0.15 of user rating (when both present) |
| Backward compatibility | Existing prompt construction in activities.go unchanged when library disabled |
| Test coverage | >= 80% for internal/promptlib/ (enforced by scripts/check-coverage.sh) |
Risk Mitigation
| Risk | Mitigation |
|---|---|
| Cold start (empty library) | cmd/prompt-seed CLI pre-loads curated prompts. Library search returns empty gracefully — agent falls back to hardcoded builders. |
| Low-quality prompt proliferation | Quality score incorporates success rate + LLM rating. Low-quality prompts naturally sink in rankings. |
| Template injection via parameters | Templates are parsed only from immutable prompt content; parameter values are passed as data to Execute and cannot introduce new template directives. (Note: text/template performs no HTML escaping, unlike html/template — none is needed for plain-text prompts.) Parameter validation enforces types. |
| Vector search latency at scale | Stage 1 retrieval bounded by K=20. Re-ranking is in-memory, O(K log K). Collection uses existing vector infrastructure. |
| Embedding cost for indexing | Embeddings generated once on Put, cached in vector store. Search embeds query only (single call). |
| Breaking existing prompt construction | Library is opt-in. When disabled (CRUVERO_PROMPTLIB_ENABLED=false), activities.go prompt builders are unchanged. |
Relationship to Other Phases
| Phase | Relationship |
|---|---|
| Phase 5 (Memory) | 18B reuses memory.ComputeRecency and ComputeUsageFrequency for ranking |
| Phase 6 (Tool Registry) | 18A mirrors registry.Store immutability pattern (hash, ON CONFLICT, tenant isolation) |
| Phase 8 (Embeddings + Vector) | 18B reuses embedding.Embedder and vectorstore.VectorStore with new collection |
| Phase 14 (API) | API endpoints can expose prompt library search/create via existing route patterns |
| Phase 17 (PII Guard) | PII filtering applies to prompt content at output boundary (no special handling needed) |
Progress Notes
(none yet)