Phase 18 — Prompt Library
Adds a versioned, searchable prompt library that lets agents discover task-specific prompts before constructing them from scratch, and contribute effective prompts back to a shared catalog. Modeled after the tool registry's immutable-versioned pattern (internal/registry/) with embedding-based semantic search via the existing vector infrastructure and a salience-inspired ranking formula.
Status: Completed (2026-02-09)
Depends on: Phases 1-14 complete
Migrations: 0025_prompt_library (Phase 18A)
Branch: dev
Why Now
With Phases 1-14 complete, Cruvero agents have a mature tool registry, memory system, and embedding infrastructure — but prompt construction remains entirely hardcoded:
- No prompt reuse — Every agent run constructs prompts from scratch in internal/agent/activities.go (LLMDecideActivity). There is no mechanism to search for previously successful prompts for similar tasks.
- Hardcoded prompt builders — System prompts, repair prompts, and routing prompts are built inline with string concatenation. Variations require code changes, not catalog updates.
- No quality tracking — There is no feedback loop recording which prompts produce good outcomes. Agents cannot learn from past prompt effectiveness across runs.
- No parameterization — Prompt templates with variable interpolation would allow a single prompt definition to serve multiple contexts (e.g., different tool sets, domains, or agent personalities).
Phase 18 solves all four by introducing internal/promptlib/ as a prompt catalog with content-hashed versioning, embedding-based search, quality metrics, and text/template rendering.
Architecture
New package: internal/promptlib/
All prompt storage, search, ranking, and rendering consolidates here. Agent tool executors (prompt_search, prompt_create) provide the agent-facing interface.
┌───────────────────────────────────────────────────────────────────┐
│ promptlib.Store │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ PostgresStore │ │ MetricsStore │ │ Renderer │ │
│ │ (prompt CRUD, │ │ (usage counts, │ │ (text/template│ │
│ │ immutable hash) │ │ success rate, │ │ interpolation│ │
│ │ │ │ LLM ratings) │ │ + validation)│ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬───────┘ │
│ │ │ │ │
│ └─────────┬───────────┘ │ │
│ │ │ │
│ ┌──────▼──────┐ │ │
│ │ Searcher │ │ │
│ │ (3-stage) │ │ │
│ └──────┬──────┘ │ │
│ │ │ │
│ ┌────────────┼────────────┐ │ │
│ │ │ │ │ │
│ ┌─────▼─────┐ ┌────▼────┐ ┌────▼─────┐ │ │
│ │ Vector │ │ Ranking │ │ Result │ │ │
│ │ Retrieval │ │ Scorer │ │ Assembly │ │ │
│ │ (embed + │ │ (quality│ │ (render, │ │ │
│ │ search) │ │ +recen-│ │ format) │ │ │
│ │ │ │ cy+use)│ │ │ │ │
│ └───────────┘ └─────────┘ └──────────┘ │ │
│ │ │
│ External deps (reused, not owned): │ │
│ ├─ internal/embedding/Embedder │ │
│ ├─ internal/vectorstore/VectorStore (collection: │ │
│ │ "prompt_library") │ │
│ ├─ internal/memory/salience.go (ComputeRecency, │ │
│ │ ComputeUsageFrequency) │ │
│ └─ internal/tenant/ (multi-tenant isolation) │ │
└───────────────────────────────────────────────────────────────────┘
Core API
// Store manages prompt CRUD with content-hash immutability.
type Store interface {
	Get(ctx context.Context, id string, version int) (Prompt, error)
	GetByHash(ctx context.Context, hash string) (Prompt, error)
	GetLatest(ctx context.Context, id string) (Prompt, error)
	Put(ctx context.Context, prompt Prompt) error
	List(ctx context.Context, filter ListFilter) ([]Prompt, error)
}

// Searcher finds prompts by semantic similarity + quality ranking.
type Searcher interface {
	Search(ctx context.Context, query SearchQuery) ([]ScoredPrompt, error)
}

// Renderer applies text/template interpolation to prompt content.
type Renderer interface {
	Render(prompt Prompt, params map[string]interface{}) (string, error)
	ValidateParams(prompt Prompt, params map[string]interface{}) error
}

// MetricsStore tracks mutable quality signals separately from immutable prompts.
type MetricsStore interface {
	RecordUsage(ctx context.Context, promptHash string, outcome UsageOutcome) error
	RecordFeedback(ctx context.Context, promptHash string, feedback Feedback) error
	GetMetrics(ctx context.Context, promptHash string) (PromptMetrics, error)
}
Key Types
type Prompt struct {
	ID          string          `json:"id"`
	Version     int             `json:"version"`
	Hash        string          `json:"hash"`
	Type        PromptType      `json:"type"`
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Content     string          `json:"content"`
	Parameters  []ParamDef      `json:"parameters,omitempty"`
	Tags        []string        `json:"tags,omitempty"`
	Author      string          `json:"author"`
	TenantID    string          `json:"tenant_id"`
	CreatedAt   time.Time       `json:"created_at"`
	Metadata    json.RawMessage `json:"metadata,omitempty"`
}

type PromptType string

const (
	PromptTypeSystem         PromptType = "system"
	PromptTypeUser           PromptType = "user"
	PromptTypeTask           PromptType = "task"
	PromptTypeRepair         PromptType = "repair"
	PromptTypeRouting        PromptType = "routing"
	PromptTypeToolDesc       PromptType = "tool_description"
	PromptTypeChainOfThought PromptType = "chain_of_thought"
	PromptTypeCustom         PromptType = "custom"
)

type ParamDef struct {
	Name        string `json:"name"`
	Type        string `json:"type"`
	Required    bool   `json:"required"`
	Default     string `json:"default,omitempty"`
	Description string `json:"description,omitempty"`
}

type PromptMetrics struct {
	PromptHash   string    `json:"prompt_hash"`
	UsageCount   int       `json:"usage_count"`
	SuccessCount int       `json:"success_count"`
	FailureCount int       `json:"failure_count"`
	AvgLLMRating float64   `json:"avg_llm_rating"`
	LastUsedAt   time.Time `json:"last_used_at"`
}

type ScoredPrompt struct {
	Prompt     Prompt          `json:"prompt"`
	Score      float64         `json:"score"`
	Components ScoreComponents `json:"components"`
}

type ScoreComponents struct {
	Similarity float64 `json:"similarity"`
	Quality    float64 `json:"quality"`
	Recency    float64 `json:"recency"`
	Usage      float64 `json:"usage"`
}
Content Hashing (Immutability)
Mirrors registry.ComputeHash (internal/registry/types.go:65-78):
func ComputeHash(id string, version int, content string, promptType PromptType) (string, error) {
	payload := hashInput{ID: id, Version: version, Content: content, Type: string(promptType)}
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}
Store uses INSERT ... ON CONFLICT DO NOTHING + hash verification — same pattern as registry.PostgresStore.Put (internal/registry/store.go:108-144).
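The immutability guarantee rests on the hash being deterministic for identical input and changing whenever content changes. A self-contained demonstration (the hashInput JSON tags here and the plain-string promptType parameter are assumptions; the real struct lives in internal/promptlib/hash.go):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashInput mirrors the payload struct above; field tags are assumed.
type hashInput struct {
	ID      string `json:"id"`
	Version int    `json:"version"`
	Content string `json:"content"`
	Type    string `json:"type"`
}

// ComputeHash: simplified copy of the function above for demonstration.
func ComputeHash(id string, version int, content, promptType string) (string, error) {
	b, err := json.Marshal(hashInput{ID: id, Version: version, Content: content, Type: promptType})
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}

func main() {
	h1, _ := ComputeHash("greet", 1, "Hello {{.Name}}", "system")
	h2, _ := ComputeHash("greet", 1, "Hello {{.Name}}", "system")
	h3, _ := ComputeHash("greet", 1, "Hi {{.Name}}", "system")
	// Same input yields the same hash; edited content yields a new one.
	fmt.Println(h1 == h2, h1 == h3) // prints: true false
}
```

This is what lets Put treat a re-insert of identical content as a no-op while rejecting a different body under the same (id, version).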
Search Pipeline
Three-stage pipeline:
Stage 1: Vector Retrieval
- Embed query text using embedding.Embedder.Embed() (internal/embedding/embedder.go:23)
- Search the prompt_library collection via vectorstore.VectorStore.Search() (internal/vectorstore/store.go:35)
- Apply tenant isolation filter (internal/tenant/)
- Retrieve top-K candidates (default K=20)
Stage 2: Re-Ranking
Score each candidate using a weighted formula adapted from memory.SalienceScorer (internal/memory/salience.go:51-65):
score = W_sim * similarity + W_qual * quality + W_rec * recency + W_use * usage
| Weight | Default | Source |
|---|---|---|
| W_sim (similarity) | 0.4 | Vector cosine similarity from Stage 1 |
| W_qual (quality) | 0.3 | success_rate * avg_llm_rating from prompt_metrics |
| W_rec (recency) | 0.2 | ComputeRecency(created_at, now, half_life) from memory/salience.go:155 |
| W_use (usage) | 0.1 | ComputeUsageFrequency(usage_count, max_count) from memory/salience.go:187 |
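With the default weights from the table, the formula can be computed directly; all names in this sketch are illustrative, not the actual promptlib/scorer.go API:

```go
package main

import "fmt"

// ScoreWeights carries the four ranking weights (defaults from the table above).
type ScoreWeights struct {
	Similarity, Quality, Recency, Usage float64
}

var DefaultWeights = ScoreWeights{Similarity: 0.4, Quality: 0.3, Recency: 0.2, Usage: 0.1}

// CompositeScore applies the Stage 2 formula to four signals,
// each normalized to [0, 1].
func CompositeScore(w ScoreWeights, sim, qual, rec, use float64) float64 {
	return w.Similarity*sim + w.Quality*qual + w.Recency*rec + w.Usage*use
}

func main() {
	// High similarity but a weak quality track record: similarity dominates
	// (weight 0.4), but quality still pulls the composite down.
	s := CompositeScore(DefaultWeights, 0.9, 0.2, 0.5, 0.1)
	fmt.Printf("%.2f\n", s) // prints 0.53
}
```

Since the weights sum to 1.0 and each signal is in [0, 1], the composite stays in [0, 1] as well.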
Stage 3: Result Assembly
- Sort by composite score
- Truncate to requested limit (default 5)
- Optionally render templates with provided parameters
- Return []ScoredPrompt with score components for transparency
Feedback System
LLM Auto-Feedback
After each agent run that used a library prompt, the LLM self-assesses prompt effectiveness:
type UsageOutcome struct {
	PromptHash string  `json:"prompt_hash"`
	RunID      string  `json:"run_id"`
	StepIdx    int     `json:"step_idx"`
	Success    bool    `json:"success"`
	LLMRating  float64 `json:"llm_rating"` // 0.0-1.0
	TenantID   string  `json:"tenant_id"`
}
Recorded as a Temporal activity (non-blocking, fire-and-forget). Updates prompt_metrics table via MetricsStore.RecordUsage().
Optional User Feedback
Non-blocking user feedback via CLI/API signal — never blocks workflow:
type Feedback struct {
	PromptHash string  `json:"prompt_hash"`
	UserID     string  `json:"user_id"`
	Rating     float64 `json:"rating"` // 0.0-1.0
	Comment    string  `json:"comment,omitempty"`
	TenantID   string  `json:"tenant_id"`
}
Recorded via MetricsStore.RecordFeedback(). Feedback is additive — it adjusts the running average but cannot delete or modify prompt content (immutable).
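Given the total_rating and rating_count columns in prompt_metrics, the additive update reduces to running-total arithmetic: each feedback event bumps the totals and the average is derived on read. A sketch of that arithmetic (the in-memory type here is a stand-in, not the stored schema):

```go
package main

import "fmt"

// ratingTotals mirrors the mutable counters behind avg_llm_rating.
type ratingTotals struct {
	TotalRating float64
	RatingCount int
}

// AddRating folds one new rating into the running totals. The average is
// derived, so feedback can only shift the mean — never rewrite history.
func (m *ratingTotals) AddRating(r float64) {
	m.TotalRating += r
	m.RatingCount++
}

func (m *ratingTotals) Avg() float64 {
	if m.RatingCount == 0 {
		return 0
	}
	return m.TotalRating / float64(m.RatingCount)
}

func main() {
	var m ratingTotals
	for _, r := range []float64{0.8, 0.6, 1.0} {
		m.AddRating(r)
	}
	fmt.Printf("%.2f\n", m.Avg()) // prints 0.80
}
```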
Template Rendering
Prompts use Go text/template for parameterized content:
// Example prompt content:
// "You are a {{.Role}} agent. Your task is to {{.Task}}. Available tools: {{range .Tools}}{{.Name}}, {{end}}"
func (r *TemplateRenderer) Render(prompt Prompt, params map[string]interface{}) (string, error) {
	tmpl, err := template.New(prompt.ID).Parse(prompt.Content)
	if err != nil {
		return "", fmt.Errorf("invalid template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, params); err != nil {
		return "", fmt.Errorf("template execution failed: %w", err)
	}
	return buf.String(), nil
}
Parameter validation checks required params are present and types match ParamDef definitions before rendering.
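A sketch of that validation step, assuming ParamDef.Type takes values like "string", "number", and "bool" (the document does not enumerate them, so the accepted names are an assumption):

```go
package main

import "fmt"

// ParamDef mirrors the type from Key Types above (JSON tags omitted).
type ParamDef struct {
	Name     string
	Type     string // assumed values: "string", "number", "bool"
	Required bool
	Default  string
}

// ValidateParams checks required params are present and values match the
// declared type before rendering.
func ValidateParams(defs []ParamDef, params map[string]interface{}) error {
	for _, d := range defs {
		v, ok := params[d.Name]
		if !ok {
			if d.Required && d.Default == "" {
				return fmt.Errorf("missing required parameter %q", d.Name)
			}
			continue
		}
		switch d.Type {
		case "string":
			if _, ok := v.(string); !ok {
				return fmt.Errorf("parameter %q: expected string, got %T", d.Name, v)
			}
		case "number":
			if _, ok := v.(float64); !ok { // JSON numbers decode to float64
				return fmt.Errorf("parameter %q: expected number, got %T", d.Name, v)
			}
		case "bool":
			if _, ok := v.(bool); !ok {
				return fmt.Errorf("parameter %q: expected bool, got %T", d.Name, v)
			}
		}
	}
	return nil
}

func main() {
	defs := []ParamDef{{Name: "Role", Type: "string", Required: true}}
	fmt.Println(ValidateParams(defs, map[string]interface{}{"Role": "planner"})) // prints <nil>
	fmt.Println(ValidateParams(defs, map[string]interface{}{}))                  // prints the missing-parameter error
}
```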
Agent Access via Tools
Two tool executors following the memory_read/memory_write pattern (internal/tools/memory_read.go, internal/tools/memory_write.go):
prompt_search Tool
type PromptSearchTool struct {
	searcher Searcher
	renderer Renderer
}
func (t *PromptSearchTool) Name() string { return "prompt_search" }
// Schema:
// {
//   "type": "object",
//   "properties": {
//     "query": {"type": "string"},
//     "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
//     "tags": {"type": "array", "items": {"type": "string"}},
//     "params": {"type": "object"},
//     "limit": {"type": "integer"}
//   },
//   "required": ["query"]
// }
prompt_create Tool
type PromptCreateTool struct {
	store    Store
	embedder embedding.Embedder
	vs       vectorstore.VectorStore
}
func (t *PromptCreateTool) Name() string { return "prompt_create" }
// Schema:
// {
//   "type": "object",
//   "properties": {
//     "name": {"type": "string"},
//     "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
//     "description": {"type": "string"},
//     "content": {"type": "string"},
//     "parameters": {"type": "array", "items": {"type": "object"}},
//     "tags": {"type": "array", "items": {"type": "string"}}
//   },
//   "required": ["name", "type", "content"]
// }
Both tools are explicitly invoked by the agent — not auto-injected. Every access appears in the decision log via normal tool execution audit trail.
Sub-Phases
| Sub-Phase | Name | Prompts | Depends On |
|---|---|---|---|
| 18A | Foundation: Types, Store, Hash, Renderer, Migration | 5 | — |
| 18B | Search + Ranking: Embedder Wiring, Searcher, Scorer | 5 | 18A |
| 18C | Agent Integration + Feedback: Tools, Activities, Wiring | 4 | 18B |
| 18D | CLI, Testing & Ops: Seed, Query, Feedback CLIs, Tests | 4 | 18C |
Total: 4 sub-phases, 18 prompts, 9 documentation files
Dependency Graph
18A (Foundation) → 18B (Search/Ranking) → 18C (Agent Integration) → 18D (CLI/Testing)
Strictly sequential: each sub-phase builds on the previous.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CRUVERO_PROMPTLIB_ENABLED | true | Enable prompt library |
| CRUVERO_PROMPTLIB_COLLECTION | prompt_library | Vector store collection name |
| CRUVERO_PROMPTLIB_SEARCH_K | 20 | Vector retrieval candidates (Stage 1) |
| CRUVERO_PROMPTLIB_RESULT_LIMIT | 5 | Max results returned to agent |
| CRUVERO_PROMPTLIB_W_SIMILARITY | 0.4 | Ranking weight: vector similarity |
| CRUVERO_PROMPTLIB_W_QUALITY | 0.3 | Ranking weight: quality score |
| CRUVERO_PROMPTLIB_W_RECENCY | 0.2 | Ranking weight: recency decay |
| CRUVERO_PROMPTLIB_W_USAGE | 0.1 | Ranking weight: usage frequency |
| CRUVERO_PROMPTLIB_HALF_LIFE | 168h | Recency decay half-life (7 days) |
| CRUVERO_PROMPTLIB_FEEDBACK_ENABLED | true | Enable user feedback recording |
| CRUVERO_PROMPTLIB_AUTO_FEEDBACK | true | Enable LLM self-assessment after prompt use |
Files Overview
New Files
| File | Sub-Phase | Description |
|---|---|---|
| internal/promptlib/types.go | 18A | Prompt, PromptType, ParamDef, PromptMetrics, ScoredPrompt, ScoreComponents |
| internal/promptlib/store.go | 18A | Store interface + PostgresStore (CRUD, hash immutability) |
| internal/promptlib/metrics_store.go | 18A | MetricsStore interface + PostgresMetricsStore |
| internal/promptlib/hash.go | 18A | ComputeHash (SHA256, mirrors registry pattern) |
| internal/promptlib/renderer.go | 18A | Renderer interface + TemplateRenderer (text/template) |
| internal/promptlib/searcher.go | 18B | Searcher interface + DefaultSearcher (3-stage pipeline) |
| internal/promptlib/scorer.go | 18B | PromptScorer (ranking formula, weight config) |
| internal/promptlib/indexer.go | 18B | Indexer (embed + upsert to vector store on Put) |
| internal/promptlib/config.go | 18B | Config wiring + component assembly from env vars |
| internal/tools/prompt_search.go | 18C | PromptSearchTool executor |
| internal/tools/prompt_create.go | 18C | PromptCreateTool executor |
| internal/promptlib/feedback.go | 18C | Feedback types + RecordUsageActivity (Temporal) |
| cmd/prompt-seed/main.go | 18D | CLI to seed prompt library from YAML/JSON files |
| cmd/prompt-query/main.go | 18D | CLI to search prompt library |
| cmd/prompt-feedback/main.go | 18D | CLI to submit user feedback |
| migrations/0025_prompt_library.up.sql | 18A | Create prompts + prompt_metrics tables |
| migrations/0025_prompt_library.down.sql | 18A | Drop tables |
| internal/promptlib/types_test.go | 18D | Type validation and JSON round-trip tests |
| internal/promptlib/hash_test.go | 18D | Hash determinism and uniqueness tests |
| internal/promptlib/store_test.go | 18D | PostgresStore tests (sqlmock) |
| internal/promptlib/metrics_store_test.go | 18D | PostgresMetricsStore tests (sqlmock) |
| internal/promptlib/renderer_test.go | 18D | TemplateRenderer tests |
| internal/promptlib/indexer_test.go | 18D | Indexer tests (mock embedder + vector store) |
| internal/promptlib/scorer_test.go | 18D | PromptScorer tests |
| internal/promptlib/searcher_test.go | 18D | DefaultSearcher pipeline tests |
| internal/promptlib/config_test.go | 18D | Config loading and validation tests |
| internal/promptlib/feedback_test.go | 18D | Feedback activity tests |
| docs/manual/prompt-library.md | 18D | Feature manual page |
Modified Files
| File | Sub-Phase | Change |
|---|---|---|
| internal/tools/manager.go | 18C | Register prompt_search and prompt_create executors |
| internal/agent/activities.go | 18C | Wire optional prompt library lookup before LLM prompt construction |
| internal/config/config.go | 18B | Add promptlib config fields + env var loading |
Migration: 0025_prompt_library
-- 0025_prompt_library.up.sql
CREATE TABLE IF NOT EXISTS prompts (
    tenant_id TEXT NOT NULL DEFAULT '_global',
    id TEXT NOT NULL,
    version INTEGER NOT NULL,
    hash TEXT NOT NULL,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    content TEXT NOT NULL,
    parameters JSONB,
    tags TEXT[] DEFAULT '{}',
    author TEXT NOT NULL DEFAULT '',
    metadata JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (tenant_id, id, version),
    UNIQUE (tenant_id, hash)
);

CREATE INDEX idx_prompts_type ON prompts (tenant_id, type);
CREATE INDEX idx_prompts_tags ON prompts USING GIN (tags);
CREATE INDEX idx_prompts_hash ON prompts (hash);

CREATE TABLE IF NOT EXISTS prompt_metrics (
    prompt_hash TEXT NOT NULL PRIMARY KEY,
    tenant_id TEXT NOT NULL DEFAULT '_global',
    usage_count INTEGER NOT NULL DEFAULT 0,
    success_count INTEGER NOT NULL DEFAULT 0,
    failure_count INTEGER NOT NULL DEFAULT 0,
    total_rating DOUBLE PRECISION NOT NULL DEFAULT 0,
    rating_count INTEGER NOT NULL DEFAULT 0,
    last_used_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_prompt_metrics_tenant ON prompt_metrics (tenant_id);
Success Metrics
| Metric | Target |
|---|---|
| Prompt type coverage | 8 types (system, user, task, repair, routing, tool_description, chain_of_thought, custom) |
| Search latency (vector + re-rank) | < 50ms p99 |
| Store immutability | Hash verification on every Put (0 content mutations) |
| Template rendering | < 1ms p99 |
| Feedback recording | Non-blocking, < 5ms fire-and-forget |
| Agent tool integration | prompt_search + prompt_create registered and functional |
| Multi-tenant isolation | All queries scoped by tenant_id |
| Quality signal accuracy | LLM auto-rating within 0.15 of user rating (when both present) |
| Backward compatibility | Existing prompt construction in activities.go unchanged when library disabled |
| Test coverage | >= 80% for internal/promptlib/ (enforced by scripts/check-coverage.sh) |
Risk Mitigation
| Risk | Mitigation |
|---|---|
| Cold start (empty library) | cmd/prompt-seed CLI pre-loads curated prompts. Library search returns empty gracefully — agent falls back to hardcoded builders. |
| Low-quality prompt proliferation | Quality score incorporates success rate + LLM rating. Low-quality prompts naturally sink in rankings. |
| Template injection via parameters | Templates are parsed only from immutable prompt content; parameter values are passed as data to Execute and cannot introduce new template directives. (Note: text/template performs no HTML escaping, unlike html/template — none is needed for plain-text prompts.) Parameter validation enforces types. |
| Vector search latency at scale | Stage 1 retrieval bounded by K=20. Re-ranking is in-memory, O(K log K). Collection uses existing vector infrastructure. |
| Embedding cost for indexing | Embeddings generated once on Put, cached in vector store. Search embeds query only (single call). |
| Breaking existing prompt construction | Library is opt-in. When disabled (CRUVERO_PROMPTLIB_ENABLED=false), activities.go prompt builders are unchanged. |
Relationship to Other Phases
| Phase | Relationship |
|---|---|
| Phase 5 (Memory) | 18B reuses memory.ComputeRecency and ComputeUsageFrequency for ranking |
| Phase 6 (Tool Registry) | 18A mirrors registry.Store immutability pattern (hash, ON CONFLICT, tenant isolation) |
| Phase 8 (Embeddings + Vector) | 18B reuses embedding.Embedder and vectorstore.VectorStore with new collection |
| Phase 14 (API) | API endpoints can expose prompt library search/create via existing route patterns |
| Phase 17 (PII Guard) | PII filtering applies to prompt content at output boundary (no special handling needed) |
Progress Notes
(none yet)