Source: `docs/manual/prompt-tools.md`. This page is generated by `site/scripts/sync-manual-docs.mjs`.

# Prompt Engineering Tools Guide

Cruvero ships a prompt-engineering CLI suite for dataset management, evaluation, experimentation, and version diffing.

Source: `cmd/prompt-tools/*`, `cmd/prompt-eval/*`, `cmd/prompt-dataset/*`, `cmd/prompt-experiment/*`, `cmd/prompt-diff/*`, `internal/promptcli/*`, `internal/promptlib/*`
## Architecture

`prompt-tools` is a dispatcher CLI that routes to dedicated subcommands:

- `prompt-tools eval ...` -> `prompt-eval`
- `prompt-tools dataset ...` -> `prompt-dataset`
- `prompt-tools experiment ...` -> `prompt-experiment`
- `prompt-tools diff ...` -> `prompt-diff`

The same binaries can also be executed directly.
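For instance, the two invocations below are equivalent (shown with `go run`, matching the examples later in this page):

```bash
# Routed through the dispatcher
go run ./cmd/prompt-tools diff --prompt incident_classifier --from 7 --to 9

# Equivalent direct invocation of the dedicated binary
go run ./cmd/prompt-diff --prompt incident_classifier --from 7 --to 9
```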
## Command Matrix

| Command | Primary Use | Backing Package |
|---|---|---|
| `prompt-eval` | Evaluate prompt output quality against a dataset | `internal/promptcli/evalcli` |
| `prompt-dataset` | Create/list/get datasets and build from audit logs | `internal/promptcli/datasetcli` |
| `prompt-experiment` | Manage A/B prompt experiments and winners | `internal/promptcli/experimentcli` |
| `prompt-diff` | Compare prompt versions | `internal/promptcli/diffcli` |
## `prompt-eval`

Runs prompt evaluations and computes summary pass/fail metrics.

### Key flags

| Flag | Description |
|---|---|
| `--prompt-hash` | Prompt hash to evaluate (required) |
| `--dataset` / `--dataset-version` | Dataset id/version (id required) |
| `--scorers` | Comma-separated scorers (default `exact_match`) |
| `--threshold` | Pass threshold (default 0.8) |
| `--fail-on-regression` | Exit non-zero when a regression is detected |
| `--baseline-run` / `--regression-baseline` | Baseline run strategy (`auto` or a run id) |
| `--tenant` | Tenant id (default `default`) |
| `--format` | Output format (e.g. `text`, `markdown`) |
| `--ci` / `--github-summary` | CI-friendly output modes |
| `--notify` / `--notify-subject` | Publish completion event to NATS |
### Example

```bash
go run ./cmd/prompt-eval \
  --prompt-hash ph_abc123 \
  --dataset support-regression \
  --dataset-version 3 \
  --scorers exact_match,semantic_similarity \
  --threshold 0.85 \
  --fail-on-regression \
  --regression-baseline auto \
  --format markdown
```
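In CI, the same evaluation can be combined with the CI-oriented flags from the table above. This is a sketch of a pipeline step, not a verified workflow:

```bash
# Fail the pipeline step on regression and emit CI-friendly output
go run ./cmd/prompt-eval \
  --prompt-hash ph_abc123 \
  --dataset support-regression \
  --fail-on-regression \
  --ci \
  --github-summary
```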
## `prompt-dataset`

Manages evaluation datasets in Postgres and can generate datasets from audit logs.

### Key flags

| Flag | Description |
|---|---|
| `--create <file>` | Create dataset from JSON file |
| `--list` | List datasets for tenant |
| `--get <id>` / `--version` | Get dataset by id/version |
| `--add-entries <file>` + `--dataset` | Add entries to an existing dataset |
| `--from-logs` | Build dataset from audit logs |
| `--prompt-hash` | Required with `--from-logs` |
| `--failures-only` | Keep only failed cases in `--from-logs` mode |
| `--since` / `--max-entries` | Log extraction window and cap |
| `--tenant` / `--name` | Tenant id and dataset name override |
### Example

```bash
go run ./cmd/prompt-dataset \
  --from-logs \
  --prompt-hash ph_abc123 \
  --since 168h \
  --failures-only \
  --max-entries 300 \
  --name support-regression-v2
```
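For `--create`, the dataset is read from a JSON file. The sketch below shows a plausible shape; the field names (`name`, `entries`, `input`, `expected`) are illustrative assumptions, not the verified schema:

```bash
# Field names in this payload are assumptions for illustration only
cat > support-dataset.json <<'EOF'
{
  "name": "support-regression",
  "entries": [
    {"input": "My invoice total looks wrong", "expected": "billing"},
    {"input": "The app crashes on login", "expected": "bug"}
  ]
}
EOF

go run ./cmd/prompt-dataset --create support-dataset.json --tenant default
```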
## `prompt-experiment`

Creates and tracks prompt experiments with variant winner selection.

### Key flags

| Flag | Description |
|---|---|
| `--create <file>` | Create experiment from JSON |
| `--list` | List tenant experiments |
| `--get <id>` | Fetch experiment |
| `--complete <id>` | Mark experiment complete |
| `--winner <name>` | Winner variant name for completion |
| `--tenant` | Tenant id |
### Example

```bash
go run ./cmd/prompt-experiment --complete exp-173 --winner concise_v2 --tenant default
```
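A `--create` definition is also read from JSON. The sketch below shows one plausible shape; the field names (`name`, `prompt`, `variants`) are illustrative assumptions, not the verified schema:

```bash
# Field names in this payload are assumptions for illustration only
cat > tone-experiment.json <<'EOF'
{
  "name": "classifier-tone-test",
  "prompt": "incident_classifier",
  "variants": [
    {"name": "baseline"},
    {"name": "concise_v2"}
  ]
}
EOF

go run ./cmd/prompt-experiment --create tone-experiment.json --tenant default
```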
## `prompt-diff`

Computes a diff between prompt versions with text or JSON output.

### Key flags

| Flag | Description |
|---|---|
| `--prompt` | Prompt id (required) |
| `--from` | Source version (required) |
| `--to` | Target version (default: latest) |
| `--json` | JSON diff output |
| `--tenant` | Tenant id |
### Example

```bash
go run ./cmd/prompt-diff --prompt incident_classifier --from 7 --to 9
```
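With `--json`, the same diff becomes machine-readable for scripting or review tooling; the output schema is not documented in this guide:

```bash
# Machine-readable diff output (schema not shown here)
go run ./cmd/prompt-diff --prompt incident_classifier --from 7 --to 9 --json
```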
## Configuration

### Runtime dependencies

| Variable | Purpose |
|---|---|
| `CRUVERO_POSTGRES_URL` | Prompt library, datasets, and experiment storage |
| `CRUVERO_LLM_PROVIDER` | Active provider for evaluation calls |
| `CRUVERO_OPENROUTER_API_KEY` / `CRUVERO_OPENROUTER_MODEL` | OpenRouter provider settings |
| `CRUVERO_OPENAI_API_KEY` / `CRUVERO_OPENAI_MODEL` | OpenAI provider settings |
| `CRUVERO_GOOGLE_API_KEY` / `CRUVERO_GOOGLE_MODEL` | Google provider settings |
| `CRUVERO_ANTHROPIC_API_KEY` / `CRUVERO_ANTHROPIC_MODEL` | Anthropic provider settings |
### Prompt library controls

| Variable | Purpose |
|---|---|
| `CRUVERO_PROMPTLIB_EVAL_ENABLED` | Enable evaluation paths |
| `CRUVERO_PROMPTLIB_EVAL_TIMEOUT` | Evaluation timeout budget |
| `CRUVERO_PROMPTLIB_EVAL_MAX_CONCURRENT` | Eval parallelism cap |
| `CRUVERO_PROMPTLIB_DIFF_CONTEXT_LINES` | Context lines in computed prompt diffs |
| `CRUVERO_PROMPTLIB_EXPERIMENTS_ENABLED` | Experiment feature switch |
| `CRUVERO_PROMPTLIB_EXPERIMENT_MAX_VARIANTS` | Max variants per experiment |
| `CRUVERO_PROMPTLIB_SNIPPETS_ENABLED` | Snippet composition support |
| `CRUVERO_PROMPTLIB_SNIPPET_MAX_DEPTH` | Max snippet nesting depth |
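A minimal local environment for running evaluations might look like the fragment below; every value is a placeholder, and only the variable names come from the tables above:

```bash
# Placeholder values -- substitute real connection strings and credentials
export CRUVERO_POSTGRES_URL="postgres://cruvero:secret@localhost:5432/cruvero?sslmode=disable"
export CRUVERO_LLM_PROVIDER="openai"
export CRUVERO_OPENAI_API_KEY="sk-placeholder"
export CRUVERO_OPENAI_MODEL="gpt-4o-mini"
export CRUVERO_PROMPTLIB_EVAL_ENABLED="true"
export CRUVERO_PROMPTLIB_EVAL_MAX_CONCURRENT="4"
```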
## Integration with Prompt Library v2

The CLI suite and Prompt Library v2 share the same storage and scoring primitives:

- `prompt-dataset` creates datasets consumed by `prompt-eval`.
- `prompt-eval` writes `eval_runs` and `eval_results` used by prompt governance workflows.
- `prompt-experiment` persists experiment state and winner metadata used by promotion flows.
- `prompt-diff` uses the same diff engine used in UI/version review paths.