Phase 9: Enterprise Hardening

Status: Completed (2026-02-09)

Phase 9 makes Cruvero production-ready for multi-tenant workloads by treating security, compliance, and operational resilience as infrastructure guarantees.

Why This Phase Matters

Cruvero's core value proposition is "production survival." Phases 1–8 built the runtime, tools, memory, multi-agent coordination, and observability. Phase 9 ensures the platform can be operated by teams you don't control, for workloads you didn't anticipate, under compliance regimes you must satisfy.

This is the difference between "works on my machine" and "SOC 2 auditor approved."

Design Philosophy

Tenant isolation is not a feature; it is a property of the architecture. Every boundary (namespace, quota, network, audit) is enforced at the infrastructure layer (Temporal namespaces, Postgres row-level security, network policies) rather than by application-level checks that can be bypassed.

Zero-trust by default. Every tool call, LLM invocation, and state mutation is authenticated, authorized, and auditable. Development environments may explicitly opt out of security controls; production should never depend on someone remembering to opt in.

Compliance as code. Audit trails, PII detection, and export formats are automated pipelines — not manual processes bolted on after the fact.

Subphases

Subphase	Scope	Est. Duration
9A	Multi-Tenancy & Namespace Isolation	2 weeks
9B	Rate Limiting, Quotas & Cost Guardrails	1.5 weeks
9C	Audit Logging & Compliance	2 weeks
9D	Security Hardening & I/O Sanitization	2 weeks
9E	High Availability & Disaster Recovery	1.5 weeks

Total estimated: 8–10 weeks (the subphase estimates above sum to 9 weeks; some work can run in parallel)

Subphase Index

Sub	Title	Key Deliverable	Prompts
9A	Multi-Tenancy & Namespace Isolation	Tenant CRUD, Temporal namespaces, RLS, memory/registry scoping	4 prompts
9B	Rate Limiting, Quotas & Cost Guardrails	Token bucket limiter, cost caps, model downgrade, quota dashboard	3 prompts
9C	Audit Logging & Compliance	Hash-chained audit trail, PII detection, SOC 2/HIPAA exports	3 prompts
9D	Security Hardening & I/O Sanitization	gVisor/nsjail sandbox, prompt injection defense, network policies, Vault	4 prompts
9E	High Availability & Disaster Recovery	Health checks, LLM failover, K8s manifests, DR playbook, runbooks	3 prompts

Dependencies

Phase 2 (signals, queries, decision log) — required
Phase 4 (memory) — required for tenant-scoped memory isolation
Phase 5 (supervisor) — required for multi-tenant agent coordination
Phase 6B (cost tracking) — required for quota enforcement
Phase 8C (observability, auth) — required for OIDC integration and OTEL pipeline

Architecture Decisions

Tenant Model

One Temporal namespace per tenant. This gives hard isolation at the workflow engine level — tenants cannot see, signal, or query each other's workflows. The alternative (shared namespace with workflow-ID prefixing) was rejected because it relies on application-level enforcement and breaks Temporal's native access controls.
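
As a rough illustration of the tenant-to-namespace mapping, the sketch below resolves a namespace name from a tenant ID. The `mode` and `defaultNS` parameters mirror `CRUVERO_TENANT_MODE` and `CRUVERO_TENANT_DEFAULT_NAMESPACE`; the function name, the `tenant-` prefix, and the ID validation rule are assumptions made for this example, not the actual contents of `internal/tenant/namespace.go`.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// tenantIDPattern is an assumed constraint (lowercase letters, digits, and
// hyphens, 2 to 40 characters). The real validation rules are not specified.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,38}[a-z0-9]$`)

// ErrInvalidTenantID is returned when a tenant ID fails validation.
var ErrInvalidTenantID = errors.New("invalid tenant ID")

// NamespaceFor resolves the Temporal namespace a tenant's workflows run in.
// In single-tenant mode every caller shares the default namespace; in
// multi-tenant mode each tenant gets its own. The "tenant-" prefix is
// hypothetical.
func NamespaceFor(mode, defaultNS, tenantID string) (string, error) {
	if mode != "multi" {
		return defaultNS, nil
	}
	if !tenantIDPattern.MatchString(tenantID) {
		return "", fmt.Errorf("%w: %q", ErrInvalidTenantID, tenantID)
	}
	return "tenant-" + tenantID, nil
}

func main() {
	ns, err := NamespaceFor("multi", "default", "acme")
	fmt.Println(ns, err)
}
```

Validating the ID before deriving the namespace name matters here because the name becomes the isolation boundary: a malformed ID should fail loudly rather than resolve to an unexpected namespace.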

Quota Enforcement Layer

Quotas are enforced via a middleware activity wrapper that checks tenant limits before every LLM call and tool execution. This is not a rate limiter in front of the API — it's baked into the workflow execution path, so even replayed or continued-as-new workflows respect current quotas.
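
The wrapper pattern described above might look roughly like the following. This is a minimal sketch, not the code in `internal/tenant/middleware.go`: the `Limits`, `Usage`, `QuotaStore`, `Call`, and `WithQuota` names are hypothetical, the store is in-memory, and a real activity would also take a `context.Context`. The point it illustrates is that limits and usage are read at call time, inside the execution path.

```go
package main

import (
	"errors"
	"fmt"
)

// Limits and Usage are hypothetical shapes for per-tenant daily quotas.
type Limits struct {
	TokensPerDay  int64
	CostUSDPerDay float64
}

type Usage struct {
	Tokens  int64
	CostUSD float64
}

// ErrQuotaExceeded is returned instead of running the wrapped call.
var ErrQuotaExceeded = errors.New("quota exceeded")

// QuotaStore is an assumed interface over persisted quota state.
type QuotaStore interface {
	Limits(tenantID string) Limits
	Usage(tenantID string) Usage
}

// Call stands in for an LLM invocation or tool execution.
type Call func() (string, error)

// WithQuota wraps next so that the tenant's current limits are checked
// immediately before every execution. A tenant with no configured limits
// has zero-valued limits and is therefore denied.
func WithQuota(store QuotaStore, tenantID string, next Call) Call {
	return func() (string, error) {
		l, u := store.Limits(tenantID), store.Usage(tenantID)
		if u.Tokens >= l.TokensPerDay || u.CostUSD >= l.CostUSDPerDay {
			return "", fmt.Errorf("%w for tenant %s", ErrQuotaExceeded, tenantID)
		}
		return next()
	}
}

// memStore is an in-memory QuotaStore used only for this demonstration.
type memStore struct {
	limits map[string]Limits
	usage  map[string]Usage
}

func (m memStore) Limits(id string) Limits { return m.limits[id] }
func (m memStore) Usage(id string) Usage   { return m.usage[id] }

func main() {
	store := memStore{
		limits: map[string]Limits{"acme": {TokensPerDay: 1_000_000, CostUSDPerDay: 100.0}},
		usage:  map[string]Usage{"acme": {Tokens: 1_000_000, CostUSD: 42.0}},
	}
	llmCall := func() (string, error) { return "completion", nil }

	// The tenant has used its full daily token budget, so the call is refused.
	_, err := WithQuota(store, "acme", llmCall)()
	fmt.Println(err)
}
```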

Audit Storage

Audit events go to an append-only Postgres table with hash chaining: each event includes the hash of the previous event. This provides tamper evidence without requiring external blockchain infrastructure. Export pipelines produce SOC 2 and HIPAA-compatible formats.
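
To make the hash-chaining idea concrete, here is a small in-memory sketch using SHA-256. The `AuditEvent` fields, the payload layout, and the `Append` and `Verify` helpers are assumptions for illustration; the actual schema in `0016_audit_log` and the logic in `internal/audit/chain.go` may differ, and a production version would hash a canonical serialization of the full event.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AuditEvent is a hypothetical, reduced event shape.
type AuditEvent struct {
	Seq      int
	TenantID string
	Action   string
	PrevHash string // hash of the previous event; empty for the first event
	Hash     string // hash over this event's fields, including PrevHash
}

// computeHash hashes the event's content together with the previous hash.
// Strings are quoted with %q so field boundaries stay unambiguous.
func computeHash(e AuditEvent) string {
	payload := fmt.Sprintf("%d|%q|%q|%s", e.Seq, e.TenantID, e.Action, e.PrevHash)
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

// Append links a new event to the tail of the chain.
func Append(chain []AuditEvent, tenantID, action string) []AuditEvent {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	e := AuditEvent{Seq: len(chain) + 1, TenantID: tenantID, Action: action, PrevHash: prev}
	e.Hash = computeHash(e)
	return append(chain, e)
}

// Verify returns the index of the first event whose link or hash does not
// check out, or -1 if the whole chain is intact.
func Verify(chain []AuditEvent) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != computeHash(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []AuditEvent
	chain = Append(chain, "acme", "tool.call")
	chain = Append(chain, "acme", "llm.invoke")
	chain = Append(chain, "acme", "state.mutate")
	fmt.Println(Verify(chain)) // -1: chain is intact

	chain[1].Action = "edited after the fact"
	fmt.Println(Verify(chain)) // 1: tampering detected at the second event
}
```

Because every hash covers the previous hash, changing one row invalidates that row and breaks the link for everything after it unless all later hashes are recomputed, which is what makes the table tamper-evident rather than tamper-proof.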

Security Layers

Layer	Mechanism
Tool sandbox	gVisor/nsjail for python_exec/bash_exec
Input sanitization	Pre-LLM prompt injection detection
Output filtering	PII redaction, sensitive data masking
Network policies	Per-tool egress rules, deny-by-default
Secret injection	Vault/OIDC per-tenant, no env vars in prod
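
The deny-by-default rule for per-tool egress can be sketched as a simple allowlist lookup. This only illustrates rule evaluation; it is not the contents of `internal/security/network_policy.go`, and actual enforcement is expected to happen at the network layer. The `EgressPolicy` type, its `Allowed` method, and the `pypi.org` entry are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// EgressPolicy maps a tool name to the hosts that tool may contact.
// Anything not listed is denied.
type EgressPolicy map[string][]string

// Allowed reports whether tool may open a connection to host.
// A tool with no entry has an empty allowlist, so every host is refused.
func (p EgressPolicy) Allowed(tool, host string) bool {
	for _, h := range p[tool] {
		if strings.EqualFold(h, host) {
			return true
		}
	}
	return false
}

func main() {
	policy := EgressPolicy{
		"python_exec": {"pypi.org"}, // hypothetical allowlist entry
	}
	fmt.Println(policy.Allowed("python_exec", "pypi.org")) // true
	fmt.Println(policy.Allowed("python_exec", "example.com")) // false
	fmt.Println(policy.Allowed("bash_exec", "pypi.org")) // false: no entry, so denied
}
```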

Key Files (New)

internal/tenant/
  config.go          # TenantConfig, ResourceQuotas, RateLimits
  store.go           # TenantStore interface
  postgres_store.go  # Postgres implementation
  middleware.go      # Activity middleware for quota enforcement
  namespace.go       # Temporal namespace management

internal/quota/
  limiter.go         # Token bucket + sliding window
  policy.go          # QuotaPolicy evaluation
  store.go           # Quota state persistence

internal/audit/
  event.go           # AuditEvent types
  logger.go          # Append-only audit writer
  chain.go           # Hash chain verification
  export.go          # SOC2/HIPAA export
  pii.go             # PII detection + redaction

internal/security/
  sanitizer.go       # Input sanitization
  output_filter.go   # Output filtering
  network_policy.go  # Per-tool egress rules
  sandbox.go         # Enhanced sandbox (gVisor/nsjail)

migrations/
  0013_tenants.up.sql / down.sql
  0014_tenant_usage_daily.up.sql / down.sql
  0015_quotas.up.sql / down.sql
  0016_audit_log.up.sql / down.sql

Exit Criteria (Phase 9 Complete)

Closeout Gaps and Future-Proof Backlog (2026-02-07)

Audit UI surface tracked in Phase 7F (docs/phases/PHASE7F.md) and implemented in legacy UI bridge pages.
Security alerts UI surface tracked in Phase 7F (docs/phases/PHASE7F.md) and implemented in legacy UI bridge pages.
Host-level sandbox integration tests added (tagged security,integration; opt-in via CRUVERO_RUN_HOST_SANDBOX_TESTS=true).
Alert rules as code added under deploy/monitoring/ (Prometheus + Loki).
DR and HA drill automation scripts added under scripts/ops/.
Security posture and DR readiness checklists added under docs/operations/checklists/.
Execute staged HA/DR drills and attach reports to release evidence.

Environment Variables (New)

# Tenancy
CRUVERO_TENANT_MODE=single|multi          # default: single
CRUVERO_TENANT_STORE=postgres             # default: postgres
CRUVERO_TENANT_DEFAULT_NAMESPACE=default

# Quotas
CRUVERO_QUOTA_ENABLED=true|false          # default: true
CRUVERO_QUOTA_DEFAULT_RPM=60              # requests per minute
CRUVERO_QUOTA_DEFAULT_TPD=1000000         # tokens per day
CRUVERO_QUOTA_DEFAULT_COST_USD=100.0      # max daily cost

# Audit
CRUVERO_AUDIT_ENABLED=true|false          # default: false
CRUVERO_AUDIT_PII_DETECTION=true|false    # default: false
CRUVERO_AUDIT_EXPORT_FORMAT=soc2|hipaa|json
CRUVERO_AUDIT_RETENTION_DAYS=365

# Security
CRUVERO_SANDBOX_MODE=process|gvisor|nsjail  # default: process
CRUVERO_INPUT_SANITIZATION=true|false       # default: false
CRUVERO_OUTPUT_PII_REDACTION=true|false     # default: true
CRUVERO_NETWORK_POLICY_ENABLED=true|false   # default: false
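
A loader for a subset of these variables could look like the sketch below, which applies the documented defaults and rejects unsupported enum values. The `Config` struct and `LoadConfig` function are hypothetical names, and passing the lookup function as a parameter is a choice made here so the example can be tested without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a subset of the Phase 9 settings (hypothetical struct).
type Config struct {
	TenantMode          string
	QuotaEnabled        bool
	QuotaDefaultRPM     int
	QuotaDefaultCostUSD float64
	AuditEnabled        bool
	SandboxMode         string
}

// LoadConfig reads settings through lookup (for example os.LookupEnv),
// falling back to the documented defaults when a variable is unset or empty.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		TenantMode:  get("CRUVERO_TENANT_MODE", "single"),
		SandboxMode: get("CRUVERO_SANDBOX_MODE", "process"),
	}
	if cfg.TenantMode != "single" && cfg.TenantMode != "multi" {
		return Config{}, fmt.Errorf("CRUVERO_TENANT_MODE: unsupported value %q", cfg.TenantMode)
	}
	switch cfg.SandboxMode {
	case "process", "gvisor", "nsjail":
	default:
		return Config{}, fmt.Errorf("CRUVERO_SANDBOX_MODE: unsupported value %q", cfg.SandboxMode)
	}

	var err error
	if cfg.QuotaEnabled, err = strconv.ParseBool(get("CRUVERO_QUOTA_ENABLED", "true")); err != nil {
		return Config{}, fmt.Errorf("CRUVERO_QUOTA_ENABLED: %w", err)
	}
	if cfg.QuotaDefaultRPM, err = strconv.Atoi(get("CRUVERO_QUOTA_DEFAULT_RPM", "60")); err != nil {
		return Config{}, fmt.Errorf("CRUVERO_QUOTA_DEFAULT_RPM: %w", err)
	}
	if cfg.QuotaDefaultCostUSD, err = strconv.ParseFloat(get("CRUVERO_QUOTA_DEFAULT_COST_USD", "100.0"), 64); err != nil {
		return Config{}, fmt.Errorf("CRUVERO_QUOTA_DEFAULT_COST_USD: %w", err)
	}
	if cfg.AuditEnabled, err = strconv.ParseBool(get("CRUVERO_AUDIT_ENABLED", "false")); err != nil {
		return Config{}, fmt.Errorf("CRUVERO_AUDIT_ENABLED: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Failing at startup on an unknown `CRUVERO_SANDBOX_MODE` or `CRUVERO_TENANT_MODE` avoids silently falling back to a weaker setting than the operator intended.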
