Phase 9: Enterprise Hardening
Status: Completed (2026-02-09)
Makes Cruvero production-ready for serious multi-tenant workloads, delivering security, compliance, and operational resilience as infrastructure guarantees.
Why This Phase Matters
Cruvero's core value proposition is "production survival." Phases 1–8 built the runtime, tools, memory, multi-agent coordination, and observability. Phase 9 ensures the platform can be operated by teams you don't control, for workloads you didn't anticipate, under compliance regimes you must satisfy.
This is the difference between "works on my machine" and "SOC 2 auditor approved."
Design Philosophy
Tenant isolation is not a feature; it is a property of the architecture. Every boundary (namespace, quota, network, audit) is enforced at the infrastructure layer (Temporal namespaces, Postgres row-level security, network policies) rather than by application-level checks that can be bypassed.
Zero-trust by default. Every tool call, LLM invocation, and state mutation is authenticated, authorized, and auditable. Security can be explicitly disabled for development; production never depends on opting in.
Compliance as code. Audit trails, PII detection, and export formats are automated pipelines, not manual processes bolted on after the fact.
Subphases
| Subphase | Scope | Est. Duration |
|---|---|---|
| 9A | Multi-Tenancy & Namespace Isolation | 2 weeks |
| 9B | Rate Limiting, Quotas & Cost Guardrails | 1.5 weeks |
| 9C | Audit Logging & Compliance | 2 weeks |
| 9D | Security Hardening & I/O Sanitization | 2 weeks |
| 9E | High Availability & Disaster Recovery | 1.5 weeks |
Total estimated: 8–10 weeks (the subphase estimates sum to 9 weeks; some subphases can run in parallel)
Subphase Index
| Sub | Title | Key Deliverable | Prompts |
|---|---|---|---|
| 9A | Multi-Tenancy & Namespace Isolation | Tenant CRUD, Temporal namespaces, RLS, memory/registry scoping | 4 prompts |
| 9B | Rate Limiting, Quotas & Cost Guardrails | Token bucket limiter, cost caps, model downgrade, quota dashboard | 3 prompts |
| 9C | Audit Logging & Compliance | Hash-chained audit trail, PII detection, SOC 2/HIPAA exports | 3 prompts |
| 9D | Security Hardening & I/O Sanitization | gVisor/nsjail sandbox, prompt injection defense, network policies, Vault | 4 prompts |
| 9E | High Availability & Disaster Recovery | Health checks, LLM failover, K8s manifests, DR playbook, runbooks | 3 prompts |
Dependencies
- Phase 2 (signals, queries, decision log): required
- Phase 4 (memory): required for tenant-scoped memory isolation
- Phase 5 (supervisor): required for multi-tenant agent coordination
- Phase 6B (cost tracking): required for quota enforcement
- Phase 8C (observability, auth): required for OIDC integration and OTEL pipeline
Architecture Decisions
Tenant Model
One Temporal namespace per tenant. This gives hard isolation at the workflow engine level โ tenants cannot see, signal, or query each other's workflows. The alternative (shared namespace with workflow-ID prefixing) was rejected because it relies on application-level enforcement and breaks Temporal's native access controls.
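The tenant-to-namespace mapping can be sketched as follows. The sketch reads the `CRUVERO_TENANT_MODE` and `CRUVERO_TENANT_DEFAULT_NAMESPACE` settings listed under Environment Variables. The `tenant-` prefix, the tenant-ID pattern, and `NamespaceFor` are hypothetical. Validating the ID before building the name prevents a crafted ID from aliasing another tenant's namespace. The resulting name would then be passed as the `Namespace` field of the Temporal Go SDK's `client.Options`, so each client is bound to a single tenant.

```go
package main

import (
	"fmt"
	"regexp"
)

// tenantIDPattern is a hypothetical rule: lowercase letters, digits, and
// hyphens, 1-63 characters, not starting with a hyphen.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)

// NamespaceFor maps a tenant to its Temporal namespace (hypothetical helper).
// In single mode everything runs in the default namespace; in multi mode
// each tenant gets its own namespace using an illustrative "tenant-" prefix.
func NamespaceFor(mode, defaultNamespace, tenantID string) (string, error) {
	switch mode {
	case "single":
		return defaultNamespace, nil
	case "multi":
		if !tenantIDPattern.MatchString(tenantID) {
			return "", fmt.Errorf("invalid tenant id %q", tenantID)
		}
		return "tenant-" + tenantID, nil
	default:
		return "", fmt.Errorf("unknown tenant mode %q", mode)
	}
}

func main() {
	ns, err := NamespaceFor("multi", "default", "acme")
	fmt.Println(ns, err) // tenant-acme <nil>

	_, err = NamespaceFor("multi", "default", "../other")
	fmt.Println(err)
}
```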
Quota Enforcement Layer
Quotas are enforced by a middleware activity wrapper that checks tenant limits before every LLM call and tool execution. This is not a rate limiter in front of the API; it is built into the workflow execution path. As a result, every activity execution is checked against the tenant's current quotas. That includes retries, activities re-run after a workflow reset, and activities in continued-as-new runs.
Audit Storage
Audit events go to an append-only Postgres table with hash chaining: each event includes the hash of the previous event. This provides tamper evidence without requiring external blockchain infrastructure. Export pipelines produce SOC 2- and HIPAA-compatible formats.
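Here is a minimal sketch of hash chaining and verification using SHA-256 from Go's standard library. The `AuditEvent` fields and the pipe-delimited encoding fed to the hash are assumptions for illustration. A real implementation would hash an unambiguous canonical encoding of the full event.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AuditEvent is a simplified, hypothetical event shape.
type AuditEvent struct {
	Seq      int64
	TenantID string
	Action   string
	PrevHash string // hex SHA-256 of the previous event ("" for the first)
	Hash     string // hex SHA-256 over this event's fields plus PrevHash
}

// computeHash covers the event contents and the previous hash, so editing
// any stored event breaks its own hash and every later link.
func computeHash(e AuditEvent) string {
	h := sha256.New()
	fmt.Fprintf(h, "%d|%s|%s|%s", e.Seq, e.TenantID, e.Action, e.PrevHash)
	return hex.EncodeToString(h.Sum(nil))
}

// Append links a new event to the tail of the chain.
func Append(chain []AuditEvent, tenantID, action string) []AuditEvent {
	e := AuditEvent{Seq: int64(len(chain)) + 1, TenantID: tenantID, Action: action}
	if n := len(chain); n > 0 {
		e.PrevHash = chain[n-1].Hash
	}
	e.Hash = computeHash(e)
	return append(chain, e)
}

// Verify recomputes every link and returns the index of the first broken
// event, or -1 if the chain is intact.
func Verify(chain []AuditEvent) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || computeHash(e) != e.Hash {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []AuditEvent
	chain = Append(chain, "acme", "tool.call")
	chain = Append(chain, "acme", "llm.invoke")
	chain = Append(chain, "acme", "state.update")
	fmt.Println(Verify(chain)) // -1

	chain[1].Action = "tampered"
	fmt.Println(Verify(chain)) // 1
}
```

The chain detects edits to stored events. However, anyone able to rewrite the entire tail could also recompute every later hash. Periodically recording the latest hash outside the table would close that gap.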
Security Layers
| Layer | Mechanism |
|---|---|
| Tool sandbox | gVisor/nsjail for python_exec/bash_exec |
| Input sanitization | Pre-LLM prompt injection detection |
| Output filtering | PII redaction, sensitive data masking |
| Network policies | Per-tool egress rules, deny-by-default |
| Secret injection | Vault/OIDC per-tenant, no env vars in prod |
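The deny-by-default rule in the Network policies row can be modeled as an allowlist lookup. This sketch expresses only the policy decision. Per the design philosophy, actual enforcement belongs at the infrastructure layer, for example in generated network policies. `EgressPolicy`, the `*.domain` wildcard syntax, and the `http_fetch` tool name are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// EgressPolicy maps a tool name to the destination hosts it may reach.
// A tool absent from the map, or a host absent from its list, is denied.
type EgressPolicy map[string][]string

// Allowed reports whether tool may connect to host. A rule "*.example.com"
// matches any subdomain of example.com but not example.com itself.
func (p EgressPolicy) Allowed(tool, host string) bool {
	host = strings.ToLower(host)
	for _, rule := range p[tool] { // nil slice for unknown tools: loop skipped
		rule = strings.ToLower(rule)
		if strings.HasPrefix(rule, "*.") {
			if strings.HasSuffix(host, rule[1:]) { // rule[1:] keeps the leading dot
				return true
			}
		} else if host == rule {
			return true
		}
	}
	return false
}

func main() {
	policy := EgressPolicy{
		"http_fetch": {"api.github.com", "*.wikipedia.org"},
	}
	fmt.Println(policy.Allowed("http_fetch", "en.wikipedia.org")) // true
	fmt.Println(policy.Allowed("http_fetch", "example.com"))      // false
	fmt.Println(policy.Allowed("bash_exec", "api.github.com"))    // false
}
```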
Key Files (New)
```text
internal/tenant/
  config.go            # TenantConfig, ResourceQuotas, RateLimits
  store.go             # TenantStore interface
  postgres_store.go    # Postgres implementation
  middleware.go        # Activity middleware for quota enforcement
  namespace.go         # Temporal namespace management
internal/quota/
  limiter.go           # Token bucket + sliding window
  policy.go            # QuotaPolicy evaluation
  store.go             # Quota state persistence
internal/audit/
  event.go             # AuditEvent types
  logger.go            # Append-only audit writer
  chain.go             # Hash chain verification
  export.go            # SOC2/HIPAA export
  pii.go               # PII detection + redaction
internal/security/
  sanitizer.go         # Input sanitization
  output_filter.go     # Output filtering
  network_policy.go    # Per-tool egress rules
  sandbox.go           # Enhanced sandbox (gVisor/nsjail)
migrations/
  0013_tenants.up.sql / down.sql
  0014_tenant_usage_daily.up.sql / down.sql
  0015_quotas.up.sql / down.sql
  0016_audit_log.up.sql / down.sql
```
Exit Criteria (Phase 9 Complete)
- Tenants fully isolated at Temporal namespace level
- Per-tenant rate limits enforced without race conditions
- Audit log tamper-evident with hash chain verification
- PII detection and redaction operational
- Compliance exports (SOC 2, HIPAA) passing validation
- Tool sandbox hardened with gVisor or nsjail
- Input sanitization blocks prompt injection patterns
- Network policies enforced per-tool
- HA deployment guide validated in staging
- DR playbook tested with failover scenario
Closeout Gaps and Future-Proof Backlog (2026-02-07)
- Audit UI surface tracked in Phase 7F (`docs/phases/PHASE7F.md`) and implemented in legacy UI bridge pages.
- Security alerts UI surface tracked in Phase 7F (`docs/phases/PHASE7F.md`) and implemented in legacy UI bridge pages.
- Host-level sandbox integration tests added (tagged `security,integration`; opt-in via `CRUVERO_RUN_HOST_SANDBOX_TESTS=true`).
- Alert rules as code added under `deploy/monitoring/` (Prometheus + Loki).
- DR and HA drill automation scripts added under `scripts/ops/`.
- Security posture and DR readiness checklists added under `docs/operations/checklists/`.
- Execute staged HA/DR drills and attach reports to release evidence.
Environment Variables (New)
```bash
# Tenancy
CRUVERO_TENANT_MODE=single|multi                 # default: single
CRUVERO_TENANT_STORE=postgres                    # default: postgres
CRUVERO_TENANT_DEFAULT_NAMESPACE=default
# Quotas
CRUVERO_QUOTA_ENABLED=true|false                 # default: true
CRUVERO_QUOTA_DEFAULT_RPM=60                     # requests per minute
CRUVERO_QUOTA_DEFAULT_TPD=1000000                # tokens per day
CRUVERO_QUOTA_DEFAULT_COST_USD=100.0             # max daily cost
# Audit
CRUVERO_AUDIT_ENABLED=true|false                 # default: false
CRUVERO_AUDIT_PII_DETECTION=true|false           # default: false
CRUVERO_AUDIT_EXPORT_FORMAT=soc2|hipaa|json
CRUVERO_AUDIT_RETENTION_DAYS=365
# Security
CRUVERO_SANDBOX_MODE=process|gvisor|nsjail       # default: process
CRUVERO_INPUT_SANITIZATION=true|false            # default: false
CRUVERO_OUTPUT_PII_REDACTION=true|false          # default: true
CRUVERO_NETWORK_POLICY_ENABLED=true|false        # default: false
```