Active study v0.1
Burn-In Testing for Autonomous Database Agents
This study treats burn-in as an operational benchmark for database agents, measuring evidence freshness, workload activity, fleet-depth probes, and torture-matrix stability over multi-day runs.
Summary
Before QueryRook earns production authority, it should survive days of boring, hostile, repetitive database work and produce evidence the buyer can inspect.
Claims
QR-CLAIM-2026-002
Burn-in as a release gate
A multi-day database-agent burn-in can expose onboarding, permission, workload-drift, and evidence-freshness failures before production authority is granted.
Methods
- Run a 168-hour AWS-hosted burn-in against a TLS-enabled Neon Postgres target and AWS shadow Postgres.
- Collect burn-in verdicts, workload pulse artifacts, fleet-depth probes, and torture-matrix summaries.
- Publish limitations and run metadata beside the headline result.
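The burn-in verdicts collected in the steps above can be pictured as small structured records. The sketch below is illustrative only: the field names (`run_id`, `commit`, `cycle`, `verdict`, `failure_count`) are assumptions, not QueryRook's actual artifact schema.

```python
import json
from datetime import datetime, timezone

def verdict_artifact(run_id, commit, cycle, verdict, failures):
    """Assemble one burn-in cycle's verdict record (hypothetical schema)."""
    return {
        "run_id": run_id,
        "commit": commit,            # commit hash of the agent under test
        "cycle": cycle,              # evidence cycle number within the 168-hour run
        "verdict": verdict,          # e.g. "pass" or "fail"
        "failure_count": failures,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# Serialize with stable key order so repeated runs produce comparable artifacts.
record = verdict_artifact("run-001", "deadbeef", 1, "pass", 0)
payload = json.dumps(record, sort_keys=True)
```

Sorting keys keeps the serialized artifact byte-stable for a given record, which matters once an integrity hash is computed over it.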
Metrics
- Evidence cycle count
- Fleet score
- Query shape count
- Torture event count
- Failure count
- p50/p95/p99 latency, where available
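The latency percentiles above can be computed with a nearest-rank rule over the recorded per-query latencies. This is one common convention, not necessarily the one the study's tooling uses; the sample data is made up for illustration.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies: 1..100 ms, one sample each.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)   # → 50
p95 = percentile(latencies_ms, 95)   # → 95
p99 = percentile(latencies_ms, 99)   # → 99
```

With skewed real-world latency distributions, p99 is far more sensitive to sample count than p50, which is one reason the study reports these only "where available".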
Limitations
- A passing burn-in proves the tested scope, not every possible customer workload.
- Small cloud instances and shared managed Postgres can introduce noise.
Reproducibility
Current status: Partially reproducible. Reports generated by ResearchOps include commit hashes, run IDs, artifact paths, required disclosures, and a SHA-256 integrity hash.
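A buyer can recompute the SHA-256 integrity hash over a report file with a few lines of standard-library Python; the streaming read below is a generic sketch, and the exact bytes the hash covers (whole file vs. a canonical subset) would need to match whatever ResearchOps hashes.

```python
import hashlib

def report_sha256(path, chunk_size=65536):
    """Stream a report file through SHA-256 so large artifacts never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value against the hash published in the report verifies the artifact was not altered after generation.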