Active study (v0.1)

Burn-In Testing for Autonomous Database Agents

This study treats burn-in as an operational benchmark for database agents, measuring evidence freshness, workload activity, fleet-depth probes, and torture-matrix stability over multi-day runs.

Summary

Before QueryRook earns production authority, it should survive days of boring, hostile, repetitive database work and produce evidence the buyer can inspect.

Claims

QR-CLAIM-2026-002

Burn-in as a release gate

A multi-day database-agent burn-in can expose onboarding, permission, workload-drift, and evidence-freshness failures before production authority is granted.

Methods

  • Run a 168-hour AWS-hosted burn-in against a TLS-enabled Neon Postgres target and AWS shadow Postgres.
  • Collect burn-in verdicts, workload pulse artifacts, fleet-depth probes, and torture-matrix summaries.
  • Publish limitations and run metadata beside the headline result.
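The release-gate logic these steps imply can be sketched as follows. This is a minimal illustration, not QueryRook's actual implementation: the names `CycleVerdict` and `burn_in_gate`, and the thresholds, are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class CycleVerdict:
    """One evidence cycle's outcome (illustrative fields)."""
    cycle_id: int
    failures: int
    evidence_fresh: bool

def burn_in_gate(verdicts, min_cycles=168, failure_budget=0):
    """Hypothetical gate: grant production authority only if enough
    evidence cycles completed, the total failure count stays within
    budget, and every cycle produced fresh evidence."""
    if len(verdicts) < min_cycles:
        return False
    total_failures = sum(v.failures for v in verdicts)
    all_fresh = all(v.evidence_fresh for v in verdicts)
    return total_failures <= failure_budget and all_fresh
```

Under this sketch, a single stale-evidence cycle or any failure beyond the budget blocks the gate, which matches the study's framing of burn-in as a hard release gate rather than a scored benchmark.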

Metrics

  • Evidence cycle count
  • Fleet score
  • Query shape count
  • Torture event count
  • Failure count
  • p50/p95/p99 latency where available
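The latency percentiles can be computed from raw per-query samples with a nearest-rank estimator; the sketch below is one common definition, assumed here rather than taken from the study's tooling.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample value such that
    at least q% of samples are at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def latency_summary(samples_ms):
    """p50/p95/p99 summary over a list of per-query latencies in ms."""
    return {q: percentile(samples_ms, q) for q in (50, 95, 99)}
```

Nearest-rank always returns an observed sample (no interpolation), which keeps tail percentiles honest when the sample count is small, as it can be on short probe windows.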

Limitations

  • A passing burn-in proves the tested scope, not every possible customer workload.
  • Small cloud instances and shared managed Postgres can introduce noise.

Reproducibility

Current status: Partially reproducible. Reports generated by ResearchOps include commit hashes, run IDs, artifact paths, required disclosures, and a SHA-256 integrity hash.
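A SHA-256 integrity hash of the kind described can be reproduced with the standard library; the record fields below are illustrative placeholders, not the actual report schema.

```python
import hashlib
import json

def integrity_hash(payload: bytes) -> str:
    """SHA-256 hex digest over report bytes; anyone holding the report
    can recompute this and compare it to the published hash."""
    return hashlib.sha256(payload).hexdigest()

# Canonicalize metadata (sorted keys) so the hash is deterministic.
# Field names here are hypothetical examples.
record = json.dumps(
    {"run_id": "example-run", "commit": "abc123"}, sort_keys=True
).encode()
digest = integrity_hash(record)
```

Canonicalizing the JSON before hashing matters: without sorted keys and a fixed encoding, two semantically identical reports could yield different digests.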