Active study (v0.1)

Burn-In Testing for Autonomous Database Agents

This study treats burn-in as an operational benchmark for database agents, measuring evidence freshness, workload activity, fleet-depth probes, and torture-matrix stability over multi-day runs.

Summary

Before QueryRook earns production authority, it should survive days of boring, hostile, repetitive database work and produce evidence the buyer can inspect.

Claims

QR-CLAIM-2026-002

Burn-in as a release gate

A multi-day database-agent burn-in can expose onboarding, permission, workload-drift, and evidence-freshness failures before production authority is granted.

Methods

  • Run a 168-hour AWS-hosted burn-in against a TLS-enabled Neon Postgres target and AWS shadow Postgres.
  • Collect burn-in verdicts, workload pulse artifacts, fleet-depth probes, and torture-matrix summaries.
  • Publish limitations and run metadata beside the headline result.
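The release-gate logic these steps imply can be sketched as follows. This is a minimal illustration, not QueryRook's actual implementation: the names `CycleVerdict` and `burn_in_gate`, and the thresholds, are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class CycleVerdict:
    """One evidence cycle's outcome (illustrative fields)."""
    cycle_id: int
    failures: int
    evidence_fresh: bool

def burn_in_gate(verdicts, min_cycles=168, failure_budget=0):
    """Hypothetical gate: grant production authority only if enough
    evidence cycles completed, the total failure count stays within
    budget, and every cycle produced fresh evidence."""
    if len(verdicts) < min_cycles:
        return False
    total_failures = sum(v.failures for v in verdicts)
    all_fresh = all(v.evidence_fresh for v in verdicts)
    return total_failures <= failure_budget and all_fresh
```

Under this sketch, a single stale-evidence cycle or any failure beyond the budget blocks the gate, which matches the study's framing of burn-in as a hard release gate rather than a scored benchmark.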

Metrics

  • Evidence cycle count
  • Fleet score
  • Query shape count
  • Torture event count
  • Failure count
  • p50/p95/p99 latency where available
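The latency percentiles can be computed from raw per-query samples with a nearest-rank estimator; the sketch below is one common definition, assumed here rather than taken from the study's tooling.

```python
import math

def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample value such that
    at least q% of samples are at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def latency_summary(samples_ms):
    """p50/p95/p99 summary over a list of per-query latencies in ms."""
    return {q: percentile(samples_ms, q) for q in (50, 95, 99)}
```

Nearest-rank always returns an observed sample (no interpolation), which keeps tail percentiles honest when the sample count is small, as it can be on short probe windows.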

Limitations

  • A passing burn-in proves the tested scope, not every possible customer workload.
  • Small cloud instances and shared managed Postgres can introduce noise.

Reproducibility

Current status: Partially reproducible. Reports generated by ResearchOps include commit hashes, run IDs, artifact paths, required disclosures, and a SHA-256 integrity hash.
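A SHA-256 integrity hash of the kind described can be reproduced with the standard library; the record fields below are illustrative placeholders, not the actual report schema.

```python
import hashlib
import json

def integrity_hash(payload: bytes) -> str:
    """SHA-256 hex digest over report bytes; anyone holding the report
    can recompute this and compare it to the published hash."""
    return hashlib.sha256(payload).hexdigest()

# Canonicalize metadata (sorted keys) so the hash is deterministic.
# Field names here are hypothetical examples.
record = json.dumps(
    {"run_id": "example-run", "commit": "abc123"}, sort_keys=True
).encode()
digest = integrity_hash(record)
```

Canonicalizing the JSON before hashing matters: without sorted keys and a fixed encoding, two semantically identical reports could yield different digests.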