
Voyage · EventFarm Parity Audit Trail

Every evaluation. Every escape. Every healing plan.

Methodology only earns its keep if its results are auditable. Every factory cycle leaves behind per-axis verdicts, evidence artifacts, and a row in the ledger below. When a bug escapes the methodology and a human catches it, the case is logged with a healing plan that names the methodology change preventing the same class of escape next time. The trail is persistent — past entries stay; charts visualize whether the methodology is getting better.

Last updated: 2026-04-30 · Methodology axes: 4 (Functional, Visual, Semantic, Honesty) · Cycles audited: 7 (incl. self-audit) · Open findings: 0 · Addressed findings: 2 · Retrospective-resolved findings: 3 · Open escapes: 1

Cumulative metrics

Where the methodology stands as of the last factory cycle.

  • Methodology axes: 4 · Functional, Visual, Semantic, Honesty
  • Pilot surfaces: 3 · EF-074, EF-073, EF-077
  • Surfaces × axes green: 7 / 9 · EF-073 and EF-077 visual axes demoted under the tightened prompt
  • Open honesty findings: 0 · All findings addressed or retrospective-resolved
  • Addressed honesty findings: 2 · Both serious; data-server-total tautology fix in aa71c38 + guard test
  • Retrospective-resolved findings: 3 · 2 from EF-074 cycle 3 (escape #1); 1 from the visual-quality pilot

Per-cycle ledger

Every cycle, every axis verdict, every artifact pointer. Newest first.

Cycle (commits) | Date | Surfaces / scope | Functional | Visual | Semantic | Honesty | Verdict
Visual-quality prompt tightening (1a0b78d · 4983338 · deploy/self-audit) | 2026-04-30 | EF-074, EF-073 (×2 viewports), EF-077; escape #2 calibration + pilot re-eval | N/A | 2 demoted | N/A | PASS (self) | Methodology hardening
Honesty findings clarity + open-issue close (018a808) | 2026-04-30 | Audit-trail rendering + close real open data-server-total tautology | N/A | N/A | ADDRESSED | PASS | Remediation
Honesty axis pilot (1e60042 · aa71c38 · f0e3076 · 4bbc4c9) | 2026-04-30 | 5 cycles audited retrospectively + self-audit | N/A | N/A | N/A | PASS (self) | Harness landed
Semantic-invariants pilot (ffe6c81 · ed9942d · 1a33202 · 79b2256) | 2026-04-30 | EF-074, EF-073, EF-077; 3 surfaces × 8 invariants | PASS | PASS | PASS | ADDRESSED · 2 serious | EF-074 → Shippable
Visualizer polish (56fc597 · 01f5b30 · bf035f8 · 5b57b45) | 2026-04-30 | EF-074, EF-073 (×2 viewports), EF-077; polish iteration ×2 | PASS | PASS | N/A | PASS (retro) | Visual closed; semantic pending
Visualizer chrome-fix (27507ae · d1b85f8 · 57d600e · 46e588e) | 2026-04-30 | SurfaceApp: suppress admin chrome for surface=visualizer | PASS | demoted | N/A | PASS (retro) | Honest demote (polish < 4)
Visual-quality pilot (2ec2c43 · cf3a226 · 4ac6ece) | 2026-04-30 | 3 EFx surfaces; first vision-evaluator run | PASS | all 4 fail | N/A | RETRO · 1 serious resolved | Harness pilot
EF-074 cycle 3, client fixes (b3364f9 · 7d054ad · ee5123f · df65cab) | 2026-04-29 | EF-074 client out-of-order + foreign-event filter | PASS | no axis yet | no axis yet | RETRO · 2 serious resolved | Demoted by user; founding escape #1 healed
EF-077 access control (637c053 · ec94b0c · f3ca200 · 976b89f · 1bd7bb7) | 2026-04-29 | /access-station, door scan, audit row, capacity | PASS | no axis yet | no axis yet | not audited | Partial: NFC + organizer admin deferred
EF-073 EFx Poll (d79a1b2 · 75f0559 · adff28b · 3a89f34) | 2026-04-29 | /poll-attendee + /poll-station net-new | PASS | no axis yet | no axis yet | not audited | Partial: organizer admin deferred
W1 mailing edge (fb6f884 · 976279c · 5323e4b) | 2026-04-29 | 99/100 → 100/100 stuck-sending fix | PASS | no UI | no axis yet | not audited | Deliverable 3 closed
EF-074 tighten (ea76be3 · abcc49c · 81a98ce · 8fb3542 · 593905b) | 2026-04-29 | Delta-injection + load wrapper; surfaced 3 client bugs | PASS | no axis yet | no axis yet | not audited | Partial: 3 honest demotions

Trends

Methodology coverage and outcomes over time. Hand-rendered for now; auto-generation comes when the dataset warrants it.

Escape ledger

Bugs caught by humans that the methodology should have caught. Each escape includes a healing plan: what was missed, root cause, and the methodology change that prevents the same class of escape next time.

Escape #2 — OPEN · logged 2026-04-30

EF-077 /access-station visual-quality false-pass — kitchen-sink rendering, debug-string leak, state contradiction, misleading affordance

Caught: 2026-04-30 by user (operator review of deployed /access-station) · Surface: visualizer.vxge-aperture.porivo.com/access-station · Cycle that false-passed: visualizer polish

What the methodology missed

The visual-quality prompt assessed aesthetic surface quality without checking whether the station rendered a coherent runtime state for a real door operator. It missed ten visible failures:

  • Kitchen-sink rendering: ALLOW / DENY / CAPACITY REACHED / LATE POLICY pills all shown together.
  • Debug-string leakage ("Checkpoint ef077-door-station").
  • ALLOW shown while capacity is reached.
  • An active, equal-weight NFC affordance while NFC proof is deferred.
  • A headline that dominates the actual scan action.
  • Audit rows without column headers.
  • Developer/database terminology ("4 rows").
  • Yellow/cream color semantics for the capacity-reached state.
  • Massive dead space in the audit panel.
  • An empty middle column beneath the checkpoint label.

Root cause

The prompt's sub-axis definitions were too loose. In particular, would_a_designer_ship was returned true on the strength of "looks intentional and fairly production-ready" framing, without testing whether the page would actually work for its named persona (here, a door operator).

Healing plan

  • Complete: Tighten the visual-quality prompt with 10 named hallmarks: coherent runtime state, no debug strings, no state/data contradictions, affordance/capability alignment, action hierarchy, table labels, no developer terminology, correct color semantics, no layout voids, and no empty regions.
  • Complete: Re-evaluate the pilot under the tightened prompt. EF-074 carried forward; EF-073 desktop and EF-077 demoted.
  • Queued: Fix factories per failed surface: /access-station first, then /poll-attendee desktop hierarchy, unless both are grouped into the same visual remediation pass.
  • Queued: Re-run after each fix until all pilot surfaces pass under the tightened prompt.
  • Queued: Wide pass blocked until the pilot is genuinely clean under the tightened prompt.

What changed structurally

The visual-quality predicate now requires functional coherence in addition to aesthetic surface quality: a single resolved state instead of kitchen-sink rendering, surface-language hygiene instead of debug identifiers, affordance/capability alignment instead of active no-op controls, and label completeness for table-like data. The methodology tightened in response to the human-caught escape, per the audit-trail commitment.
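A minimal sketch of the tightened predicate, assuming the evaluator reports each hallmark as a boolean. The hallmark identifiers, the `VisualVerdict` class, and the collapsing of the earlier aesthetic sub-axes into one `aesthetic_pass` flag are hypothetical; only `would_a_designer_ship` and the ten hallmark names come from this page.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the ten hallmarks named in the escape #2 healing plan.
HALLMARKS = (
    "coherent_runtime_state",
    "no_debug_strings",
    "no_state_data_contradictions",
    "affordance_capability_alignment",
    "action_hierarchy",
    "table_labels",
    "no_developer_terminology",
    "correct_color_semantics",
    "no_layout_voids",
    "no_empty_regions",
)


@dataclass
class VisualVerdict:
    # Hypothetical stand-in for the pre-existing aesthetic sub-axes, collapsed to one flag.
    aesthetic_pass: bool
    # Hallmark name -> whether the evaluator judged it satisfied.
    hallmarks: dict[str, bool] = field(default_factory=dict)

    def failed_hallmarks(self) -> list[str]:
        # A hallmark the evaluator did not report on counts as failed, never as passed.
        return [h for h in HALLMARKS if not self.hallmarks.get(h, False)]

    def would_a_designer_ship(self) -> bool:
        # Aesthetic quality alone no longer suffices: every hallmark must hold as well.
        return self.aesthetic_pass and not self.failed_hallmarks()


# An EF-077-style page: polished surface, but it leaks a debug string
# and shows ALLOW while capacity is reached.
all_ok = {h: True for h in HALLMARKS}
station = VisualVerdict(
    aesthetic_pass=True,
    hallmarks={**all_ok, "no_debug_strings": False, "no_state_data_contradictions": False},
)
print(station.would_a_designer_ship())  # False
print(station.failed_hallmarks())  # ['no_debug_strings', 'no_state_data_contradictions']
```

Treating an unreported hallmark as a failure is a deliberate choice in this sketch: a pass has to be earned hallmark by hallmark rather than inferred from overall polish.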

Escape #1 — Founding case · RESOLVED 2026-04-30

EF-074 cycle 3 over-claim — admin chrome + 101% rounding

Caught: 2026-04-29 by user (operator review of deployed /poll-results) · Surface: visualizer.vxge-aperture.porivo.com/poll-results · Cycle that promoted: EF-074 cycle 3 client fixes

What the methodology missed

The cycle's strict-predicate harness covered tagged elements with probes such as selector-visible, selector-absent, and color-contrast on a single marker. The page passed every probe, but the operator-visible state was broken in two distinct ways:

  • Aggregate visual quality: admin chrome (oversized h1 marketing copy, primary/secondary metric block, status-card panels stack, data-band readout, SEEDED EVENTS table with dark-on-darker text) wrapped the visualizer surface. The page looked like a marketing dashboard with the visualizer embedded as a panel.
  • Semantic correctness: option percentages displayed 37% + 27% + 21% + 16% = 101%. Naive independent rounding produced an arithmetically impossible total, and no probe checked that the displayed numbers reconcile.
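The missing probe is easy to state as a semantic invariant. A minimal sketch, with hypothetical function names and hypothetical vote counts (183, 133, 103, 81 out of 500) chosen only because rounding their shares independently reproduces the 37 + 27 + 21 + 16 display:

```python
def naive_percentages(counts: list[int]) -> list[int]:
    """Round each option's share independently (naive independent rounding).

    Assumes at least one vote has been cast.
    """
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


def percentages_reconcile(displayed: list[int]) -> bool:
    """Semantic invariant: displayed option percentages must sum to exactly 100."""
    return sum(displayed) == 100


votes = [183, 133, 103, 81]           # exact shares: 36.6%, 26.6%, 20.6%, 16.2%
shown = naive_percentages(votes)
print(shown, sum(shown))              # [37, 27, 21, 16] 101
print(percentages_reconcile(shown))   # False
```

Every selector-based probe can pass on such a page; only a check on the rendered values themselves catches the impossible total.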

Root cause

The methodology at the time had one axis: functional. The functional axis evaluated the end-to-end correctness of tagged elements, but it did not evaluate aggregate page quality (visual-axis territory) or arithmetic invariants on rendered values (semantic-axis territory). The cycle's "Shippable" verdict was broader than its probe coverage warranted: a classic over-claim of the kind the messaging audit (honesty axis) is now built to catch. Applied retroactively, the honesty audit flagged this cycle FAIL with 2 serious findings, validating the calibration.

Healing plan

  • Complete: Add the visual axis. Vision-evaluator harness with 5 sub-axes + axe-clean. Pilot on the 3 EFx surfaces. Predicate flipped green after chrome-fix + polish.
  • Complete: Add the semantic axis. Story corpus extension + invariant evaluator + largest-remainder rounding util. Predicate flipped green after the EF-074 percentage rounding fix.
  • Complete: Add the honesty axis. Static code scan + post-hoc message audit. Retrospective audit of EF-074 cycle 3 returned FAIL (2 serious), confirming the harness flags this exact case.
  • Queued: Four-axis wide pass across all 30 currently-Shippable rows + EFx Partial rows. No row stays Shippable until all four axes pass.
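The semantic-axis step above mentions a largest-remainder rounding util. The project's own util is not shown on this page; the sketch below is a generic implementation of the largest-remainder (Hare quota) method, using integer arithmetic so that equal remainders compare exactly. Breaking ties by option order is an assumption; any deterministic rule would do.

```python
def largest_remainder_percentages(counts: list[int]) -> list[int]:
    """Apportion exactly 100 percentage points across options by largest remainder."""
    total = sum(counts)
    if total == 0:
        return [0] * len(counts)  # empty poll: nothing to apportion
    # Integer floor and remainder of each exact share, 100 * c / total.
    floors = [100 * c // total for c in counts]
    remainders = [100 * c % total for c in counts]
    # Points lost to flooring; always fewer than the number of options.
    shortfall = 100 - sum(floors)
    # One extra point to each of the `shortfall` largest remainders,
    # ties broken by original option order.
    order = sorted(range(len(counts)), key=lambda i: (-remainders[i], i))
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


print(largest_remainder_percentages([183, 133, 103, 81]))  # [37, 27, 20, 16]
```

For counts 183/133/103/81 the floors sum to 98 and three options tie on a 0.6-point remainder, so option order decides which two round up; the total is 100 by construction.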

What changed structurally

"Shippable" is no longer a claim earned by passing the probes the cycle authored. It is a claim that requires evidence on all four axes. The honesty axis specifically watches for the pattern that produced this escape: cycles that conveniently choose probe sets that exclude the dimensions where there's residual risk. The retroactive honesty audit on this very cycle validates that the harness has the teeth required.
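A minimal sketch of the gate this paragraph describes, assuming each axis reports one of three states. The `AxisVerdict` enum, the axis keys, and the `shippable` helper are hypothetical; the point is that an axis with no evidence blocks the claim exactly as a failing axis does.

```python
from enum import Enum


class AxisVerdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    NO_EVIDENCE = "no evidence"   # e.g. "no axis yet" in the ledger


AXES = ("functional", "visual", "semantic", "honesty")


def shippable(verdicts: dict[str, AxisVerdict]) -> tuple[bool, list[str]]:
    """Return (is_shippable, blockers). An axis missing from `verdicts` counts as NO_EVIDENCE."""
    blockers = []
    for axis in AXES:
        verdict = verdicts.get(axis, AxisVerdict.NO_EVIDENCE)
        if verdict is not AxisVerdict.PASS:
            blockers.append(f"{axis}: {verdict.value}")
    return (not blockers, blockers)


# EF-074 cycle 3 as originally promoted: only the functional axis existed.
print(shippable({"functional": AxisVerdict.PASS}))
# (False, ['visual: no evidence', 'semantic: no evidence', 'honesty: no evidence'])
```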

Audit-trail commitments

What every future factory cycle obligates the methodology to do.

Per-cycle audit-trail update protocol:

  • Every cycle's third commit appends a row to the per-cycle ledger above with the cycle name, date, surfaces touched, per-axis verdict, final verdict, and commit pointers.
  • Per-axis evidence (JSONL findings, screenshots, evaluator JSON, audit reports) is published under /findings/ at stable paths so the ledger's links don't rot.
  • Trend chart data is appended (one new data point per axis per cycle); when the dataset crosses ~20 cycles, the hand-rendered SVG charts switch to a small generation script.
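One way to keep ledger rows machine-readable is one JSON object per line, appended and never rewritten. A minimal sketch: the `LedgerRow` fields mirror the protocol's list above, but the schema, the helper names, and storing the ledger as a JSONL file are assumptions, not the project's actual format.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class LedgerRow:
    cycle: str
    date: str                  # ISO 8601 date, e.g. "2026-04-29"
    surfaces: tuple[str, ...]
    axes: dict[str, str]       # axis name -> verdict text as shown in the ledger
    verdict: str
    commits: tuple[str, ...]


def to_jsonl_line(row: LedgerRow) -> str:
    # sort_keys keeps the serialized form stable, so diffs of the ledger stay readable.
    return json.dumps(asdict(row), sort_keys=True, ensure_ascii=False) + "\n"


def append_row(ledger: Path, row: LedgerRow) -> None:
    # Mode "a" only adds to the end of the file; existing rows are never touched.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(to_jsonl_line(row))


def read_rows(ledger: Path) -> list[dict]:
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


row = LedgerRow(
    cycle="EF-077 access control",
    date="2026-04-29",
    surfaces=("/access-station",),
    axes={"functional": "PASS", "visual": "no axis yet",
          "semantic": "no axis yet", "honesty": "not audited"},
    verdict="Partial: NFC + organizer admin deferred",
    commits=("637c053", "ec94b0c", "f3ca200", "976b89f", "1bd7bb7"),
)
print(to_jsonl_line(row), end="")
```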

Escape protocol:

  • When a human catches a bug the methodology should have caught, an escape entry is added to the ledger with: who caught, when, what surface, what was missed, root cause, healing plan with status-tracked steps, and what changes structurally to prevent the same class of escape.
  • Healing-plan steps stay on the ledger as queued / in flight / complete until the methodology change actually lands. No steps are silently retired.
  • Every escape becomes a calibration test for the methodology going forward — e.g., the EF-074 cycle 3 case is now the calibration test for the honesty axis (the harness must flag it Serious-or-Critical retroactively, or the prompt is too lax).
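The calibration rule in the last bullet can be written as an ordinary regression test. A minimal sketch: the severity levels below "serious" and the shape of the finding dictionaries are hypothetical; only the Serious-or-Critical threshold and the EF-074 cycle 3 outcome (FAIL, 2 serious) come from this page.

```python
# Hypothetical severity scale; only "serious" and "critical" are named on this page.
SEVERITY_RANK = {"info": 0, "minor": 1, "serious": 2, "critical": 3}


def passes_calibration(findings: list[dict]) -> bool:
    """True if the audit flags at least one finding at Serious or Critical."""
    worst = max((SEVERITY_RANK[f["severity"]] for f in findings), default=-1)
    return worst >= SEVERITY_RANK["serious"]


def test_calibration_threshold() -> None:
    # Hypothetical stand-in for the retrospective audit's output on EF-074 cycle 3;
    # a real test would re-run the honesty harness instead of using fixed findings.
    ef074_cycle3 = [{"severity": "serious"}, {"severity": "serious"}]
    assert passes_calibration(ef074_cycle3)
    # An audit that found nothing, or only minor issues, would mean the prompt is too lax.
    assert not passes_calibration([])
    assert not passes_calibration([{"severity": "minor"}])


test_calibration_threshold()
```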

Persistence: the trail does not get rewritten. Past entries stay. New evidence layers (e.g., honesty audits applied retroactively) are added as new entries that reference the original cycle, never by editing the original.
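A minimal in-memory sketch of the persistence rule, with hypothetical class and field names: entries can only be appended, and a retrospective evaluation is a new entry that points back at the original rather than an edit to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: an entry cannot be modified after creation
class Entry:
    entry_id: int
    cycle: str
    summary: str
    references: Optional[int] = None  # id of the earlier entry this one re-evaluates


class AuditTrail:
    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, cycle: str, summary: str, references: Optional[int] = None) -> Entry:
        if references is not None and not 0 <= references < len(self._entries):
            raise KeyError(f"no entry {references} to reference")
        entry = Entry(len(self._entries), cycle, summary, references)
        self._entries.append(entry)
        return entry

    # Deliberately no update() or delete(): past entries stay as written.

    def history(self, entry_id: int) -> list[Entry]:
        """The original entry followed by every later entry layered on top of it."""
        return [e for e in self._entries if entry_id in (e.entry_id, e.references)]


trail = AuditTrail()
original = trail.append("EF-074 cycle 3 (client fixes)", "Shippable")
trail.append("Honesty axis pilot", "Retrospective honesty audit: FAIL, 2 serious",
             references=original.entry_id)
for e in trail.history(original.entry_id):
    print(e.entry_id, e.summary)
# 0 Shippable
# 1 Retrospective honesty audit: FAIL, 2 serious
```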

See also