
Audit Choke-Point Invariant

Context

Every audit row written by the runtime must be routed through internal/runtime/audit.ChainTracker (specifically RecordRoot or RecordChild) before it reaches audit.Writer.Write(...). The tracker is the single choke-point that allocates the entry's stable id, computes its chain_link_hash, and populates the provenance_chain column that LifecycleService.GetProvenanceChain returns to clients.

A direct audit.Writer.Write(audit.Entry{...}) bypasses all of that:

  • the row lands with chain_link_hash = "" and provenance_chain = NULL;
  • cross-session provenance walks (LLD 13 ChainTracker.WalkCrossSession) break at the rootless entry because the parent anchor cannot be verified;
  • the resulting audit history looks complete but is silently incomplete.
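The second bullet is easiest to see with a root-to-leaf hash walk. Below is a minimal sketch, not the real walker. It uses a hypothetical trimmed-down Entry type and an assumed link-hash construction: sha256 over the parent's hash, the entry's id, and its action type. The real formula and walk live in internal/runtime/audit/chain.go and may differ. The structural point still holds: a row written with chain_link_hash = "" gives the walker nothing to recompute against, so verification stops at that row.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical, trimmed-down audit row (not the real audit.Entry).
type Entry struct {
	ID            string
	ParentID      string // empty for a chain root
	ActionType    string
	ChainLinkHash string
}

// linkHash is an ASSUMED construction: sha256 over the parent's hash, the
// entry's own id, and its action type. The real formula lives in chain.go.
func linkHash(parentHash, id, action string) string {
	sum := sha256.Sum256([]byte(parentHash + "|" + id + "|" + action))
	return hex.EncodeToString(sum[:])
}

// verifyWalk recomputes every link from root to leaf and stops at the first
// entry that has a wrong parent, a missing hash, or a mismatched hash.
func verifyWalk(chain []Entry) error {
	parentHash := ""
	for i, e := range chain {
		if i > 0 && e.ParentID != chain[i-1].ID {
			return fmt.Errorf("entry %s: parent %q, want %q", e.ID, e.ParentID, chain[i-1].ID)
		}
		if e.ChainLinkHash == "" {
			return fmt.Errorf("entry %s: empty chain_link_hash, parent anchor cannot be verified", e.ID)
		}
		if e.ChainLinkHash != linkHash(parentHash, e.ID, e.ActionType) {
			return fmt.Errorf("entry %s: chain_link_hash mismatch", e.ID)
		}
		parentHash = e.ChainLinkHash
	}
	return nil
}

func main() {
	root := Entry{ID: "e1", ActionType: "session_started"}
	root.ChainLinkHash = linkHash("", root.ID, root.ActionType)
	child := Entry{ID: "e2", ParentID: "e1", ActionType: "llm_call"}
	child.ChainLinkHash = linkHash(root.ChainLinkHash, child.ID, child.ActionType)
	fmt.Println(verifyWalk([]Entry{root, child})) // <nil>

	// A direct audit.Writer.Write(...) leaves the hash empty, so the walk stops here.
	bypassed := Entry{ID: "e3", ParentID: "e2", ActionType: "tool_call"}
	fmt.Println(verifyWalk([]Entry{root, child, bypassed}))
}
```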

This is the failure mode issue #197 was opened against: during the T9 rebase (PR #186), a direct audit.Writer.Write(...) call in runtime_server.go was caught manually, AFTER the T8 guard test (PR #169) had been "protecting" the invariant for weeks.

The invariant (canonical statement)

For every audit.Entry{…} literal constructed in production code under a guarded package, there must be exactly one matching call to <tracker>.RecordRoot(…) or <tracker>.RecordChild(…) in the same file. The returned audit.Entry is what gets passed to audit.Writer.Write.

The enforcement is a per-file literal-count parity check, not a full dataflow analysis. Parity works as a proxy because every tracker call takes an audit.Entry{…} literal as its sole argument, and every Writer.Write call receives the tracker's return value. A mismatch therefore means at least one literal was constructed and written without passing through the tracker.

Guard (CI signal)

| Item | Detail |
|---|---|
| Test | TestAuditWriteSites_AllRouteThroughChainTracker |
| File | internal/runtime/server/lifecycle_provenance_test.go |
| Helper | scanAuditChokePointParity(t, dir): pure stdlib; walks the .go files in dir and returns (offenders, totalSites) |
| Packages in scope (as of #197) | internal/runtime/server/, internal/runtime/subagent/ |
| Negative tests | TestAuditWriteSites_GuardTripsOnMisroutedEntry (a crafted misroute must trip the guard); TestAuditWriteSites_GuardSilentOnCorrectlyRoutedEntry (positive control) |
| CI trigger | Every Go test run. In addition, the new nightly main-green-nightly.yml workflow runs go test -race ./... against main at 04:00 UTC, so guard drift after merge is caught even if the PR-gate path filters missed it. |

Offender lines look like:

```text
approval.go: audit.Entry literals=3 but tracker calls=2
```

This means one literal was constructed and written directly. Look in that file for the audit.Entry{…} literal that is not wrapped in RecordRoot(…) or RecordChild(…).

How to detect a violation locally

```bash
# Fast grep (bash, run from the repo root): for every production .go file
# with at least one audit literal, tracker call, or .Write( call, print the
# three counts so you can eyeball parity. grep -c counts matching lines,
# not occurrences, and '\.Write(' matches any Write call, not only audit ones.
for f in internal/runtime/server/*.go internal/runtime/subagent/*.go; do
  [[ "$f" == *_test.go ]] && continue
  ents=$(grep -c 'audit\.Entry{' "$f" || true)
  recs=$(grep -cE '\.(RecordRoot|RecordChild)\(' "$f" || true)
  writes=$(grep -c '\.Write(' "$f" || true)
  [[ "$ents$recs$writes" != "000" ]] && printf '%-60s ents=%s recs=%s writes=%s\n' "$f" "$ents" "$recs" "$writes"
done
```

(Note: grep -c counts every line containing audit.Entry{, including the zero-value audit.Entry{} used in error returns. The regex-based guard test intentionally excludes the zero value via the character class audit\.Entry\{[^}]. When checking parity by grep, subtract one from ents for each return audit.Entry{}, err line.)

The authoritative check is:

```bash
go test -run TestAuditWriteSites ./internal/runtime/server/...
```

How to fix a violation

1. Find the offending literal. The guard message names the file; the offending literal is the one that is NOT passed directly to RecordRoot / RecordChild.

2. Wrap it. Replace

```go
entry := audit.Entry{
	OrgID:      orgID,
	ActionType: runtime.AuditActionXxx,
	Detail:     detail,
	CreatedAt:  time.Now().UTC(),
}
s.audit.Write(entry) // ← bypasses the tracker
```

with

```go
entry, err := s.tracker.RecordChild(audit.Entry{ // or RecordRoot if it's a chain root
	OrgID:      orgID,
	ActionType: runtime.AuditActionXxx,
	Detail:     detail,
	CreatedAt:  time.Now().UTC(),
})
if err != nil {
	slog.Warn("audit: chain record failed", "err", err, "action", runtime.AuditActionXxx)
	return
}
s.audit.Write(entry)
```

RecordRoot vs RecordChild:

  • Roots (e.g. session_started, message_received, session_paused, subagent_completed) have no parent in the chain; they are classified as roots by isRootType in internal/runtime/audit/chain.go.
  • Children (e.g. llm_call, checkpoint_written, tool_call, subagent_invoked, subagent_approval_*) attach under the most recent action in the same session. Their provenance_chain is the parent's provenance_chain with the parent's id appended (provenance_chain + [parent_id]).

If you're unsure, check internal/runtime/audit/chain.go :: isRootType for the authoritative list.

3. Verify.

```bash
go test -race -run TestAuditWriteSites ./internal/runtime/server/...
go test -race ./internal/runtime/audit/...
```

The negative test (TestAuditWriteSites_GuardTripsOnMisroutedEntry) ensures the guard still trips on a crafted misroute — it is a critical regression barrier for the guard itself.

Maintenance — widening scope to new packages

When a new package starts writing audit rows (imports internal/runtime/audit and calls .Write(...)), the guard must be widened to cover it. Otherwise the package silently escapes the invariant, which is exactly what happened with internal/runtime/subagent/ between PR #449 and the re-audit in #197.

Checklist for expanding the guard:

  1. Open internal/runtime/server/lifecycle_provenance_test.go.
  2. Find the scopes slice inside TestAuditWriteSites_AllRouteThroughChainTracker.
  3. Append a new {relDir: "<new-package>", minSites: <N>} entry, where N is the count of intentional audit-write sites in that package's production files.
  4. Run the tests locally and confirm that all three TestAuditWriteSites_* tests still pass.
  5. Update the "Packages in scope" row in this runbook.
  6. Add a short note in the PR description pointing to this runbook so reviewers know the invariant expansion was intentional.

Known next candidate: internal/runtime/session/sweeper.go has an emitSessionCrashed path that routes through Tracker.RecordRoot and then Audit.Write. It is compliant today but outside the guard's current walk. Adding the session package is a zero-risk widening (parity is already 1/1/1: one literal, one tracker call, one write) and should land with the next audit-write addition in that package.

Lessons learned (from #197 re-audit)

The pattern that caused #449 to escape the guard scope was:

The guard test hard-coded filepath.Dir(thisFile) as its walk root and lived in the same directory as the code it protected. When audit-write sites landed in a sibling package, nothing prompted a reviewer to widen the walk.

Process fixes applied in #197:

  1. Widened walker: the guard now walks an explicit list of scope directories. Adding a new audit-writing package requires one scopes = append(...) edit, which the owning PR's author must make.
  2. Nightly main-green CI: main-green-nightly.yml runs go test -race ./... against merged main daily, so guard drift after merge is caught even when the per-PR path filter missed a trigger. This is the systemic fix for the T6/T8/T9 rebase-timing gap documented in the original #197 body.
  3. PR checklist addition (recommended follow-up, not part of #197 scope): when reviewing any PR that adds a new import of internal/runtime/audit, the reviewer must confirm that the guard scope was widened in the same PR. A mechanical helper could eventually live in .github/pull_request_template.md or a pre-commit hook.

Refs

  • Issue #197: re-audit comment that re-scoped this work
  • PR #169: T8, original single-choke-point guard
  • PR #186: T9 rebase where the runtime_server.go violation was caught
  • PR #449: introduced the subagent audit-write sites that escaped the guard scope
  • internal/runtime/audit/chain.go: ChainTracker implementation
  • internal/runtime/audit/chain_test.go: unit tests for the tracker
  • docs/lld/wave3-lld-12-child-session-lifecycle.md: §3 Race-Freedom Invariants (related chain-invariant documentation)