Audit Choke-Point Invariant
Context
Every audit row written by the runtime must be routed through
internal/runtime/audit.ChainTracker (specifically RecordRoot or
RecordChild) before it reaches audit.Writer.Write(...). The tracker
is the single choke-point that allocates the entry's stable id, computes
its chain_link_hash, and populates the provenance_chain column that
LifecycleService.GetProvenanceChain returns to clients.
A direct audit.Writer.Write(audit.Entry{...}) bypasses all of that:
- the row lands with chain_link_hash = "" and provenance_chain = NULL;
- cross-session provenance walks (LLD 13 ChainTracker.WalkCrossSession) break at the rootless entry because the parent anchor cannot be verified;
- the resulting audit history looks complete but is silently incomplete.
This is the failure mode that issue #197 was opened against: during the T9
rebase (PR #186), a direct audit.Writer.Write(...) call in
runtime_server.go was caught manually, after the T8 guard test (PR
#169) had been "protecting" the invariant for weeks.
The invariant (canonical statement)
For every audit.Entry{…} literal constructed in production code under a
guarded package, there must be exactly one matching call to
<tracker>.RecordRoot(…) or <tracker>.RecordChild(…) in the same file. The
returned audit.Entry is what gets passed to audit.Writer.Write.
The enforcement is a per-file literal-count parity check, not a full
dataflow analysis. Parity is possible because every tracker call takes an
audit.Entry{…} literal as its sole argument, and every Writer.Write
receives the tracker's return value. A mismatch means at least one literal
was constructed and written without passing through the tracker.
Guard (CI signal)
| Test | TestAuditWriteSites_AllRouteThroughChainTracker |
| File | internal/runtime/server/lifecycle_provenance_test.go |
| Helper | scanAuditChokePointParity(t, dir) — pure stdlib, walks .go files in dir, returns (offenders, totalSites) |
| Packages in scope (as of #197) | internal/runtime/server/, internal/runtime/subagent/ |
| Negative tests | TestAuditWriteSites_GuardTripsOnMisroutedEntry (crafted misroute trips), TestAuditWriteSites_GuardSilentOnCorrectlyRoutedEntry (positive control) |
| CI trigger | Every Go test run. In addition, the nightly main-green-nightly.yml workflow runs go test -race ./... against main at 04:00 UTC, so post-merge guard drift is caught even if the PR-gate path filters missed it. |
Offender lines look like:
approval.go: audit.Entry literals=3 but tracker calls=2
meaning one literal was constructed and written directly; inspect the file
for the literal that is missing its RecordRoot(…) or RecordChild(…) wrapper.
How to detect a violation locally
# Fast grep: for every production file with at least one audit.Entry literal,
# tracker call, or .Write( call, print the three counts so you can eyeball parity.
for f in internal/runtime/server/*.go internal/runtime/subagent/*.go; do
  [[ "$f" == *_test.go ]] && continue
  ents=$(grep -c 'audit\.Entry{' "$f" || true)
  recs=$(grep -cE '\.(RecordRoot|RecordChild)\(' "$f" || true)
  writes=$(grep -c '\.Write(' "$f" || true)
  [[ "$ents$recs$writes" != "000" ]] && printf '%-60s ents=%s recs=%s writes=%s\n' "$f" "$ents" "$recs" "$writes"
done
(Note: grep -c counts matching lines, and the pattern above matches any
audit.Entry{ occurrence, including the zero-value audit.Entry{} used in
error returns. The regex-based guard test intentionally excludes the
zero value by requiring a non-} character after the brace:
audit\.Entry\{[^}]. When checking parity by grep, subtract one from ents
for each return audit.Entry{}, err line.)
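To make the difference concrete, here is a small sketch that runs both counting rules over the same snippet. The sample source and the helper names naiveCount and guardCount are made up for illustration; the guard pattern is the one quoted in the note. Unlike grep -c, naiveCount counts occurrences rather than lines, which gives the same result here because each occurrence sits on its own line.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sample is an illustrative snippet: one real literal routed through the
// tracker, and one zero-value literal in an error return.
const sample = `
func (s *Server) emit() (audit.Entry, error) {
	entry, err := s.tracker.RecordChild(audit.Entry{OrgID: "o1"})
	if err != nil {
		return audit.Entry{}, err
	}
	return entry, nil
}`

// guardRe requires at least one non-} character after the opening brace,
// so the zero value audit.Entry{} is not counted.
var guardRe = regexp.MustCompile(`audit\.Entry\{[^}]`)

// naiveCount mimics the plain grep pattern.
func naiveCount(src string) int { return strings.Count(src, "audit.Entry{") }

// guardCount applies the guard's pattern.
func guardCount(src string) int { return len(guardRe.FindAllString(src, -1)) }

func main() {
	fmt.Println("naive:", naiveCount(sample), "guard:", guardCount(sample))
}
```

Running it prints `naive: 2 guard: 1`: the file has one tracker call, so it is in parity under the guard's rule but looks like an offender under the plain grep.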
The authoritative check is:
go test -run TestAuditWriteSites ./internal/runtime/server/...
How to fix a violation
1. Find the offending literal. The guard message names the file. The
offending literal is the one that is not wrapped in a RecordRoot(…) or
RecordChild(…) call.
2. Wrap it. Replace
entry := audit.Entry{
	OrgID:      orgID,
	ActionType: runtime.AuditActionXxx,
	Detail:     detail,
	CreatedAt:  time.Now().UTC(),
}
s.audit.Write(entry) // ← bypasses the tracker
with
entry, err := s.tracker.RecordChild(audit.Entry{ // or RecordRoot if it's a chain root
	OrgID:      orgID,
	ActionType: runtime.AuditActionXxx,
	Detail:     detail,
	CreatedAt:  time.Now().UTC(),
})
if err != nil {
	slog.Warn("audit: chain record failed", "err", err, "action", runtime.AuditActionXxx)
	return
}
s.audit.Write(entry)
RecordRoot vs RecordChild:
- Roots (e.g. session_started, message_received, session_paused, subagent_completed) have no parent in the chain and are classified by audit.chain.isRootType.
- Children (e.g. llm_call, checkpoint_written, tool_call, subagent_invoked, subagent_approval_*) attach under the most recent action in the same session and inherit its provenance_chain + [parent_id].
If you're unsure, check internal/runtime/audit/chain.go :: isRootType
for the authoritative list.
3. Verify.
go test -race -run TestAuditWriteSites ./internal/runtime/server/...
go test -race ./internal/runtime/audit/...
The negative test (TestAuditWriteSites_GuardTripsOnMisroutedEntry)
ensures the guard still trips on a crafted misroute — it is a critical
regression barrier for the guard itself.
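The RecordRoot vs RecordChild distinction from step 2 can be pictured with a toy tracker. Everything in this sketch is an assumption made for illustration: the Entry fields, integer ids, the per-session "last entry" map, and the empty chain on roots. The real ChainTracker also computes chain_link_hash, which is omitted here. The only behaviour taken from this runbook is that a child attaches under the most recent action in its session and inherits that parent's chain plus the parent's id.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a stand-in for audit.Entry; the field names are assumptions.
type Entry struct {
	ID              int
	SessionID       string
	ActionType      string
	ProvenanceChain []int
}

// toyTracker remembers the most recent entry recorded in each session.
type toyTracker struct {
	nextID int
	last   map[string]Entry
}

func newToyTracker() *toyTracker {
	return &toyTracker{nextID: 1, last: map[string]Entry{}}
}

// RecordRoot starts a chain: fresh id, no ancestors.
func (t *toyTracker) RecordRoot(e Entry) (Entry, error) {
	e.ID, e.ProvenanceChain = t.nextID, nil
	t.nextID++
	t.last[e.SessionID] = e
	return e, nil
}

// RecordChild attaches under the most recent entry in the same session and
// inherits that parent's chain plus the parent's id.
func (t *toyTracker) RecordChild(e Entry) (Entry, error) {
	parent, ok := t.last[e.SessionID]
	if !ok {
		return Entry{}, errors.New("no parent entry in session " + e.SessionID)
	}
	e.ID = t.nextID
	t.nextID++
	// Copy first so parent and child never share a backing array.
	e.ProvenanceChain = append(append([]int{}, parent.ProvenanceChain...), parent.ID)
	t.last[e.SessionID] = e
	return e, nil
}

func main() {
	t := newToyTracker()
	root, _ := t.RecordRoot(Entry{SessionID: "s1", ActionType: "session_started"})
	call, _ := t.RecordChild(Entry{SessionID: "s1", ActionType: "llm_call"})
	tool, _ := t.RecordChild(Entry{SessionID: "s1", ActionType: "tool_call"})
	fmt.Println(root.ProvenanceChain, call.ProvenanceChain, tool.ProvenanceChain)
}
```

The program prints `[] [1] [1 2]`. An entry written without going through either method would carry no id and no chain, which is the toy equivalent of the rootless row described in the Context section.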
Maintenance — widening scope to new packages
When a new package starts writing audit rows (imports
internal/runtime/audit and calls .Write(...)), the guard must be
widened to cover it. Otherwise the package silently escapes the
invariant, which is exactly what happened with internal/runtime/subagent/
between PR #449 and the re-audit in #197.
Checklist for expanding the guard:
- Open internal/runtime/server/lifecycle_provenance_test.go.
- Find the scopes slice inside TestAuditWriteSites_AllRouteThroughChainTracker.
- Append a new {relDir: "<new-package>", minSites: <N>} entry, where N is the count of intentional audit-write sites in that package's production files.
- Run the test locally and ensure all three tests still pass.
- Update the "Packages in scope" row in this runbook.
- Add a short note in the PR description pointing to this runbook so reviewers know the invariant expansion was intentional.
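As a rough picture of what the scope edit in the checklist looks like, the sketch below models the scopes slice and a per-scope check. The type name scope, the helper checkScope, and the minSites values are hypothetical; only the relDir and minSites field names come from this runbook. The sketch also assumes that minSites acts as a floor, so the guard fails if the walk finds fewer sites than expected (for example after a directory is renamed and the walk silently finds nothing).

```go
package main

import "fmt"

// scope mirrors the {relDir, minSites} entries described in the checklist.
// The type name is an assumption.
type scope struct {
	relDir   string
	minSites int
}

// checkScope (hypothetical) turns one scan result into failure messages:
// every offender is a failure, and so is finding fewer sites than minSites.
func checkScope(sc scope, offenders []string, totalSites int) []string {
	var failures []string
	for _, o := range offenders {
		failures = append(failures, sc.relDir+": "+o)
	}
	if totalSites < sc.minSites {
		failures = append(failures, fmt.Sprintf(
			"%s: found %d audit-write sites, want at least %d",
			sc.relDir, totalSites, sc.minSites))
	}
	return failures
}

func main() {
	// minSites values here are placeholders, not the real counts.
	scopes := []scope{
		{relDir: "internal/runtime/server", minSites: 1},
		{relDir: "internal/runtime/subagent", minSites: 1},
	}
	// Widening the guard is a single appended entry.
	scopes = append(scopes, scope{relDir: "internal/runtime/session", minSites: 1})

	for _, sc := range scopes {
		// A real guard would scan sc.relDir here; this sketch feeds in a
		// fixed, clean result of one site and no offenders.
		fmt.Println(sc.relDir, checkScope(sc, nil, 1))
	}
}
```

Keeping the scope list explicit means a new package is covered only when someone edits the list, which is why the checklist asks for the runbook and the PR description to be updated in the same change.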
Known next candidate: internal/runtime/session/sweeper.go contains
an emitSessionCrashed path that routes through Tracker.RecordRoot +
Audit.Write — correctly compliant today, but outside the guard's
current walk. Adding the session package is a zero-risk widening
(parity is already 1/1/1) and should land with the next audit-write
addition in that package.
Lessons learned (from #197 re-audit)
The pattern that caused #449 to escape the guard scope was:
The guard test literally hard-coded filepath.Dir(thisFile) as the walk root
and lived in the same directory as the code it protected. When audit-write
sites landed in a sibling package, nothing triggered a reviewer to widen
the walk.
Process fixes applied in #197:
- Widened walker: the guard now walks a list of explicit scope directories. Adding a new audit-writing package requires one scopes = append(...) edit that the owning PR's author must make.
- Nightly main-green CI: main-green-nightly.yml runs go test -race ./... against merged main daily, so guard drift post-merge is caught even when the per-PR path filter missed a trigger. This is the systemic fix for the T6/T8/T9 rebase-timing gap documented in the original #197 body.
- PR checklist addition (recommended follow-up, not part of #197 scope): when reviewing any PR that adds a new import of internal/runtime/audit, the reviewer must confirm the guard scope was widened in the same PR. A mechanical helper could eventually live in .github/pull_request_template.md or a pre-commit hook.
Refs
- Issue #197: re-audit comment that re-scoped this work
- PR #169: T8, original single-choke-point guard
- PR #186: T9 rebase where the runtime_server.go violation was caught
- PR #449: introduced the subagent audit-write sites that escaped the guard scope
- internal/runtime/audit/chain.go: ChainTracker implementation
- internal/runtime/audit/chain_test.go: unit tests for the tracker
- docs/lld/wave3-lld-12-child-session-lifecycle.md: §3 Race-Freedom Invariants (related chain-invariant documentation)