Retrospective: Shelfware Pattern in Waves 1–3

Date: 2026-04-15 Scope: PRD #380 Waves 1, 2, 3 delivery review Attendees (implicit): PM, Principal Architect, Backend SME, QA Engineer

The pattern

Across 3 waves of backend delivery (14 LLDs, 20 PRs), 6 distinct PRs shipped with interfaces defined and unit-tested but not wired into the production call path. Each was caught at architect review and required a re-review loop to fix.

Incidents

#	Wave	LLD	PR	Symptom	Caught by
1	2	6 Approval State Machine	#422	`NewPauserAdapter(nil, sessionManager)` — APPROVED decisions never resumed the paused session	Architect re-read main.go
2	2	7 Channel Adapters	#423	`CallbackHandler` + `AdminHandler` both defined but never mounted on HTTP mux. Email links dead; emergency rotate dead	Architect grep for mount sites
3	2	7 Channel Adapters	#423	`RotatorAudit` hook unwired; `Rotator` singleton not enforced	Same review
4	2	9 Policy Resolver	#427	Resolver interface existed, zero call sites. `ServiceConfig.PolicyResolver` never populated. `Service.Request` never called it	Architect grep for call sites
5	3	14 Approval Gate	#449	Entire approval orchestrator unwired: 0 production callers of `NewApprovalOrchestrator`, `EnqueueApproval`, `NewSubagentApprovalStore`, `ResolveSubagent`, `InvokeAfterApproval`	Architect explicit grep
6	3	Real coordinator swap	#449 → #452	`InvokeAfterApproval` wired to `stubCoordinatorInvoker` in production — approvals granted would fail loudly, not spawn children	Architect caught the stub was reachable in prod

Cost

3 distinct PR re-review loops (each ~30 min architect time, ~45 min backend-sme time)
Each incident adds one review cycle to the LLD
Combined: ~4 hours of additional rework across the session

If any of these had shipped to prod, the user-visible impact would have been:

Incident 1: silent approval success with stuck sessions (worst-case corruption of approval audit trail)
Incident 2: email-based approval flow would silently never work
Incident 3: audit gap on rotation events (SOC2 regression)
Incident 4: tool calls that should require approval would silently execute
Incident 5: the entire approval gate would be absent despite PRD sign-off claiming it
Incident 6: approvals granted would error loudly but no child spawn (less bad because loud)

Incidents 1, 4, 5 were silent-failure class bugs. That's what makes this pattern alarming — happy-path unit tests pass, but the feature isn't reachable from production.

Root cause analysis

Why does this keep happening?

Root cause: the Go type system lets you define an interface + implementation + unit tests without any compile-time check that something CALLS the constructor with the new implementation. Unit tests happily exercise the component in isolation. go test ./... passes. go build ./... passes. go vet ./... passes. The production path uses the OLD interface (or a stub, or nil) and there is no signal.

Contributing factors:

Subagent scope discipline — Each LLD was deliberately scoped to "implement primitives, expose seam for next LLD". Backend-sme rightly stayed in scope. The "wiring" step was implicitly the LAST LLD's job, but each LLD's author thought their LLD was done at the seam boundary.
LLD documents often underspecify wiring — LLD 9's doc said "resolver wired into Service.Request". It did not say "and construct the resolver in cmd/agent-orchestrator/main.go at startup, pass tenant source, load seed file, pass to approval service config". Backend-sme read "wire the interface" and did just the interface side.
Architect's review checklist had a gap — reviews focused on correctness of each function, security of the design, test coverage. They did NOT include a mechanical "grep for production callers of newly-added exported entities". The pattern was invisible until architect did this grep manually, starting from Wave 2 LLD 9 review.
Unit tests exercise interfaces in isolation — TestResolver_*, TestApprovalOrchestrator_* all pass because they construct the component directly. The failure mode (constructor never called in prod) is invisible to unit tests.
Integration tests are often scoped too narrow — tests exercise the component + its immediate collaborators, stubbing far enough back that the "is it wired?" question isn't asked.

What didn't fix it on its own

Review rigor increased after LLD 6 (first incident) — didn't prevent LLDs 7, 9, 14 from repeating the pattern
Backend-sme self-reporting of deviations — they'd note "seam left for next LLD" without realizing that was the gap
Founder-decision tracking — decisions were clear; the execution gap was downstream of decisions

Fixes

Fix 1 (gate): Architect checklist must include production-caller grep

Before any architect APPROVE on a PR that adds new exported entities, run:

# For every newly-added public entity (struct, function, interface, method)
grep -rn "NewThing\|ThingInterface\|ThingMethod" cmd/ internal/ --include="*.go" | grep -v _test.go

Each must have AT LEAST ONE production caller. If zero: REQUEST_CHANGES until wired.

This gate is non-negotiable. Add to .claude/agents/principal-architect.md.

Fix 2 (gate): Backend-sme must include grep verification in final report

Every PR report from backend-sme must include a grep verification section showing production call sites for newly-added exported entities. The report should make the zero-callers case impossible to overlook.

Pattern established in LLD 14 round 2: worked immediately. Backend-sme caught their own stub.

Add to .claude/agents/backend-sme.md.

Fix 3 (practice): LLD docs must specify wiring

LLD authors (architect) must include an explicit ## Wiring section in every LLD describing:

Which production file(s) construct the new components (exact file path)
Which existing code paths call the new component (caller site)
What main.go startup changes are needed
Any seed file / config updates

"Seam left for next LLD" is acceptable ONLY when the NEXT LLD is in the same wave AND has an explicit wiring requirement in its own Wiring section.

Add to architect's LLD template.

Fix 4 (process): Integration tests must exercise production wiring

At least one test per LLD must:

Construct the system EXACTLY as main.go does (or via a helper that main.go also uses)
Invoke the public entry point
Verify behavior propagates end-to-end

"Unit test with all fakes" is not sufficient. The test must fail if wiring is missing.

Fix 5 (culture): "Shelfware" is a named bug class

When backend-sme or architect suspects a PR may have shelfware risk, they can use the term "shelfware check" and run the grep immediately. This normalizes the concern and makes the fix fast.

Not-fixes (considered and rejected)

Automated dead-code lint in CI: Go's standard unused lint is too noisy for interface-based code; distinguishing "unused by design" (test doubles) from "dead shelfware" is non-trivial. Human grep is faster for now.
Require TDD: would slow velocity without addressing root cause (the component would still pass tests in isolation).
Pair programming on every LLD: not practical with async agent workflow.

Follow-up actions

✅ Update .claude/agents/principal-architect.md with Fix 1 checklist
✅ Update .claude/agents/backend-sme.md with Fix 2 reporting requirement
Architect updates LLD template with Fix 3 (Wiring section)
Before Wave 4 kickoff, PM enforces these gates in every LLD review prompt

Incident cadence after fixes

Track: starting Wave 4, zero shelfware incidents expected. If any occur, log here and re-retro.

Appendix: commits fixing each incident

Incident 1: #422 commit c3564fc (wire PauserAdapter with real pause manager)
Incident 2: #423 commits 33bac15, 56c2254 (mount callback + admin handlers)
Incident 3: #423 commits 5af650c, cb84f95 (audit sink + singleton lock)
Incident 4: #427 commits 37172c6, cb7fee7 (wire resolver + ceiling)
Incident 5: #449 commits d0fcc69, fd0654e, 2152617, 4678319, 7fcdfed (5-commit wiring fix)
Incident 6: #452 commit e2a3681 (stub → real coordinator swap)

The pattern​

Incidents​

Cost​

Root cause analysis​

Why does this keep happening?​

What didn't fix it on its own​

Fixes​

Fix 1 (gate): Architect checklist must include production-caller grep​

Fix 2 (gate): Backend-sme must include grep verification in final report​

Fix 3 (practice): LLD docs must specify wiring​

Fix 4 (process): Integration tests must exercise production wiring​

Fix 5 (culture): "Shelfware" is a named bug class​

Not-fixes (considered and rejected)​

Follow-up actions​

Incident cadence after fixes​

Appendix: commits fixing each incident​