ADR-0010: Gosec runs informational-only until G115 baseline is triaged

Status: Proposed (awaiting founder sign-off per CLAUDE.md escalation list: security-adjacent policy change)
Date: 2026-04-20
Decision owners: DevOps Engineer (proposer), Principal Architect (review), Founder (sign-off)
Related: Issue #728 §P3, Workflow .github/workflows/security-scan.yml, ADR-0011 (future — G115 per-site remediation)

Context

The Security Scan GitHub Actions workflow had a 96% failure rate over 438 runs in 14 days (2026-04-06 → 2026-04-20). Root-cause investigation (issue #728 P3) sampled 14 recent failing runs, of which 12 were analyzed, and produced the following per-job failure counts:

Job (check-run name)	Required by branch protection	Fails in sample
`Go Vulnerability Check` (govulncheck)	Yes	0 / 12
`Container Image Scan` (trivy)	Yes	0 / 12
`Data-Class Registry Coverage` (class-coverage)	Yes	0 / 12
`Go Security Analysis` (gosec)	No	12 / 12

(The remaining 2 of 14 runs were from PR #732's path-filter feature branch and were excluded as branch-local noise.)

Gosec is the sole driver of the 96% figure. Because it is not in the required-checks list, its failures never blocked a merge, and the red signal has been left to rot without anyone acting on it. Every sampled run reported the same 54 findings:

51 × G115 (CWE-190, integer overflow on int → int32 conversion): all in gRPC response marshaling, where the source int is already bounded by page size, CSV row count, or scope registry size. Example sites: internal/bulkimport/service.go (8 sites), internal/audit/grpcserver/server.go (3 sites), internal/compliance/grpcserver.go (4 sites), plus 36 more in proto response builders. One existing annotation at internal/audit/grpcserver/server.go:200, `// #nosec G115 -- bounded by page_size and wall-clock`, confirms the team has already reviewed this pattern once and accepted it.
2 × G118 (CWE-400, goroutine uses context.Background) at internal/gateway/jti.go:183 and cmd/compliance-engine/main.go:355. Both sites have inline comments documenting intentional context-detachment (avoid caller-timeout aborting a long-running refresh / lease-release).
1 × G703 (CWE-22, path traversal via taint) at cmd/reconcile-report/main.go:145. Internal ops CLI tool; outputDir is an operator-supplied flag and reportDate.Format("2006-01-02") is a controlled date string. Not a network-exposed surface.

None of the 54 findings is an actionable remote-exploit vector. They are a mix of proto-marshaling false positives (G115), documented intentional patterns (G118), and one low-risk internal-CLI site (G703).
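To make the G115 pattern concrete, here is a minimal Go sketch of an int → int32 conversion guarded by an explicit range check. The helper name `toInt32` and the `pageSize` value are hypothetical and not taken from the codebase. Whether gosec treats such a guard as sufficient depends on the gosec version in use, so this is an illustration of the bounded-conversion idea and not a statement of how ADR-0011 will remediate the sites.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32 converts n to int32 and reports ok=false instead of silently
// wrapping when n falls outside the int32 range.
// Hypothetical helper for illustration only.
func toInt32(n int) (v int32, ok bool) {
	if n < math.MinInt32 || n > math.MaxInt32 {
		return 0, false
	}
	return int32(n), true
}

func main() {
	// pageSize is an assumed value standing in for a server-enforced page limit.
	const pageSize = 500
	rows := make([]string, pageSize)

	total, ok := toInt32(len(rows))
	if !ok {
		fmt.Println("row count exceeds int32 range")
		return
	}
	fmt.Println("total rows:", total)

	// The alternative already accepted once in the codebase is to keep the
	// direct conversion and justify it inline, for example:
	//   total := int32(len(rows)) // #nosec G115 -- bounded by page size
}
```

The explicit check costs a branch per conversion but turns an implicit bound into one the code enforces. The annotation form costs nothing at runtime but relies on the stated bound staying true as the code changes.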

Decision

Set continue-on-error: true on the gosec job in .github/workflows/security-scan.yml, mirroring the existing govulncheck pattern (in the same workflow, adopted for the analogous stdlib-CVE-noise problem).

This keeps gosec in the pipeline: the check-run continues to be posted on every PR, the logs continue to be captured, and new findings introduced in future commits remain visible to reviewers. It merely stops gosec from failing the overall workflow until the G115 baseline is addressed.

We do not remove gosec, do not add it to branch protection, and do not silence any specific rule via allowlist. Those are separate future decisions to be made after the G115 remediation ADR is written (ADR-0011, follow-up).

Consequences

Positive:

Security Scan's measured success rate is expected to rise from ~4% to ≥80% once the change lands. AC-P1-c ("post-fix success rate > 80% over 7-day window") becomes measurable with confidence: the 3 non-gosec jobs have a 100% success rate in the sample, so the workflow pass-rate will track the changes job's pass-rate (near-100%) once gosec stops being a gate.
Eliminates false-red noise in PR check-run UIs. Reviewers stop habituating to a red "Security Scan" badge on every PR.
Zero loss of signal — the gosec log is still attached to every run. A reviewer or scheduled scan can still surface new findings.

Negative:

Any newly introduced gosec finding (including potentially a real vulnerability) is now non-blocking. Mitigations:
- The ADR-0011 follow-up will categorize every existing finding and either fix it in code or annotate it with a justified // #nosec <rule> -- <reason>. After that, re-enabling blocking mode becomes viable.
- If a critical vulnerability surfaces in the interim, the code-review process (required architect approval on every PR) remains the primary line of defence.

Neutral:

The 54 findings documented above constitute a baseline. ADR-0011 should treat any finding not on the documented baseline as a new issue that requires remediation. A simple mechanism would be to store the SARIF output of the current baseline and compare it run-over-run via gosec-baseline or equivalent (follow-up work, out of scope for this ADR).
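The run-over-run comparison could look roughly like the Go sketch below. It assumes gosec is configured to emit SARIF and reads only the standard SARIF 2.1.0 fields `ruleId`, `artifactLocation.uri`, and `region.startLine`. The function names, the key format, and the sample file names `a.go` and `b.go` are hypothetical. Keying on line numbers is a simplifying assumption: unrelated edits that shift lines would show up as new findings, so a real implementation would likely need a more stable fingerprint.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sarifDoc models only the SARIF 2.1.0 fields needed to identify a finding.
type sarifDoc struct {
	Runs []struct {
		Results []struct {
			RuleID    string `json:"ruleId"`
			Locations []struct {
				PhysicalLocation struct {
					ArtifactLocation struct {
						URI string `json:"uri"`
					} `json:"artifactLocation"`
					Region struct {
						StartLine int `json:"startLine"`
					} `json:"region"`
				} `json:"physicalLocation"`
			} `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// findingKeys returns the set of "ruleId uri:line" keys found in a SARIF document.
func findingKeys(data []byte) (map[string]bool, error) {
	var doc sarifDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	keys := make(map[string]bool)
	for _, run := range doc.Runs {
		for _, res := range run.Results {
			for _, loc := range res.Locations {
				p := loc.PhysicalLocation
				key := fmt.Sprintf("%s %s:%d", res.RuleID, p.ArtifactLocation.URI, p.Region.StartLine)
				keys[key] = true
			}
		}
	}
	return keys, nil
}

// newFindings returns, in sorted order, the keys present in current but absent from baseline.
func newFindings(baseline, current []byte) ([]string, error) {
	base, err := findingKeys(baseline)
	if err != nil {
		return nil, fmt.Errorf("baseline: %w", err)
	}
	cur, err := findingKeys(current)
	if err != nil {
		return nil, fmt.Errorf("current: %w", err)
	}
	var added []string
	for k := range cur {
		if !base[k] {
			added = append(added, k)
		}
	}
	sort.Strings(added)
	return added, nil
}

func main() {
	// Inline sample documents with made-up locations, standing in for stored SARIF files.
	baseline := []byte(`{"runs":[{"results":[{"ruleId":"G115","locations":[{"physicalLocation":{"artifactLocation":{"uri":"a.go"},"region":{"startLine":10}}}]}]}]}`)
	current := []byte(`{"runs":[{"results":[
		{"ruleId":"G115","locations":[{"physicalLocation":{"artifactLocation":{"uri":"a.go"},"region":{"startLine":10}}}]},
		{"ruleId":"G703","locations":[{"physicalLocation":{"artifactLocation":{"uri":"b.go"},"region":{"startLine":5}}}]}
	]}]}`)

	added, err := newFindings(baseline, current)
	if err != nil {
		panic(err)
	}
	fmt.Println("findings not in baseline:", added)
}
```

A CI step built on this idea could exit non-zero only when the returned slice is non-empty, which would make new findings blocking while the documented baseline stays informational.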

Rollback

Revert the two-line workflow edit (continue-on-error: true + the explanatory comment block) via a revert PR. No data migration, no configuration drift — gosec returns to blocking mode immediately on merge.

Out of scope

Per-site // #nosec annotations for the 3 non-G115 findings. Deferred to ADR-0011 / a small follow-up PR once this ADR is accepted.
Platform-wide gosec rule exclusions (e.g., -exclude=G115). Considered but rejected as a first step: a config-level exclusion hides future real G115 findings along with the baseline; the continue-on-error approach keeps all findings visible.
Moving gosec into branch-protection required-checks. Cannot do until the baseline is remediated; deferred to ADR-0011.
Modifying branch protection, required checks list, or any other gate in this workflow.
