LLD 21 — SBOM CI + SIEM Export Worker (Wave 4B)
| Field | Value |
|---|---|
| Parent HLD | #456 agent-runtime-wave-4b-enterprise-compliance.md |
| PRD | #380 (P4.8.7) |
| Issue | #464 |
| Milestone | 9 |
| Wave | 4B |
| Size | L (~2000 LoC) |
| Depends on | LLD 18 (#469), LLD 19 (#477) |
| Parallel with | LLD 20 retention (#463) |
Founder decisions (binding, from #380):
- Q3: SIEM format = OCSF JSONL only. format="cef" is rejected at intake.
- Q4: Wave 4B ships SBOMs unsigned; cosign signing is a 30-day follow-up.
- Q5: Compliance engine runs single-replica. SIEM worker shares the advisory-lock gate.
This LLD was authored in parallel with implementation (#464) to unblock the Wave 4B close-out. The shipped code is the authoritative spec; this document is a human-readable companion.
1. Scope
Two procurement gates shipped together because they share the compliance gRPC service surface:
- SBOM generation in CI: Syft-based CycloneDX + SPDX for every release binary + container image. Attached to GitHub releases. A make sbom target covers local dev.
- SIEM Export Worker: a per-shard daemon that reads agent_audit_log rows, translates them into OCSF v1.2 Application Activity events (class_uid=6005), and dispatches to each tenant's configured webhook with HMAC-SHA256 signing + idempotency keys.
2. SBOM (Part A)
2.1 Tooling
- Syft v1.x (anchore/syft) — generates CycloneDX + SPDX from Go binaries and OCI images.
- Installed from https://raw.githubusercontent.com/anchore/syft/main/install.sh into ./bin/ when absent.
2.2 Script
Location: scripts/generate-sbom.sh. Env overrides:
- SBOM_OUT_DIR: default ./dist/sbom
- SBOM_IMAGE_REF: optional container image to scan
- SBOM_SKIP_BINARIES: comma-separated binary names to skip
2.3 CI Workflow
Location: .github/workflows/sbom.yml. Triggers:
- release.types = [published]: attach SBOMs to the release.
- push to main on cmd/**, go.*, Dockerfile, script, workflow.
- pull_request on the same paths (catches bit-rot).
- workflow_dispatch for ad-hoc runs.
Artefacts are uploaded via actions/upload-artifact@v4 with 90-day retention, and attached to the GitHub release via gh release upload.
2.4 Signing (30-day follow-up)
Cosign signing is explicitly deferred to Q2-R1 follow-up. The workflow already reserves id-token: write so Sigstore keyless signing can be added without a permission change.
3. SIEM Export Worker (Part B)
3.1 Data model (migration 053)
Three tables:
- siem_endpoints: per-tenant webhook URL, HMAC key_id, vault path, circuit breaker state. RLS on.
- siem_export_cursor: per-tenant (last_exported_event_id, last_exported_created_at, lag_seconds). RLS on.
- siem_export_events: per-event dedup ledger with UNIQUE(org_id, audit_event_id). Append-only (REVOKE UPDATE, DELETE). RLS on. Preserved on DeleteSiemEndpoint (audit proof; #464 acceptance criterion).
Feature flag: compliance.siem_export_enabled (default false).
3.2 Wire format
POST <webhook_url>
Content-Type: application/x-ndjson
X-Upsquad-Schema: ocsf/1.2.0
X-Upsquad-Tenant-Id: <org_uuid>
X-Upsquad-Idempotency-Key: <audit_event_uuid>
X-Upsquad-Timestamp: <unix_seconds>
X-Upsquad-Key-Id: <key_id>
X-Upsquad-Signature: hex(hmac_sha256(body || "\n" || ts || "\n" || tenant_id, secret))
<OCSF JSON> \n
Idempotency key is the audit event UUID — globally unique and side-effect-free.
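The signature recipe above can be sketched as follows, using only the Go standard library. This is an illustration, not the shipped client: the function names are hypothetical, and it assumes the timestamp is rendered as decimal Unix seconds and that the signed body is the exact byte sequence sent on the wire (including the trailing newline).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// canonical builds the signed message: body || "\n" || ts || "\n" || tenant_id.
// Assumption: ts is encoded as base-10 Unix seconds.
func canonical(body []byte, ts int64, tenantID string) []byte {
	msg := make([]byte, 0, len(body)+len(tenantID)+24)
	msg = append(msg, body...)
	msg = append(msg, '\n')
	msg = strconv.AppendInt(msg, ts, 10)
	msg = append(msg, '\n')
	msg = append(msg, tenantID...)
	return msg
}

// sign returns the lowercase hex HMAC-SHA256 of the canonical message.
func sign(secret, body []byte, ts int64, tenantID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(canonical(body, ts, tenantID))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify is the receiver side: recompute the MAC and compare in constant time.
func verify(secret, body []byte, ts int64, tenantID, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(canonical(body, ts, tenantID))
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // placeholder, not a real key
	body := []byte(`{"class_uid":6005}` + "\n")
	sig := sign(secret, body, 1700000000, "org-123")
	fmt.Println(verify(secret, body, 1700000000, "org-123", sig)) // true
	fmt.Println(verify(secret, body, 1700000001, "org-123", sig)) // false: timestamp differs
}
```

A receiver would typically also reject requests whose X-Upsquad-Timestamp is too old, but this document does not specify a tolerance window.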
3.3 OCSF v1.2 mapping
- class_uid = 6005 (Application Activity), category_uid = 6000.
- activity_id mapped from action_type via ocsf.ActivityIDMap.
- severity_id mapped via ocsf.SeverityMap: 1=Informational for routine events, 3–4 for security-relevant ones.
- type_uid = class_uid*100 + activity_id.
- Sensitive keys (input, output, prompt, completion, message, content, raw_body, email_body, attachment_bytes) are dropped during transform.
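A minimal sketch of two of the rules above: type_uid composition and sensitive-key removal. The helper names are hypothetical and do not come from the ocsf package. Recursing into nested objects and placing the scrubbed metadata under an "unmapped" field are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const classUID = 6005

// sensitiveKeys mirrors the list in this section.
var sensitiveKeys = map[string]struct{}{
	"input": {}, "output": {}, "prompt": {}, "completion": {}, "message": {},
	"content": {}, "raw_body": {}, "email_body": {}, "attachment_bytes": {},
}

// typeUID composes type_uid = class_uid*100 + activity_id.
func typeUID(classUID, activityID int) int { return classUID*100 + activityID }

// dropSensitive returns a copy of in without sensitive keys.
// Assumption: nested objects are scrubbed recursively.
func dropSensitive(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		if _, bad := sensitiveKeys[k]; bad {
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			v = dropSensitive(nested)
		}
		out[k] = v
	}
	return out
}

func main() {
	meta := map[string]any{
		"tool":   "search",
		"prompt": "never exported",
		"ctx":    map[string]any{"output": "never exported", "step": 2},
	}
	event := map[string]any{
		"class_uid":    classUID,
		"category_uid": 6000,
		"activity_id":  1,
		"type_uid":     typeUID(classUID, 1), // 600501
		"unmapped":     dropSensitive(meta),  // field placement is an assumption
	}
	line, err := json.Marshal(event)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(line)) // one JSONL line
}
```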
3.4 Reliability
- Delivery: at-least-once. The cursor advances only after each row is recorded in siem_export_events. Duplicate retries are de-duped by UNIQUE(org_id, audit_event_id). Customer-side dedup uses X-Upsquad-Idempotency-Key.
- Retry: exponential backoff (200ms, 800ms, 3.2s, …) capped at 10s; max 3 attempts; then DLQ.
- Circuit breaker: 5 consecutive 5xx → OPEN for 1h. Isolated per tenant.
- Back-pressure (drop-oldest): a slow customer never blocks fresh events for others — breaker-OPEN events park to DLQ immediately.
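The retry schedule and per-tenant breaker described above can be sketched as follows. This is not the shipped siem.NewBreaker. The type and function names are hypothetical, the listed delays are read as a 200ms base with a 4x multiplier, and the sketch assumes any non-5xx response resets the consecutive-failure count. It is also not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before retry n (0-based): 200ms * 4^n, capped at 10s.
func backoff(retry int) time.Duration {
	d := 200 * time.Millisecond
	for i := 0; i < retry; i++ {
		d *= 4
		if d >= 10*time.Second {
			return 10 * time.Second
		}
	}
	return d
}

// breaker opens after `threshold` consecutive 5xx responses and
// lets a probe through once `cooldown` has elapsed (half-open).
type breaker struct {
	threshold int
	cooldown  time.Duration
	failures  int
	open      bool
	openedAt  time.Time
}

func newBreaker() *breaker { return &breaker{threshold: 5, cooldown: time.Hour} }

// allow reports whether a dispatch may be attempted at time now.
func (b *breaker) allow(now time.Time) bool {
	return !b.open || now.Sub(b.openedAt) >= b.cooldown
}

// record updates the breaker from an HTTP status code.
func (b *breaker) record(status int, now time.Time) {
	if status >= 500 {
		b.failures++
		if b.failures >= b.threshold {
			b.open, b.openedAt = true, now // (re)open and restart the cooldown
		}
		return
	}
	b.failures, b.open = 0, false // success closes the breaker
}

func main() {
	for i := 0; i < 4; i++ {
		fmt.Println(backoff(i)) // 200ms, 800ms, 3.2s, 10s
	}

	// One breaker per tenant keeps a failing endpoint isolated.
	breakers := map[string]*breaker{"tenant-a": newBreaker()}
	b, now := breakers["tenant-a"], time.Now()
	for i := 0; i < 5; i++ {
		b.record(503, now)
	}
	fmt.Println(b.allow(now))                // false: OPEN
	fmt.Println(b.allow(now.Add(time.Hour))) // true: half-open probe
}
```

In this sketch, a worker would send events straight to the DLQ whenever allow returns false, which corresponds to the fast path described in the back-pressure bullet.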
3.5 Security
- HMAC secret: per-tenant, stored in pgcrypto vault under provider_keys/<org>/_default/siem_hmac:<key_id>. Returned once at ConfigureSiemEndpoint; omitted from GetSiemEndpoint.
- Rotation: 90-day cadence (reuses Wave 2 LLD 7 pattern).
- Sensitive payloads: never exported (only hashes propagate).
3.6 Shard leasing (founder Q5)
- Default SIEM_SHARD_COUNT=1, SIEM_SHARD_ID=0.
- Each shard holds a PG advisory lock keyed on fnv64("upsquad.siem_export_worker:<shard_id>").
- The Recreate deployment strategy prevents two replicas racing the lock during a rolling update.
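The lock-key derivation can be sketched as below. This document does not say which FNV variant the worker uses, so the FNV-1a choice is an assumption, as are the function name and the non-blocking pg_try_advisory_lock call. The sketch only derives keys and prints the statement a worker might run; it does not open a database connection.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// lockKey derives a 64-bit advisory-lock key for a shard.
// Assumption: FNV-1a; the shipped code may use plain FNV-1.
func lockKey(shardID int) int64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "upsquad.siem_export_worker:%d", shardID)
	// Postgres advisory locks take a signed bigint, so reinterpret the bits.
	return int64(h.Sum64())
}

// tryLockSQL is the statement a worker could issue on a dedicated connection.
// The session-level lock is released when that connection closes.
const tryLockSQL = "SELECT pg_try_advisory_lock($1)"

func main() {
	for shard := 0; shard < 2; shard++ {
		fmt.Printf("shard %d: %s with $1 = %d\n", shard, tryLockSQL, lockKey(shard))
	}
}
```

Because the key depends only on the shard ID, every replica configured with the same SIEM_SHARD_ID contends for the same lock, which is what makes the single-owner guarantee hold.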
3.7 Filter classes
Three tenant-settable filter classes:
- all: every auditable action.
- audit (default): governance / approval / delegation events.
- high_severity: denied / blocked / crashed / budget-exhausted events only.
3.8 Metrics
- siem_export_total{tenant, outcome}
- siem_export_duration_seconds{tenant}
- siem_dlq_depth{tenant}
- siem_export_lag_seconds{tenant}
- siem_circuit_open_total{tenant}
4. Wiring
New code lives under:
- internal/compliance/siem/: worker + client + ocsf + circuit_breaker + shard + metrics + secrets + transform.
- internal/compliance/store/siem.go: persistence layer.
- internal/compliance/service_siem.go: domain service.
- cmd/siem-export-worker/: standalone binary.
- deployments/siem-export-worker/base/: k8s manifests (single-replica).
Extended (clearly delineated blocks):
- cmd/compliance-engine/main.go: startSIEMWorker(...) gated by SIEM_EXPORT_DISABLED. Removed in #924. The embedded SIEM worker in cmd/compliance-engine raced the standalone container for the siem/shard advisory lock; the standalone cmd/siem-export-worker is now the sole SIEM owner in every topology (ICS, dev, prod). The SIEM_EXPORT_DISABLED kill-switch is honoured exclusively by the standalone binary.
- internal/compliance/grpcserver.go: three new RPC handlers (ConfigureSiemEndpoint, GetSiemEndpoint, DeleteSiemEndpoint) wired via WithSIEMService(...).
- proto/upsquad/compliance/v1/compliance.proto: three new RPCs + four new messages + SiemFilterClass enum.
CI:
- .github/workflows/sbom.yml
- scripts/generate-sbom.sh
- Makefile sbom: target.
5. Production callers (shelfware gate)
| Export | Caller |
|---|---|
| siem.NewWorker | cmd/siem-export-worker/main.go |
| siem.NewHTTPClient | cmd/siem-export-worker/main.go |
| siem.AcquireShardLease | cmd/siem-export-worker/main.go |
| siem.NewSecretStore | cmd/siem-export-worker/main.go, cmd/agent-orchestrator/main.go (RPC side) |
| siem.NewBreaker | cmd/siem-export-worker/main.go |
| siem.TransformAuditRow | internal/compliance/siem/worker.go (hot path) |
| store.NewSIEMStore | cmd/siem-export-worker/main.go |
| compliance.NewSIEMService | cmd/agent-orchestrator/main.go (via WithSIEMService) |
| ConfigureSiemEndpoint RPC | internal/compliance/grpcserver.go |
| GetSiemEndpoint RPC | internal/compliance/grpcserver.go |
| DeleteSiemEndpoint RPC | internal/compliance/grpcserver.go |
6. Tests (21 SIEM-package tests under -race)
Unit:
- TestTransform_*: OCSF mapping, nil-row, JSON roundtrip, sensitive-payload drop, unknown-action degrade, type_uid composition.
- TestDispatch_SignsHMAC: HMAC canonical form matches receiver-side verification.
- TestDispatch_TransientOn5xx / TestDispatch_PermanentOn4xx / TestDispatch_RejectsBadInput.
- TestBreaker_*: OPEN after threshold, HALF_OPEN cooldown, success closes, failure re-opens.
- TestShard_OwnsTenant*: single-replica + multi-replica selection.
Integration (in-process):
- TestWorker_EndToEndDelivery: audit row → transform → HMAC → dispatch → recorded delivery → cursor advance.
- TestWorker_IdempotentRetry: duplicate (org_id, audit_event_id) dedup.
- TestWorker_CircuitBreakerOpens: consecutive 5xx trip → DLQ fast path.
- TestWorker_OCSFSchemaValid: POSTed body is valid OCSF JSON with required keys.
7. Follow-ups
- Cosign SBOM signing (30-day follow-up per founder Q4).
- S3 archive of SBOMs under s3://upsquad-sbom-archive/<sha>/ (deferred to ops-managed Terraform).
- Wiring tests in test/integration/wave4b/: real Postgres harness (currently stubbed by in-memory worker tests).
- Multi-shard rollout: SIEM_SHARD_COUNT>1 is functional but unused until Wave 5 scaling.