LLD 21 — SBOM CI + SIEM Export Worker (Wave 4B)

| Field | Value |
|---|---|
| Parent HLD | #456 agent-runtime-wave-4b-enterprise-compliance.md |
| PRD | #380 (P4.8.7) |
| Issue | #464 |
| Milestone | 9 |
| Wave | 4B |
| Size | L (~2000 LoC) |
| Depends on | LLD 18 (#469), LLD 19 (#477) |
| Parallel with | LLD 20 retention (#463) |

Founder decisions (binding, from #380):

  • Q3: SIEM format = OCSF JSONL only. format="cef" is rejected at intake.
  • Q4: Wave 4B ships SBOMs unsigned; cosign signing is a 30-day follow-up.
  • Q5: Compliance engine runs single-replica. SIEM worker shares the advisory-lock gate.

This LLD was authored in parallel with implementation (#464) to unblock the Wave 4B close-out. The shipped code is the authoritative spec; this document is a human-readable companion.


1. Scope

Two procurement gates shipped together because they share the compliance gRPC service surface:

  1. SBOM generation in CI — Syft-based CycloneDX + SPDX for every release binary + container image. Attached to GitHub releases. make sbom target for local dev.
  2. SIEM Export Worker — a per-shard daemon that reads agent_audit_log rows, translates them into OCSF v1.2 Application Activity events (class_uid=6005), and dispatches to each tenant's configured webhook with HMAC-SHA256 signing + idempotency keys.

2. SBOM (Part A)

2.1 Tooling

  • Syft v1.x (anchore/syft) — generates CycloneDX + SPDX from Go binaries and OCI images.
  • Installed from https://raw.githubusercontent.com/anchore/syft/main/install.sh into ./bin/ when absent.

2.2 Script

Location: scripts/generate-sbom.sh. Env overrides:

  • SBOM_OUT_DIR — default ./dist/sbom
  • SBOM_IMAGE_REF — optional container image to scan
  • SBOM_SKIP_BINARIES — comma-separated binary names to skip

2.3 CI Workflow

Location: .github/workflows/sbom.yml. Triggers:

  • release.types = [published] — attach SBOMs to the release.
  • push to main touching cmd/**, go.*, Dockerfile, the SBOM script (scripts/generate-sbom.sh), or the workflow file itself.
  • pull_request on the same paths (catches bit-rot).
  • workflow_dispatch for ad-hoc runs.

Artefacts are uploaded via actions/upload-artifact@v4 with 90-day retention, and attached to the GitHub release via gh release upload.

2.4 Signing (30-day follow-up)

Cosign signing is explicitly deferred to Q2-R1 follow-up. The workflow already reserves id-token: write so Sigstore keyless signing can be added without a permission change.


3. SIEM Export Worker (Part B)

3.1 Data model (migration 053)

Three tables:

  • siem_endpoints — per-tenant webhook URL, HMAC key_id, vault path, circuit breaker state. RLS on.
  • siem_export_cursor — per-tenant (last_exported_event_id, last_exported_created_at, lag_seconds). RLS on.
  • siem_export_events — per-event dedup ledger with UNIQUE(org_id, audit_event_id). Append-only (REVOKE UPDATE, DELETE). RLS on. Preserved on DeleteSiemEndpoint (audit proof — #464 acceptance criterion).

Feature flag: compliance.siem_export_enabled (default false).

3.2 Wire format

POST <webhook_url>
Content-Type: application/x-ndjson
X-Upsquad-Schema: ocsf/1.2.0
X-Upsquad-Tenant-Id: <org_uuid>
X-Upsquad-Idempotency-Key: <audit_event_uuid>
X-Upsquad-Timestamp: <unix_seconds>
X-Upsquad-Key-Id: <key_id>
X-Upsquad-Signature: hex(hmac_sha256(body || "\n" || ts || "\n" || tenant_id, secret))

<OCSF JSON> \n

Idempotency key is the audit event UUID — globally unique and side-effect-free.
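
The signature line above can be sketched in Go as follows. This is a minimal illustration, not the shipped client: the function names are hypothetical, and it assumes the timestamp is rendered as decimal Unix seconds (the same string sent in X-Upsquad-Timestamp) and that the body bytes are signed exactly as posted.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// canonicalMessage builds body || "\n" || ts || "\n" || tenant_id.
// Assumption: ts is formatted as decimal Unix seconds.
func canonicalMessage(body []byte, ts int64, tenantID string) []byte {
	msg := make([]byte, 0, len(body)+len(tenantID)+24)
	msg = append(msg, body...)
	msg = append(msg, '\n')
	msg = strconv.AppendInt(msg, ts, 10)
	msg = append(msg, '\n')
	msg = append(msg, tenantID...)
	return msg
}

// sign returns the lowercase hex HMAC-SHA256 of the canonical message.
func sign(secret, body []byte, ts int64, tenantID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(canonicalMessage(body, ts, tenantID))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify is the receiver-side check; hmac.Equal compares in constant time.
func verify(secret, body []byte, ts int64, tenantID, gotHex string) bool {
	got, err := hex.DecodeString(gotHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(canonicalMessage(body, ts, tenantID))
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // placeholder, not a real key
	tenant := "00000000-0000-0000-0000-000000000001"
	body := []byte(`{"class_uid":6005}` + "\n")
	sig := sign(secret, body, 1700000000, tenant)
	fmt.Println(sig, verify(secret, body, 1700000000, tenant, sig))
}
```

Because the timestamp and tenant ID are part of the signed message, a receiver that recomputes the HMAC rejects a body replayed under a different timestamp or tenant header.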

3.3 OCSF v1.2 mapping

  • class_uid = 6005 (Application Activity), category_uid = 6000.
  • activity_id mapped from action_type via ocsf.ActivityIDMap.
  • severity_id mapped via ocsf.SeverityMap: 1 (Informational) for routine events, 3–4 (Medium to High) for security-relevant ones.
  • type_uid = class_uid*100 + activity_id.
  • Sensitive keys (input, output, prompt, completion, message, content, raw_body, email_body, attachment_bytes) are dropped during transform.
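
As an illustration of the last two rules, the sketch below composes type_uid and drops the listed sensitive keys. The helper names are hypothetical, and it assumes keys are matched exactly and only at the top level of the attribute map; the shipped transform may differ.

```go
package main

import "fmt"

// classUID is the class_uid value stated in section 3.3.
const classUID = 6005

// typeUID composes type_uid = class_uid*100 + activity_id.
func typeUID(activityID int) int {
	return classUID*100 + activityID
}

// sensitiveKeys is the drop list from section 3.3.
var sensitiveKeys = map[string]struct{}{
	"input": {}, "output": {}, "prompt": {}, "completion": {},
	"message": {}, "content": {}, "raw_body": {},
	"email_body": {}, "attachment_bytes": {},
}

// dropSensitive returns a copy of attrs without the sensitive keys.
// Assumption: exact, top-level key match only.
func dropSensitive(attrs map[string]any) map[string]any {
	out := make(map[string]any, len(attrs))
	for k, v := range attrs {
		if _, sensitive := sensitiveKeys[k]; sensitive {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	fmt.Println(typeUID(1)) // 600501
	fmt.Println(dropSensitive(map[string]any{
		"action_type": "tool_call", // hypothetical action name
		"prompt":      "never exported",
	}))
}
```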

3.4 Reliability

  • Delivery: at-least-once. The cursor advances only after each row is recorded in siem_export_events. Duplicate retries are de-duped by UNIQUE(org_id, audit_event_id). Customer-side dedup uses X-Upsquad-Idempotency-Key.
  • Retry: exponential backoff (200ms, 800ms, 3.2s, …) capped at 10s; max 3 attempts; then DLQ.
  • Circuit breaker: 5 consecutive 5xx → OPEN for 1h. Isolated per tenant.
  • Back-pressure (drop-oldest): a slow customer never blocks fresh events for others — breaker-OPEN events park to DLQ immediately.
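
The retry schedule can be sketched as below. The growth factor of 4 is inferred from the listed delays (200ms, 800ms, 3.2s); the function and constant names are hypothetical and jitter is omitted.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay   = 200 * time.Millisecond // first retry delay
	maxDelay    = 10 * time.Second       // cap from section 3.4
	growth      = 4                      // inferred: 200ms -> 800ms -> 3.2s
	maxAttempts = 3                      // after this the event goes to the DLQ
)

// backoff returns the wait before retry n (0-based), capped at maxDelay.
func backoff(n int) time.Duration {
	d := baseDelay
	for i := 0; i < n; i++ {
		d *= growth
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// schedule lists the waits between consecutive attempts.
func schedule(attempts int) []time.Duration {
	var waits []time.Duration
	for n := 0; n < attempts-1; n++ {
		waits = append(waits, backoff(n))
	}
	return waits
}

func main() {
	fmt.Println(schedule(maxAttempts)) // waits actually used with 3 attempts
	fmt.Println(backoff(3))            // the uncapped value would be 12.8s
}
```

With three attempts only the first two waits are ever used; the 3.2s step and the 10s cap come into play only if the attempt limit is raised.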

3.5 Security

  • HMAC secret: per-tenant, stored in pgcrypto vault under provider_keys/<org>/_default/siem_hmac:<key_id>. Returned once at ConfigureSiemEndpoint; omitted from GetSiemEndpoint.
  • Rotation: 90-day cadence (reuses Wave 2 LLD 7 pattern).
  • Sensitive payloads: never exported (only hashes propagate).

3.6 Shard leasing (founder Q5)

  • Default SIEM_SHARD_COUNT=1, SIEM_SHARD_ID=0.
  • Each shard holds a PG advisory lock keyed on fnv64("upsquad.siem_export_worker:<shard_id>").
  • The Recreate deployment strategy terminates the old pod before the new one starts, so two replicas never race for the lock during a rollout.
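
A sketch of how the lock key might be derived. Two assumptions are made here that the text does not confirm: "fnv64" is taken to mean 64-bit FNV-1a (hash/fnv.New64a), and the unsigned hash is reinterpreted as a signed int64 because PostgreSQL advisory locks take a bigint key. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// lockKey hashes "upsquad.siem_export_worker:<shard_id>" into a bigint-sized key.
// Assumption: FNV-1a; the shipped code may use plain FNV-1 instead.
func lockKey(shardID int) int64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "upsquad.siem_export_worker:%d", shardID)
	return int64(h.Sum64()) // wraps into the signed range expected by PostgreSQL
}

func main() {
	for id := 0; id < 2; id++ {
		// Whether the worker uses the blocking or the try variant is not stated.
		fmt.Printf("shard %d: SELECT pg_try_advisory_lock(%d)\n", id, lockKey(id))
	}
}
```

Keying on the shard ID rather than on a tenant means a second replica configured with the same SIEM_SHARD_ID cannot acquire the lease while the first still holds it.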

3.7 Filter classes

Three tenant-settable filter classes:

  • all — every auditable action.
  • audit (default) — governance / approval / delegation events.
  • high_severity — denied / blocked / crashed / budget-exhausted events only.

3.8 Metrics

  • siem_export_total{tenant, outcome}
  • siem_export_duration_seconds{tenant}
  • siem_dlq_depth{tenant}
  • siem_export_lag_seconds{tenant}
  • siem_circuit_open_total{tenant}

4. Wiring

New code lives under:

  • internal/compliance/siem/ — worker + client + ocsf + circuit_breaker + shard + metrics + secrets + transform.
  • internal/compliance/store/siem.go — persistence layer.
  • internal/compliance/service_siem.go — domain service.
  • cmd/siem-export-worker/ — standalone binary.
  • deployments/siem-export-worker/base/ — k8s manifests (single-replica).

Extended (clearly delineated blocks):

  • cmd/compliance-engine/main.go: startSIEMWorker(...), gated by SIEM_EXPORT_DISABLED. Removed in #924 because the embedded SIEM worker in cmd/compliance-engine raced the standalone container for the siem/shard advisory lock. The standalone cmd/siem-export-worker is now the sole SIEM owner in every topology (ICS, dev, prod), and the SIEM_EXPORT_DISABLED kill-switch is honoured exclusively by the standalone binary.
  • internal/compliance/grpcserver.go — three new RPC handlers (ConfigureSiemEndpoint, GetSiemEndpoint, DeleteSiemEndpoint) wired via WithSIEMService(...).
  • proto/upsquad/compliance/v1/compliance.proto — three new RPCs + four new messages + SiemFilterClass enum.

CI:

  • .github/workflows/sbom.yml
  • scripts/generate-sbom.sh
  • Makefile sbom: target.

5. Production callers (shelfware gate)

| Export | Caller |
|---|---|
| siem.NewWorker | cmd/siem-export-worker/main.go |
| siem.NewHTTPClient | cmd/siem-export-worker/main.go |
| siem.AcquireShardLease | cmd/siem-export-worker/main.go |
| siem.NewSecretStore | cmd/siem-export-worker/main.go, cmd/agent-orchestrator/main.go (RPC side) |
| siem.NewBreaker | cmd/siem-export-worker/main.go |
| siem.TransformAuditRow | internal/compliance/siem/worker.go (hot path) |
| store.NewSIEMStore | cmd/siem-export-worker/main.go |
| compliance.NewSIEMService | cmd/agent-orchestrator/main.go (via WithSIEMService) |
| ConfigureSiemEndpoint RPC | internal/compliance/grpcserver.go |
| GetSiemEndpoint RPC | internal/compliance/grpcserver.go |
| DeleteSiemEndpoint RPC | internal/compliance/grpcserver.go |

6. Tests (21 SIEM-package tests under -race)

Unit:

  • TestTransform_* — OCSF mapping, nil-row, JSON roundtrip, sensitive-payload drop, unknown-action degrade, type_uid composition.
  • TestDispatch_SignsHMAC — HMAC canonical form matches receiver-side verification.
  • TestDispatch_TransientOn5xx / TestDispatch_PermanentOn4xx / TestDispatch_RejectsBadInput.
  • TestBreaker_* — OPEN after threshold, HALF_OPEN cooldown, success closes, failure re-opens.
  • TestShard_OwnsTenant* — single-replica + multi-replica selection.
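
The breaker transitions exercised by TestBreaker_* (OPEN after threshold, HALF_OPEN after cooldown, success closes, failure re-opens) can be sketched as a small state machine. This is an illustrative sketch with hypothetical names, using the threshold and cooldown from section 3.4; it is not concurrency-safe and does not limit HALF_OPEN to a single probe.

```go
package main

import (
	"fmt"
	"time"
)

const (
	threshold = 5         // consecutive failures before opening
	cooldown  = time.Hour // how long the breaker stays OPEN
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

type breaker struct {
	st       state
	failures int
	openedAt time.Time
}

func newBreaker() *breaker { return &breaker{} }

// allow reports whether a dispatch may proceed at time now.
// Once the cooldown has elapsed, OPEN moves to HALF_OPEN and a probe is let through.
func (b *breaker) allow(now time.Time) bool {
	if b.st == open {
		if now.Sub(b.openedAt) < cooldown {
			return false
		}
		b.st = halfOpen
	}
	return true
}

// record feeds back a dispatch outcome: success closes, failure counts or re-opens.
func (b *breaker) record(ok bool, now time.Time) {
	if ok {
		b.st, b.failures = closed, 0
		return
	}
	b.failures++
	if b.st == halfOpen || b.failures >= threshold {
		b.st, b.openedAt = open, now
	}
}

// demoStart is a fixed clock origin so the demo is deterministic.
var demoStart = time.Unix(0, 0)

func main() {
	b := newBreaker()
	for i := 0; i < threshold; i++ {
		b.record(false, demoStart)
	}
	fmt.Println("allowed right after trip:", b.allow(demoStart))
	fmt.Println("allowed after cooldown:", b.allow(demoStart.Add(cooldown)))
}
```

Passing the clock in as an argument keeps the cooldown testable without sleeping for an hour.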

Integration (in-process):

  • TestWorker_EndToEndDelivery — audit row → transform → HMAC → dispatch → recorded delivery → cursor advance.
  • TestWorker_IdempotentRetry — duplicate (org_id, audit_event_id) dedup.
  • TestWorker_CircuitBreakerOpens — consecutive 5xx trip → DLQ fast path.
  • TestWorker_OCSFSchemaValid — POSTed body is valid OCSF JSON with required keys.

7. Follow-ups

  1. Cosign SBOM signing (30-day follow-up per founder Q4).
  2. S3 archive of SBOMs under s3://upsquad-sbom-archive/<sha>/ — deferred to ops-managed Terraform.
  3. Wiring tests in test/integration/wave4b/ — real Postgres harness (currently stubbed by in-memory worker tests).
  4. Multi-shard rollout: SIEM_SHARD_COUNT>1 is functional but unused until Wave 5 scaling.