LLD: Wave 2 LLD 8 — Approval Delegation + Clearance Enforcement
| Field | Value |
|---|---|
| Parent HLD | docs/hld/agent-runtime-wave-2-approval-pause-resume.md §9 |
| PRD | #380 (P4.3.9 Delegation) |
| Tracker | #381 |
| Issue | #419 |
| PR | #426 |
| Milestone | Wave 2 |
| Author | backend-sme (as-built), principal-architect (original spec) |
| Date | 2026-04-13 (backfill) |
1. Scope
Layer a delegation primitive onto the LLD 6 approval state machine so an approver holding a pending `governance_approvals` row can hand decision authority to another operator, subject to a clearance check, a chain-depth cap, a TTL, and a cycle detector. The engine exposes
- `Delegate(approval_id, from_member_id, to_member_id, reason, expires_at)`
- chain retrieval through `GetApproval.delegation_chain`
- authoritative-approver resolution consumed by `RecordDecision`

on top of the existing `governance_approvals` table via a new child table `approval_delegations`. Every transition writes one `approval_events` row (`event_type='delegated'`) so the full chain is reconstructible from the hash-chained audit trail in isolation from `approval_delegations`.
Non-goals: policy-resolver-driven "original approver" identity (deferred to LLD 9), SCIM-sourced member clearance, chain UI for ops consoles, background TTL sweep, escalation on delegation timeout (separate from approval timeout).
2. Schema / migration SQL
Migration 042 — see internal/context/store/migrations/042_approval_delegations.up.sql for canonical text.
```sql
CREATE TABLE approval_delegations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          TEXT NOT NULL,
    approval_id     UUID NOT NULL REFERENCES governance_approvals(id) ON DELETE CASCADE,
    chain_position  INT NOT NULL CHECK (chain_position >= 1),
    from_member_id  TEXT NOT NULL,
    to_member_id    TEXT NOT NULL,
    to_clearance    INT NOT NULL,
    reason          TEXT,
    expires_at      TIMESTAMPTZ NOT NULL,
    revoked_at      TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (approval_id, chain_position)
);

ALTER TABLE approval_delegations ENABLE ROW LEVEL SECURITY;
ALTER TABLE approval_delegations FORCE ROW LEVEL SECURITY;

CREATE POLICY scope_isolation ON approval_delegations
    USING (org_id = current_setting('app.org_id', true));

CREATE INDEX ix_approval_delegations_chain
    ON approval_delegations(approval_id, chain_position);

CREATE INDEX ix_approval_delegations_to_active
    ON approval_delegations(org_id, to_member_id)
    WHERE revoked_at IS NULL;
```
Reversible. Down migration drops indexes, policy, and the table. ON DELETE CASCADE on approval_id keeps state consistent if an approval row is ever hard-deleted (should not happen in production but supports dev fixtures).
No CHECK constraint on chain depth. The cap is policy-tunable; enforcing it in application code lets future waves raise the limit without a migration.
3. Go interfaces
```go
// internal/runtime/approval/delegation.go
type DelegationManager struct {
	store      Store                 // pg persistence
	clearance  MemberClearanceLookup // LLD 8 seam, PG users table (issue #428)
	dispatcher Dispatcher            // optional; fan-out to delegatee channels
	now        func() time.Time      // injectable
	maxDepth   int                   // MaxChainDepth = 3
	defaultTTL time.Duration         // DefaultDelegationTTL = 24h
}

// Delegate inserts a new chain hop. Validation order:
//  1. params validation (non-empty IDs; from != to)
//  2. approval loaded; must be status=pending
//  3. existing chain loaded
//  4. depth check: len(activeChain) + 1 <= MaxChainDepth
//  5. cycle check: to_member must not appear earlier (as from OR to)
//  6. current-approver check: subsequent hops must match currentTail
//     (OR the original approver when all prior hops have lapsed;
//     see §4 "All-lapsed fallback")
//  7. clearance check: clearance(to) >= approval.required_clearance
//  8. TTL clamp: expires_at = min(req or now+default, approval.expires_at)
//  9. Insert + append 'delegated' approval_event + fan-out
func (m *DelegationManager) Delegate(ctx context.Context, p DelegateParams) (DelegateOutcome, error)

// CurrentApprover reports the active tail ONLY. ok=false when chain
// is empty OR all hops have lapsed. Callers enforcing the all-lapsed
// fallback MUST use AuthoritativeApprover instead.
func (m *DelegationManager) CurrentApprover(ctx context.Context, orgID, approvalID string) (memberID string, ok bool, err error)

// AuthoritativeApprover is the function RecordDecision consults.
//
//	no chain    → ("", false)      // no restriction
//	active tail → (<tail>, true)   // only tail may record
//	all lapsed  → (<orig>, true)   // fallback to chain[0].FromMemberID
func (m *DelegationManager) AuthoritativeApprover(ctx context.Context, orgID, approvalID string) (memberID string, ok bool, err error)

// MemberClearanceLookup is the seam through which the PG users table
// (issue #428) supplies clearance integers. Production impl lives in
// internal/runtime/approval/clearance (PGLookup with 60s/5s TTL cache).
// ok=false => member unknown OR account disabled; Delegate treats
// both as ErrInsufficientClearance (fail closed).
type MemberClearanceLookup interface {
	LookupClearance(ctx context.Context, orgID, memberID string) (clearance int32, ok bool)
}
```
Constants: MaxChainDepth = 3, DefaultDelegationTTL = 24h.
Sentinel errors: ErrInsufficientClearance, ErrChainDepthExceeded, ErrCycleDetected, ErrAlreadyResolved, ErrNotCurrentApprover, ErrSelfDelegation.
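
Validation steps 4, 5, and 8 are pure functions of the loaded chain, so they can be illustrated without a store. The following is a minimal sketch, not the shipped code: the `hop` type, its `Active` flag, and the helper names `checkDepth`, `checkCycle`, and `clampExpiry` are hypothetical, and it assumes the cycle check considers every recorded hop regardless of whether that hop is still active.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Constants and sentinels named in this section.
const (
	MaxChainDepth        = 3
	DefaultDelegationTTL = 24 * time.Hour
)

var (
	ErrChainDepthExceeded = errors.New("chain depth exceeded")
	ErrCycleDetected      = errors.New("cycle detected")
)

// hop is a hypothetical, trimmed-down view of one approval_delegations row.
type hop struct {
	From, To string
	Active   bool // assumption: neither revoked nor expired
}

// checkDepth mirrors step 4: the new hop must fit under MaxChainDepth.
func checkDepth(chain []hop) error {
	active := 0
	for _, h := range chain {
		if h.Active {
			active++
		}
	}
	if active+1 > MaxChainDepth {
		return ErrChainDepthExceeded
	}
	return nil
}

// checkCycle mirrors step 5: the delegatee must not already appear in the
// chain, either as a delegator or as a delegatee.
func checkCycle(chain []hop, to string) error {
	for _, h := range chain {
		if h.From == to || h.To == to {
			return ErrCycleDetected
		}
	}
	return nil
}

// clampExpiry mirrors step 8: use the requested expiry, or now+default when
// none was requested, and never outlive the approval itself.
func clampExpiry(requested, now, approvalDeadline time.Time) time.Time {
	exp := requested
	if exp.IsZero() {
		exp = now.Add(DefaultDelegationTTL)
	}
	if exp.After(approvalDeadline) {
		exp = approvalDeadline
	}
	return exp
}

func main() {
	chain := []hop{{From: "alice", To: "bob", Active: true}}
	fmt.Println(checkCycle(chain, "alice")) // cycle detected
	fmt.Println(checkDepth(chain))          // <nil>

	now := time.Date(2026, 4, 13, 12, 0, 0, 0, time.UTC)
	deadline := now.Add(6 * time.Hour)
	fmt.Println(clampExpiry(time.Time{}, now, deadline).Equal(deadline)) // true
}
```

Because the cap lives in a constant and not in a `CHECK` constraint, raising it is a one-line change in a sketch like this, which is the property §2 relies on.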
4. State transitions and authority rules
4.1 Chain evolution
Each approval starts with an empty chain. Delegate appends rows with monotonic chain_position starting at 1. A revoke marks revoked_at; an expiry is detected at read time (expires_at <= now). No row is ever mutated except for revoked_at.
4.2 Hop activity
```go
func (l *DelegationLink) IsActive(now time.Time) bool {
	return l.RevokedAt == nil && now.Before(l.ExpiresAt)
}
```
currentTail additionally consults MemberClearanceLookup so a disabled delegatee's account is treated as lapsed. This implements the founder-approved rule "if the delegatee's account is disabled, the delegation becomes ineffective; approval reverts to the next-up in the chain".
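
One way to read the "reverts to the next-up in the chain" rule is a backwards scan that skips any hop that is revoked, expired, or points at an unusable account. The sketch below assumes that reading; the `link` type, the `usable` callback (standing in for the `ok` result of `MemberClearanceLookup`), and the `demoChain` fixture are hypothetical and are not taken from the shipped `currentTail`.

```go
package main

import (
	"fmt"
	"time"
)

// link is a hypothetical subset of the DelegationLink fields used here.
type link struct {
	ToMemberID string
	ExpiresAt  time.Time
	RevokedAt  *time.Time
}

// isActive matches the IsActive rule: not revoked and not yet expired.
func (l link) isActive(now time.Time) bool {
	return l.RevokedAt == nil && now.Before(l.ExpiresAt)
}

// currentTail scans from the newest hop backwards and returns the first
// delegatee whose hop is active and whose account is still usable.
func currentTail(chain []link, now time.Time, usable func(memberID string) bool) (string, bool) {
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].isActive(now) && usable(chain[i].ToMemberID) {
			return chain[i].ToMemberID, true
		}
	}
	return "", false
}

// demoChain builds a two-hop fixture: alice to bob, then bob to carol.
func demoChain() ([]link, time.Time) {
	now := time.Date(2026, 4, 13, 12, 0, 0, 0, time.UTC)
	return []link{
		{ToMemberID: "bob", ExpiresAt: now.Add(time.Hour)},
		{ToMemberID: "carol", ExpiresAt: now.Add(time.Hour)},
	}, now
}

func main() {
	chain, now := demoChain()
	carolDisabled := func(id string) bool { return id != "carol" }
	fmt.Println(currentTail(chain, now, carolDisabled)) // bob true
}
```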
4.3 Authoritative approver resolution
| Chain state | CurrentApprover | AuthoritativeApprover |
|---|---|---|
| empty | ("", false) | ("", false) |
| at least one active hop | (tail.to, true) | (tail.to, true) |
| non-empty, all lapsed | ("", false) | (chain[0].from, true) |
RecordDecision MUST consult AuthoritativeApprover. The original-approver fallback (chain[0].FromMemberID) closes the security gap flagged in PR #426 architect review — a chain whose every hop had lapsed previously admitted any operator.
CurrentApprover is retained as the narrower "is there an active delegatee right now?" helper. It is used by introspection call sites (dashboard rendering) that specifically do not want the fallback baked into their answer.
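
The three rows of the table reduce to a short decision function. This is an illustrative sketch only: `hop`, `authoritativeApprover`, and `mayRecord` are hypothetical names, and the `Active` flag is assumed to already fold in the revoked, expired, and disabled-delegatee checks.

```go
package main

import "fmt"

// hop is a hypothetical reduced chain entry.
type hop struct {
	From, To string
	Active   bool
}

// authoritativeApprover covers the three chain states from the table:
// empty chain, at least one active hop, and non-empty but all lapsed.
func authoritativeApprover(chain []hop) (string, bool) {
	if len(chain) == 0 {
		return "", false // the delegation layer adds no restriction
	}
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].Active {
			return chain[i].To, true // only the active tail may record
		}
	}
	return chain[0].From, true // all lapsed: fall back to the original approver
}

// mayRecord shows how a RecordDecision-style caller could apply the result.
func mayRecord(chain []hop, caller string) bool {
	approver, restricted := authoritativeApprover(chain)
	return !restricted || approver == caller
}

func main() {
	lapsed := []hop{{From: "alice", To: "bob", Active: false}}
	fmt.Println(mayRecord(lapsed, "alice"))   // true
	fmt.Println(mayRecord(lapsed, "mallory")) // false
}
```

The second call is the case the PR #426 review flagged: without the fallback row, an all-lapsed chain would return "no restriction" and `mallory` would be admitted.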
4.4 Subsequent-hop guard in Delegate
`Delegate` mirrors the fallback in §4.3: if the chain is non-empty and there is no active tail, a subsequent hop is allowed only when `p.FromMemberID == chain[0].FromMemberID` (the original approver). Any other operator attempting to extend a dead chain is rejected with `ErrNotCurrentApprover`.
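
The guard can be sketched as a single check on the delegator. The names `hop` and `checkDelegator` are hypothetical, and the sketch assumes the first hop is accepted without a check, as described under known edge cases.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotCurrentApprover = errors.New("not current approver")

// hop is a hypothetical reduced chain entry.
type hop struct {
	From, To string
	Active   bool
}

// checkDelegator decides whether `from` may append the next hop.
func checkDelegator(chain []hop, from string) error {
	if len(chain) == 0 {
		return nil // first hop: from_member_id is not checked here
	}
	for i := len(chain) - 1; i >= 0; i-- {
		if chain[i].Active {
			if chain[i].To == from {
				return nil // the active tail extends the chain
			}
			return ErrNotCurrentApprover
		}
	}
	// Dead chain: only the original approver may extend it.
	if chain[0].From == from {
		return nil
	}
	return ErrNotCurrentApprover
}

func main() {
	dead := []hop{{From: "alice", To: "bob", Active: false}}
	fmt.Println(checkDelegator(dead, "alice")) // <nil>
	fmt.Println(checkDelegator(dead, "bob"))   // not current approver
}
```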
5. Founder decision #7 — clearance rule
Delegation is allowed when

`to_member.clearance >= approval.required_clearance`

NOT when `to_member.clearance >= from_member.clearance`. The question the rule answers is "can this operator discharge this specific approval?", so the delegator's own clearance on other decisions is not at issue. A low-clearance delegator CAN hand a high-clearance delegatee an approval the delegator could not personally have approved. This matches PRD P4.3.9 phrasing and the industry convention (least-privilege, no privilege chaining).
Locked in by TestDelegate_FounderDecision7_RuleIsVsRequired_NotVsFrom.
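
A small sketch makes the asymmetry concrete: the delegator's clearance never enters the comparison. The `clearances` map and the `checkClearance` helper are hypothetical stand-ins for `MemberClearanceLookup` and the clearance step in `Delegate`.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrInsufficientClearance = errors.New("insufficient clearance")

// clearances is a hypothetical in-memory stand-in for MemberClearanceLookup.
type clearances map[string]int32

func (c clearances) lookup(memberID string) (int32, bool) {
	v, ok := c[memberID]
	return v, ok
}

// checkClearance compares only the delegatee against the approval's
// requirement. The delegator's clearance is deliberately not an input.
func checkClearance(c clearances, toMemberID string, required int32) error {
	got, ok := c.lookup(toMemberID)
	if !ok || got < required {
		return ErrInsufficientClearance // unknown members fail closed
	}
	return nil
}

func main() {
	c := clearances{"alice": 2, "bob": 5}
	// alice (2) hands a clearance-4 approval to bob (5): allowed.
	fmt.Println(checkClearance(c, "bob", 4)) // <nil>
	// bob (5) hands the same approval to alice (2): rejected.
	fmt.Println(checkClearance(c, "alice", 4)) // insufficient clearance
}
```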
6. Unit + integration test plan (as-built)
All tests in internal/runtime/approval/{delegation_test.go, service_delegation_test.go, grpcserver_delegate_test.go}.
Isolation (delegation_test.go)
- `TestDelegate_RejectsSelf`: `from == to` rejected
- `TestDelegate_RejectsAlreadyResolved`: non-pending approval rejected
- `TestDelegate_InsufficientClearance_Rejected`: `to` below `required_clearance`
- `TestDelegate_UnknownDelegatee_FailsClosed`: `LookupClearance` `ok=false`
- `TestDelegate_DisabledDelegatee_Rejected`: disabled account rejected
- `TestDelegate_FounderDecision7_RuleIsVsRequired_NotVsFrom`: founder rule lock-in
- `TestDelegate_TTLClampedToApprovalDeadline`: requested TTL > approval TTL clamped
- `TestDelegate_DefaultTTL_AppliedWhenOmitted`: default 24h, clamped
- `TestDelegate_CycleDetected`: A→B→A refused
- `TestDelegate_ChainDepthLimit`: 4th hop on depth-3 chain refused
- `TestDelegate_SubsequentHopMustMatchTail`: non-tail subsequent hop refused
- `TestDelegate_ChainPositionMonotonic`: positions increment without gaps
- `TestDelegate_EmitsAuditEvent`: `approval_events.event_type='delegated'`
- `TestDelegate_DispatchesNotificationToDelegatee`: fan-out fires
- `TestDelegate_RevokedHop_IgnoredForTailAndCycle`: revoke clears tail but not cycle history
- `TestCurrentApprover_{EmptyChain,ActiveChain,DisabledTail,AllDisabled,ExpiredTail}`: tail-only semantics
- `TestAuthoritativeApprover_{EmptyChain,ActiveChain,AllLapsed}`: fallback semantics
Service-level (service_delegation_test.go)
- `TestService_Delegate_EndToEnd`: Request → Delegate → delegatee approves → session resumes
- `TestService_RecordDecision_RejectsNonTail`: former holder and third party rejected
- `TestService_RecordDecision_NoChainDelegates_StillWorks`: LLD 6 parity when no chain
- `TestService_Delegate_NotConfigured_ReturnsError`: no Clearance wiring → Delegate refuses
- `TestService_Get_PopulatesDelegationChain`: `GetApproval.delegation_chain` hydrated
- `TestService_RecordDecision_AllLapsedChain_RevertsToOriginalApprover`: security fix: all-lapsed → only original approver accepted
- `TestService_RecordDecision_PartialLapsedChain_OnlyActiveTailMayRecord`: active tail wins over original-approver fallback
- `TestService_Delegate_AlreadyResolvedApproval`: `ErrAlreadyResolved` surfaced
gRPC (grpcserver_delegate_test.go)
- `TestGRPC_Delegate_*`: error mapping: `InvalidArgument`, `FailedPrecondition`, `PermissionDenied`, `NotFound`
- `TestGRPC_RecordDecision_NonTail_PermissionDenied`: `ErrNotCurrentApprover` → `PermissionDenied`
7. gRPC error mapping
| Go sentinel | gRPC code | Caller semantic |
|---|---|---|
| `ErrInsufficientClearance` | `PermissionDenied` | delegatee cannot hold this approval |
| `ErrChainDepthExceeded` | `FailedPrecondition` | chain saturated |
| `ErrCycleDetected` | `FailedPrecondition` | delegate would re-enter chain |
| `ErrAlreadyResolved` | `FailedPrecondition` | approval terminal |
| `ErrNotCurrentApprover` | `PermissionDenied` | caller is not the authoritative approver |
| `ErrSelfDelegation` | `InvalidArgument` | from == to |
| `ErrNotFound` | `NotFound` | no approval for (org, approval_id) |
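
The mapping in the table can be sketched as one `errors.Is` switch. To stay standard-library-only, this sketch returns the code name as a string; a real handler would presumably use the constants from `google.golang.org/grpc/codes` instead. The `codeName` helper is hypothetical, and the `Internal` default for unmapped errors is an assumption, since the table does not cover that case.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinels from this document; the messages are placeholders.
var (
	ErrInsufficientClearance = errors.New("insufficient clearance")
	ErrChainDepthExceeded    = errors.New("chain depth exceeded")
	ErrCycleDetected         = errors.New("cycle detected")
	ErrAlreadyResolved       = errors.New("already resolved")
	ErrNotCurrentApprover    = errors.New("not current approver")
	ErrSelfDelegation        = errors.New("self delegation")
	ErrNotFound              = errors.New("not found")
)

// codeName returns the name of the gRPC code the table assigns to err.
// errors.Is is used so that wrapped sentinels still map correctly.
func codeName(err error) string {
	switch {
	case err == nil:
		return "OK"
	case errors.Is(err, ErrInsufficientClearance),
		errors.Is(err, ErrNotCurrentApprover):
		return "PermissionDenied"
	case errors.Is(err, ErrChainDepthExceeded),
		errors.Is(err, ErrCycleDetected),
		errors.Is(err, ErrAlreadyResolved):
		return "FailedPrecondition"
	case errors.Is(err, ErrSelfDelegation):
		return "InvalidArgument"
	case errors.Is(err, ErrNotFound):
		return "NotFound"
	default:
		return "Internal" // assumption: not specified by the table
	}
}

func main() {
	wrapped := fmt.Errorf("delegate: %w", ErrCycleDetected)
	fmt.Println(codeName(wrapped)) // FailedPrecondition
}
```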
8. Rollout plan
Wiring status
- Core engine: IN PLACE (PR #426)
- `MemberClearanceLookup` implementation: WIRED as of issue #428. `internal/runtime/approval/clearance.PGLookup` reads from the `users` table (migration 012); `(org_id, id, clerk_user_id, clearance, status, deleted_at)` is the authoritative source. `cmd/agent-orchestrator/main.go` constructs it with `clearance.New(pool, clearance.Config{...})` and passes it as `ServiceConfig.Clearance` so `DelegationManager` is instantiated.
- Cache: 60s positive / 5s negative TTL in-process. Multi-replica deployments converge within TTL without cross-replica invalidation.
- Fail-closed: DB error OR soft-deleted row OR `status IN ('suspended', 'removed')` → `ok=false` → `ErrInsufficientClearance`. A demoted operator whose JWT still carries a stale `clearance=5` claim cannot escalate; the DB row wins.
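
The caching and fail-closed behaviour described above can be illustrated with a small in-process TTL cache. This is a sketch under stated assumptions, not the `PGLookup` implementation: `cachedLookup`, `fetchFunc`, `fakeClock`, and `newCachedLookup` are hypothetical, the `context.Context` parameter is omitted for brevity, and caching a lookup error as a short-lived negative entry is an assumption the document does not confirm.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	positiveTTL = 60 * time.Second
	negativeTTL = 5 * time.Second
)

type key struct{ org, member string }

type entry struct {
	clearance int32
	ok        bool
	expires   time.Time
}

// fetchFunc is a hypothetical stand-in for the users-table query.
type fetchFunc func(orgID, memberID string) (clearance int32, ok bool, err error)

type cachedLookup struct {
	mu      sync.Mutex
	entries map[key]entry
	now     func() time.Time
	fetch   fetchFunc
}

func newCachedLookup(now func() time.Time, fetch fetchFunc) *cachedLookup {
	return &cachedLookup{entries: map[key]entry{}, now: now, fetch: fetch}
}

// LookupClearance serves from cache while the entry is fresh; otherwise it
// fetches, failing closed (ok=false) on any error.
func (c *cachedLookup) LookupClearance(orgID, memberID string) (int32, bool) {
	c.mu.Lock()
	defer c.mu.Unlock() // held across fetch for simplicity, not throughput
	k := key{orgID, memberID}
	if e, hit := c.entries[k]; hit && c.now().Before(e.expires) {
		return e.clearance, e.ok
	}
	clearance, ok, err := c.fetch(orgID, memberID)
	if err != nil {
		ok = false
	}
	ttl := positiveTTL
	if !ok {
		clearance, ttl = 0, negativeTTL
	}
	c.entries[k] = entry{clearance: clearance, ok: ok, expires: c.now().Add(ttl)}
	return clearance, ok
}

// fakeClock is a hand-advanced clock for the demo.
type fakeClock struct{ t time.Time }

func (f *fakeClock) now() time.Time       { return f.t }
func (f *fakeClock) advanceSeconds(n int) { f.t = f.t.Add(time.Duration(n) * time.Second) }

func main() {
	clock := &fakeClock{}
	fetches := 0
	c := newCachedLookup(clock.now, func(orgID, memberID string) (int32, bool, error) {
		fetches++
		return 3, true, nil
	})
	c.LookupClearance("org1", "alice")
	c.LookupClearance("org1", "alice") // served from cache
	clock.advanceSeconds(61)
	c.LookupClearance("org1", "alice") // positive entry expired, fetched again
	fmt.Println(fetches)               // 2
}
```

The short negative TTL bounds how long a newly provisioned member is wrongly rejected, while the longer positive TTL bounds how long a demoted member keeps a stale clearance on one replica.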
Feature flag
None. The lookup-presence check at service construction (Clearance != nil) is the implicit flag — a deployment that wires the lookup has LLD 8 on; one that does not has LLD 6 semantics unchanged.
Rollback
Revoke by writing revoked_at = now() on the delegation row. Supported by the store (RevokeDelegation(ctx, orgID, approvalID, chainPosition)). No gRPC surface in Wave 2 — ops-only via psql for now.
MTTR to disable delegation platform-wide: unset ServiceConfig.Clearance, restart. < 30 s.
9. Observability
Counters (added in this PR):
- `approval_delegations_total{tenant, result=created|revoked|expired}`: chain-mutation volume by outcome
- `approval_delegations_clearance_denied_total{tenant}`: failed delegations due to clearance gap; a signal of RBAC drift
Emitted through the ServiceMetrics interface exactly as the existing approval counters; the orchestrator wires them through OtelServiceMetrics onto the shared runtime/metrics Instruments.
Only the created and clearance_denied buckets have live call sites in Wave 2. revoked and expired labels are defined so the counter is forward-compatible once revoke / expiry call sites land.
10. Open questions resolved
| # | Question | Resolution |
|---|---|---|
| 1 | MemberClearanceLookup source | users table (migration 012). Resolved by issue #428 — see internal/runtime/approval/clearance/ for the DB-backed implementation with 60s/5s TTL cache. The original "Clerk custom claim" sketch was never shipped; the table-backed path is the authoritative source of record. |
| 2 | Chain depth, default TTL | 3 hops, 24 h. Code constants. Change is one code edit, no migration. |
| 3 | RecordDecision on all-lapsed chain | Revert to chain[0].FromMemberID (original delegator). Fixed in PR #426 per architect review — do NOT defer to LLD 9. |
| 4 | Background TTL sweep | Not required for correctness. Every read path invokes IsActive(now). No stale-delegation usage risk. Follow-up for dashboard latency can land in Wave 3 if the reads become hot. |
| 5 | Original approver identity (vs. policy-derived) | Currently chain[0].FromMemberID. LLD 9 policy resolver can tighten this — the fallback is a floor, not a ceiling, and refining it is strictly additive. |
11. Known edge cases
- `from_member_id` on the first hop is unchecked. The approval-creation-time policy already gated who could see the approval; we accept that as the initial grant. The value is captured verbatim in the audit row.
- Revoked + expired simultaneously: `IsActive` returns false once either condition is met. Ordering does not matter.
- Disabled delegator mid-chain does NOT invalidate the chain going forward. Only the delegatee's liveness matters at read time; the delegator's role was to hand authority over, which they already did.
- Concurrent `Delegate` race surfaces as `ErrIdempotencyCollision` from the `UNIQUE (approval_id, chain_position)` constraint; the loser retries after re-reading the chain.
- Approval hard-deleted while chain exists: `ON DELETE CASCADE` on `approval_id` tears down delegations too. Hard-delete is not a production code path.
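
The retry-after-re-read behaviour for concurrent `Delegate` calls might look like the loop below. It is a sketch only: `appendHop` and its `chainLen` and `insert` callbacks are hypothetical, and a real retry would also need to re-run the depth, cycle, and current-approver checks against the re-read chain before inserting.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrIdempotencyCollision = errors.New("idempotency collision")

// appendHop re-reads the chain length before every attempt so that the
// loser of a race picks the next free chain_position.
func appendHop(maxAttempts int, chainLen func() (int, error), insert func(position int) error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		n, err := chainLen()
		if err != nil {
			return 0, err
		}
		pos := n + 1
		err = insert(pos)
		if err == nil {
			return pos, nil
		}
		if !errors.Is(err, ErrIdempotencyCollision) {
			return 0, err // only unique-constraint races are retried
		}
		lastErr = err
	}
	return 0, lastErr
}

func main() {
	taken := map[int]bool{1: true} // one hop already stored
	raced := false
	chainLen := func() (int, error) { return len(taken), nil }
	insert := func(pos int) error {
		if !raced { // simulate a concurrent writer taking the slot first
			raced = true
			taken[pos] = true
			return ErrIdempotencyCollision
		}
		if taken[pos] {
			return ErrIdempotencyCollision
		}
		taken[pos] = true
		return nil
	}
	fmt.Println(appendHop(3, chainLen, insert)) // 3 <nil>
}
```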
12. Estimated size
M–L (1.5–2 weeks): migration + chain logic + clearance lookup seam + gRPC error mapping + dispatcher fan-out to delegatee + unit and service integration tests + audit event + as-built doc + metrics. Story points ≈ 8.
Shipped in PR #426. LLD doc backfilled post-merge per architect follow-up.