Agent Runtime Wave 1 MVP — QA Sign-Off Report
Author: upsquad-qa-engineer[bot]
Date: 2026-04-10
Scope: Agent Runtime Wave 1 MVP
Trust chain: PRD #93 v1.4 → HLD #110 → LLD #111 → tracking #105 → PRs #126–#140
Recommendation: REJECTED — CONDITIONAL on P0 fix
1. Executive Summary
Wave 1 delivered a large and generally high-quality set of library components for the Agent Runtime (session manager, 5-step initializer, LLM adapters for Anthropic/OpenAI/Gemini, LangGraph executor, streaming pipeline, Redis sliding-window rate limiter, OTel metrics, Go migrations 016–017, K8s manifests, WebSocket gateway bridge). The individual components are well-designed, carry strong unit tests, enforce tenant isolation at the DB and subscription layers, and reflect the LLD faithfully.
However, the runtime is not end-to-end functional. The gRPC service methods that stitch these components together — LifecycleService.{CreateSession, SendMessage, GetSession, ListSessions, TerminateSession} and RuntimeService.{ExecuteStep, InitWorkerSession, TerminateSession, Checkpoint} — are stubbed with Unimplemented. The Python worker constructs a RuntimeServiceServicer but never registers it with its gRPC server. The session.Manager, streaming.StreamHandler, and audit.Writer all exist as library code but are not instantiated anywhere in cmd/agent-orchestrator/main.go. As a result, the claimed end-to-end path in #105 —
Python Worker → gRPC → Go StreamHandler → Redis pub/sub → WS Gateway → Browser
— cannot fire in any environment. No session can be created, no message can be sent, no token can stream.
This is a P0 integration gap that blocks the Wave 1 delivery sign-off. The fix is bounded: wire the already-built components into the gRPC servers and the orchestrator entrypoint. No new business logic is required.
Additional findings include missing unit tests for the audit and routing packages, a missing llm_usage_events attribution hook (per-agent billing attribution required by the cross-platform memory contract), a shared platform namespace (no per-tenant K8s namespace) that weakens AR-F64 enforcement, and a race-condition disclaimer in agent_events.go that the author deferred to a future refactor.
2. Test Execution Status
2.1 Go test suite
Status: NOT EXECUTED in this QA session.
The QA sandbox environment blocked go test, go build, and even go env invocations (Bash permission denied for any go subcommand other than go version). I was unable to execute the suite. The static inspection below substitutes for runtime verification and a follow-up CI run is required before sign-off.
Inventory of existing Go tests (file-level; 91 top-level Test* functions found under internal/runtime, 132 including internal/gateway/ws and cmd/agent-orchestrator):
| Package | # TestFns | Notes |
|---|---|---|
| internal/runtime/streaming | 19 | Backpressure, drop-oldest, sequence numbers, publish failure handling, status/completion/error fanout |
| internal/runtime/metrics | 11 | OTel provider init, instruments, Prometheus endpoint |
| internal/runtime/policy | 13 | Tenant & provider isolation, tier limits, sliding window expiry, fail-open, retry-after bounds |
| internal/runtime/session | 16 | Snapshot hash determinism, order independence, tenant isolation on Get, manager CRUD |
| internal/runtime/server | 11 | Tracing interceptor, tenant extraction + health-check bypass, metadata carrier |
| internal/runtime/model | 21 | Cost router (budget/critical-role), recommender, cached registry, seed |
| internal/gateway/ws | 37 | Hub multi-tenant isolation, agent events tenant isolation, subscribe/unsubscribe idempotency, rate limiter, concurrent subscribers |
| cmd/agent-orchestrator | 4 | Config, readyz |
Packages with ZERO tests (coverage gaps):
- internal/runtime/audit: writer.go and store.go are completely untested. Audit is a Core Tenet requirement ("every action auditable") and a direct PRD NFR ("100% of actions in immutable audit log").
- internal/runtime/routing: router.go (in-memory least-connections worker pool) is untested.
- internal/runtime/server/grpc.go, runtime_server.go, lifecycle_server.go, model_registry_server.go: no direct test coverage (interceptors are covered separately).
- cmd/agent-orchestrator/main.go: no startup integration test; given the wiring gap this is how the P0 escaped review.
Reported coverage per the 100% tenet: cannot verify without running go test -cover. Given the above zero-test packages, the suite cannot possibly be at 100%.
2.2 Python (agent-worker) test suite
Status: NOT EXECUTED in this QA session. uv is not installed in the sandbox; python3 -m pytest was blocked by the same permission denial.
Inventory of existing Python tests (from static scan):
| File | # test fns | Notes |
|---|---|---|
| tests/test_imports.py | 12 | Module import smoke tests |
| tests/test_graph.py | 17 | State init, should_continue branches, loop limit, tool denial, guardrail violation, single/multi-turn |
| tests/test_emitter.py | 8 | All event type serialization |
| tests/test_streaming.py | 12 | Emitter/collector, queue-full drop, cancellation, elapsed_ms |
| tests/test_executor.py | 8 | Initialize, execute_step, checkpoint, terminate, unknown session paths |
| tests/test_llm.py | 27 | Circuit breaker FSM (closed→open→half-open), fallback chain, retry/rate-limit, Anthropic adapter, cost |
| tests/test_openai_adapter.py | 17 | Streaming, tool calls, retry, cost, message conversion |
| tests/test_gemini_adapter.py | 24 | Streaming, tool calls, retry, cost, error classification |
| tests/conftest.py | fixtures | test_snapshot fixture etc. |
The Python suite is broad and well-structured. No tests were found for server.py (RuntimeServiceServicer) or main.py (gRPC server bootstrap) — which is why the "servicer is never registered with the grpc.aio.server" defect slipped through.
2.3 E2E integration test
NOT WRITTEN / NOT POSSIBLE. The integration target does not exist: the gRPC handlers are stubs (see §3.1), so writing an integration test today would fail at the first CreateSession call. I did not add failing E2E tests to the repo; instead this report documents the required tests under §6.
3. Critical Findings
3.1 [P0] Orchestrator gRPC server methods are all stubbed — no end-to-end path exists
Files: internal/runtime/server/lifecycle_server.go, internal/runtime/server/runtime_server.go, cmd/agent-orchestrator/main.go, services/agent-worker/src/agent_worker/main.py
Evidence:
- lifecycle_server.go lines 28–55: every method delegates to UnimplementedLifecycleServiceServer. SendMessage, CreateSession, GetSession, ListSessions, and TerminateSession all return codes.Unimplemented.
- runtime_server.go lines 30–50: every method (ExecuteStep, InitWorkerSession, TerminateSession, Checkpoint) returns Unimplemented.
- grpc.go lines 82–87: lifecycleSrv := &lifecycleServer{}. The struct has no sessionMgr, no streamHandler, and no worker fields, so the concrete session.Manager (which is fully implemented) is never invoked by the gRPC layer.
- cmd/agent-orchestrator/main.go lines 72–107: the process starts a pgxpool, a Redis client, and an OTel provider, then passes only the pool and Redis into runtimeserver.New. NewManager, NewStreamHandler, NewRedisPublisher, NewRateLimiter, NewInMemoryRouter, and audit.NewWriter are never called. The instruments variable is discarded with _ = instruments. No audit writer is started.
- services/agent-worker/src/agent_worker/main.py lines 64–73: the comment is explicit: "When proto stubs are generated, this will use: runtime_pb2_grpc.add_RuntimeServiceServicer_to_server(servicer, server). For now, we create the servicer to validate it initializes correctly." The servicer object is constructed but never added to the grpc.aio.server(). The worker pod therefore accepts TCP connections on :50052 but answers no RPCs.
- services/agent-worker/src/agent_worker/proto/ is empty (__init__.py only). Python proto stubs were never generated, so the comment's TODO is blocked by a missing codegen step.
Impact:
- No session can be created via CreateSession: the portal cannot open a chat.
- No message can be sent via SendMessage: the streaming pipeline is never exercised.
- The worker cannot receive InitWorkerSession: the executor is never initialized in production.
- The entire value chain advertised on #105 is non-functional; all underlying library code is orphaned.
- Every AR-F requirement that depends on the gRPC wire path (F01, F03, F07, F13–F21, F31–F34, F47–F50) is behaviourally uncovered regardless of component-level test pass rates.
Required fix (bounded):
- Extend lifecycleServer to hold session.Manager + streaming.StreamHandler + routing.Router and implement all five methods against them. The session.Manager.Create path is already complete.
- Extend runtimeServer to hold the streaming handler and a worker-stream forwarder that calls StreamHandler.HandleEvent on each event received from an upstream worker ExecuteStep stream, then publishes via RedisPublisher.
- Wire the full dependency graph in cmd/agent-orchestrator/main.go: build Manager, StreamHandler, RedisPublisher, RateLimiter, Router, audit.Writer, start the writer's background flusher, pass them into runtimeserver.New(Config{...}), and delete the _ = instruments placeholder.
- Generate Python proto stubs (runtime_pb2, runtime_pb2_grpc) and update services/agent-worker/src/agent_worker/main.py to call runtime_pb2_grpc.add_RuntimeServiceServicer_to_server(servicer, server) and make RuntimeServiceServicer inherit from the generated base class.
- Add a startup integration test in cmd/agent-orchestrator that builds a server with in-memory fakes and exercises one CreateSession → SendMessage → TerminateSession round-trip end-to-end.
Estimated effort: 1–2 days (1 backend engineer). No architectural change required.
3.2 [P1] llm_usage_events per-agent attribution hook is not implemented
Files: services/agent-worker/src/agent_worker/llm/*_adapter.py
Evidence: LLM adapters calculate per-call USD cost and expose it on the LLMEvent.usage and cost fields (verified in anthropic.py:302, openai_adapter.py:222, gemini_adapter.py:193), but no code path writes the cost into a metering/usage events table. The LLD #111 Section 5.3 and the cross-platform memory contract ("LLM Sourcing Modes (A/B/C/D) — mandatory llm_usage_events per-agent attribution hook") require this hook for AR-F19 and AR-F55.
Impact:
- AR-F19 ("per-call cost tracking → metering table") is PARTIAL — costs are computed but not persisted.
- AR-F55 ("billable metrics: LLM tokens per model") is measurement-only via OTel, not stored as the durable event stream billing will consume.
- AR-F73–F76 (per-agent token budget, alerts, hard-limit, dashboard) are gated on this table being populated and cannot deliver in Wave 2 without it.
Required fix: add llm_usage_events migration, write events from the LangGraph executor after each LLM call (or from a Go-side consumer of the streaming metrics event type), and add unit tests to both sides. Can be bundled with the §3.1 wiring work.
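To make the per-agent attribution requirement concrete, here is a rough sketch of what a Go-side usage record and sink could look like. Every name in it (UsageEvent, its fields, memSink, CostByAgent) is hypothetical; the actual llm_usage_events schema is defined by LLD #111 and is not reproduced here. Cost is held as integer micro-USD in this sketch to avoid floating-point drift when summing, which is a design assumption rather than something the adapters are known to do.

```go
package main

import (
	"errors"
	"fmt"
)

// UsageEvent is a hypothetical row shape for llm_usage_events.
type UsageEvent struct {
	TenantID     string
	AgentID      string
	Model        string
	InputTokens  int64
	OutputTokens int64
	CostMicroUSD int64 // cost in millionths of a US dollar
}

// Validate rejects events that cannot be attributed or carry negative counters.
func (e UsageEvent) Validate() error {
	if e.TenantID == "" || e.AgentID == "" {
		return errors.New("usage event requires tenant and agent ids")
	}
	if e.InputTokens < 0 || e.OutputTokens < 0 || e.CostMicroUSD < 0 {
		return errors.New("usage event has a negative counter")
	}
	return nil
}

// memSink is an in-memory stand-in for the durable table.
type memSink struct {
	events []UsageEvent
}

// Record validates and stores one event.
func (s *memSink) Record(ev UsageEvent) error {
	if err := ev.Validate(); err != nil {
		return err
	}
	s.events = append(s.events, ev)
	return nil
}

// CostByAgent sums cost per agent for a single tenant, never across tenants.
func (s *memSink) CostByAgent(tenantID string) map[string]int64 {
	out := map[string]int64{}
	for _, ev := range s.events {
		if ev.TenantID == tenantID {
			out[ev.AgentID] += ev.CostMicroUSD
		}
	}
	return out
}

func main() {
	sink := &memSink{}
	_ = sink.Record(UsageEvent{TenantID: "t1", AgentID: "a1", Model: "model-x", InputTokens: 100, OutputTokens: 20, CostMicroUSD: 1500})
	_ = sink.Record(UsageEvent{TenantID: "t1", AgentID: "a1", Model: "model-x", InputTokens: 40, OutputTokens: 10, CostMicroUSD: 500})
	_ = sink.Record(UsageEvent{TenantID: "t2", AgentID: "a1", Model: "model-x", InputTokens: 10, OutputTokens: 5, CostMicroUSD: 300})
	fmt.Println(sink.CostByAgent("t1"))
}
```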
3.3 [P1] No unit tests for internal/runtime/audit package
Files: internal/runtime/audit/writer.go, internal/runtime/audit/store.go
Evidence: grep '^func Test' internal/runtime/audit/ returns zero results. The audit writer is a batching async buffer that governs the "100% of actions in immutable audit log" NFR. It contains non-trivial logic (batch threshold triggering, ticker flush, final flush on ctx.Done, error handling with a TODO for dead-letter) that must be test-covered before it is trusted in production.
Required fix: add writer_test.go covering:
- Buffered entries below batch size are not flushed until the ticker fires
- Reaching batch size triggers an immediate flush without waiting for the ticker
- Stop() drains remaining entries on context cancel
- Store error path logs and does not block the writer (dead-letter path TODO tracked)
- Concurrent Write() calls are serialized correctly (race test)
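The cases above can be prototyped against a simplified model before the real writer_test.go is written. The sketch below is not the audit.Writer implementation: batchWriter, Store, and memStore are hypothetical names, and the ticker-driven flush is left out so the behaviour stays deterministic. It shows the threshold flush, the drain on Stop, and concurrent writers guarded by a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for the audit store interface.
type Store interface {
	InsertBatch(entries []string) error
}

// memStore records every batch it receives so a test can inspect flushes.
type memStore struct {
	mu      sync.Mutex
	batches [][]string
}

func (m *memStore) InsertBatch(entries []string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.batches = append(m.batches, append([]string(nil), entries...))
	return nil
}

// counts returns the number of batches and the total entries stored.
func (m *memStore) counts() (batches, entries int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, b := range m.batches {
		entries += len(b)
	}
	return len(m.batches), entries
}

// batchWriter is a simplified model of a batching writer. It flushes when
// the buffer reaches batchSize and on Stop; the ticker path is omitted.
type batchWriter struct {
	mu        sync.Mutex
	buf       []string
	batchSize int
	store     Store
}

// Write buffers one entry and flushes immediately at the batch threshold.
func (w *batchWriter) Write(entry string) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.buf = append(w.buf, entry)
	if len(w.buf) >= w.batchSize {
		return w.flushLocked()
	}
	return nil
}

// Stop drains whatever is still buffered.
func (w *batchWriter) Stop() error {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.flushLocked()
}

// flushLocked hands the buffer to the store; the caller must hold w.mu.
func (w *batchWriter) flushLocked() error {
	if len(w.buf) == 0 {
		return nil
	}
	batch := w.buf
	w.buf = nil
	return w.store.InsertBatch(batch)
}

func main() {
	store := &memStore{}
	w := &batchWriter{batchSize: 3, store: store}

	_ = w.Write("a")
	_ = w.Write("b")
	b, _ := store.counts()
	fmt.Println("batches below threshold:", b)

	_ = w.Write("c")
	b, _ = store.counts()
	fmt.Println("batches at threshold:", b)

	// Concurrent writers; running with the race detector checks serialization.
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = w.Write("x")
		}()
	}
	wg.Wait()
	_ = w.Stop()
	_, n := store.counts()
	fmt.Println("entries after Stop:", n)
}
```

The real writer also flushes on a ticker and on ctx.Done; tests for those paths would likely need an injectable clock or a short flush interval, which this sketch does not attempt.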
3.4 [P1] internal/runtime/routing package has no tests
Files: internal/runtime/routing/router.go
Evidence: InMemoryRouter.SelectWorker (least-connections), RegisterWorker, DeregisterWorker, UpdateHealth — all untested. The router is the component that would decide which worker pod receives a new session; an off-by-one or a bad unhealthy-filter would silently route to a dead worker in production.
Required fix: add router_test.go covering no-healthy-workers error, least-connections tie-breaks, deregister-while-selecting race, UpdateHealth on unknown worker is a no-op.
3.5 [P2] Single shared platform K8s namespace — no per-tenant isolation
Files: deployments/agent-runtime/base/*.yaml
Evidence: worker-deployment.yaml line 5, orchestrator-deployment.yaml, network-policy.yaml all pin resources to namespace: platform. AR-F64 requires "CPU/memory enforced via K8s quotas per tenant namespace". The current manifests run a shared worker pool across all tenants, giving tenant A the ability to exhaust tenant B's CPU/memory via noisy-neighbour effects.
Impact: Tenant isolation is enforced at the DB (RLS) and Redis (key-prefix) layers, which is excellent, but NOT at the compute layer for the Wave 1 deployment. This is a known simplification for MVP scale: the PRD's AR-F64 specifies "per tenant namespace", but the Wave 1 task scope does not call out namespace provisioning. I am raising it as P2 because it must be addressed before the MVP goes to any multi-tenant environment (production or even dev staging), not because it blocks Wave 1 sign-off.
Required fix (Wave 2): define a Pulumi module that materializes a namespace per tenant with worker Deployment, NetworkPolicy, ResourceQuota, and LimitRange per namespace. Update the Orchestrator routing layer to be namespace-aware.