ADR-0001: Hybrid Local Dev Stack (docker-compose, Flavor A)

Status: Accepted
Date: 2026-04-09
Deciders: Vaisakh, Ashik (approval on #57), Principal Architect (proposal)
Supersedes: None
Related: #57 (approval), #58 (Phase 4 re-validation gate), #59 (CloudSQL deferred, closed not planned)

1. Context

UpsQuad is mid-flight on the Context Engine (PR series #29–#36). There are currently:

No external design partners onboarded.
No ingest API alpha exposed to anyone outside the core team.
No SLO has been committed to any customer.
No managed-infra budget allocated for a permanent CloudSQL/GKE dev footprint.

At the same time, the PR #51 regression — where transaction-pooled GUCs silently dropped at the PgBouncer session boundary — made it clear that go test alone does not give us production-fidelity confidence. We must exercise the real topology: the pgx pool talking to PgBouncer in transaction pooling mode, RLS policies enforced end-to-end, the audit log side-effect path, fake-gcs for memory snapshots, and a fake OIDC issuer for scope-chain middleware.

Two options were evaluated in #57:

Flavor A — docker-compose hybrid stack (engine binary on host, dependencies containerised).
Flavor B — kind (Kubernetes in Docker) cluster.

Vaisakh explicitly rejected Flavor B: the Kubernetes layer adds operational burden without adding fidelity for the specific regression class we are trying to prevent. The engine's scaling, networking, and Pod lifecycle are not on the critical path at this stage.

Separately, #59 (permanent CloudSQL dev instance) was closed not planned on 2026-04-09. The reasoning from the #59 architect review stands: the compose stack's topology — engine → pgbouncer:6432 (transaction mode) → postgres:5432 — is architecturally identical to what CloudSQL would present (engine → CloudSQL PgBouncer → CloudSQL Postgres). The PR #51 regression class is fully exercised by the compose stack. Until a tripwire fires (see Section 8), the compose PG is the sole authoritative dev database.

This ADR records the Flavor A decision, the hard conditions attached to it, and the downstream work it unblocks.

2. Decision

We adopt Flavor A: a hybrid local dev stack where the Context Engine runs as a native Go binary on the host (go run ./cmd/context-engine or a locally built binary) and all infrastructure dependencies run in a single docker-compose.dev.yml stack on the loopback interface.

2.1 Authoritative topology

host:9001 ─┐
           ├─ context-engine (go binary, built from cmd/context-engine)
           │     │
           │     ├─→ 127.0.0.1:6432  pgbouncer (transaction mode)
           │     │         │
           │     │         └─→ postgres:5432 (PG 16 + pgvector)
           │     ├─→ 127.0.0.1:6379  redis (Redis Streams)
           │     ├─→ 127.0.0.1:4443  fake-gcs     (GCS API emulator)
           │     └─→ 127.0.0.1:8080  fake-jwt-issuer (OIDC-compatible)
           │
           └─ host:9091  /metrics, /healthz, /readyz (see Sections 6.1 and 6.2)

Every dependency is addressable from the host on 127.0.0.1 only. The compose network is internal; no port is exposed on 0.0.0.0. The engine is NOT containerised for local dev — it runs on the host so that dlv, pprof, and editor integration work without friction.

2.2 Services, versions, and purpose

All versions are pinned. DevOps MUST NOT float these tags.

Service	Image / Version	Purpose
postgres	`pgvector/pgvector:pg16` (pinned digest required)	Primary DB. Must match the CloudSQL major version we will eventually target (PG 16).
pgbouncer	`edoburu/pgbouncer` (pinned digest required)	Transaction pooling — the fidelity surface for the #51 regression class.
redis	`redis:<Memorystore-matched>` (see 2.3)	Redis Streams, caching.
fake-gcs	`fsouza/fake-gcs-server` (pinned digest)	GCS API emulator for memory snapshot path.
fake-jwt-issuer	A minimal OIDC discovery + JWKS server (pinned)	Serves `/.well-known/openid-configuration` and a JWKS so the engine's OIDC verifier resolves without external calls.

Image digest pinning is a hard requirement — tag pinning alone is insufficient for reproducibility.

2.3 Redis version pin (hard condition)

Redis must be pinned to the exact major.minor that our target Memorystore tier will run. DevOps owns selecting and locking this version in docker-compose.dev.yml as a single-source constant. The constant must be referenced from the #58 Phase 4 re-validation so drift is detectable.

2.4 PgBouncer configuration (hard condition, non-negotiable)

The PgBouncer instance in the compose stack must be configured with the following options. These carry forward from the #59 architect review where the transaction-pooled GUC regression class was dissected:

pool_mode = transaction
server_reset_query = DISCARD ALL
ignore_startup_parameters = extra_float_digits,options
max_client_conn sized to support concurrent smoke test runs (minimum 50)
default_pool_size sized to match the engine's MAX_DB_CONNS (40 as of #2231; was 20). These two knobs must move together. default_pool_size is the server-side cap on CE→Postgres backends (in pool_mode = transaction, sv_active can never exceed it), while MAX_DB_CONNS is the CE-side pgxpool ceiling. If the client pool is larger than the server pool, the surplus CE connections just queue inside pgbouncer (cl_waiting) at its ceiling; if smaller, CE queues in-process in pgxpool while the pgbouncer headroom goes unused. See the capacity analysis in #2231.
server_lifetime and server_idle_timeout set to values that force connection recycling during smoke tests (so reset-query correctness is exercised)
Logging verbose enough to surface reset-query failures (log_pooler_errors = 1)

Any deviation from these settings — even "just for local" — is a violation of this ADR and must be raised as a new architect concern, not a config tweak.

2.4.1 Pool sizing history (capacity analysis, #2231)

The MAX_DB_CONNS / default_pool_size pair was raised from 20 to 40 in #2231 as interim capacity relief after the CE pool saturated and took the beta-dev Approvals page down on 2026-07-27 ("database unavailable"). Root cause is #2230: the scope middleware holds one pooled conn for a request's whole duration, and AssembleContext amplifies that ~4× (outer scope conn pinned across the ~3s Vertex embedding call + a 3-way retrieval fan-out + a confidence-gate re-fan), so ~5 concurrent cache-miss AssembleContext calls exhausted the 20-conn pool.

Sizing math (verified on beta-dev, 2026-07-28):

Only CE connects through pgbouncer. agent-orchestrator connects directly to Postgres (DB_MAX_CONNS = 10, bypassing pgbouncer); agent-worker holds no DB connection. So pgbouncer's default_pool_size throttles CE's server backends and nothing else.
Postgres max_connections = 100, superuser_reserved_connections = 3 → 97 usable.
Peak Postgres backends at 40/40 = 40 (CE via pgbouncer) + 10 (orchestrator direct) + ~5 overhead = 55, leaving 42 headroom. A 50-concurrent-client burst through the live pgbouncer at the old size held sv_active = 20 with the surplus queued (cl_waiting, zero connection errors) — confirming pgbouncer is the hard server-side cap and postgres never sees CE's client-pool count.

default_pool_size = 40 roughly doubles CE's effective ceiling (from ~5 to ~10 concurrent cache-miss AssembleContext calls before saturation) while staying well clear of the Postgres ceiling. It is interim relief: the durable fix is #2230 (stop holding the outer scope conn across external I/O / fan-out), after which this pair can likely be walked back down. Any further change to this pair still routes through architect + devops per §2.4.

2.5 Engine connectivity

The engine is configured via environment variables (defined in cmd/context-engine/config.go, except the two marked "new env" below):

Env var	Value
`DATABASE_URL`	`postgres://upsquad:upsquad@127.0.0.1:6432/upsquad?sslmode=disable`
`REDIS_URL`	`redis://:upsquad-dev-redis@127.0.0.1:6379/0` (auth required, #250)
`GCS_BUCKET`	`upsquad-dev`
`GCS_ENDPOINT`	`http://127.0.0.1:4443/storage/v1/` (new env, must be added by backend)
`OIDC_ISSUER_URL`	`http://127.0.0.1:8080` (new env, must be added by backend)
`METRICS_PORT`	See Section 6.2; the current default `9090` collides with the Prometheus default
`ENVIRONMENT`	`development`

Adding GCS_ENDPOINT and OIDC_ISSUER_URL to Config is a backend task (Section 9a).

3. Hard Conditions (normative requirements)

These are the conditions attached to the #57 approval. Each is normative — the stack is non-compliant if any is missing.

3.1 PgBouncer in compose, transaction mode

Covered in Section 2.4. The presence of PgBouncer is what makes this stack worth building; removing it turns the effort into a rounding error on go test.

3.2 fake-gcs-server in compose

The memory snapshot path (PR #36) writes to GCS. Without fake-gcs, the persistent memory code path cannot be exercised locally, which means the #58 Phase 4 re-validation cannot cover it — which means the #59 closure argument weakens. fake-gcs is therefore load-bearing for the decision to defer CloudSQL.

3.3 fake JWT issuer in compose

The scope-chain middleware resolves a JWKS over HTTP. A local OIDC-compatible issuer removes all external dependencies from the smoke test and makes the stack fully airgapped. The issuer must expose /.well-known/openid-configuration and a JWKS endpoint. Its signing key is a shared test key, so any token signed with that key is accepted (see Section 7 on the tenet exception).

3.4 Redis version pin matching Memorystore

Covered in Section 2.3.

3.5 Lint rules (see Section 4)

3.6 #58 Phase 4 re-validation gate

Phase 4 of the Context Engine rollout plan (tracked in #58) must re-run against this stack as the condition for opening any Phase 6+ work. Phase 4 re-validation is the primary consumer of this stack and is the check that the #59 closure decision remains sound.

3.7 In-scope vs deferred (Class A)

In scope now for this ADR:

This ADR itself.
The compose stack and its supporting config.
A ghcr.io image build workflow for the context-engine binary.
An empty Pulumi scaffold (see Section 9e).

Deferred until a tripwire fires:

Everything else in the original Class A list (managed CloudSQL, permanent GKE dev cluster, ArgoCD bootstrap, observability stack deployment, etc.).

4. Lint Rules (hard condition)

Two patterns are forbidden in internal/context/**. These must be enforced by automated lint — not code review — and must fail CI.

4.1 No dev-only code branches in `internal/context/**`

Forbidden:

Runtime env-gated branches such as if os.Getenv("DEV") == "1" { ... }, if cfg.Environment == "development" { ... }, or any equivalent.
Build-tag gated files such as //go:build dev or legacy // +build dev.
Any import path containing /devonly/ or /testfixture/ pulled into production packages.

Rationale: the whole point of running against a production-fidelity stack is that there is no dev-only code path to hide in. If a test needs different behaviour, it injects a different dependency; it does not flip a runtime flag inside the module under test.

4.2 No prepared statements in `internal/context/**`

Forbidden:

conn.Prepare(...), tx.Prepare(...), db.PrepareContext(...) on the standard library side.
Any pgx path that results in prepared-statement caching. The pgx pool config must set:
```go
// e.g. on the *pgxpool.Config from pgxpool.ParseConfig, before pgxpool.NewWithConfig
config.ConnConfig.DefaultQueryExecMode = pgx.QueryExecModeExec
```
or the equivalent at pool construction time.

Rationale: PgBouncer transaction mode and prepared statements are incompatible — a prepared statement issued on one backend connection is not visible when the next statement is routed to a different backend connection. This is the exact shape of the PR #51 regression class. Prevention is a lint concern, not a review concern.

4.3 Enforcement

Enforcement is via golangci-lint with a custom analyzer or a forbidigo ruleset, plus a small Go AST script under scripts/lint/ if the rule is beyond what forbidigo can express. DevOps picks the specific mechanism; the architect requirement is that both rules fail CI on violation. A failing lint must be reproducible locally with make lint or equivalent.

5. Smoke Test Contract (hard condition)

The smoke test harness is the sole gate on the dev stack being "green". It must assert all four of the following. A passing gRPC /healthz ping is not a smoke test.

5.1 RLS enforcement — cross-tenant read denial

Seed fixtures for two tenants, tenant_A and tenant_B.
Ingest one event for tenant_A via IngestEvent RPC with a JWT bound to tenant_A.
Call a retrieve RPC with a JWT bound to tenant_B.
Assert: zero rows returned AND no error leaking the existence of tenant_A's data (no "not found for your tenant" distinction from "not found at all").
Assert: the DB session variable app.tenant_id was set to tenant_B for the retrieve call (verify via audit log or a test hook).

5.2 Audit log writes

Perform one ingest and one retrieve against the stack.
Assert: one row in audit_log per call, with correct tenant_id, actor, action, and a non-null created_at.
Assert: the audit row is committed in the same transaction as the business write (i.e., rolling back the business write rolls back the audit row — this is the only way to guarantee the audit trail cannot be bypassed).

5.3 Ingest → retrieve round trip

Ingest a known event via IngestEvent for tenant_A.
Wait on the embedding worker's completion (bounded, with a hard timeout).
Retrieve with a query that should match.
Assert: the retrieved payload equals the ingested payload (up to expected normalisation).
Assert: at least one of vector, BM25, or recency signals contributed a non-zero score to the hit (verifies the fused retrieval path, not just a SQL SELECT *).

5.4 `EXPLAIN ANALYZE` of the hybrid retrieval query

Capture EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) of the live hybrid retrieval query as executed by the engine (not a hand-written re-creation).
Assert: the query uses the pgvector HNSW/IVFFlat index for the vector leg (index scan, not seq scan).
Assert: the query uses the BM25/tsvector GIN index for the lexical leg.
Assert: total Actual Rows is bounded (no accidental cartesian product).
The EXPLAIN output is written to a test artifact so regressions are visible in PR diffs.

Any of these four failing is a hard fail. There is no "flaky smoke test" budget.

6. Known preconditions (referenced, not resolved here)

This ADR surfaces two known issues that block a compliant stack. They are NOT resolved here — they are downstream backend tasks (Section 9a).

6.1 `/readyz` does not exist

cmd/context-engine/main.go around line 109 only wires /healthz, which returns 200 unconditionally. That is a liveness endpoint masquerading as readiness. A real /readyz endpoint is required that verifies:

PG pool can acquire a connection and round-trip SELECT 1 through PgBouncer.
Redis PING succeeds.
fake-gcs endpoint is reachable (HTTP GET on the bucket root or equivalent).
OIDC issuer discovery document is fetchable.

/readyz must return 503 if any check fails and 200 only when all pass. This is the signal the smoke test harness and docker-compose healthchecks depend on.

6.2 Metrics port default collides with Prometheus

cmd/context-engine/config.go line 18 and line 70 default METRICS_PORT to 9090, which is also the Prometheus server's default listen port. In any environment where a Prometheus container shares a network namespace with the engine, the collision is guaranteed.

Resolution is backend's call between two options (the architect constraint is that the collision must not exist on the local stack):

Option A: change the default to 9091 (smallest diff, safest).
Option B: keep 9090 but require the compose stack to bind Prometheus (when eventually added) to a different port.

Recommendation: Option A. Backend decides in the implementing PR.

7. Tenet Exception — Security by Default

Core tenet #4 ("Security by default — every PR reviewed for OWASP top 10") is partially suspended for the local compose stack only. This exception is time-boxed.

7.1 What is exempted (only on `localhost` compose)

No TLS between engine and local PgBouncer (sslmode=disable in DATABASE_URL).
No TLS between engine and local Redis.
The fake JWT issuer provides no real identity assurance: the JWKS it serves holds a shared test key, and any token signed by that key is accepted.
fake-gcs does not enforce IAM — any request succeeds.
Secrets in docker-compose.dev.yml (database password, signing key, etc.) may be committed as plaintext constants because the stack is localhost-only.

7.2 What is NOT exempted

Row-level security is still enforced. The compose Postgres has RLS policies on every tenant-scoped table, identical to what CloudSQL would carry. The smoke test (5.1) verifies this.
Audit log writes are still mandatory. The smoke test (5.2) verifies this.
Role separation is still enforced. The engine connects as a non-superuser role that cannot disable RLS. The migration role and the runtime role are distinct.
Lint rules (Section 4) are NOT exempted. They apply to all code in internal/context/**, dev stack or not.

7.3 Scope limit — does NOT extend to CloudSQL

This exception is scoped to the compose stack on localhost. Per the #59 closure, when a tripwire fires and we stand up managed infra, the exception does NOT carry over. CloudSQL / GKE / managed Redis must be stood up with full security-by-default from day one — the decision to defer them is not a decision to weaken them.

7.4 Time box

The exception expires on the earlier of:

2026-07-08 (90 days from this ADR's acceptance date), OR
The first tripwire firing (see Section 8).

On expiry, the architect must either re-affirm the exception in a new ADR or retire the compose stack in favour of managed infra. Silent continuation is not permitted.

8. Tripwires → Class A expansion

When any of the following fires, the decision to operate on the compose stack alone is revisited and Class A (managed CloudSQL, GKE dev, full observability) expansion begins:

First external design partner signs — fidelity now has a customer-shaped consequence.
Ingest API alpha is exposed to anyone outside the core team — uptime now has witnesses.
Portal integration (either upsquad-client or upsquad-admin) starts hitting the Context Engine — cross-repo contract surface expands beyond go test.
Any SLO commitment — once a number is promised, we need a monitored environment to measure it.

The first of these to fire triggers re-opening of #59 (or its successor) and expansion of the now-deferred Class A items. The #59 design is cached and expected to be reconstitutable in under one working day from the existing review notes.

9. Downstream tasks this ADR unblocks

The PjM (#57) should convert the following into tracked issues with appropriate labels and dependencies. This ADR is the sole architectural input required for each.

9a. Backend — `/readyz` and metrics port fix

Add a real /readyz HTTP handler to cmd/context-engine/main.go that verifies PG pool, Redis, fake-gcs, and OIDC issuer reachability (see Section 6.1).
Change METRICS_PORT default in cmd/context-engine/config.go (line 18, line 70) from 9090 (Section 6.2). Recommendation: 9091.
Add GCS_ENDPOINT and OIDC_ISSUER_URL to Config (Section 2.5) and wire them through to the GCS client and OIDC verifier.
Ensure the pgx pool is constructed with DefaultQueryExecMode = pgx.QueryExecModeExec (Section 4.2).
Label: backend, ready-to-pick.

9b. DevOps — compose stack, configs, workflows, lint wiring

Author docker-compose.dev.yml implementing Section 2 topology with all versions pinned by digest.
Author pgbouncer.ini matching Section 2.4 exactly.
Author the fake-jwt-issuer configuration (JWKS, discovery doc).
Wire three GitHub Actions workflows:
- build-context-engine — builds and tests the engine binary.
- dev-stack-smoke — stands up the compose stack and runs the QA smoke test harness (Section 5).
- A schema/kubeconform-style validation workflow as scaffolding for the later Pulumi/Helm work.
Wire the Section 4 lint rules into golangci-lint config (and a supporting AST script under scripts/lint/ if required). make lint must fail on violation.
Label: devops, ready-to-pick.

9c. QA — smoke test harness

Implement the four assertions in Section 5 as an executable test harness consumed by the dev-stack-smoke workflow.
Tests must be hermetic — no dependency on anything outside the compose network.
Failure output must pinpoint which of the four assertions failed.
Label: qa, ready-to-pick.

9d. DevOps — ghcr.io image build workflow

Publish context-engine binary container images to ghcr.io/upsquad-ai/upsquad-core/context-engine on tag and main push.
Images are NOT used by the local dev stack (which runs the host binary) but are required for the image build chain to stay warm ahead of Class A expansion.
Label: devops, ready-to-pick.

9e. DevOps — empty Pulumi scaffold

Create infra/pulumi/upsquad-infra/ directory.
Include an empty Pulumi TypeScript project (index.ts with no resources, Pulumi.yaml, package.json).
README.md with a "Who runs this" placeholder and a pointer back to this ADR and #59 for context on why it is empty.
Stack config for dev should exist but define no resources. This is scaffolding for Class A expansion.
Label: devops, ready-to-pick.

10. Consequences

10.1 What we gain

Production-fidelity coverage of the #51 regression class without managed-infra cost.
A concrete harness for #58 Phase 4 re-validation to land on, which is the condition for moving into Phase 6+.
A repeatable, airgapped local environment that any engineer (or agent) can stand up in minutes.
A decision cache — if a tripwire fires, we already know what we'll build.

10.2 What we accept

No managed-infra fidelity signal until a tripwire fires. We will not find "CloudSQL-specific" issues — CloudSQL upgrades, IAM edges, network-path quirks — until we stand up managed infra. The #59 review established that these are a different class of risk from the #51 regression class and can be deferred.
The #59 design is cached, not implemented. Reconstitution target: under one working day when needed.
Approximately one sprint of delivery slowdown (already agreed with Ashik on #57) as the stack, lint rules, and smoke test harness are built before feature work resumes.
A time-boxed security-by-default exception (Section 7), which introduces an obligation to re-affirm or retire within 90 days.
An operational discipline requirement — the compose stack must be kept green. A broken smoke test is the same severity as a broken CI: it blocks merges.

10.3 What this ADR does NOT do

It does not approve any managed infra spend.
It does not commit to a Kubernetes local environment.
It does not weaken RLS, audit logging, or the governance model — those remain fully enforced.
It does not supersede or soften any tenet beyond what Section 7 explicitly calls out.

11. Alternatives rejected

Flavor B (kind / Kubernetes in Docker) — rejected in #57 by Vaisakh. Kubernetes fidelity is not on the critical path for the #51 regression class, and the operational cost is disproportionate.
"Just run go test" with testcontainers per test — rejected. Does not exercise the long-lived PgBouncer pool behaviour, cannot host Phase 4 re-validation, and fragments the fidelity surface across test binaries.
Permanent CloudSQL dev instance (#59) — closed not planned. See the #59 closure comment for the full reasoning; the compose stack provides architectural equivalence for the regression class we actually care about.
