Runbook: Org Model v2.3 — Phase 3 Legacy Drop (Dev)

Parent tracker: #570 · HLD: #556 · LLD: #560 §10.3 · Task issue: #694 · Milestone: Wave J+K — Phase 2 Cutover + Legacy Drop (#11)

This runbook covers the destructive, one-way migration that drops the legacy Pillar/Team tables after the Phase 2 cutover (docs/runbooks/org-v23-phase2-cutover.md) has settled. The founder is the operator — the runbook does not automate the migration on the founder's behalf.

0. Prod flip authority

Prod flip authority: TBD. Revisit this section when the first prod tenant onboards. Until then, every step below is dev only. Do not copy this runbook into a prod change ticket without replacing this section with the approved prod authority.

0.1 Pre-flight blockers — MUST be closed before migration 070 runs

PR #713 review round 4 identified three live-code surfaces that still read legacy tables dropped by migration 070. Those surfaces remain on the tier-1 structural allowlist of scripts/ci/check-legacy-table-refs.sh because the PR does not modify them, but they would 500 on every call once the tables are gone.

The migration does not run in any environment until every issue in the table below is closed. Each issue carries a lint+runtime fix; once closed, that surface is either flag-gated or removed and the risk it carries is retired.

#	Issue	Surface	Risk if skipped
1	#733	`internal/mcp/github/scope.go::IsRepoAllowed`	Resolved 2026-04-26. The dead `PolicyScopeChecker`, `NewPolicyScopeChecker`, and `ScopeDBTX` types (zero production callers, only the `team_governance_policies` query path) were deleted. `AllowAllScopeChecker` retained for tests. Issue closed.
2	#734	`internal/governance/{store,policy_store}.go` legacy team_policy reads	Protected architecturally by the `cascade_4layer` breaker; if the breaker trips post-drop, `engine.Check` falls back to the legacy 3-layer path and 500s. 12 refs across the two files.
3	#735	`internal/runtime/approval/policy/source_pg.go::GetTeamPolicies`	Called on every approval-policy resolution. 500s post-drop with no breaker in front. Requires a schema migration (adds `template / timeout_seconds / escalation_minutes_before_deadline / channels / team_kind` to `org_unit_governance_policies`) plus a data backfill, so carries a separate migration number. Shipped by migration `079_org_unit_policy_template.up.sql` + flag-gate in `PGSource.GetTeamPolicies`. Verify `read_from_org_units=true` pins `FROM org_unit_governance_policies` before running §5.

Verification before §1:

# Each must be CLOSED (not just merged into a branch).
for ISSUE in 733 734 735; do
  STATE=$(gh api repos/upsquad-ai/upsquad-core/issues/"$ISSUE" --jq '.state')
  echo "#${ISSUE}: ${STATE}"
  [[ "$STATE" == "closed" ]] || { echo "STOP: #$ISSUE is still open."; exit 1; }
done

1. What this runbook covers

Migration 070_drop_legacy.up.sql (this PR) drops the four legacy tables that are no longer referenced by any live read or write path:

Table	Origin	Superseded by
`pillars`	migration 001	`org_units` where `unit_type = 'pillar'` (migration 060)
`teams`	migration 001	`org_units` where `unit_type = 'team'` (migration 060)
`team_memberships`	migration 058	`org_unit_memberships` (migration 060)
`team_governance_policies`	migration 028	4-layer cascade policies (migration 063)

The drop is destructive — once migration 070 has run, those tables' rows are gone from the live database. The only way to recover row-accurate pre-drop state is to restore from the pg_dump backup captured in §4 below.

2. Who runs what

Dev: the founder runs the migration personally via §5 below. Agents never run the migration on the founder's behalf. If an agent is asked to execute the drop against a live database, the request is out of scope and must be escalated back to the founder.
Prod: TBD — see §0.

3. Pre-flight checks (before §5)

Run all of these before executing the migration. Any failure is a hard stop — fix and re-run from the top, do not attempt partial continuation.

3.1 Phase 2 cutover complete

The Phase 2 runbook (docs/runbooks/org-v23-phase2-cutover.md) must have landed all four flags cleanly:

Flag	Required state	Verification
`read_from_org_units`	`true`	`SELECT value FROM platform_feature_flags WHERE key = 'orgv23.read_from_org_units';` returns `true`
`cascade_4layer`	`true`	`SELECT value FROM platform_feature_flags WHERE key = 'orgv23.cascade_4layer';` returns `true`
`new_services_enabled`	`true`	`SELECT value FROM platform_feature_flags WHERE key = 'orgv23.new_services_enabled';` returns `true`
`dual_write_org_units`	`false`	`SELECT value FROM platform_feature_flags WHERE key = 'orgv23.dual_write_org_units';` returns `false`

Each flip must have been observed stable for at least one dev-release-cycle (founder decision 2026-04-19 — the original LLD §10.3 six-week window is waived pre-prod).

3.2 Zero active legacy writes

Run one-off counts against the legacy tables to confirm the dual- write off-switch has held:

-- Expected: 0 new rows in the observation window.
SELECT table_name,
       COUNT(*) FILTER (WHERE created_at > now() - interval '24 hours') AS created_24h
FROM (
  SELECT 'pillars'          AS table_name, created_at FROM pillars
  UNION ALL
  SELECT 'teams',                       created_at FROM teams
  UNION ALL
  SELECT 'team_memberships',            created_at FROM team_memberships
  UNION ALL
  SELECT 'team_governance_policies',    created_at FROM team_governance_policies
) AS s
GROUP BY table_name;

Any non-zero created_24h column means something is still writing to a legacy table — stop and investigate before dropping.

3.3 Reconciler drift at zero

# Port-forward to the reconciler's metrics endpoint:
kubectl -n platform port-forward deploy/dualwrite-reconciler 9119:9119

# In another shell:
curl -s localhost:9119/metrics | grep '^dualwrite_open_drift{'
# Every line's value must be 0.

If any dualwrite_open_drift gauge is non-zero, run one more heal cycle before proceeding. A stable-zero gate (over the observation window) is mandatory.

3.4 CI lint sanity

# From the repo root:
scripts/ci/check-legacy-table-refs.sh origin/main
# Expected: "legacy-table-refs: OK — no new legacy-table references outside the allowlist."

This confirms no PR merged between Phase 2 and Phase 3 sneaked a new reference into the non-allowlisted code paths.

Gotcha (issue #1350): the lint scans only the comment-stripped content of added lines, and treats **/*_test.go as structurally exempt. So the bare words teams / pillars / team_memberships in a //, --, #, or single-line /* … */ comment, or anywhere in a Go test file, will not trip it. If you genuinely need to reference a dropped table from live, non-test code, add the file to the allowlist in scripts/ci/check-legacy-table-refs.sh with a rationale. One sharp edge: a token inside a multi-line /* … */ block in a non-test source file can still trip — prefer // line comments for prose. The lint's own behaviour is pinned by scripts/ci/check-legacy-table-refs.test.sh (run in CI), so changes to the matcher can't silently regress.

4. Backup (non-negotiable)

Take a full logical backup of the database before running the migration. The backup is the ONLY recovery path once §5 has executed.

# Namespace + pod names for dev (adjust for your env).
NS=platform
PG_POD=$(kubectl -n "$NS" get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}')

STAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP_DIR="/var/backups/upsquad/orgv23-phase3-drop-legacy"
mkdir -p "$BACKUP_DIR"

# 1. Full database dump (compressed). Takes a few minutes on dev.
kubectl -n "$NS" exec "$PG_POD" -- \
  pg_dump -U upsquad -Fc -Z 9 upsquad \
  > "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump"

# 2. Per-table dumps (smaller, targeted restore surface).
for TBL in pillars teams team_memberships team_governance_policies; do
  kubectl -n "$NS" exec "$PG_POD" -- \
    pg_dump -U upsquad -Fc --data-only --table="public.${TBL}" upsquad \
    > "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done

# 3. SHA-256 manifest — record in the tracker #570 comment.
sha256sum "$BACKUP_DIR"/*.pgdump > "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"
cat "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"

Record the backup path + SHA manifest in #570 and #694 as a comment before moving to §5. Without this record there is no audit trail of the backup's existence.

4.1 Restore smoke test (optional but recommended)

Spin up a scratch database and confirm the backup is readable:

kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
  "CREATE DATABASE upsquad_backup_smoke;"
kubectl -n "$NS" exec -i "$PG_POD" -- \
  pg_restore -U upsquad -d upsquad_backup_smoke -v \
  < "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump" | tail -20
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad_backup_smoke -c \
  "SELECT COUNT(*) FROM pillars; SELECT COUNT(*) FROM teams;
   SELECT COUNT(*) FROM team_memberships;
   SELECT COUNT(*) FROM team_governance_policies;"
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
  "DROP DATABASE upsquad_backup_smoke;"

If the counts are non-zero and the restore completed without errors, the backup is trustworthy.

5. Execution

The migration is applied by golang-migrate, same tool the rest of the platform uses. It wraps 070_drop_legacy.up.sql in a single transaction.

# DATABASE_URL must point at the dev database; confirm twice.
echo "$DATABASE_URL" | grep -v prod || { echo "STOP: DATABASE_URL looks like prod"; exit 1; }

# Apply just the one migration — NOT `up` (that would be a no-op at
# this point, but best to be explicit).
make migrate-up

Expected output:

70/u drop_legacy (…ms)

If the migration errors (e.g. FK constraint surprise), the transaction rolls back and the legacy tables are still present. Investigate the error, correct, and retry. Do NOT patch the migration file in-place once it has succeeded on any other database — that breaks the single-source-of-truth invariant for migrations.

6. Post-verification

6.1 Tables gone, survivors intact

-- Expected: zero rows (legacy tables are gone).
SELECT tablename FROM pg_tables
WHERE schemaname = 'public'
  AND tablename IN ('pillars','teams','team_memberships','team_governance_policies');

-- Expected: non-zero rows on every survivor.
SELECT 'organizations'  AS t, COUNT(*) FROM organizations
UNION ALL
SELECT 'members',           COUNT(*) FROM members
UNION ALL
SELECT 'org_units',         COUNT(*) FROM org_units
UNION ALL
SELECT 'org_unit_memberships', COUNT(*) FROM org_unit_memberships
UNION ALL
SELECT 'agents',            COUNT(*) FROM agents;

6.2 OrgUnitService parity

Hit the running Context Engine with an authenticated request and confirm the read path returns the same team surface as before the drop. The QA engineer's parity check (issue #694 acceptance) covers this; founder runs it once before closing the tracker:

# With SMOKE_GATEWAY_URL + SMOKE_GATEWAY_TOKEN + SMOKE_GATEWAY_ORG_ID set:
make smoke-wave-j

Expected: ok on all sub-tests.

6.3 Client + Admin portal smoke

Open the Org Chart page in the client portal and the Member Admin page in the admin portal. Verify:

The full tree renders with no empty nodes.
Member counts match the pre-drop snapshot (from §6.1 backup).
No console errors referencing pillars / teams / team_memberships.

Any regression is a rollback signal — see §7.

7. Rollback

The migration is one-way at the data level. Rollback requires restoring from the §4 backup:

# 1. Take the service down to prevent writes against the dropped
#    schema.
kubectl -n platform scale deploy/context-engine --replicas=0

# 2. Run migrate down — recreates the schema (empty).
make migrate-down STEPS=1

# 3. Restore the four tables from the per-table backups.
for TBL in pillars teams team_memberships team_governance_policies; do
  kubectl -n "$NS" exec -i "$PG_POD" -- \
    pg_restore -U upsquad -d upsquad --table="public.${TBL}" --data-only \
    < "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done

# 4. Verify counts match the pre-drop snapshot.
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad -c \
  "SELECT 'pillars', COUNT(*) FROM pillars UNION ALL
   SELECT 'teams', COUNT(*) FROM teams UNION ALL
   SELECT 'team_memberships', COUNT(*) FROM team_memberships UNION ALL
   SELECT 'team_governance_policies', COUNT(*) FROM team_governance_policies;"

# 5. Bring the service back up.
kubectl -n platform scale deploy/context-engine --replicas=1

Then revert the Phase 2 flags (read_from_org_units, cascade_4layer, new_services_enabled → false; dual_write_org_units → true) per the Phase 2 runbook §7 "rollback" procedure, and open a tracking incident against #570 documenting the trigger for the rollback.

DO NOT attempt to recover state by running migrate down alone — that recreates empty tables with the right shape but no rows, which looks healthy to a shallow check and breaks downstream reads.

8. Close-out

When §6 is green:

Tick every acceptance box on #694.
Comment on #570 with the backup path + SHA manifest from §4.
Ping @project-manager to close the Phase 3 milestone (#11).
QA writes the parity report and attaches it to #694.

At this point the Org Model v2.3 migration is code-complete in dev. The only surviving references to pillars / teams / team_memberships / team_governance_policies are:

The migration files themselves (070's up + down, 001, 028, 058, 060 — history of the schema).
The CI lint script (scripts/ci/check-legacy-table-refs.sh) which stays in place to block any accidental reintroduction of the legacy identifiers.
The .allowlist paths in the lint script and the internal/tenant shim + internal/member/read_switch* code paths that continue to compile but are no longer exercised at runtime. A future hygiene pass (out of scope for K.2) will remove them.

0. Prod flip authority​

0.1 Pre-flight blockers — MUST be closed before migration 070 runs​

1. What this runbook covers​

2. Who runs what​

3. Pre-flight checks (before §5)​

3.1 Phase 2 cutover complete​

3.2 Zero active legacy writes​

3.3 Reconciler drift at zero​

3.4 CI lint sanity​

4. Backup (non-negotiable)​

4.1 Restore smoke test (optional but recommended)​

5. Execution​

6. Post-verification​

6.1 Tables gone, survivors intact​

6.2 OrgUnitService parity​

6.3 Client + Admin portal smoke​

7. Rollback​

8. Close-out​