Skip to main content

Runbook: Org Model v2.3 — Phase 3 Legacy Drop (Dev)

Parent tracker: #570 · HLD: #556 · LLD: #560 §10.3 · Task issue: #694 · Milestone: Wave J+K — Phase 2 Cutover + Legacy Drop (#11)

This runbook covers the destructive, one-way migration that drops the legacy Pillar/Team tables after the Phase 2 cutover (docs/runbooks/org-v23-phase2-cutover.md) has settled. The founder is the operator — the runbook does not automate the migration on the founder's behalf.


0. Prod flip authority

Prod flip authority: TBD. Revisit this section when the first prod tenant onboards. Until then, every step below is dev only. Do not copy this runbook into a prod change ticket without replacing this section with the approved prod authority.


0.1 Pre-flight blockers — MUST be closed before migration 070 runs

PR #713 review round 4 identified three live-code surfaces that still read legacy tables dropped by migration 070. Those surfaces remain on the tier-1 structural allowlist of scripts/ci/check-legacy-table-refs.sh because the PR does not modify them, but they would 500 on every call once the tables are gone.

The migration does not run in any environment until every issue in the table below is closed. Each issue carries a lint+runtime fix; once closed, that surface is either flag-gated or removed and the risk it carries is retired.

#IssueSurfaceRisk if skipped
1#733internal/mcp/github/scope.go::IsRepoAllowedResolved 2026-04-26. The dead PolicyScopeChecker, NewPolicyScopeChecker, and ScopeDBTX types (zero production callers, only the team_governance_policies query path) were deleted. AllowAllScopeChecker retained for tests. Issue closed.
2#734internal/governance/{store,policy_store}.go legacy team_policy readsProtected architecturally by the cascade_4layer breaker; if the breaker trips post-drop, engine.Check falls back to the legacy 3-layer path and 500s. 12 refs across the two files.
3#735internal/runtime/approval/policy/source_pg.go::GetTeamPoliciesCalled on every approval-policy resolution. 500s post-drop with no breaker in front. Requires a schema migration (adds template / timeout_seconds / escalation_minutes_before_deadline / channels / team_kind to org_unit_governance_policies) plus a data backfill, so carries a separate migration number. Shipped by migration 079_org_unit_policy_template.up.sql + flag-gate in PGSource.GetTeamPolicies. Verify read_from_org_units=true pins FROM org_unit_governance_policies before running §5.

Verification before §1:

# Each must be CLOSED (not just merged into a branch).
for ISSUE in 733 734 735; do
STATE=$(gh api repos/upsquad-ai/upsquad-core/issues/"$ISSUE" --jq '.state')
echo "#${ISSUE}: ${STATE}"
[[ "$STATE" == "closed" ]] || { echo "STOP: #$ISSUE is still open."; exit 1; }
done

1. What this runbook covers

Migration 070_drop_legacy.up.sql (this PR) drops the four legacy tables that are no longer referenced by any live read or write path:

TableOriginSuperseded by
pillarsmigration 001org_units where unit_type = 'pillar' (migration 060)
teamsmigration 001org_units where unit_type = 'team' (migration 060)
team_membershipsmigration 058org_unit_memberships (migration 060)
team_governance_policiesmigration 0284-layer cascade policies (migration 063)

The drop is destructive — once migration 070 has run, those tables' rows are gone from the live database. The only way to recover row-accurate pre-drop state is to restore from the pg_dump backup captured in §4 below.

2. Who runs what

  • Dev: the founder runs the migration personally via §5 below. Agents never run the migration on the founder's behalf. If an agent is asked to execute the drop against a live database, the request is out of scope and must be escalated back to the founder.
  • Prod: TBD — see §0.

3. Pre-flight checks (before §5)

Run all of these before executing the migration. Any failure is a hard stop — fix and re-run from the top, do not attempt partial continuation.

3.1 Phase 2 cutover complete

The Phase 2 runbook (docs/runbooks/org-v23-phase2-cutover.md) must have landed all four flags cleanly:

FlagRequired stateVerification
read_from_org_unitstrueSELECT value FROM platform_feature_flags WHERE key = 'orgv23.read_from_org_units'; returns true
cascade_4layertrueSELECT value FROM platform_feature_flags WHERE key = 'orgv23.cascade_4layer'; returns true
new_services_enabledtrueSELECT value FROM platform_feature_flags WHERE key = 'orgv23.new_services_enabled'; returns true
dual_write_org_unitsfalseSELECT value FROM platform_feature_flags WHERE key = 'orgv23.dual_write_org_units'; returns false

Each flip must have been observed stable for at least one dev-release-cycle (founder decision 2026-04-19 — the original LLD §10.3 six-week window is waived pre-prod).

3.2 Zero active legacy writes

Run one-off counts against the legacy tables to confirm the dual- write off-switch has held:

-- Expected: 0 new rows in the observation window.
SELECT table_name,
COUNT(*) FILTER (WHERE created_at > now() - interval '24 hours') AS created_24h
FROM (
SELECT 'pillars' AS table_name, created_at FROM pillars
UNION ALL
SELECT 'teams', created_at FROM teams
UNION ALL
SELECT 'team_memberships', created_at FROM team_memberships
UNION ALL
SELECT 'team_governance_policies', created_at FROM team_governance_policies
) AS s
GROUP BY table_name;

Any non-zero created_24h column means something is still writing to a legacy table — stop and investigate before dropping.

3.3 Reconciler drift at zero

# Port-forward to the reconciler's metrics endpoint:
kubectl -n platform port-forward deploy/dualwrite-reconciler 9119:9119

# In another shell:
curl -s localhost:9119/metrics | grep '^dualwrite_open_drift{'
# Every line's value must be 0.

If any dualwrite_open_drift gauge is non-zero, run one more heal cycle before proceeding. A stable-zero gate (over the observation window) is mandatory.

3.4 CI lint sanity

# From the repo root:
scripts/ci/check-legacy-table-refs.sh origin/main
# Expected: "legacy-table-refs: OK — no new legacy-table references outside the allowlist."

This confirms no PR merged between Phase 2 and Phase 3 sneaked a new reference into the non-allowlisted code paths.

4. Backup (non-negotiable)

Take a full logical backup of the database before running the migration. The backup is the ONLY recovery path once §5 has executed.

# Namespace + pod names for dev (adjust for your env).
NS=platform
PG_POD=$(kubectl -n "$NS" get pod -l app=postgres -o jsonpath='{.items[0].metadata.name}')

STAMP=$(date -u +%Y%m%dT%H%M%SZ)
BACKUP_DIR="/var/backups/upsquad/orgv23-phase3-drop-legacy"
mkdir -p "$BACKUP_DIR"

# 1. Full database dump (compressed). Takes a few minutes on dev.
kubectl -n "$NS" exec "$PG_POD" -- \
pg_dump -U upsquad -Fc -Z 9 upsquad \
> "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump"

# 2. Per-table dumps (smaller, targeted restore surface).
for TBL in pillars teams team_memberships team_governance_policies; do
kubectl -n "$NS" exec "$PG_POD" -- \
pg_dump -U upsquad -Fc --data-only --table="public.${TBL}" upsquad \
> "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done

# 3. SHA-256 manifest — record in the tracker #570 comment.
sha256sum "$BACKUP_DIR"/*.pgdump > "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"
cat "$BACKUP_DIR/SHA256SUMS-${STAMP}.txt"

Record the backup path + SHA manifest in #570 and #694 as a comment before moving to §5. Without this record there is no audit trail of the backup's existence.

Spin up a scratch database and confirm the backup is readable:

kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
"CREATE DATABASE upsquad_backup_smoke;"
kubectl -n "$NS" exec -i "$PG_POD" -- \
pg_restore -U upsquad -d upsquad_backup_smoke -v \
< "$BACKUP_DIR/upsquad-full-${STAMP}.pgdump" | tail -20
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad_backup_smoke -c \
"SELECT COUNT(*) FROM pillars; SELECT COUNT(*) FROM teams;
SELECT COUNT(*) FROM team_memberships;
SELECT COUNT(*) FROM team_governance_policies;"
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d postgres -c \
"DROP DATABASE upsquad_backup_smoke;"

If the counts are non-zero and the restore completed without errors, the backup is trustworthy.

5. Execution

The migration is applied by golang-migrate, same tool the rest of the platform uses. It wraps 070_drop_legacy.up.sql in a single transaction.

# DATABASE_URL must point at the dev database; confirm twice.
echo "$DATABASE_URL" | grep -v prod || { echo "STOP: DATABASE_URL looks like prod"; exit 1; }

# Apply just the one migration — NOT `up` (that would be a no-op at
# this point, but best to be explicit).
make migrate-up

Expected output:

70/u drop_legacy (…ms)

If the migration errors (e.g. FK constraint surprise), the transaction rolls back and the legacy tables are still present. Investigate the error, correct, and retry. Do NOT patch the migration file in-place once it has succeeded on any other database — that breaks the single-source-of-truth invariant for migrations.

6. Post-verification

6.1 Tables gone, survivors intact

-- Expected: zero rows (legacy tables are gone).
SELECT tablename FROM pg_tables
WHERE schemaname = 'public'
AND tablename IN ('pillars','teams','team_memberships','team_governance_policies');

-- Expected: non-zero rows on every survivor.
SELECT 'organizations' AS t, COUNT(*) FROM organizations
UNION ALL
SELECT 'members', COUNT(*) FROM members
UNION ALL
SELECT 'org_units', COUNT(*) FROM org_units
UNION ALL
SELECT 'org_unit_memberships', COUNT(*) FROM org_unit_memberships
UNION ALL
SELECT 'agents', COUNT(*) FROM agents;

6.2 OrgUnitService parity

Hit the running Context Engine with an authenticated request and confirm the read path returns the same team surface as before the drop. The QA engineer's parity check (issue #694 acceptance) covers this; founder runs it once before closing the tracker:

# With SMOKE_GATEWAY_URL + SMOKE_GATEWAY_TOKEN + SMOKE_GATEWAY_ORG_ID set:
make smoke-wave-j

Expected: ok on all sub-tests.

6.3 Client + Admin portal smoke

Open the Org Chart page in the client portal and the Member Admin page in the admin portal. Verify:

  • The full tree renders with no empty nodes.
  • Member counts match the pre-drop snapshot (from §6.1 backup).
  • No console errors referencing pillars / teams / team_memberships.

Any regression is a rollback signal — see §7.

7. Rollback

The migration is one-way at the data level. Rollback requires restoring from the §4 backup:

# 1. Take the service down to prevent writes against the dropped
# schema.
kubectl -n platform scale deploy/context-engine --replicas=0

# 2. Run migrate down — recreates the schema (empty).
make migrate-down STEPS=1

# 3. Restore the four tables from the per-table backups.
for TBL in pillars teams team_memberships team_governance_policies; do
kubectl -n "$NS" exec -i "$PG_POD" -- \
pg_restore -U upsquad -d upsquad --table="public.${TBL}" --data-only \
< "$BACKUP_DIR/${TBL}-${STAMP}.pgdump"
done

# 4. Verify counts match the pre-drop snapshot.
kubectl -n "$NS" exec "$PG_POD" -- psql -U upsquad -d upsquad -c \
"SELECT 'pillars', COUNT(*) FROM pillars UNION ALL
SELECT 'teams', COUNT(*) FROM teams UNION ALL
SELECT 'team_memberships', COUNT(*) FROM team_memberships UNION ALL
SELECT 'team_governance_policies', COUNT(*) FROM team_governance_policies;"

# 5. Bring the service back up.
kubectl -n platform scale deploy/context-engine --replicas=1

Then revert the Phase 2 flags (read_from_org_units, cascade_4layer, new_services_enabledfalse; dual_write_org_unitstrue) per the Phase 2 runbook §7 "rollback" procedure, and open a tracking incident against #570 documenting the trigger for the rollback.

DO NOT attempt to recover state by running migrate down alone — that recreates empty tables with the right shape but no rows, which looks healthy to a shallow check and breaks downstream reads.

8. Close-out

When §6 is green:

  1. Tick every acceptance box on #694.
  2. Comment on #570 with the backup path + SHA manifest from §4.
  3. Ping @project-manager to close the Phase 3 milestone (#11).
  4. QA writes the parity report and attaches it to #694.

At this point the Org Model v2.3 migration is code-complete in dev. The only surviving references to pillars / teams / team_memberships / team_governance_policies are:

  • The migration files themselves (070's up + down, 001, 028, 058, 060 — history of the schema).
  • The CI lint script (scripts/ci/check-legacy-table-refs.sh) which stays in place to block any accidental reintroduction of the legacy identifiers.
  • The .allowlist paths in the lint script and the internal/tenant shim + internal/member/read_switch* code paths that continue to compile but are no longer exercised at runtime. A future hygiene pass (out of scope for K.2) will remove them.