96 changes: 96 additions & 0 deletions .claude/skills/ci/SKILL.md
---
name: ci
description: |
  Monitor GitHub Actions CI runs for pgxntool and/or pgxntool-test after a push.
  Reports which branches are under test, per-job pass/fail, and failure details.
  Uses shell scripts for all heavy work to minimize context consumption.

  Use when: "monitor CI", "watch CI", "check CI", "/ci"
allowed-tools: Bash(bash .claude/skills/ci/scripts/*), Read
---

# CI Monitor Skill

Monitor GitHub Actions CI across both repos after a push. Always run in background.

## Usage

- `/ci` — monitor the most recent CI run on both repos for the current branch
- `/ci pgxntool-test` — monitor pgxntool-test only
- `/ci pgxntool` — monitor pgxntool only
- `/ci <branch> <pgxntool-test-sha> <pgxntool-sha>` — monitor specific push SHAs (most reliable; SHA order matches the script's positional arguments)

## Workflow

### 1. Start Monitor (Background)

After every `git push`, immediately launch:

```bash
bash .claude/skills/ci/scripts/monitor-ci.sh [repos] [branch] [sha1] [sha2]
```

Arguments:
- `repos`: `both` (default), `pgxntool-test`, or `pgxntool`
- `branch`: the branch just pushed (default: current git branch)
- `sha1`: SHA pushed to pgxntool-test (optional but recommended)
- `sha2`: SHA pushed to pgxntool (optional but recommended)

When pushing to both repos, always pass the SHAs to avoid a race condition where
`--branch` might pick up a different concurrent push on the same branch.

> **Race condition note**: `gh run list --branch` returns the most recent run on
> that branch — if two pushes happen close together (e.g. two sessions pushing
> in parallel), it may pick up the wrong run. Passing `--commit SHA` targets the
> exact push and avoids this. When SHA is unavailable, always verify the
> `=== BRANCHES: ===` line in the output matches the code you pushed.
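The selection logic described in the note can be sketched as a small helper. This is illustrative only (`pick_run_query` is a hypothetical name, not part of the skill); it just shows which `gh run list` query gets used when:

```shell
# Hypothetical helper: prefer an exact-SHA query; fall back to the
# branch query only when no SHA is available.
pick_run_query() {
  local sha="$1" branch="$2"
  if [ -n "$sha" ]; then
    printf 'gh run list --commit %s --json databaseId\n' "$sha"
  else
    printf 'gh run list --branch %s --event pull_request --limit 1 --json databaseId\n' "$branch"
  fi
}

pick_run_query "abc1234" "feature/foo"   # exact-SHA query (race-free)
pick_run_query ""        "feature/foo"   # branch fallback (race-prone)
```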

**Always use `run_in_background: true`.**

### 2. Read Results

When the background task completes, read the output. The script emits:

```
[pgxntool-test] Run 12345678 found
[pgxntool-test] === BRANCHES: pgxntool-test=feature/foo pgxntool=feature/foo ===
[pgxntool-test] Polling... (running: 🐘 PostgreSQL 13, 🐘 PostgreSQL 15)
[pgxntool-test] PASS 🐘 PostgreSQL 12
[pgxntool-test] PASS 🐘 PostgreSQL 15
[pgxntool-test] FAILURE 🐘 PostgreSQL 13
[pgxntool-test] Run 12345678 completed: FAILURE
[pgxntool-test] === FAILURE: 🐘 PostgreSQL 13 ===
... failure log lines ...
OVERALL: FAIL
```

The **last line is always `OVERALL: <STATUS>`**. Check this first:

| OVERALL | Exit code | Meaning |
|---------|-----------|---------|
| `ALL_PASS` | 0 | All jobs green — safe to proceed |
| `FAIL` | 1 | One or more jobs failed — stop and report |
| `TIMEOUT` | 2 | Run(s) did not complete within timeout |
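Pulling the status out of a captured transcript is a one-liner. A minimal sketch (the transcript content and variable names are illustrative):

```shell
# Extract the final OVERALL status from a saved monitor transcript.
transcript='[pgxntool-test] PASS 🐘 PostgreSQL 15
[pgxntool-test] FAILURE 🐘 PostgreSQL 13
OVERALL: FAIL'

# The last line is always "OVERALL: <STATUS>"; take the second field.
overall=$(printf '%s\n' "$transcript" | tail -n 1 | cut -d' ' -f2)
echo "$overall"   # FAIL
```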

**Always verify the `=== BRANCHES ===` line** matches the code you just pushed —
this is your primary safeguard against the `--branch` race condition. If the
branches don't match, cancel the run (`gh run cancel <id> --repo <repo>`) and
re-trigger it by re-pushing, or re-run it with `gh run rerun <id> --repo <repo>`.
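The branch check itself can be done mechanically. A hedged sketch (the sample line and `expected` value are illustrative) that confirms both repos are testing the branch you pushed:

```shell
# Verify both branch assignments in a BRANCHES line match the expected branch.
branches_line='=== BRANCHES: pgxntool-test=feature/foo pgxntool=feature/foo ==='
expected='feature/foo'

# The leading space in " pgxntool=" keeps it from also matching "pgxntool-test=".
if [[ "$branches_line" == *"pgxntool-test=$expected"* \
   && "$branches_line" == *" pgxntool=$expected"* ]]; then
  echo "branches ok"
else
  echo "branch mismatch" >&2
fi
```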

### 3. Enforce Results

**CRITICAL RULES:**

1. Any CI failure must be **reported to the user immediately**. Do not continue with other work.
2. Diagnose from the **first** `not ok` line — ignore cascading failures below it.
3. Failures in our workflow files (dependency installs, git config, etc.) are our problem to fix.
4. Failures in test code (`not ok` lines from BATS) may be pre-existing — report to user and ask before touching test files.
5. Never rationalize failures as "pre-existing" or "unrelated" without explicitly telling the user.
6. If CI is taking longer than expected on pgxntool, it may be waiting up to 5 min for a pgxntool-test PR — that is normal.
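Rule 2 (diagnose from the first `not ok` line) can be applied with a single `grep`. A sketch, using an illustrative sample log:

```shell
# Pull only the first failing test from a BATS log; later "not ok" lines
# are often cascading failures caused by the first one.
bats_log='ok 1 setup works
not ok 2 make install fails
not ok 3 teardown (cascading)'

first_failure=$(printf '%s\n' "$bats_log" | grep -m1 '^not ok')
echo "$first_failure"   # not ok 2 make install fails
```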

## Key rules

1. **ALWAYS** monitor CI after every push — use this skill, never `gh run watch` directly
2. When pushing to both repos, start two background monitors simultaneously (one per repo)
3. Pass the exact push SHA when available — `--branch` has a race condition on rapid pushes
4. The `=== BRANCHES ===` line in the output confirms which code is under test — always verify it matches your intent
209 changes: 209 additions & 0 deletions .claude/skills/ci/scripts/monitor-ci.sh
#!/usr/bin/env bash
# monitor-ci.sh [repos] [branch] [sha_pgxntool_test] [sha_pgxntool]
#
# Monitor GitHub Actions CI runs for pgxntool-test and/or pgxntool.
# Designed to be run in background by Claude after every git push.
#
# Arguments:
#   repos            : "both" (default), "pgxntool-test", or "pgxntool"
#   branch           : branch name (default: current git branch)
#   sha_pgxntool_test: exact SHA pushed to pgxntool-test (optional)
#   sha_pgxntool     : exact SHA pushed to pgxntool (optional)
#
# Exit codes:
#   0 : ALL_PASS — all jobs succeeded
#   1 : FAIL — one or more jobs failed, or no CI run was found in time
#   2 : TIMEOUT — run(s) did not complete within the timeout
#
# Requires: gh CLI authenticated with repo access.

set -euo pipefail

REPOS="${1:-both}"
BRANCH="${2:-$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")}"
SHA_TEST="${3:-}"
SHA_PGXN="${4:-}"

REPO_TEST="Postgres-Extensions/pgxntool-test"
REPO_PGXN="Postgres-Extensions/pgxntool"

# pgxntool runs can take up to 10 min: 5 min waiting for a test PR + test time.
# pgxntool-test runs typically take 2-3 min.
TIMEOUT_TEST=300 # 5 minutes
TIMEOUT_PGXN=600 # 10 minutes
POLL_INTERVAL=10 # seconds between status polls

# ─── Helper: wait for a run to appear, then poll until done ──────────────────
monitor_one() {
  local repo="$1"
  local branch="$2"
  local sha="$3"
  local timeout="$4"
  local label="[$repo]"
  local elapsed=0

  # Step 1: find the run ID.
  local run_id=""
  echo "$label Waiting for CI run on branch '$branch'..."
  while [[ -z "$run_id" ]]; do
    if [[ -n "$sha" ]]; then
      # Prefer exact SHA match — avoids the race condition when multiple
      # pushes land close together on the same branch.
      run_id=$(gh run list --repo "$repo" --commit "$sha" \
        --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null || true)
    elif [[ -n "$branch" ]]; then
      # Fallback, used only when no SHA was given: most recent pull_request
      # run on the branch. NOTE: this can pick up a different run if two
      # pushes happen rapidly.
      run_id=$(gh run list --repo "$repo" --branch "$branch" \
        --event pull_request --limit 1 \
        --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null || true)
    fi
    if [[ -z "$run_id" ]]; then
      sleep 5
      elapsed=$((elapsed + 5))
      if [[ $elapsed -ge $timeout ]]; then
        echo "$label ERROR: no CI run found after ${timeout}s" >&2
        return 1
      fi
    fi
  done
  echo "$label Run $run_id found"

  # Step 2: extract the BRANCHES line as soon as the first job starts.
  # We use the direct jobs API (fast, ~1s) rather than the zip-download log
  # path (slow, 3-10s). One job is enough — all jobs emit the same BRANCHES
  # line.
  local branches_line=""
  local attempts=0
  while [[ -z "$branches_line" && $elapsed -lt $timeout ]]; do
    local first_job_id
    first_job_id=$(gh run view "$run_id" --repo "$repo" \
      --json jobs --jq '[.jobs[].databaseId][0] // empty' 2>/dev/null || true)

    if [[ -n "$first_job_id" ]]; then
      # grep may return non-zero if the line isn't present yet — that's fine.
      # Match unanchored: API log lines carry a leading timestamp.
      branches_line=$(gh api "repos/${repo}/actions/jobs/${first_job_id}/logs" \
        2>/dev/null | grep "=== BRANCHES:" | tail -1 || true)
    fi

    if [[ -z "$branches_line" ]]; then
      attempts=$((attempts + 1))
      if [[ $attempts -ge 3 ]]; then
        # Give up waiting for the BRANCHES line and move on to polling.
        echo "$label (BRANCHES line not yet available; proceeding to poll)"
        break
      fi
      sleep "$POLL_INTERVAL"
      elapsed=$((elapsed + POLL_INTERVAL))
    fi
  done
  if [[ -n "$branches_line" ]]; then
    echo "$label $branches_line"
  fi

  # Step 3: poll until all jobs complete.
  local status="in_progress"
  local result=""
  while [[ "$status" != "completed" && $elapsed -lt $timeout ]]; do
    result=$(gh run view "$run_id" --repo "$repo" \
      --json status,conclusion,jobs \
      --jq '{status: .status, conclusion: .conclusion,
             jobs: [.jobs[] | {name: .name, status: .status, conclusion: .conclusion}]}' \
      2>/dev/null || true)

    if [[ -z "$result" ]]; then
      sleep "$POLL_INTERVAL"
      elapsed=$((elapsed + POLL_INTERVAL))
      continue
    fi

    status=$(echo "$result" | jq -r '.status')

    if [[ "$status" != "completed" ]]; then
      local running
      running=$(echo "$result" | jq -r \
        '[.jobs[] | select(.status == "in_progress") | .name] | join(", ")' || true)
      if [[ -n "$running" ]]; then
        echo "$label Polling... (running: $running)"
      fi
      sleep "$POLL_INTERVAL"
      elapsed=$((elapsed + POLL_INTERVAL))
    fi
  done

  # Time out only if the run never reached "completed" — elapsed may exceed
  # the timeout on the final, successful poll.
  if [[ "$status" != "completed" ]]; then
    echo "$label ERROR: timed out after ${timeout}s" >&2
    return 2
  fi

  # Step 4: report per-job outcomes.
  local conclusion
  conclusion=$(echo "$result" | jq -r '.conclusion')
  echo "$label Run $run_id completed: $(echo "$conclusion" | tr '[:lower:]' '[:upper:]')"
  echo "$result" | jq -r \
    '.jobs[] | "\(if .conclusion == "success" then "PASS"
                  elif .conclusion == null then .status
                  else (.conclusion | ascii_upcase) end) \(.name)"' \
    | sed "s|^|$label |"

  # Step 5: for failed jobs, print the failure log (last 60 lines per job).
  if [[ "$conclusion" != "success" ]]; then
    local failed_job_ids
    failed_job_ids=$(gh run view "$run_id" --repo "$repo" \
      --json jobs \
      --jq '[.jobs[] | select(.conclusion == "failure") | .databaseId] | .[]' \
      2>/dev/null || true)

    for job_id in $failed_job_ids; do
      local job_name
      # gh's --jq flag takes only a jq expression (there is no --argjson), so
      # interpolate the numeric job ID directly into the filter.
      job_name=$(gh run view "$run_id" --repo "$repo" \
        --json jobs \
        --jq "[.jobs[] | select(.databaseId == ${job_id}) | .name][0]" \
        2>/dev/null || true)
      echo ""
      echo "$label === FAILURE: ${job_name:-job $job_id} ==="
      # Use --log-failed to get only the failed step output, keeping it compact.
      gh run view --repo "$repo" --job "$job_id" --log-failed 2>&1 \
        | grep -v "^$" | tail -60 || true
    done

    return 1
  fi

  return 0
}

# ─── Main: run monitors in parallel or series ─────────────────────────────────
exit_code=0
pid_test=""
pid_pgxn=""

case "$REPOS" in
  pgxntool-test)
    monitor_one "$REPO_TEST" "$BRANCH" "$SHA_TEST" "$TIMEOUT_TEST" || exit_code=$?
    ;;
  pgxntool)
    monitor_one "$REPO_PGXN" "$BRANCH" "$SHA_PGXN" "$TIMEOUT_PGXN" || exit_code=$?
    ;;
  both|*)
    # Run both in parallel. Each writes to stdout (interleaved, but prefixed
    # with the repo name for readability). Capture both PIDs and wait for both.
    monitor_one "$REPO_TEST" "$BRANCH" "$SHA_TEST" "$TIMEOUT_TEST" &
    pid_test=$!
    monitor_one "$REPO_PGXN" "$BRANCH" "$SHA_PGXN" "$TIMEOUT_PGXN" &
    pid_pgxn=$!

    # The if-statements keep the || groups exiting zero, so `set -e` does not
    # abort before the second wait or the summary line.
    wait "$pid_test" || {
      r=$?
      echo "[both] pgxntool-test CI FAILED"
      if [[ $r -gt $exit_code ]]; then exit_code=$r; fi
    }
    wait "$pid_pgxn" || {
      r=$?
      echo "[both] pgxntool CI FAILED"
      if [[ $r -gt $exit_code ]]; then exit_code=$r; fi
    }
    ;;
esac

# Emit a parseable summary line. Claude should check this line rather than
# parsing the full output. Convention matches the test skill's STATUS line.
if [[ $exit_code -eq 0 ]]; then
  echo "OVERALL: ALL_PASS"
elif [[ $exit_code -eq 2 ]]; then
  echo "OVERALL: TIMEOUT"
else
  echo "OVERALL: FAIL"
fi

exit $exit_code