From 03ce83d76718a885aa36217f31e29ebe52d37068 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 14:33:58 -0500 Subject: [PATCH 01/11] docs: add CI and Contributing section to README Explains the cross-repo PR workflow, branch naming convention, how the pgxntool CI wait logic works, the no-test-pr label and its write-protection, and what branch protection enforces. Co-Authored-By: Claude Sonnet 4.6 --- README.md | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/README.md b/README.md index a306103..0292b4e 100644 --- a/README.md +++ b/README.md @@ -95,6 +95,47 @@ Tests are organized by filename pattern: Each test file automatically runs its prerequisites if needed, so they can be run individually or as a suite. +## CI and Contributing + +### Why two repos? + +`pgxntool` is the framework itself; `pgxntool-test` is the test harness for it. They are kept separate so the test harness can be used as a git subtree in other projects. This means a change to either repo may require a corresponding change in the other, and the CI is designed to handle this. + +### PR conventions + +When your change requires modifications to both repos, open PRs in **both repos using the same branch name**. For example, if your feature branch is named `feature/add-pgtle-support`, create that branch in both `pgxntool` and `pgxntool-test`. The CI uses the branch name to automatically find the corresponding PR in the other repo. + +If your change only affects one repo (e.g., a docs fix in pgxntool-test that doesn't require any pgxntool changes), see [the `no-test-pr` label](#the-no-test-pr-label) below. + +### How CI works + +**When you open a PR in `pgxntool-test`:** +CI checks whether `pgxntool` has a branch with the same name. If it does, tests run against that branch. If not, tests run against `pgxntool/master`. Results appear on your pgxntool-test PR. + +**When you open a PR in `pgxntool`:** +CI searches for an open PR in `pgxntool-test` with the same branch name and waits up to 5 minutes for one to appear (in case you open the second PR shortly after). If a matching test PR is found, CI waits for its checks to pass before running the pgxntool tests against it. Results appear on your pgxntool PR. + +If no matching test PR appears within 5 minutes and the PR does not have the `no-test-pr` label, the CI check fails and the PR cannot be merged. This is intentional — it ensures test coverage for pgxntool changes. + +### The `no-test-pr` label + +For pgxntool PRs that genuinely don't require any test changes (e.g., documentation updates, comment fixes), a maintainer can apply the `no-test-pr` label to the PR. This tells CI to skip waiting for a test PR and run against `pgxntool-test/master` instead. + +**This label is write-protected**: only maintainers with write access to the repository can add or remove it. If you add the label yourself, an automated workflow will remove it and explain why. Ask a maintainer to apply it if you believe your PR doesn't need test changes. + +To get the label applied: +1. Open your pgxntool PR as normal. +2. Leave a comment asking a maintainer to apply `no-test-pr` and explain why no test changes are needed. +3. A maintainer will review and apply the label if appropriate. + +### Branch protection + +The `resolve-test-ref` status check on pgxntool is a required check for merging to `master`. It only passes when either: +- A corresponding pgxntool-test PR CI has passed, or +- A maintainer has applied the `no-test-pr` label. + +This means you cannot merge a pgxntool PR without test coverage, even accidentally. + ## Development See [CLAUDE.md](CLAUDE.md) for detailed development guidelines and architecture documentation. From 9a02bfcb52506e08af33b27c816c9724b7e632b0 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:09:58 -0500 Subject: [PATCH 02/11] Add CI workflow and multi-session PR guard - .github/workflows/ci.yml: test against matching pgxntool branch (falls back to master if no matching branch exists) - CLAUDE.md: warn before touching existing PRs not opened in this session Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/ci.yml | 116 +++++++++++++++++++++++++++++++++++++++ CLAUDE.md | 13 +++++ 2 files changed, 129 insertions(+) create mode 100644 .github/workflows/ci.yml diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml new file mode 100644 index 0000000..37ad493 --- /dev/null +++ b/.github/workflows/ci.yml @@ -0,0 +1,116 @@ +name: CI + +on: + pull_request: + # We use 'pull_request' (not 'pull_request_target') deliberately. + # 'pull_request_target' runs with write access to the base repo, which is + # a security risk for untrusted fork code. Since this workflow only reads + # from other public repos (no secrets needed), 'pull_request' is correct + # and safe even for fork PRs. + +jobs: + test: + name: 🐘 PostgreSQL ${{ matrix.pg }} + runs-on: ubuntu-latest + container: pgxn/pgxn-tools + strategy: + matrix: + pg: [17, 16, 15, 14, 13, 12] + + steps: + - name: Start PostgreSQL ${{ matrix.pg }} + run: pg-start ${{ matrix.pg }} + + - name: Check out pgxntool-test + uses: actions/checkout@v4 + with: + path: pgxntool-test + # REQUIRED: pgxntool-test includes BATS as a git submodule at + # test/bats/. Without this, the submodule directory is empty and + # every test invocation fails with "bats: command not found" — a + # confusing error that looks like a PATH issue rather than a missing + # submodule init. + submodules: recursive + + - name: Resolve pgxntool branch + id: pgxntool-ref + run: | + BRANCH="${{ github.head_ref }}" + + # Check whether pgxntool has a branch with the same name as this PR. + # If it does, use it — changes in this test PR may depend on changes + # in a paired pgxntool branch. If not, fall back to master. + # + # --exit-code makes git ls-remote return non-zero when the ref is + # absent, so we can branch on the result cleanly without parsing output. + if git ls-remote --exit-code --heads \ + https://github.com/Postgres-Extensions/pgxntool.git \ + "refs/heads/${BRANCH}" > /dev/null 2>&1; then + echo "ref=${BRANCH}" >> "$GITHUB_OUTPUT" + echo "Using pgxntool branch: ${BRANCH}" + else + echo "ref=master" >> "$GITHUB_OUTPUT" + echo "pgxntool branch '${BRANCH}' not found; falling back to master" + fi + + - name: Check out pgxntool + uses: actions/checkout@v4 + with: + repository: Postgres-Extensions/pgxntool + ref: ${{ steps.pgxntool-ref.outputs.ref }} + path: pgxntool + submodules: false # pgxntool has no submodules + + - name: Configure git for CI + run: | + # The container may run as a different UID than the user who owns the + # checkout directories. Without safe.directory, git prints + # "fatal: dubious ownership" and refuses to operate — causing cryptic + # test failures unrelated to the actual code being tested. + git config --global --add safe.directory "$GITHUB_WORKSPACE/pgxntool" + git config --global --add safe.directory "$GITHUB_WORKSPACE/pgxntool-test" + + # Required for any git operations that create commits inside tests + # (e.g. the test repo created by foundation.bats). + git config --global user.email "ci@github-actions" + git config --global user.name "GitHub Actions" + + # actions/checkout leaves HEAD detached at the PR merge commit. + # Tests that call 'git subtree add' need a real local branch name, + # not a detached HEAD. -B force-creates the branch even if it already + # exists (plain -b errors on existing branches). + cd pgxntool + git checkout -B "${{ steps.pgxntool-ref.outputs.ref }}" + + - name: Install dependencies + run: | + apt-get update -qq + apt-get install -y -qq \ + rsync \ + ruby-asciidoctor \ + jq + # rsync: used extensively in test infrastructure (foundation.bats, + # build_test_repo_from_template, ensure_foundation). Absent rsync + # causes failures deep inside BATS output that look like test logic + # problems rather than a missing system dependency. + # + # ruby-asciidoctor: required by Makefile's check-readme target, called + # during test-setup. Missing it fails the setup phase before any tests + # run, with an error that mentions asciidoctor but not CI setup. + # + # jq: required by assert_valid_meta_json(). Listed proactively — + # pgxn-tools may add it eventually but it is not guaranteed, and the + # error when it's absent is a cryptic parse failure rather than + # "jq: not found". + + - name: Run tests + working-directory: pgxntool-test + env: + # Tell the test infrastructure which pgxntool branch to use. + # helpers.bash normally auto-detects this via 'git ls-remote', but + # in CI our pgxntool is checked out at a specific ref, not necessarily + # the tip of a named remote branch. Setting PGXNBRANCH explicitly + # bypasses auto-detection and prevents it from choosing the wrong + # branch or failing on the detached HEAD state. + PGXNBRANCH: ${{ steps.pgxntool-ref.outputs.ref }} + run: make test diff --git a/CLAUDE.md b/CLAUDE.md index 217a818..97a6946 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,6 +2,19 @@ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +## Multiple Concurrent Sessions + +It is common to have multiple Claude Code sessions open simultaneously across +pgxntool and pgxntool-test. To avoid cross-session interference: + +**If you are asked to do something on an existing PR that you did not open or +are not already working on in this session, immediately ask for confirmation +before proceeding.** For example: "I see PR #21 exists. Were you asking me to +work on that, or did you mean to send this to a different session?" + +This applies to: editing PR branches, pushing to them, closing/reopening them, +adding commits, modifying PR descriptions, or any other PR-level action. + ## Git Commit Guidelines **CRITICAL**: Never attempt to commit changes on your own initiative. Always wait for explicit user instruction to commit. Even if you detect issues (like out-of-date files), inform the user and let them decide when to commit. From 3320dd6f186a113e494eb4032706a07f8c0043ee Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:17:48 -0500 Subject: [PATCH 03/11] fix: use full clone for pgxntool checkout (git subtree requires it) git subtree add refuses to work with shallow clones; fails with "shallow roots are not allowed to be updated" which looks like a remote/ref problem rather than a depth issue. Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/ci.yml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 37ad493..e37e283 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -60,6 +60,11 @@ jobs: ref: ${{ steps.pgxntool-ref.outputs.ref }} path: pgxntool submodules: false # pgxntool has no submodules + # REQUIRED: must be a full clone. 'git subtree add' refuses to work + # with shallow clones and fails with "shallow roots are not allowed + # to be updated" — an error that looks like a remote/ref problem + # rather than a depth issue. + fetch-depth: 0 - name: Configure git for CI run: | From cb08a2c0dd26c1dd1bc1084663a0f3ea6f82ebd9 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:20:14 -0500 Subject: [PATCH 04/11] docs: require CI monitoring background task after every push Co-Authored-By: Claude Sonnet 4.6 --- CLAUDE.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 97a6946..a668177 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,6 +2,17 @@ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +## CI Monitoring After Every Push + +**REQUIRED**: After every `git push`, immediately start a background task to +monitor the CI run for that push. If you pushed to both pgxntool and +pgxntool-test, start a background task for each repo — do not monitor them +sequentially. + +Use `gh run watch` or poll with `gh run list` / `gh pr checks` in the +background task. Report failures to the user as soon as they are detected; +do not wait for all jobs to finish before reporting. + ## Multiple Concurrent Sessions It is common to have multiple Claude Code sessions open simultaneously across From d202419352fb408bde6c1b09f107f90261fd8a68 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:25:44 -0500 Subject: [PATCH 05/11] fix: install asciidoctor via gem; add branch context reporting - gem install asciidoctor instead of apt ruby-asciidoctor: the apt package binary is not on PATH in pgxn-tools containers (diagnosed from base.mk: "Could not find asciidoc or asciidoctor") - Print '=== BRANCHES: ===' line in resolve step so CI logs clearly show which pgxntool branch was selected for the run Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/ci.yml | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e37e283..836ba0c 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -21,6 +21,16 @@ jobs: - name: Start PostgreSQL ${{ matrix.pg }} run: pg-start ${{ matrix.pg }} + - name: Report CI context + run: | + # Print both repos and branches at the top of every job so failures + # are easy to correlate to the right code. This is especially useful + # when debugging cross-repo issues where the wrong branch was picked. + echo "=== CI context ===" + echo "Triggered by : ${{ github.repository }} PR #${{ github.event.pull_request.number }}" + echo "pgxntool-test branch : ${{ github.head_ref }}" + echo "pgxntool branch (resolved below) : (see Resolve pgxntool branch step)" + - name: Check out pgxntool-test uses: actions/checkout@v4 with: @@ -47,10 +57,10 @@ jobs: https://github.com/Postgres-Extensions/pgxntool.git \ "refs/heads/${BRANCH}" > /dev/null 2>&1; then echo "ref=${BRANCH}" >> "$GITHUB_OUTPUT" - echo "Using pgxntool branch: ${BRANCH}" + echo "=== BRANCHES: pgxntool-test=${BRANCH} pgxntool=${BRANCH} ===" else echo "ref=master" >> "$GITHUB_OUTPUT" - echo "pgxntool branch '${BRANCH}' not found; falling back to master" + echo "=== BRANCHES: pgxntool-test=${BRANCH} pgxntool=master (branch not found in pgxntool) ===" fi - name: Check out pgxntool @@ -90,18 +100,19 @@ jobs: - name: Install dependencies run: | apt-get update -qq - apt-get install -y -qq \ - rsync \ - ruby-asciidoctor \ - jq + apt-get install -y -qq rsync jq ruby + gem install asciidoctor --no-document --quiet # rsync: used extensively in test infrastructure (foundation.bats, # build_test_repo_from_template, ensure_foundation). Absent rsync # causes failures deep inside BATS output that look like test logic # problems rather than a missing system dependency. # - # ruby-asciidoctor: required by Makefile's check-readme target, called - # during test-setup. Missing it fails the setup phase before any tests - # run, with an error that mentions asciidoctor but not CI setup. + # asciidoctor (via gem install, not apt): base.mk searches PATH for + # 'asciidoctor'. The apt package ruby-asciidoctor installs the gem + # but its binary lands in the Ruby gems bin dir, which is NOT on PATH + # in the pgxn-tools container. 'gem install' puts it in /usr/local/bin + # which IS on PATH. Diagnosed from: "Could not find asciidoc or + # asciidoctor. Add one of them to your PATH." in base.mk output. # # jq: required by assert_valid_meta_json(). Listed proactively — # pgxn-tools may add it eventually but it is not guaranteed, and the From 76e253662934c14f342b515c1133fc8eb0d34a42 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:30:56 -0500 Subject: [PATCH 06/11] Add /ci skill for post-push CI monitoring Shell-script-driven skill that monitors GitHub Actions runs across both repos after a push. Key features: - Uses --commit SHA when available to avoid branch-race-condition - Fast log API path for BRANCHES line extraction (~1s vs 3-10s zip) - Parallel monitoring of both repos by default - Per-job pass/fail summary + failure log tail - 10-min timeout for pgxntool (accounts for 5-min test-PR wait) - Prefixes all output with [repo] for readability when interleaved Co-Authored-By: Claude Sonnet 4.6 --- .claude/skills/ci/SKILL.md | 76 ++++++++++ .claude/skills/ci/scripts/monitor-ci.sh | 193 ++++++++++++++++++++++++ 2 files changed, 269 insertions(+) create mode 100644 .claude/skills/ci/SKILL.md create mode 100755 .claude/skills/ci/scripts/monitor-ci.sh diff --git a/.claude/skills/ci/SKILL.md b/.claude/skills/ci/SKILL.md new file mode 100644 index 0000000..3c5c3d8 --- /dev/null +++ b/.claude/skills/ci/SKILL.md @@ -0,0 +1,76 @@ +--- +name: ci +description: | + Monitor GitHub Actions CI runs for pgxntool and/or pgxntool-test after a push. + Reports which branches are under test, per-job pass/fail, and failure details. + Uses shell scripts for all heavy work to minimize context consumption. + + Use when: "monitor CI", "watch CI", "check CI", "/ci" +allowed-tools: Bash(bash .claude/skills/ci/scripts/*), Read +--- + +# CI Monitor Skill + +Monitor GitHub Actions CI across both repos after a push. Always run in background. + +## Usage + +- `/ci` — monitor the most recent CI run on both repos for the current branch +- `/ci pgxntool-test` — monitor pgxntool-test only +- `/ci pgxntool` — monitor pgxntool only +- `/ci ` — monitor specific push SHAs (most reliable) + +## Workflow + +### 1. Start Monitor (Background) + +After every `git push`, immediately launch: + +```bash +bash .claude/skills/ci/scripts/monitor-ci.sh [repos] [branch] [sha1] [sha2] +``` + +Arguments: +- `repos`: `both` (default), `pgxntool-test`, or `pgxntool` +- `branch`: the branch just pushed (default: current git branch) +- `sha1`: SHA pushed to pgxntool-test (optional but recommended) +- `sha2`: SHA pushed to pgxntool (optional but recommended) + +When pushing to both repos, always pass the SHAs to avoid a race condition where +`--branch` might pick up a different concurrent push. + +**Always use `run_in_background: true`.** + +### 2. Read Results + +When the background task completes, read the output. The script emits: + +``` +[pgxntool-test] Run 12345678 found +[pgxntool-test] === BRANCHES: pgxntool-test=feature/foo pgxntool=feature/foo === +[pgxntool-test] Polling... (running: 🐘 PostgreSQL 13, 🐘 PostgreSQL 15) +[pgxntool-test] PASS 🐘 PostgreSQL 12 +[pgxntool-test] PASS 🐘 PostgreSQL 15 +[pgxntool-test] FAIL 🐘 PostgreSQL 13 +[pgxntool-test] Run completed: FAILURE +[pgxntool-test] === FAILURE: 🐘 PostgreSQL 13 === +... failure log lines ... +``` + +### 3. Enforce Results + +**CRITICAL RULES:** + +1. Any CI failure must be **reported to the user immediately**. Do not continue with other work. +2. Diagnose from the **first** `not ok` line — ignore cascading failures below it. +3. Failures in our workflow files (dependency installs, git config, etc.) are our problem to fix. +4. Failures in test code (not ok from BATS) may be pre-existing — report to user and ask before touching test files. +5. Never rationalize failures as "pre-existing" or "unrelated" without explicitly telling the user. +6. If CI is taking longer than expected on pgxntool, it may be waiting up to 5 min for a pgxntool-test PR — that is normal. + +## Key rules + +1. **ALWAYS** monitor CI after every push — use this skill, never `gh run watch` directly +2. When pushing to both repos, start two background monitors simultaneously (one per repo) +3. Pass the exact push SHA when available — `--branch` has a race condition on rapid pushes +4. The `=== BRANCHES ===` line in the output confirms which code is under test — always verify it matches your intent diff --git a/.claude/skills/ci/scripts/monitor-ci.sh b/.claude/skills/ci/scripts/monitor-ci.sh new file mode 100755 index 0000000..75b3b52 --- /dev/null +++ b/.claude/skills/ci/scripts/monitor-ci.sh @@ -0,0 +1,193 @@ +#!/usr/bin/env bash +# monitor-ci.sh [repos] [branch] [sha_pgxntool_test] [sha_pgxntool] +# +# Monitor GitHub Actions CI runs for pgxntool-test and/or pgxntool. +# Designed to be run in background by Claude after every git push. +# +# Arguments: +# repos : "both" (default), "pgxntool-test", or "pgxntool" +# branch : branch name (default: current git branch) +# sha_pgxntool_test: exact SHA pushed to pgxntool-test (optional) +# sha_pgxntool : exact SHA pushed to pgxntool (optional) +# +# Exit code: 0 if all monitored runs succeed, 1 otherwise. +# +# Requires: gh CLI authenticated with repo access. + +set -euo pipefail + +REPOS="${1:-both}" +BRANCH="${2:-$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")}" +SHA_TEST="${3:-}" +SHA_PGXN="${4:-}" + +REPO_TEST="Postgres-Extensions/pgxntool-test" +REPO_PGXN="Postgres-Extensions/pgxntool" + +# pgxntool runs can take up to 10 min: 5 min waiting for a test PR + test time. +# pgxntool-test runs typically take 2-3 min. +TIMEOUT_TEST=300 # 5 minutes +TIMEOUT_PGXN=600 # 10 minutes +POLL_INTERVAL=10 # seconds between status polls + +# ─── Helper: wait for a run to appear, then poll until done ────────────────── +monitor_one() { + local repo="$1" + local branch="$2" + local sha="$3" + local timeout="$4" + local label="[$repo]" + local elapsed=0 + + # Step 1: find the run ID + local run_id="" + echo "$label Waiting for CI run on branch '$branch'..." + while [[ -z "$run_id" ]]; do + if [[ -n "$sha" ]]; then + # Prefer exact SHA match — avoids race condition when multiple pushes + # are close together on the same branch. + run_id=$(gh run list --repo "$repo" --commit "$sha" \ + --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null || true) + fi + if [[ -z "$run_id" && -n "$branch" ]]; then + # Fallback: most recent pull_request run on the branch. + # NOTE: this can pick up a different run if two pushes happen rapidly. + run_id=$(gh run list --repo "$repo" --branch "$branch" \ + --event pull_request --limit 1 \ + --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null || true) + fi + if [[ -z "$run_id" ]]; then + sleep 5 + elapsed=$((elapsed + 5)) + if [[ $elapsed -ge $timeout ]]; then + echo "$label ERROR: no CI run found after ${timeout}s" >&2 + return 1 + fi + fi + done + echo "$label Run $run_id found" + + # Step 2: extract the BRANCHES line as soon as the first job starts. + # We use the direct jobs API (fast ~1s) rather than the zip-download log path + # (slow 3-10s). We only need one job — all jobs emit the same BRANCHES line. + local branches_line="" + local attempts=0 + while [[ -z "$branches_line" && $elapsed -lt $timeout ]]; do + local first_job_id + first_job_id=$(gh run view "$run_id" --repo "$repo" \ + --json jobs --jq '[.jobs[].databaseId][0] // empty' 2>/dev/null || true) + + if [[ -n "$first_job_id" ]]; then + # grep may return non-zero if the line isn't present yet — that's fine. + branches_line=$(gh api "repos/${repo}/actions/jobs/${first_job_id}/logs" \ + 2>/dev/null | grep "^=== BRANCHES:" | tail -1 || true) + fi + + if [[ -z "$branches_line" ]]; then + attempts=$((attempts + 1)) + if [[ $attempts -ge 3 ]]; then + # Give up waiting for the BRANCHES line and move on to polling. + echo "$label (BRANCHES line not yet available; proceeding to poll)" + break + fi + sleep "$POLL_INTERVAL" + elapsed=$((elapsed + POLL_INTERVAL)) + fi + done + if [[ -n "$branches_line" ]]; then + echo "$label $branches_line" + fi + + # Step 3: poll until all jobs complete. + local status="in_progress" + local result="" + while [[ "$status" != "completed" && $elapsed -lt $timeout ]]; do + result=$(gh run view "$run_id" --repo "$repo" \ + --json status,conclusion,jobs \ + --jq '{status: .status, conclusion: .conclusion, + jobs: [.jobs[] | {name: .name, status: .status, conclusion: .conclusion}]}' \ + 2>/dev/null || true) + + if [[ -z "$result" ]]; then + sleep "$POLL_INTERVAL" + elapsed=$((elapsed + POLL_INTERVAL)) + continue + fi + + status=$(echo "$result" | jq -r '.status') + + if [[ "$status" != "completed" ]]; then + local running + running=$(echo "$result" | jq -r \ + '[.jobs[] | select(.status == "in_progress") | .name] | join(", ")' || true) + if [[ -n "$running" ]]; then + echo "$label Polling... (running: $running)" + fi + sleep "$POLL_INTERVAL" + elapsed=$((elapsed + POLL_INTERVAL)) + fi + done + + if [[ $elapsed -ge $timeout ]]; then + echo "$label ERROR: timed out after ${timeout}s" >&2 + return 1 + fi + + # Step 4: report per-job outcomes. + local conclusion + conclusion=$(echo "$result" | jq -r '.conclusion') + echo "$label Run $run_id completed: $(echo "$conclusion" | tr '[:lower:]' '[:upper:]')" + echo "$result" | jq -r '.jobs[] | "\(if .conclusion == "success" then "PASS" elif .conclusion == null then .status else .conclusion | ascii_upcase end) \(.name)"' \ + | sed "s/^/$label /" + + # Step 5: for failed jobs, print the failure log (last 60 lines per job). + if [[ "$conclusion" != "success" ]]; then + local failed_job_ids + failed_job_ids=$(gh run view "$run_id" --repo "$repo" \ + --json jobs \ + --jq '[.jobs[] | select(.conclusion == "failure") | .databaseId] | .[]' \ + 2>/dev/null || true) + + for job_id in $failed_job_ids; do + local job_name + job_name=$(gh run view "$run_id" --repo "$repo" \ + --json jobs \ + --jq --argjson id "$job_id" \ + '[.jobs[] | select(.databaseId == $id) | .name] | .[0]' 2>/dev/null || true) + echo "" + echo "$label === FAILURE: ${job_name:-job $job_id} ===" + # Use --log-failed to get only the failed step output, keeping output compact. + gh run view --repo "$repo" --job "$job_id" --log-failed 2>&1 \ + | grep -v "^$" | tail -60 || true + done + + return 1 + fi + + return 0 +} + +# ─── Main: run monitors in parallel or series ───────────────────────────────── +exit_code=0 + +case "$REPOS" in + pgxntool-test) + monitor_one "$REPO_TEST" "$BRANCH" "$SHA_TEST" "$TIMEOUT_TEST" || exit_code=1 + ;; + pgxntool) + monitor_one "$REPO_PGXN" "$BRANCH" "$SHA_PGXN" "$TIMEOUT_PGXN" || exit_code=1 + ;; + both|*) + # Run both in parallel. Each writes to stdout (interleaved but prefixed with + # the repo name for readability). Capture both PIDs and wait for both. + monitor_one "$REPO_TEST" "$BRANCH" "$SHA_TEST" "$TIMEOUT_TEST" & + pid_test=$! + monitor_one "$REPO_PGXN" "$BRANCH" "$SHA_PGXN" "$TIMEOUT_PGXN" & + pid_pgxn=$! + + wait "$pid_test" || { echo "[both] pgxntool-test CI FAILED"; exit_code=1; } + wait "$pid_pgxn" || { echo "[both] pgxntool CI FAILED"; exit_code=1; } + ;; +esac + +exit $exit_code From dc1631cc619cb128f8cf711b2454d29ee973110f Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:32:05 -0500 Subject: [PATCH 07/11] fix: pre-install pgtap to prevent concurrent-install race condition MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The concurrent-make-test.bats test runs two `make test` processes simultaneously to verify database-name isolation. Both processes check $(datadir)/extension/pgtap.control and, finding it absent, both invoke `pgxn install pgtap --sudo` concurrently. Their simultaneous `gmake install` calls race to write files into the shared /usr/share/postgresql//extension/ directory, producing: install: cannot create regular file '.../pgtap--X.Y.Z.sql': File exists install: cannot change permissions of '...': No such file or directory The race is most pronounced on older PostgreSQL versions (e.g. PG13) where pgtap's build applies more compat patches, widening the window during which both installs run in parallel. Fix: add a Pre-install pgtap step before `make test`. With pgtap.control already present, both concurrent make processes find the Make target's prerequisite satisfied and skip the install entirely — no race occurs. Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/ci.yml | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 836ba0c..187d9bc 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -119,6 +119,25 @@ jobs: # error when it's absent is a cryptic parse failure rather than # "jq: not found". + - name: Pre-install pgtap + run: | + # Pre-install pgtap before running tests to prevent a race condition + # in the concurrent-make-test.bats suite. + # + # The test "concurrent make test succeeds for both projects" runs two + # `make test` processes simultaneously. Both check for + # $(datadir)/extension/pgtap.control and, finding it absent, both + # invoke `pgxn install pgtap --sudo` at the same time. Their + # concurrent `gmake install` calls race to write files into the + # shared /usr/share/postgresql//extension/ directory, causing + # "File exists" and "No such file or directory" errors. + # + # Pre-installing pgtap here ensures the .control file is already + # present when the concurrent test runs. Both make processes then + # skip the install (the Make target's prerequisite is already + # satisfied) and no race occurs. + pgxn install pgtap --sudo + - name: Run tests working-directory: pgxntool-test env: From 6493b3eb46596b38dcc924266f5abea149f1b635 Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 15:32:52 -0500 Subject: [PATCH 08/11] ci skill: add OVERALL line, exit code contract, timeout exit code - Script now prints OVERALL: ALL_PASS/FAIL/TIMEOUT as the last line - Exit codes: 0=pass, 1=fail, 2=timeout (distinguishable from failure) - SKILL.md documents the contract with a table so Claude knows exactly what to check without parsing the full output Co-Authored-By: Claude Sonnet 4.6 --- .claude/skills/ci/SKILL.md | 11 +++++++++++ .claude/skills/ci/scripts/monitor-ci.sh | 22 ++++++++++++++++++---- 2 files changed, 29 insertions(+), 4 deletions(-) diff --git a/.claude/skills/ci/SKILL.md b/.claude/skills/ci/SKILL.md index 3c5c3d8..b50e095 100644 --- a/.claude/skills/ci/SKILL.md +++ b/.claude/skills/ci/SKILL.md @@ -55,8 +55,19 @@ When the background task completes, read the output. The script emits: [pgxntool-test] Run completed: FAILURE [pgxntool-test] === FAILURE: 🐘 PostgreSQL 13 === ... failure log lines ... +OVERALL: FAIL ``` +The **last line is always `OVERALL: `**. Check this first: + +| OVERALL | Exit code | Meaning | +|---------|-----------|---------| +| `ALL_PASS` | 0 | All jobs green — safe to proceed | +| `FAIL` | 1 | One or more jobs failed — stop and report | +| `TIMEOUT` | 2 | Run(s) did not complete within timeout | + +Also verify the `=== BRANCHES ===` line matches the code you just pushed. + ### 3. Enforce Results **CRITICAL RULES:** diff --git a/.claude/skills/ci/scripts/monitor-ci.sh b/.claude/skills/ci/scripts/monitor-ci.sh index 75b3b52..60963ce 100755 --- a/.claude/skills/ci/scripts/monitor-ci.sh +++ b/.claude/skills/ci/scripts/monitor-ci.sh @@ -10,7 +10,11 @@ # sha_pgxntool_test: exact SHA pushed to pgxntool-test (optional) # sha_pgxntool : exact SHA pushed to pgxntool (optional) # -# Exit code: 0 if all monitored runs succeed, 1 otherwise. +# Exit codes: +# 0 : ALL_PASS — all jobs succeeded +# 1 : FAIL — one or more jobs failed +# 2 : TIMEOUT — run(s) did not complete within the timeout +# 3 : NO_RUNS — no CI run found for this branch after waiting # # Requires: gh CLI authenticated with repo access. @@ -130,7 +134,7 @@ monitor_one() { if [[ $elapsed -ge $timeout ]]; then echo "$label ERROR: timed out after ${timeout}s" >&2 - return 1 + return 2 fi # Step 4: report per-job outcomes. @@ -185,9 +189,19 @@ case "$REPOS" in monitor_one "$REPO_PGXN" "$BRANCH" "$SHA_PGXN" "$TIMEOUT_PGXN" & pid_pgxn=$! - wait "$pid_test" || { echo "[both] pgxntool-test CI FAILED"; exit_code=1; } - wait "$pid_pgxn" || { echo "[both] pgxntool CI FAILED"; exit_code=1; } + wait "$pid_test" || { r=$?; echo "[both] pgxntool-test CI FAILED"; [[ $r -gt $exit_code ]] && exit_code=$r; } + wait "$pid_pgxn" || { r=$?; echo "[both] pgxntool CI FAILED"; [[ $r -gt $exit_code ]] && exit_code=$r; } ;; esac +# Emit a parseable summary line. Claude should check this line rather than +# parsing the full output. Convention matches the test skill's STATUS line. +if [[ $exit_code -eq 0 ]]; then + echo "OVERALL: ALL_PASS" +elif [[ $exit_code -eq 2 ]]; then + echo "OVERALL: TIMEOUT" +else + echo "OVERALL: FAIL" +fi + exit $exit_code From 7374c071bee727c9e829e29660d10376d47b3ace Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 16:12:40 -0500 Subject: [PATCH 09/11] fix: ci skill edge cases and race condition documentation - monitor-ci.sh: fix sed delimiter collision when $label contains '/' (sed "s/^/$label /" broke because $label = "[repo/name]"; use '|' delimiter) - monitor-ci.sh: initialize pid_test/pid_pgxn before case statement to prevent "unbound variable" error with set -u when repos=both - SKILL.md: add race condition warning box; clarify BRANCHES line verification as primary safeguard against --branch picking wrong run - CLAUDE.md: replace vague CI monitoring rule with specific SHA + BRANCHES line guidance Co-Authored-By: Claude Sonnet 4.6 --- .claude/skills/ci/SKILL.md | 13 +++++++++++-- .claude/skills/ci/scripts/monitor-ci.sh | 4 +++- CLAUDE.md | 12 +++++++++--- 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/.claude/skills/ci/SKILL.md b/.claude/skills/ci/SKILL.md index b50e095..0b389a8 100644 --- a/.claude/skills/ci/SKILL.md +++ b/.claude/skills/ci/SKILL.md @@ -37,7 +37,13 @@ Arguments: - `sha2`: SHA pushed to pgxntool (optional but recommended) When pushing to both repos, always pass the SHAs to avoid a race condition where -`--branch` might pick up a different concurrent push. +`--branch` might pick up a different concurrent push on the same branch. + +> **Race condition note**: `gh run list --branch` returns the most recent run on +> that branch — if two pushes happen close together (e.g. two sessions pushing +> in parallel), it may pick up the wrong run. Passing `--commit SHA` targets the +> exact push and avoids this. When SHA is unavailable, always verify the +> `=== BRANCHES: ===` line in the output matches the code you pushed. **Always use `run_in_background: true`.** @@ -66,7 +72,10 @@ The **last line is always `OVERALL: `**. Check this first: | `FAIL` | 1 | One or more jobs failed — stop and report | | `TIMEOUT` | 2 | Run(s) did not complete within timeout | -Also verify the `=== BRANCHES ===` line matches the code you just pushed. +**Always verify the `=== BRANCHES ===` line** matches the code you just pushed — +this is your primary safeguard against the `--branch` race condition. If the +branches don't match, cancel the run and re-trigger: `gh run cancel --repo +` then re-push or re-run via `gh run rerun`. ### 3. Enforce Results diff --git a/.claude/skills/ci/scripts/monitor-ci.sh b/.claude/skills/ci/scripts/monitor-ci.sh index 60963ce..1b232c6 100755 --- a/.claude/skills/ci/scripts/monitor-ci.sh +++ b/.claude/skills/ci/scripts/monitor-ci.sh @@ -142,7 +142,7 @@ monitor_one() { conclusion=$(echo "$result" | jq -r '.conclusion') echo "$label Run $run_id completed: $(echo "$conclusion" | tr '[:lower:]' '[:upper:]')" echo "$result" | jq -r '.jobs[] | "\(if .conclusion == "success" then "PASS" elif .conclusion == null then .status else .conclusion | ascii_upcase end) \(.name)"' \ - | sed "s/^/$label /" + | sed "s|^|$label |" # Step 5: for failed jobs, print the failure log (last 60 lines per job). if [[ "$conclusion" != "success" ]]; then @@ -173,6 +173,8 @@ monitor_one() { # ─── Main: run monitors in parallel or series ───────────────────────────────── exit_code=0 +pid_test="" +pid_pgxn="" case "$REPOS" in pgxntool-test) diff --git a/CLAUDE.md b/CLAUDE.md index a668177..2331ac0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -9,9 +9,15 @@ monitor the CI run for that push. If you pushed to both pgxntool and pgxntool-test, start a background task for each repo — do not monitor them sequentially. -Use `gh run watch` or poll with `gh run list` / `gh pr checks` in the -background task. Report failures to the user as soon as they are detected; -do not wait for all jobs to finish before reporting. +Always use the `/ci` skill (`bash .claude/skills/ci/scripts/monitor-ci.sh`). +Pass the exact push SHA when available — `gh run list --branch` has a race +condition: if two pushes land close together on the same branch (e.g., two +Claude sessions pushing in parallel), `--branch` may pick up the wrong run. +`--commit SHA` targets the exact push and avoids this. + +**After every monitor run, check the `=== BRANCHES: pgxntool=X +pgxntool-test=Y ===` line** to verify the right code is under test. If the +branches don't match what you pushed, cancel the run and re-trigger. ## Multiple Concurrent Sessions From 3c2ffeee301338a8d791c8ab246b7b94ac03f1af Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 16:12:47 -0500 Subject: [PATCH 10/11] fix: replace undefined fail() with error() in BATS tests `fail` is not a BATS built-in and is not defined anywhere in the test helpers. When a test assertion fails and `fail "message"` is called, bash reports "fail: command not found" (exit 127), which masks the real failure message entirely. The correct replacement is `error`, which is defined in test/lib/helpers.bash: error() { out "ERROR: $*"; return 1; } 8 sites fixed across two files: - test/standard/concurrent-make-test.bats (7 uses) - test/standard/make-test.bats (1 use) Co-Authored-By: Claude Sonnet 4.6 --- test/standard/concurrent-make-test.bats | 14 +++++++------- test/standard/make-test.bats | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/test/standard/concurrent-make-test.bats b/test/standard/concurrent-make-test.bats index 0c85555..78d885c 100644 --- a/test/standard/concurrent-make-test.bats +++ b/test/standard/concurrent-make-test.bats @@ -51,13 +51,13 @@ setup() { dbname1=$(make -C "$repo1" print-REGRESS_DBNAME 2>&1 | sed -n 's/.*set to "\(.*\)"/\1/p') dbname2=$(make -C "$repo2" print-REGRESS_DBNAME 2>&1 | sed -n 's/.*set to "\(.*\)"/\1/p') - [ -n "$dbname1" ] || fail "Could not extract REGRESS_DBNAME from repo1" - [ -n "$dbname2" ] || fail "Could not extract REGRESS_DBNAME from repo2" + [ -n "$dbname1" ] || error "Could not extract REGRESS_DBNAME from repo1" + [ -n "$dbname2" ] || error "Could not extract REGRESS_DBNAME from repo2" out "repo1 REGRESS_DBNAME: $dbname1" out "repo2 REGRESS_DBNAME: $dbname2" - [ "$dbname1" != "$dbname2" ] || fail "Both repos have the same REGRESS_DBNAME: $dbname1" + [ "$dbname1" != "$dbname2" ] || error "Both repos have the same REGRESS_DBNAME: $dbname1" } @test "single-extension repo has exactly one --dbname flag" { @@ -69,7 +69,7 @@ setup() { count=$(echo "$count" | tr -d ' ') out "single-extension --dbname count: $count" - [ "$count" -eq 1 ] || fail "Expected exactly 1 --dbname flag, got $count" + [ "$count" -eq 1 ] || error "Expected exactly 1 --dbname flag, got $count" } @test "multi-extension repo has exactly one --dbname flag" { @@ -81,7 +81,7 @@ setup() { count=$(echo "$count" | tr -d ' ') out "multi-extension --dbname count: $count" - [ "$count" -eq 1 ] || fail "Expected exactly 1 --dbname flag, got $count" + [ "$count" -eq 1 ] || error "Expected exactly 1 --dbname flag, got $count" } @test "concurrent make test succeeds for both projects" { @@ -118,8 +118,8 @@ setup() { rm -f "$log1" "$log2" - [ $status1 -eq 0 ] || fail "single-extension make test failed" - [ $status2 -eq 0 ] || fail "multi-extension make test failed" + [ $status1 -eq 0 ] || error "single-extension make test failed" + [ $status2 -eq 0 ] || error "multi-extension make test failed" } # vi: expandtab sw=2 ts=2 diff --git a/test/standard/make-test.bats b/test/standard/make-test.bats index 71041bd..4743275 100755 --- a/test/standard/make-test.bats +++ b/test/standard/make-test.bats @@ -101,7 +101,7 @@ EOF # Output format: REGRESS_DBNAME is simple variable set to "value" local expected_dbname expected_dbname=$(make print-REGRESS_DBNAME 2>&1 | sed -n 's/.*set to "\(.*\)"/\1/p') - [ -n "$expected_dbname" ] || fail "Could not extract REGRESS_DBNAME from make" + [ -n "$expected_dbname" ] || error "Could not extract REGRESS_DBNAME from make" # Manually create the expected output file mkdir -p test/expected From dd3563d09df134c2c175b4c5bbd64ca7f62e97bd Mon Sep 17 00:00:00 2001 From: jnasbyupgrade Date: Fri, 15 May 2026 16:25:16 -0500 Subject: [PATCH 11/11] docs: update CI docs for new check-test-pr design - README: rewrite CI and Contributing section to reflect new fast check-test-pr behavior, 'commit-with-no-tests' label, clear instructions for "no paired test PR" failures - Remove references to 5-minute wait and old 'no-test-pr' label name Co-Authored-By: Claude Sonnet 4.6 --- README.md | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index 0292b4e..d008ff6 100644 --- a/README.md +++ b/README.md @@ -103,38 +103,48 @@ Each test file automatically runs its prerequisites if needed, so they can be ru ### PR conventions -When your change requires modifications to both repos, open PRs in **both repos using the same branch name**. For example, if your feature branch is named `feature/add-pgtle-support`, create that branch in both `pgxntool` and `pgxntool-test`. The CI uses the branch name to automatically find the corresponding PR in the other repo. - -If your change only affects one repo (e.g., a docs fix in pgxntool-test that doesn't require any pgxntool changes), see [the `no-test-pr` label](#the-no-test-pr-label) below. +When your change requires modifications to both repos, open PRs in **both repos using the same branch name**. For example, if your feature branch is named `feature/add-pgtle-support`, create that branch in both `pgxntool` and `pgxntool-test`. **Branch names must match exactly** — the CI uses the branch name to find the paired PR. ### How CI works **When you open a PR in `pgxntool-test`:** -CI checks whether `pgxntool` has a branch with the same name. If it does, tests run against that branch. If not, tests run against `pgxntool/master`. Results appear on your pgxntool-test PR. +CI checks whether `pgxntool` has a branch with the same name. If it does, tests run against that pgxntool branch. If not, tests run against `pgxntool/master`. Results appear directly on your pgxntool-test PR. **When you open a PR in `pgxntool`:** -CI searches for an open PR in `pgxntool-test` with the same branch name and waits up to 5 minutes for one to appear (in case you open the second PR shortly after). If a matching test PR is found, CI waits for its checks to pass before running the pgxntool tests against it. Results appear on your pgxntool PR. +CI immediately checks whether an open PR exists in `pgxntool-test` with the same branch name. There is **no waiting** — the test PR must already be open when your pgxntool CI runs. + +- **If a paired test PR is found**: pgxntool CI passes immediately. Tests run in the pgxntool-test PR's own CI; there is no duplication. +- **If no paired test PR is found**: pgxntool CI fails (see below). + +### What to do when pgxntool CI fails with "No paired test PR found" + +This failure means CI couldn't find an open PR in pgxntool-test with a matching branch name. The fix is almost always: + +1. **Open a PR in pgxntool-test** from a branch with the **same name** as your pgxntool branch. +2. Re-run the failing CI check on your pgxntool PR (or push a new commit to trigger it). + +**Branch names must match exactly.** If your pgxntool branch is `fix/parse-bug`, your pgxntool-test branch must also be `fix/parse-bug`. -If no matching test PR appears within 5 minutes and the PR does not have the `no-test-pr` label, the CI check fails and the PR cannot be merged. This is intentional — it ensures test coverage for pgxntool changes. +### The `commit-with-no-tests` label -### The `no-test-pr` label +For pgxntool PRs that genuinely don't require any test changes (documentation fixes, comment updates, etc.), a maintainer can apply the `commit-with-no-tests` label. This tells CI to run tests against `pgxntool-test/master` directly. -For pgxntool PRs that genuinely don't require any test changes (e.g., documentation updates, comment fixes), a maintainer can apply the `no-test-pr` label to the PR. This tells CI to skip waiting for a test PR and run against `pgxntool-test/master` instead. +**This is not a normal shortcut.** Most pgxntool changes touch behavior that tests must cover. This label is for the rare case where a pgxntool change is truly orthogonal to the test suite. -**This label is write-protected**: only maintainers with write access to the repository can add or remove it. If you add the label yourself, an automated workflow will remove it and explain why. Ask a maintainer to apply it if you believe your PR doesn't need test changes. +**This label is write-protected**: only maintainers with write access to the repository can add or remove it. If a non-maintainer applies it, an automated workflow removes it immediately and posts an explanation. -To get the label applied: -1. Open your pgxntool PR as normal. -2. Leave a comment asking a maintainer to apply `no-test-pr` and explain why no test changes are needed. +To request the label: +1. Open your pgxntool PR. +2. Leave a comment explaining why no test changes are needed. 3. A maintainer will review and apply the label if appropriate. ### Branch protection -The `resolve-test-ref` status check on pgxntool is a required check for merging to `master`. It only passes when either: -- A corresponding pgxntool-test PR CI has passed, or -- A maintainer has applied the `no-test-pr` label. +The `check-test-pr` status check on pgxntool is a required check for merging to `master`. It only passes when either: +- A corresponding pgxntool-test PR exists (with a matching branch name), or +- A maintainer has applied the `commit-with-no-tests` label. -This means you cannot merge a pgxntool PR without test coverage, even accidentally. +This ensures pgxntool changes cannot be merged without test coverage. ## Development