Skip to content

Fix the docker channel's negative-control and token-interop CI steps#11

Merged
kixelated merged 3 commits into
mainfrom
claude/docker-relay-container-leak
Jun 17, 2026
Merged

Fix the docker channel's negative-control and token-interop CI steps#11
kixelated merged 3 commits into
mainfrom
claude/docker-relay-container-leak

Conversation

@kixelated

@kixelated kixelated commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Background

The scheduled smoke job's docker runner had a chain of latent bugs in its later steps that never surfaced before, because an earlier step always failed first and the rest were skipped. Now that #10 fixed the docker Smoke step (the old Catalog.fetch API), those steps run for the first time. This PR fixes the two that were broken.

1. Relay container leaks across steps (Negative control)

The docker relay wrapper (clients/docker/moq-relay) does exec docker run --rm -i --network host …. smoke.sh tears the relay down with kill_tree (SIGKILL), which kills the docker client but leaves the container running — still bound to port 4443 via --network host. The Smoke and Negative control steps run consecutively in the same job, so the second run's port preflight saw the leftover container and aborted with something is already listening on 127.0.0.1:4443.

Fix — reap the container at the source:

  • clients/docker/moq-relay: run with a stable --name, and docker rm -f any leftover before binding (covers a crashed run whose cleanup never fired).
  • smoke.sh cleanup(): export the shared container name and docker rm -f it after kill_treegated on the docker wrapper actually being the relay (RELAY ends in clients/docker/moq-relay), so a non-docker run on a host that happens to have docker can't delete an unrelated container of that name. Honors the SMOKE_DOCKER=podman override.
  • smoke.sh preflight: briefly wait for the host-network socket to release after the container is reaped, instead of failing instantly.

2. rust token impl unavailable on the docker channel (Token interop)

Token interop ran the rust impl (which needs moq-token-cli on PATH) on every channel, but the docker channel installs no such binary — it covers the Rust verifier via the moqdev/moq-token-cli image (rust-docker). So every rust generator/verifier combo failed with "generator/verifier unavailable".

Fix: drop plain rust from the impl set on the docker channel (.github/workflows/smoke.yml), keeping js-node, js-bun, and rust-docker — which the CI run log shows already cross-verify. Other channels are unchanged.

Verification

  • Negative control fix: confirmed on CI — the docker job's Smoke and Negative control both passed (Negative control passed on docker for the first time). Also exercised the shared smoke.sh changes locally (non-docker channel) running Smoke then Negative control back-to-back with no manual port clearing: both pass.
  • Token interop fix: the same CI run showed js-node/js-bun/rust-docker token combos all PASS; only the rust combos failed (the bug this commit removes). --network host and the docker token image are Linux-only, so the docker path is validated by CI.

Notes for reviewers

  • Container name defaults to moq-relay-smoke, overridable via MOQ_RELAY_CONTAINER.
  • No change to any client-under-test or to the media path — purely harness/CI lifecycle.

🤖 Generated with Claude Code

The docker channel's relay wrapper does `exec docker run --rm -i --network
host ...`. smoke.sh tears the relay down with kill_tree (SIGKILL), which
kills the runtime client but leaves the detached container running -- still
bound to the relay's port via --network host. The next smoke.sh run in the
same job (the workflow's "Negative control" step after "Smoke") then aborts
its preflight with "something is already listening on 127.0.0.1:4443".

This was latent until the catalog fix let the docker Smoke step pass, so the
Negative control step ran for the first time and hit the leftover container.

Fix at the source: name the relay container and reap it by name.
- clients/docker/moq-relay: run with --name, and `docker rm -f` any leftover
  before binding (covers a crashed run whose cleanup never fired).
- smoke.sh: export the shared container name; cleanup() removes it after
  kill_tree (no-op when the runtime isn't installed / non-docker channels).
- smoke.sh: wait briefly in the port preflight for the host-network socket to
  release after the container is reaped, instead of failing instantly.

Verified locally (non-docker channel) running Smoke then Negative control
back-to-back with no manual port clearing: both pass. The docker path is
Linux/--network-host only, so it's validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

The changes introduce a stable container name (moq-relay-smoke by default, configurable via MOQ_RELAY_CONTAINER) shared between the relay wrapper and the smoke test harness. The relay wrapper (clients/docker/moq-relay) removes any pre-existing container with that name before starting and passes --name "$name" to the container runtime. In smoke.sh, the variable is exported, cleanup() is extended to forcibly remove the named container when using the docker channel, and a retry loop polling /certificate.sha256 is added before the stale-relay readiness check to allow the port to fully release between runs.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately describes the main changes: fixing CI steps for the docker channel's negative-control and token-interop tests that were previously blocked by an earlier failing step.
Description check ✅ Passed The description is directly related to the changeset, providing detailed background on the bugs fixed, the specific changes made to relay container lifecycle management and token configuration, and verification of the fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch claude/docker-relay-container-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@smoke.sh`:
- Around line 134-136: The cleanup condition in the RELAY_CONTAINER removal
block currently only checks if the RELAY_CONTAINER variable is set and if
docker/podman exists on PATH, but does not verify that docker was actually
selected as the relay channel for the current test run. Modify the conditional
to additionally check that the relay channel was actually set to docker before
attempting to remove the container, preventing accidental deletion of unrelated
containers when docker exists on the system but was not used for the relay in
this particular test execution.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8f815c8f-0f9a-49e9-bd2d-bde7bfebd1b1

📥 Commits

Reviewing files that changed from the base of the PR and between ff9bd39 and e7f2497.

📒 Files selected for processing (2)
  • clients/docker/moq-relay
  • smoke.sh

Comment thread smoke.sh Outdated
kixelated and others added 2 commits June 16, 2026 16:27
cleanup() removed the named relay container whenever a container runtime was
on PATH, so a non-docker run (cargo/apt/brew/nix) on a host that happens to
have docker installed could delete an unrelated container named
moq-relay-smoke. Gate the reap on the docker wrapper actually being the relay
(RELAY path ends in clients/docker/moq-relay), which also covers the podman
override since it shares that wrapper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Token interop ran `rust` (which needs moq-token-cli on PATH) on every channel,
but the docker channel installs no such binary -- it covers the Rust verifier
via the moqdev/moq-token-cli image (rust-docker). So every rust generator/
verifier combo failed there with "generator/verifier unavailable".

This was latent: the docker job used to fail earlier (Negative control) and
skip Token interop entirely. With that fixed, Token interop runs on docker for
the first time. Drop plain `rust` from the impl set on the docker channel,
keeping js-node, js-bun, and rust-docker (which the run log shows cross-verify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kixelated kixelated changed the title Reap the docker-channel relay container between smoke runs Fix the docker channel's negative-control and token-interop CI steps Jun 16, 2026
@kixelated kixelated merged commit 3a9a627 into main Jun 17, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant