ci: publish CI images to internal registry first #4002

Draft
realFlowControl wants to merge 23 commits into master from florian/ci-images

realFlowControl (Member) commented Jun 22, 2026

Description

When building CI docker images, this PR changes the process to:

  • publish images to registry.ddbuild.io (Datadog internal container registry)
  • use those images directly for GitLab Jobs (they are authenticated anyway)
  • use the public-images downstream job to magically sync those images to Docker Hub for usage with GitHub CI and external contributors
  • generate the CI image jobs by reading the docker-compose.yml + .env files
  • generate the Windows build + publish jobs from docker-compose.yml too (previously hand-written), so adding/removing a Windows target is also a one-line compose change. The Windows base images (windows-base-*) are now part of the build matrix, so they exist in the internal registry before the publish step mirrors them.
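
To make the "jobs are generated from docker-compose.yml + .env" idea concrete, here is a minimal sketch in Python. It is not the PR's generator (that is a PHP script); the service names, the `example/ci` image prefix, and the use of `BOOKWORM_VERSION` as a `.env` key are assumptions for illustration. The compose data is passed in as an already-parsed dict, and only plain `${VAR}` references are expanded (compose's `${VAR:-default}` form is not handled).

```python
import re

def parse_env(text):
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env

def expand(value, env):
    """Replace ${VAR} references with values from env; unknown names stay as-is."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), m.group(0)), value)

def jobs_from_compose(compose, env):
    """Return one (service, image) pair per compose service, sorted by service name."""
    return [
        (name, expand(service["image"], env))
        for name, service in sorted(compose["services"].items())
    ]

# Hypothetical inputs: service names, image prefix, and the .env key are made up.
env = parse_env("# illustrative only\nBOOKWORM_VERSION=9\n")
compose = {
    "services": {
        "php-8.4": {"image": "example/ci:php-8.4_bookworm-${BOOKWORM_VERSION}"},
        "php-8.3": {"image": "example/ci:php-8.3_bookworm-${BOOKWORM_VERSION}"},
    }
}
jobs = jobs_from_compose(compose, env)
for service, image in jobs:
    print(f"{service}: {image}")
```

With this shape, adding a PHP version means adding one service entry to the compose data; the job list follows without touching the pipeline definition.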

Wins

  • no logging in to Docker Hub to get a PAT
  • no manually starting a GitLab CI run anymore (with that PAT)
  • no manual syncing of public Docker Hub images to our internal registry (well, running a script, making a PR, and finding someone to approve it)
  • the job list is generated from the compose files, so adding/removing a PHP version is a one-line change with no pipeline edits
  • publish is independent of building, so Docker Hub can be (re)provisioned from existing internal images without a rebuild
  • CentOS/Alpine PHP images now build FROM the freshly-built internal base (the CI_REGISTRY_IMAGE build arg is passed through, matching Bookworm's BUILD_BASE), instead of silently falling back to the Docker Hub base

Note

Windows is generated from docker-compose.yml like the Linux images, but its build/publish jobs target the internal registry and need CI Identities auth on the Windows shell runners (onboarding in progress, see #ci-identities) — the host docker config only has the ECR cred helper, which can't authenticate registry.ddbuild.io. Until that lands, the Windows build/publish jobs (all when: manual) won't succeed, and Windows CI images are still built/consumed via the Docker Hub mirror as before. Once a Windows build succeeds against the internal registry, the tracer/package generators get flipped to consume from it in a one-line follow-up.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

datadog-official (Bot) commented Jun 22, 2026

⚠️ Warnings

🚦 6 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | appsec integration tests: [test8.4-release]

🧪 1 Test failed

All test failures are known flaky.

❄️ Known flaky: extended heartbeat re-emits configuration, dependencies and integrations() from com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests

java.lang.AssertionError: phpredis not emitted via app-started/app-integrations-change; saw: []. Expression: (phpredis in flushed). Values: flushed = []
	at org.codehaus.groovy.runtime.InvokerHelper.createAssertError(InvokerHelper.java:416)
	at com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests.extended heartbeat re-emits configuration, dependencies and integrations(TelemetryExtendedHeartbeatTests.groovy:70)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

Not introduced in this PR.

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.3]

DataDog/apm-reliability/dd-trace-php | test_extension_ci: [7.0]

View all 6 failed jobs.

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 54.08% (+0.00%)

🔗 Commit SHA: 64b4e3c

realFlowControl changed the title from "Publish CI images to internal ddbuild registry" to "ci: publish CI images to internal registry first" on Jun 22, 2026
- Update .gitlab/ci-images.yml to change the default CI_REGISTRY to registry.ddbuild.io and target the ddbuild registry path registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci.
- Make docker logins dynamic to support local builds, Docker Hub logins, and AWS ECR logins depending on the target registry server.
- Bypass runner credential helper issues in Linux container environments by resetting ~/.docker/config.json.
- Make registry and base image names fully configurable in docker-compose.yml and Dockerfiles, allowing parent base images to be dynamically resolved from ddbuild during child compilation steps.
- Update all GitLab CI generator scripts (.gitlab/generate-*.php) to use internal CI images from registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci instead of pulling from Docker Hub via the mirror path.
- This ensures test jobs use the newly compiled images directly from our project's ECR registry namespace.
- Add a new 'ci-publish' stage to .gitlab-ci.yml.
- Implement 4 parallel matrix trigger jobs in .gitlab/ci-images.yml (Publish CentOS, Publish Bookworm, Publish Alpine, and Publish Windows) to run automatically after their respective build jobs succeed.
- Each trigger calls the DataDog/public-images pipeline, passing the corresponding internal ddbuild ECR image as source and targeting public Docker Hub as destination under the exact same tag.
- Update all occurrences of bookworm-8 and shared-ext-8 to bookworm-9 and shared-ext-9 globally across .gitlab CI test generators, .gitlab/ci-images.yml, and .github workflows.
- Update BOOKWORM_VERSION from 8 to 9 in tooling/bin/build-debug-artifact to ensure local debug builds pull and compile with the new version.
- Export MAKEFLAGS=-j at the top of build-extensions.sh.
- This forces all underlying make invocations triggered by pecl install (including the heavy single-threaded gRPC, MongoDB, and parallel builds) to compile in parallel, drastically reducing build times on multi-core runner environments.
- Remove obsolete CI_REGISTRY, CI_REGISTRY_USER, and CI_REGISTRY_TOKEN from .gitlab/ci-images.yml.
- Remove all complex, dynamic ECR/Docker Hub login shell blocks and AWS CLI installations from CentOS, Alpine, Bookworm, and Windows build jobs.
- Rely entirely on the runner's native, pre-configured credentials for registry.ddbuild.io, significantly simplifying the pipeline configuration.
- Clean up dockerfiles/ci/README.md to document the new automated, secure internal ECR build flow.
- Clarify that project collaborators no longer need to configure Personal Access Tokens (PATs) or credentials when building CI images.
- Document how to trigger the manual sync to the public Docker Hub registry via downstream triggers in the 'ci-publish' stage.
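
The "same tag, different registry" rule behind the publish triggers can be sketched as follows. This is an illustration only: the variable names `SOURCE_IMAGE` and `DESTINATION_IMAGE` are hypothetical and are not the real inputs of the DataDog/public-images pipeline, and the Docker Hub repository name `datadog/dd-trace-ci` is an assumption inferred from the mirror path used elsewhere in this PR.

```python
INTERNAL_REPO = "registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci"  # from the PR text
DOCKER_HUB_REPO = "datadog/dd-trace-ci"  # assumption, inferred from the mirror path

def publish_variables(tag):
    """Build the source/destination pair for mirroring one tag unchanged."""
    if not tag or ":" in tag or "/" in tag:
        raise ValueError(f"not a bare image tag: {tag!r}")
    return {
        # Hypothetical variable names, not the downstream pipeline's real inputs.
        "SOURCE_IMAGE": f"{INTERNAL_REPO}:{tag}",
        "DESTINATION_IMAGE": f"{DOCKER_HUB_REPO}:{tag}",
    }

variables = publish_variables("php-8.5_bookworm")
for name, value in variables.items():
    print(f"{name}={value}")
```

Because the function only needs a tag, a publish step written this way does not depend on a build having just run; it mirrors whatever already exists under that tag in the internal registry.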
realFlowControl force-pushed the florian/ci-images branch 13 times, most recently from 228828a to ba9d133 on June 24, 2026 05:19
The image list (PHP versions and tags) is derived from the docker-compose.yml
+ .env files in each dockerfiles/ci/<os>/ dir (single source of truth).
.gitlab/generate-ci-images.php renders .gitlab/ci-images.yml.tpl, emitting per
Linux OS:
  - <OS> build      : one matrix job over PHP version; 'docker buildx bake
                      --no-cache --pull --push' builds both arches (x-bake
                      platforms from compose) on the amd64 runner's managed ci
                      builder and pushes a multi-arch manifest to
                      registry.ddbuild.io
  - <OS> publish:<v>: manual mirror to Docker Hub via DataDog/public-images,
                      dependency-free (just syncs whatever is in the internal
                      registry)

Static preamble + Windows jobs live in .gitlab/ci-images.static.yml (Windows
is single-arch). The generator runs in generate-templates and is triggered as
a child pipeline via the manual 'ci-images' job; the old .gitlab/ci-images.yml
local include is removed.
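
As a rough illustration of the build step described above, the following Python sketch assembles the `docker buildx bake` command line with the flags named in the commit message (`--no-cache --pull --push`). The target name `php-8.4` is a hypothetical example, and passing the compose file explicitly via `--file` is an assumption; the real job may rely on bake's default file lookup instead.

```python
import shlex

def bake_command(compose_file, targets):
    """Build the argv for a no-cache, fresh-pull, pushing bake of the given targets."""
    if not targets:
        raise ValueError("at least one bake target is required")
    return [
        "docker", "buildx", "bake",
        "--file", compose_file,
        "--no-cache", "--pull", "--push",
        *targets,
    ]

# Hypothetical target name; the compose path follows the dockerfiles/ci/<os>/ layout.
argv = bake_command("dockerfiles/ci/bookworm/docker-compose.yml", ["php-8.4"])
print(shlex.join(argv))
```

Building the command as a list rather than a string avoids shell-quoting problems when it is handed to `subprocess.run`; the target platforms are not passed here because, per the commit message, they come from the compose file's x-bake settings.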
Point the php-8.5 image at the 8.5.8RC1 RC sources (php-8.5_bookworm tracks
the latest 8.5.x). Reverts to a distributions/ tarball once 8.5.8 ships GA
(~2 Jul 2026); just update phpTarGzUrl + phpSha256Hash.

bake delegates the compile to the managed "ci" buildx builder instance, so the
job pod only orchestrates and doesn't need 8 CPU / 16Gi. Master set no
KUBERNETES_* on these jobs either, so fall back to cluster defaults. MAKE_JOBS
(builder compile parallelism) is kept, pinned to a literal 8 since it no longer
derives from KUBERNETES_CPU_LIMIT.
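
This PR touches make parallelism in two places: build-extensions.sh exports MAKEFLAGS=-j, and MAKE_JOBS is pinned to 8. How MAKE_JOBS is consumed is not shown in this conversation, so the sketch below only assumes that a job count ends up as a `-jN` flag in the environment that child `make` processes inherit. In GNU make, a bare `-j` places no limit on concurrent jobs, while `-j8` caps them at eight.

```python
import os

def build_env(base_env, jobs=None):
    """Return a copy of base_env with MAKEFLAGS set for parallel make."""
    if jobs is not None and jobs < 1:
        raise ValueError("jobs must be a positive integer")
    env = dict(base_env)
    # A bare -j puts no limit on concurrent jobs in GNU make; -jN caps it at N.
    env["MAKEFLAGS"] = "-j" if jobs is None else f"-j{jobs}"
    return env

unbounded = build_env(os.environ)       # mirrors "export MAKEFLAGS=-j"
capped = build_env(os.environ, jobs=8)  # assumes MAKE_JOBS=8 maps to -j8
print(unbounded["MAKEFLAGS"], capped["MAKEFLAGS"])
```

An environment built this way can be passed as `env=` to `subprocess.run`, so every nested `make` started by the child process picks up the same setting.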
Drop comments that referenced earlier in-branch states (per-arch + manifest
fuse design, master's KUBERNETES_* settings, the old MAKE_JOBS derivation) and
fix the generator docblock that still said manifest / per-service publish.
Comments now describe only the current state.

Add a short 'How it works' overview (source of truth, generator, buildx-bake
multi-arch build, public-images mirror) and how-to sections for adding/updating
a PHP version and the Docker Hub UNAUTHORIZED publish gotcha, while keeping the
local-build instructions.

The bookworm-9 CI images are rebuilt with 'pecl install parallel' (latest,
>= 1.2.14), so the workaround that reinstalled the fixed parallel over the old
1.2.13 from the image is no longer needed.

The 8.5 tailcall VM crash is fixed in 8.5.8 (now built for bookworm via
8.5.8RC1), so 8.5 no longer needs excluding from the profiler language tests.
The dedicated .php_language_profiler_targets anchor only existed for that
exclusion and is now identical to .all_profiler_targets, so the language-test
job uses that directly.

Windows images aren't built/pushed/mirrored to the internal
registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci (only the Linux images were
migrated), so pulls 404'd (manifest unknown for php-8.4_windows). Revert the
Windows image refs in the tracer/package generators back to the
registry.ddbuild.io/images/mirror/datadog/dd-trace-ci mirror, matching master.

Merge ci-images.static.yml into ci-images.yml.tpl: the literal preamble (stages,
job templates, Windows jobs) now lives at the top of the template, above the PHP
loops that generate the Linux jobs. Literal text in a .tpl is emitted verbatim,
so the Windows PowerShell needs no escaping; the separate file and the
file_get_contents indirection bought nothing. Generated output is unchanged.
realFlowControl (Member, Author) commented:

@codex

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d019ffb8eb


Comment thread dockerfiles/ci/centos/7/php.Dockerfile
Comment thread .gitlab/ci-images.yml.tpl Outdated
Comment thread .gitlab/generate-ci-images.php Outdated
"Bookworm" => "dockerfiles/ci/bookworm",
"CentOS" => "dockerfiles/ci/centos/7",
"Alpine" => "dockerfiles/ci/alpine_compile_extension",
];

Collaborator

We should also add Windows here, to avoid listing every last one of them explicitly. Should be trivial to do?

Fold ci-images.yml.tpl into generate-ci-images.php: the parsing/logic runs at
the top, then a single `?>` drops into the literal pipeline preamble (stages,
job templates, Windows jobs) followed by the PHP loops that emit the Linux
jobs. One file instead of two; generated pipeline is unchanged. Docblock,
emitted header comment and README updated to drop the now-gone .tpl.
…s jobs

- centos/alpine compose now pass CI_REGISTRY_IMAGE as a build arg (anchor on
  base, merged into php services) so PHP images build FROM the freshly-built
  internal base instead of the Docker Hub fallback (matches bookworm BUILD_BASE).
- Windows build + publish jobs are generated from docker-compose.yml; the
  build matrix now includes the windows-base-* services so they exist in the
  internal registry before publish. Linux + Windows publish share .image_publish.

Now that the Windows build jobs push php-*_windows to
registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci, point the tracer/package
generators at the internal registry instead of the Docker Hub mirror, matching
the Linux images. Re-applies what ce65269 reverted (the blocker - Windows not
pushed to the internal registry - is fixed by this PR). The httpbin-windows and
php-request-replayer-2.0-windows helper images stay on the mirror; they aren't
built here.

The Windows shell runner uses the host docker config, whose ecr-login
credsStore fails `list` with MissingRegion during the anonymous
mcr.microsoft.com base-image pull. Give the helper a region so it stops
erroring, leaving the rest of the host config (incl. ambient
registry.ddbuild.io auth) untouched.
…ing from mirror

The Windows shell runners have no working registry.ddbuild.io creds (host
docker config only has the ECR cred helper, which fails with
NoCredentialProviders). Drop the dead-end AWS_REGION workaround and add the
CI Identities id_tokens to the Windows build jobs, the supported auth for
non-K8s runners (onboarding pending, #ci-identities). The assume-role +
docker-config script wiring is deferred until it can be verified post-onboarding.

Windows jobs are all manual, so they don't block the pipeline. Until internal
Windows images exist, revert the tracer/package generators to consume Windows
images from the Docker Hub mirror (ce65269 state) to avoid 404s.
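
The resulting split (Linux images from the internal registry, Windows images from the Docker Hub mirror until internal Windows builds succeed) could be expressed roughly as below. This is a sketch, not the generators' PHP code: detecting Windows by the `_windows` tag suffix and the `windows_internal_ready` switch are assumptions for illustration, and helper images such as httpbin-windows are not covered.

```python
INTERNAL_REPO = "registry.ddbuild.io/ci/dd-trace-php/dd-trace-ci"
MIRROR_REPO = "registry.ddbuild.io/images/mirror/datadog/dd-trace-ci"

def ci_image(tag, windows_internal_ready=False):
    """Pick the repository a CI job should pull the given tag from."""
    # Assumes Windows images follow the php-8.4_windows tag pattern.
    is_windows = tag.endswith("_windows")
    use_mirror = is_windows and not windows_internal_ready
    repo = MIRROR_REPO if use_mirror else INTERNAL_REPO
    return f"{repo}:{tag}"

print(ci_image("php-8.4_windows"))
print(ci_image("php-8.5_bookworm"))
```

Keeping the decision in one place matches the "one-line follow-up" described in the PR note: once a Windows build succeeds against the internal registry, flipping the single switch moves Windows consumers over.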