Support CUDA 13.3 by rwgk · Pull Request #2139 · NVIDIA/cuda-python

rwgk · 2026-05-26T19:09:08Z

This PR adds CUDA 13.3 support across the CUDA Python packages, updates the release documentation for the associated public releases, and includes a few CI/test compatibility fixes found while validating the branch.

Suggestions for reviewers

There is no need to review the generated commits in detail. They are the first two commits in the stack and are the expected output of the CUDA 13.3 binding generation.

Please focus review on these commits:

commit 1bc62ab — transfer of accumulated non-generated changes from ctk-next (ctk-next PR 329)
commit 84b21cd — Update ci/versions.yml: build with 13.3.0
commit 10f0523 — Guard NVLink 6 mapping for older bindings (see comment below)
commit 89245f0 — Add cuda-pathfinder 1.5.5 release notes
commit 667d7be — Consolidate cuda-bindings 13.3.0 release notes
commit a50542d — Add CUDA Python 13.3.0 release notes
commit 446ea75 — Add 12.9.7 release notes
commit d727990 — Enable security scans on ctk-next (ctk-next PR 331)
commit ba7debd — Xfail MCDM mempool OOM with older bindings (see comment below)

What changed

CUDA 13.3 support

Regenerates and transfers the CUDA 13.3 binding updates for driver, runtime, NVRTC, NVML, NVJitLink, NVFatbin, CUDLA, and related generated docs.
Updates ci/versions.yml so CI builds against CUDA 13.3.0.
Adds CUDA 13.3 CUPTI DLL discovery support in cuda-pathfinder.

Mixed-version compatibility

Guards the cuda-core NVLink 6.0 mapping so cuda-core can still import when tested with older supported cuda-bindings wheels that do not expose NvlinkVersion.VERSION_6_0.
Keeps the known Windows MCDM memory-pool OOM tests marked xfail when cuda-core tests run against older published cuda-bindings wheels that do not ship the shared mempool test helper.

Release documentation

Adds cuda-pathfinder 1.5.5 release notes and docs version metadata.
Consolidates cuda-bindings 13.3.0 release notes, including the unreleased 13.2.1 note content, and removes the standalone 13.2.1 release note.
Adds cuda-python 13.3.0 release notes and docs version metadata.
Adds cuda-bindings and cuda-python 12.9.7 release notes, including docs version metadata for the 12.9.x releases.

Security workflows

Enables Bandit and CodeQL scans for ctk-next pushes.
Grants the scanner workflows the read permissions needed to check out repository contents and inspect workflow runs.

Validation notes

CI is expected to validate the generated wheels and the CUDA 13.3 package matrix.
Follow-up fixes in this branch address the CI failures seen in mixed-version cuda-core test lanes: older cuda-bindings wheels with newer cuda-core, and Windows MCDM memory-pool OOM handling.

copy-pr-bot · 2026-05-26T19:09:12Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2026-05-26T19:09:52Z

/ok to test

rwgk · 2026-05-26T19:59:44Z

PR 2139 Initial CI Failure Analysis

Observed against CI run https://github.com/NVIDIA/cuda-python/actions/runs/26469382256?pr=2139 for head commit 84b21cd.

High-level status

The CUDA 13.3.0 wheel build jobs passed across Linux x86_64, Linux aarch64,
and Windows for the tested Python versions.
Docs and sdist checks passed.
The failures are concentrated in cuda-core test jobs.
At the time of inspection, the CI run still had some queued or in-progress
jobs, but all failed jobs checked shared the same failure signature.

Common failure signature

Every failed job inspected failed while importing cuda.core before the core
test suite could run:

ImportError while loading conftest ...
tests/conftest.py:20: in <module>
    import cuda.core
...
cuda/core/system/_nvlink.pxi:14: in init cuda.core.system._device
    nvml.NvlinkVersion.VERSION_6_0: (6, 0),
E   AttributeError: type object 'NvlinkVersion' has no attribute 'VERSION_6_0'

This is not a normal test assertion failure. It is an import-time compatibility
failure between the cuda-core wheel built from PR 2139 and the cuda-bindings
package installed in older matrix entries.

Likely root cause

PR 2139 adds NvlinkVersion.VERSION_6_0 to the generated NVML bindings and also
adds it to cuda_core/cuda/core/system/_nvlink.pxi:

_NVLINK_VERSION_MAPPING = {
    nvml.NvlinkVersion.VERSION_1_0: (1, 0),
    nvml.NvlinkVersion.VERSION_2_0: (2, 0),
    nvml.NvlinkVersion.VERSION_2_2: (2, 2),
    nvml.NvlinkVersion.VERSION_3_0: (3, 0),
    nvml.NvlinkVersion.VERSION_3_1: (3, 1),
    nvml.NvlinkVersion.VERSION_4_0: (4, 0),
    nvml.NvlinkVersion.VERSION_5_0: (5, 0),
    nvml.NvlinkVersion.VERSION_6_0: (6, 0),
}

That works when cuda-core is tested with the newly generated CUDA 13.3
bindings. However, the CI matrix also tests the new cuda-core wheel against
older bindings:

CUDA 13.2 jobs install published cuda-bindings==13.2.*, which resolved to
cuda-bindings 13.2.0 in the sampled logs.
CUDA 12.9 jobs install the 12.9 backport bindings, for example
cuda-bindings 12.9.7.dev7+ga4e89b567 in the sampled logs.

Those older bindings do not expose nvml.NvlinkVersion.VERSION_6_0, so the
module-level _NVLINK_VERSION_MAPPING construction raises AttributeError at
import time.
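
The import-time nature of this failure can be reproduced without cuda-bindings installed. The following is a minimal sketch, assuming a stand-in enum (`OldNvlinkVersion`, a hypothetical name with illustrative values, not the real `nvml.NvlinkVersion`) that lacks `VERSION_6_0`. It shows that the dict literal itself raises while it is being built, which is why no test could run.

```python
from enum import IntEnum


class OldNvlinkVersion(IntEnum):
    """Hypothetical stand-in for an older NvlinkVersion; values are illustrative."""

    VERSION_1_0 = 1
    VERSION_5_0 = 7


def build_mapping(enum_cls):
    # Same shape as the module-level dict literal: every key expression is
    # evaluated eagerly, so one missing member aborts the whole construction.
    return {
        enum_cls.VERSION_1_0: (1, 0),
        enum_cls.VERSION_5_0: (5, 0),
        enum_cls.VERSION_6_0: (6, 0),
    }


try:
    build_mapping(OldNvlinkVersion)
    error = None
except AttributeError as exc:
    # Captured here for inspection; at module level this would propagate
    # out of the import instead.
    error = exc
```

In the real module the dict is built at module scope, so the `AttributeError` surfaces from `import cuda.core` rather than from a function call.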

Scope

The failure reproduces across Linux x86_64, Linux aarch64, WSL, and Windows
jobs. It is not tied to a specific GPU model, Python version, or OS.

Representative failed jobs included:

Test linux-64 / Python 3.10, CUDA 13.2.1 (wheels), GPU l4
Test linux-64 / Python 3.11, CUDA 12.9.1 (wheels), GPU t4, wsl
Test linux-aarch64 / Python 3.11, CUDA 12.9.1 (wheels), GPU l4
Test win-64 / Python 3.10, CUDA 12.9.1 (wheels), GPU rtx2080 (WDDM)
Test win-64 / Python 3.11, CUDA 13.2.1 (wheels), GPU rtx4090 (WDDM)

Interpretation

The 13.3 build path itself appears healthy from the initial CI results. The
observed failures are compatibility failures in older CUDA-version test lanes.

Merging the ci_test_matrix_13.3.0 update would likely remove many CUDA 13.2
matrix failures by testing the new CUDA 13.3 lane instead. It would not address
the CUDA 12.9 compatibility failures, because those jobs still pair the new
cuda-core with older 12.x bindings.

Suggested fix direction

cuda_core/cuda/core/system/_nvlink.pxi should avoid referencing
nvml.NvlinkVersion.VERSION_6_0 unless the installed bindings expose it.

The likely minimal fix is to build _NVLINK_VERSION_MAPPING in a way that
conditionally adds VERSION_6_0, while preserving the current behavior for
bindings that support it. This keeps cuda-core import-compatible with older
supported cuda-bindings versions while still recognizing NVLink 6.0 when
running with CUDA 13.3 bindings.
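
One way to sketch this direction is shown below. It is an illustration under stated assumptions, not the change made in commit 10f0523: the enums `OldNvlinkVersion` and `NewNvlinkVersion` and the function `build_nvlink_version_mapping` are hypothetical stand-ins so the example runs without cuda-bindings, and the enum values are arbitrary.

```python
from enum import IntEnum

# Hypothetical stand-ins for nvml.NvlinkVersion as exposed by older and
# newer bindings; the numeric values assigned here are arbitrary.
_BASE_NAMES = [
    "VERSION_1_0", "VERSION_2_0", "VERSION_2_2", "VERSION_3_0",
    "VERSION_3_1", "VERSION_4_0", "VERSION_5_0",
]
OldNvlinkVersion = IntEnum("OldNvlinkVersion", _BASE_NAMES)
NewNvlinkVersion = IntEnum("NewNvlinkVersion", _BASE_NAMES + ["VERSION_6_0"])


def build_nvlink_version_mapping(enum_cls):
    # Members that every supported bindings version exposes are referenced
    # directly, so a genuinely broken enum still fails loudly.
    mapping = {
        enum_cls.VERSION_1_0: (1, 0),
        enum_cls.VERSION_2_0: (2, 0),
        enum_cls.VERSION_2_2: (2, 2),
        enum_cls.VERSION_3_0: (3, 0),
        enum_cls.VERSION_3_1: (3, 1),
        enum_cls.VERSION_4_0: (4, 0),
        enum_cls.VERSION_5_0: (5, 0),
    }
    # VERSION_6_0 is added only when the installed bindings expose it.
    if hasattr(enum_cls, "VERSION_6_0"):
        mapping[enum_cls.VERSION_6_0] = (6, 0)
    return mapping


old_mapping = build_nvlink_version_mapping(OldNvlinkVersion)
new_mapping = build_nvlink_version_mapping(NewNvlinkVersion)
```

Guarding only the new member, rather than every member, keeps the existing strict behavior for the versions that all supported bindings are expected to provide.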

rwgk · 2026-05-26T20:21:04Z

/ok to test

rwgk · 2026-05-26T20:59:25Z

PR 2139 Windows MCDM OOM Failure Analysis

Failed job:
https://github.com/NVIDIA/cuda-python/actions/runs/26472915693/job/77953936004?pr=2139

CI head commit: d727990

Job context

The failing job is:

Test win-64 / Python 3.14, CUDA 13.2.1 (wheels), GPU h100 (x2) (MCDM)

Important environment/package details from the log:

BUILD_CUDA_VER: 13.3.0
CUDA_VER: 13.2.1
LOCAL_CTK: 0
BINDINGS_SOURCE: published
TEST_CUDA_MAJOR: 13
TEST_CUDA_MINOR: 2

The job installs published CUDA 13.2 bindings:

Installing bindings (source: published)
Collecting cuda-bindings==13.2.*
Downloading cuda_bindings-13.2.0-cp314-cp314-win_amd64.whl
Successfully installed cuda-bindings-13.2.0

and tests a PR-built cuda-core wheel:

cuda-core 1.0.2.dev44+gd72799005
cuda-bindings 13.2.0
cuda-pathfinder 1.5.5.dev96+gd72799005
cuda-toolkit 13.2.1

Failure signature

Two cuda_core memory tests fail:

tests/test_memory.py::test_pinned_mr_numa_id_default_no_ipc FAILED
tests/test_memory.py::test_pinned_mr_numa_id_explicit FAILED

Both fail while constructing PinnedMemoryResource(...):

cuda\core\_memory\_pinned_memory_resource.pyx:230: in cuda.core._memory._pinned_memory_resource._PMR_init
    MP_init_create_pool(
cuda\core\_memory\_memory_pool.pyx:233: in cuda.core._memory._memory_pool.MP_init_create_pool
    HANDLE_RETURN(get_last_error())
cuda\core\_utils\cuda_utils.pyx:65: in cuda.core._utils.cuda_utils.HANDLE_RETURN
    return _check_driver_error(err)
...
E   cuda.core._utils.cuda_utils.CUDAError: CUDA_ERROR_OUT_OF_MEMORY: The API call failed because it was unable to allocate enough memory or other resources to perform the requested operation.

This is the Windows MCDM mempool OOM failure class that the earlier hardening
PRs were intended to xfail.

Relationship to PRs 2000, 2084, and 2096

This failure is related to the series:

PR 2000: test: xfail Windows MCDM mempool OOM setup failures
PR 2084: Preserve memory pool CUDA errors and harden OOM tests
PR 2096: test: xfail pinned NUMA mempool constructor OOM

PR 2096 specifically changed the pinned NUMA tests to construct pinned memory
resources through create_pinned_memory_resource_or_xfail(...):

mr = create_pinned_memory_resource_or_xfail(PinnedMemoryResourceOptions(), xfail_device=device)

That part is present in the PR 2139 CI head.

The escape happens one layer deeper. create_pinned_memory_resource_or_xfail
delegates to:

from cuda.bindings._test_helpers.mempool import xfail_if_mempool_oom

But this CI lane is a wheels lane using published cuda-bindings==13.2.0.
That published wheel predates PR 2000 and does not provide
cuda.bindings._test_helpers.mempool.

cuda_core/tests/conftest.py therefore uses its fallback:

except ModuleNotFoundError:
    # Older cuda.bindings artifacts (for example 12.9.x backports) do not ship
    # this helper yet. In that case, keep the primary failure visible instead of
    # xfail-ing the known Windows MCDM mempool setup issue.
    def xfail_if_mempool_oom(err_or_exc, api_name=None, device=0):
        return

Because that fallback is a no-op, the known Windows MCDM OOM is re-raised and
the test fails instead of being marked xfail.

Conclusion

This is not a different CUDA error from the one addressed by PRs 2000, 2084,
and 2096. It is the same Windows MCDM mempool OOM class.

The reason it still escapes is packaging/version skew in the CI matrix:

cuda-core comes from PR 2139 and includes the newer test call sites.
cuda-bindings comes from published 13.2.0 and does not include the shared
_test_helpers.mempool helper.
The local fallback in cuda_core/tests/conftest.py intentionally preserves
the primary failure rather than xfail-ing it.

Suggested fix direction

Move enough of the Windows MCDM OOM xfail logic into the
cuda_core/tests/conftest.py fallback so old published cuda-bindings wheels
still get the same expected-failure behavior.

In other words, when cuda.bindings._test_helpers.mempool is unavailable,
cuda_core tests should locally detect:

the error is CUDA_ERROR_OUT_OF_MEMORY; and
the device is running in Windows MCDM mode;

then call pytest.xfail(...).

That preserves the intended behavior for new bindings while making old-bindings
matrix lanes behave consistently.
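
The two checks above can be sketched as a local fallback. This is a rough illustration, not the code added in commit ba7debd. It keeps the `xfail_if_mempool_oom(err_or_exc, api_name=None, device=0)` signature from the existing fallback and adds two keyword-only hooks that are assumptions made for this example: `is_windows_mcdm`, a hypothetical callable that reports whether a device runs in Windows MCDM mode (how that is detected is not shown in this thread), and `xfail`, which defaults to `pytest.xfail`. Matching the error by its name in the string form is also an assumption.

```python
def _is_oom(err_or_exc):
    # Assumption: the error name appears in the string form of the error,
    # as it does in the CUDAError message quoted in the failure signature.
    return "CUDA_ERROR_OUT_OF_MEMORY" in str(err_or_exc)


def xfail_if_mempool_oom(err_or_exc, api_name=None, device=0, *,
                         is_windows_mcdm=None, xfail=None):
    """Mark the test xfail only for OOM errors on a Windows MCDM device."""
    if not _is_oom(err_or_exc):
        return  # a different error: keep the primary failure visible
    if is_windows_mcdm is None or not is_windows_mcdm(device):
        return  # not the known MCDM case: keep the primary failure visible
    if xfail is None:
        import pytest  # imported lazily so the sketch loads without pytest

        xfail = pytest.xfail
    where = api_name or "mempool setup"
    xfail(f"{where}: known Windows MCDM mempool OOM on device {device}")


# Usage with a recording stub in place of pytest.xfail.
calls = []
oom = RuntimeError("CUDA_ERROR_OUT_OF_MEMORY: unable to allocate")
xfail_if_mempool_oom(oom, "PinnedMemoryResource", 0,
                     is_windows_mcdm=lambda dev: True, xfail=calls.append)
```

Any other error, or an OOM on a device that is not in MCDM mode, falls through and returns, so the caller re-raises it exactly as the current no-op fallback does.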

rwgk · 2026-05-26T21:15:46Z

/ok to test

rwgk · 2026-05-27T00:07:54Z

Additional manual local testing on Tegra Thor with CUDA 13.3.0:

tegra-ubuntu:~/wrk/forked/cuda-python $ grep_pytest_summary `nlog`
/home/rgrossekunst/wrk/logs/cuda-python_qa_bindings_linux_2026-05-27+000250+0000_tests_log.txt
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_pathfinder
======================== 978 passed, 1 skipped in 4.02s ========================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
================= 405 passed, 34 skipped, 1 xfailed in 46.73s ==================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
================= 405 passed, 34 skipped, 1 xfailed in 44.98s ==================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
============================== 9 passed in 1.59s ===============================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_core
====== 3367 passed, 158 skipped, 1 xfailed, 1 warning in 91.41s (0:01:31) ======
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_core
============================== 1 passed in 0.27s ===============================

github-actions · 2026-05-27T00:41:22Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

rwgk added 4 commits May 26, 2026 10:09

run_cybind_cython_gen 13.3.0 ../ctk-next (NO MANUAL CHANGES)

7507dd8

run_cybind_native 13.3.0 ../ctk-next (NO MANUAL CHANGES)

17708e0

git apply --index /home/rgrossekunst/stash/squash_merge_into_public_m…

1bc62ab

…ain_preview_2026-05-26+0012_non_gen_transfer.patch (NO MANUAL CHANGES)

Update ci/versions.yml: build with 13.3.0

84b21cd

rwgk added this to the cuda.bindings 13.3.0 & 12.9.7 milestone May 26, 2026

rwgk self-assigned this May 26, 2026

rwgk added the labels P0 (High priority - Must do!), CI/CD (CI/CD infrastructure), cuda.bindings (Everything related to the cuda.bindings module), cuda.core (Everything related to the cuda.core module), and cuda.pathfinder (Everything related to the cuda.pathfinder module) May 26, 2026

rwgk and others added 6 commits May 26, 2026 13:10

Guard NVLink 6 mapping for older bindings

10f0523

Add cuda-pathfinder 1.5.5 release notes

89245f0

Consolidate cuda-bindings 13.3.0 release notes

667d7be

Add CUDA Python 13.3.0 release notes

a50542d

Add 12.9.7 release notes

446ea75

Enable security scans on ctk-next

d727990

Run Bandit and CodeQL on ctk-next pushes and grant the scanner jobs the read permissions needed to checkout private repository contents and inspect workflow runs.

Xfail MCDM mempool OOM with older bindings

ba7debd

Keep cuda-core tests using published older cuda-bindings wheels from failing when the shared mempool xfail helper is unavailable.

rwgk requested review from Andy-Jost, kkraus14 and rparolin May 26, 2026 21:29

rwgk marked this pull request as ready for review May 26, 2026 21:30

rwgk mentioned this pull request May 26, 2026

ci/test_matrix.yml: test with cuda-toolkit 13.3.0 [ON HOLD until the new cuda-toolkit release becomes available] #2140

Draft

Andy-Jost approved these changes May 26, 2026

rwgk merged commit cc50515 into NVIDIA:main May 27, 2026
102 checks passed

rwgk deleted the ctk13030 branch May 27, 2026 00:10

rwgk merged 11 commits into NVIDIA:main from rwgk:ctk13030
