
Support CUDA 13.3 #2139

Merged
rwgk merged 11 commits into NVIDIA:main from rwgk:ctk13030 on May 27, 2026


@rwgk
Contributor

@rwgk rwgk commented May 26, 2026

This PR adds CUDA 13.3 support across the CUDA Python packages, updates the release documentation for the associated public releases, and includes a few CI/test compatibility fixes found while validating the branch.

Suggestions for reviewers

There is no need to review the generated commits in detail. They are the first two commits in the stack and are the expected output of the CUDA 13.3 binding generation.

Please focus review on these commits:

  • commit 1bc62ab — transfer of accumulated non-generated changes from ctk-next (ctk-next PR 329)
  • commit 84b21cd — Update ci/versions.yml: build with 13.3.0
  • commit 10f0523 — Guard NVLink 6 mapping for older bindings (see comment below)
  • commit 89245f0 — Add cuda-pathfinder 1.5.5 release notes
  • commit 667d7be — Consolidate cuda-bindings 13.3.0 release notes
  • commit a50542d — Add CUDA Python 13.3.0 release notes
  • commit 446ea75 — Add 12.9.7 release notes
  • commit d727990 — Enable security scans on ctk-next (ctk-next PR 331)
  • commit ba7debd — Xfail MCDM mempool OOM with older bindings (see comment below)

See also:

What changed

CUDA 13.3 support

  • Regenerates and transfers the CUDA 13.3 binding updates for driver, runtime, NVRTC, NVML, NVJitLink, NVFatbin, CUDLA, and related generated docs.
  • Updates ci/versions.yml so CI builds against CUDA 13.3.0.
  • Adds CUDA 13.3 CUPTI DLL discovery support in cuda-pathfinder.

Mixed-version compatibility

  • Guards the cuda-core NVLink 6.0 mapping so cuda-core can still import when tested with older supported cuda-bindings wheels that do not expose NvlinkVersion.VERSION_6_0.
  • Keeps the known Windows MCDM memory-pool OOM test xfail behavior working when cuda-core tests run against older published cuda-bindings wheels that do not ship the shared mempool test helper.

Release documentation

  • Adds cuda-pathfinder 1.5.5 release notes and docs version metadata.
  • Consolidates cuda-bindings 13.3.0 release notes, including the unreleased 13.2.1 note content, and removes the standalone 13.2.1 release note.
  • Adds cuda-python 13.3.0 release notes and docs version metadata.
  • Adds cuda-bindings and cuda-python 12.9.7 release notes, including docs version metadata for the 12.9.x releases.

Security workflows

  • Enables Bandit and CodeQL scans for ctk-next pushes.
  • Grants the scanner workflows the read permissions needed to check out repository contents and inspect workflow runs.

Validation notes

  • CI is expected to validate the generated wheels and the CUDA 13.3 package matrix.
  • Follow-up fixes in this branch address the CI failures seen in mixed-version cuda-core test lanes: newer cuda-core paired with older cuda-bindings wheels, and Windows MCDM memory-pool OOM handling.

@rwgk rwgk added this to the cuda.bindings 13.3.0 & 12.9.7 milestone May 26, 2026
@rwgk rwgk self-assigned this May 26, 2026
@rwgk rwgk added the P0 (High priority - Must do!), CI/CD (CI/CD infrastructure), cuda.bindings (Everything related to the cuda.bindings module), cuda.core (Everything related to the cuda.core module), and cuda.pathfinder (Everything related to the cuda.pathfinder module) labels May 26, 2026
@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 26, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Contributor Author

rwgk commented May 26, 2026

/ok to test


@rwgk
Contributor Author

rwgk commented May 26, 2026

PR 2139 Initial CI Failure Analysis

Observed against CI run https://github.com/NVIDIA/cuda-python/actions/runs/26469382256?pr=2139 for head commit 84b21cd.

High-level status

  • The CUDA 13.3.0 wheel build jobs passed across Linux x86_64, Linux aarch64,
    and Windows for the tested Python versions.
  • Docs and sdist checks passed.
  • The failures are concentrated in cuda-core test jobs.
  • At the time of inspection, the CI run still had some queued or in-progress
    jobs, but all of the failed jobs that were checked shared the same failure signature.

Common failure signature

Every inspected job failure occurred while importing cuda.core, before the core
test suite could run:

ImportError while loading conftest ...
tests/conftest.py:20: in <module>
    import cuda.core
...
cuda/core/system/_nvlink.pxi:14: in init cuda.core.system._device
    nvml.NvlinkVersion.VERSION_6_0: (6, 0),
E   AttributeError: type object 'NvlinkVersion' has no attribute 'VERSION_6_0'

This is not a normal test assertion failure. It is an import-time compatibility
failure between the cuda-core wheel built from PR 2139 and the cuda-bindings
package installed in older matrix entries.

Likely root cause

PR 2139 adds NvlinkVersion.VERSION_6_0 to the generated NVML bindings and also
adds it to cuda_core/cuda/core/system/_nvlink.pxi:

_NVLINK_VERSION_MAPPING = {
    nvml.NvlinkVersion.VERSION_1_0: (1, 0),
    nvml.NvlinkVersion.VERSION_2_0: (2, 0),
    nvml.NvlinkVersion.VERSION_2_2: (2, 2),
    nvml.NvlinkVersion.VERSION_3_0: (3, 0),
    nvml.NvlinkVersion.VERSION_3_1: (3, 1),
    nvml.NvlinkVersion.VERSION_4_0: (4, 0),
    nvml.NvlinkVersion.VERSION_5_0: (5, 0),
    nvml.NvlinkVersion.VERSION_6_0: (6, 0),
}

That works when cuda-core is tested with the newly generated CUDA 13.3
bindings. However, the CI matrix also tests the new cuda-core wheel against
older bindings:

  • CUDA 13.2 jobs install published cuda-bindings==13.2.*, which resolved to
    cuda-bindings 13.2.0 in the sampled logs.
  • CUDA 12.9 jobs install the 12.9 backport bindings, for example
    cuda-bindings 12.9.7.dev7+ga4e89b567 in the sampled logs.

Those older bindings do not expose nvml.NvlinkVersion.VERSION_6_0, so the
module-level _NVLINK_VERSION_MAPPING construction raises AttributeError at
import time.

Scope

The failure reproduces across Linux x86_64, Linux aarch64, WSL, and Windows
jobs. It is not tied to a specific GPU model, Python version, or OS.

Representative failed jobs included:

  • Test linux-64 / Python 3.10, CUDA 13.2.1 (wheels), GPU l4
  • Test linux-64 / Python 3.11, CUDA 12.9.1 (wheels), GPU t4, wsl
  • Test linux-aarch64 / Python 3.11, CUDA 12.9.1 (wheels), GPU l4
  • Test win-64 / Python 3.10, CUDA 12.9.1 (wheels), GPU rtx2080 (WDDM)
  • Test win-64 / Python 3.11, CUDA 13.2.1 (wheels), GPU rtx4090 (WDDM)

Interpretation

The 13.3 build path itself appears healthy from the initial CI results. The
observed failures are compatibility failures in older CUDA-version test lanes.

Merging the ci_test_matrix_13.3.0 update would likely remove many CUDA 13.2
matrix failures by testing the new CUDA 13.3 lane instead. It would not address
the CUDA 12.9 compatibility failures, because those jobs still pair the new
cuda-core with older 12.x bindings.

Suggested fix direction

cuda_core/cuda/core/system/_nvlink.pxi should avoid referencing
nvml.NvlinkVersion.VERSION_6_0 unless the installed bindings expose it.

The likely minimal fix is to build _NVLINK_VERSION_MAPPING in a way that
conditionally adds VERSION_6_0, while preserving the current behavior for
bindings that support it. This keeps cuda-core import-compatible with older
supported cuda-bindings versions while still recognizing NVLink 6.0 when
running with CUDA 13.3 bindings.
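A minimal sketch of that conditional construction, written as plain Python (the real file is a Cython .pxi). It assumes only that the bindings expose an NvlinkVersion enum whose members are named as in the mapping above. The helper name build_nvlink_version_mapping and the two stand-in enums are hypothetical, used so the sketch runs without cuda-bindings installed; this is not necessarily what commit 10f0523 does.

```python
from enum import IntEnum


def build_nvlink_version_mapping(nvlink_version):
    """Build the NvlinkVersion -> (major, minor) mapping.

    Versions 1.0 through 5.0 are referenced directly, as before, so a missing
    member there still fails loudly at import time. VERSION_6_0 is added only
    when the installed bindings define it (CUDA 13.3 bindings and newer).
    """
    mapping = {
        nvlink_version.VERSION_1_0: (1, 0),
        nvlink_version.VERSION_2_0: (2, 0),
        nvlink_version.VERSION_2_2: (2, 2),
        nvlink_version.VERSION_3_0: (3, 0),
        nvlink_version.VERSION_3_1: (3, 1),
        nvlink_version.VERSION_4_0: (4, 0),
        nvlink_version.VERSION_5_0: (5, 0),
    }
    # getattr with a default avoids the AttributeError seen in CI.
    version_6_0 = getattr(nvlink_version, "VERSION_6_0", None)
    if version_6_0 is not None:  # "is not None": an IntEnum member may be falsy
        mapping[version_6_0] = (6, 0)
    return mapping


# Hypothetical stand-ins for nvml.NvlinkVersion from older and newer bindings.
# The numeric values are illustrative only.
class OldBindingsNvlinkVersion(IntEnum):
    VERSION_1_0 = 1
    VERSION_2_0 = 2
    VERSION_2_2 = 3
    VERSION_3_0 = 4
    VERSION_3_1 = 5
    VERSION_4_0 = 6
    VERSION_5_0 = 7


class NewBindingsNvlinkVersion(IntEnum):
    VERSION_1_0 = 1
    VERSION_2_0 = 2
    VERSION_2_2 = 3
    VERSION_3_0 = 4
    VERSION_3_1 = 5
    VERSION_4_0 = 6
    VERSION_5_0 = 7
    VERSION_6_0 = 8


print(len(build_nvlink_version_mapping(OldBindingsNvlinkVersion)))  # 7
print(len(build_nvlink_version_mapping(NewBindingsNvlinkVersion)))  # 8
```

In the .pxi, the module-level assignment would then become _NVLINK_VERSION_MAPPING = build_nvlink_version_mapping(nvml.NvlinkVersion), so importing cuda.core no longer depends on the bindings version.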

rwgk and others added 6 commits May 26, 2026 13:10
@rwgk
Contributor Author

rwgk commented May 26, 2026

/ok to test

@rwgk
Contributor Author

rwgk commented May 26, 2026

PR 2139 Windows MCDM OOM Failure Analysis

Failed job:
https://github.com/NVIDIA/cuda-python/actions/runs/26472915693/job/77953936004?pr=2139

CI head commit: d727990

Job context

The failing job is:

Test win-64 / Python 3.14, CUDA 13.2.1 (wheels), GPU h100 (x2) (MCDM)

Important environment/package details from the log:

BUILD_CUDA_VER: 13.3.0
CUDA_VER: 13.2.1
LOCAL_CTK: 0
BINDINGS_SOURCE: published
TEST_CUDA_MAJOR: 13
TEST_CUDA_MINOR: 2

The job installs published CUDA 13.2 bindings:

Installing bindings (source: published)
Collecting cuda-bindings==13.2.*
Downloading cuda_bindings-13.2.0-cp314-cp314-win_amd64.whl
Successfully installed cuda-bindings-13.2.0

and tests a PR-built cuda-core wheel:

cuda-core 1.0.2.dev44+gd72799005
cuda-bindings 13.2.0
cuda-pathfinder 1.5.5.dev96+gd72799005
cuda-toolkit 13.2.1
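The skew can be read directly off the job environment shown above. A minimal sketch, assuming BUILD_CUDA_VER is the toolkit the PR wheels were built with and TEST_CUDA_MAJOR/TEST_CUDA_MINOR identify the cuda-bindings series installed for testing. The function is_mixed_version_lane is hypothetical and not part of the CI scripts.

```python
from typing import Mapping


def is_mixed_version_lane(env: Mapping[str, str]) -> bool:
    """Return True when PR-built wheels are tested against bindings from an
    older CUDA major.minor series than the one they were built with."""
    build_major, build_minor = (
        int(part) for part in env["BUILD_CUDA_VER"].split(".")[:2]
    )
    test_series = (int(env["TEST_CUDA_MAJOR"]), int(env["TEST_CUDA_MINOR"]))
    # Tuple comparison orders (12, 9) < (13, 2) < (13, 3) as expected.
    return test_series < (build_major, build_minor)


# Values copied from the failing MCDM job log above.
mcdm_job_env = {
    "BUILD_CUDA_VER": "13.3.0",
    "CUDA_VER": "13.2.1",
    "LOCAL_CTK": "0",
    "BINDINGS_SOURCE": "published",
    "TEST_CUDA_MAJOR": "13",
    "TEST_CUDA_MINOR": "2",
}
print(is_mixed_version_lane(mcdm_job_env))  # True
```

Under these assumptions, both the CUDA 13.2 and CUDA 12.9 lanes come out as mixed, which matches where the import-time and OOM failures appeared.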

Failure signature

Two cuda_core memory tests fail:

tests/test_memory.py::test_pinned_mr_numa_id_default_no_ipc FAILED
tests/test_memory.py::test_pinned_mr_numa_id_explicit FAILED

Both fail while constructing PinnedMemoryResource(...):

cuda\core\_memory\_pinned_memory_resource.pyx:230: in cuda.core._memory._pinned_memory_resource._PMR_init
    MP_init_create_pool(
cuda\core\_memory\_memory_pool.pyx:233: in cuda.core._memory._memory_pool.MP_init_create_pool
    HANDLE_RETURN(get_last_error())
cuda\core\_utils\cuda_utils.pyx:65: in cuda.core._utils.cuda_utils.HANDLE_RETURN
    return _check_driver_error(err)
...
E   cuda.core._utils.cuda_utils.CUDAError: CUDA_ERROR_OUT_OF_MEMORY: The API call failed because it was unable to allocate enough memory or other resources to perform the requested operation.

This is the Windows MCDM mempool OOM failure class that the earlier hardening
PRs were intended to xfail.

Relationship to PRs 2000, 2084, and 2096

This failure is related to the series:

  • PR 2000: test: xfail Windows MCDM mempool OOM setup failures
  • PR 2084: Preserve memory pool CUDA errors and harden OOM tests
  • PR 2096: test: xfail pinned NUMA mempool constructor OOM

PR 2096 specifically changed the pinned NUMA tests to construct pinned memory
resources through create_pinned_memory_resource_or_xfail(...):

mr = create_pinned_memory_resource_or_xfail(PinnedMemoryResourceOptions(), xfail_device=device)

That part is present in the PR 2139 CI head.

The escape happens one layer deeper. create_pinned_memory_resource_or_xfail
delegates to:

from cuda.bindings._test_helpers.mempool import xfail_if_mempool_oom

But this CI lane is a wheels lane using published cuda-bindings==13.2.0.
That published wheel predates PR 2000 and does not provide
cuda.bindings._test_helpers.mempool.

cuda_core/tests/conftest.py therefore uses its fallback:

try:
    from cuda.bindings._test_helpers.mempool import xfail_if_mempool_oom
except ModuleNotFoundError:
    # Older cuda.bindings artifacts (for example 12.9.x backports) do not ship
    # this helper yet. In that case, keep the primary failure visible instead of
    # xfail-ing the known Windows MCDM mempool setup issue.
    def xfail_if_mempool_oom(err_or_exc, api_name=None, device=0):
        return

Because that fallback is a no-op, the known Windows MCDM OOM is re-raised and
the test fails instead of being marked xfail.

Conclusion

This is not a different CUDA error from the one addressed by PRs 2000, 2084,
and 2096. It is the same Windows MCDM mempool OOM class.

The reason it still escapes is packaging/version skew in the CI matrix:

  • cuda-core comes from PR 2139 and includes the newer test call sites.
  • cuda-bindings comes from published 13.2.0 and does not include the shared
    _test_helpers.mempool helper.
  • The local fallback in cuda_core/tests/conftest.py intentionally preserves
    the primary failure rather than xfail-ing it.

Suggested fix direction

Move enough of the Windows MCDM OOM xfail logic into the
cuda_core/tests/conftest.py fallback so old published cuda-bindings wheels
still get the same expected-failure behavior.

In other words, when cuda.bindings._test_helpers.mempool is unavailable,
cuda_core tests should locally detect:

  • the error is CUDA_ERROR_OUT_OF_MEMORY; and
  • the device is running in Windows MCDM mode;

then call pytest.xfail(...).

That preserves the intended behavior for new bindings while making old-bindings
matrix lanes behave consistently.
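A minimal sketch of such a fallback. This thread does not show how the shared helper detects MCDM mode, so the sketch takes that check as a caller-supplied is_mcdm(device) callable (hypothetical). The names make_fallback_xfail_if_mempool_oom and should_xfail_mempool_oom are likewise illustrative. OOM detection assumes the error name appears in the exception message, as it does in the CUDAError text above, or that an enum-like error code exposes .name.

```python
import sys
from typing import Callable

OOM_ERROR_NAME = "CUDA_ERROR_OUT_OF_MEMORY"


def _is_oom(err_or_exc) -> bool:
    """True if err_or_exc is, or carries, CUDA_ERROR_OUT_OF_MEMORY.

    Exceptions are matched on their message text; enum-like error codes are
    matched on their ``name`` attribute.
    """
    if isinstance(err_or_exc, BaseException):
        return OOM_ERROR_NAME in str(err_or_exc)
    return getattr(err_or_exc, "name", None) == OOM_ERROR_NAME


def should_xfail_mempool_oom(
    err_or_exc, device, is_mcdm: Callable[..., bool], platform: str = sys.platform
) -> bool:
    # Platform and error are checked first, so is_mcdm is only queried when it matters.
    return platform == "win32" and _is_oom(err_or_exc) and is_mcdm(device)


def make_fallback_xfail_if_mempool_oom(is_mcdm: Callable[..., bool]):
    """Return a stand-in for xfail_if_mempool_oom when the shared helper is missing."""

    def xfail_if_mempool_oom(err_or_exc, api_name=None, device=0):
        if should_xfail_mempool_oom(err_or_exc, device, is_mcdm):
            import pytest  # deferred: only needed when actually xfail-ing

            pytest.xfail(
                f"Windows MCDM mempool OOM ({api_name or 'memory pool setup'})"
            )
        # Otherwise return None so the caller re-raises the original error.

    return xfail_if_mempool_oom
```

In conftest.py, the except ModuleNotFoundError branch could then assign xfail_if_mempool_oom = make_fallback_xfail_if_mempool_oom(...) with a local MCDM check instead of defining a no-op. Non-OOM errors and non-MCDM devices would still surface as real failures.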

Keep cuda-core tests using published older cuda-bindings wheels from failing when the shared mempool xfail helper is unavailable.
@rwgk
Contributor Author

rwgk commented May 26, 2026

/ok to test

@rwgk
Contributor Author

rwgk commented May 27, 2026

Additional manual local testing with Tegra Thor 13.3.0:

tegra-ubuntu:~/wrk/forked/cuda-python $ grep_pytest_summary `nlog`
/home/rgrossekunst/wrk/logs/cuda-python_qa_bindings_linux_2026-05-27+000250+0000_tests_log.txt
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_pathfinder
======================== 978 passed, 1 skipped in 4.02s ========================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
================= 405 passed, 34 skipped, 1 xfailed in 46.73s ==================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
================= 405 passed, 34 skipped, 1 xfailed in 44.98s ==================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_bindings
============================== 9 passed in 1.59s ===============================
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_core
====== 3367 passed, 158 skipped, 1 xfailed, 1 warning in 91.41s (0:01:31) ======
rootdir: /home/rgrossekunst/wrk/forked/cuda-python/cuda_core
============================== 1 passed in 0.27s ===============================
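The summary lines above follow pytest's standard terminal format. The grep_pytest_summary helper itself is not shown here. As a sketch only, a hypothetical parse_pytest_summary can turn one such line into counts, which makes runs easier to compare across machines or branches.

```python
import re

# Outcome words that can appear in a pytest terminal summary line.
_OUTCOME_RE = re.compile(
    r"(\d+) (passed|failed|skipped|xfailed|xpassed|deselected|errors?|warnings?)\b"
)
_DURATION_RE = re.compile(r" in ([\d.]+)s\b")


def parse_pytest_summary(line: str) -> dict:
    """Parse counts and duration from a '=== 405 passed, ... in 46.73s ===' line."""
    result = {outcome: int(count) for count, outcome in _OUTCOME_RE.findall(line)}
    duration = _DURATION_RE.search(line)
    if duration:
        result["seconds"] = float(duration.group(1))
    return result


print(parse_pytest_summary(
    "================= 405 passed, 34 skipped, 1 xfailed in 46.73s =================="
))  # {'passed': 405, 'skipped': 34, 'xfailed': 1, 'seconds': 46.73}
```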

@rwgk rwgk merged commit cc50515 into NVIDIA:main May 27, 2026
102 checks passed
@rwgk rwgk deleted the ctk13030 branch May 27, 2026 00:10
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.



3 participants