Support CUDA 13.3#2139
…ain_preview_2026-05-26+0012_non_gen_transfer.patch (NO MANUAL CHANGES)
/ok to test
PR 2139 Initial CI Failure Analysis
Observed against CI run https://github.com/NVIDIA/cuda-python/actions/runs/26469382256?pr=2139 for head commit 84b21cd.
High-level status
Common failure signature
Every failed job inspected failed while importing
This is not a normal test assertion failure. It is an import-time compatibility
Likely root cause
PR 2139 adds
_NVLINK_VERSION_MAPPING = {
    nvml.NvlinkVersion.VERSION_1_0: (1, 0),
    nvml.NvlinkVersion.VERSION_2_0: (2, 0),
    nvml.NvlinkVersion.VERSION_2_2: (2, 2),
    nvml.NvlinkVersion.VERSION_3_0: (3, 0),
    nvml.NvlinkVersion.VERSION_3_1: (3, 1),
    nvml.NvlinkVersion.VERSION_4_0: (4, 0),
    nvml.NvlinkVersion.VERSION_5_0: (5, 0),
    nvml.NvlinkVersion.VERSION_6_0: (6, 0),
}
That works when
Those older bindings do not expose
Scope
The failure reproduces across Linux x86_64, Linux aarch64, WSL, and Windows
Representative failed jobs included:
Interpretation
The 13.3 build path itself appears healthy from the initial CI results. The
Merging the
Suggested fix direction
The likely minimal fix is to build
Run Bandit and CodeQL on ctk-next pushes and grant the scanner jobs the read permissions needed to check out private repository contents and inspect workflow runs.
/ok to test
PR 2139 Windows MCDM OOM Failure Analysis
Failed job:
CI head commit: d727990
Job context
The failing job is:
Important environment/package details from the log:
The job installs published CUDA 13.2 bindings:
and tests a PR-built
Failure signature
Two
Both fail while constructing
This is the Windows MCDM mempool OOM failure class that the earlier hardening
Relationship to PRs 2000, 2084, and 2096
This failure is related to the series:
PR 2096 specifically changed the pinned NUMA tests to construct pinned memory
mr = create_pinned_memory_resource_or_xfail(PinnedMemoryResourceOptions(), xfail_device=device)
That part is present in the PR 2139 CI head. The escape happens one layer deeper.
from cuda.bindings._test_helpers.mempool import xfail_if_mempool_oom
But this CI lane is a
except ModuleNotFoundError:
    # Older cuda.bindings artifacts (for example 12.9.x backports) do not ship
    # this helper yet. In that case, keep the primary failure visible instead of
    # xfail-ing the known Windows MCDM mempool setup issue.
    def xfail_if_mempool_oom(err_or_exc, api_name=None, device=0):
        return
Because that fallback is a no-op, the known Windows MCDM OOM is re-raised and
Conclusion
This is not a different CUDA error from the one addressed by PRs 2000, 2084,
The reason it still escapes is packaging/version skew in the CI matrix:
Suggested fix direction
Move enough of the Windows MCDM OOM xfail logic into the
In other words, when
then call
That preserves the intended behavior for new bindings while making old-bindings
Keep cuda-core tests using published older cuda-bindings wheels from failing when the shared mempool xfail helper is unavailable.
/ok to test
Additional manual local testing with Tegra Thor 13.3.0:
This PR adds CUDA 13.3 support across the CUDA Python packages, updates the release documentation for the associated public releases, and includes a few CI/test compatibility fixes found while validating the branch.
Suggestions for reviewers
There is no need to review the generated commits in detail. They are the first two commits in the stack and are the expected output of the CUDA 13.3 binding generation.
Please focus review on these commits:
See also:
What changed
CUDA 13.3 support
ci/versions.yml so CI builds against CUDA 13.3.0.
Mixed-version compatibility
NvlinkVersion.VERSION_6_0.
Release documentation
Security workflows
Validation notes