Pull requests: NVIDIA/TransformerEngine
Pull requests list
Add license to framework sdist builds
2.16.0
#3002
opened May 17, 2026 by
ksivaman
Member
6 of 13 tasks
Optimize function that loads pointers on GPU
cpu_overhead
refactor
#3001
opened May 16, 2026 by
timmoon10
Collaborator
8 of 14 tasks
TritonKernelCall: CUDA graph compatibility
#3000
opened May 15, 2026 by
tdophung
Collaborator
6 of 13 tasks
CP Tests batching using subprocess worker pool
#2993
opened May 14, 2026 by
sudhakarsingh27
Collaborator
8 of 9 tasks
refactor(distributed): deduplicate TE module class lookups with caching
#2992
opened May 14, 2026 by
muutot
Contributor
3 of 13 tasks
Improve TE Group MLP CPU Overhead
cpu_overhead
#2991
opened May 14, 2026 by
zhongbozhu
Collaborator
13 tasks
[JAX] Support for cuDNN-backed flex attention
2.16.0
#2985
opened May 13, 2026 by
vcherepanov-nv
Collaborator
4 of 13 tasks
[PyTorch] Support for cuDNN-backed flex attention
2.16.0
#2984
opened May 13, 2026 by
vcherepanov-nv
Collaborator
4 of 13 tasks
Split grouped quantize/activations and dbias for faster compilation on multicore machines
#2983
opened May 12, 2026 by
ptrendx
Member
1 of 6 tasks
GGEMM+srelu kernels for MxFP8 Nemotron
#2981
opened May 12, 2026 by
sraman-rgb
8 of 13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2978
opened May 12, 2026 by
kainzhong
Collaborator
9 of 13 tasks
[JAX] Improve JAX tutorial documentation
2.16.0
#2976
opened May 11, 2026 by
jberchtold-nvidia
Collaborator
8 of 13 tasks
[Pytorch][Bug] DCP Checkpoint Loading Fixes for FSDP2 with QuantizedModelInit
2.16.0
#2974
opened May 11, 2026 by
vthumbe1503
Collaborator
13 tasks
Implement 4over6 NVFP4 recipe
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
fp4
#2972
opened May 9, 2026 by
zianglih
Contributor
8 of 13 tasks
[common] Grouped gemm update - nvfp4 for blackwell and fp8 blockwise hopper
2.16.0
#2971
opened May 8, 2026 by
pggPL
Collaborator
9 of 13 tasks
[JAX] [PyT] Tighten SWA checks in DPA, MHA and other APIs before passing onto cuDNN fused attn & unfused attn backends
attention
enhancement
New feature or request
jax
#2970
opened May 8, 2026 by
KshitijLakhani
Collaborator
7 of 13 tasks
[PyTorch] Batch CP attention tests in single torchrun to amortize NCC…
2.16.0
#2965
opened May 6, 2026 by
sudhakarsingh27
Collaborator
7 of 8 tasks
[All] Refactor nvte_get_fused_attn_backend with cudnn-frontend calls
#2964
opened May 6, 2026 by
cyanguwa
Collaborator
10 of 13 tasks
MXFP8 + FSDP2 checkpoint resume crashes in reset_sharded_param - add mxfp8 recipe to fully shard
#2951
opened May 1, 2026 by
savitha-eng