Doc: add AICore kernel programming guide + warn against CCE topology intrinsics #962

Merged: ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:docs/spmd-id-accessors-warning on Jun 1, 2026

hw-native-sys-bot (Collaborator) commented Jun 1, 2026

Summary

  • simpler's tensormap_and_ringbuffer runtime maintains its own SPMD execution context (block_idx, block_num, sub_block_id) in LocalContext / GlobalContext structures appended to the kernel args[] tail. The matching accessors are get_block_idx(args), get_block_num(args), get_sub_block_id(args) in src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/common/intrinsic.h.
  • The CCE built-in topology intrinsics get_subblockid(), get_block_idx(), get_block_num() (from kernel_operator.h / tikcfw) read AICore hardware registers that the simpler runtime does NOT program. A kernel that uses them silently gets stale values — most notably get_subblockid() returns 0 for both AIV0 and AIV1 of every MIX cluster, so AIV1 redoes AIV0's work and AIV1's share of the output is never written.
  • This was the partial-zero failure mode in issue #900 ("[Bug] A2A3 spmd_paged_attention_highperf hardware run times out or produces partial zero output while a2a3sim passes") and PR #899 ("High performance Paged Attention A2A3 ST Test"), both concerning spmd_paged_attention_highperf: a kernel ported from native CANN compiled clean, ran without error, and produced half-zero output on a2a3 hardware. Resolved kernel-side in PR #899.
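
The failure mode described above can be reproduced on the host with a small model. This is only a sketch under stated assumptions: `LaneContext`, `stale_register_sub_block_id`, `run_lane`, and `run_cluster` are hypothetical names invented for illustration and are not part of the runtime or of CCE. The model assumes each of the two AIV lanes in a MIX cluster writes one half of the output, selected by its sub-block ID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model; none of these names come from the runtime.
struct LaneContext {
    std::int32_t sub_block_id;  // ID the runtime would pass to this lane via args[]
};

// Stand-in for a hardware register that nobody programs: reads 0 on every lane.
inline std::int32_t stale_register_sub_block_id() { return 0; }

// One AIV lane writes its half of `out` (assumes out.size() is even).
inline void run_lane(std::vector<int>& out, std::int32_t sub_block_id) {
    assert(sub_block_id == 0 || sub_block_id == 1);
    const std::size_t half = out.size() / 2;
    const std::size_t begin = static_cast<std::size_t>(sub_block_id) * half;
    for (std::size_t i = begin; i < begin + half; ++i) out[i] = 1;
}

// Runs both lanes of one cluster. With use_args == false, both lanes read the
// stale register and therefore both write the first half.
inline std::vector<int> run_cluster(std::size_t n, bool use_args) {
    std::vector<int> out(n, 0);
    for (std::int32_t lane = 0; lane < 2; ++lane) {
        const LaneContext ctx{lane};
        const std::int32_t id =
            use_args ? ctx.sub_block_id : stale_register_sub_block_id();
        run_lane(out, id);
    }
    return out;
}
```

In this model, `run_cluster(8, true)` fills all eight elements, while `run_cluster(8, false)` leaves the second half at zero with no error raised, which mirrors the half-zero output reported on hardware.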

Adds three layers of documentation so the next port catches it before the same debugging round-trip:

  • New: docs/aicore-kernel-programming.md, the kernel-author contract for this runtime. Reference doc with §1 args layout, §2 SPMD execution context (logical vs physical block_dim), §3 the CCE-intrinsics warning + porting checklist + worked example (PR #899, "High performance Paged Attention A2A3 ST Test"), §4 related links. Structured to grow into a fuller programming guide (tensor args, FFTS sync, tiling) as future work lands.
  • docs/developer-guide.md — one-line link from the existing Example / Test Layout section so the kernel-author contract is discoverable from "kernels/".
  • src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/common/intrinsic.h — IMPORTANT block at the top with the gotcha inline (grep-discoverable) plus a back-link to the new guide.
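
One item a porting check could automate is spotting bare calls to the CCE intrinsics in a kernel source. The sketch below is an assumption about how such a check might look, not something this PR ships: `find_bare_topology_calls` is a hypothetical helper, and the pattern only flags calls with empty parentheses, so it is a rough text scan that would also match occurrences inside comments or strings.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based line numbers in `source` that call
// get_subblockid(), get_block_idx(), or get_block_num() with no arguments.
// The args-based accessors, e.g. get_block_idx(args), are not reported.
inline std::vector<int> find_bare_topology_calls(const std::string& source) {
    static const std::regex bare(
        R"re(\bget_(subblockid|block_idx|block_num)\s*\(\s*\))re");
    std::vector<int> hits;
    std::istringstream in(source);
    std::string line;
    int line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        if (std::regex_search(line, bare)) hits.push_back(line_no);
    }
    return hits;
}
```

Any reported line is a candidate for replacement with the corresponding `(args)` accessor; an empty result does not prove a kernel is clean, since macros or wrappers can hide the calls.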

Doc-only — no code or API changes.

Test plan

  • pre-commit (check-headers / clang-format / cpplint / clang-tidy / markdownlint-cli2) — all green
  • Both intrinsic.h files unchanged structurally; warning is a pure C block comment
  • Verified referenced example dirs exist: tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/ and spmd_multiblock_mix/

🤖 Generated with Claude Code

coderabbitai Bot commented Jun 1, 2026

📝 Walkthrough

This PR adds parallel documentation warnings to a2a3 and a5 runtime topology intrinsic headers, advising against mixing simpler's intrinsics with CCE built-in topology intrinsics and directing users to use (args) accessor variants instead.

Changes

Topology Intrinsics Compatibility Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Topology intrinsics compatibility documentation: src/a2a3/runtime/tensormap_and_ringbuffer/common/intrinsic.h, src/a5/runtime/tensormap_and_ringbuffer/common/intrinsic.h | IMPORTANT documentation blocks added to both a2a3 and a5 runtime headers warning against mixing CCE built-in topology intrinsics (get_subblockid(), get_block_idx(), get_block_num()) with simpler's topology accessors. Documents a stale AICore sub-block register failure mode under MIX dispatch and directs users to use (args) variants (get_sub_block_id(args), get_block_idx(args), get_block_num(args)) so values reflect logical block dimensions rather than physical core configuration. |
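
To make the "(args) variants" idea concrete, here is a minimal sketch of accessors that read a context reached through the args[] tail and use it to split work by logical block. Everything in it is an assumption for illustration: `SketchContext`, `kSketchContextSlot`, the `sketch_*` accessors, and `rows_for_block` are hypothetical, and the real LocalContext / GlobalContext layout and accessor signatures in intrinsic.h may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's context structures.
struct SketchContext {
    std::int32_t block_idx;  // logical index of this block
    std::int32_t block_num;  // logical number of blocks in the launch
};

// Assumption: one fixed args[] tail slot stores a pointer to the context.
constexpr int kSketchContextSlot = 7;

inline const SketchContext* sketch_context(const std::int64_t* args) {
    return reinterpret_cast<const SketchContext*>(
        static_cast<std::uintptr_t>(args[kSketchContextSlot]));
}

inline std::int32_t sketch_get_block_idx(const std::int64_t* args) {
    return sketch_context(args)->block_idx;
}

inline std::int32_t sketch_get_block_num(const std::int64_t* args) {
    return sketch_context(args)->block_num;
}

struct RowRange {
    std::int64_t begin;  // inclusive
    std::int64_t end;    // exclusive
};

// Splits total_rows across the logical blocks using ceiling division.
inline RowRange rows_for_block(const std::int64_t* args, std::int64_t total_rows) {
    const std::int64_t idx = sketch_get_block_idx(args);
    const std::int64_t num = sketch_get_block_num(args);
    assert(num > 0 && idx >= 0 && idx < num);
    const std::int64_t per = (total_rows + num - 1) / num;
    const std::int64_t begin = std::min(total_rows, idx * per);
    const std::int64_t end = std::min(total_rows, begin + per);
    return {begin, end};
}
```

Because the split depends only on values carried in the context, the partition follows the logical block count chosen at launch rather than whatever a hardware register happens to hold.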

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 Two warnings whispered in parallel arrays,
Where topology intrinsics lead stray,
Use (args), dear kernels, not CCE's bare call,
For simpler's logic trumps hardware's hall! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately captures the main change: adding documentation and a warning against mixing CCE topology intrinsics with the tensormap_and_ringbuffer runtime's SPMD accessors. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, providing context, the problem statement, the solution, and test verification. |

gemini-code-assist Bot left a comment

Code Review

This pull request adds detailed documentation comments to intrinsic.h in both the a2a3 and a5 runtimes. The comments warn against mixing CCE built-in topology intrinsics (get_subblockid(), get_block_idx(), get_block_num()) with the custom runtime intrinsics, explaining the hardware register behavior and potential failure modes, and advising the use of the (args) variants instead. There are no review comments to address, and I have no further feedback to provide.

…y intrinsics

simpler's tensormap_and_ringbuffer runtime maintains its own SPMD
context (block_idx, block_num, sub_block_id) in LocalContext /
GlobalContext structures referenced from the kernel args[] tail.
The CCE built-in intrinsics get_subblockid(), get_block_idx(),
get_block_num() (declared in kernel_operator.h / tikcfw) read
AICore hardware registers that the runtime does NOT program, so a
kernel that mixes them with the args-based accessors gets stale
values — most importantly get_subblockid() returns 0 for BOTH
AIV0 and AIV1 of every MIX cluster, causing AIV1 to silently redo
AIV0's work and leaving AIV1's share of the output unwritten.

This was the partial-zero failure mode in issue hw-native-sys#900 / PR hw-native-sys#899
spmd_paged_attention_highperf: a kernel ported from native CANN
compiled clean, ran without error, produced half-zero output on
a2a3 hardware. Resolved kernel-side in PR hw-native-sys#899 by routing all three
IDs through the args-based accessors.

Add three layers of documentation so the next port catches this
before the same debugging round-trip:

- `docs/aicore-kernel-programming.md` (new) — the kernel-author
  contract for this runtime: SPMD execution context, accessor
  functions, logical-vs-physical block_dim, the CCE-intrinsics
  warning with porting checklist, and pointers to working
  examples. Structured so future kernel-authoring topics (tensor
  args, FFTS sync, tiling) can grow under it.
- `docs/developer-guide.md` — link from the existing Example /
  Test Layout section so someone reading the dev guide finds the
  kernel-author contract from "kernels/" without searching.
- `src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/common/intrinsic.h`
  — IMPORTANT block at the top of the file with the gotcha
  inline (for the grep-and-read discovery path) and a back-link
  to the programming guide for the full context.

Doc-only — no code or API changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hw-native-sys-bot force-pushed the docs/spmd-id-accessors-warning branch from dc3c53c to bc87757 on June 1, 2026 09:24
hw-native-sys-bot changed the title from "Doc: warn against CCE topology intrinsics in tensormap_and_ringbuffer SPMD kernels" to "Doc: add AICore kernel programming guide + warn against CCE topology intrinsics" on Jun 1, 2026
ChaoWao merged commit c2f2350 into hw-native-sys:main on Jun 1, 2026
15 of 16 checks passed
ChaoWao deleted the docs/spmd-id-accessors-warning branch on June 1, 2026 09:26