Skip to content

Fix: detect unresolved AICore .text relocations and fail loud (#900)#958

Merged
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
hw-native-sys-bot:fix/issue-900-aicore-elf-relocations
Jun 1, 2026
Merged

Fix: detect unresolved AICore .text relocations and fail loud (#900)#958
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
hw-native-sys-bot:fix/issue-900-aicore-elf-relocations

Conversation

@hw-native-sys-bot
Copy link
Copy Markdown
Collaborator

Summary

Scope (and what's out)

Test plan

  • tests/ut/py/test_elf_parser.py — 7 cases: byte-identical regression guard for the common path, three rejection cases for .text._Zfoo (raises with issue link, names the offending section, mentions the always_inline workaround), two for .rela.text (raises with issue link, reports entry count), and the missing-.text case.
  • Local pre-commit: ruff check / ruff format / pyright / check-headers all green.
  • Real-world .o round-trip: compiled tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/aic/paged_attention_highperf.cpp (both AIC and AIV variants) via kernel_compiler, confirmed extract_text_section returns byte-identical payload (97156 / 64804 bytes).
  • Future: when the loader gains relocation-application support, drop one of the PA kernel's always_inline annotations and verify the new error fires with actionable message.

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 283748ed-48ae-4ce3-8609-f5b5efad9ea5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

ELF loader validation is strengthened to reject .text sections with out-of-line code (.text.*) or unresolved relocations (.rela.text*), instead of silently returning incomplete bytes. New error reporting includes diagnostic details and workaround guidance. Comprehensive test coverage validates success and all rejection scenarios using synthetic ELF64 buffers.

Changes

ELF .text Extraction Validation

Layer / File(s) Summary
Documentation, constants, and function contracts
simpler_setup/elf_parser.py, tests/ut/py/test_elf_parser.py
Module docstring expanded to document .text extraction semantics, rejection of out-of-line code and relocations, and failure modes. ELF64 constants (section-header size, relocation-entry size, section types) introduced. Function docstring refined to explicitly mention out-of-line code rejection.
Core extraction and validation logic
simpler_setup/elf_parser.py
_extract_text_elf64 reworked to scan all section headers, capture .text bytes, and detect .text.* out-of-line sections and .rela.text* relocation sections; raises ValueError if any found. New _raise_unresolved_text_error helper formats actionable diagnostic aggregating offending section names, sizes, relocation entry counts, issue link, and always-inline workaround with verification commands.
Test infrastructure and ELF buffer builders
tests/ut/py/test_elf_parser.py
Test module docstring documents expected rejection behavior. Packing helpers (_pack_ehdr, _pack_shdr, _strtab) introduced to assemble ELF headers and string tables. Specialized builders (_build_single_text_elf, _build_elf_with_extra_text_section, _build_elf_with_rela_text) construct synthetic ELF64 byte buffers for each test scenario.
Validation test cases
tests/ut/py/test_elf_parser.py
TestSingleTextNoRelocations asserts successful extraction when .text is alone. TestRejectsOutOfLineCodeSection asserts ValueError when .text._Z* section exists, with assertions on issue link, section name, and workaround mention. TestRejectsTextRelocations asserts ValueError when .rela.text entries present, verifying issue link and entry count in error. TestMissingText asserts error when .text section absent.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 The ELF parser hops with care,
Detecting text in disarray—
No stray sections hiding there,
No relocations to betray!
With builders true and tests so fair,
The loader's crisp and clean today! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: detecting unresolved AICore .text relocations and raising errors instead of silently dropping them, with reference to the issue number.
Description check ✅ Passed The description thoroughly documents the problem, solution, scope, test plan, and verification approach, all directly related to the changeset of adding detection and rejection of unsupported ELF sections.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the ELF parser in simpler_setup/elf_parser.py to detect and reject ELF files with out-of-line code sections or unresolved relocations, preventing silent failures on AICore. It also introduces comprehensive unit tests in tests/ut/py/test_elf_parser.py to validate these changes. The review feedback suggests critical security improvements to prevent out-of-bounds reads by adding bounds checks on section header indices, string table offsets, and section data boundaries.

Comment thread simpler_setup/elf_parser.py
Comment thread simpler_setup/elf_parser.py
Comment thread simpler_setup/elf_parser.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
simpler_setup/elf_parser.py (1)

103-111: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add explicit ELF bounds validation before header/section reads.

Line 108 and Lines 121-126 trust e_shoff/e_shnum/e_shstrndx and per-section offsets without range checks. A truncated or malformed object can fail with low-level unpack/slice errors instead of a clear ValueError tied to source_name.

Proposed fix
 def _extract_text_elf64(elf_data: bytes, source_name: str) -> bytes:
@@
     e_shoff = struct.unpack("<Q", elf_data[40:48])[0]
     e_shnum = struct.unpack("<H", elf_data[60:62])[0]
     e_shstrndx = struct.unpack("<H", elf_data[62:64])[0]
+
+    section_table_end = e_shoff + e_shnum * _SHDR_SIZE
+    if e_shnum == 0 or section_table_end > len(elf_data):
+        raise ValueError(f"Invalid ELF section table in: {source_name}")
+    if e_shstrndx >= e_shnum:
+        raise ValueError(f"Invalid ELF shstrndx in: {source_name}")
@@
     shstr_offset = e_shoff + e_shstrndx * _SHDR_SIZE
+    if shstr_offset + _SHDR_SIZE > len(elf_data):
+        raise ValueError(f"Invalid ELF shstr section header in: {source_name}")
     shstr_sh_offset = struct.unpack("<Q", elf_data[shstr_offset + 24 : shstr_offset + 32])[0]
     shstr_sh_size = struct.unpack("<Q", elf_data[shstr_offset + 32 : shstr_offset + 40])[0]
+    if shstr_sh_offset + shstr_sh_size > len(elf_data):
+        raise ValueError(f"Invalid ELF shstr table range in: {source_name}")
@@
     for i in range(e_shnum):
         section_offset = e_shoff + i * _SHDR_SIZE
+        if section_offset + _SHDR_SIZE > len(elf_data):
+            raise ValueError(f"Invalid ELF section header range in: {source_name}")
@@
         if sh_type == _SHT_PROGBITS and section_name == ".text":
+            if sh_offset + sh_size > len(elf_data):
+                raise ValueError(f"Invalid .text section range in: {source_name}")
             text_data = elf_data[sh_offset : sh_offset + sh_size]

Also applies to: 121-131

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@simpler_setup/elf_parser.py` around lines 103 - 111, Validate ELF buffer
bounds before performing struct.unpack/slices: ensure e_shoff, e_shnum and
e_shstrndx are sane (e.g., e_shoff + e_shnum * _SHDR_SIZE and e_shoff +
e_shstrndx * _SHDR_SIZE are <= len(elf_data)) and that each computed
shstr_offset, shstr_sh_offset and the range
shstr_sh_offset:shstr_sh_offset+shstr_sh_size are within the buffer; similarly,
in the section iteration that uses those values (the code around e_shoff/e_shnum
handling at lines ~121-131), check each section header offset and size before
reading (sh_offset/sh_size) and raise a clear ValueError including source_name
if any computed offset/length would read outside len(elf_data).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/ut/py/test_elf_parser.py`:
- Line 101: The inline comment on the assignment to text_bytes uses the Unicode
multiplication glyph '×' which triggers lint RUF003; update the comment in the
test_elf_parser.py line where text_bytes = b"\xd6\x5f\x03\xc0" * 4 to replace
'×' with plain ASCII 'x' (e.g., "RET x 4; arbitrary payload, never executed") so
the linter no longer complains while keeping the comment meaning intact.

---

Outside diff comments:
In `@simpler_setup/elf_parser.py`:
- Around line 103-111: Validate ELF buffer bounds before performing
struct.unpack/slices: ensure e_shoff, e_shnum and e_shstrndx are sane (e.g.,
e_shoff + e_shnum * _SHDR_SIZE and e_shoff + e_shstrndx * _SHDR_SIZE are <=
len(elf_data)) and that each computed shstr_offset, shstr_sh_offset and the
range shstr_sh_offset:shstr_sh_offset+shstr_sh_size are within the buffer;
similarly, in the section iteration that uses those values (the code around
e_shoff/e_shnum handling at lines ~121-131), check each section header offset
and size before reading (sh_offset/sh_size) and raise a clear ValueError
including source_name if any computed offset/length would read outside
len(elf_data).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ee80a5de-15ec-496c-8b22-eac0741e1f7a

📥 Commits

Reviewing files that changed from the base of the PR and between d50fa68 and 78a5062.

📒 Files selected for processing (2)
  • simpler_setup/elf_parser.py
  • tests/ut/py/test_elf_parser.py

Comment thread tests/ut/py/test_elf_parser.py Outdated
Fixes hw-native-sys#900

The AICore kernel loader (`simpler_setup/elf_parser.py`) silently
dropped `.text._Z*` group sections (out-of-line template
instantiations) and `.rela.text*` relocations when extracting a `.text`
payload from a `.o`. The unresolved `BL`/`B` targets in `.text` then
branched to garbage on device, manifesting as CANN 507018 watchdog
timeouts (issue hw-native-sys#831 / PR hw-native-sys#830) or silently-wrong partial output
(issue hw-native-sys#900). Both symptoms are extremely hard to root-cause from the
runtime error alone.

This change is the minimum to keep the next person from repeating that
diagnostic loop: a pre-flight scan that fails loud if the `.o`
contains `.text._Z*` or any `.rela.text*` entries. The error names the
offending sections and points at the `always_inline` kernel-side
workaround. The literal-`.text` extraction path is otherwise unchanged
— working kernels stay byte-identical (verified against the PA
highperf `.o` from PR hw-native-sys#899 and the existing fully-inlined kernels).

Loader-side relocation application is a separable follow-up; this PR
just closes the silent-failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hw-native-sys-bot hw-native-sys-bot force-pushed the fix/issue-900-aicore-elf-relocations branch from 78a5062 to c4ac0bf Compare June 1, 2026 08:06
@ChaoWao ChaoWao merged commit 411157e into hw-native-sys:main Jun 1, 2026
30 of 31 checks passed
@ChaoWao ChaoWao deleted the fix/issue-900-aicore-elf-relocations branch June 1, 2026 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants