
Triangular Inverse Kernel (continuation of Zouzias' impl)#830

Open
MirkoDeVita98 wants to merge 1 commit into
hw-native-sys:mainfrom
MirkoDeVita98:port/zouzias-pr-1

Conversation

@MirkoDeVita98
Contributor

@MirkoDeVita98 MirkoDeVita98 commented May 20, 2026

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3sim

Added a few more tests.

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")

=== Runtime: tensormap_and_ringbuffer  Level: 2 ===
  TestTriangularInverse::Case_upper_tri_matrix_size_16 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_32 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_64 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_128 ... PASSED
  TestTriangularInverse::Case_lower_tri_matrix_size_128 ... PASSED

Simulation

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3sim
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")

=== Runtime: tensormap_and_ringbuffer  Level: 2 ===
  TestTriangularInverse::Case_upper_tri_matrix_size_16 ... [2026-06-01 13:57:52.164432][T0xe0792afdf180][INFO_V9] run: [aicpu_executor.cpp:687] Thread 3: orch_start=89015753608220953 orch_end=89015753608221520 orch_cost=11.340us
[2026-06-01 13:57:52.164490][T0xe0792afdf180][INFO_V9] run: [aicpu_executor.cpp:693] PTO2 total submitted tasks = 1, already executed 0 tasks
[2026-06-01 13:57:52.165331][T0xe0792bfff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 0: sched_start=89015753608220962 sched_end=89015753608266520 sched_cost=911.160us
[2026-06-01 13:57:52.165366][T0xe0792bfff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 0: Scheduler summary: total_time=861.580us, loops=666, tasks_scheduled=1
[2026-06-01 13:57:52.165334][T0xe0792b7ef180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 2: sched_start=89015753608220950 sched_end=89015753608266503 sched_cost=911.060us
[2026-06-01 13:57:52.165412][T0xe0792b7ef180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 2: Scheduler summary: total_time=865.520us, loops=823, tasks_scheduled=0
[2026-06-01 13:57:52.165334][T0xe07931fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 1: sched_start=89015753608220988 sched_end=89015753608266498 sched_cost=910.200us
[2026-06-01 13:57:52.165458][T0xe07931fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 1: Scheduler summary: total_time=844.320us, loops=795, tasks_scheduled=0
PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_32 ... [2026-06-01 13:57:52.191588][T0xe07929fbf180][INFO_V9] run: [aicpu_executor.cpp:687] Thread 3: orch_start=89015753609578531 orch_end=89015753609579324 orch_cost=15.860us
[2026-06-01 13:57:52.191631][T0xe07929fbf180][INFO_V9] run: [aicpu_executor.cpp:693] PTO2 total submitted tasks = 1, already executed 0 tasks
[2026-06-01 13:57:52.194091][T0xe07928f9f180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 0: sched_start=89015753609578842 sched_end=89015753609704529 sched_cost=2513.740us
[2026-06-01 13:57:52.194125][T0xe07928f9f180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 0: Scheduler summary: total_time=2429.700us, loops=2181, tasks_scheduled=1
[2026-06-01 13:57:52.194094][T0xe07923fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 2: sched_start=89015753609578841 sched_end=89015753609704517 sched_cost=2513.520us
[2026-06-01 13:57:52.194162][T0xe07923fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 2: Scheduler summary: total_time=2422.240us, loops=2354, tasks_scheduled=0
[2026-06-01 13:57:52.194102][T0xe079297af180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 1: sched_start=89015753609578476 sched_end=89015753609704518 sched_cost=2520.840us
[2026-06-01 13:57:52.194198][T0xe079297af180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 1: Scheduler summary: total_time=2427.880us, loops=2406, tasks_scheduled=0
PASSED
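
The `orch_cost` and `sched_cost` values in the simulation log appear to follow directly from the logged start and end counters. As a rough sanity check, the sketch below recomputes each cost under the assumption that the counters tick at 50 MHz. That rate is inferred from the numbers in this log and is not stated anywhere in the PR; `TICKS_PER_US` and `derive_cost` are illustrative names.

```python
import re

# Assumption: 50 counter ticks per microsecond, inferred from the logged values.
TICKS_PER_US = 50

# Matches "orch_start=... orch_end=... orch_cost=...us" and the sched_* variant.
COST_RE = re.compile(r"(orch|sched)_start=(\d+) \1_end=(\d+) \1_cost=([\d.]+)us")


def derive_cost(line):
    """Return (kind, derived_us, logged_us), or None if the line has no cost field."""
    match = COST_RE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    logged_us = float(match.group(4))
    derived_us = (end - start) / TICKS_PER_US
    return kind, derived_us, logged_us


if __name__ == "__main__":
    sample = (
        "Thread 3: orch_start=89015753608220953 "
        "orch_end=89015753608221520 orch_cost=11.340us"
    )
    kind, derived_us, logged_us = derive_cost(sample)
    print(f"{kind}: derived {derived_us:.3f}us, logged {logged_us:.3f}us")
```

For the size-16 case the orchestration span is 567 ticks, which gives 11.340us under this assumption and matches the logged value.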

@MirkoDeVita98 MirkoDeVita98 changed the title Triangular Inverse Kernel (continuation of Zozuias's impl) Triangular Inverse Kernel (continuation of Zouzias's impl) May 20, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new triangular matrix inverse example, including a recursive unrolled AICore kernel, orchestration logic, and a Python test suite. It also updates the benchmark_bgemm test configuration and adds debug logging to the scene test framework. Review feedback suggests removing leftover debug print statements, correcting documentation comments about the configuration tensor layout, and simplifying the kernel dispatch function by removing redundant template parameters.

Comment thread simpler_setup/scene_test.py Outdated
@MirkoDeVita98 MirkoDeVita98 changed the title Triangular Inverse Kernel (continuation of Zouzias's impl) Triangular Inverse Kernel (continuation of Zouzias' impl) May 20, 2026
Comment thread Makefile Outdated
Comment thread simpler_setup/scene_test.py Outdated
Comment thread examples/a2a3/tensormap_and_ringbuffer/benchmark_bgemm/test_benchmark_bgemm.py Outdated
@coderabbitai

coderabbitai Bot commented May 29, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cef1ba42-68ad-436b-9920-a8e94b7fad6c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR introduces a complete triangular matrix inversion feature: an AICORE kernel implementing a recursive unrolled algorithm for inverting upper/lower-triangular fp16/bf16 matrices, paired with orchestration dispatch and a test suite for validation.

Changes

Triangular Inverse Feature

| Layer / File(s) | Summary |
| --- | --- |
| AICORE Infrastructure and Memory Helpers: `kernel_tri_inv_rec_unroll.cpp` (lines 1–237) | File header and guards; inlined BSND utilities for computing tile offsets and valid sizes for fixed and variable-length addressing; L1→L0 copy helpers for diagonal fractals and alternating block sets; auxiliary matrix preparation via pipelined TMOV/TMATMUL with explicit synchronization. |
| AICORE Inversion Algorithm and Kernel Orchestrator: `kernel_tri_inv_rec_unroll.cpp` (lines 239–731) | Single-tile inversion using inv-trick iteration loop followed by unrolled recursion with odd/even block parity swapping; main kernel `TriInvRecUnrollKernel` that schedules work across cube iterations and tiles, loads inputs with BSND offset support and dynamic padding, invokes per-tile inversion, and stores results with BSND-aware output sizing. |
| AICORE Public API and Dispatchers: `kernel_tri_inv_rec_unroll.cpp` (lines 732–855) | Matrix-size dispatcher `runKernelTriInvRecUnroll` selecting compile-time instantiations; tiles-per-iteration selector `run_tri_inv_rec_unroll_per_num_matrices` routing on matrix count relative to block dimension and BSND mode; simpler-framework entry point `kernel_entry` unpacking tensor arguments and config from buffer. |
| Orchestration and Task Submission: `triangular_inverse_orch.cpp` (lines 1–68) | Orchestration config function declaring 4 tensor arguments; entrypoint extracting input/output tensors and host config (`matrix_size`, `num_matrices`, `is_lower`, `block_dim`), logging parameters, and submitting AIC task via `FUNC_TRI_INV`. |
| Test Suite with Golden Reference Validation: `test_triangular_inverse.py` (lines 1–115) | Helper `random_tri_matrix` generating upper/lower-triangular unit-diagonal matrices; helper `linalg_inv` computing per-matrix inverses with added identity via NumPy; test class `TestTriangularInverse` building fp16 inputs and validating kernel output against reference inverses across multiple matrix sizes and orientations. |
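
The golden reference described in the last row inverts `A + I`, where `A` is the strictly triangular input. As a minimal sketch of that idea (not the PR's `linalg_inv`, which reportedly uses NumPy), the pure-Python version below relies on the fact that a strictly triangular `n × n` matrix is nilpotent, so the inverse of `I + A` is the finite sum `I − A + A² − … + (−A)^(n−1)`. The helper names `identity`, `matmul`, and `unit_triangular_inverse` are illustrative.

```python
def identity(n):
    """Return the n x n identity matrix as a list of lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def unit_triangular_inverse(strict):
    """Invert (I + strict), where `strict` is strictly upper or lower triangular.

    Because `strict` is nilpotent, the Neumann series terminates after n terms:
    (I + A)^-1 = sum_{k=0}^{n-1} (-A)^k.
    """
    n = len(strict)
    neg = [[-x for x in row] for row in strict]
    result = identity(n)
    power = identity(n)
    for _ in range(1, n):
        power = matmul(power, neg)  # power now holds (-A)^k
        result = [
            [r + p for r, p in zip(res_row, pow_row)]
            for res_row, pow_row in zip(result, power)
        ]
    return result


if __name__ == "__main__":
    a = [[0, 2, 3], [0, 0, 4], [0, 0, 0]]  # strictly upper triangular
    inv = unit_triangular_inverse(a)
    unit = [[a[i][j] + (1 if i == j else 0) for j in range(3)] for i in range(3)]
    print(inv)
    print(matmul(unit, inv) == identity(3))
```

A reference like this is only a correctness oracle for small cases; a real test would still compare the fp16 kernel output against it with a tolerance.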

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A triangular dance, recursive and deep,
Unrolling block by block where matrices sleep,
Inv-trick loops and L0 buffers gleam bright,
From AICORE to test, the inverse shines right!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main addition: a triangular inverse kernel implementation, which matches the substantial new kernel files and test infrastructure added in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description demonstrates working test cases and includes command examples directly related to the triangular inverse kernel implementation being added. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@zouzias zouzias force-pushed the port/zouzias-pr-1 branch 3 times, most recently from d9154fd to 244882d Compare May 29, 2026 08:16

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/test_triangular_inverse.py (1)

91-93: ⚡ Quick win

Fix the stale matrix-construction comment.

Line 91–Line 93 say the diagonal is set to [0.5, 1.5], but this code path does not do that. Please update the comment to match the actual behavior (strictly triangular input, unit diagonal handled via + I in reference inversion).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/test_triangular_inverse.py`
around lines 91 - 93, Update the stale comment describing matrix construction to
reflect the actual behavior: note that the code builds strictly triangular fp16
matrices (zeros out the off-triangle and does not set diagonal values to
[0.5,1.5]) and that the unit diagonal is introduced only when computing the
reference inverse via "+ I" (i.e., the test uses strictly triangular input and
adds the identity in the reference inversion). Mention the
fp16/strictly-triangular inputs and the "+ I" reference inversion so future
readers understand how invertibility is handled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/kernels/aic/kernel_tri_inv_rec_unroll.cpp`:
- Around line 765-786: The switch on matrix_size in the block that calls
runKernelTriInvRecUnroll (cases 16/32/64/128) lacks a default and thus silently
does nothing for unsupported sizes; either add a guarded default branch that
logs an error/throws/returns a failure status (ensuring M_inv is not left stale)
or validate matrix_size earlier in aicpu_orchestration_entry and fail fast with
a clear error; locate the switch using the symbol matrix_size and the
runKernelTriInvRecUnroll instantiations and implement one of these checks so
callers receive a loud, deterministic error for unsupported sizes.
- Line 23: Update the header/comment for args[3] to match the actual config
layout read by the kernel (where config values are extracted into matrix_size,
num_matrices, is_lower, block_dim); change the doc from "int64[3]: [matrix_size,
num_matrices, is_lower]" to "int64[4]: [matrix_size, num_matrices, is_lower,
block_dim]" so it accurately documents the values consumed by the code that
reads config into those four int64 variables.
- Around line 789-830: The wrapper run_tri_inv_rec_unroll_per_num_matrices
currently has unused template parameters NumTilesPerCubeIter and IsBSND; remove
them from its template parameter list so it becomes template<typename InputT,
typename OutputT> AICORE void run_tri_inv_rec_unroll_per_num_matrices(...),
leaving the internal forwards to run_tri_inv_rec_unroll(...) unchanged (those
calls still specify NumTilesPerCubeIter and IsBSND as before). Update any call
sites that instantiate run_tri_inv_rec_unroll_per_num_matrices (e.g., places
that passed <half, half, 1, false>) to drop the unused template arguments so the
wrapper is instantiated only with InputT and OutputT. Ensure symbols to edit:
run_tri_inv_rec_unroll_per_num_matrices and its callers; do not change
run_tri_inv_rec_unroll signature.
- Around line 77-89: The loop over seq_idx is unbounded and can read past
cu_seqlens via cu_seqlens[seq_idx + 1]; change the loop to be bounded by the
number of sequences (or cu_seqlens length) and handle the “not found” case:
accept an additional parameter like seq_count (or cu_seqlens_len) and iterate
using for (uint32_t seq_idx = 0; seq_idx + 1 < seq_count; ++seq_idx) (or check
seq_idx + 1 < cu_seqlens_len before indexing), keep the existing logic that
computes seq_end/seq_len/seq_num_chunks and returns when chunk_idx falls into
the range, and if the loop finishes without finding the chunk return a safe
default or error (e.g., an invalid tile info or throw/assert) to avoid
out-of-bounds reads; update all call sites of GetBSNDVarlenTileInfoFromCuSeqlens
accordingly.
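
The bounded lookup suggested in the last item can be sketched as follows. This is an illustration of the pattern only, written in Python rather than the kernel's C++, and it is not the PR's `GetBSNDVarlenTileInfoFromCuSeqlens`. The function name `find_chunk`, its return tuple, and the `chunk_size` parameter are assumptions; the variable names `seq_end`, `seq_len`, and `seq_num_chunks` follow the review comment.

```python
def find_chunk(cu_seqlens, chunk_idx, chunk_size):
    """Locate the sequence that owns global chunk `chunk_idx`.

    `cu_seqlens` holds cumulative sequence lengths, so sequence i spans
    rows [cu_seqlens[i], cu_seqlens[i + 1]). Returns a tuple
    (seq_idx, row_start, valid_rows). The loop is bounded by the array
    length, so cu_seqlens[seq_idx + 1] is never read out of range.
    """
    chunks_before = 0
    for seq_idx in range(len(cu_seqlens) - 1):
        seq_start = cu_seqlens[seq_idx]
        seq_end = cu_seqlens[seq_idx + 1]
        seq_len = seq_end - seq_start
        seq_num_chunks = -(-seq_len // chunk_size)  # ceiling division
        if chunk_idx < chunks_before + seq_num_chunks:
            local_chunk = chunk_idx - chunks_before
            row_start = seq_start + local_chunk * chunk_size
            valid_rows = min(chunk_size, seq_end - row_start)
            return seq_idx, row_start, valid_rows
        chunks_before += seq_num_chunks
    # Not found: fail loudly instead of reading past the end of cu_seqlens.
    raise IndexError(f"chunk_idx {chunk_idx} is beyond the last sequence")


if __name__ == "__main__":
    # Two sequences of 20 and 16 rows, split into chunks of 16 rows.
    print(find_chunk([0, 20, 36], 1, 16))
    print(find_chunk([0, 20, 36], 2, 16))
```

On device an exception is not available, so the equivalent of the `raise` would have to be an assert or a sentinel "invalid tile" value, as the review comment suggests.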

In
`@examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/kernels/orchestration/triangular_inverse_orch.cpp`:
- Line 20: The argument-layout comment is using the wrong tensor index: the
config is read via orch_args.tensor(3) in this file (see uses of
orch_args.tensor(3) around the config handling), but the doc comment currently
labels it as tensor(4) and omits tensor(3); update the comment so the config
line reads "tensor(3) = config (INPUT) int64[4]: [matrix_size, num_matrices,
is_lower, block_dim]" (and ensure other tensor indices in that comment are
sequential and accurate to the orch_args.tensor(N) usages).
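
To make the four-value config layout concrete, here is a hedged host-side sketch that packs `[matrix_size, num_matrices, is_lower, block_dim]` as four int64 values and rejects matrix sizes outside the 16/32/64/128 set that the kernel's switch handles. The names `TriInvConfig`, `pack_config`, and `unpack_config` are hypothetical, and little-endian byte order is an assumption; the PR's test harness may build this tensor differently.

```python
import struct
from typing import NamedTuple

# Sizes handled by the kernel's switch statement, per the review comment.
SUPPORTED_MATRIX_SIZES = (16, 32, 64, 128)

# Assumption: four little-endian signed 64-bit integers.
CONFIG_FORMAT = "<4q"


class TriInvConfig(NamedTuple):
    matrix_size: int
    num_matrices: int
    is_lower: int
    block_dim: int


def pack_config(config):
    """Validate the config and serialize it as int64[4]."""
    if config.matrix_size not in SUPPORTED_MATRIX_SIZES:
        raise ValueError(
            f"unsupported matrix_size {config.matrix_size}; "
            f"expected one of {SUPPORTED_MATRIX_SIZES}"
        )
    return struct.pack(CONFIG_FORMAT, *config)


def unpack_config(buffer):
    """Read the four int64 values back in the documented order."""
    return TriInvConfig(*struct.unpack(CONFIG_FORMAT, buffer))


if __name__ == "__main__":
    raw = pack_config(TriInvConfig(matrix_size=128, num_matrices=4, is_lower=1, block_dim=24))
    print(len(raw), unpack_config(raw))
```

Validating on the host means an unsupported size fails before any task is submitted, rather than leaving the output buffer untouched on device.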


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 42bed645-9f72-456c-a3a0-2d1c29509b75

📥 Commits

Reviewing files that changed from the base of the PR and between ccb95e3 and 244882d.

📒 Files selected for processing (3)
  • examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/kernels/aic/kernel_tri_inv_rec_unroll.cpp
  • examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/kernels/orchestration/triangular_inverse_orch.cpp
  • examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/test_triangular_inverse.py

hw-native-sys-bot pushed a commit to hw-native-sys-bot/simpler that referenced this pull request Jun 1, 2026
Fixes hw-native-sys#900

The AICore kernel loader (`simpler_setup/elf_parser.py`) silently
dropped `.text._Z*` group sections (out-of-line template
instantiations) and `.rela.text*` relocations when extracting a `.text`
payload from a `.o`. The unresolved `BL`/`B` targets in `.text` then
branched to garbage on device, manifesting as CANN 507018 watchdog
timeouts (issue hw-native-sys#831 / PR hw-native-sys#830) or silently-wrong partial output
(issue hw-native-sys#900). Both symptoms are extremely hard to root-cause from the
runtime error alone.

This change is the minimum to keep the next person from repeating that
diagnostic loop: a pre-flight scan that fails loud if the `.o`
contains `.text._Z*` or any `.rela.text*` entries. The error names the
offending sections and points at the `always_inline` kernel-side
workaround. The literal-`.text` extraction path is otherwise unchanged
— working kernels stay byte-identical (verified against the PA
highperf `.o` from PR hw-native-sys#899 and the existing fully-inlined kernels).

Loader-side relocation application is a separable follow-up; this PR
just closes the silent-failure mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
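
The pre-flight scan described in this commit message can be sketched roughly as below. This is not the code in `simpler_setup/elf_parser.py`; it assumes the section names have already been read from the `.o` and only shows the name check. `find_unsupported_sections` and `check_kernel_object` are hypothetical names, and the error text is illustrative.

```python
def find_unsupported_sections(section_names):
    """Return section names that a literal-.text extraction would silently drop.

    Flags out-of-line template instantiations (.text._Z*) and any
    relocation section that targets text (.rela.text*).
    """
    return [
        name
        for name in section_names
        if name.startswith(".text._Z") or name.startswith(".rela.text")
    ]


def check_kernel_object(section_names):
    """Raise a descriptive error if the object needs linking or relocation."""
    offending = find_unsupported_sections(section_names)
    if offending:
        raise ValueError(
            "kernel object contains sections the loader cannot handle: "
            + ", ".join(offending)
            + "; consider marking the helpers always_inline"
        )


if __name__ == "__main__":
    # A fully inlined kernel has only the plain .text section and passes.
    check_kernel_object([".text", ".data", ".symtab"])
    print(find_unsupported_sections([".text", ".text._Z3fooIiEvT_", ".rela.text"]))
```

Checking names only keeps the scan cheap and leaves the existing extraction path untouched, which matches the commit's stated goal of closing the silent-failure mode without changing working kernels.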
ChaoWao added a commit that referenced this pull request Jun 1, 2026
Fixes #900

The AICore kernel loader (`simpler_setup/elf_parser.py`) silently
dropped `.text._Z*` group sections (out-of-line template
instantiations) and `.rela.text*` relocations when extracting a `.text`
payload from a `.o`. The unresolved `BL`/`B` targets in `.text` then
branched to garbage on device, manifesting as CANN 507018 watchdog
timeouts (issue #831 / PR #830) or silently-wrong partial output
(issue #900). Both symptoms are extremely hard to root-cause from the
runtime error alone.

This change is the minimum to keep the next person from repeating that
diagnostic loop: a pre-flight scan that fails loud if the `.o`
contains `.text._Z*` or any `.rela.text*` entries. The error names the
offending sections and points at the `always_inline` kernel-side
workaround. The literal-`.text` extraction path is otherwise unchanged
— working kernels stay byte-identical (verified against the PA
highperf `.o` from PR #899 and the existing fully-inlined kernels).

Loader-side relocation application is a separable follow-up; this PR
just closes the silent-failure mode.

Co-authored-by: Chao Wang <26245345+ChaoWao@users.noreply.github.com>
@zouzias zouzias force-pushed the port/zouzias-pr-1 branch 3 times, most recently from a2a7b86 to c3d4ade Compare June 1, 2026 11:57
@zouzias
Contributor

zouzias commented Jun 1, 2026

@MirkoDeVita98, can you please update the PR description with the following?

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3sim

Added a few more tests.

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")

=== Runtime: tensormap_and_ringbuffer  Level: 2 ===
  TestTriangularInverse::Case_upper_tri_matrix_size_16 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_32 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_64 ... PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_128 ... PASSED
  TestTriangularInverse::Case_lower_tri_matrix_size_128 ... PASSED

Simulation

python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse/test_triangular_inverse.py -p a2a3sim
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0 owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/home/zouzias/github-repos/mirko/simpler/venv/lib/python3.12/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/cann-9.0.0/aarch64-linux/ascend_ops_install.info owner does not match the current owner.
  warnings.warn(f"Warning: The {path} owner does not match the current owner.")

=== Runtime: tensormap_and_ringbuffer  Level: 2 ===
  TestTriangularInverse::Case_upper_tri_matrix_size_16 ... [2026-06-01 13:57:52.164432][T0xe0792afdf180][INFO_V9] run: [aicpu_executor.cpp:687] Thread 3: orch_start=89015753608220953 orch_end=89015753608221520 orch_cost=11.340us
[2026-06-01 13:57:52.164490][T0xe0792afdf180][INFO_V9] run: [aicpu_executor.cpp:693] PTO2 total submitted tasks = 1, already executed 0 tasks
[2026-06-01 13:57:52.165331][T0xe0792bfff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 0: sched_start=89015753608220962 sched_end=89015753608266520 sched_cost=911.160us
[2026-06-01 13:57:52.165366][T0xe0792bfff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 0: Scheduler summary: total_time=861.580us, loops=666, tasks_scheduled=1
[2026-06-01 13:57:52.165334][T0xe0792b7ef180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 2: sched_start=89015753608220950 sched_end=89015753608266503 sched_cost=911.060us
[2026-06-01 13:57:52.165412][T0xe0792b7ef180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 2: Scheduler summary: total_time=865.520us, loops=823, tasks_scheduled=0
[2026-06-01 13:57:52.165334][T0xe07931fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 1: sched_start=89015753608220988 sched_end=89015753608266498 sched_cost=910.200us
[2026-06-01 13:57:52.165458][T0xe07931fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 1: Scheduler summary: total_time=844.320us, loops=795, tasks_scheduled=0
PASSED
  TestTriangularInverse::Case_upper_tri_matrix_size_32 ... [2026-06-01 13:57:52.191588][T0xe07929fbf180][INFO_V9] run: [aicpu_executor.cpp:687] Thread 3: orch_start=89015753609578531 orch_end=89015753609579324 orch_cost=15.860us
[2026-06-01 13:57:52.191631][T0xe07929fbf180][INFO_V9] run: [aicpu_executor.cpp:693] PTO2 total submitted tasks = 1, already executed 0 tasks
[2026-06-01 13:57:52.194091][T0xe07928f9f180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 0: sched_start=89015753609578842 sched_end=89015753609704529 sched_cost=2513.740us
[2026-06-01 13:57:52.194125][T0xe07928f9f180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 0: Scheduler summary: total_time=2429.700us, loops=2181, tasks_scheduled=1
[2026-06-01 13:57:52.194094][T0xe07923fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 2: sched_start=89015753609578841 sched_end=89015753609704517 sched_cost=2513.520us
[2026-06-01 13:57:52.194162][T0xe07923fff180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 2: Scheduler summary: total_time=2422.240us, loops=2354, tasks_scheduled=0
[2026-06-01 13:57:52.194102][T0xe079297af180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:401] Thread 1: sched_start=89015753609578476 sched_end=89015753609704518 sched_cost=2520.840us
[2026-06-01 13:57:52.194198][T0xe079297af180][INFO_V9] log_l2_swimlane_summary: [scheduler_cold_path.cpp:542] Thread 1: Scheduler summary: total_time=2427.880us, loops=2406, tasks_scheduled=0
PASSED

@zouzias zouzias force-pushed the port/zouzias-pr-1 branch from 4e0a2db to 1145957 Compare June 2, 2026 06:43
Comment thread pyproject.toml
@zouzias zouzias force-pushed the port/zouzias-pr-1 branch from 1145957 to 92971af Compare June 2, 2026 06:50
@zouzias zouzias force-pushed the port/zouzias-pr-1 branch from 92971af to 495663d Compare June 2, 2026 11:01
3 participants