Fix: recover missing orchestrator phases in L2 swimlane level 4 by indigo1973 · Pull Request #971 · hw-native-sys/simpler

indigo1973 · 2026-06-02T08:46:59Z

--enable-l2-swimlane level 4 produced an empty aicpu_orchestrator_phases section — every orchestrator-phase record was dropped. Two causes:

The orch-phase pool was indexed by the orchestrator's AICPU thread id (the last thread), but only lower indices were primed/drained, so record_orch_phase hit a null pool and dropped the record at write.
flush read the orch buffer's count through a SchedPhaseBuffer cast. count follows records[N], whose stride differs (sched 40B vs orch 32B), so the read landed past the buffer, saw 0, and skipped the flush.

Make orch a single instance at pool ordinal 0 (matching dep_gen / scope_stats): the orchestrator records and flushes pool 0 and uses its thread index only to select its own ready queue; entries are tagged with ordinal 0. Read/write count through the matching buffer type.

Split l2_swimlane_aicpu_flush_phase_buffers into flush_sched_phase_buffer (per scheduler thread) and flush_orch_phase_buffer (orchestrator only).
Host allocates buffers for one orch pool by actual usage (~24MB less); unused pool states are still zeroed for safe reconcile iteration.
Pass aicpu_thread_num to the host collector so the phase-metadata log reports the orchestrator's real AICPU thread number; bound the JSON export to the single orch lane.

coderabbitai · 2026-06-02T08:47:13Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0bb5102-70e1-4fa2-a335-ea9c90cee243

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR refactors L2 swimlane phase profiling to decouple ready-queue ownership from host lane indexing. It replaces a unified flush API with per-kind variants, threads aicpu_thread_num through initialization, and consolidates orchestrator phase records into a single pool instance while distinguishing state zeroing from buffer allocation during setup.

Changes

L2 Swimlane Phase Profiling Refactor

Layer / File(s)	Summary
Public API and Contract Split `src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h`, `src/a2a3/platform/include/host/l2_swimlane_collector.h`, `src/a2a3/platform/include/common/l2_swimlane_profiling.h`	Unified `l2_swimlane_aicpu_flush_phase_buffers()` is split into `l2_swimlane_aicpu_flush_sched_phase_buffer()` and `l2_swimlane_aicpu_flush_orch_phase_buffer()`. `L2SwimlaneCollector::initialize` gains an `aicpu_thread_num` parameter; private member `aicpu_thread_num_` is added to store orchestrator thread identity. Documentation updated to clarify per-pool ownership and single-instance orchestrator semantics.
Initialization Parameter Threading `src/a2a3/platform/onboard/host/device_runner.h`, `src/a2a3/platform/onboard/host/device_runner.cpp`, `src/a2a3/platform/sim/host/device_runner.h`, `src/a2a3/platform/sim/host/device_runner.cpp`	`aicpu_thread_num` is propagated from runtime through `DeviceRunner::run()` into `init_l2_swimlane()` and forwarded to `L2SwimlaneCollector::initialize()` in both onboard and simulator configurations.
Queue/Lane Decoupling in AICPU Phase Recording `src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp`	`switch_phase_buffer_kind` and `acquire_phase_slot` now accept separate `queue_idx` (SPSC ready-queue owner) and `core_index` (host lane tag). Scheduler-phase recording passes both as `thread_idx`; orchestrator-phase recording uses orchestrator thread for `queue_idx`, forces `core_index=0`, and writes all records to pool ordinal 0. End-of-run flushing is replaced with `flush_phase_pool` helper and two public flush variants with distinct pool/queue handling.
Host Collector Pool Initialization `src/a2a3/platform/shared/host/l2_swimlane_collector.cpp` (initialization section)	`L2SwimlaneCollector::initialize` stores `aicpu_thread_num` and refactors phase pool setup to separate `state_count` (states to zero) from `buffer_count` (buffers to allocate). Scheduler allocates buffers for all threads; orchestrator allocates buffers only for pool 0 while zeroing all orch states.
Host Metadata Reporting and JSON Export `src/a2a3/platform/shared/host/l2_swimlane_collector.cpp` (logging and export section)	`read_phase_header_metadata` derives orchestrator thread index as `aicpu_thread_num_ - 1` for consistent logging. `export_swimlane_json` emits only the active orchestrator lane count from the shared-memory header instead of the full MAX-sized array.
Scheduler Flush Integration Points `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`, `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`	Scheduler initialization sets orchestrator-phase pool count to 1. `on_orchestration_done` flushes orchestrator orch-phase buffer at `ORCH_PHASES` gating. `resolve_and_dispatch` flushes scheduler phase buffer at `SCHED_PHASES` gating using the new scheduler-specific flush function.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hw-native-sys/simpler#941: Modifies the existing l2_swimlane_aicpu_flush_phase_buffers(...) behavior; this PR directly refactors that same flush path into scheduler vs orchestrator variants.
hw-native-sys/simpler#916: Introduced the unified l2_swimlane_aicpu_flush_phase_buffers(int) API that this PR replaces with separate scheduler and orchestrator variants.
hw-native-sys/simpler#946: Implements complementary changes to ready-queue sizing and documentation reflecting the same scheduler vs orchestrator phase-pool split design.

Poem

🐰 A queue and a lane, now split apart,
Each phase knows where to find its heart.
One pool for orch, threads for sched divine,
No more confusion—decoupled design!
Ready buffers flush in perfect order,
From core to queue across the border. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the primary change: fixing missing orchestrator phases in L2 swimlane level 4 by restructuring orch-phase buffer handling.
Description check	✅ Passed	The description is detailed and directly related to the changeset, explaining both root causes of the bug and all implemented fixes across multiple files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request refactors the L2 swimlane phase profiling to treat orchestrator phase records as a single instance (pool ordinal 0) instead of dynamically sizing them based on the AICPU thread count. It splits the phase buffer flushing into separate scheduler and orchestrator functions, passes the total AICPU thread count to the collector initialization, and optimizes buffer allocation and JSON export for the orchestrator lane. Feedback recommends adding a null check for shm_host_ and defensively clamping the orchestrator lane count retrieved from shared memory to prevent potential undefined behavior or memory corruption.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/a2a3/platform/shared/host/l2_swimlane_collector.cpp (1)

79-97: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate aicpu_thread_num parameter.

The parameter is stored without validation. At line 648, it's used as aicpu_thread_num_ - 1 to derive the orchestrator thread index. If aicpu_thread_num is 0 or negative, this will produce an invalid thread index (-1 or worse).

🛡️ Proposed fix to add validation

     num_aicore_ = num_aicore;
+    if (aicpu_thread_num <= 0) {
+        LOG_ERROR("Invalid aicpu_thread_num: %d (must be > 0)", aicpu_thread_num);
+        return -1;
+    }
     aicpu_thread_num_ = aicpu_thread_num;
     l2_swimlane_level_ = l2_swimlane_level_;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/platform/shared/host/l2_swimlane_collector.cpp` around lines 79 -
97, Validate the aicpu_thread_num parameter in L2SwimlaneCollector::initialize:
ensure aicpu_thread_num > 0 (and optionally <= a sensible max if applicable)
before assigning to aicpu_thread_num_; if invalid, log an error via LOG_ERROR
with a clear message and return -1 so downstream code that uses
aicpu_thread_num_ - 1 (e.g., computing the orchestrator thread index) cannot
produce a negative index. Include the function name
L2SwimlaneCollector::initialize and the member aicpu_thread_num_ when locating
where to add this check.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/a2a3/platform/shared/host/l2_swimlane_collector.cpp`:
- Around line 79-97: Validate the aicpu_thread_num parameter in
L2SwimlaneCollector::initialize: ensure aicpu_thread_num > 0 (and optionally <=
a sensible max if applicable) before assigning to aicpu_thread_num_; if invalid,
log an error via LOG_ERROR with a clear message and return -1 so downstream code
that uses aicpu_thread_num_ - 1 (e.g., computing the orchestrator thread index)
cannot produce a negative index. Include the function name
L2SwimlaneCollector::initialize and the member aicpu_thread_num_ when locating
where to add this check.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6cd863e-5819-48ee-90f6-9677f59bc5e3

📥 Commits

Reviewing files that changed from the base of the PR and between a71ab6b and 4fb91de.

📒 Files selected for processing (11)

src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h
src/a2a3/platform/include/common/l2_swimlane_profiling.h
src/a2a3/platform/include/host/l2_swimlane_collector.h
src/a2a3/platform/onboard/host/device_runner.cpp
src/a2a3/platform/onboard/host/device_runner.h
src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
src/a2a3/platform/shared/host/l2_swimlane_collector.cpp
src/a2a3/platform/sim/host/device_runner.cpp
src/a2a3/platform/sim/host/device_runner.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp

--enable-l2-swimlane level 4 produced an empty aicpu_orchestrator_phases section — every orchestrator-phase record was dropped. Two causes: - The orch-phase pool was indexed by the orchestrator's AICPU thread id (the last thread), but only lower indices were primed/drained, so record_orch_phase hit a null pool and dropped the record at write. - flush read the orch buffer's `count` through a SchedPhaseBuffer cast. `count` follows records[N], whose stride differs (sched 40B vs orch 32B), so the read landed past the buffer, saw 0, and skipped the flush. Make orch a single instance at pool ordinal 0 (matching dep_gen / scope_stats): the orchestrator records and flushes pool 0 and uses its thread index only to select its own ready queue; entries are tagged with ordinal 0. Read/write `count` through the matching buffer type. - Split l2_swimlane_aicpu_flush_phase_buffers into flush_sched_phase_buffer (per scheduler thread) and flush_orch_phase_buffer (orchestrator only); gate the orch flush on >= ORCH_PHASES to match where orch records are actually emitted. - Name the phase-enqueue params for their now-distinct roles: thread_idx (selects the caller's own ready queue) vs pool_idx (host filing/recycle ordinal) — they coincide for sched but differ for the single orch instance (pool 0). - Host allocates buffers for one orch pool by actual usage (~24MB less); unused pool states are still zeroed for safe reconcile iteration. - Pass aicpu_thread_num to the host collector so the phase-metadata log reports the orchestrator's real AICPU thread number; bound the JSON export to the single orch lane, and guard its shm_host_ deref.

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread src/a2a3/platform/shared/host/l2_swimlane_collector.cpp

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

indigo1973 force-pushed the swim_0602 branch from 4fb91de to 349ff29 Compare June 2, 2026 13:17

hw-native-sys-bot mentioned this pull request Jun 2, 2026

Refactor: a2a3 swimlane AICore-self-identifying records (level=1 bypasses complete_task) #974

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: recover missing orchestrator phases in L2 swimlane level 4#971

Fix: recover missing orchestrator phases in L2 swimlane level 4#971
indigo1973 wants to merge 1 commit into
hw-native-sys:mainfrom
indigo1973:swim_0602

indigo1973 commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

indigo1973 commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading