Skip to content

Fix: recover missing orchestrator phases in L2 swimlane level 4#971

Open
indigo1973 wants to merge 1 commit into
hw-native-sys:mainfrom
indigo1973:swim_0602
Open

Fix: recover missing orchestrator phases in L2 swimlane level 4#971
indigo1973 wants to merge 1 commit into
hw-native-sys:mainfrom
indigo1973:swim_0602

Conversation

@indigo1973
Copy link
Copy Markdown
Contributor

--enable-l2-swimlane level 4 produced an empty aicpu_orchestrator_phases section — every orchestrator-phase record was dropped. Two causes:

  • The orch-phase pool was indexed by the orchestrator's AICPU thread id (the last thread), but only lower indices were primed/drained, so record_orch_phase hit a null pool and dropped the record at write.
  • flush read the orch buffer's count through a SchedPhaseBuffer cast. count follows records[N], whose stride differs (sched 40B vs orch 32B), so the read landed past the buffer, saw 0, and skipped the flush.

Make orch a single instance at pool ordinal 0 (matching dep_gen / scope_stats): the orchestrator records and flushes pool 0 and uses its thread index only to select its own ready queue; entries are tagged with ordinal 0. Read/write count through the matching buffer type.

  • Split l2_swimlane_aicpu_flush_phase_buffers into flush_sched_phase_buffer (per scheduler thread) and flush_orch_phase_buffer (orchestrator only).
  • Host allocates buffers for one orch pool by actual usage (~24MB less); unused pool states are still zeroed for safe reconcile iteration.
  • Pass aicpu_thread_num to the host collector so the phase-metadata log reports the orchestrator's real AICPU thread number; bound the JSON export to the single orch lane.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0bb5102-70e1-4fa2-a335-ea9c90cee243

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR refactors L2 swimlane phase profiling to decouple ready-queue ownership from host lane indexing. It replaces a unified flush API with per-kind variants, threads aicpu_thread_num through initialization, and consolidates orchestrator phase records into a single pool instance while distinguishing state zeroing from buffer allocation during setup.

Changes

L2 Swimlane Phase Profiling Refactor

Layer / File(s) Summary
Public API and Contract Split
src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h, src/a2a3/platform/include/host/l2_swimlane_collector.h, src/a2a3/platform/include/common/l2_swimlane_profiling.h
Unified l2_swimlane_aicpu_flush_phase_buffers() is split into l2_swimlane_aicpu_flush_sched_phase_buffer() and l2_swimlane_aicpu_flush_orch_phase_buffer(). L2SwimlaneCollector::initialize gains an aicpu_thread_num parameter; private member aicpu_thread_num_ is added to store orchestrator thread identity. Documentation updated to clarify per-pool ownership and single-instance orchestrator semantics.
Initialization Parameter Threading
src/a2a3/platform/onboard/host/device_runner.h, src/a2a3/platform/onboard/host/device_runner.cpp, src/a2a3/platform/sim/host/device_runner.h, src/a2a3/platform/sim/host/device_runner.cpp
aicpu_thread_num is propagated from runtime through DeviceRunner::run() into init_l2_swimlane() and forwarded to L2SwimlaneCollector::initialize() in both onboard and simulator configurations.
Queue/Lane Decoupling in AICPU Phase Recording
src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
switch_phase_buffer_kind and acquire_phase_slot now accept separate queue_idx (SPSC ready-queue owner) and core_index (host lane tag). Scheduler-phase recording passes both as thread_idx; orchestrator-phase recording uses orchestrator thread for queue_idx, forces core_index=0, and writes all records to pool ordinal 0. End-of-run flushing is replaced with flush_phase_pool helper and two public flush variants with distinct pool/queue handling.
Host Collector Pool Initialization
src/a2a3/platform/shared/host/l2_swimlane_collector.cpp (initialization section)
L2SwimlaneCollector::initialize stores aicpu_thread_num and refactors phase pool setup to separate state_count (states to zero) from buffer_count (buffers to allocate). Scheduler allocates buffers for all threads; orchestrator allocates buffers only for pool 0 while zeroing all orch states.
Host Metadata Reporting and JSON Export
src/a2a3/platform/shared/host/l2_swimlane_collector.cpp (logging and export section)
read_phase_header_metadata derives orchestrator thread index as aicpu_thread_num_ - 1 for consistent logging. export_swimlane_json emits only the active orchestrator lane count from the shared-memory header instead of the full MAX-sized array.
Scheduler Flush Integration Points
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp, src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
Scheduler initialization sets orchestrator-phase pool count to 1. on_orchestration_done flushes orchestrator orch-phase buffer at ORCH_PHASES gating. resolve_and_dispatch flushes scheduler phase buffer at SCHED_PHASES gating using the new scheduler-specific flush function.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • hw-native-sys/simpler#941: Modifies the existing l2_swimlane_aicpu_flush_phase_buffers(...) behavior; this PR directly refactors that same flush path into scheduler vs orchestrator variants.
  • hw-native-sys/simpler#916: Introduced the unified l2_swimlane_aicpu_flush_phase_buffers(int) API that this PR replaces with separate scheduler and orchestrator variants.
  • hw-native-sys/simpler#946: Implements complementary changes to ready-queue sizing and documentation reflecting the same scheduler vs orchestrator phase-pool split design.

Poem

🐰 A queue and a lane, now split apart,
Each phase knows where to find its heart.
One pool for orch, threads for sched divine,
No more confusion—decoupled design!
Ready buffers flush in perfect order,
From core to queue across the border.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the primary change: fixing missing orchestrator phases in L2 swimlane level 4 by restructuring orch-phase buffer handling.
Description check ✅ Passed The description is detailed and directly related to the changeset, explaining both root causes of the bug and all implemented fixes across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the L2 swimlane phase profiling to treat orchestrator phase records as a single instance (pool ordinal 0) instead of dynamically sizing them based on the AICPU thread count. It splits the phase buffer flushing into separate scheduler and orchestrator functions, passes the total AICPU thread count to the collector initialization, and optimizes buffer allocation and JSON export for the orchestrator lane. Feedback recommends adding a null check for shm_host_ and defensively clamping the orchestrator lane count retrieved from shared memory to prevent potential undefined behavior or memory corruption.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/a2a3/platform/shared/host/l2_swimlane_collector.cpp
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/a2a3/platform/shared/host/l2_swimlane_collector.cpp (1)

79-97: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate aicpu_thread_num parameter.

The parameter is stored without validation. At line 648, it's used as aicpu_thread_num_ - 1 to derive the orchestrator thread index. If aicpu_thread_num is 0 or negative, this will produce an invalid thread index (-1 or worse).

🛡️ Proposed fix to add validation
     num_aicore_ = num_aicore;
+    if (aicpu_thread_num <= 0) {
+        LOG_ERROR("Invalid aicpu_thread_num: %d (must be > 0)", aicpu_thread_num);
+        return -1;
+    }
     aicpu_thread_num_ = aicpu_thread_num;
     l2_swimlane_level_ = l2_swimlane_level_;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/platform/shared/host/l2_swimlane_collector.cpp` around lines 79 -
97, Validate the aicpu_thread_num parameter in L2SwimlaneCollector::initialize:
ensure aicpu_thread_num > 0 (and optionally <= a sensible max if applicable)
before assigning to aicpu_thread_num_; if invalid, log an error via LOG_ERROR
with a clear message and return -1 so downstream code that uses
aicpu_thread_num_ - 1 (e.g., computing the orchestrator thread index) cannot
produce a negative index. Include the function name
L2SwimlaneCollector::initialize and the member aicpu_thread_num_ when locating
where to add this check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/a2a3/platform/shared/host/l2_swimlane_collector.cpp`:
- Around line 79-97: Validate the aicpu_thread_num parameter in
L2SwimlaneCollector::initialize: ensure aicpu_thread_num > 0 (and optionally <=
a sensible max if applicable) before assigning to aicpu_thread_num_; if invalid,
log an error via LOG_ERROR with a clear message and return -1 so downstream code
that uses aicpu_thread_num_ - 1 (e.g., computing the orchestrator thread index)
cannot produce a negative index. Include the function name
L2SwimlaneCollector::initialize and the member aicpu_thread_num_ when locating
where to add this check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6cd863e-5819-48ee-90f6-9677f59bc5e3

📥 Commits

Reviewing files that changed from the base of the PR and between a71ab6b and 4fb91de.

📒 Files selected for processing (11)
  • src/a2a3/platform/include/aicpu/l2_swimlane_collector_aicpu.h
  • src/a2a3/platform/include/common/l2_swimlane_profiling.h
  • src/a2a3/platform/include/host/l2_swimlane_collector.h
  • src/a2a3/platform/onboard/host/device_runner.cpp
  • src/a2a3/platform/onboard/host/device_runner.h
  • src/a2a3/platform/shared/aicpu/l2_swimlane_collector_aicpu.cpp
  • src/a2a3/platform/shared/host/l2_swimlane_collector.cpp
  • src/a2a3/platform/sim/host/device_runner.cpp
  • src/a2a3/platform/sim/host/device_runner.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp

--enable-l2-swimlane level 4 produced an empty aicpu_orchestrator_phases
section — every orchestrator-phase record was dropped. Two causes:

- The orch-phase pool was indexed by the orchestrator's AICPU thread id
  (the last thread), but only lower indices were primed/drained, so
  record_orch_phase hit a null pool and dropped the record at write.
- flush read the orch buffer's `count` through a SchedPhaseBuffer cast.
  `count` follows records[N], whose stride differs (sched 40B vs orch
  32B), so the read landed past the buffer, saw 0, and skipped the flush.

Make orch a single instance at pool ordinal 0 (matching dep_gen /
scope_stats): the orchestrator records and flushes pool 0 and uses its
thread index only to select its own ready queue; entries are tagged with
ordinal 0. Read/write `count` through the matching buffer type.

- Split l2_swimlane_aicpu_flush_phase_buffers into
  flush_sched_phase_buffer (per scheduler thread) and
  flush_orch_phase_buffer (orchestrator only); gate the orch flush on
  >= ORCH_PHASES to match where orch records are actually emitted.
- Name the phase-enqueue params for their now-distinct roles: thread_idx
  (selects the caller's own ready queue) vs pool_idx (host filing/recycle
  ordinal) — they coincide for sched but differ for the single orch
  instance (pool 0).
- Host allocates buffers for one orch pool by actual usage (~24MB less);
  unused pool states are still zeroed for safe reconcile iteration.
- Pass aicpu_thread_num to the host collector so the phase-metadata log
  reports the orchestrator's real AICPU thread number; bound the JSON
  export to the single orch lane, and guard its shm_host_ deref.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant