Skip to content

Add: add scalar float type data dump support.#966

Open
vegetabledoww wants to merge 1 commit into
hw-native-sys:mainfrom
vegetabledoww:main
Open

Add: add scalar float type data dump support.#966
vegetabledoww wants to merge 1 commit into
hw-native-sys:mainfrom
vegetabledoww:main

Conversation

@vegetabledoww
Copy link
Copy Markdown
Contributor

Summary
Tensor dump (--dump-tensor) now supports typed scalar values beyond the original uint64-only path. When a user passes float, int32_t, or uint32_t scalars via Arg::add_scalar(), the dump correctly decodes and displays them with the proper dtype instead of showing the raw uint64 bit pattern as UINT64.
Changes

  • data_type.h — add dtype_of() constexpr mapping: float → FLOAT32, int32_t → INT32, uint32_t → UINT32, all others default to UINT64 (backward compatible)
  • a2a3/a5 pto_types.h — store scalar_dtypes_[] in Arg; track dtype in add_scalar(), add_scalars(), add_scalars_i32(), copy_scalars_from()
  • a2a3/a5 pto_runtime2_types.h — add uint8_t scalar_dtypes[MAX_SCALAR_ARGS] to PTO2TaskPayload; copy from args in init()
  • tensor_dump_aicpu.h — read pl.scalar_dtypes[i] instead of hardcoded UINT64
  • tensor_dump_collector.cpp — decode FLOAT32 via memcpy from uint64 bit pattern; output decoded value with correct dtype; remove redundant raw_value field
  • dump_viewer.py — list and export shows decoded value for scalars
  • test_tensor_dump.py — update assertions for typed scalar dumps
    JSON schema
    Before (uint64-only):
    {"kind": "scalar", "dtype": "UINT64", "value": 1065353216, "raw_value": 1065353216}
    After:
    {"kind": "scalar", "dtype": "FLOAT32", "value": 1.0}
    Testing
  • test_tensor_dump.py passes on a2a3sim with --dump-tensor
  • JSON output verified: float scalars emit decoded value (e.g. 1.0), integer scalars emit integer value
  • dump_viewer.py list and dump_viewer.py export work correctly with scalar entries
    Fixes [Feature] Tensor dump: support float/int32 scalar types besides uint64 #965

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1b9dd588-b458-4269-a857-60d725d7a3db

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR extends tensor dump infrastructure to preserve and correctly represent float and int32 scalar argument types throughout the runtime stack, replacing the previous uint64-only approach with per-scalar dtype tracking and specialized export logic.

Changes

Scalar dtype tracking and export

Layer / File(s) Summary
Dtype utility foundation
src/common/task_interface/data_type.h
Introduces dtype_of<T>() template that maps C++ scalar types to corresponding DataType enum values via if constexpr branching, enabling compile-time dtype selection for float, double, fixed-width integers, and bool.
Arg struct dtype metadata
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h, src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
Both a2a3 and a5 Arg structures add scalar_dtypes_ storage and extend add_scalar(), add_scalars(), add_scalars_i32(), and copy_scalars_from() to track per-scalar dtype metadata alongside values; new scalar_dtypes() accessor exposes dtype array.
PTO2TaskPayload dtype storage
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h, src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
Both a2a3 and a5 PTO2TaskPayload structures add scalar_dtypes[MAX_SCALAR_ARGS] cold-data field, update init() to copy dtype bytes from Arg, and revise sizeof() assertion to include dtype byte storage.
Tensor dump export with dtype awareness
src/common/platform/shared/host/tensor_dump_collector.cpp, src/common/platform/include/aicpu/tensor_dump_aicpu.h
tensor_dump_collector.cpp adds data_type.h include and special-cases FLOAT32 by reinterpreting uint64 bits as float and formatting as JSON; tensor_dump_aicpu.h reads per-scalar dtype from pl.scalar_dtypes[i] instead of hardcoding UINT64.
Dump viewer kind and value columns
simpler_setup/tools/dump_viewer.py
write_tensor() emits tensor kind header and handles scalar path early; list_tensors() updates table format to include kind and shape/val columns showing scalar values or tensor shapes.
Test documentation
tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py
Assertion comment updated to document float scalar support semantics and expected dtype/value behavior in dumps.

Sequence Diagram(s)

sequenceDiagram
  participant Client as User Code
  participant Arg as Arg
  participant Payload as PTO2TaskPayload
  participant Dump as tensor_dump_collector
  Client->>Arg: add_scalar(1.5f)
  Arg->>Arg: record scalar value + dtype_of<float>()
  Note over Arg: scalar stored as uint64 bits, dtype=FLOAT32
  Arg->>Payload: init(args)
  Payload->>Payload: copy scalars[] + scalar_dtypes[]
  Note over Payload: dtype bytes preserved in cold data
  Dump->>Payload: read scalar_dtypes[i]
  alt dtype is FLOAT32
    Dump->>Dump: reinterpret uint64 as float
    Dump->>Dump: format 1.5 for JSON
  else other dtype
    Dump->>Dump: emit raw numeric value
  end
  Dump->>Dump: output value + correct dtype
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Scalars now speak their true type!
A float no longer hides as bits,
With dtype tracked through runtime's trip,
Tensor dumps show values that fit.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly and concisely summarizes the main change: adding float scalar type support to tensor dumps.
Description check ✅ Passed Description provides detailed explanation of changes, implementation approach, JSON schema updates, and testing verification, all directly related to the changeset.
Linked Issues check ✅ Passed All coding requirements from issue #965 are met: dtype tracking via dtype_of() in Arg/PTO2TaskPayload, FLOAT32 decoding in tensor_dump_collector via memcpy, and correct dtype output.
Out of Scope Changes check ✅ Passed All changes are scoped to the stated objectives: dtype infrastructure, scalar metadata storage, decoding logic, and test updates directly support typed scalar dump requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for preserving and exporting scalar data types and values throughout the tensor dump pipeline, updating the payload structures, dump collector, and viewer tool. The code review identifies two critical issues: first, signed integer scalars are printed as large positive numbers in the JSON manifest because they are not cast back from uint64_t based on their DataType; second, mapping double to DataType::FLOAT32 causes a bit-pattern mismatch during packing/unpacking, which requires explicit overloads to convert double to float safely.

Comment on lines +566 to +578
if (dt.dtype == static_cast<uint8_t>(DataType::FLOAT32)) {
float f;
memcpy(&f, &dt.scalar_value, sizeof(float));
std::ostringstream val_ss;
val_ss << f;
std::string val_str = val_ss.str();
if (val_str.find('.') == std::string::npos && val_str.find('e') == std::string::npos) {
val_str += ".0";
}
json << ", \"value\": " << val_str;
} else {
json << ", \"value\": " << dt.scalar_value;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Negative values for signed integer types (e.g., INT32, INT16, INT8, INT64) are printed as large positive numbers in the JSON manifest because dt.scalar_value is a uint64_t and is printed directly without casting back to its original signed type. We should decode and cast the scalar value based on dt.dtype before printing it to the JSON manifest. Additionally, ensure 64-bit unsigned integers (like UINT64) are represented as strings in the JSON output to prevent precision loss in JavaScript-based parsers.

            if (dt.dtype == static_cast<uint8_t>(DataType::FLOAT32)) {
                float f;
                memcpy(&f, &dt.scalar_value, sizeof(float));
                std::ostringstream val_ss;
                val_ss << f;
                std::string val_str = val_ss.str();
                if (val_str.find('.') == std::string::npos && val_str.find('e') == std::string::npos) {
                    val_str += ".0";
                }
                json << ", \"value\": " << val_str;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::INT32)) {
                int32_t val;
                memcpy(&val, &dt.scalar_value, sizeof(int32_t));
                json << ", \"value\": " << val;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::INT16)) {
                int16_t val;
                memcpy(&val, &dt.scalar_value, sizeof(int16_t));
                json << ", \"value\": " << val;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::INT8)) {
                int8_t val;
                memcpy(&val, &dt.scalar_value, sizeof(int8_t));
                json << ", \"value\": " << static_cast<int32_t>(val);
            } else if (dt.dtype == static_cast<uint8_t>(DataType::INT64)) {
                int64_t val;
                memcpy(&val, &dt.scalar_value, sizeof(int64_t));
                json << ", \"value\": " << val;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::UINT32)) {
                uint32_t val;
                memcpy(&val, &dt.scalar_value, sizeof(uint32_t));
                json << ", \"value\": " << val;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::UINT16)) {
                uint16_t val;
                memcpy(&val, &dt.scalar_value, sizeof(uint16_t));
                json << ", \"value\": " << val;
            } else if (dt.dtype == static_cast<uint8_t>(DataType::UINT8)) {
                uint8_t val;
                memcpy(&val, &dt.scalar_value, sizeof(uint8_t));
                json << ", \"value\": " << static_cast<uint32_t>(val);
            } else if (dt.dtype == static_cast<uint8_t>(DataType::UINT64)) {
                uint64_t val = dt.scalar_value;
                json << ", \"value\": \"" << val << "\"";
            } else {
                json << ", \"value\": \"" << dt.scalar_value << "\"";
            }
References
  1. When serializing 64-bit unsigned integers (such as task_id, tensor_id, or hardware addresses) to JSON, represent them as strings to prevent precision loss in JavaScript-based parsers that are limited to 2^53 - 1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修复。在 tensor_dump_collector.cpp 中添加了 INT32 显式解码路径,通过 memcpy 将 scalar_value 按 int32_t 重新解释后输出,保证负值正确显示(如 -1)而非大整数(如 4294967295)。

} else if (dt.dtype == static_cast<uint8_t>(DataType::INT32)) {
    int32_t val;
    memcpy(&val, &dt.scalar_value, sizeof(int32_t));
    json << ", \"value\": " << val;
}

Comment thread src/common/task_interface/data_type.h
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/common/platform/shared/host/tensor_dump_collector.cpp (1)

566-578: ⚡ Quick win

INT32 scalars are not sign-extended, causing negative values to appear as large positive numbers.

The current implementation only decodes FLOAT32 scalars. INT32 values stored via add_scalars_i32 are zero-extended (e.g., int32_t(-1) becomes 0x00000000FFFFFFFF), so the JSON export outputs 4294967295 instead of -1.

While the PR objectives focus on FLOAT32 support, the infrastructure already tracks INT32 dtype. For consistency and completeness, consider decoding INT32 as well:

} else if (dt.dtype == static_cast<uint8_t>(DataType::INT32)) {
    int32_t val;
    memcpy(&val, &dt.scalar_value, sizeof(int32_t));
    json << ", \"value\": " << val;
} else {
    json << ", \"value\": " << dt.scalar_value;
}

If INT32 decoding is deferred by design, consider adding a TODO comment to track it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/platform/shared/host/tensor_dump_collector.cpp` around lines 566 -
578, The JSON export treats INT32 scalar bytes as unsigned/zero-extended, so
negative values (added via add_scalars_i32) appear as large positives; update
the scalar handling in tensor_dump_collector.cpp where dt.dtype is inspected
(the dt.dtype / dt.scalar_value branch) to explicitly decode INT32
(DataType::INT32) by memcpy’ing into an int32_t and outputting that signed value
instead of the raw scalar_value; reference the existing FLOAT32 handling as the
pattern to follow and if you intentionally defer this change, add a TODO noting
INT32 sign-extension must be fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@simpler_setup/tools/dump_viewer.py`:
- Line 190: The separator print currently uses print("-" * 110) which misaligns
the table; change it to match the table width (print("-" * 101)) or, better,
compute dynamically from the header string (e.g., print("-" * len(header)) where
header is the table header being printed) so the separator always equals the
actual header/data width.

In `@src/common/platform/shared/host/tensor_dump_collector.cpp`:
- Around line 566-578: The FLOAT32 branch that converts dt.scalar_value to a
float (using float f and std::ostringstream val_ss) must guard against
non-finite values: if std::isnan(f) or std::isinf(f) produce a JSON string
(e.g., "NaN", "Infinity", "-Infinity") or null instead of emitting the raw
token, otherwise JSON becomes invalid; update the block handling
DataType::FLOAT32 (where dt.scalar_value is memcpy'd into f and written to json)
to check std::isnan/std::isinf and emit the appropriate quoted string (or null)
for those cases, falling back to the existing finite-value formatting for normal
floats.

In `@src/common/task_interface/data_type.h`:
- Around line 152-153: The mapping for double in the type-to-DataType branch
(the else if constexpr (std::is_same_v<T, double>) clause) is wrong: change its
return from DataType::FLOAT32 to DataType::UINT64 so 8-byte doubles stored via
to_u64(double_val) are decoded as 8-byte values (matching
tensor_dump_collector.cpp's scalar_value handling) rather than being truncated
by the FLOAT32 decoder; update that branch only to return UINT64 to preserve
full 8-byte representation.

---

Nitpick comments:
In `@src/common/platform/shared/host/tensor_dump_collector.cpp`:
- Around line 566-578: The JSON export treats INT32 scalar bytes as
unsigned/zero-extended, so negative values (added via add_scalars_i32) appear as
large positives; update the scalar handling in tensor_dump_collector.cpp where
dt.dtype is inspected (the dt.dtype / dt.scalar_value branch) to explicitly
decode INT32 (DataType::INT32) by memcpy’ing into an int32_t and outputting that
signed value instead of the raw scalar_value; reference the existing FLOAT32
handling as the pattern to follow and if you intentionally defer this change,
add a TODO noting INT32 sign-extension must be fixed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cf9e5e34-e04d-459a-a913-0531086b2382

📥 Commits

Reviewing files that changed from the base of the PR and between f04514e and f28b26e.

📒 Files selected for processing (9)
  • simpler_setup/tools/dump_viewer.py
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/common/platform/include/aicpu/tensor_dump_aicpu.h
  • src/common/platform/shared/host/tensor_dump_collector.cpp
  • src/common/task_interface/data_type.h
  • tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py

Comment thread simpler_setup/tools/dump_viewer.py Outdated
f" {'arg':>3} {'kind':>6} {'dtype':>8} {'shape/val':<22} {'bytes':>10}"
)
print("-" * 100)
print("-" * 110)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix separator width to match table header.

The separator line is 110 characters, but the table header and data rows are 101 characters wide (6+2+18+2+7+2+5+2+3+2+6+2+8+2+22+2+10 = 101), causing visual misalignment.

🎨 Proposed fix
-    print("-" * 110)
+    print("-" * 101)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print("-" * 110)
print("-" * 101)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@simpler_setup/tools/dump_viewer.py` at line 190, The separator print
currently uses print("-" * 110) which misaligns the table; change it to match
the table width (print("-" * 101)) or, better, compute dynamically from the
header string (e.g., print("-" * len(header)) where header is the table header
being printed) so the separator always equals the actual header/data width.

Comment thread src/common/platform/shared/host/tensor_dump_collector.cpp
Comment on lines +152 to +153
} else if constexpr (std::is_same_v<T, double>) {
return static_cast<uint8_t>(DataType::FLOAT32);
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Jun 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Incorrect dtype mapping for double causes decoding errors.

Mapping double to FLOAT32 will cause incorrect decoding in tensor_dump_collector.cpp (lines 566-578). When a double is stored via to_u64(double_val), all 8 bytes are preserved in the uint64_t storage. However, the FLOAT32 decoder only reads the lower 4 bytes via memcpy(&f, &dt.scalar_value, sizeof(float)), producing a corrupted value.

Recommendation: Map double to DataType::UINT64 (fallback for unsupported types) instead of FLOAT32, since the DataType enum does not include FLOAT64 and the decoder cannot correctly interpret 8-byte floating-point values.

Proposed fix
     if constexpr (std::is_same_v<T, float>) {
         return static_cast<uint8_t>(DataType::FLOAT32);
     } else if constexpr (std::is_same_v<T, double>) {
-        return static_cast<uint8_t>(DataType::FLOAT32);
+        return static_cast<uint8_t>(DataType::UINT64);
     } else if constexpr (std::is_same_v<T, int32_t>) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
} else if constexpr (std::is_same_v<T, double>) {
return static_cast<uint8_t>(DataType::FLOAT32);
} else if constexpr (std::is_same_v<T, double>) {
return static_cast<uint8_t>(DataType::UINT64);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/task_interface/data_type.h` around lines 152 - 153, The mapping
for double in the type-to-DataType branch (the else if constexpr
(std::is_same_v<T, double>) clause) is wrong: change its return from
DataType::FLOAT32 to DataType::UINT64 so 8-byte doubles stored via
to_u64(double_val) are decoded as 8-byte values (matching
tensor_dump_collector.cpp's scalar_value handling) rather than being truncated
by the FLOAT32 decoder; update that branch only to return UINT64 to preserve
full 8-byte representation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢指出。经确认,scalar 接口(Arg::add_scalar)的类型模板参数仅推导 float / int32_t / uint32_t,double 不在 scalar 类型范围内,因此 dtype_of<double>() 的映射问题不影响实际使用场景。

如后续有需要支持 double scalar,可单独处理。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vegetabledoww, 感谢说明!确认理解——Arg::add_scalar 的模板参数当前仅实例化 floatint32_tuint32_t,因此 dtype_of<double>() 的分支在实践中不会被触发,当前映射不影响实际行为。

如有需要,我可以:

  1. 帮您在 dtype_of<double>() 分支处添加一条注释,说明 double 当前不受支持、实际不可达,以便后续维护者理解意图。
  2. 或者为未来的 double scalar 支持创建一个 follow-up issue 进行追踪。

请问需要我执行上述任一操作吗?另外,如果此 review comment 已无实际阻塞,我可以帮您将其 resolve。

(ᵔ◡ᵔ)っ🐇 感谢确认,已记录到 long-term memory 中以备后续 review 参考。


✏️ Learnings added
Learnt from: vegetabledoww
Repo: hw-native-sys/simpler PR: 966
File: src/common/task_interface/data_type.h:152-153
Timestamp: 2026-06-02T06:53:46.210Z
Learning: In the `hw-native-sys/simpler` repository, `Arg::add_scalar()` (in both `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h` and `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h`) only accepts `float`, `int32_t`, and `uint32_t` as scalar type arguments. `double` is explicitly out of scope for the scalar interface, so `dtype_of<double>()` in `src/common/task_interface/data_type.h` is never instantiated in practice. Future `double` scalar support, if needed, would require a dedicated PR.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: hw-native-sys-bot
Repo: hw-native-sys/simpler PR: 930
File: src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h:63-65
Timestamp: 2026-05-31T06:36:21.921Z
Learning: When reviewing native-code changes in this repository, do not introduce new configuration knobs or environment-variable overrides (e.g., scheduling timeout env vars like PTO2_SCHED_TIMEOUT_MS) speculatively. Any new configurable surface must be backed by a concrete, measured workload that demonstrates the need. Prefer hardcoded constants validated on a measurable reproducer over premature tunability. If future tuning is required, require a separate PR that includes its own measured baseline before accepting new knobs.

…tagging

Scalar dtype is now stored in the upper 8 bits of each uint64_t scalar slot.
PTO2TaskPayload stays at 2880B baseline (+0B vs +64B before this change).

Changes
- data_type.h: add pack_scalar/unpack_scalar_dtype/unpack_scalar_value/from_u64_masked
- pto_types.h (a2a3/a5): pack dtype on add_scalar/add_scalars/add_scalars_i32; remove scalar_dtypes_[] member
- pto_runtime2_types.h (a2a3/a5): remove scalar_dtypes[] field; restore sizeof 2880B
- scheduler_dispatch.cpp (a2a3/a5): unmask dtype bits before dispatch
- scheduler_context.h (a2a3/a5): include data_type.h for unpack_scalar_value
- tensor_dump_aicpu.h: read dtype via unpack_scalar_dtype(pl.scalars[i])
- tensor_dump_collector.cpp: mask before memcpy decode; support BOOL as true/false in JSON

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Tensor dump: support float/int32 scalar types besides uint64

1 participant