Skip to content

feat(profiling): Add Perfetto trace format support#5659

Open
markushi wants to merge 46 commits into
masterfrom
feat/markushi/perfetto-profiling-support
Open

feat(profiling): Add Perfetto trace format support#5659
markushi wants to merge 46 commits into
masterfrom
feat/markushi/perfetto-profiling-support

Conversation

@markushi

@markushi markushi commented Feb 25, 2026

Copy link
Copy Markdown
Member

This PR adds the ability to ingest binary Perfetto traces (.pftrace) and convert them into the existing Sample v2 JSON format used by Sentry's profiling pipeline. This targets Android
profiling, where Perfetto is the native tracing format.

Key Changes

  1. New feature flag organizations:continuous-profiling-perfetto — gates all Perfetto processing.
  2. Compound item format via meta_length — profile chunk items can now carry [JSON metadata][binary blob], split at meta_length. This is how Perfetto traces arrive: metadata describes the
    profile, the binary blob is the raw .pftrace.
  3. Perfetto → Sample v2 conversion — new relay-profiling/src/perfetto/ module (proto definitions + conversion logic) parses binary Perfetto traces and produces the existing Sample v2
    JSON format.
  4. Raw profile passthrough to Kafka — after expansion, the original Perfetto binary is preserved alongside the expanded JSON. Two new fields on ProfileChunkKafkaMessage: raw_profile and
    raw_profile_content_type.
  5. New fields on existing structs:
    - Frame.package — library/container path for native/Java frames
    - ProfileMetadata.content_type — carries "perfetto" through the pipeline
    - DebugImage::native_image() constructor — creates debug images from Perfetto mapping data

This PR is part of a stack:

markushi and others added 11 commits February 25, 2026 09:16
Add support for ingesting binary Perfetto traces (.pftrace) as profile
chunks. The SDK sends an envelope with a ProfileChunk metadata item
paired with a ProfileChunkData item containing the raw Perfetto protobuf.

Relay decodes the Perfetto trace, extracts CPU profiling samples
(PerfSample and StreamingProfilePacket), converts them to the internal
Sample v2 format, and forwards both the expanded JSON and the original
binary blob to Kafka for downstream processing.

Key changes:
- New `perfetto` module in relay-profiling for protobuf decoding and
  conversion to Sample v2
- New `ProfileChunkData` envelope item type for binary profile payloads
- Pairing logic to associate ProfileChunk metadata with ProfileChunkData
- Raw profile blob preserved through to Kafka for further processing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	relay-server/src/envelope/item.rs
@markushi

Copy link
Copy Markdown
Member Author

@sentry review

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
markushi and others added 2 commits March 23, 2026 09:01
…fer main thread

Separate the shared `strings` intern table into distinct `function_names`,
`mapping_paths`, and `build_ids` tables matching the Perfetto spec where
each InternedData field has its own ID namespace. Also infer the main
thread name when tid equals pid and no explicit name is provided.

Co-Authored-By: Claude <noreply@anthropic.com>
@markushi markushi marked this pull request as ready for review March 23, 2026 08:57
@markushi markushi requested a review from a team as a code owner March 23, 2026 08:57

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: First timestamp delta skipped in StreamingProfilePacket processing
    • Removed the 'i > 0' guard so that timestamp_delta_us[0] is now correctly applied to the first sample's timestamp.

Create PR

Or push these changes by commenting:

@cursor push ab5f3ec796
Preview (ab5f3ec796)
diff --git a/relay-profiling/src/perfetto/mod.rs b/relay-profiling/src/perfetto/mod.rs
--- a/relay-profiling/src/perfetto/mod.rs
+++ b/relay-profiling/src/perfetto/mod.rs
@@ -229,9 +229,7 @@
             Some(Data::StreamingProfilePacket(spp)) => {
                 let mut ts = packet.timestamp.unwrap_or(0);
                 for (i, &cs_iid) in spp.callstack_iid.iter().enumerate() {
-                    if i > 0
-                        && let Some(&delta) = spp.timestamp_delta_us.get(i)
-                    {
+                    if let Some(&delta) = spp.timestamp_delta_us.get(i) {
                         // `delta` is i64 (can be negative for out-of-order samples).
                         // Casting to u64 wraps negative values, which is correct because
                         // `wrapping_add` of a wrapped negative value subtracts as expected.

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread relay-profiling/src/perfetto/mod.rs
markushi and others added 2 commits March 23, 2026 10:27
Replace `get().is_none()` with `!contains_key()` to satisfy clippy
and add a CHANGELOG entry for the Perfetto interning changes.

Co-Authored-By: Claude <noreply@anthropic.com>
The first delta in timestamp_delta_us was skipped due to an i > 0
guard, but per the Perfetto spec the first sample's timestamp should
be TracePacket.timestamp + timestamp_delta_us[0]. Update tests to use
non-zero first deltas to verify the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
markushi and others added 2 commits April 1, 2026 08:59
…kDescriptor

StreamingProfilePacket samples were all assigned tid=0 because the code
never resolved the trusted_packet_sequence_id → TrackDescriptor →
ThreadDescriptor chain. This collapsed multi-thread profiles into a
single thread. Now we build a seq_id→tid mapping from TrackDescriptor
packets and use it when processing StreamingProfilePacket samples.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
…al state resets

The two-pass architecture resolved all samples against the final state
of the intern tables. If a trace contained an incremental state reset
that reused an interning ID, samples collected before the reset would
silently resolve to the wrong function names. This merges the two
passes into a single pass that resolves callstacks immediately using
the current intern table state, and extracts debug images inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
markushi and others added 2 commits April 1, 2026 10:36
process_compound_item was parsing the entire metadata JSON into a
serde_json::Value tree just to read the content_type field, and
split_item_payload was doing the same on the expanded JSON (potentially
hundreds of KB). Instead, surface content_type from the already-
deserialized ProfileMetadata via ExpandedPerfettoChunk, and hardcode
"perfetto" in split_item_payload since compound items are always
validated as perfetto by process_compound_item.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit moved expand_perfetto before the content_type
check, which would waste work on non-perfetto payloads and produce
confusing errors. Restore the early content_type check using a minimal
serde struct that only deserializes the one field, and remove the
unused content_type field from ExpandedPerfettoChunk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-server/src/processing/profile_chunks/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/process.rs Outdated
Replace the hard-coded "perfetto" content type in split_item_payload
with a lightweight deserialization that reads only the content_type
field from the metadata JSON. This avoids a silent coupling that would
produce incorrect values if compound items support other formats.

Also remove a dead item.set_platform() call that wrote to an item
immediately replaced by a new one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-profiling/src/perfetto/mod.rs

@Dav1dde Dav1dde left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things in the conversion code. I tried to mostly follow along logically but the interning and things are quite complicated according to the docs, so I mostly left like small improvements there.

For the actual processing pipeline, see my larger comment in the processor, we need to get some type safety in here. It will also help with code duplication of the duplicated split functions.
Let me know if you want me to help out with that.

Comment thread tests/integration/test_profile_chunks_perfetto.py
Comment thread tests/integration/test_profile_chunks_perfetto.py Outdated
Comment thread relay-dynamic-config/src/feature.rs
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-server/src/processing/profile_chunks/mod.rs
Comment thread relay-profiling/protos/generate.sh
Comment thread relay-profiling/src/sample/mod.rs
Comment thread relay-profiling/src/debug_image.rs
Comment thread relay-server/src/envelope/content_type.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs
Comment thread relay-profiling/src/perfetto/mod.rs
Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-profiling/src/debug_image.rs
Comment thread relay-profiling/src/lib.rs

@jjbayer jjbayer left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dav1dde and I will go over open comments to see what still needs to be done before we merge this.

Comment thread relay-profiling/src/perfetto/mod.rs Outdated
}

let image_size = mapping.end.unwrap_or(0).saturating_sub(image_addr);
let image_vmaddr = mapping.load_bias.unwrap_or(0);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DebugImage::image_vmaddr can be None, but here we map None to Some(0). Is this semantically correct? Same question for image_addr.

@markushi markushi Jun 11, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Yes, all perfetto props are optional. mapping.start.unwrap_or(0) or mapping.end.unwrap_or(0) would be incorrect.

message Mapping {
  ...
  optional uint64 start = 4;
  optional uint64 end = 5;
  optional uint64 load_bias = 6;
  ...
}

fn collect_debug_image(
mapping_id: u64,
tables: &InternTables,
seen_images: &mut HashSet<(String, u64)>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it should be possible to represent seen images as references to path vectors instead of allocated Strings, potentially in a newtype with a custom hash or equality implementation.

Comment thread relay-server/src/processing/profile_chunks/process.rs
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment thread relay-profiling/src/perfetto/mod.rs
Comment thread relay-profiling/src/perfetto/mod.rs Outdated
Comment on lines +417 to +419
if image_addr.is_some_and(|image_addr| !seen_images.insert((code_file.clone(), image_addr))) {
return None;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The debug image deduplication check is bypassed when mapping.start is None because is_some_and short-circuits, which can lead to duplicate images in the final profile.
Severity: MEDIUM

Suggested Fix

Modify the deduplication logic to correctly handle cases where the image address is None. One approach is to use unwrap_or(0) on image_addr when creating the key for the seen_images set. This ensures that mappings without a start address are still consistently deduplicated based on their code_file and a default address.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: relay-profiling/src/perfetto/mod.rs#L417-L419

Potential issue: In `relay-profiling/src/perfetto/mod.rs`, the deduplication logic for
debug images is bypassed when a mapping's `start` address is `None`. The code uses
`image_addr.is_some_and(...)` which short-circuits when `image_addr` is `None`,
preventing the `seen_images.insert()` call. According to the proto schema,
`mapping.start` is optional and can be `None`. If a trace contains a mapping without a
start address that is referenced multiple times, a new `DebugImage` will be created and
added to the profile for each reference, resulting in duplicate entries and a bloated
output profile.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit dd8d044. Configure here.


if expanded.payload.len() > ctx.config.max_profile_size() {
return Err(relay_profiling::ProfileError::ExceedSizeLimit.into());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compound profile size limit bypass

Medium Severity

After Perfetto expansion, max_profile_size is applied only to the expanded JSON bytes, not to the rebuilt compound payload that also appends the original binary blob. A chunk within the ingest limit can become roughly twice the configured cap once JSON grows and the raw trace is preserved.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit dd8d044. Configure here.

Comment on lines +251 to +253
Some(Data::ClockSnapshot(cs)) if clock_offset_ns.is_none() => {
clock_offset_ns = extract_clock_offset(&cs);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code only attempts to calculate the clock offset from the first ClockSnapshot. If this snapshot is incomplete, subsequent snapshots are ignored, causing the entire trace conversion to fail.
Severity: HIGH

Suggested Fix

Modify the logic to iterate through all ClockSnapshot packets until a valid offset can be calculated, instead of only trying with the first one. Alternatively, collect and merge clock information across multiple snapshots before attempting to calculate the offset.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: relay-profiling/src/perfetto/mod.rs#L251-L253

Potential issue: The trace conversion logic only attempts to extract the clock offset
from the first `ClockSnapshot` packet encountered. The `extract_clock_offset` function
requires both `CLOCK_REALTIME` and `CLOCK_BOOTTIME` to be present in the same snapshot.
If the first snapshot is incomplete (lacking one of these clocks), `clock_offset_ns`
remains `None`. Because of the `if clock_offset_ns.is_none()` guard, all subsequent
`ClockSnapshot` packets are then ignored, even if they contain the necessary
information. This leads to a failure when the code later requires a valid
`clock_offset_ns`, causing the entire trace processing to fail for valid Perfetto traces
that happen to start with an incomplete clock snapshot.

Comment on lines +182 to +183
self.callstacks
.extend(callstacks.into_iter().filter_map(|c| Some((c.iid?, c))));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded callstack depth enables CPU exhaustion DoS in Perfetto trace parsing

A crafted Perfetto trace can embed a single callstack with ~2 million frame_ids within the allowed 4 MB per-packet limit. When MAX_SAMPLES (100,000) samples all reference this callstack, resolve_callstack hashes a 2M-entry Vec<usize> for every sample via stack_index.get(), yielding O(200 billion) CPU operations from a ~5 MB input on any relay with ContinuousProfilingPerfetto enabled. Add a per-callstack frame depth cap (e.g., 2,000) at merge or resolution time.

Evidence
  • Callstack.frame_ids is Vec<u64> with packed = "false" (proto.rs); a 4 MB packet fits ~2 M frame IDs.
  • InternTables::merge at lines 182–183 calls .extend(callstacks.into_iter()...) with no check on c.frame_ids.len().
  • resolve_callstack (line 346) does Vec::with_capacity(callstack.frame_ids.len()), iterates all frame_ids, then calls ctx.stack_index.get(&resolved_frame_indices) — an O(n) hash over the full Vec.
  • With 100,000 PerfSample packets (each ~4 bytes) pointing to the same deep callstack, total work is O(MAX_SAMPLES × frame_ids.len()) = O(200 B) operations, fitting inside the 50 MiB max_profile_size limit.
  • The feature-flag check in process_compound_item (process.rs line ~123) gates this path, so exploitation requires ContinuousProfilingPerfetto to be enabled — feasible for any project using the new Android Perfetto profiling.

Identified by Warden security-review · WJF-LPZ

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants