Skip to content

feat: OTel context sharing#3970

Open
morrisonlevi wants to merge 16 commits into
masterfrom
levi/otel-thread-context
Open

feat: OTel context sharing#3970
morrisonlevi wants to merge 16 commits into
masterfrom
levi/otel-thread-context

Conversation

@morrisonlevi

@morrisonlevi morrisonlevi commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Description

This adds OTel context sharing using libdatadog. It adds it to both tracing (producer) and profiling (consumer).

This is mostly AI-written with a little human review as I went. It is not ready for merge, but it did pass some local end-to-end experimentation.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@morrisonlevi morrisonlevi added ☠️ do-not-merge/WIP profiling Relates to the Continuous Profiler tracing labels Jun 10, 2026
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 10, 2026

Copy link
Copy Markdown

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 11 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | ASAN test_c with multiple observers: [8.5]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-php | ASAN test_c: [8.1, arm64]   View in Datadog   GitLab

DataDog/apm-reliability/dd-trace-php | check-big-regressions   View in Datadog   GitLab

View all 11 failed jobs.

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🔄 Datadog auto-retried 1 job - 1 passed on retry View in Datadog

🎯 Code Coverage (details)
Patch Coverage: 0.00%
Overall Coverage: 54.08% (-0.00%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 99bb029 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter

pr-commenter Bot commented Jun 10, 2026

Copy link
Copy Markdown

Benchmarks [ profiler ]

Benchmark execution time: 2026-06-24 21:06:32

Comparing candidate commit 99bb029 in PR branch levi/otel-thread-context with baseline commit 81891ec in branch master.

Found 0 performance improvements and 3 performance regressions! Performance is the same for 25 metrics, 8 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:php-profiler-timeline-memory-control

  • 🟥 cpu_user_time [+30.797ms; +38.178ms] or [+5.143%; +6.376%]
  • 🟥 execution_time [+35.679ms; +38.622ms] or [+5.701%; +6.171%]

scenario:php-profiler-timeline-memory-with-profiler

  • 🟥 execution_time [+31.618ms; +55.656ms] or [+3.028%; +5.329%]

@pr-commenter

pr-commenter Bot commented Jun 10, 2026

Copy link
Copy Markdown

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-24 22:07:28

Comparing candidate commit 99bb029 in PR branch levi/otel-thread-context with baseline commit 81891ec in branch master.

Found 3 performance improvements and 8 performance regressions! Performance is the same for 183 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:ContextPropagationBench/benchExtractHeaders64Bit

  • 🟥 execution_time [+108.858ns; +197.142ns] or [+9.041%; +16.374%]

scenario:ContextPropagationBench/benchExtractHeaders64Bit-opcache

  • 🟥 execution_time [+60.976ns; +199.024ns] or [+3.053%; +9.966%]

scenario:ContextPropagationBench/benchExtractTraceContext128Bit

  • 🟥 execution_time [+120.378ns; +199.622ns] or [+6.618%; +10.974%]

scenario:ContextPropagationBench/benchExtractTraceContext128Bit-opcache

  • 🟥 execution_time [+78.666ns; +167.334ns] or [+4.209%; +8.953%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization

  • 🟩 execution_time [-8.590µs; -7.670µs] or [-7.697%; -6.873%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟩 execution_time [-5.701µs; -4.099µs] or [-5.102%; -3.668%]

scenario:SpanBench/benchOpenTelemetryAPI

  • 🟥 mem_peak [+4.487MB; +4.487MB] or [+9.412%; +9.412%]

scenario:SpanBench/benchOpenTelemetryAPI-opcache

  • 🟥 mem_peak [+4.485MB; +4.485MB] or [+10.030%; +10.030%]

scenario:SpanBench/benchOpenTelemetryInteroperability

  • 🟥 mem_peak [+643.602KB; +643.609KB] or [+2.233%; +2.233%]

scenario:SpanBench/benchOpenTelemetryInteroperability-opcache

  • 🟥 mem_peak [+641.570KB; +641.582KB] or [+2.478%; +2.478%]

scenario:TraceSerializationBench/benchSerializeTrace-opcache

  • 🟩 execution_time [-77.025µs; -46.975µs] or [-4.948%; -3.017%]

@pr-commenter

pr-commenter Bot commented Jun 23, 2026

Copy link
Copy Markdown

Benchmarks [ appsec ]

Benchmark execution time: 2026-06-24 21:31:00

Comparing candidate commit 99bb029 in PR branch levi/otel-thread-context with baseline commit 81891ec in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@cataphract cataphract marked this pull request as ready for review June 24, 2026 12:56
@cataphract cataphract requested review from a team as code owners June 24, 2026 12:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 259658d91b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tracer/span.c Outdated
span->root = DDTRACE_G(active_stack)->root_span;

ddtrace_set_global_span_properties(span);
ddtrace_update_otel_thread_context();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refresh OTel context after trace ID changes

Updating the TLS record only on span open/close/switch leaves it stale when an existing root span's trace ID is replaced later, e.g. by DDTrace\consume_distributed_tracing_headers() or DDTrace\set_distributed_tracing_context(), which mutate root_span->trace_id and only call ddtrace_update_root_id_properties(). In requests that consume distributed headers after the autoroot span already exists and do not open another span, OTel readers keep seeing the originally generated trace ID until the next span transition.

Useful? React with 👍 / 👎.

Comment thread tracer/user_request.c
listener->finish_user_req(listener, &span->std);
}
#ifdef __linux__
ddtrace_clear_otel_thread_context_root_span();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Only clear the matching user-request override

This clears the global OTel root override regardless of which user-request span is finishing. If a second notify_start() begins before the first span is closed, the AppSec listener finishes/replaces the previous request but the first span still has notify_user_req_end; closing that older span later calls this path and detaches the override for the newer active user request, so subsequent OTel/profiling context falls back to the outer stack.

Useful? React with 👍 / 👎.

@bwoebi bwoebi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach proposed by this PR is flawed / more expensive than needed.

The PR tries to synchronize state between otel context and root span.
The otel context is, at it's core, a pointer to some struct.

All we need is making the otel context part of the root_span_data struct.
Any updates to the trace_id / the roots span id -> update the local root span id / trace id part of the context, any updates to active -> update the span_id of the context.
And if there's no root span data, we just write NULL to the TLS variable.

I'm very not fond of "synchronizing" with otel context, any updates to the otel context should in my opinion simply happen in-place. There's also no reason for a "root span override".
It feels also way heavier than it should be. Updating a span id due to open/close/drop should be one 64 bit write. That's it.
Switching the span stack should cause one pointer write of the span stacks root span otel context to the TLS pointer. That's it, too.

Further, this PR at the very least forgot to handle ddtrace_update_root_id_properties().

@bwoebi

bwoebi commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

I also don't understand why this functionality depends on libdatadog. The code should be IMHO simpler and less LoC if we just manage this manually.

@morrisonlevi morrisonlevi requested a review from a team as a code owner June 24, 2026 18:27
@morrisonlevi morrisonlevi requested review from leoromanovsky and typotter and removed request for a team June 24, 2026 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☠️ do-not-merge/WIP profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants