feat: OTel context sharing#3970
Conversation
|
Benchmarks [ profiler ]Benchmark execution time: 2026-06-24 21:06:32 Comparing candidate commit 99bb029 in PR branch Found 0 performance improvements and 3 performance regressions! Performance is the same for 25 metrics, 8 unstable metrics.
|
Benchmarks [ tracer ]Benchmark execution time: 2026-06-24 22:07:28 Comparing candidate commit 99bb029 in PR branch Found 3 performance improvements and 8 performance regressions! Performance is the same for 183 metrics, 0 unstable metrics.
|
…ntext # Conflicts: # Cargo.lock # Makefile
…trace-php into levi/otel-thread-context
Benchmarks [ appsec ]Benchmark execution time: 2026-06-24 21:31:00 Comparing candidate commit 99bb029 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 259658d91b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| span->root = DDTRACE_G(active_stack)->root_span; | ||
|
|
||
| ddtrace_set_global_span_properties(span); | ||
| ddtrace_update_otel_thread_context(); |
There was a problem hiding this comment.
Refresh OTel context after trace ID changes
Updating the TLS record only on span open/close/switch leaves it stale when an existing root span's trace ID is replaced later, e.g. by DDTrace\consume_distributed_tracing_headers() or DDTrace\set_distributed_tracing_context(), which mutate root_span->trace_id and only call ddtrace_update_root_id_properties(). In requests that consume distributed headers after the autoroot span already exists and do not open another span, OTel readers keep seeing the originally generated trace ID until the next span transition.
Useful? React with 👍 / 👎.
| listener->finish_user_req(listener, &span->std); | ||
| } | ||
| #ifdef __linux__ | ||
| ddtrace_clear_otel_thread_context_root_span(); |
There was a problem hiding this comment.
Only clear the matching user-request override
This clears the global OTel root override regardless of which user-request span is finishing. If a second notify_start() begins before the first span is closed, the AppSec listener finishes/replaces the previous request but the first span still has notify_user_req_end; closing that older span later calls this path and detaches the override for the newer active user request, so subsequent OTel/profiling context falls back to the outer stack.
Useful? React with 👍 / 👎.
bwoebi
left a comment
There was a problem hiding this comment.
I think the approach proposed by this PR is flawed / more expensive than needed.
The PR tries to synchronize state between otel context and root span.
The otel context is, at it's core, a pointer to some struct.
All we need is making the otel context part of the root_span_data struct.
Any updates to the trace_id / the roots span id -> update the local root span id / trace id part of the context, any updates to active -> update the span_id of the context.
And if there's no root span data, we just write NULL to the TLS variable.
I'm very not fond of "synchronizing" with otel context, any updates to the otel context should in my opinion simply happen in-place. There's also no reason for a "root span override".
It feels also way heavier than it should be. Updating a span id due to open/close/drop should be one 64 bit write. That's it.
Switching the span stack should cause one pointer write of the span stacks root span otel context to the TLS pointer. That's it, too.
Further, this PR at the very least forgot to handle ddtrace_update_root_id_properties().
|
I also don't understand why this functionality depends on libdatadog. The code should be IMHO simpler and less LoC if we just manage this manually. |
Description
This adds OTel context sharing using libdatadog. It adds it to both tracing (producer) and profiling (consumer).
This is mostly AI-written with a little human review as I went. It is not ready for merge, but it did pass some local end-to-end experimentation.
Reviewer checklist