An alternative TLS implementation to fix a crash on musl/aarch64#600
An alternative TLS implementation to fix a crash on musl/aarch64#600zhengyu123 wants to merge 26 commits into
Conversation
CI Test ResultsRun: #28100339136 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Failed Testsmusl-aarch64/debug / 17-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 11-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 21-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 8-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 25-librcaJob: View logs No detailed failure information available. Check the job logs. Summary: Total: 32 | Passed: 27 | Failed: 5 Updated: 2026-06-24 13:18:34 UTC |
|
Reliability & Chaos Results✅ All reliability & chaos checks passed Pipeline: https://gitlab.ddbuild.io/DataDog/java-profiler/-/pipelines/120757427 |
There was a problem hiding this comment.
Pull request overview
This PR introduces a pthread-key-based ThreadLocal helper as an alternative to C++ thread_local, and migrates LivenessTracker’s per-thread random/subsampling state to use it (motivated by musl/aarch64 TLS issues). It also adds C++ unit tests validating the new TLS wrapper behavior.
Changes:
- Added
ddprof-lib/src/main/cpp/threadLocal.himplementing aThreadLocal<T>wrapper on top ofpthread_key_*. - Updated
LivenessTracker::track()to replacethread_localPRNG/distribution/skipped-state withThreadLocalinstances. - Added
threadLocal_ut.cppgtest coverage forThreadLocalset/get semantics, per-thread isolation, and per-thread cleanup.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
ddprof-lib/src/main/cpp/threadLocal.h |
New pthread-key TLS abstraction (generic + double specialization). |
ddprof-lib/src/main/cpp/livenessTracker.cpp |
Switches liveness subsampling TLS state from thread_local to the new ThreadLocal wrapper. |
ddprof-lib/src/test/cpp/threadLocal_ut.cpp |
Adds gtests validating correctness and cleanup behavior of the new TLS wrapper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13c88912f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1de7f18db7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…r into zgu/tls_replacement
rkennke
left a comment
There was a problem hiding this comment.
There's still the destructor checks that I would like to have added in production code path, plus two (minor) C style casts that should be C++ casts.
rkennke
left a comment
There was a problem hiding this comment.
Thanks for making the changes. I'm approving this now.
You might want to add the else { assert .. } parts in the destructors, and I also did a quick pass and found more places where C-style casts could be done with C++ casts. I leave that up to you.
| ~ThreadLocal() { | ||
| if(_key_valid) { | ||
| pthread_key_delete(_key); | ||
| } |
There was a problem hiding this comment.
Ok good. are you also adding the else { assert ... } part?
| ~ThreadLocal() { | ||
| if(_key_valid) { | ||
| pthread_key_delete(_key); | ||
| } |
There was a problem hiding this comment.
same here: are you also adding the else { assert ... } part?
| return 0.0; | ||
| } | ||
|
|
||
| u64 val = (u64)p; |
There was a problem hiding this comment.
here is another C-style cast that should be reinterpret_cast<u64>(p).
| // std::mt19937 itself is noexcept, but std::random_device and `new` may throw. | ||
| // If that happens we let the failure terminate the process (same outcome as | ||
| // failing thread_local initialization previously). | ||
| return (void*)(new std::mt19937(std::random_device{}())); |
There was a problem hiding this comment.
Could use static_cast<void*>( .. ) here.
| static void* create_uniform_real_distribution() { | ||
| // std::uniform_real_distribution<> construction is noexcept, but `new` may throw. | ||
| // If allocation fails the process is likely to abort anyway. | ||
| return (void*)(new std::uniform_real_distribution<>(0, 1.0)); |
| } | ||
|
|
||
| static void free_mt19937(void* p) { | ||
| std::mt19937* mt = (std::mt19937*)p; |
| } | ||
|
|
||
| static void free_uniform_real_distribution(void* p) { | ||
| std::uniform_real_distribution<>* urd = (std::uniform_real_distribution<>*)p; |
What does this PR do?:
This PR is intended to replace
thread_localvariables (except Otel ones) with an alternative implementation, which is based onpthread_get/setspecific()API.Motivation:
We have encountered a quite few defects related to TLS - whose implementation is compiler specific. In some circumstances, the implementation is very restrictive that does not fit our usage pattern - it manifests in SCP-1238
Additional Notes:
To limit the scope, I only replaced 3 thread locals in
livenessTrackerwhich led to the crash reported in SCP-1238](https://datadoghq.atlassian.net/browse/SCP-1238)Converting
ProfiledThreadto use newly introducedThreadLocalwill be done in separate PR.How to test the change?:
For Datadog employees:
credentials of any kind, I've requested a security review (run the
dd:platform-security-reviewskill, or file a request via the PSEC review form).
bewairealso runs automatically on every PR.Unsure? Have a question? Request a review!