Allow OpenTelemetry*Client.JsonSerializerOptions to control full OTel message formatting#7558
Allow OpenTelemetry*Client.JsonSerializerOptions to control full OTel message formatting#7558jozkee wants to merge 2 commits into
Conversation
… message formatting
Change the default JsonSerializerOptions on OpenTelemetryChatClient and
OpenTelemetryRealtimeClient/Session from AIJsonUtilities.DefaultOptions
to OtelMessageSerializer.DefaultOptions (OtelContext-based, snake_case,
WriteIndented=true). The provided options now control the entire
serialization pipeline, including the final JSON output, not just
unknown content type resolution.
Users can now customize formatting (e.g. compact JSON) by copying the
default options and overriding the desired properties:
client.JsonSerializerOptions = new JsonSerializerOptions(
client.JsonSerializerOptions) { WriteIndented = false };
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the OpenTelemetry AI clients’ telemetry serialization so that user-supplied JsonSerializerOptions controls the full JSON formatting of gen_ai.input.messages / gen_ai.output.messages (including the final JSON output), rather than only influencing fallback serialization for unknown content types.
Changes:
- Switch default serialization options for OpenTelemetry chat + realtime telemetry from
AIJsonUtilities.DefaultOptionstoOtelMessageSerializer.DefaultOptions. - Enforce that assigned
JsonSerializerOptionsincludes the required OTel resolver (throwingArgumentExceptionotherwise). - Add/adjust tests to validate immutability, validation behavior, and
WriteIndentedformatting control.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/OpenTelemetryRealtimeClientTests.cs | Updates tests to provide valid serializer options and adds coverage for immutability/validation/formatting. |
| test/Libraries/Microsoft.Extensions.AI.Tests/ChatCompletion/OpenTelemetryChatClientTests.cs | Updates tests to chain the test resolver correctly and adds coverage for immutability/validation/formatting. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/OpenTelemetryRealtimeClientSession.cs | Uses configured options for all OTel message JSON serialization in realtime session telemetry. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/OpenTelemetryRealtimeClient.cs | Aligns default options + validates assigned options, and flows options into created sessions. |
| src/Libraries/Microsoft.Extensions.AI/Common/OtelMessageSerializer.cs | Adds validation helper and updates serialization to honor provided options end-to-end. |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/OpenTelemetryChatClient.cs | Uses configured options for OTel message JSON formatting and validates assigned options. |
| private string SerializeMessage(RealtimeOtelMessage message) | ||
| { | ||
| return JsonSerializer.Serialize(new[] { message }, OtelContext.Default.IEnumerableRealtimeOtelMessage); | ||
| return JsonSerializer.Serialize(new[] { message }, _jsonSerializerOptions.GetTypeInfo(typeof(IEnumerable<RealtimeOtelMessage>))); | ||
| } |
There was a problem hiding this comment.
Agreed this is a real gap. SerializeMessage/SerializeMessages can run from the non-streaming path before the options are frozen (today they are only made read-only in GetStreamingResponseAsync), so a caller could mutate JsonSerializerOptions after the first serialization and get inconsistent telemetry, and it is not safe under concurrency. Suggest freezing at the point of first serialization (for example call MakeReadOnly() at the top of SerializeMessage/SerializeMessages, or freeze once in the setter) so the options stay immutable for the lifetime of any emitted span.
| private string SerializeMessages(IEnumerable<RealtimeOtelMessage> messages) | ||
| { | ||
| return JsonSerializer.Serialize(messages, OtelContext.Default.IEnumerableRealtimeOtelMessage); | ||
| return JsonSerializer.Serialize(messages, _jsonSerializerOptions.GetTypeInfo(typeof(IEnumerable<RealtimeOtelMessage>))); | ||
| } |
There was a problem hiding this comment.
Same root cause as the SerializeMessage comment above. SerializeMessages also reads _jsonSerializerOptions without ensuring it is read-only first, so options can be mutated between operations and produce inconsistent output. Whatever freezing approach we pick should cover both methods (freezing once on first use, or in the setter, would handle both at once).
| Throw.ArgumentException( | ||
| nameof(options), | ||
| "The provided JsonSerializerOptions is missing the required OpenTelemetry type resolver."); | ||
| } |
There was a problem hiding this comment.
Good point, the message is not actionable as written. Since the OtelContext resolver is internal, a user has no obvious way to satisfy the requirement starting from a blank JsonSerializerOptions. Suggest pointing them at the supported path, for example: "The provided JsonSerializerOptions is missing the required OpenTelemetry type resolver. Copy the client's existing JsonSerializerOptions (new JsonSerializerOptions(client.JsonSerializerOptions)) and override only the properties you need." That matches the pattern already shown in the new XML doc remarks.
🎉 Good job! The coverage increased 🎉
Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1459916&view=codecoverage-tab |
| /// Validates that the given options include the required OTel type resolver. | ||
| /// </summary> | ||
| internal static void ThrowIfMissingOtelResolver(JsonSerializerOptions options) | ||
| { |
There was a problem hiding this comment.
Design question: is the breaking change worth it, or could we avoid it entirely? Rejecting options that lack the resolver is what makes this source/binary breaking, since the setter previously accepted any options object silently. An alternative is to repair instead of reject: if the OtelContext resolver is missing from the supplied options, add it to the TypeInfoResolverChain (cloning first if the instance is read-only) rather than throwing. That keeps previously valid callers working, still honors the user formatting (the #7514 fix), and guarantees conformance because the resolver is always present. The tradeoff is that silently augmenting a caller-provided options object is its own minor surprise. If we deliberately want the strictness, that is fine, but it would be good to call out the breaking change explicitly in the PR description and changelog and confirm it is acceptable for this type.
There was a problem hiding this comment.
Thanks. This begs the question, should we repair also the PropertyNamingPolicy to always be SnakeCaseLower or were you suggesting just repairing the OtelContext and throw if PropertyNamingPolicy != SnakeCaseLower?
There was a problem hiding this comment.
Great question. I dug into this and the short version is that just repairing the OtelContext resolver is not enough, and repairing only PropertyNamingPolicy still leaves gaps.
The key finding is that OtelContext's source-gen settings are not baked into the property metadata when its resolver is reused on a different JsonSerializerOptions instance. The wire format follows the live options. A quick repro with a snake_case context:
- OtelContext.Default.Options: {"content":"hello","finish_reason":"stop"}
- fresh options + resolver, PropertyNamingPolicy = null: {"Type":null,"Content":"hello","FinishReason":"stop"}
- fresh options + resolver, PropertyNamingPolicy = CamelCase: {"type":null,"content":"hello","finishReason":"stop"}
- default encoder vs UnsafeRelaxedJsonEscaping changes escaping of non-ASCII and < > + characters.
So three settings are conformance critical and all come from the live options: PropertyNamingPolicy (must be SnakeCaseLower, otherwise the gen_ai attribute names are wrong), DefaultIgnoreCondition (WhenWritingNull, otherwise null parts leak in), and Encoder (UnsafeRelaxedJsonEscaping for readable UTF-8). WriteIndented is the only one that is safe to vary, and it is the actual ask in #7514.
That makes full silent repair unattractive: we would have to clone the supplied options (they are read-only), inject the resolver, and force three of the user's settings while keeping only WriteIndented. Since this is a get/set property, get would then return something different from what was set, and we would be silently discarding most of what the caller passed. That is more code and more surprising semantics than the break it avoids.
On whether avoiding the breaking change is worth it: the break is behavioral only (the property signature is unchanged), it fails loudly and early at the setter rather than corrupting data, it lands on a niche audience (only people who set this property today), and the behavior it replaces is the bug we are fixing. So I would lean toward not contorting the design to preserve it, and instead documenting it as an intentional low severity break.
My suggestions, in order of preference:
-
If an API adjustment is acceptable, make invalid states unrepresentable instead of validating a free-form options object. For example a configure callback Action that receives a clone of DefaultOptions (re-asserting the critical settings after it runs), or a small targeted knob such as a WriteIndented/compact toggle. This removes the repair-vs-throw question entirely and directly serves Allow controlling JSON formatting for gen_ai input/output message telemetry #7514.
-
If we keep the property, validate and throw, but strengthen the check beyond the resolver to also require PropertyNamingPolicy == SnakeCaseLower (the one that actually corrupts the convention's attribute names), and ideally DefaultIgnoreCondition and Encoder too. Pair it with the actionable "copy the existing options" message.
I would avoid the silent repair path: it is the only option that fully prevents the break, but it costs the most complexity and the least predictable behavior.
| @@ -36,8 +50,10 @@ private static JsonSerializerOptions CreateDefaultOptions() | |||
| } | |||
|
|
|||
| internal static string SerializeChatMessages( | |||
There was a problem hiding this comment.
While we are here: two other OTel middlewares call this same SerializeChatMessages but do not pass an options argument, so they always fall back to DefaultOptions and give the user no way to customize telemetry formatting (the same limitation #7514 describes for chat/realtime):
- OpenTelemetrySpeechToTextClient (SerializeChatMessages([new(ChatRole.Assistant, response.Contents)]))
- OpenTelemetryImageGenerator (two call sites, for input and output content)
Neither client exposes a JsonSerializerOptions property at all. Should this PR (or a follow-up) extend the same treatment to them, adding the property and flowing it through as options: _jsonSerializerOptions, so the behavior is consistent across all OTel clients that emit gen_ai message JSON? If it is out of scope here, a tracking issue would be good so they do not drift.
tarekgh
left a comment
There was a problem hiding this comment.
Left a few comments/questions. It will be good to address those in addition to copilot comments too before merging. LGTM, otherwise.
| /// </summary> | ||
| internal static void ThrowIfMissingOtelResolver(JsonSerializerOptions options) | ||
| { | ||
| if (!options.TypeInfoResolverChain.Any(r => r is OtelContext)) |
There was a problem hiding this comment.
I think this check is a bit too aggressive and I'm not aware of precedent elsewhere. Why do we need OtelContext specifically to present in the chain? It may not matter at all if OtelContext sits at the end of the chain with DefaultJsonTypeInfoResolver at the front (it resolves all types, so the serializer would never fall back to OtelContext).
There was a problem hiding this comment.
It may not matter at all if OtelContext sits at the end of the chain
Good point, it won't hold in that case.
For context, OtelContext is internal and is needed for AOT serialization of OTel message-part POCOs. The goal of the change is to allow some user-options to control serialization, so we can either:
a. Reject user options that wouldn't produce otel-conformant json, that is require snake-case and OtelContext first in the resolver chain, "first" is the simple, enforceable approximation that guarantees correctness at the cost of rejecting some valid-but-unusual chains. The exception message will tell users to copy the existing options (new JsonSerializerOptions(otelClient.JsonSerializerOptions)) and then override the properties they need.
b. Copy a curated set of user knobs onto our fixed options (e.g. WriteIndented, Encoder) and ignore the rest. This never throws, but silently dropping options the user set is its own surprise as @tarekgh mentioned, which is why I leaned toward throwing.
There was a problem hiding this comment.
c. Do nothing: accept any options, document the guidance. No validation, no throwing, no repairing. The remarks advise copying the default and overwriting only what you need. If a user passes options that break conformance, that's on them.
There was a problem hiding this comment.
a. and c. would be breaking in different ways:
(a) Reject — API/compile/throw break. Previously-accepted options now throw ArgumentException from the setter. That's a source/binary behavioral break: code that compiled and ran fine now fails. Clearly breaking.
(c) Do nothing — behavioral break, not an API break. The setter still accepts everything (no throw, same signature), so nothing breaks at compile/bind time. But options that used to be ignored for the envelope now control it, for example, a user passing new() will see errors in AOT for the missing OtelContext.
There was a problem hiding this comment.
Good catch, agreed, the check targets the wrong thing. Presence of OtelContext does not guarantee it is used (a DefaultJsonTypeInfoResolver at the front resolves everything first), and it is not necessary either, since any resolver that can resolve the OTel types works. GetTypeInfo already throws on its own when nothing resolves the types, so the guard adds no reliable value. I will drop it. Note too that the conformance critical bits (PropertyNamingPolicy, DefaultIgnoreCondition, Encoder) come from the live options, not the resolver, so if we want to guarantee correct telemetry the better levers are validating those settings or narrowing the API. Happy to go whichever way @eiriktsarpalis and @jozkee prefer.
There was a problem hiding this comment.
that is require snake-case and OtelContext first in the resolver chain
Users can always assign a new JSO to the setter but then modify its settings while it is still mutable. So I don't see how validation at the setter would prevent anything. You might consider running validation when you first try to serialize after the JSO has been marked read-only.
Can I ask why you specifically are looking for OtelContext? I don't think it's a good idea to demand the presence of what amounts to internal implementation detail and users cannot directly control. I would suggest instead to examine specific structural properties of the resultant JsonTypeInfo for the types you're interested in an validate on those.
There was a problem hiding this comment.
why you specifically are looking for OtelContext?
As a convenient way of spotting JSO not created with the copy ctor, also for AOT, if we don't do the check, we will get a cryptic InvalidOperationException: "JsonSerializerOptions instance must specify a TypeInfoResolver setting before being marked as read-only."
I would suggest instead to examine specific structural properties of the resultant JsonTypeInfo for the types you're interested in an validate on those
I will give it a try, but I suspect that may be too brittle.
User-supplied JsonSerializerOptions now flow into full OpenTelemetry gen-ai message serialization. To avoid emitting non-conformant JSON, the options are validated and frozen on first use (before MakeReadOnly): the snake_case naming policy must be JsonNamingPolicy.SnakeCaseLower and the built-in OpenTelemetry type resolver must be first in the TypeInfoResolverChain so a reflection-based resolver cannot shadow it. Non-conformant options throw InvalidOperationException with an actionable copy-and-override hint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
OpenTelemetry clients hardcode serialization options for
gen_ai.input.messagesandgen_ai.output.messages, ignoring the user-supplied JsonSerializerOptions.This PR changes the default JsonSerializerOptions on OpenTelemetryChatClient and OpenTelemetryRealtimeClient/Session from AIJsonUtilities.DefaultOptions to OtelMessageSerializer.DefaultOptions (the previously hardcoded). The provided options now control the entire serialization pipeline, including the final JSON output, not just unknown content type resolution.
Users could now customize formatting (e.g. compact JSON) by copying the default options and overriding the desired properties:
Breaking change
Before this fix, the setter on JsonSerializerOptions accepted any valid options object silently, even ones missing the OTel resolver. Those options were only used for a narrow fallback (unknown AIContent subtype resolution), so the missing resolver was harmless; the final serialization always used the hardcoded DefaultOptions.
After this fix, the setter throws ArgumentException if the supplied options don't contain the OtelContext resolver in their TypeInfoResolverChain.
If acceptable,
fixes #7514.