Currently there is no way to modify or override a model's chat template defaults. This would be useful, for example, to disable text output of thinking content.
Ideally this would be configurable on a per-response basis, for example through a template_kwargs: dict = ... parameter on complete_streaming_chat.
This will likely require coordination with downstream dependencies (onnxruntime) to add the necessary plumbing.
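
A rough sketch of what the per-response override could look like. Only complete_streaming_chat and template_kwargs come from the request above; the rest is hypothetical and for illustration only, including the enable_thinking key, MODEL_TEMPLATE_DEFAULTS, render_prompt, and the simplified signature. No real model or runtime is called.

```python
from typing import Any, Iterator, Optional

# Hypothetical per-model template defaults; "enable_thinking" is an assumed key name.
MODEL_TEMPLATE_DEFAULTS: dict[str, Any] = {"enable_thinking": True}


def render_prompt(messages: list[dict[str, str]], **template_kwargs: Any) -> str:
    """Stand-in for a real chat template renderer (hypothetical)."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    if template_kwargs.get("enable_thinking"):
        # Open a thinking block only when the template option is enabled.
        lines.append("assistant: <think>")
    else:
        lines.append("assistant:")
    return "\n".join(lines)


def complete_streaming_chat(
    messages: list[dict[str, str]],
    template_kwargs: Optional[dict[str, Any]] = None,
) -> Iterator[str]:
    """Simplified stand-in: merge per-call overrides over the model defaults."""
    merged = {**MODEL_TEMPLATE_DEFAULTS, **(template_kwargs or {})}
    prompt = render_prompt(messages, **merged)
    # A real implementation would hand the prompt to the runtime and yield
    # generated tokens; this sketch only yields the rendered prompt lines.
    yield from prompt.splitlines()
```

With this shape, a caller could pass template_kwargs={"enable_thinking": False} on a single call to suppress thinking output, while calls that omit the argument keep the model defaults.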