[feature][integration][java] Add built-in Google Gemini chat model integration#718
[feature][integration][java] Add built-in Google Gemini chat model integration#718section9-lab wants to merge 7 commits into
Conversation
…tegration Add a dedicated Gemini chat model module under integrations/chat-models/gemini/ using the official google-genai Java SDK (v1.56.0). This module handles Gemini's native protocol differences: - System messages passed as systemInstruction (not a system role) - Conversation roles: user/model (assistant maps to model) - Function calls via functionCall/functionResponse parts with native id - Gemini 3 thoughtSignature capture and echo-back for multi-turn tool use - Custom base_url support for proxy-based deployments (e.g. CC Switch) Scope: text + function calling on the Java side. Follow-ups: multimodal, streaming, Vertex AI auth, Python counterpart. Closes apache#648
…t, no blank lines between groups)
There was a problem hiding this comment.
Thanks for taking this on! A few questions inline, plus one cross-file finding below.
- The module builds (it's in the
integrations/chat-modelsreactor), but I don't see it added todist/pom.xml— every sibling connector is listed there (ollama, openai, anthropic, azureai, bedrock) so it gets bundled into the dist JARs copied into the Python wheel atpython/flink_agents/lib/. Without that entry,GeminiChatModelConnectionwon't be in the shipped artifact and users can't load it from the released distribution. Would adding the gemini artifact alongside the siblings be the right move here? One thing worth checking when you do:google-genai:1.56.0's POM drags inprotobuf-java,guava,google-auth-library-oauth2-http, Apachehttpclient, and a few Jackson modules — more transitive weight than the existing connectors carry, so the dist shade config may need a look for relocation/duplicate-class issues.
|
|
||
| case TOOL: | ||
| Object name = message.getExtraArgs().get("name"); | ||
| if (name == null) { |
There was a problem hiding this comment.
The TOOL branch requires name in extraArgs, but the runtime never supplies it. The only producer of TOOL messages is ChatModelAction.java (~line 476), which populates only externalId — every sibling connector reads externalId too (AnthropicChatModelConnection.java:271, OpenAIChatCompletionsUtils.java:96). So any multi-turn tool-calling agent driven through the Flink runtime throws Tool message must have a 'name' in extraArgs for Gemini the moment a tool result is replayed: single-shot chat and the first tool request work, but the tool-result turn fails — and function calling is the headline of #648.
The connector already has what it needs to resolve this without the hard requirement: ToolCallAction sets externalId == original_id, and this connector writes both name and original_id onto the assistant turn's tool-call map in convertFunctionCall (lines 384, 378). chat() receives the full messages list including that prior ASSISTANT turn. Would building an original_id → name map from the assistant turns' toolCalls and looking up by the TOOL message's externalId work here? Part.fromFunctionResponse(name, map) needs the function name specifically (it exists in 1.56.0), which is why externalId alone isn't enough and the name-resolution step is needed. Keeping a clear error only when the name genuinely can't be resolved would preserve the safety net.
One thing I'd love to understand: does the multi-turn tool-result e2e run through ChatModelAction, or through a harness that pre-seeds name? If it's the runtime path I'd have expected this to surface — so I'm probably missing something about how the e2e is wired, and would be glad to be corrected here.
There was a problem hiding this comment.
Good catch — fixed in ff6da88. Implemented exactly the lookup you suggested: chat() now scans prior ASSISTANT turns to build a original_id → name map (see buildToolCallIdToNameMap), and the TOOL branch in convertToContent resolves the function name via that map keyed by the runtime-supplied externalId. Explicit name in extraArgs still takes precedence as a safety net; the error is only raised when the name genuinely can't be resolved.
To your last paragraph — you weren't missing anything. The e2e I ran was driving the SDK directly with pre-seeded name, not through ChatModelAction, which is exactly why this hadn't surfaced. Apologies for the gap.
| @DisplayName("convertToContent maps TOOL role to a functionResponse part") | ||
| void testConvertToolMessage() { | ||
| ChatMessage tool = ChatMessage.tool("sunny, 22C"); | ||
| tool.getExtraArgs().put("name", "get_weather"); |
There was a problem hiding this comment.
This line hand-injects name into extraArgs, but the production runtime never does — it supplies only externalId. Paired with testConvertToolMessageWithoutName (line 124) asserting the throw path, the suite codifies the broken behavior as the contract: both tests pass against a message shape the runtime never emits, so a green suite gives false confidence that runtime tool-calling works (this is what masks the TOOL name issue at GeminiChatModelConnection.java:294).
Would a realistic two-turn test help pin the real contract? Something like, if it's useful: an ASSISTANT message whose toolCalls carry {name, original_id} as produced by convertFunctionCall, followed by a TOOL message carrying only externalId (== that original_id, no name), then assert convertToContent/chat produces a functionResponse with the correct name. A test in that shape would lock the runtime contract in place once the resolution above lands — happy to be redirected if you see a simpler way to cover it.
There was a problem hiding this comment.
Fixed in ff6da88. Removed testConvertToolMessageWithoutName (it was locking in the wrong contract) and added two tests in the shape you described:
testRuntimeShapeToolMessageResolvesNameFromExternalId— TOOL message with onlyexternalIdand a lookup map; asserts the resultingfunctionResponsecarries the correct name.testRuntimeShapeMultiTurn— full ASSISTANT ({name, original_id}fromconvertFunctionCall) → TOOL (onlyexternalId == original_id) round-trip, asserting the recovered name on the model side.
Also kept testConvertToolMessageWithExplicitName for the case where the caller does supply name directly, and added testConvertToolMessageThrowsWhenUnresolvable so the failure case is still locked down — but only when the name is genuinely missing.
| builder.temperature(((Number) temperature).floatValue()); | ||
| } | ||
|
|
||
| Object maxOutputTokens = arguments.remove("max_output_tokens"); |
There was a problem hiding this comment.
buildConfig reads temperature and max_output_tokens, but additional_kwargs (top_k, top_p, …) is never read here — even though GeminiChatModelSetup documents and forwards it via getParameters(). The net effect is a user who sets top_k/top_p through the documented additional_kwargs path sees them silently ignored: no error, just wrong sampling behavior. AnthropicChatModelConnection.java:212-217 removes and applies additional_kwargs via applyAdditionalKwargs. Would mirroring that — reading additional_kwargs and mapping known keys (top_k → topK, top_p → topP) onto the builder — be worth doing? GenerateContentConfig.Builder exposes topK/topP in 1.56.0.
There was a problem hiding this comment.
Fixed in ff6da88. Added applyAdditionalKwargs mirroring the Anthropic connector: top_k → topK, top_p → topP, stop_sequences → stopSequences. One small wrinkle worth flagging — Gemini's GenerateContentConfig.Builder types both topK and topP as Float (verified via javap on 1.56.0), which is unusual for topK but matches the upstream protocol. Unknown keys are silently ignored rather than rejected, mirroring how the sibling connectors handle forward-compatible parameters. Two new tests cover the forwarding (testApplyAdditionalKwargs) and the ignore-unknown behavior.
| recordUsage(result, modelName, response); | ||
|
|
||
| return result; | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
The model-name validation IllegalArgumentException at line 182 is thrown inside this try, so it gets re-wrapped into a generic RuntimeException — while the constructor throws IllegalArgumentException directly (line 114). A caller catching IllegalArgumentException to distinguish validation errors will catch the constructor case but not the in-chat model-name case. Is there a reason to keep the contract asymmetric across the two entry points, or would validating the model name before the try (or rethrowing validation IAEs unwrapped) make it consistent?
There was a problem hiding this comment.
Fixed in ff6da88. Did both of the things you suggested as a belt-and-braces approach: (1) moved the model validation out of the try block so the IAE is thrown at the same lexical point as the constructor's IAE, and (2) added an explicit catch (IllegalArgumentException) { throw; } clause before the generic catch, so any future validation IAEs from downstream callees (e.g. in convertToContent when a tool-result can't be resolved) also surface unwrapped instead of being re-wrapped into a generic RuntimeException.
| builder.httpOptions(httpOptions.build()); | ||
| } | ||
|
|
||
| if (useVertex) { |
There was a problem hiding this comment.
The Vertex AI constructor branch (and the vertex_ai/project/location params the javadoc advertises) has no test, unlike the other constructor paths. Worth a construction smoke test if Vertex is in scope for #648, or a note that it's a documented-but-untested follow-up?
There was a problem hiding this comment.
Took the smoke-test path in ff6da88, with a caveat. Added testConstructorVertexAiIsWired plus a javadoc note that the Vertex branch is wired and smoke-tested at construction but not e2e-tested.
Caveat: the SDK eagerly resolves ADC at Client.build(), which means a CI box without ADC throws while a dev box with gcloud auth succeeds. The test accepts both outcomes — what it actually asserts is that the vertex_ai flag is not silently dropped (which would make the flag dead code). It's a weaker assertion than I'd like, but tightening it requires either an ADC fixture or mocking the SDK builder, both of which felt out of scope for this PR. Happy to take a stronger approach if you have a preference.
dist/pom.xml: - Register the gemini module as a dist dependency so it ships in the release JARs and Python wheel. GeminiChatModelConnection: - Recover the function name in TOOL messages from the prior ASSISTANT turn's tool-call map via externalId, matching how ChatModelAction actually emits tool-result turns (was previously requiring an explicit 'name' that the runtime never supplies). - Forward additional_kwargs (top_k, top_p, stop_sequences) to the Gemini config, mirroring the Anthropic connector. - Surface IllegalArgumentException from the model-name validation unwrapped, consistent with the constructor's error contract. - Clamp the HttpOptions timeout in long arithmetic to avoid int overflow. - Raise an explicit error when the response has no candidates (safety-block / quota) instead of returning an empty assistant message. - Switch the default model to gemini-3.1-pro-preview, since gemini-3-pro-preview is deprecated on the live Google endpoint. Tests: - Replace the test that locked in the broken 'name required' contract with a realistic two-turn shape (ASSISTANT carries tool-call map, TOOL carries only externalId). - Cover the additional_kwargs forwarding path. - Add a smoke test that the Vertex AI builder branch is wired (documented as a build-time check, e2e is a follow-up).
|
Thanks for the thorough review @weiqingy! All five inline comments are addressed in ff6da88 — replies inline on each. Two summary items for the top-level comment: 1.
|
- extractSystemInstruction: use Content.fromParts(Part...) instead of Content.builder().parts(...).build(), since the system instruction doesn't carry a role. - convertResponse: call response.checkFinishReason() so that unexpected finish reasons (SAFETY, MAX_TOKENS, RECITATION, …) raise IllegalArgumentException with the SDK's own message, instead of silently returning a partial/filtered assistant message. The IAE is propagated unwrapped by chat()'s catch block.
…tly ignoring Three small improvements informed by re-reading the function: - default branch (unknown key) and the type-mismatch branches now emit a SLF4J warning instead of dropping the kwarg silently. Reasoning: the Anthropic/OpenAI siblings can transparently forward unknown keys via SDK escape hatches (putAdditionalBodyProperty), but Gemini's GenerateContentConfig.Builder is AutoValue-generated and does not accept arbitrary body fields, so we cannot replicate the same transparency without risking client-level HttpOptions (baseUrl/ timeout) being lost via per-request override. Logging is the zero-risk middle ground: users can see which kwarg was dropped. - Extract the Number-to-Float coercion into applyFloat() so top_k and top_p share one place for type validation and the warn message. - Build the stop_sequences list with a stream pipeline instead of an explicit accumulator loop. Two new tests lock down that unknown keys and type-mismatched values are ignored (do not leak into known fields) and do not throw.
…own keys Re-checked the surrounding modules: all five sibling chat-model connectors (anthropic, openai, azureai, bedrock, ollama) and the rest of integrations/ avoid Logger almost entirely (1 usage across the entire integrations tree). The earlier patch added 3 LOG.warn calls, which would have made this the only Connection in the module to log. Trim to one: keep the warn on the default branch (truly unknown key — the Gemini SDK can't transparently forward those the way Anthropic/ OpenAI do via putAdditionalBodyProperty, so silent drop would be the worst option), drop the warns on type-mismatched values (caller's responsibility to pass the documented type, matches the lenient behavior of the sibling connectors). Drops the applyFloat helper since it's no longer needed at one call site.
…container startup)
|
@section9-lab Thanks for addressing all the comments. Confirming each:
LGTM from my side. |
|
Thanks for the thorough review @weiqingy! Could you submit a formal approval via the GitHub review button when you get a chance? I don't have merge access on the repo, so I'd need a committer to approve + merge. |
[## What is the purpose of the change
Add a dedicated Gemini chat model module (
integrations/chat-models/gemini/) using the officialgoogle-genaiJava SDK (v1.56.0), as proposed in #648.Key design decisions
systemInstruction, rolesuser/model, tool calls viafunctionCall/functionResponseparts with nativeid(notool_call_id).thoughtSignature: captured from response parts and echoed back on multi-turn tool-use turns (required by gemini-3-pro-preview).base_urlparameter routes requests through a local proxy (e.g. CC Switch). Whenapi_keyis omitted butbase_urlis set, a placeholder key satisfies the SDK validation while the proxy injects the real credential.com.google.genai:google-genai:1.56.0— supports both Gemini Developer API (API key) and Vertex AI (follow-up).Scope
Java side: text + function calling only. Multimodal, streaming, full Vertex AI auth wiring, and the Python counterpart are tracked as follow-ups per #648.
Files changed
integrations/chat-models/gemini/pom.xmlGeminiChatModelConnection.javachat(), message/tool/functionCall conversion, thoughtSignature round-tripGeminiChatModelSetup.javagetParameters()GeminiChatModelConnectionTest.javaGeminiChatModelSetupTest.javaintegrations/chat-models/pom.xmlgeminimoduleintegrations/pom.xmlgoogle.genai.versionpropertyTesting
gemini-3-pro-previewvia local proxy: text generation ✅, function calling ✅, thoughtSignature echo ✅, multi-turn tool-result → final answer ✅.Closes #648