[feature][integration][java] Add built-in Google Gemini chat model integration by section9-lab · Pull Request #718 · apache/flink-agents

section9-lab · 2026-05-31T14:06:02Z

[## What is the purpose of the change

Add a dedicated Gemini chat model module (integrations/chat-models/gemini/) using the official google-genai Java SDK (v1.56.0), as proposed in #648.

Key design decisions

Native Gemini protocol: system messages → systemInstruction, roles user/model, tool calls via functionCall/functionResponse parts with native id (no tool_call_id).
Gemini 3 thoughtSignature: captured from response parts and echoed back on multi-turn tool-use turns (required by gemini-3-pro-preview).
Proxy support: base_url parameter routes requests through a local proxy (e.g. CC Switch). When api_key is omitted but base_url is set, a placeholder key satisfies the SDK validation while the proxy injects the real credential.
SDK: com.google.genai:google-genai:1.56.0 — supports both Gemini Developer API (API key) and Vertex AI (follow-up).

Scope

Java side: text + function calling only. Multimodal, streaming, full Vertex AI auth wiring, and the Python counterpart are tracked as follow-ups per #648.

Files changed

File	Purpose
`integrations/chat-models/gemini/pom.xml`	New module POM (parent: chat-models, dep: google-genai SDK)
`GeminiChatModelConnection.java`	Connection class: `chat()`, message/tool/functionCall conversion, thoughtSignature round-trip
`GeminiChatModelSetup.java`	Setup class: defaults (model, temperature, maxOutputTokens), `getParameters()`
`GeminiChatModelConnectionTest.java`	11 unit tests (constructor validation, role mapping, tool-call round-trip, signature preservation)
`GeminiChatModelSetupTest.java`	5 unit tests (defaults, custom params, validation)
`integrations/chat-models/pom.xml`	Register `gemini` module
`integrations/pom.xml`	Add `google.genai.version` property

Testing

16 unit tests, all pass, no API key required (CI-safe).
End-to-end verified against real gemini-3-pro-preview via local proxy: text generation ✅, function calling ✅, thoughtSignature echo ✅, multi-turn tool-result → final answer ✅.

Closes #648

…tegration Add a dedicated Gemini chat model module under integrations/chat-models/gemini/ using the official google-genai Java SDK (v1.56.0). This module handles Gemini's native protocol differences: - System messages passed as systemInstruction (not a system role) - Conversation roles: user/model (assistant maps to model) - Function calls via functionCall/functionResponse parts with native id - Gemini 3 thoughtSignature capture and echo-back for multi-turn tool use - Custom base_url support for proxy-based deployments (e.g. CC Switch) Scope: text + function calling on the Java side. Follow-ups: multimodal, streaming, Vertex AI auth, Python counterpart. Closes apache#648

…t, no blank lines between groups)

weiqingy

Thanks for taking this on! A few questions inline, plus one cross-file finding below.

The module builds (it's in the integrations/chat-models reactor), but I don't see it added to dist/pom.xml — every sibling connector is listed there (ollama, openai, anthropic, azureai, bedrock) so it gets bundled into the dist JARs copied into the Python wheel at python/flink_agents/lib/. Without that entry, GeminiChatModelConnection won't be in the shipped artifact and users can't load it from the released distribution. Would adding the gemini artifact alongside the siblings be the right move here? One thing worth checking when you do: google-genai:1.56.0's POM drags in protobuf-java, guava, google-auth-library-oauth2-http, Apache httpclient, and a few Jackson modules — more transitive weight than the existing connectors carry, so the dist shade config may need a look for relocation/duplicate-class issues.

weiqingy · 2026-05-31T20:21:20Z

+
+            case TOOL:
+                Object name = message.getExtraArgs().get("name");
+                if (name == null) {


The TOOL branch requires name in extraArgs, but the runtime never supplies it. The only producer of TOOL messages is ChatModelAction.java (~line 476), which populates only externalId — every sibling connector reads externalId too (AnthropicChatModelConnection.java:271, OpenAIChatCompletionsUtils.java:96). So any multi-turn tool-calling agent driven through the Flink runtime throws Tool message must have a 'name' in extraArgs for Gemini the moment a tool result is replayed: single-shot chat and the first tool request work, but the tool-result turn fails — and function calling is the headline of #648.

The connector already has what it needs to resolve this without the hard requirement: ToolCallAction sets externalId == original_id, and this connector writes both name and original_id onto the assistant turn's tool-call map in convertFunctionCall (lines 384, 378). chat() receives the full messages list including that prior ASSISTANT turn. Would building an original_id → name map from the assistant turns' toolCalls and looking up by the TOOL message's externalId work here? Part.fromFunctionResponse(name, map) needs the function name specifically (it exists in 1.56.0), which is why externalId alone isn't enough and the name-resolution step is needed. Keeping a clear error only when the name genuinely can't be resolved would preserve the safety net.

One thing I'd love to understand: does the multi-turn tool-result e2e run through ChatModelAction, or through a harness that pre-seeds name? If it's the runtime path I'd have expected this to surface — so I'm probably missing something about how the e2e is wired, and would be glad to be corrected here.

Good catch — fixed in ff6da88. Implemented exactly the lookup you suggested: chat() now scans prior ASSISTANT turns to build a original_id → name map (see buildToolCallIdToNameMap), and the TOOL branch in convertToContent resolves the function name via that map keyed by the runtime-supplied externalId. Explicit name in extraArgs still takes precedence as a safety net; the error is only raised when the name genuinely can't be resolved.

To your last paragraph — you weren't missing anything. The e2e I ran was driving the SDK directly with pre-seeded name, not through ChatModelAction, which is exactly why this hadn't surfaced. Apologies for the gap.

weiqingy · 2026-05-31T20:21:20Z

+    @DisplayName("convertToContent maps TOOL role to a functionResponse part")
+    void testConvertToolMessage() {
+        ChatMessage tool = ChatMessage.tool("sunny, 22C");
+        tool.getExtraArgs().put("name", "get_weather");


This line hand-injects name into extraArgs, but the production runtime never does — it supplies only externalId. Paired with testConvertToolMessageWithoutName (line 124) asserting the throw path, the suite codifies the broken behavior as the contract: both tests pass against a message shape the runtime never emits, so a green suite gives false confidence that runtime tool-calling works (this is what masks the TOOL name issue at GeminiChatModelConnection.java:294).

Would a realistic two-turn test help pin the real contract? Something like, if it's useful: an ASSISTANT message whose toolCalls carry {name, original_id} as produced by convertFunctionCall, followed by a TOOL message carrying only externalId (== that original_id, no name), then assert convertToContent/chat produces a functionResponse with the correct name. A test in that shape would lock the runtime contract in place once the resolution above lands — happy to be redirected if you see a simpler way to cover it.

Fixed in ff6da88. Removed testConvertToolMessageWithoutName (it was locking in the wrong contract) and added two tests in the shape you described:

testRuntimeShapeToolMessageResolvesNameFromExternalId — TOOL message with only externalId and a lookup map; asserts the resulting functionResponse carries the correct name.

testRuntimeShapeMultiTurn — full ASSISTANT ({name, original_id} from convertFunctionCall) → TOOL (only externalId == original_id) round-trip, asserting the recovered name on the model side.

Also kept testConvertToolMessageWithExplicitName for the case where the caller does supply name directly, and added testConvertToolMessageThrowsWhenUnresolvable so the failure case is still locked down — but only when the name is genuinely missing.

weiqingy · 2026-05-31T20:21:20Z

+            builder.temperature(((Number) temperature).floatValue());
+        }
+
+        Object maxOutputTokens = arguments.remove("max_output_tokens");


buildConfig reads temperature and max_output_tokens, but additional_kwargs (top_k, top_p, …) is never read here — even though GeminiChatModelSetup documents and forwards it via getParameters(). The net effect is a user who sets top_k/top_p through the documented additional_kwargs path sees them silently ignored: no error, just wrong sampling behavior. AnthropicChatModelConnection.java:212-217 removes and applies additional_kwargs via applyAdditionalKwargs. Would mirroring that — reading additional_kwargs and mapping known keys (top_k → topK, top_p → topP) onto the builder — be worth doing? GenerateContentConfig.Builder exposes topK/topP in 1.56.0.

Fixed in ff6da88. Added applyAdditionalKwargs mirroring the Anthropic connector: top_k → topK, top_p → topP, stop_sequences → stopSequences. One small wrinkle worth flagging — Gemini's GenerateContentConfig.Builder types both topK and topP as Float (verified via javap on 1.56.0), which is unusual for topK but matches the upstream protocol. Unknown keys are silently ignored rather than rejected, mirroring how the sibling connectors handle forward-compatible parameters. Two new tests cover the forwarding (testApplyAdditionalKwargs) and the ignore-unknown behavior.

weiqingy · 2026-05-31T20:21:20Z

+            recordUsage(result, modelName, response);
+
+            return result;
+        } catch (Exception e) {


The model-name validation IllegalArgumentException at line 182 is thrown inside this try, so it gets re-wrapped into a generic RuntimeException — while the constructor throws IllegalArgumentException directly (line 114). A caller catching IllegalArgumentException to distinguish validation errors will catch the constructor case but not the in-chat model-name case. Is there a reason to keep the contract asymmetric across the two entry points, or would validating the model name before the try (or rethrowing validation IAEs unwrapped) make it consistent?

Fixed in ff6da88. Did both of the things you suggested as a belt-and-braces approach: (1) moved the model validation out of the try block so the IAE is thrown at the same lexical point as the constructor's IAE, and (2) added an explicit catch (IllegalArgumentException) { throw; } clause before the generic catch, so any future validation IAEs from downstream callees (e.g. in convertToContent when a tool-result can't be resolved) also surface unwrapped instead of being re-wrapped into a generic RuntimeException.

weiqingy · 2026-05-31T20:21:21Z

+            builder.httpOptions(httpOptions.build());
+        }
+
+        if (useVertex) {


The Vertex AI constructor branch (and the vertex_ai/project/location params the javadoc advertises) has no test, unlike the other constructor paths. Worth a construction smoke test if Vertex is in scope for #648, or a note that it's a documented-but-untested follow-up?

Took the smoke-test path in ff6da88, with a caveat. Added testConstructorVertexAiIsWired plus a javadoc note that the Vertex branch is wired and smoke-tested at construction but not e2e-tested.

Caveat: the SDK eagerly resolves ADC at Client.build(), which means a CI box without ADC throws while a dev box with gcloud auth succeeds. The test accepts both outcomes — what it actually asserts is that the vertex_ai flag is not silently dropped (which would make the flag dead code). It's a weaker assertion than I'd like, but tightening it requires either an ADC fixture or mocking the SDK builder, both of which felt out of scope for this PR. Happy to take a stronger approach if you have a preference.

dist/pom.xml: - Register the gemini module as a dist dependency so it ships in the release JARs and Python wheel. GeminiChatModelConnection: - Recover the function name in TOOL messages from the prior ASSISTANT turn's tool-call map via externalId, matching how ChatModelAction actually emits tool-result turns (was previously requiring an explicit 'name' that the runtime never supplies). - Forward additional_kwargs (top_k, top_p, stop_sequences) to the Gemini config, mirroring the Anthropic connector. - Surface IllegalArgumentException from the model-name validation unwrapped, consistent with the constructor's error contract. - Clamp the HttpOptions timeout in long arithmetic to avoid int overflow. - Raise an explicit error when the response has no candidates (safety-block / quota) instead of returning an empty assistant message. - Switch the default model to gemini-3.1-pro-preview, since gemini-3-pro-preview is deprecated on the live Google endpoint. Tests: - Replace the test that locked in the broken 'name required' contract with a realistic two-turn shape (ASSISTANT carries tool-call map, TOOL carries only externalId). - Cover the additional_kwargs forwarding path. - Add a smoke test that the Vertex AI builder branch is wired (documented as a build-time check, e2e is a follow-up).

section9-lab · 2026-06-01T07:46:52Z

Thanks for the thorough review @weiqingy! All five inline comments are addressed in ff6da88 — replies inline on each. Two summary items for the top-level comment:

1. `dist/pom.xml` registration

Added gemini to the dist dependency list alongside the other connectors. Verified by clean-rebuilding dist/flink-2.2 and checking the produced JARs:

GeminiChatModelConnection/GeminiChatModelSetup classes are present in both thin and fat jars ✓
com/google/genai/* SDK classes: 3363 ✓
com/google/auth/*: 188 ✓ ; io/grpc/*: 4003 ✓ ; com/google/protobuf/*: 1474 ✓

2. Shade impact (the relocation/duplicate-class question)

Audited the shade output. A few data points:

Fat-jar size. dist/common grows from ~118 MB to ~225 MB (+104 MB). The delta is exactly the transitive set you flagged: google-genai + protobuf-java + grpc + google-auth + Apache httpclient. Not small, but not pathological for an LLM SDK either.

Duplicate-class warnings. Total class-overlap warnings: ~41 (vs. ~38 pre-existing without gemini). The single largest new overlap is at dist/flink-2.x shade time:

flink-agents-dist-common.jar, google-genai-1.56.0.jar define 3323 overlapping classes and resources

This is the same structural pattern that already exists for ollama4j (common + ollama4j → 90 overlapping): dist/flink-2.x shades common (which already contains the transitive deps) AND the transitive deps directly. It's last-write-wins on identical classes, so no runtime breakage — but it does inflate jar size.

No new cross-vendor conflicts. google-genai's transitive guava/jackson/error_prone/etc. all align with versions already in the reactor. NOTICE/LICENSE entries are overlapping (same as every connector) but collapsed by the existing ManifestResourceTransformer.

My suggestion: the cleanup belongs in dist/flink-2.x shade config (exclude transitive-deps-already-in-common from being pulled in twice), and it applies to ollama4j today too — so I'd treat it as a follow-up cleanup rather than gating this PR on it, unless you'd prefer I include it here.

Other changes since the previous push

While addressing the review I also:

Switched the default model from gemini-3-pro-preview to gemini-3.1-pro-preview, because the former is now deprecated on the live Google Developer API (404 "no longer available" when I ran a direct e2e against generativelanguage.googleapis.com).
Clamped the HttpOptions timeout computation in long to avoid int overflow.
Made convertResponse raise an explicit IllegalStateException on empty candidates (safety-block / quota), instead of silently returning an empty assistant message. Matches the Anthropic connector's contract.

E2E

Verified the full chain (text → function call → thoughtSignature echo → tool-result replay → final answer) directly against the real Google Developer API endpoint with gemini-3.1-pro-preview. No proxy in the path this time.

PTAL!

- extractSystemInstruction: use Content.fromParts(Part...) instead of Content.builder().parts(...).build(), since the system instruction doesn't carry a role. - convertResponse: call response.checkFinishReason() so that unexpected finish reasons (SAFETY, MAX_TOKENS, RECITATION, …) raise IllegalArgumentException with the SDK's own message, instead of silently returning a partial/filtered assistant message. The IAE is propagated unwrapped by chat()'s catch block.

…tly ignoring Three small improvements informed by re-reading the function: - default branch (unknown key) and the type-mismatch branches now emit a SLF4J warning instead of dropping the kwarg silently. Reasoning: the Anthropic/OpenAI siblings can transparently forward unknown keys via SDK escape hatches (putAdditionalBodyProperty), but Gemini's GenerateContentConfig.Builder is AutoValue-generated and does not accept arbitrary body fields, so we cannot replicate the same transparency without risking client-level HttpOptions (baseUrl/ timeout) being lost via per-request override. Logging is the zero-risk middle ground: users can see which kwarg was dropped. - Extract the Number-to-Float coercion into applyFloat() so top_k and top_p share one place for type validation and the warn message. - Build the stop_sequences list with a stream pipeline instead of an explicit accumulator loop. Two new tests lock down that unknown keys and type-mismatched values are ignored (do not leak into known fields) and do not throw.

…own keys Re-checked the surrounding modules: all five sibling chat-model connectors (anthropic, openai, azureai, bedrock, ollama) and the rest of integrations/ avoid Logger almost entirely (1 usage across the entire integrations tree). The earlier patch added 3 LOG.warn calls, which would have made this the only Connection in the module to log. Trim to one: keep the warn on the default branch (truly unknown key — the Gemini SDK can't transparently forward those the way Anthropic/ OpenAI do via putAdditionalBodyProperty, so silent drop would be the worst option), drop the warns on type-mismatched values (caller's responsibility to pass the documented type, matches the lenient behavior of the sibling connectors). Drops the applyFloat helper since it's no longer needed at one call site.

…container startup)

weiqingy · 2026-06-02T04:18:53Z

@section9-lab Thanks for addressing all the comments.

Confirming each:

Tool-name resolution — buildToolCallIdToNameMap + resolveToolFunctionName recover the name from externalId, with explicit name as a precedence override and a clear throw only when it's genuinely unresolvable. testRuntimeShapeMultiTurn exercises the full ASSISTANT→TOOL round-trip against the runtime shape — that's the contract locked in.
Test contract — dropping testConvertToolMessageWithoutName and keeping testConvertToolMessageThrowsWhenUnresolvable for the genuinely-missing case is exactly right; the suite now fails if the runtime path regresses.
additional_kwargs — top_k/top_p/stop_sequences forwarded, and warning on dropped unknown keys is a nice touch beyond what I'd asked for. Good call flagging the Float topK — surprising, but it matches the upstream protocol.
IAE contract — validating model before the try plus the explicit catch (IllegalArgumentException) { throw; } makes both entry points symmetric; nice that checkFinishReason()'s IAE surfaces unwrapped too.
Vertex — smoke test + javadoc note is the right scope. Accepting both ADC outcomes while asserting the flag isn't dropped is a reasonable bound for a path that can't run e2e in CI.
dist/pom.xml — gemini is now bundled alongside the siblings, and CI staying green says google-genai's heavier transitive set (protobuf/guava/auth/httpclient) shaded cleanly without duplicate-class fallout. That was my main worry, so 👍.

LGTM from my side.

section9-lab · 2026-06-02T07:20:10Z

Thanks for the thorough review @weiqingy! Could you submit a formal approval via the GitHub review button when you get a chance? I don't have merge access on the repo, so I'd need a committer to approve + merge.

github-actions Bot added doc-label-missing The Bot applies this label either because none or multiple labels were provided. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels May 31, 2026

[hotfix] Fix import order to match spotless check (static imports las…

5a7ad4d

…t, no blank lines between groups)

weiqingy reviewed May 31, 2026

View reviewed changes

section9-lab added 4 commits June 1, 2026 15:59

[ci] Re-trigger CI to retry transient cross-language failure (Milvus …

6d01fc9

…container startup)

Conversation

section9-lab commented May 31, 2026

Key design decisions

Scope

Files changed

Testing

Uh oh!

weiqingy left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

section9-lab commented Jun 1, 2026

1. dist/pom.xml registration

2. Shade impact (the relocation/duplicate-class question)

Other changes since the previous push

E2E

Uh oh!

weiqingy commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

section9-lab commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

weiqingy left a comment •

edited

Loading

1. `dist/pom.xml` registration

weiqingy commented Jun 2, 2026 •

edited

Loading