Feat/mcp resilience4j#1033
Open
Pratyay wants to merge 1 commit into
Adds a new optional mcp-resilience4j module that wraps any McpClientTransport with configurable Resilience4j policies, making MCP tool calls resilient to transient failures, slow servers, and traffic spikes.

ResilientMcpClientTransport implements McpClientTransport and applies up to five policies in the standard recommended order:

Retry -> CircuitBreaker -> RateLimiter -> TimeLimiter -> Bulkhead

All policies are optional. sendMessage() applies all five. connect() applies only CircuitBreaker and Retry; session establishment is not throttled or timed out.

McpResilienceConfig provides a high-level fluent facade over the builder for the common configuration case. Includes fail-fast null validation on all setters and a WARN log when a registry name collision would silently discard a supplied config.

Circuit breaker state transitions and retry events are logged automatically via Resilience4j event publishers at construction time.

Also includes 13 unit tests and a README covering usage, policy ordering rationale, registry guidance, and observability.

Bumps project version to 2.1.1-SNAPSHOT.
Title: Add mcp-resilience4j module with transport-level resilience
Adds a new mcp-resilience4j module that wraps any McpClientTransport with
configurable Resilience4j policies, making MCP tool calls resilient to transient
failures, slow servers, and traffic spikes.
Motivation and Context
MCP tool calls cross a network. Without resilience, a slow or flaky MCP server
can cause cascading failures in AI agent pipelines: blocking threads indefinitely,
repeatedly hammering a server that cannot recover, or overwhelming a rate-limited
endpoint during a burst of parallel tool invocations.
McpClientTransport is the natural integration point: it is the single boundary
all MCP clients share, it is the interface frameworks like Google ADK expose for
custom transport injection, and wrapping it leaves the rest of the MCP client
stack entirely unchanged.
How Has This Been Tested
13 unit tests covering:
call receives CallNotPermittedException
All 13 tests pass locally (mvn test -pl mcp-resilience4j).
Breaking Changes
None. This is a new optional module. Existing code and dependencies are unchanged.
Types of Changes
Checklist
Additional Context
Policy ordering — Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead
follows the standard Resilience4j recommended hierarchy. Retry is outermost so it
orchestrates the full inner chain per attempt. Bulkhead is innermost so concurrency
slots are released during Retry's backoff sleep rather than held, preventing slot
exhaustion from blocking healthy concurrent callers. RateLimiter is inside Retry so
each retry attempt consumes a token, keeping the local rate count aligned with actual
server-side request volume.
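The nesting described above can be sketched in plain Java without any Resilience4j dependency. This is an illustrative model only, not the module's implementation: PolicyOrderSketch, layer, retry, and run are hypothetical names, and each inner "policy" merely records that it was entered. It shows that with Retry outermost, every attempt re-enters the full inner chain, so the RateLimiter layer is visited once per attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PolicyOrderSketch {
    // Records the order in which layers are entered (illustration only).
    static final List<String> trace = new ArrayList<>();

    // Hypothetical pass-through layer: logs its name, then delegates inward.
    static <T> Supplier<T> layer(String name, Supplier<T> inner) {
        return () -> {
            trace.add(name);
            return inner.get();
        };
    }

    // Hypothetical outermost layer: re-runs the whole inner chain per attempt.
    static <T> Supplier<T> retry(int maxAttempts, Supplier<T> inner) {
        return () -> {
            RuntimeException last = new IllegalStateException("no attempts made");
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                trace.add("Retry#" + attempt);
                try {
                    return inner.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }

    // Builds the chain in the PR's order and runs a call that fails once.
    static List<String> run() {
        trace.clear();
        int[] calls = {0};
        Supplier<String> call = () -> {
            trace.add("call");
            if (++calls[0] == 1) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        };
        Supplier<String> chain =
            retry(3,
                layer("CircuitBreaker",
                    layer("RateLimiter",
                        layer("TimeLimiter",
                            layer("Bulkhead", call)))));
        chain.get();
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the innermost layer is entered and left within a single attempt, which mirrors the point about Bulkhead slots not being held across Retry's backoff.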
sendMessage() applies all five policies. connect() applies only CircuitBreaker and Retry; session establishment is not throttled or timed out.
Why not a client-level wrapper? An earlier design explored wrapping McpAsyncClient
directly. This was removed because McpAsyncClient has a package-private constructor
(not subclassable), and frameworks like Google ADK create McpSyncClient internally
with no injection point for a custom async client. The transport is the only hook
these frameworks expose.
ThreadPoolBulkhead is intentionally excluded. The semaphore Bulkhead is correct
for reactive code. Injecting a ThreadPoolBulkheadOperator would force a thread-pool
handoff inside the reactive chain, competing with Reactor's own schedulers.
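To illustrate the distinction, here is a minimal plain-Java model of a semaphore-style bulkhead. It is a sketch under stated assumptions, not Resilience4j's Bulkhead: SemaphoreBulkheadSketch, execute, and demo are hypothetical names. The point is that admission is a permit check and the work stays on the caller's thread, with no executor involved.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call on the caller's own thread; rejects when no slot is free.
    <T> T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return call.get();
        } finally {
            permits.release(); // slot is freed when the call returns or throws
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }

    // Holds the single slot and attempts a second call while it is held.
    static String demo() {
        SemaphoreBulkheadSketch bulkhead = new SemaphoreBulkheadSketch(1);
        String caller = Thread.currentThread().getName();
        return bulkhead.execute(() -> {
            String worker = Thread.currentThread().getName();
            boolean rejected;
            try {
                bulkhead.execute(() -> "inner");
                rejected = false;
            } catch (IllegalStateException e) {
                rejected = true;
            }
            return (worker.equals(caller) ? "same-thread" : "other-thread")
                + "," + (rejected ? "rejected" : "admitted");
        });
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A thread-pool variant would instead submit the supplier to an executor, which is the handoff the PR avoids inside a reactive chain.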
Registry name collisions. Resilience4j registries silently return a cached
instance when a name already exists, ignoring any supplied config. The builder logs
a WARN when this is detected. Callers sharing a registry across multiple transports
must use unique names per transport.
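The collision behaviour can be modelled with a small name-keyed cache. This is a simplified stand-in written for illustration, not the Resilience4j registry API or the builder's actual detection logic: RegistryCollisionSketch, Breaker, and failureRateThreshold are hypothetical names, and the warning rule (warn whenever the name is already registered) is an assumption. The sketch is single-threaded; the lookup and the warning are not atomic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistryCollisionSketch {
    // Hypothetical stand-in for a named policy instance and its config.
    static final class Breaker {
        final String name;
        final int failureRateThreshold;

        Breaker(String name, int failureRateThreshold) {
            this.name = name;
            this.failureRateThreshold = failureRateThreshold;
        }
    }

    private final Map<String, Breaker> entries = new ConcurrentHashMap<>();
    private int warnings = 0;

    // Returns the cached instance if the name exists; the new config is dropped.
    Breaker breaker(String name, int failureRateThreshold) {
        Breaker existing = entries.get(name);
        if (existing != null) {
            warnings++;
            System.err.println("WARN: '" + name
                + "' already registered; supplied config ignored");
            return existing;
        }
        return entries.computeIfAbsent(name, n -> new Breaker(n, failureRateThreshold));
    }

    int warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        RegistryCollisionSketch registry = new RegistryCollisionSketch();
        Breaker first = registry.breaker("mcp", 50);
        Breaker second = registry.breaker("mcp", 10); // same name, different config
        System.out.println(first == second);
        System.out.println(second.failureRateThreshold);
    }
}
```

Giving each transport its own name (for example a per-server suffix) avoids the shared entry entirely, which is the guidance the PR gives for shared registries.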