PLEX-3032: avoid context deadline when action returns all different p…#22647

Merged
ilija42 merged 3 commits into develop from PLEX-3032
May 27, 2026

Conversation

@fernandezlautaro fernandezlautaro commented May 26, 2026

…ayloads and no quorum is reached

https://smartcontract-it.atlassian.net/browse/PLEX-3032

We noticed that the WR on andesite was not working as expected and was returning the following to the user:

context done before remote client received a quorum of responses
context deadline exceeded

This error was misleading: at that time the engine had received all responses, but due to a bug in the action/WR all payloads were different, which made aggregation impossible.
This PR fixes the scenario by returning early, even while payloads are still incoming, as soon as quorum becomes unreachable, so the user gets a more accurate error.
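
The early-return condition can be illustrated with a small sketch. This is not the code from this PR: `quorumReachable`, its signature, and the use of plain strings as payloads are assumptions made only to show the arithmetic (best matching count plus still-pending peers compared against the required count).

```go
package main

import "fmt"

// quorumReachable reports whether any single payload can still collect
// `required` matching responses. counts maps payload -> number of peers that
// returned it; pending is the number of peers that have not responded yet.
// All names here are hypothetical and only illustrate the arithmetic.
func quorumReachable(counts map[string]int, pending, required int) bool {
	best := 0
	for _, n := range counts {
		if n > best {
			best = n
		}
	}
	return best+pending >= required
}

func main() {
	// 4 peers, 2 matching payloads required.
	// Three peers answered with three different payloads; one is pending.
	counts := map[string]int{"a": 1, "b": 1, "c": 1}
	fmt.Println(quorumReachable(counts, 1, 2)) // true: the last peer may still match one

	// The fourth peer answers with yet another payload: nothing is pending.
	counts["d"] = 1
	fmt.Println(quorumReachable(counts, 0, 2)) // false: fail now instead of waiting for the deadline
}
```

Under these assumptions, the second call is the situation described above: every response has arrived, no payload has two matches, and waiting for the context deadline cannot change the outcome.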

Requires

Supports

@github-actions

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions Bot commented May 26, 2026

✅ No conflicts with other open PRs targeting develop

@fernandezlautaro fernandezlautaro marked this pull request as ready for review May 26, 2026 16:13
@fernandezlautaro fernandezlautaro requested a review from a team as a code owner May 26, 2026 16:13
Copilot AI review requested due to automatic review settings May 26, 2026 16:13
@fernandezlautaro fernandezlautaro requested a review from a team as a code owner May 26, 2026 16:13
Copilot AI left a comment

Pull request overview

Risk Rating: MEDIUM — Changes alter quorum/termination behavior in the remote executable request aggregation path and user-visible error reporting.

This PR improves the remote executable client’s behavior when peer responses arrive but cannot reach an F+1 matching quorum (e.g., all payloads differ), returning an early, more accurate ConsensusFailed capability error instead of timing out with a misleading context deadline error.

Changes:

  • Add early-detection logic for when an OK-payload quorum is mathematically unreachable and return a public ConsensusFailed error immediately.
  • Ensure locally-produced caperrors.Error values are wrapped so callers can still errors.As into caperrors.Error.
  • Update/add tests to validate early “quorum unreachable” behavior and unwrap semantics.

Targeted areas for scrupulous human review:

  • ClientRequest.OnMessage quorum progression/termination logic (interaction between OK responses, error responses, and pending peers).
  • Correctness of quorumStillPossible across all response mixes (OK payloads + errors) and ensuring early-fail triggers in all intended cases.
  • User-facing error formatting and code (ConsensusFailed) consistency with downstream handling/metrics.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| core/capabilities/remote/executable/request/client_request.go | Adds quorum-unreachable detection and returns a public ConsensusFailed capability error early; adds wrapper for locally-produced cap errors. |
| core/capabilities/remote/executable/request/client_request_test.go | Updates message-validation test to expect early quorum-unreachable error instead of no response. |
| core/capabilities/remote/executable/request/client_request_internal_test.go | Adds unit tests for quorum feasibility logic and early error send behavior (including unwrap chain expectations). |
| core/capabilities/remote/executable/client_test.go | Updates client-level test expectations from timeout to early ConsensusFailed quorum-unreachable error. |
Comments suppressed due to low confidence (1)

core/capabilities/remote/executable/request/client_request.go:402

  • trySendQuorumUnreachableError() is only invoked on the OK-response path. If the quorum becomes mathematically unreachable after processing an error response (e.g., a mix of unique OK payloads plus some error responses, where the last few responses are errors), the request can still sit until expiry even though maxMatchingResponseCount()+pending < requiredResponseConfirmations is already true. Consider calling trySendQuorumUnreachableError() (or an equivalent check) after handling non-OK messages as well, once responseReceived[sender] has been updated, so the early-fail behavior is not dependent on the last message being OK.
			if err != nil {
				return fmt.Errorf("failed to encode payload with metadata: %w", err)
			}

			c.sendResponse(clientResponse{Result: payload})
		} else {
			c.trySendQuorumUnreachableError()
		}
	} else {
		c.lggr.Debugw("received error from peer", "error", msg.Error, "errorMsg", msg.ErrorMsg, "peer", sender)
		if commoncap.ErrResponsePayloadNotAvailable.Is(errors.New(msg.ErrorMsg)) {
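
The concern raised in this suppressed comment can be illustrated with a simplified model that runs the feasibility check after every recorded response, whether it is OK or an error. The `tracker` type and its methods are hypothetical and are not the `ClientRequest` implementation; the sketch assumes one response per peer and string payloads.

```go
package main

import "fmt"

// tracker is a hypothetical, simplified model of a request that collects one
// response per peer. It is not the ClientRequest type from the PR.
type tracker struct {
	peers    int             // total number of peers asked
	required int             // matching OK payloads needed
	seen     map[string]bool // peers that have already responded
	counts   map[string]int  // OK payload -> number of peers that sent it
}

func newTracker(peers, required int) *tracker {
	return &tracker{
		peers:    peers,
		required: required,
		seen:     map[string]bool{},
		counts:   map[string]int{},
	}
}

// onMessage records one response and reports whether quorum is unreachable
// afterwards. The check runs for OK and error responses alike.
func (t *tracker) onMessage(peer, payload string, isErr bool) bool {
	if t.seen[peer] {
		return false // ignore duplicate responses from the same peer
	}
	t.seen[peer] = true
	if !isErr {
		t.counts[payload]++
	}
	best := 0
	for _, n := range t.counts {
		if n > best {
			best = n
		}
	}
	pending := t.peers - len(t.seen)
	return best+pending < t.required
}

func main() {
	t := newTracker(4, 2)
	fmt.Println(t.onMessage("p1", "a", false)) // false
	fmt.Println(t.onMessage("p2", "b", false)) // false
	fmt.Println(t.onMessage("p3", "", true))   // false: one peer is still pending
	fmt.Println(t.onMessage("p4", "", true))   // true: the last response is an error
}
```

In this model the final response is an error, so a check that ran only on the OK path would never fire and the request would wait until expiry.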

Comment thread core/capabilities/remote/executable/request/client_request.go
Comment thread core/capabilities/remote/executable/client_test.go
trunk-io Bot commented May 26, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread core/capabilities/remote/executable/request/client_request.go
Comment thread core/capabilities/remote/executable/request/client_request.go
@ilija42 ilija42 added this pull request to the merge queue May 27, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 27, 2026
@ilija42 ilija42 added this pull request to the merge queue May 27, 2026
Merged via the queue into develop with commit 7dc2744 May 27, 2026
215 checks passed
@ilija42 ilija42 deleted the PLEX-3032 branch May 27, 2026 14:45
5 participants