Skip to content

SHARED-2644: Downgrade node auth context-cancel errors#2086

Open
dvashchuk wants to merge 4 commits into
mainfrom
fix/node-auth-context-cancel-log
Open

SHARED-2644: Downgrade node auth context-cancel errors#2086
dvashchuk wants to merge 4 commits into
mainfrom
fix/node-auth-context-cancel-log

Conversation

@dvashchuk
Copy link
Copy Markdown

@dvashchuk dvashchuk commented May 23, 2026

Why is this change necessary?

AuthenticateJWT logged ERROR for Node validation failed when IsNodePubKeyTrusted returned context cancellation/deadline. These are expected during shutdown/timeout paths and should not page as high-severity failures.

What is changing?

  • in pkg/nodeauth/jwt/node_jwt_authenticator.go, log WARN (not ERROR) when provider error is context cancellation/deadline
  • keep ERROR for real provider failures
  • add tests asserting ERROR for non-context provider errors and WARN for context-cancelled provider errors

How was this tested?

  • go test -timeout 30s -run TestNodeJWTAuthenticator ./pkg/nodeauth/jwt/...

Copilot AI review requested due to automatic review settings May 23, 2026 08:52
@dvashchuk dvashchuk requested a review from a team as a code owner May 23, 2026 08:52
@github-actions
Copy link
Copy Markdown

👋 dvashchuk, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 23, 2026

📊 API Diff Results

No changes detected for module github.com/smartcontractkit/chainlink-common

View full report

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts NodeJWTAuthenticator.AuthenticateJWT logging to avoid emitting ERROR logs for expected shutdown / request-cancellation scenarios when IsNodePubKeyTrusted returns context.Canceled or context.DeadlineExceeded, while preserving ERROR logs for real provider failures.

Changes:

  • Downgrade log level from ERROR to WARN when IsNodePubKeyTrusted fails due to context cancellation/deadline.
  • Add tests asserting log level behavior for provider errors vs context cancellation.
  • Introduce a minimal slog.Handler to capture log records in tests.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pkg/nodeauth/jwt/node_jwt_authenticator.go Downgrades logging to WARN for context-canceled/deadline errors from the provider.
pkg/nodeauth/jwt/node_jwt_authenticator_test.go Adds log-capture handler and new tests verifying log levels for provider error vs context cancellation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"error", err,
)
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
v.logger.Warn("Node validation skipped: context cancelled",
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: message now reads "Node validation skipped: context canceled or deadline exceeded" (covers both).

Comment on lines +81 to +85
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
v.logger.Warn("Node validation skipped: context cancelled",
"csaPubKey", hex.EncodeToString(publicKey),
"error", err,
)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: added TestNodeJWTAuthenticator_AuthenticateJWT_ProviderDeadlineExceededError (line 545) covering context.DeadlineExceeded path.

Comment on lines +469 to +470
func (h *captureHandler) WithAttrs([]slog.Attr) slog.Handler { return h }
func (h *captureHandler) WithGroup(string) slog.Handler { return h }
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: WithAttrs and WithGroup now return a new captureHandler with merged attrs/groups (not the same handler). See lines in node_jwt_authenticator_test.go.

@dvashchuk dvashchuk changed the title fix(nodeauth): downgrade ERROR to WARN for context-cancelled IsNodePubKeyTrusted SHARED-2644: Downgrade node auth context-cancel errors May 23, 2026
@dvashchuk
Copy link
Copy Markdown
Author

Key Decisions (SHARED-2644)

  1. Keep common helper in cre-internal-common (IsContextError) and inline checks in billing until dependency bump lands
  2. De-noise by severity downgrade (WARN) for expected context/transient transport failures — not blanket suppression
  3. Remove duplicate repo-layer streams cache log; keep single higher-level log at service layer
  4. Centralize aggregation behavior with logErrorWithContextCancellationGuard(...) to avoid scattered conditionals
  5. Extend into common lib (cre-internal-common/service-utils/common_errors) as per parent epic SHARED-2519

Related PRs:

@dvashchuk dvashchuk force-pushed the fix/node-auth-context-cancel-log branch from f95c05b to f35e344 Compare May 23, 2026 11:20
@dvashchuk dvashchuk closed this May 23, 2026
@dvashchuk dvashchuk reopened this May 23, 2026
@dvashchuk dvashchuk force-pushed the fix/node-auth-context-cancel-log branch 2 times, most recently from 11eadf3 to 7b7656a Compare May 23, 2026 21:06
@dvashchuk dvashchuk requested a review from a team as a code owner May 23, 2026 21:06
@dvashchuk dvashchuk force-pushed the fix/node-auth-context-cancel-log branch 3 times, most recently from 9c02c64 to 5a7b58c Compare May 25, 2026 03:08
…thenticator

SHARED-2644: De-noise context cancellation errors in node auth path.

- Log WARN (not ERROR) when IsNodePubKeyTrusted returns context.Canceled or context.DeadlineExceeded
- Keep ERROR for genuine provider failures
- Add contextErr structured field for observability
- Add test cases for both Canceled and DeadlineExceeded paths
- Fix captureHandler.WithAttrs/WithGroup to satisfy slog.Handler contract
@dvashchuk dvashchuk force-pushed the fix/node-auth-context-cancel-log branch from 5a7b58c to c225417 Compare May 25, 2026 03:12
Comment thread pkg/beholder/durable_event_store.go Outdated
@@ -0,0 +1,52 @@
package beholder
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover from stale branch head? Does not seem relevant to this change

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a leftover — intentional, but separable from the log-noise fix.

Why it is here: PR #2081 moved DurableEvent, DurableEventStore, BatchInserter, DurableQueueObserver, DurableQueueStats from pkg/beholder to pkg/durableemitter. The chainlink node (core/services/workflows/) still imports these from pkg/beholder. Without this shim, Build Chainlink picks up additional compile errors on top of the pre-existing proto failures from #2080.

Why not type aliases: would require pkg/beholder to import pkg/durableemitter, which closes a cycle: durableemitter (tests) → servicetest → utils/tests → beholdertest → beholder → durableemitter.

To reproduce without the shim:

git revert HEAD -- pkg/beholder/durable_event_store.go
go get github.com/smartcontractkit/chainlink-common@<this PR SHA>
go mod tidy
make install-chainlink   # in chainlink repo at develop

You will see undefined: beholder.DurableEvent, beholder.DurableEventStore, etc.

Options:

  1. Keep as-is — unblocks both issues in one merge
  2. Remove — Build Chainlink gains more failures on top of Bump chainlink-protos/cre/go for ConfidentialWorkflow proto restructure #2080 proto errors; proper fix tracked in chainlink PR #22562
  3. Split into a separate PR — cleanest separation of concerns

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's raise a separate PR. This is a completely different codeowner / maintainer than the owners of pkg/nodeauth.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — split into #2095 (separate PR, separate codeowner review).

The backward-compat type declarations for DurableEventStore are split
into their own PR (fix/beholder-backward-compat) per reviewer request
to keep this PR focused on the log-noise fix.
@dvashchuk
Copy link
Copy Markdown
Author

Split done — removed durable_event_store.go from this PR. The backward-compat shim is now in its own PR: #2095.

This PR is now focused solely on the nodeauth log-noise fix.

@dvashchuk
Copy link
Copy Markdown
Author

Hey @elatoskinas - all feedback addressed. Need one more approval to merge (2 required). SonarQube coverage metric is re-running - tests cover all new code paths (verified locally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants