Adds retry support to the Amazon.Lambda.DurableExecution #2363
Draft
GarrettBeatty wants to merge 1 commit into
Stacked PRs:
Amazon.Lambda.DurableExecution#2363#2216
What
Adds retry support to the `Amazon.Lambda.DurableExecution` SDK on top of the foundation in #2360. After this PR, a step that throws can be retried with configurable backoff and jitter; durable executions resume after the retry timer elapses without billing Lambda compute during the wait.

Public API introduced:
- `IRetryStrategy` and `RetryDecision`: `IRetryStrategy.ShouldRetry` returns a `RetryDecision`, a `ShouldRetry` flag plus `Delay`.
- `RetryStrategy`: `Default`, `Transient`, `None`, `Exponential(...)`, `FromDelegate(...)`.
- `JitterStrategy`: `None` / `Half` / `Full` for exponential backoff.
- `StepSemantics`: `AtLeastOncePerRetry` (default) / `AtMostOncePerRetry`.
- `StepConfig.RetryStrategy`, `StepConfig.Semantics`

Why
Real workflows fail. A step that calls a flaky downstream service or hits a transient throttle needs to retry without restarting the whole workflow. Durable execution makes service-mediated retries possible: the SDK checkpoints a `RETRY` operation with a `NextAttemptDelaySeconds`, suspends the Lambda, and the service re-invokes us when the timer fires. The user's compute isn't billed during the wait.

`AtMostOncePerRetry` semantics handle non-idempotent steps (e.g. charging a card): a `START` checkpoint is durably persisted before user code runs, so a Lambda crash mid-execution can be detected on replay and routed through the retry strategy rather than re-executing.

How
Retry control flow. When a step throws, `StepOperation.HandleStepFailureAsync` consults the configured `IRetryStrategy.ShouldRetry(ex, attemptNumber)`. If the decision says retry, the SDK enqueues a `RETRY` checkpoint carrying `NextAttemptDelaySeconds`, then suspends via `TerminationManager.SuspendAndAwait` so `RunAsync` returns `Pending` to the service. On the next invocation, `StepOperation.ReplayAsync` sees `Status == PENDING` and either re-suspends (timer not yet elapsed) or re-executes (timer fired) with the carried-forward attempt counter.

At-most-once semantics. For non-idempotent steps, `Semantics = AtMostOncePerRetry` writes a `START` checkpoint and blocks until the batcher flushes it before user code runs. If Lambda crashes between user code and the `SUCCEED` flush, replay sees `STARTED` with no terminal record and routes through `HandleStepFailureAsync` as a failed attempt instead of re-executing, so the side effect runs at most once per attempt.

Retry strategy contract. `IRetryStrategy.ShouldRetry(Exception, int attemptNumber)` returns a `RetryDecision`. `ExponentialRetryStrategy` supports configurable max attempts, initial/max delay, backoff rate, jitter (`None`/`Half`/`Full`), and exception filtering by type or message regex. Built-in factories: `RetryStrategy.Default` (6 attempts, 5s/60s, 2× backoff, full jitter), `Transient` (3 attempts, 1s/5s, half jitter), `None`. `RetryStrategy.FromDelegate(...)` for arbitrary policies.

Key files:
- `Config/IRetryStrategy.cs`: strategy interface + `RetryDecision` value type
- `Config/RetryStrategy.cs`: built-in strategies, `ExponentialRetryStrategy`, `JitterStrategy`, `StepSemantics`, `DelegateRetryStrategy`
- `Config/StepConfig.cs`: adds `RetryStrategy` and `Semantics` properties
- `Internal/StepOperation.cs`: adds `PENDING` (retry timer) and `STARTED` (AtMostOnce crash recovery) replay arms; `HandleStepFailureAsync` decision tree
- `Internal/TerminationManager.cs`: adds `RetryScheduled` reason

Testing
21 new unit tests in `Amazon.Lambda.DurableExecution.Tests` (130 total, up from 109 in #2360):
- `RetryStrategyTests` (14 tests): exponential backoff math, jitter strategies, max-attempt exhaustion, exception-type and message-pattern filtering, delegate strategies
- `DurableContextTests` retry block (7 tests): `FailsWithRetryStrategy_CheckpointsRetryAndSuspends`, `FailsNoRetryStrategy_CheckpointsFail`, `RetryExhausted_CheckpointsFail`, `PendingWithFutureTimestamp_Suspends`, `PendingWithPastTimestamp_ReExecutes`, `AtMostOnce_FlushesStartBeforeExecution`, `AtMostOnce_StartedReplay_TriggersRetryHandler`

Integration tests (`Amazon.Lambda.DurableExecution.IntegrationTests`): `RetrySucceeds` and `RetryExhausts` end-to-end against the real durable-execution service.

Out of scope (follow-up PRs)
- `MapAsync` / `ParallelAsync` / `RunInChildContextAsync` / `WaitForConditionAsync`
- `CallbackAsync`, `InvokeAsync`
- `DefaultJsonCheckpointSerializer`
- `DurableLogger` replay-suppression (currently `NullLogger`)
- `[DurableExecution]` attribute
- `DurableTestRunner` / `Amazon.Lambda.DurableExecution.Testing` package
- `dotnet new lambda.DurableFunction` blueprint
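
Illustration for reviewers (not part of the diff): the retry strategy contract under How gives the parameters of `RetryStrategy.Default` (6 attempts, 5s/60s, 2× backoff) but not the delay formula. The sketch below shows one plausible reading. Everything in it is an assumption: `BaseDelay`, `ApplyJitter`, and `Decide` are hypothetical stand-ins rather than SDK members, "6 attempts" is taken to include the first attempt, and the jitter ranges follow the common convention (full = uniform in [0, d), half = uniform in [d/2, d)). Exception filtering is omitted.

```csharp
using System;

// Hypothetical helper, not the SDK's ExponentialRetryStrategy.
// Un-jittered delay after failed attempt `attempt` (1-based), capped at `max`.
static TimeSpan BaseDelay(int attempt, TimeSpan initial, TimeSpan max, double rate)
{
    double seconds = initial.TotalSeconds * Math.Pow(rate, attempt - 1);
    return TimeSpan.FromSeconds(Math.Min(seconds, max.TotalSeconds));
}

// Assumed jitter meanings: "full" scales the delay into [0, d),
// "half" into [d/2, d), anything else leaves it unchanged.
static TimeSpan ApplyJitter(TimeSpan delay, string jitter, Random source) => jitter switch
{
    "full" => delay * source.NextDouble(),
    "half" => delay * (0.5 + 0.5 * source.NextDouble()),
    _ => delay,
};

// Same shape as the described RetryDecision: a retry flag plus a delay.
// Uses the Default parameters quoted in the PR text (5s initial, 60s cap, 2x).
static (bool ShouldRetry, TimeSpan Delay) Decide(int attempt, int maxAttempts, string jitter, Random source)
{
    if (attempt >= maxAttempts) return (false, TimeSpan.Zero); // attempts exhausted
    TimeSpan capped = BaseDelay(attempt, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(60), 2.0);
    return (true, ApplyJitter(capped, jitter, source));
}

// Walk through six failing attempts with jitter disabled so the output is deterministic.
var random = new Random(42);
for (int n = 1; n <= 6; n++)
{
    var (retry, wait) = Decide(n, 6, "none", random);
    Console.WriteLine(retry
        ? $"attempt {n} failed: retry in {wait.TotalSeconds}s"
        : $"attempt {n} failed: give up");
}
```

Under these assumptions the un-jittered waits are 5s, 10s, 20s, 40s, and 60s (the fifth would be 80s but is capped), and the sixth failure is terminal. With full jitter each wait would instead be drawn from between zero and those values.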