Skip to content

fix: Add retry logic for RedisStorageClient and SqlStorageClient#1838

Open
Mantisus wants to merge 8 commits intoapify:masterfrom
Mantisus:storages-retry
Open

fix: Add retry logic for RedisStorageClient and SqlStorageClient#1838
Mantisus wants to merge 8 commits intoapify:masterfrom
Mantisus:storages-retry

Conversation

@Mantisus
Copy link
Copy Markdown
Collaborator

@Mantisus Mantisus commented Apr 7, 2026

Description

  • Adds retry_on_error decorator with exponential backoff
  • Applies retry to all operations in SqlStorageClient and RedisStorageClient to handle transient failures

Issues

@Mantisus Mantisus self-assigned this Apr 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a unified StorageWriteError and adds retry behavior for write operations (KeyValueStore.set_value and Dataset.push_data) to reduce the risk of data loss on transient storage failures (notably SQL/Redis).

Changes:

  • Add StorageWriteError and update SQL/Redis/FS storage clients to raise it on write failures instead of swallowing errors.
  • Add retry parameters (max_attempts, wait_time_between_retries) to KeyValueStore.set_value and Dataset.push_data.
  • Add unit tests asserting that client-level write failures surface as StorageWriteError.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/crawlee/errors.py Introduces StorageWriteError for consistent write-failure signaling.
src/crawlee/storages/_key_value_store.py Adds retry loop to KeyValueStore.set_value (currently logs instead of raising after retries).
src/crawlee/storages/_dataset.py Adds retry loop to Dataset.push_data (currently logs instead of raising after retries).
src/crawlee/storage_clients/_sql/_key_value_store_client.py Converts SQL write failures into StorageWriteError and avoids with_simple_commit swallow behavior.
src/crawlee/storage_clients/_sql/_dataset_client.py Converts SQL write failures into StorageWriteError and avoids with_simple_commit swallow behavior.
src/crawlee/storage_clients/_redis/_key_value_store_client.py Wraps Redis write failures as StorageWriteError.
src/crawlee/storage_clients/_redis/_dataset_client.py Wraps Redis write failures as StorageWriteError.
src/crawlee/storage_clients/_file_system/_key_value_store_client.py Wraps filesystem write failures as StorageWriteError.
src/crawlee/storage_clients/_file_system/_dataset_client.py Wraps filesystem write failures as StorageWriteError.
src/crawlee/storage_clients/_sql/_request_queue_client.py Adds an extra requests_db emptiness guard (currently redundant).
tests/unit/storage_clients/_sql/test_sql_kvs_client.py Adds test ensuring SQL KVS write errors raise StorageWriteError.
tests/unit/storage_clients/_sql/test_sql_dataset_client.py Adds test ensuring SQL Dataset write errors raise StorageWriteError.
tests/unit/storage_clients/_redis/test_redis_kvs_client.py Adds test ensuring Redis KVS write errors raise StorageWriteError.
tests/unit/storage_clients/_redis/test_redis_dataset_client.py Adds test ensuring Redis Dataset write errors raise StorageWriteError.
tests/unit/storage_clients/_file_system/test_fs_kvs_client.py Adds test ensuring FS KVS write errors raise StorageWriteError.
tests/unit/storage_clients/_file_system/test_fs_dataset_client.py Adds test ensuring FS Dataset write errors raise StorageWriteError.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/crawlee/storages/_key_value_store.py Outdated
Comment thread src/crawlee/storages/_dataset.py Outdated
Comment thread src/crawlee/storage_clients/_sql/_request_queue_client.py
@Mantisus Mantisus marked this pull request as ready for review April 8, 2026 20:12
@Mantisus Mantisus requested review from Pijukatel and vdusek April 8, 2026 20:12
@vdusek vdusek changed the title feat: Add retry logic for KeyValueStore.set_value and Dataset.push_data fix: Add retry logic for KeyValueStore.set_value and Dataset.push_data Apr 9, 2026
Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be thought through carefully as it extends the public interface of the public components (storages). We want to keep the public interface consistent in JS & Py implementations as much as possible.

Copy link
Copy Markdown
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just had a brief discussion about this with @janbuchar, and we think this should be handled at the storage client level rather than leaking into the storages (frontend). One reason is that the Apify API client used by ApifyStorageClient already has its own retry mechanism. Adding another layer here would duplicate that behavior. Therefore, each storage client should manage retries on its own.

@Mantisus Mantisus changed the title fix: Add retry logic for KeyValueStore.set_value and Dataset.push_data fix: Add retry logic for RedisStorageClient and SqlStorageClient Apr 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/crawlee/_utils/retry.py
Comment thread src/crawlee/_utils/retry.py Outdated
Comment thread src/crawlee/storage_clients/_sql/_request_queue_client.py
Comment thread src/crawlee/storage_clients/_redis/_request_queue_client.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/crawlee/_utils/retry.py Outdated
Comment thread src/crawlee/storage_clients/_sql/_dataset_client.py
Comment thread src/crawlee/storage_clients/_redis/_dataset_client.py
Mantisus and others added 2 commits April 15, 2026 17:46
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SqlStorageClient silently swallows write errors, causing data loss

4 participants