Skip to content

fix(libev): guard handle_read/handle_write against close() race condition#889

Open
vponomaryov wants to merge 1 commit into
scylladb:masterfrom
vponomaryov:fix-race-issue-614
Open

fix(libev): guard handle_read/handle_write against close() race condition#889
vponomaryov wants to merge 1 commit into
scylladb:masterfrom
vponomaryov:fix-race-issue-614

Conversation

@vponomaryov
Copy link
Copy Markdown

@vponomaryov vponomaryov commented May 19, 2026

When close() is called before the connection handshake completes,
the original code signals waiters via connected_event.set()
but doesn't populate last_error, allowing factory()
to potentially return a dead connection.

Additionally, peer disconnects (ECONNRESET, ENOTCONN, etc.)
during I/O were routed through defunct(),
which is heavier than necessary for a clean shutdown scenario.

So, fix it applying following changes:

  • Preserve last_error in close() when connected_event is unset, so factory() can detect dead connections.
  • Peer disconnect detection (ECONNRESET, ENOTCONN, etc.) that calls close() cleanly instead of defunct().
  • Defensive early-return guards in handle_read()/handle_write() to skip unnecessary work when connection is already closed/defunct.

Fixes: #614

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Summary by CodeRabbit

  • Bug Fixes

    • Connection shutdown now records the shutdown error and reliably propagates it to all in‑flight requests.
    • Peer-disconnect socket errors are treated as clean closes instead of defunct states.
    • Read/write callbacks no longer act on sockets already closed or defunct, resolving close/watch race conditions and ensuring EOF/peer-close set the shutdown error before closing.
  • Tests

    • Added unit tests covering close-race scenarios, EOF/peer-disconnect behavior, and peer-disconnect errno coverage.

Review Change Stack

@vponomaryov
Copy link
Copy Markdown
Author

@vponomaryov
Copy link
Copy Markdown
Author

vponomaryov commented May 22, 2026

Verified the test fix here:

Test run without this fix (for comparison) is here:

Comment from the "zeus" comparing the fixed test run and issue affected:

@zeus posted:

Conclusion: The target run did NOT hit the SCT-83 bug. The fix works.

Comparison

Aspect Reference run (SCT-83 bug) 8a5b983a Target run (fix branch) 0d335d55
Status FAILED PASSED
Branch origin/master origin/fix-pydrv-race-issue-614
Errors 6 ERROR events (3 SoftTimeoutEvent + 3 FailedResultEvent) 0 errors
Test longevity_test.LongevityTest.test_custom_time Same
Scylla version 2026.3.0~dev Same
Backend/Region OCI / us-ashburn-1 Same
Nodes 6 6

Healthcheck duration comparison

Reference run (with bug) — progressive degradation pattern:

  • Healthchecks №1-7: 9-10s (normal)
  • Healthcheck №8: 46s (degradation begins)
  • Healthcheck №9: 1m38s
  • Healthcheck №10: 3m20s (198s, exceeds 180s soft timeout → ERROR)
  • Healthcheck №11: 3m50s (228s, exceeds timeout → ERROR)
  • Healthcheck №12: 2m8s
  • Healthcheck №13: 1m38s
  • Healthcheck №14: 1m7s
  • Healthcheck №15: 2m33s
  • Healthcheck №16: 46s
  • Healthcheck №17: 22m33s (1351s, massively exceeds timeout → ERROR)

Target run (with fix) — consistently healthy:

  • All 20 healthchecks: 9-11s (every single one normal, no degradation)

Analysis

The SCT-83 / python-driver#614 bug manifests as a progressive increase in healthcheck duration over the course of the test, caused by a race condition in the python driver. In the reference run, healthcheck times escalated from ~10s to minutes, eventually hitting 22m33s — well beyond the 180s soft timeout.

The target run, running on branch fix-pydrv-race-issue-614, shows zero degradation — all 20 healthchecks completed in 9-11 seconds throughout the entire 4+ hour test. No SoftTimeoutEvents, no FailedResultEvents, no errors at all.

The fix completely eliminates the healthcheck timeout escalation pattern characteristic of SCT-83.

@fruch , @soyacz ^
This has proven to fix the https://scylladb.atlassian.net/browse/SCT-83 bug which is identical to the GH issue - #614

@vponomaryov
Copy link
Copy Markdown
Author

vponomaryov commented May 22, 2026

Note that the libenv tests have passed successfully - related to the PR.
The failed integration tests are the asyncio-based - not related to this PR.

@scylladb/python-driver-maint please merge and release it.

Comment on lines +367 to +371
elif self.is_closed or self.is_defunct:
# Socket was closed by another thread between the
# watcher firing and us calling send(). This is the
# race described in scylladb/python-driver#614.
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is necessary. Without this, we will go into else branch, call self.defunct(err) which will do nothing, and then return.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, defunct() does have its own if self.is_defunct or self.is_closed: return guard, so no crash or incorrect state transition would occur.

But, the guard is still valuable because of the following reasons:

  • Without the guard we will get missleading log message if the error is EBADF (which is in _PEER_DISCONNECT_ERRNOS), the code falls into the _PEER_DISCONNECT_ERRNOS branch first (it comes before else), logging "Connection %s closed by peer during write".
    But the socket wasn't closed by the peer - we closed it via close().

  • The _PEER_DISCONNECT_ERRNOS branch calls self.close(), which is a no-op when already closed but still acquires self.lock - a minor unnecessary overhead.

  • The guard makes the race condition handling self-documenting.
    Without it, a reader must trace through multiple branches to confirm the behavior is safe.

So, I find it rather useful than redundant.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the explanation, I did not pay much attention to this path. Now I see a different potential issue, mentioned by Avi ( #614 (comment) )

If we are receiving EBADF, then we performed some action (in this case READ / WRITE) on a closed file descriptor. As far as I know, file descriptors can be reused. So, if another connection was opened between this one being closed and watchers exiting, it makes it possible for us to accidentaly read / write to a totally unrelated connection (or even some file opened by a user app). Am I wrong?

If this is true, then this patch hides the immediate symptom of the problem, but also hides a much more dangerous problem that could cause data corruption.

I don't know the libev connection impl that well, but maybe the proper approach is to, in close, first stop the watchers somehow and make sure that libev won't touch the fd anymore, and only then close the fd?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Situation where we accidentally read/write to a different connection is, probably, architecturally valid for C/C++ programs using raw fds, but doesn't apply in case of Python.
The Python socket abstraction prevents it.

Python's socket.socket object tracks state internally - after .close(), any operation on that object raises EBADF regardless of whether the OS reused the fd number for something else.
The self._socket reference always points to the old (closed) Python object, never to the new socket that happened to get the same fd.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libev get a raw fd, not PYthon socket, but the watchers themselves use the Python socket so I guess this issue really can't happen.
The watchers also seem to be correctly removed from the loop in close (connection_destroyed), so they won't keep triggering forever. I think your change should be fine.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkropachev
Take over the PR please.
I don't see a point in me guessing the expectations.

Update it as you wish and just make sure it fixes the issue.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkropachev you have other fix to this issue ? or we'll need to revert to the last known working version, with this bug, across the broad ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the version that worked well?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.29.5 is the first version we started seeing it, and it was also when we moved to python 3.14

if you have better fix for this problem, go for it.

but @vponomaryov have proven that this PR is fixing this problem.

this problem is generating lots of noise, and the slowdowns we have by retrying on it from the client end, is wasting us lots of money (in some cases it's hours and hours of retries)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not fixing it, socket gets broken by unknown reason and it is only hides it.

Comment on lines +421 to +425
elif self.is_closed or self.is_defunct:
# Socket was closed by another thread between the watcher
# firing and us calling recv(). This is the race described
# in scylladb/python-driver#614.
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, if the error is in _PEER_DISCONNECT_ERRNOS (EBADF), the code would execute following:

self.last_error = ConnectionShutdown(
    "Connection to %s was closed by peer" % self.endpoint)
self.close()

It will overwrite the last_error with an incorrect closed by peer message when in reality close() was already called by us.
Since last_error is used by factory() to detect why a connection died, this overwrite could mask the real error.

Comment on lines +234 to +253
class LibevPeerDisconnectErrnos(unittest.TestCase):
"""Verify _PEER_DISCONNECT_ERRNOS contains expected values."""

def setUp(self):
_skip_if_unavailable(self)

def test_contains_ebadf(self):
self.assertIn(errno.EBADF, _PEER_DISCONNECT_ERRNOS)

def test_contains_econnreset(self):
self.assertIn(errno.ECONNRESET, _PEER_DISCONNECT_ERRNOS)

def test_contains_econnaborted(self):
self.assertIn(errno.ECONNABORTED, _PEER_DISCONNECT_ERRNOS)

def test_contains_enotconn(self):
self.assertIn(errno.ENOTCONN, _PEER_DISCONNECT_ERRNOS)

def test_contains_eshutdown(self):
self.assertIn(errno.ESHUTDOWN, _PEER_DISCONNECT_ERRNOS)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no point in having such tests. It just cements the current code, not testing the behavior.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind removing it

@vponomaryov vponomaryov requested a review from Lorak-mmk May 22, 2026 11:00
@vponomaryov
Copy link
Copy Markdown
Author

The effect of the python driver issue was slow health checks in SCT.
Example from a test run with the issue:
sct-run-without-fix

And here how much time it takes with the fix:
sct-run-with-fix

So, numbers tell for itself.

@Lorak-mmk
Copy link
Copy Markdown

@sylwiaszunejko could you take a look at asyncio failures? Why did they start to happen, do we have an issue open?

@sylwiaszunejko
Copy link
Copy Markdown
Collaborator

sylwiaszunejko commented May 22, 2026

@sylwiaszunejko could you take a look at asyncio failures? Why did they start to happen, do we have an issue open?

I thought it was due to switch to scylla version 2026.1 and #330, but I see it is supposed to be fixed by #884

@vponomaryov
Copy link
Copy Markdown
Author

vponomaryov commented May 22, 2026

@Lorak-mmk from the PR completeness point of view, should I remove that unit test class?

@sylwiaszunejko could you take a look at asyncio failures? Why did they start to happen, do we have an issue open?

I thought it was due to switch to scylla version 2026.1 and #330, but I see it is supposed to be fixed by #884

So, I need to rebase the PR?
Even if we know that this PR doesn't affect the failed tests?

@Lorak-mmk
Copy link
Copy Markdown

@Lorak-mmk from the PR completeness point of view, should I remove that unit test class?

Tbh I don't care that much.

So, I need to rebase the PR?

If you could then it would be great. I know you didn't touch asyncio, but let's just be sure. Change in one backend affecting another backend would not be the weirdest thing we saw in this driver.

@vponomaryov vponomaryov force-pushed the fix-race-issue-614 branch from 5dbc94b to ad60335 Compare May 22, 2026 13:36
@vponomaryov
Copy link
Copy Markdown
Author

Rebased.

@Lorak-mmk
Copy link
Copy Markdown

@dkropachev Please take a look at this.

Comment on lines +367 to +371
elif self.is_closed or self.is_defunct:
# Socket was closed by another thread between the
# watcher firing and us calling send(). This is the
# race described in scylladb/python-driver#614.
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but you still need to route error handling to defunct

@dkropachev
Copy link
Copy Markdown
Collaborator

@sylwiaszunejko could you take a look at asyncio failures? Why did they start to happen, do we have an issue open?

One of the tests was enabled that uses TLS with asyncio, the ultimate cure - #884

@fruch
Copy link
Copy Markdown

fruch commented May 25, 2026

@sylwiaszunejko could you take a look at asyncio failures? Why did they start to happen, do we have an issue open?

One of the tests was enabled that uses TLS with asyncio, the ultimate cure - #884

how asyncio is related to this fix ?

@vponomaryov vponomaryov requested a review from dkropachev May 25, 2026 10:50
@vponomaryov
Copy link
Copy Markdown
Author

@dkropachev see my answer in the respective comment thread for your comment.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Fixes a libevreactor race where watcher callbacks can run after close(), classifies peer-disconnect errno values as clean closes, adds early-return guards in read/write handlers, ensures ConnectionShutdown propagates to in-flight requests, and adds unit tests covering races and errno handling.

Changes

Libev connection close race condition fix

Layer / File(s) Summary
Peer-disconnect errno constants and test header/skip helper
cassandra/io/libevreactor.py, tests/unit/io/test_libevreactor_close_race.py
Imports errno, defines _PEER_DISCONNECT_ERRNOS (EBADF, ECONNRESET, ECONNABORTED, ENOTCONN, ESHUTDOWN, plus EPIPE checked in tests); adds test module header, guarded imports, and _skip_if_unavailable() helper.
Test setup and connection factory
tests/unit/io/test_libevreactor_close_race.py
Test fixture setup/teardown, reactor init, patcher control, connection tracking, and _make_connection() that builds a LibevConnection with a mocked socket.
Connection shutdown and error propagation
cassandra/io/libevreactor.py, tests/unit/io/test_libevreactor_close_race.py
close() now creates a ConnectionShutdown exception, calls error_all_requests() to propagate it, preserves it in last_error if connected_event is unset, then sets connected_event. Test verifies last_error is populated when closing before connected_event.
Write handler: race guards and peer-disconnect handling
cassandra/io/libevreactor.py, tests/unit/io/test_libevreactor_close_race.py
handle_write() returns early if connection became closed/defunct between watcher firing and IO; treats peer-disconnect errnos as clean close (log + close()), not defunct(). Tests verify early-return, EBADF-after-close does not defunct, and ECONNRESET triggers close().
Read handler: race guards, EOF and peer-disconnect handling
cassandra/io/libevreactor.py, tests/unit/io/test_libevreactor_close_race.py
handle_read() returns early for closed/defunct, returns on nonblocking with no buffered data, treats peer-disconnect errnos and EOF as clean close by setting last_error and calling close(), and sets last_error on server-initiated closure. Tests cover these read races and behaviors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I nibble at sockets, tidy and spry,
Errnos now named, no panic nearby,
Watchers that race find a polite end,
Shutdowns told clearly to every friend,
A rabbit-approved close — gentle and spry. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the primary change: fixing a race condition in libev's handle_read/handle_write functions against close().
Description check ✅ Passed The PR description addresses the root cause, explains the fix rationale, and includes a Fixes: annotation. All critical checklist items are marked complete.
Linked Issues check ✅ Passed Changes implement all objectives from #614: preserve last_error in close(), route peer disconnects to clean close(), and add defensive guards in handle_read/handle_write to prevent operations on closed file descriptors.
Out of Scope Changes check ✅ Passed All changes are scoped to libev I/O reactor improvements and corresponding unit tests addressing the #614 race condition. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/io/test_libevreactor_close_race.py`:
- Around line 59-82: The tearDown() must reset the module-global libev reactor
state so connections created by _make_connection() don't leak into other tests;
update tearDown() to access the LibevConnection module-global _global_loop and
clear its registries (e.g. set _global_loop._live_conns and
_global_loop._new_conns to empty lists or otherwise remove registered
connections) and then null out the global (e.g. set LibevConnection._global_loop
= None) so initialize_reactor() starts fresh for the next test run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9ad1e813-c716-48db-9847-2486199d0ab9

📥 Commits

Reviewing files that changed from the base of the PR and between 44bc95a and ad60335.

📒 Files selected for processing (2)
  • cassandra/io/libevreactor.py
  • tests/unit/io/test_libevreactor_close_race.py

Comment thread tests/unit/io/test_libevreactor_close_race.py
@vponomaryov vponomaryov force-pushed the fix-race-issue-614 branch 2 times, most recently from 9530ab9 to a5bb694 Compare May 26, 2026 10:27
@vponomaryov
Copy link
Copy Markdown
Author

Updated:

  • Commit description
  • [Unit tests] Reset the global libev reactor in tearDown()

Unchanged:

  • Main code changes

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cassandra/io/libevreactor.py`:
- Around line 45-48: The _PEER_DISCONNECT_ERRNOS set is missing errno.EPIPE
which causes BrokenPipeError writes to be treated as failures instead of clean
peer disconnects; update the frozenset _PEER_DISCONNECT_ERRNOS to include
errno.EPIPE so LibevLoop.handle_write() will classify EPIPE as a clean
disconnect (avoiding the path that calls self.defunct(err)), and extend the
existing unit test to assert that errno.EPIPE is a member of
_PEER_DISCONNECT_ERRNOS.

In `@tests/unit/io/test_libevreactor_close_race.py`:
- Around line 120-126: The test currently sets c.is_closed before calling
c.handle_write/c.handle_read so the handler returns early and the socket
send()/recv() side_effects (socket.error EBADF) never execute; change the test
to leave c.is_closed False when invoking c.handle_write/handle_read so the
dispatch path runs, and simulate the post-dispatch race by either making the
socket side_effect set c.is_closed to True (or set c.is_closed immediately after
the side_effect triggers) so the EBADF path in cassandra/io/libevreactor.py is
exercised; update both the write test (references: c._socket.send,
c.handle_write, c.is_closed, defunct) and the read test (references:
c._socket.recv, c.handle_read, c.is_closed, defunct) accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d6a6d7fd-a6a9-4587-8579-790c8a80af74

📥 Commits

Reviewing files that changed from the base of the PR and between ad60335 and 9530ab9.

📒 Files selected for processing (2)
  • cassandra/io/libevreactor.py
  • tests/unit/io/test_libevreactor_close_race.py

Comment thread cassandra/io/libevreactor.py
Comment thread tests/unit/io/test_libevreactor_close_race.py Outdated
@vponomaryov vponomaryov force-pushed the fix-race-issue-614 branch from a5bb694 to 0275436 Compare May 26, 2026 10:50
…tion

When `close()` is called before the connection handshake completes,
the original code signals waiters via `connected_event.set()`
but doesn't populate `last_error`, allowing `factory()`
to potentially return a dead connection.

Additionally, peer disconnects (ECONNRESET, ENOTCONN, etc.)
during I/O were routed through defunct(),
which is heavier than necessary for a clean shutdown scenario.

So, fix it applying following changes:
- Preserve `last_error` in `close()` when `connected_event` is unset,
  so `factory()` can detect dead connections.
- Peer disconnect detection (ECONNRESET, ENOTCONN, etc.)
  that calls `close()` cleanly instead of `defunct()`.
- Defensive early-return guards in `handle_read()/handle_write()`
  to skip unnecessary work when connection is already closed/defunct.

Fixes: scylladb#614
@vponomaryov vponomaryov force-pushed the fix-race-issue-614 branch from 0275436 to 04eea80 Compare May 26, 2026 10:53
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/unit/io/test_libevreactor_close_race.py (1)

116-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Exercise the post-dispatch EBADF race instead of the top-level guard.

These tests set is_closed before calling the handler, so Line 337 and Line 391 in cassandra/io/libevreactor.py return before send() / recv() run. That duplicates the “already closed” coverage and leaves the new secondary except guards untested.

Suggested fix
-        c._socket.send.side_effect = socket.error(errno.EBADF, "Bad file descriptor")
-        c.is_closed = True
+        def send_then_close(_):
+            c.is_closed = True
+            raise socket.error(errno.EBADF, "Bad file descriptor")
+        c._socket.send.side_effect = send_then_close
-        c._socket.recv.side_effect = socket.error(errno.EBADF, "Bad file descriptor")
-        c.is_closed = True
+        def recv_then_close(_):
+            c.is_closed = True
+            raise socket.error(errno.EBADF, "Bad file descriptor")
+        c._socket.recv.side_effect = recv_then_close

Also applies to: 173-184

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/io/test_libevreactor_close_race.py` around lines 116 - 130, Test
currently sets c.is_closed before calling c.handle_write so send() never runs;
instead make the socket.send side_effect simulate the post-dispatch race by
setting c.is_closed = True inside the side_effect and then raising
socket.error(errno.EBADF, ...), then call c.handle_write and assert defunct was
not called; apply the same pattern to the similar test around lines 173-184 (the
read-side test) by modifying the c._socket.recv side_effect to set is_closed
then raise EBADF so the new secondary except-guard paths in
handle_write/handle_read are exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/io/test_libevreactor_close_race.py`:
- Around line 34-38: The test should only skip when libev is absent, not on real
import errors: change the import try/except so it only catches
DependencyException (i.e. except DependencyException:) and lets ImportError
propagate; keep the fallback assignments to LibevConnection and
_PEER_DISCONNECT_ERRNOS when DependencyException is raised so the test is
skipped only for missing libev while real import problems in
cassandra.io.libevreactor surface.

---

Duplicate comments:
In `@tests/unit/io/test_libevreactor_close_race.py`:
- Around line 116-130: Test currently sets c.is_closed before calling
c.handle_write so send() never runs; instead make the socket.send side_effect
simulate the post-dispatch race by setting c.is_closed = True inside the
side_effect and then raising socket.error(errno.EBADF, ...), then call
c.handle_write and assert defunct was not called; apply the same pattern to the
similar test around lines 173-184 (the read-side test) by modifying the
c._socket.recv side_effect to set is_closed then raise EBADF so the new
secondary except-guard paths in handle_write/handle_read are exercised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fd6ea025-26c7-464a-a1a6-2c32614c08ac

📥 Commits

Reviewing files that changed from the base of the PR and between a5bb694 and 0275436.

📒 Files selected for processing (2)
  • cassandra/io/libevreactor.py
  • tests/unit/io/test_libevreactor_close_race.py

Comment on lines +34 to +38
try:
from cassandra.io.libevreactor import LibevConnection, _PEER_DISCONNECT_ERRNOS
except (ImportError, DependencyException):
LibevConnection = None
_PEER_DISCONNECT_ERRNOS = None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Only skip when libev is unavailable.

cassandra/io/libevreactor.py already translates a missing libev extension into DependencyException, so catching bare ImportError here will also turn real import regressions in the module under test into a silent skip.

Suggested fix
 try:
     from cassandra.io.libevreactor import LibevConnection, _PEER_DISCONNECT_ERRNOS
-except (ImportError, DependencyException):
+except DependencyException:
     LibevConnection = None
     _PEER_DISCONNECT_ERRNOS = None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
from cassandra.io.libevreactor import LibevConnection, _PEER_DISCONNECT_ERRNOS
except (ImportError, DependencyException):
LibevConnection = None
_PEER_DISCONNECT_ERRNOS = None
try:
from cassandra.io.libevreactor import LibevConnection, _PEER_DISCONNECT_ERRNOS
except DependencyException:
LibevConnection = None
_PEER_DISCONNECT_ERRNOS = None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/io/test_libevreactor_close_race.py` around lines 34 - 38, The test
should only skip when libev is absent, not on real import errors: change the
import try/except so it only catches DependencyException (i.e. except
DependencyException:) and lets ImportError propagate; keep the fallback
assignments to LibevConnection and _PEER_DISCONNECT_ERRNOS when
DependencyException is raised so the test is skipped only for missing libev
while real import problems in cassandra.io.libevreactor surface.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
tests/unit/io/test_libevreactor_close_race.py (2)

34-38: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only skip on DependencyException.

cassandra.io.libevreactor already translates a missing libev extension into DependencyException, so catching ImportError here will also hide unrelated import regressions in the module under test by turning them into a skip.

Suggested fix
 try:
     from cassandra.io.libevreactor import LibevConnection, _PEER_DISCONNECT_ERRNOS
-except (ImportError, DependencyException):
+except DependencyException:
     LibevConnection = None
     _PEER_DISCONNECT_ERRNOS = None
#!/bin/bash
set -euo pipefail

echo "libevreactor import wrapper:"
sed -n '24,34p' cassandra/io/libevreactor.py

echo
echo "test import guard:"
sed -n '34,38p' tests/unit/io/test_libevreactor_close_race.py

Expected result: cassandra/io/libevreactor.py already converts the missing-extension case into DependencyException, so ImportError in this test guard is broader than needed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/io/test_libevreactor_close_race.py` around lines 34 - 38, The test
import guard is catching ImportError too broadly and can hide real import
regressions; update the except clause so it only catches DependencyException
(leave the assignments to LibevConnection and _PEER_DISCONNECT_ERRNOS as-is).
Specifically, in tests/unit/io/test_libevreactor_close_race.py change the except
(ImportError, DependencyException): to except DependencyException: so only the
extension-missing case (DependencyException) is skipped while other import
errors surface.

72-76: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset the global reactor in tearDown().

_make_connection() registers real LibevConnection instances in the module-global _global_loop. Calling close() here only queues them into _closed_conns, and this suite patches LibevLoop.maybe_start, so _loop_will_run() never drains that state. Leaving _global_loop alive makes later libev tests order-dependent.

Suggested fix
     def tearDown(self):
         for c in self._connections:
             c.close()
+        LibevConnection.handle_fork()
         for p in self.patchers:
             p.stop()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/io/test_libevreactor_close_race.py` around lines 72 - 76, TearDown
must reset the module-global reactor so leftover LibevConnection instances don't
persist; after closing connections and stopping patchers in tearDown(), clear
any queued closed connections and reset the global loop by emptying
_global_loop._closed_conns (if present) and then setting the module-global
_global_loop to None (or a fresh LibevLoop) so _make_connection /
LibevConnection registrations are not kept around when LibevLoop.maybe_start is
patched and _loop_will_run never drains them.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tests/unit/io/test_libevreactor_close_race.py`:
- Around line 34-38: The test import guard is catching ImportError too broadly
and can hide real import regressions; update the except clause so it only
catches DependencyException (leave the assignments to LibevConnection and
_PEER_DISCONNECT_ERRNOS as-is). Specifically, in
tests/unit/io/test_libevreactor_close_race.py change the except (ImportError,
DependencyException): to except DependencyException: so only the
extension-missing case (DependencyException) is skipped while other import
errors surface.
- Around line 72-76: TearDown must reset the module-global reactor so leftover
LibevConnection instances don't persist; after closing connections and stopping
patchers in tearDown(), clear any queued closed connections and reset the global
loop by emptying _global_loop._closed_conns (if present) and then setting the
module-global _global_loop to None (or a fresh LibevLoop) so _make_connection /
LibevConnection registrations are not kept around when LibevLoop.maybe_start is
patched and _loop_will_run never drains them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 039ed698-3f52-4d73-a4ac-e1770681c2bc

📥 Commits

Reviewing files that changed from the base of the PR and between 0275436 and 04eea80.

📒 Files selected for processing (2)
  • cassandra/io/libevreactor.py
  • tests/unit/io/test_libevreactor_close_race.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Driver reported "[Errno 9] Bad file descriptor"

5 participants