Fix race conditions and improve robustness during socket I/O#779
Fix race conditions and improve robustness during socket I/O#779julianz- wants to merge 1 commit into
Conversation
887399d to
4f1662e
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #779 +/- ##
==========================================
+ Coverage 78.09% 78.38% +0.29%
==========================================
Files 41 41
Lines 4788 4853 +65
Branches 547 553 +6
==========================================
+ Hits 3739 3804 +65
+ Misses 910 908 -2
- Partials 139 141 +2 |
4f1662e to
4833dac
Compare
048d898 to
f0471ca
Compare
|
@julianz- could you scan the modules more broadly for places where we could stop suppressing connection errors on the low level layers? I'm not looking into the tests until we figure out the architectural overview of the whole thing. But I feel like we're getting closer. I think we might need to split this PR into two at some point, will see. |
|
Thank you @webknjaz for all your great suggestions and points. I will take a look at everything. The layering is actually more complicated than I had realized. |
Yep, that's why I have incomplete pieces of code somewhere locally that I never finalized. It does require some effort.. |
There was a problem hiding this comment.
I think the current state of this module is good but Codecov shows that none of the newly added lines are covered with tests. We'll have to fix this before merging.
It looks like the updates to cheroot.ssl.pyopenssl could go into a standalone pull request. Could you extract this and add tests+changenote in a separate PR? I hope this will simplify reviewing this one.
| errno.EPIPE, | ||
| errno.ESHUTDOWN, # corresponds to BrokenPipeError in Python 3 | ||
| errno.ECONNRESET, # corresponds to ConnectionResetError in Python 3 | ||
| *((errno.WSAENOTSOCK,) if _compat.IS_WINDOWS else ()), |
There was a problem hiding this comment.
Does this exception correspond to EBADF semantically? If so, I'd probably keep them close to each other as suggested in https://github.com/cherrypy/cheroot/pull/779/files#r2432797948.
Additionally, could you add a comment hinting what might be causing it in practice?
There was a problem hiding this comment.
Sorry missed these from October! Now addressed.
| 'sendall', | ||
| 'settimeout', | ||
| 'gettimeout', | ||
| 'shutdown', |
There was a problem hiding this comment.
Will this interferre with the newly added decorator/methods? proxy_wrapper() below does effectively what we did with @_morph_syscall_to_connection_error and so shutdown() will end up double-decorated/locked if I'm reading this correctly.
| 'accept', | ||
| 'setblocking', | ||
| 'fileno', | ||
| 'close', |
There was a problem hiding this comment.
Will this interferre with the newly added decorator/methods? proxy_wrapper() below does effectively what we did with @_morph_syscall_to_connection_error and so shutdown() will end up double-decorated/locked if I'm reading this correctly.
| except Exception as exc: | ||
| pytest.fail(f'Unexpected exception raised: {type(exc).__name__}') |
There was a problem hiding this comment.
It's unclear why this needs to be wrapped with try/except: tests are supposed to be as simple as possible — adding more logic makes it difficult to reason about them. Additionally, any exception would bubble up and render in the output with the entire stack and respective context attached unlike a short failure message with the stack being lost.
There was a problem hiding this comment.
This is gone from the current version
| # Handle EBADF and other acceptable socket shutdown errors | ||
| if err.errno in _errors.acceptable_sock_shutdown_error_codes: | ||
| return |
There was a problem hiding this comment.
@julianz- do you know if error codes listed there contain things that aren't represented by ConnectionError? I'm wondering if newer Python versions would ever hit this code branch — is it possible that the above ConnectionError will handle this?
There was a problem hiding this comment.
Also, do we know what's causing the tests to always hit this line?
https://app.codecov.io/gh/cherrypy/cheroot/pull/779?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=cherrypy#d99a7689c2159eab73db39f9e5b89a1d-R70.
It's best to make sure that this isn't due to some mocks but a real simulated scenario.
There was a problem hiding this comment.
EPIPE/ESHUTDOWN -> BrokenPipeError and ECONNRESET -> ConnectionResetError are ConnectionError subclasses. But EBADF and ENOTCONN are plain OSErrors, so the except OSError branch is needed for those. I added test_close_error_handling[ebadf] to cover this path.
Regarding the coverage, the Codecov report is for an older commit — the code has been significantly refactored since then. Not sure whether your question is still a concern?
| # Handle EBADF and other acceptable socket shutdown errors | ||
| if err.errno in _errors.acceptable_sock_shutdown_error_codes: | ||
| return | ||
| raise |
There was a problem hiding this comment.
I wish we have proper logging infra here to see what can actually happen. The coverage report shows that this line is never exercised in the tests.
There was a problem hiding this comment.
Perhaps, we could have a warning emitted, though..
There was a problem hiding this comment.
Covered now. Logging would be nice but maybe for another PR?
| try: | ||
| super().close() | ||
| except ConnectionError: | ||
| return |
There was a problem hiding this comment.
Looks like tests never hit this.
There was a problem hiding this comment.
Stale code i think. All lines in makefile.py covered now.
| if n == 0: | ||
| # If nothing was written we need to break | ||
| # to avoid infinte loops | ||
| break |
There was a problem hiding this comment.
The tests never hit this line. We should come up with a new regression test.
| if not n: | ||
| # Defensive: write() returned None or other falsy value, | ||
| # which shouldn't happen but could cause an infinite loop. | ||
| break |
There was a problem hiding this comment.
I think CodeQL is assuming n is an int but it's not clear that it has to be.
Fixes to make socket I/O more resilient during connection teardown. 1. BufferedWriter's write(): Added error handling to ignore common socket errors (e.g., ECONNRESET, EPIPE, ENOTCONN, EBADF) that occur when the underlying connection has been unexpectedly closed by the client or OS. This prevents a crash when attempting to write to a defunct socket. 2. BufferedWriters's close(): Made idempotent, allowing safe repeated calls without raising exceptions. 3. Needed to add explicit handling of WINDOWS environments as these are seen to throw Windows specific WSAENOTSOCK errors. Includes new unit tests to cover the idempotency and graceful handling of already closed underlying buffers.
Fixed race conditions to make socket I/O more resilient during connection teardown.
write(): Added error handling to ignore common socket errors (e.g.,ECONNRESET,EPIPE,ENOTCONN,EBADF) that occur when the underlying connection has been unexpectedly closed by the client or OS. This prevents a crash when attempting to write to a defunct socket.close(): Made idempotent, allowing safe repeated calls without raising exceptions.WSAENOTSOCKerrors.Includes new unit tests to cover the idempotency and graceful handling of already closed underlying buffers.
This change is