Skip to content

Fix race conditions and improve robustness during socket I/O#779

Open
julianz- wants to merge 1 commit into
cherrypy:mainfrom
julianz-:fix-socket-teardown
Open

Fix race conditions and improve robustness during socket I/O#779
julianz- wants to merge 1 commit into
cherrypy:mainfrom
julianz-:fix-socket-teardown

Conversation

@julianz-

@julianz- julianz- commented Oct 5, 2025

Copy link
Copy Markdown
Member

Fixed race conditions to make socket I/O more resilient during connection teardown.

  1. BufferedWriter's write(): Added error handling to ignore common socket errors (e.g., ECONNRESET, EPIPE, ENOTCONN, EBADF) that occur when the underlying connection has been unexpectedly closed by the client or OS. This prevents a crash when attempting to write to a defunct socket.
  2. BufferedWriters's close(): Made idempotent, allowing safe repeated calls without raising exceptions.
  3. Needed to add explicit handling of WINDOWS environments as these are seen to throw Windows specific WSAENOTSOCK errors.

Includes new unit tests to cover the idempotency and graceful handling of already closed underlying buffers.


This change is Reviewable

@julianz- julianz- closed this Oct 5, 2025
@julianz- julianz- reopened this Oct 5, 2025
@julianz- julianz- changed the title Fix race condition and improve robustness during socket I/O [DRAFT] Fix race condition and improve robustness during socket I/O Oct 5, 2025
@julianz- julianz- force-pushed the fix-socket-teardown branch 2 times, most recently from 887399d to 4f1662e Compare October 5, 2025 19:04
@codecov

codecov Bot commented Oct 5, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.52941% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.38%. Comparing base (3937fe1) to head (7fb95c2).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   78.09%   78.38%   +0.29%     
==========================================
  Files          41       41              
  Lines        4788     4853      +65     
  Branches      547      553       +6     
==========================================
+ Hits         3739     3804      +65     
+ Misses        910      908       -2     
- Partials      139      141       +2     

@julianz- julianz- force-pushed the fix-socket-teardown branch from 4f1662e to 4833dac Compare October 5, 2025 19:09
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided A mark meaning that a new change log entry is present within the patch. label Oct 5, 2025
@julianz- julianz- marked this pull request as draft October 5, 2025 19:09
@julianz- julianz- changed the title [DRAFT] Fix race condition and improve robustness during socket I/O Fix race condition and improve robustness during socket I/O Oct 5, 2025
@julianz- julianz- changed the title Fix race condition and improve robustness during socket I/O Fix race conditions and improve robustness during socket I/O Oct 5, 2025
@julianz- julianz- marked this pull request as ready for review October 5, 2025 19:26
Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/errors.py Outdated
Comment thread cheroot/errors.py Outdated
Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/test/test_conn.py Outdated
Comment thread cheroot/test/test_conn.py Outdated
Comment thread docs/changelog-fragments.d/779.bugfix.rst
Comment thread docs/spelling_wordlist.txt
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/makefile.py Outdated
@julianz- julianz- force-pushed the fix-socket-teardown branch 2 times, most recently from 048d898 to f0471ca Compare October 7, 2025 03:27
Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/server.py
Comment thread cheroot/ssl/pyopenssl.py Outdated
Comment thread cheroot/ssl/pyopenssl.py Outdated
Comment thread cheroot/ssl/pyopenssl.py Outdated
Comment thread cheroot/makefile.py Outdated
@webknjaz

Copy link
Copy Markdown
Member

@julianz- could you scan the modules more broadly for places where we could stop suppressing connection errors on the low level layers? I'm not looking into the tests until we figure out the architectural overview of the whole thing. But I feel like we're getting closer. I think we might need to split this PR into two at some point, will see.

@julianz-

Copy link
Copy Markdown
Member Author

Thank you @webknjaz for all your great suggestions and points. I will take a look at everything. The layering is actually more complicated than I had realized.

@webknjaz

Copy link
Copy Markdown
Member

The layering is actually more complicated than I had realized.

Yep, that's why I have incomplete pieces of code somewhere locally that I never finalized. It does require some effort..

Comment thread cheroot/makefile.py Outdated
Comment thread cheroot/server.py Outdated
Comment thread cheroot/ssl/pyopenssl.py

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current state of this module is good but Codecov shows that none of the newly added lines are covered with tests. We'll have to fix this before merging.

It looks like the updates to cheroot.ssl.pyopenssl could go into a standalone pull request. Could you extract this and add tests+changenote in a separate PR? I hope this will simplify reviewing this one.

@julianz- julianz- Jun 24, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests and pyopenssl was updated in #800 and #801.

Comment thread cheroot/errors.py Outdated
errno.EPIPE,
errno.ESHUTDOWN, # corresponds to BrokenPipeError in Python 3
errno.ECONNRESET, # corresponds to ConnectionResetError in Python 3
*((errno.WSAENOTSOCK,) if _compat.IS_WINDOWS else ()),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this exception correspond to EBADF semantically? If so, I'd probably keep them close to each other as suggested in https://github.com/cherrypy/cheroot/pull/779/files#r2432797948.

Additionally, could you add a comment hinting what might be causing it in practice?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julianz- missed this question?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed these from October! Now addressed.

Comment thread cheroot/ssl/pyopenssl.py
'sendall',
'settimeout',
'gettimeout',
'shutdown',

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this interferre with the newly added decorator/methods? proxy_wrapper() below does effectively what we did with @_morph_syscall_to_connection_error and so shutdown() will end up double-decorated/locked if I'm reading this correctly.

Comment thread cheroot/ssl/pyopenssl.py
'accept',
'setblocking',
'fileno',
'close',

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this interferre with the newly added decorator/methods? proxy_wrapper() below does effectively what we did with @_morph_syscall_to_connection_error and so shutdown() will end up double-decorated/locked if I'm reading this correctly.

Comment thread cheroot/test/test_makefile.py Outdated
Comment on lines +113 to +114
except Exception as exc:
pytest.fail(f'Unexpected exception raised: {type(exc).__name__}')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear why this needs to be wrapped with try/except: tests are supposed to be as simple as possible — adding more logic makes it difficult to reason about them. Additionally, any exception would bubble up and render in the output with the entire stack and respective context attached unlike a short failure message with the stack being lost.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gone from the current version

Comment thread cheroot/makefile.py
Comment on lines +68 to +70
# Handle EBADF and other acceptable socket shutdown errors
if err.errno in _errors.acceptable_sock_shutdown_error_codes:
return

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julianz- do you know if error codes listed there contain things that aren't represented by ConnectionError? I'm wondering if newer Python versions would ever hit this code branch — is it possible that the above ConnectionError will handle this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we know what's causing the tests to always hit this line?
https://app.codecov.io/gh/cherrypy/cheroot/pull/779?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=checks&utm_campaign=pr+comments&utm_term=cherrypy#d99a7689c2159eab73db39f9e5b89a1d-R70.

It's best to make sure that this isn't due to some mocks but a real simulated scenario.

@julianz- julianz- Jun 24, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EPIPE/ESHUTDOWN -> BrokenPipeError and ECONNRESET -> ConnectionResetError are ConnectionError subclasses. But EBADF and ENOTCONN are plain OSErrors, so the except OSError branch is needed for those. I added test_close_error_handling[ebadf] to cover this path.

Regarding the coverage, the Codecov report is for an older commit — the code has been significantly refactored since then. Not sure whether your question is still a concern?

Comment thread cheroot/makefile.py
# Handle EBADF and other acceptable socket shutdown errors
if err.errno in _errors.acceptable_sock_shutdown_error_codes:
return
raise

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we have proper logging infra here to see what can actually happen. The coverage report shows that this line is never exercised in the tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, we could have a warning emitted, though..

@julianz- julianz- Jun 24, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered now. Logging would be nice but maybe for another PR?

Comment thread cheroot/makefile.py
try:
super().close()
except ConnectionError:
return

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tests never hit this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale code i think. All lines in makefile.py covered now.

Comment thread cheroot/makefile.py
if n == 0:
# If nothing was written we need to break
# to avoid infinte loops
break

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests never hit this line. We should come up with a new regression test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread cheroot/makefile.py
if not n:
# Defensive: write() returned None or other falsy value,
# which shouldn't happen but could cause an infinite loop.
break

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CodeQL is assuming n is an int but it's not clear that it has to be.

@read-the-docs-community

read-the-docs-community Bot commented Jun 24, 2026

Copy link
Copy Markdown

Fixes to make socket I/O more resilient during connection teardown.

1. BufferedWriter's write(): Added error handling to ignore common
   socket errors (e.g., ECONNRESET, EPIPE, ENOTCONN, EBADF) that occur
   when the underlying connection has been unexpectedly closed by the
   client or OS. This prevents a crash when attempting to write to a
   defunct socket.
2. BufferedWriters's close(): Made idempotent, allowing safe repeated
   calls without raising exceptions.
3. Needed to add explicit handling of WINDOWS environments as these are
   seen to throw Windows specific WSAENOTSOCK errors.

Includes new unit tests to cover the idempotency and graceful handling
of already closed underlying buffers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided A mark meaning that a new change log entry is present within the patch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants