Speed up Cursor.fetchall by draining batches directly#196
Speed up Cursor.fetchall by draining batches directly#196TheDistributor wants to merge 2 commits into
Conversation
Accelerate data retrieval pathways in the Cursor class by optimizing row iteration and reducing interpreter overhead on high-frequency method loops. Detailed changes: - fetchone: Inline the `_check_closed()` state verification directly into the method body. This removes function-call stack overhead for every individual row retrieved. - fetchall: Refactor the loop to slice and drain the current in-memory result batch in bulk using `list.extend()` instead of sequentially invoking `fetchone()` row by row. - fetchall: Directly request the next network payload batch via `fetch_result_set_next` once the in-memory collection is exhausted, updating total `rownumber` tracking as a single block operation. - fetchall: Add a defensive guard clause against unallocated result sets to ensure strict DB-API compatibility.
madscientist
left a comment
There was a problem hiding this comment.
Thanks for looking into this @TheDistributor
| self._check_closed() | ||
| # Inline _check_closed to avoid per-row function-call overhead. | ||
| if self.closed: | ||
| raise Error("cursor is closed") | ||
| if self.session.closed: | ||
| raise Error("connection is closed") |
There was a problem hiding this comment.
Is this buying us enough to justify duplicating the code? I know method calls in python are slower but IMO we need to weigh the overhead of supporting multiple copies of the same code and the possibilities for behavior divergence.
My opinion is that no one who chooses to use the python driver is going to be fussed about an extra 50 nanoseconds of speed or whatever, even in a per-row basis. And if they do care about it then we should be working on using a compiled solution rather than a native python solution.
The change below to fetchall() is great. But to me this one feels like chasing cycles at the expense of code maintainability and IMO that's not the way we should be leaning, for a Python implementation.
There was a problem hiding this comment.
Why not make the same sort of improvement here, that you did to fetchall()?
request the next batch from the session, instead of looping through
fetchone() for every row
function call on every row