fix(utils): match_regex_list crashes on an empty-string pattern#6505
fix(utils): match_regex_list crashes on an empty-string pattern#6505devteamaegis wants to merge 1 commit into
Conversation
ericapisani
left a comment
There was a problem hiding this comment.
Thanks for opening the issue and pull request to address this.
There's a couple of things we'll need to address before this gets merged - let me know if you have any questions about what I've mentioned below.
|
|
||
| def test_match_regex_list_empty_string_pattern(): | ||
| # An empty-string pattern must not raise IndexError (regression test). | ||
| result = match_regex_list("anything", [""]) |
There was a problem hiding this comment.
Rather than have this be a standalone test, this case can be incorporated into the test above (test_match_regex_list) as one of the parameterized test cases. You'll just need to add the expected outcome as the last element in the array.
This means that last 2 assertions around foo/foobar can also be removed since we already have test cases that check similar behaviour within the array (lines 553 and 554 above)
|
|
||
| for item_matcher in regex_list: | ||
| if not substring_matching and item_matcher[-1] != "$": | ||
| if not substring_matching and (not item_matcher or item_matcher[-1] != "$"): |
There was a problem hiding this comment.
While this fixes the index error, the result that's returned from this change means match_regex_list("anything", [""]) will return True.
An empty string likely represents a typo/mistake on the developers part, and so we should treat this as if [] was passed in.
This would mean that this conditional should look like the following instead:
for item_matcher in regex_list:
if not item_matcher:
return False
if not substring_matching and item_matcher[-1] != "$":
item_matcher += "$"4043b13 to
d42a94a
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit d42a94a. Configure here.
|
|
||
| for item_matcher in regex_list: | ||
| if not substring_matching and item_matcher[-1] != "$": | ||
| if not substring_matching and (not item_matcher or item_matcher[-1] != "$"): |
There was a problem hiding this comment.
Empty pattern matches all strings
Medium Severity
The empty-string guard avoids IndexError by treating a falsy item_matcher like a pattern that still gets anchored to "$". With default substring_matching=False, re.search on that pattern matches any non-empty item, so match_regex_list returns True for [""]. Callers such as exclude_beat_tasks then treat every name as matched instead of ignoring the typo like an empty list.
Reviewed by Cursor Bugbot for commit d42a94a. Configure here.
| if not substring_matching and (not item_matcher or item_matcher[-1] != "$"): | ||
| item_matcher += "$" |
There was a problem hiding this comment.
Bug: An empty string "" in regex_list causes match_regex_list to always return True. The fix item_matcher += "$" is incorrect as re.search("$", ...) always matches.
Severity: HIGH
Suggested Fix
Instead of converting an empty item_matcher to "$", the function should explicitly handle an empty string in regex_list to mean "match nothing". This could be done by returning False immediately if item_matcher is empty after processing, or by ensuring the regex pattern for an empty string does not match everything.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: sentry_sdk/utils.py#L1743-L1744
Potential issue: The function `match_regex_list` incorrectly handles an empty string
`""` in `regex_list`. The proposed fix converts an empty `item_matcher` to `"$"` before
performing a regex search. However, `re.search("$", item)` always returns a match
because the `$` metacharacter matches the end-of-string position, which is present in
every string. As a result, if any `regex_list` contains an empty string, the function
will incorrectly return `True` for all items. For example, setting `exclude_beat_tasks =
[""]` in the Celery integration would silently exclude all beat tasks from
instrumentation.
Also affects:
sentry_sdk/integrations/celery/beat.py:130
Did we get this right? 👍 / 👎 to inform future reviews.


What's broken
match_regex_listchecksitem_matcher[-1] != "$"to decide whether to anchor each pattern. With the defaultsubstring_matching=False, an empty-string element makes""[-1]raise:exclude_beat_tasksin the Celery integration flows straight into this with the default mode, so an empty string in that user-supplied list crashes the beat instrumentation.Why it happens
Indexing
item_matcher[-1]on an empty string is out of range.Fix
Short-circuit on empty:
(not item_matcher or item_matcher[-1] != "$"). An empty matcher then becomes"$", which matches end-of-string — a sane result instead of a crash.Test
Added a case asserting
match_regex_list("anything", [""])returns without raising; existing anchored/substring behavior is unchanged.Fixes #6504