Support /pattern/i regex in broker-protection extract afterText/beforeText/separator#2777
Conversation
…eText/separator Extends the /pattern/ separator convention (#2242) to afterText and beforeText, and adds an optional `i` flag to all three so broker rules can match case-insensitively. Plain strings stay literal, so existing rules are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Build Branch
Static preview entry points
QR codes (mobile preview)
Integration commandsnpm (Android / Extension): Swift Package Manager (Apple): .package(url: "https://github.com/duckduckgo/content-scope-scripts.git", branch: "pr-releases/randerson/regex-aftertext-beforetext")git submodule (Windows): git -C submodules/content-scope-scripts fetch origin pr-releases/randerson/regex-aftertext-beforetext
git -C submodules/content-scope-scripts checkout origin/pr-releases/randerson/regex-aftertext-beforetextPin to exact commitnpm (Android / Extension): Swift Package Manager (Apple): .package(url: "https://github.com/duckduckgo/content-scope-scripts.git", revision: "1fc06c3b0fe846aaba9fa9384a2a03afb574e052")git submodule (Windows): git -C submodules/content-scope-scripts fetch origin pr-releases/randerson/regex-aftertext-beforetext
git -C submodules/content-scope-scripts checkout 1fc06c3b0fe846aaba9fa9384a2a03afb574e052 |
[Beta] Generated file diffTime updated: Tue, 23 Jun 2026 05:19:11 GMT Android
File has changed AppleFile has changed IntegrationFile has changed WindowsFile has changed |
There was a problem hiding this comment.
Stale comment
Injected PR Evaluation: Web Compatibility & Security
Review of broker-protection extract regex support (
/pattern/iforafterText,beforeText,separator).Web Compatibility Assessment
File Lines Severity Finding injected/src/features/broker-protection/actions/extract.js358–383 info Changes are confined to broker-protection profile extraction. No browser API overrides, prototype patches, or DOM lifecycle changes that could affect third-party scripts on arbitrary sites. injected/src/features/broker-protection/actions/extract.js376–383 info Literal-string afterText/beforeTextpreserve the priorString#splitsemantics viasplitOnce. Existing configs using plain strings are unaffected.injected/src/features/broker-protection/actions/extract.js376–380 info RegExp afterText/beforeTextintentionally diverge fromString#splitfor multi-occurrence text (match + slice keeps everything after the first match). This is documented and tested; it only applies to/pattern/or/pattern/imatchers, not literals.injected/src/features/broker-protection/actions/extract.js358–360 warning parseRegexFromString(renamed fromparseRegexSeparator) changes how slash-wrapped separators ending in/icompile. Previously/age/ibecamenew RegExp('age/i')(no flag); now it becomesnew RegExp('age', 'i'). Any existingseparatorconfig that relied on the old parsing would behave differently. The new behavior is correct for the stated goal, but it is a config-level compatibility change worth noting in release notes.injected/src/features/broker-protection/actions/extract.js184–189 info afterText/beforeTextgain regex parsing for the first time. A config that used a literal value like/verified/asbeforeTextwould previously split on that exact string; it now compiles as a regex. Unlikely in practice, but possible for edge-case configs.Security Assessment
File Lines Severity Finding (scope) — info No changes to captured-globals.js, messaging transports, message bridge,wrapper-utils.js,shouldExemptMethod(), or platform entry points. Injected-runtime attack surface is unchanged.injected/src/features/broker-protection/actions/extract.js358–360 info Regex patterns originate from native broker action JSON (trusted config), not from page scripts. new RegExp(...)is appropriate for this trust boundary.injected/src/features/broker-protection/actions/extract.js377–378 info Regexes execute against page-derived innerText/textContent. A hostile page could supply long adversarial input against a config regex (ReDoS). This risk already existed forseparatorregexes; this PR extends the same model toafterText/beforeText. Mitigation is config-side pattern hygiene, not page-side.injected/src/features/broker-protection/actions/extract.js377 info matcher instanceof RegExpuses the ambientRegExpconstructor. In the broker-protection execution context this is low risk (patterns are constructed locally vianew RegExp), but capturedRegExpwould be marginally more consistent with hardened injected code elsewhere.injected/src/features/broker-protection/actions/extract.js360 info Invalid regex syntax in a /pattern/string will throw fromnew RegExpat extraction time. Failure is localized to the broker-protection action (not a page-wide throw), but a try/catch fallback to literal-string mode would make misconfigured patterns fail more gracefully.Risk Level
Low Risk — A narrowly scoped broker-protection extraction utility change with solid unit and integration test coverage; no injected API shims, messaging boundary changes, or global-capture regressions.
Recommendations
- Document the
/pattern/iseparator parsing fix in broker-protection release notes so config authors know/foo/inow enables case-insensitivity rather than matching a literal/isuffix in the pattern.- Consider wrapping
new RegExpin try/catch insideparseRegexFromStringto fall back to the literal string when compilation fails, matching the existing tolerant behavior for unsupported flags like/x/g.- ReDoS hygiene for new configs — prefer bounded, simple patterns (e.g.
/age:?\\s+/i,/\\s*aka\\s*/i) over nested quantifiers when authoring broker extract rules that run against untrusted page text.
✅ Unit tests verified locally:
broker-protection-extract.js— 28 specs, 0 failures.Sent by Cursor Automation: Web compat and sec
Dismissing stale approval — new commits pushed, awaiting Cursor re-review.
|
This PR requires a manual review and approval from a member of one of the following teams:
|
brianhall
left a comment
There was a problem hiding this comment.
This looks great, thanks!
Resolve conflicts with the field-extractor registry refactor (#2778): - extract.js: re-apply regex afterText/beforeText through splitOnce in shapeString; merge TextFieldSpec JSDoc (regex descriptions + attribute) - broker-protection-extract.js: migrate regex tests to stringsFromElements
There was a problem hiding this comment.
Injected PR Evaluation: Web Compatibility & Security
PR: Support /pattern/i regex in broker-protection extract afterText/beforeText/separator
Web Compatibility Assessment
| File | Lines | Severity | Finding |
|---|---|---|---|
injected/src/features/broker-protection/actions/extract.js |
444–446 | info | parseRegexFromString only accepts /pattern/ or /pattern/i. Unsupported flag forms (/x/g, /x/gi) remain literal strings — backward compatible and avoids global/sticky split surprises. |
injected/src/features/broker-protection/actions/extract.js |
462–468 | info | splitOnce uses match + slice for RegExp matchers instead of String#split, so afterText/beforeText keep everything after/before the first match (not between occurrences). This is a deliberate improvement over literal split semantics for regex; literal string matchers preserve existing split behavior exactly. |
injected/src/features/broker-protection/actions/extract.js |
308–315 | info | When a regex does not match, splitOnce returns undefined and shapeString falls back to the original value (?.trim() || value). No silent empty-string regressions for non-matching pages. |
injected/src/features/broker-protection/actions/extract.js |
444–446 | warning | Minor behavior change for separator: previously // (length 2, starts/ends with /) was compiled as an empty regex via the old parseRegexSeparator; now it stays a literal // because .+ requires at least one pattern character. Unlikely in real broker configs but technically a delta. |
injected/integration-test/... |
— | info | Integration test + synthetic HTML exercise case-insensitive extraction across mixed badge/label casing. Good coverage of the motivating scenario. |
No findings in API surface fidelity, prototype chain integrity, DOM safety, timing/races, platform-specific APIs, or third-party script detection — this change does not override browser APIs and only runs inside the config-gated broker-protection action executor.
Security Assessment
| File | Lines | Severity | Finding |
|---|---|---|---|
injected/src/features/broker-protection/actions/extract.js |
444–446, 462–468 | info | Regex patterns originate from DDG-authored broker action configs (native → content script), not from page scripts. Only the input text (innerText/textContent) is page-controlled. |
injected/src/features/broker-protection/actions/extract.js |
464 | warning | ReDoS surface (bounded): value.match(matcher) runs DDG-defined patterns against page-controlled strings. A hostile broker page could supply very long or adversarial text to amplify backtracking in a poorly chosen config pattern. Mitigated by: feature is opt-in/PIR-scoped, patterns are internally reviewed, and only the i flag is exposed (no g/m/s). Consider documenting safe-pattern guidance for config authors. |
injected/src/features/broker-protection/actions/extract.js |
446 | info | Invalid regex syntax in config (e.g. /[/) throws from new RegExp; execute() catches and returns ErrorResponse — no unhandled rejection or page-visible throw. |
| — | — | info | No changes to captured globals, messaging/nativeData, message bridge, postMessage, iframe access, or remote-config gating. RegExp usage is in the content-script isolated realm, not a page-tamperable shim boundary. |
No critical or error-level security findings.
Risk Level
Low Risk — Scoped broker-protection extraction helper change with no API overrides, messaging changes, or global capture modifications; behavior is backward compatible for literal strings and well-covered by unit + integration tests.
Recommendations
- (info) Document in broker-config authoring guidance that only
/pattern/and/pattern/iare supported, and thatafterText/beforeTextregex matchers anchor on the first match (differs from multi-split literal behavior). - (info) When adding production broker configs with regex, prefer possessive/simple patterns (e.g.
/age:?\\s+/i) over nested quantifiers to limit ReDoS exposure against adversarial page text. - (info) The
//→ literal behavior change is harmless for real configs; no action required unless internal configs relied on empty-regex splitting.
Cursor automation — injected PR evaluation (web compat & security)
Sent by Cursor Automation: Web compat and sec


Asana Task/Github Issue: https://app.asana.com/1/137249556945/project/1209121419454298/task/1213630318520583
Description
separatoralready lets a broker rule pass a regex via the/pattern/convention (#2242), butafterTextandbeforeTextwere still plain literal matches, and none of the three could match case-insensitively. We want these fields to be as flexible as possible, so that broker rules can adapt to whatever shape a page throws at them without needing a code change.So this extends the same
/pattern/convention toafterTextandbeforeText, and adds an optionaliflag across all three:Plain strings stay literal, so there's no behaviour change for existing rules – the regex path only kicks in when a value is wrapped in
/.../. Anything with an unsupported flag (i.e. noti) falls back to being treated as a literal string rather than throwing, so a mistyped config fails gently rather than breaking the whole action.Testing Steps
There's no broker that needs this yet, so there's no good way to test it by hand. Coverage is via tests instead:
npm run test-int(broker-protection specs) →extract with regex afterText / beforeText / separator. Runs against a synthetic people-search fixture (results-regex.html) with several same-name results where only one is the right match, and the badge / age-label / alias-separator casing varies between cards. Asserts the matched profile resolves toJohn Smith, age38, Chicago, which only holds when theiflag is honoured.npm run test-unitcovers the/pattern/parser (/x/i, unsupported flags staying literal, the/and//edge cases) and the afterText/beforeText/separator behaviour, including the match-and-slice semantics that keep capture groups out of the result.Checklist
Please tick all that apply:
🤖 Generated with Claude Code
Note
Medium Risk
Touches shared profile scraping logic used in production broker actions; behavior is backward compatible but misconfigured regex could change extracted PII fields until rules are fixed.
Overview
Broker extract rules can now use the same
/pattern/and/pattern/iJSON convention forafterTextandbeforeText, not onlyseparator, so labels like age and AKA can be matched case-insensitively without code changes.parseRegexFromStringreplaces the old separator-only parser and accepts an optionaliflag; unsupported flags (e.g./x/g) stay literal strings.shapeStringroutesafterText/beforeTextthrough newsplitOnce, which keeps literalString#splitbehavior for plain strings and uses match + slice for regex so capture groups do not pollute results and “after” keeps everything past the first match.Plain-string configs are unchanged. Coverage adds unit tests for the parser and regex shaping, plus an integration test on a synthetic results page with mixed casing on badges and labels.
Reviewed by Cursor Bugbot for commit 9966bdb. Bugbot is set up for automated code reviews on this repo. Configure here.