Support `/pattern/i` regex in broker-protection extract afterText/beforeText/separator by noisysocks · Pull Request #2777 · duckduckgo/content-scope-scripts

noisysocks · 2026-06-18T04:41:21Z

Asana Task/Github Issue: https://app.asana.com/1/137249556945/project/1209121419454298/task/1213630318520583

Description

separator already lets a broker rule pass a regex via the /pattern/ convention (#2242), but afterText and beforeText were still plain literal matches, and none of the three could match case-insensitively. We want these fields to be as flexible as possible, so that broker rules can adapt to whatever shape a page throws at them without needing a code change.

So this extends the same /pattern/ convention to afterText and beforeText, and adds an optional i flag across all three:

"age":                  { "selector": ".age", "afterText": "/age:?\\s+/i" },
"alternativeNamesList": { "selector": ".aka", "separator": "/\\s*aka\\s*/i" }

Plain strings stay literal, so there's no behaviour change for existing rules – the regex path only kicks in when a value is wrapped in /.../. Anything with an unsupported flag (i.e. not i) falls back to being treated as a literal string rather than throwing, so a mistyped config fails gently rather than breaking the whole action.

Testing Steps

There's no broker that needs this yet, so there's no good way to test it by hand. Coverage is via tests instead:

npm run test-int (broker-protection specs) → extract with regex afterText / beforeText / separator. Runs against a synthetic people-search fixture (results-regex.html) with several same-name results where only one is the right match, and the badge / age-label / alias-separator casing varies between cards. Asserts the matched profile resolves to John Smith, age 38, Chicago, which only holds when the i flag is honoured.
npm run test-unit covers the /pattern/ parser (/x/i, unsupported flags staying literal, the / and // edge cases) and the afterText/beforeText/separator behaviour, including the match-and-slice semantics that keep capture groups out of the result.

Checklist

Please tick all that apply:

I have tested this change locally
I have tested this change locally in all supported browsers
This change will be visible to users
I have added automated tests that cover this change
I have ensured the change is gated by config
This change was covered by a ship review
This change was covered by a tech design
Any dependent config has been merged

🤖 Generated with Claude Code

Note

Medium Risk
Touches shared profile scraping logic used in production broker actions; behavior is backward compatible but misconfigured regex could change extracted PII fields until rules are fixed.

Overview
Broker extract rules can now use the same /pattern/ and /pattern/i JSON convention for afterText and beforeText, not only separator, so labels like age and AKA can be matched case-insensitively without code changes.

parseRegexFromString replaces the old separator-only parser and accepts an optional i flag; unsupported flags (e.g. /x/g) stay literal strings. shapeString routes afterText / beforeText through new splitOnce, which keeps literal String#split behavior for plain strings and uses match + slice for regex so capture groups do not pollute results and “after” keeps everything past the first match.

Plain-string configs are unchanged. Coverage adds unit tests for the parser and regex shaping, plus an integration test on a synthetic results page with mixed casing on badges and labels.

^{Reviewed by Cursor Bugbot for commit 9966bdb. Bugbot is set up for automated code reviews on this repo. Configure here.}

…eText/separator Extends the /pattern/ separator convention (#2242) to afterText and beforeText, and adds an optional `i` flag to all three so broker rules can match case-insensitively. Plain strings stay literal, so existing rules are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-18T04:41:53Z

Build Branch


Branch	`pr-releases/randerson/regex-aftertext-beforetext`
Commit	`1fc06c3b0f`
Updated	June 23, 2026 at 5:18:33 AM UTC

Static preview entry points

Docs: docs/index.html
Static pages: build/integration/pages/index.html
Integration pages: injected/integration-test/test-pages/index.html

QR codes (mobile preview)

Entry point	QR code
Docs
Static pages
Integration pages

Integration commands

npm (Android / Extension):

npm i github:duckduckgo/content-scope-scripts#pr-releases/randerson/regex-aftertext-beforetext

Swift Package Manager (Apple):

.package(url: "https://github.com/duckduckgo/content-scope-scripts.git", branch: "pr-releases/randerson/regex-aftertext-beforetext")

git submodule (Windows):

git -C submodules/content-scope-scripts fetch origin pr-releases/randerson/regex-aftertext-beforetext
git -C submodules/content-scope-scripts checkout origin/pr-releases/randerson/regex-aftertext-beforetext

Pin to exact commit

npm (Android / Extension):

npm i github:duckduckgo/content-scope-scripts#1fc06c3b0fe846aaba9fa9384a2a03afb574e052

Swift Package Manager (Apple):

.package(url: "https://github.com/duckduckgo/content-scope-scripts.git", revision: "1fc06c3b0fe846aaba9fa9384a2a03afb574e052")

git submodule (Windows):

git -C submodules/content-scope-scripts fetch origin pr-releases/randerson/regex-aftertext-beforetext
git -C submodules/content-scope-scripts checkout 1fc06c3b0fe846aaba9fa9384a2a03afb574e052

github-actions · 2026-06-18T04:42:35Z

[Beta] Generated file diff

Time updated: Tue, 23 Jun 2026 05:19:11 GMT

Android

    - android/autofillImport.js

android/brokerProtection.js

File has changed

Apple

    - apple/contentScopeIsolated.js

File has changed

Integration

    - integration/contentScope.js

File has changed

Windows

    - windows/contentScope.js

File has changed

cursor

Stale comment

Injected PR Evaluation: Web Compatibility & Security

Review of broker-protection extract regex support (/pattern/i for afterText, beforeText, separator).

Web Compatibility Assessment

File Lines Severity Finding

injected/src/features/broker-protection/actions/extract.js 358–383 info Changes are confined to broker-protection profile extraction. No browser API overrides, prototype patches, or DOM lifecycle changes that could affect third-party scripts on arbitrary sites.

injected/src/features/broker-protection/actions/extract.js 376–383 info Literal-string afterText/beforeText preserve the prior String#split semantics via splitOnce. Existing configs using plain strings are unaffected.

injected/src/features/broker-protection/actions/extract.js 376–380 info RegExp afterText/beforeText intentionally diverge from String#split for multi-occurrence text (match + slice keeps everything after the first match). This is documented and tested; it only applies to /pattern/ or /pattern/i matchers, not literals.

injected/src/features/broker-protection/actions/extract.js 358–360 warning parseRegexFromString (renamed from parseRegexSeparator) changes how slash-wrapped separators ending in /i compile. Previously /age/i became new RegExp('age/i') (no flag); now it becomes new RegExp('age', 'i'). Any existing separator config that relied on the old parsing would behave differently. The new behavior is correct for the stated goal, but it is a config-level compatibility change worth noting in release notes.

injected/src/features/broker-protection/actions/extract.js 184–189 info afterText/beforeText gain regex parsing for the first time. A config that used a literal value like /verified/ as beforeText would previously split on that exact string; it now compiles as a regex. Unlikely in practice, but possible for edge-case configs.

Security Assessment

File Lines Severity Finding

(scope) — info No changes to captured-globals.js, messaging transports, message bridge, wrapper-utils.js, shouldExemptMethod(), or platform entry points. Injected-runtime attack surface is unchanged.

injected/src/features/broker-protection/actions/extract.js 358–360 info Regex patterns originate from native broker action JSON (trusted config), not from page scripts. new RegExp(...) is appropriate for this trust boundary.

injected/src/features/broker-protection/actions/extract.js 377–378 info Regexes execute against page-derived innerText/textContent. A hostile page could supply long adversarial input against a config regex (ReDoS). This risk already existed for separator regexes; this PR extends the same model to afterText/beforeText. Mitigation is config-side pattern hygiene, not page-side.

injected/src/features/broker-protection/actions/extract.js 377 info matcher instanceof RegExp uses the ambient RegExp constructor. In the broker-protection execution context this is low risk (patterns are constructed locally via new RegExp), but captured RegExp would be marginally more consistent with hardened injected code elsewhere.

injected/src/features/broker-protection/actions/extract.js 360 info Invalid regex syntax in a /pattern/ string will throw from new RegExp at extraction time. Failure is localized to the broker-protection action (not a page-wide throw), but a try/catch fallback to literal-string mode would make misconfigured patterns fail more gracefully.

Risk Level

Low Risk — A narrowly scoped broker-protection extraction utility change with solid unit and integration test coverage; no injected API shims, messaging boundary changes, or global-capture regressions.

Recommendations

Document the /pattern/i separator parsing fix in broker-protection release notes so config authors know /foo/i now enables case-insensitivity rather than matching a literal /i suffix in the pattern.

Consider wrapping new RegExp in try/catch inside parseRegexFromString to fall back to the literal string when compilation fails, matching the existing tolerant behavior for unsupported flags like /x/g.

ReDoS hygiene for new configs — prefer bounded, simple patterns (e.g. /age:?\\s+/i, /\\s*aka\\s*/i) over nested quantifiers when authoring broker extract rules that run against untrusted page text.

✅ Unit tests verified locally: broker-protection-extract.js — 28 specs, 0 failures.

_{Sent by Cursor Automation: Web compat and sec}

Dismissing stale approval — new commits pushed, awaiting Cursor re-review.

github-actions · 2026-06-23T00:43:54Z

⚠️ Cursor assessed this PR as Medium Risk (only Low Risk is auto-approved).

This PR requires a manual review and approval from a member of one of the following teams:

@duckduckgo/content-scope-scripts-owners
@duckduckgo/apple-devs
@duckduckgo/android-devs
@duckduckgo/team-windows-development
@duckduckgo/extension-owners
@duckduckgo/config-aor
@duckduckgo/breakage-aor
@duckduckgo/breakage

brianhall

This looks great, thanks!

Resolve conflicts with the field-extractor registry refactor (#2778): - extract.js: re-apply regex afterText/beforeText through splitOnce in shapeString; merge TextFieldSpec JSDoc (regex descriptions + attribute) - broker-protection-extract.js: migrate regex tests to stringsFromElements

cursor

Injected PR Evaluation: Web Compatibility & Security

PR: Support /pattern/i regex in broker-protection extract afterText/beforeText/separator

Web Compatibility Assessment

File	Lines	Severity	Finding
`injected/src/features/broker-protection/actions/extract.js`	444–446	info	`parseRegexFromString` only accepts `/pattern/` or `/pattern/i`. Unsupported flag forms (`/x/g`, `/x/gi`) remain literal strings — backward compatible and avoids global/sticky `split` surprises.
`injected/src/features/broker-protection/actions/extract.js`	462–468	info	`splitOnce` uses `match` + `slice` for RegExp matchers instead of `String#split`, so `afterText`/`beforeText` keep everything after/before the first match (not between occurrences). This is a deliberate improvement over literal `split` semantics for regex; literal string matchers preserve existing `split` behavior exactly.
`injected/src/features/broker-protection/actions/extract.js`	308–315	info	When a regex does not match, `splitOnce` returns `undefined` and `shapeString` falls back to the original value (`?.trim() \|\| value`). No silent empty-string regressions for non-matching pages.
`injected/src/features/broker-protection/actions/extract.js`	444–446	warning	Minor behavior change for `separator`: previously `//` (length 2, starts/ends with `/`) was compiled as an empty regex via the old `parseRegexSeparator`; now it stays a literal `//` because `.+` requires at least one pattern character. Unlikely in real broker configs but technically a delta.
`injected/integration-test/...`	—	info	Integration test + synthetic HTML exercise case-insensitive extraction across mixed badge/label casing. Good coverage of the motivating scenario.

No findings in API surface fidelity, prototype chain integrity, DOM safety, timing/races, platform-specific APIs, or third-party script detection — this change does not override browser APIs and only runs inside the config-gated broker-protection action executor.

Security Assessment

File	Lines	Severity	Finding
`injected/src/features/broker-protection/actions/extract.js`	444–446, 462–468	info	Regex patterns originate from DDG-authored broker action configs (native → content script), not from page scripts. Only the input text (`innerText`/`textContent`) is page-controlled.
`injected/src/features/broker-protection/actions/extract.js`	464	warning	ReDoS surface (bounded): `value.match(matcher)` runs DDG-defined patterns against page-controlled strings. A hostile broker page could supply very long or adversarial text to amplify backtracking in a poorly chosen config pattern. Mitigated by: feature is opt-in/PIR-scoped, patterns are internally reviewed, and only the `i` flag is exposed (no `g`/`m`/`s`). Consider documenting safe-pattern guidance for config authors.
`injected/src/features/broker-protection/actions/extract.js`	446	info	Invalid regex syntax in config (e.g. `/[/`) throws from `new RegExp`; `execute()` catches and returns `ErrorResponse` — no unhandled rejection or page-visible throw.
—	—	info	No changes to captured globals, messaging/`nativeData`, message bridge, `postMessage`, iframe access, or remote-config gating. `RegExp` usage is in the content-script isolated realm, not a page-tamperable shim boundary.

No critical or error-level security findings.

Risk Level

Low Risk — Scoped broker-protection extraction helper change with no API overrides, messaging changes, or global capture modifications; behavior is backward compatible for literal strings and well-covered by unit + integration tests.

Recommendations

(info) Document in broker-config authoring guidance that only /pattern/ and /pattern/i are supported, and that afterText/beforeText regex matchers anchor on the first match (differs from multi-split literal behavior).
(info) When adding production broker configs with regex, prefer possessive/simple patterns (e.g. /age:?\\s+/i) over nested quantifiers to limit ReDoS exposure against adversarial page text.
(info) The // → literal behavior change is harmless for real configs; no action required unless internal configs relied on empty-regex splitting.

Cursor automation — injected PR evaluation (web compat & security)

_{Sent by Cursor Automation: Web compat and sec}

noisysocks requested review from a team as code owners June 18, 2026 04:41

daxtheduck previously approved these changes Jun 18, 2026

View reviewed changes

cursor Bot reviewed Jun 18, 2026

View reviewed changes

noisysocks requested a review from brianhall June 18, 2026 04:44

Merge branch 'main' into randerson/regex-aftertext-beforetext

affaf36

github-actions Bot added the semver-patch Bug fix / internal — no release needed label Jun 23, 2026

cursor Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread injected/src/features/broker-protection/actions/extract.js

Comment thread injected/src/features/broker-protection/actions/extract.js

brianhall approved these changes Jun 23, 2026

View reviewed changes

noisysocks added this pull request to the merge queue Jun 23, 2026

cursor Bot reviewed Jun 23, 2026

View reviewed changes

Merged via the queue into main with commit ec0d6da Jun 23, 2026
39 of 40 checks passed

noisysocks deleted the randerson/regex-aftertext-beforetext branch June 23, 2026 05:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support `/pattern/i` regex in broker-protection extract afterText/beforeText/separator#2777

Support `/pattern/i` regex in broker-protection extract afterText/beforeText/separator#2777
noisysocks merged 3 commits into
mainfrom
randerson/regex-aftertext-beforetext

noisysocks commented Jun 18, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

github-actions Bot commented Jun 23, 2026

Uh oh!

Uh oh!

Uh oh!

brianhall left a comment

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

File	Lines	Severity	Finding
`injected/src/features/broker-protection/actions/extract.js`	358–383	info	Changes are confined to broker-protection profile extraction. No browser API overrides, prototype patches, or DOM lifecycle changes that could affect third-party scripts on arbitrary sites.
`injected/src/features/broker-protection/actions/extract.js`	376–383	info	Literal-string `afterText`/`beforeText` preserve the prior `String#split` semantics via `splitOnce`. Existing configs using plain strings are unaffected.
`injected/src/features/broker-protection/actions/extract.js`	376–380	info	RegExp `afterText`/`beforeText` intentionally diverge from `String#split` for multi-occurrence text (match + slice keeps everything after the first match). This is documented and tested; it only applies to `/pattern/` or `/pattern/i` matchers, not literals.
`injected/src/features/broker-protection/actions/extract.js`	358–360	warning	`parseRegexFromString` (renamed from `parseRegexSeparator`) changes how slash-wrapped separators ending in `/i` compile. Previously `/age/i` became `new RegExp('age/i')` (no flag); now it becomes `new RegExp('age', 'i')`. Any existing `separator` config that relied on the old parsing would behave differently. The new behavior is correct for the stated goal, but it is a config-level compatibility change worth noting in release notes.
`injected/src/features/broker-protection/actions/extract.js`	184–189	info	`afterText`/`beforeText` gain regex parsing for the first time. A config that used a literal value like `/verified/` as `beforeText` would previously split on that exact string; it now compiles as a regex. Unlikely in practice, but possible for edge-case configs.

File	Lines	Severity	Finding
(scope)	—	info	No changes to `captured-globals.js`, messaging transports, message bridge, `wrapper-utils.js`, `shouldExemptMethod()`, or platform entry points. Injected-runtime attack surface is unchanged.
`injected/src/features/broker-protection/actions/extract.js`	358–360	info	Regex patterns originate from native broker action JSON (trusted config), not from page scripts. `new RegExp(...)` is appropriate for this trust boundary.
`injected/src/features/broker-protection/actions/extract.js`	377–378	info	Regexes execute against page-derived `innerText`/`textContent`. A hostile page could supply long adversarial input against a config regex (ReDoS). This risk already existed for `separator` regexes; this PR extends the same model to `afterText`/`beforeText`. Mitigation is config-side pattern hygiene, not page-side.
`injected/src/features/broker-protection/actions/extract.js`	377	info	`matcher instanceof RegExp` uses the ambient `RegExp` constructor. In the broker-protection execution context this is low risk (patterns are constructed locally via `new RegExp`), but captured `RegExp` would be marginally more consistent with hardened injected code elsewhere.
`injected/src/features/broker-protection/actions/extract.js`	360	info	Invalid regex syntax in a `/pattern/` string will throw from `new RegExp` at extraction time. Failure is localized to the broker-protection action (not a page-wide throw), but a try/catch fallback to literal-string mode would make misconfigured patterns fail more gracefully.

Conversation

noisysocks commented Jun 18, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Testing Steps

Checklist

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Build Branch

Static preview entry points

Integration commands

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

[Beta] Generated file diff

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Injected PR Evaluation: Web Compatibility & Security

Web Compatibility Assessment

Security Assessment

Risk Level

Recommendations

Uh oh!

github-actions Bot commented Jun 23, 2026

Uh oh!

Uh oh!

Uh oh!

brianhall left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Injected PR Evaluation: Web Compatibility & Security

Web Compatibility Assessment

Security Assessment

Risk Level

Recommendations

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

noisysocks commented Jun 18, 2026 •

edited by cursor Bot

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

cursor Bot left a comment •

edited

Loading