FEAT Add RegexScorer and CredentialLeakScorer for regex-based secret detection #1704
francose wants to merge 6 commits into
Adds a deterministic TrueFalseScorer that detects leaked credentials in LLM responses using regex pattern matching. Covers AWS keys, GitHub tokens, Google API keys, Slack tokens/webhooks, JWTs, private key headers, connection strings, and generic key=value assignments. Runs without an LLM call, making it suitable for CI pipelines and high-volume evaluations where the existing SelfAskTrueFalseScorer with the leakage prompt would be too slow or expensive. Supports custom pattern dictionaries for domain-specific secret formats.
Pull request overview
Adds a new deterministic True/False scorer (CredentialLeakScorer) to quickly detect common credential/secret formats in LLM outputs using compiled regexes, plus unit tests and a public export from pyrit.score.
Changes:
- Introduces `CredentialLeakScorer` with a default regex pattern set and optional custom patterns.
- Adds unit tests covering true positives/negatives, rationale output, custom patterns, and CentralMemory integration.
- Exposes `CredentialLeakScorer` from `pyrit.score`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `pyrit/score/true_false/credential_leak_scorer.py` | New regex-based scorer implementation producing true/false Score results with rationale. |
| `tests/unit/score/test_credential_leak_scorer.py` | Unit tests validating detection behavior, rationale, custom patterns, and memory integration. |
| `pyrit/score/__init__.py` | Exports `CredentialLeakScorer` from the public `pyrit.score` package. |
…sive copy, obfuscated test literals
- Replace `Optional[X]` with `X | None` per repo style guide
- Use `str(detected).lower()` for consistent true/false score values
- Copy patterns dict to prevent cross-instance mutation of defaults
- Construct test credential strings via concatenation to avoid secret scanner triggers
@microsoft-github-policy-service agree
- AWS Secret Access Key pattern now requires context (`aws_secret_access_key=`, `aws_secret=`, or `secret_key=`) instead of matching any 40-char base64 string. Prevents false positives on git commit hashes and random strings.
- Add `doc/code/scoring/credential_leak_scorer.py` with usage examples for default patterns and custom pattern dictionaries.
- Fix AWS test key from 21 to 20 chars to match the AKIA+16 format.
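The false positive described in that commit can be reproduced with a small standalone sketch. Both regexes below are illustrative assumptions written for this example, not the patterns shipped in the PR; the names `BARE_40` and `CONTEXT_ANCHORED` are hypothetical.

```python
import re

# Assumed "before" pattern: any standalone run of 40 base64-style characters.
BARE_40 = re.compile(r"(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])")

# Assumed "after" pattern: the 40-char value must follow one of the context
# keys named in the commit message.
CONTEXT_ANCHORED = re.compile(
    r"(?i)\b(?:aws_secret_access_key|aws_secret|secret_key)\s*[=:]\s*[\"']?"
    r"[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])"
)

# A 40-hex-character string shaped like a git commit hash.
commit_hash = "0123456789abcdef" * 2 + "01234567"
commit_line = "commit " + commit_hash

# A fake secret built by concatenation so secret scanners do not flag it.
fake_secret = "x" * 40
leak_line = "aws_secret_access_key=" + fake_secret

bare_hits_hash = BARE_40.search(commit_line) is not None
anchored_hits_hash = CONTEXT_ANCHORED.search(commit_line) is not None
anchored_hits_leak = CONTEXT_ANCHORED.search(leak_line) is not None
```

The bare pattern flags the commit hash, while the context-anchored one ignores it and still catches the labelled assignment. The trade-off is that a secret pasted without its key name is no longer detected.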
@romanlutz Thank you for the feedback 🙏 — totally agree. The regex matching logic is generic enough to stand on its own. I'll refactor into:
That way, spinning up new regex-based scorers (PII detection, code patterns, etc.) is just a new subclass with a different pattern set, with no engine duplication. Will push the update.
Extract generic regex matching logic into RegexScorer so future pattern-based scorers can reuse the engine without class proliferation. CredentialLeakScorer now passes its default patterns to super().
@romanlutz Pushed the refactor!
romanlutz left a comment
Thanks for this contribution! Approving provided the comments are addressed.
- `RegexScorer` raises `ValueError` when patterns dict is empty
- Connection string pattern now requires `user:pass@` credentials, so `postgres://localhost:5432/mydb` no longer triggers a false positive
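A minimal sketch of the connection-string fix, assuming a regex of roughly this shape. The scheme list and the name `CONN_WITH_CREDS` are hypothetical and not taken from the PR.

```python
import re

# Assumed pattern: scheme://user:password@host. The "user:password@" part is
# mandatory, so URLs that only carry host:port are not flagged.
CONN_WITH_CREDS = re.compile(
    r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://"
    r"[^\s:/@]+:[^\s@/]+@[^\s/]+"
)

no_creds = "postgres://localhost:5432/mydb"
# Built by concatenation so the literal does not trip secret scanners.
with_creds = "postgres://admin:" + "s3cret" + "@db.internal:5432/mydb"

flag_no_creds = CONN_WITH_CREDS.search(no_creds) is not None
flag_with_creds = CONN_WITH_CREDS.search(with_creds) is not None
```

In `no_creds`, the text after the colon is the port and is followed by `/` rather than `@`, so the match fails. In `with_creds`, the `admin:s3cret@` segment satisfies the credential requirement.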
Closes #1703
Adds two new true/false scorers for fast, regex-based content detection — no LLM call required.
RegexScorer (general purpose)
A reusable `TrueFalseScorer` that evaluates text against a dict of named regex patterns and returns `True` if any of them match. Patterns are compiled once in `__init__`. The score rationale lists which named patterns matched, and `categories` can be set to tag results (e.g. `["pii"]`, `["security"]`). Aggregator defaults to `TrueFalseScoreAggregator.OR` but is configurable.
This is intended as a building block for any domain-specific regex check (credentials, PII, profanity, internal identifiers, etc.) without re-implementing the scorer plumbing each time.
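The matching behavior described above can be sketched without any PyRIT dependency. `NamedPatternMatcher` is a hypothetical stand-in for illustration only; it is not the PR's `RegexScorer`, it does not produce PyRIT `Score` objects, and the two example patterns are assumptions.

```python
import re


class NamedPatternMatcher:
    """Hypothetical stand-in for the matching core; not PyRIT's RegexScorer."""

    def __init__(self, patterns: dict[str, str]) -> None:
        # Mirrors the behavior mentioned in the PR: an empty dict is rejected.
        if not patterns:
            raise ValueError("patterns must contain at least one entry")
        # Compile once so repeated scoring does not pay the compile cost.
        self._compiled = {name: re.compile(rx) for name, rx in patterns.items()}

    def match(self, text: str) -> tuple[bool, str]:
        """Return (any_match, rationale naming the patterns that matched)."""
        matched = [name for name, rx in self._compiled.items() if rx.search(text)]
        if matched:
            return True, "Matched patterns: " + ", ".join(matched)
        return False, "No patterns matched."


# Example pattern set for a PII-style check (illustrative regexes).
matcher = NamedPatternMatcher({
    "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "us_ssn": r"\b\d{3}-\d{2}-\d{4}\b",
})
hit, rationale = matcher.match("Contact me at jane@example.com")
miss, miss_rationale = matcher.match("nothing sensitive here")
```

Keeping the pattern names in the rationale is what makes a positive result auditable: a reviewer can see which rule fired without rerunning the regexes.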
CredentialLeakScorer (built on RegexScorer)
Subclasses `RegexScorer` with a built-in default pattern set covering the most common leaked-credential formats:
- AWS keys
- GitHub tokens (`ghp_`/`gho_`/`ghu_`/`ghs_`/`ghr_`)
- Google API keys
- Slack tokens and webhooks
- JWTs
- Private key headers
- Connection strings
- Generic `api_key=`/`secret=`/`password=`/`token=` assignments
Pass a custom `patterns` dict to override the defaults entirely (useful for organization-specific secret formats like internal API key prefixes). Category defaults to `["security"]`.
Because there's no LLM call, scoring runs in microseconds per evaluation, which makes it practical for CI and batch evaluation of thousands of responses.
Other changes
- Public export from `pyrit.score`
- `doc/code/scoring/credential_leak_scorer.py` walking through detection, clean responses, and custom patterns
- Unit tests for `RegexScorer` (match / no-match / multiple matches / category propagation) and `CredentialLeakScorer` (true positives across all default pattern types, true negatives, rationale content, custom patterns, and memory integration)