feat: Add DeepEvalHandler for third-party evaluator integration by stone-coding · Pull Request #528 · aws/bedrock-agentcore-sdk-python

stone-coding · 2026-06-16T22:46:47Z

Issue P446281164 — Third-Party Evaluator Integration (Phase 1)

Description of changes: A generic BaseAdapter framework that adapts third-party (3P) evaluation libraries into AgentCore-compatible Lambda handlers. It supports DeepEval and Autoevals and is extensible to future libraries (RAGAS, etc.).

Key components:
BaseAdapter: shared orchestration (parse event, including EvaluatorInput from the @custom_code_based_evaluator() decorator → extract fields → validate → execute with timeout → error handling)
DeepEvalAdapter: runs any DeepEval BaseMetric. DeepEvalHandler is kept as an alias for backward compatibility.
AutoevalsAdapter: runs any Autoevals scorer
Field extraction from _eval_log_records in ADOT spans (input, actual_output, retrieval_context, expected_output)
Thread-based timeout (default 290s)
field_mapper escape hatch for custom span extraction
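The thread-based timeout listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name run_with_timeout, the "ERROR" label, and the exact response keys (score, label, explanation) are assumptions chosen to match the score/label/explanation shape mentioned in the description.

```python
import threading
import time
from typing import Any, Callable, Dict


def error_response(message: str) -> Dict[str, Any]:
    # Hypothetical error shape; the real handler's keys may differ.
    return {"score": None, "label": "ERROR", "explanation": message}


def run_with_timeout(fn: Callable[[], Dict[str, Any]], timeout_s: float = 290.0) -> Dict[str, Any]:
    """Run fn in a worker thread and always return a response dict."""
    box: Dict[str, Any] = {}

    def target() -> None:
        try:
            box["value"] = fn()
        except Exception as exc:  # capture the failure for the caller thread
            box["error"] = exc

    # Daemon thread: a stuck evaluator cannot keep the process alive.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # Python threads cannot be killed; the worker is simply abandoned.
        return error_response(f"timed out after {timeout_s}s")
    if "error" in box:
        exc = box["error"]
        return error_response(f"{type(exc).__name__}: {exc}")
    return box["value"]


def slow_evaluator() -> Dict[str, Any]:
    # Simulates an evaluator that takes longer than the allowed budget.
    time.sleep(0.5)
    return {"score": 1.0, "label": "PASS", "explanation": "done"}
```

A default of 290s would leave some headroom under a 300s Lambda timeout; that reading of the default is an inference, not something the description states.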

Design decisions:
Composes with the @custom_code_based_evaluator() decorator: accepts EvaluatorInput directly
Never raises unhandled exceptions: always returns a valid response dict
Adding a new library takes one ~20-line subclass that implements execute()
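To illustrate the extension point and the field_mapper escape hatch, here is a sketch under stated assumptions. The BaseAdapter below is a hypothetical stand-in written for this example, not the class from the PR; the handle() method, the default field names, and ExactMatchAdapter are all invented for illustration, and no third-party library is called.

```python
from typing import Any, Callable, Dict, Optional

Fields = Dict[str, Any]


class BaseAdapter:
    """Hypothetical stand-in for the PR's BaseAdapter (not its real code)."""

    REQUIRED = ("input", "actual_output")

    def __init__(self, field_mapper: Optional[Callable[[Dict[str, Any]], Fields]] = None):
        # A caller-supplied field_mapper replaces the default extraction.
        self.field_mapper = field_mapper or self.default_fields

    @staticmethod
    def default_fields(event: Dict[str, Any]) -> Fields:
        keys = ("input", "actual_output", "expected_output")
        return {k: event.get(k) for k in keys}

    def execute(self, fields: Fields) -> Dict[str, Any]:
        raise NotImplementedError

    def handle(self, event: Dict[str, Any]) -> Dict[str, Any]:
        try:
            fields = self.field_mapper(event)
            missing = [k for k in self.REQUIRED if not fields.get(k)]
            if missing:
                return self._error(f"missing fields: {missing}")
            return self.execute(fields)
        except Exception as exc:  # never let an exception escape the handler
            return self._error(f"{type(exc).__name__}: {exc}")

    @staticmethod
    def _error(message: str) -> Dict[str, Any]:
        return {"score": None, "label": "ERROR", "explanation": message}


class ExactMatchAdapter(BaseAdapter):
    """Toy 'library' adapter: only execute() needs to be written."""

    def execute(self, fields: Fields) -> Dict[str, Any]:
        match = fields["actual_output"] == fields.get("expected_output")
        return {
            "score": 1.0 if match else 0.0,
            "label": "PASS" if match else "FAIL",
            "explanation": "exact match" if match else "outputs differ",
        }
```

A custom mapper can then adapt a different event shape without touching the adapter, for example ExactMatchAdapter(field_mapper=lambda e: {"input": e["q"], "actual_output": e["a"], "expected_output": e["gold"]}).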


haomiao037 added 9 commits June 15, 2026 10:39

Add DeepEvalHandler integration with unit tests

ba80889

Introduces a new integrations/deepeval/ module that adapts AgentCore Lambda evaluation events into DeepEval LLMTestCase objects, runs any BaseMetric, and returns structured score/label/explanation responses.

Fix span extraction to use real AgentCore _eval_log_records structure

b0d9682

Set context field from tool messages for HallucinationMetric support

81a46dd

Use metric.success for label instead of manual threshold comparison

3080e40
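One plausible motivation for this change, which the commit message does not spell out: a manual score >= threshold check gives the wrong label for any metric where lower scores are better. The sketch below uses a stand-in FakeMetric class rather than a real DeepEval metric; the lower_is_better flag and the PASS/FAIL label strings are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeMetric:
    """Stand-in exposing score, threshold, and success after measurement."""

    score: float
    threshold: float
    lower_is_better: bool = False  # hypothetical flag for this example only

    @property
    def success(self) -> bool:
        # The metric itself knows which direction counts as passing.
        if self.lower_is_better:
            return self.score <= self.threshold
        return self.score >= self.threshold


def label_by_threshold(metric: FakeMetric) -> str:
    # Manual comparison: silently assumes higher is always better.
    return "PASS" if metric.score >= metric.threshold else "FAIL"


def label_by_success(metric: FakeMetric) -> str:
    # Delegates the pass/fail decision to the metric.
    return "PASS" if metric.success else "FAIL"
```

For a lower-is-better metric with score 0.1 and threshold 0.5, the manual comparison reports FAIL while the success-based label reports PASS.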

Add model override and timeout enforcement to DeepEvalHandler

34674bb

Add model override, timeout enforcement, use metric.success, fix SingleTurnParams deprecation

6aedcbf

Fix _get_required_params to handle GEval unmappable typing params

2260eb3

Add .deepeval/ to gitignore

14f0354

Move model override to init to avoid per-call mutation

b109a64
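A minimal sketch of what moving the override to init could look like. StubMetric, Handler, and the model attribute name are assumptions for illustration and are not taken from the PR's code.

```python
from typing import Any, Dict, Optional


class StubMetric:
    """Stand-in metric; the `model` attribute name is an assumption."""

    def __init__(self, model: str = "default-model") -> None:
        self.model = model

    def measure(self, text: str) -> float:
        return 1.0 if text else 0.0


class Handler:
    def __init__(self, metric: StubMetric, model: Optional[str] = None) -> None:
        self.metric = metric
        if model is not None:
            # Applied once at construction, so calls never mutate the metric.
            self.metric.model = model

    def __call__(self, text: str) -> Dict[str, Any]:
        # The call path only reads configuration; it does not change it.
        return {"score": self.metric.measure(text), "model": self.metric.model}
```

Configuring once keeps repeated invocations on a warm handler identical and avoids concurrent writes to shared metric state when evaluation runs in a worker thread.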

stone-coding requested a review from a team June 16, 2026 22:46

stone-coding requested a deployment to manual-approval June 16, 2026 22:47 — with GitHub Actions Waiting

jariy17 mentioned this pull request Jun 18, 2026

Expose evaluationReferenceInputs (ground truth) on EvaluatorInput for code-based evaluators #539

Closed

Refactor to BaseAdapter framework with DeepEval/Autoevals adapters and EvaluatorInput support

4e74926

stone-coding requested a deployment to manual-approval June 24, 2026 23:26 — with GitHub Actions Waiting

stone-coding wants to merge 10 commits into aws:main from stone-coding:deepeval-handler
