feat: Add DeepEvalHandler for third-party evaluator integration #528
Open
stone-coding wants to merge 10 commits into
Introduces a new integrations/deepeval/ module that adapts AgentCore Lambda evaluation events into DeepEval LLMTestCase objects, runs any BaseMetric, and returns structured score/label/explanation responses.
…leTurnParams deprecation
…d EvaluatorInput support
Issue P446281164 — Third-Party Evaluator Integration (Phase 1)
Description of changes:
A generic BaseAdapter framework that adapts third-party (3P) evaluation libraries into AgentCore-compatible Lambda handlers. It supports DeepEval and Autoevals, and is extensible to future libraries (RAGAS, etc.).
Key components:
BaseAdapter — shared orchestration: parse event (supports EvaluatorInput from @custom_code_based_evaluator() decorator) → extract fields → validate → execute with timeout → error handling
DeepEvalAdapter — runs any DeepEval BaseMetric. DeepEvalHandler alias for backward compat.
AutoevalsAdapter — runs any Autoevals scorer
Field extraction from _eval_log_records in ADOT spans (input, actual_output, retrieval_context, expected_output)
Thread-based timeout (default 290s)
field_mapper escape hatch for custom span extraction
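The thread-based timeout listed above could look roughly like the sketch below. This is not the PR's code: `run_with_timeout` and `slow_metric` are hypothetical names, and the only detail taken from the description is the 290-second default. The idea is to run the metric in a daemon thread, wait up to the limit, and report a timeout or an exception as an error string instead of raising.

```python
import threading
import time

DEFAULT_TIMEOUT_SECONDS = 290.0  # default stated in the PR description


def run_with_timeout(fn, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Run fn() in a worker thread and return (result, error_message).

    Hypothetical helper: exactly one element of the returned pair is None
    unless fn() itself returns None.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = fn()
        except Exception as exc:  # capture instead of letting the thread die silently
            outcome["error"] = f"{type(exc).__name__}: {exc}"

    # A daemon thread does not block interpreter shutdown if it is still running.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        # The worker cannot be killed from here; it is simply abandoned.
        return None, f"evaluation timed out after {timeout}s"
    if "error" in outcome:
        return None, outcome["error"]
    return outcome.get("result"), None


def slow_metric():
    """Stand-in for a slow evaluator call (illustration only)."""
    time.sleep(0.5)
    return 1.0
```

Calling `run_with_timeout(slow_metric, timeout=0.05)` returns `None` plus a timeout message rather than raising. A Python thread cannot be forcibly stopped, so a timed-out metric keeps running in the background until it finishes or the process exits; a default just under 300 seconds presumably leaves room to return a response before the Lambda's own limit.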
Design decisions:
Composes with @custom_code_based_evaluator() decorator — accepts EvaluatorInput directly
Never raises unhandled exceptions — always returns valid response dict
Adding a new library takes one ~20-line subclass that implements execute()
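A minimal sketch of how these design decisions might fit together, under stated assumptions: the event is treated as a flat dict (the real adapter reads `_eval_log_records` from ADOT spans and also accepts `EvaluatorInput`), the error-response shape and the `ExactMatchAdapter` subclass are invented for illustration, and no DeepEval or Autoevals API is called. Only the `BaseAdapter` name, the `execute()` hook, the `field_mapper` option, and the score/label/explanation keys come from the description.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

FIELD_NAMES = ("input", "actual_output", "retrieval_context", "expected_output")


class BaseAdapter(ABC):
    """Hypothetical sketch of the shared orchestration: extract, validate, execute."""

    REQUIRED_FIELDS = ("input", "actual_output")  # assumed minimum, not from the PR

    def __init__(self, field_mapper: Optional[Callable[[dict], dict]] = None):
        # field_mapper lets callers override the default extraction.
        self.field_mapper = field_mapper

    def extract_fields(self, event: dict) -> dict:
        if self.field_mapper is not None:
            return self.field_mapper(event)
        # Assumption: fields sit at the top level of the event.
        return {name: event.get(name) for name in FIELD_NAMES}

    @abstractmethod
    def execute(self, fields: dict) -> dict:
        """Run the library-specific metric and return score/label/explanation."""

    def __call__(self, event, context=None) -> dict:
        # Everything runs inside one try block so the handler never raises.
        try:
            fields = self.extract_fields(event)
            missing = [k for k in self.REQUIRED_FIELDS if not fields.get(k)]
            if missing:
                return self._error(f"missing required fields: {missing}")
            return self.execute(fields)
        except Exception as exc:
            return self._error(f"{type(exc).__name__}: {exc}")

    @staticmethod
    def _error(message: str) -> dict:
        # Assumed error shape; the PR only promises a valid response dict.
        return {"score": None, "label": "ERROR", "explanation": message}


class ExactMatchAdapter(BaseAdapter):
    """Toy subclass standing in for a real library adapter."""

    def execute(self, fields: dict) -> dict:
        matched = fields["actual_output"] == fields.get("expected_output")
        return {
            "score": 1.0 if matched else 0.0,
            "label": "PASS" if matched else "FAIL",
            "explanation": "exact string comparison against expected_output",
        }
```

With this shape, a malformed event such as `ExactMatchAdapter()(None)` yields an `ERROR` response dict instead of an exception, and a library-specific adapter only has to supply `execute()`.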