Skip to content

fix: batch-eval docs + preserve config-bundle placeholders on AB-test promote#1638

Open
jariy17 wants to merge 2 commits into
mainfrom
fix/batch-eval-docs-and-abtest-placeholder
Open

fix: batch-eval docs + preserve config-bundle placeholders on AB-test promote#1638
jariy17 wants to merge 2 commits into
mainfrom
fix/batch-eval-docs-and-abtest-placeholder

Conversation

@jariy17

@jariy17 jariy17 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Two independent bug-bash fixes, one commit each.

1. fix(docs): remove non-existent Builtin.Completeness evaluator

The batch-evaluation docs listed Builtin.Completeness and used it in a CLI example, but the API rejects it (Builtin evaluator Builtin.Completeness does not exist). The valid builtin list lives in run-eval.ts and never included it.

  • Dropped the row from the evaluator table in docs/batch-evaluation.md
  • Switched the dataset example in docs/commands.md to Builtin.Correctness

2. fix(ab-test): preserve portable component placeholders on promote

config-bundle promote fetched the winning version's components from the service — which keys them by resolved (account/region-specific) runtime ARN — and wrote them straight into agentcore.json, replacing the committed {{runtime:<name>}} placeholders with hardcoded ARNs and breaking cross-account/region portability.

Before: "{{runtime:cbagent}}"after promote: "arn:aws:bedrock-agentcore:us-west-2:...:runtime/cbbugbash_cbagent-N5owhv3MRl"

Fix: restorePlaceholderKeys() inverts the local bundle's placeholder→ARN map (using the same resolver deploy uses) and rewrites each incoming ARN key back to the placeholder before adopting. ARNs with no matching local placeholder pass through unchanged.

Testing

  • promote.test.ts: added a case asserting an ARN-keyed service response is rewritten back to {{runtime:r}} and the ARN key is absent. 13/13 pass.
  • typecheck + prettier + eslint pass (pre-commit hooks).

jariy17 added 2 commits June 24, 2026 20:02
Builtin.Completeness is listed in the batch-evaluation docs and used in a
CLI example, but the API rejects it ('does not exist'). The valid builtin
list lives in run-eval.ts. Drop it from the evaluator table and switch the
dataset example to Builtin.Correctness.
config-bundle promote fetched the winning version's components from the
service, which keys them by resolved (account/region-specific) runtime ARN,
and wrote them straight into agentcore.json — replacing the committed
{{runtime:<name>}} placeholders with hardcoded ARNs and breaking cross-
account/region portability of the config.

Remap the service-returned ARN keys back to the bundle's existing portable
placeholders (inverting the same resolver deploy uses) before adopting them.
@jariy17 jariy17 requested a review from a team June 24, 2026 20:06
@github-actions github-actions Bot added the size/s PR size: S label Jun 24, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 24, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 24, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 24, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — two well-scoped, independent fixes with appropriate test coverage.

Docs fix: Verified Builtin.Completeness is not in BUILTIN_EVALUATOR_LEVELS in src/cli/operations/eval/run-eval.ts (only Correctness, Faithfulness, Helpfulness, ResponseRelevance, Conciseness, Coherence, InstructionFollowing, Refusal, GoalSuccessRate, ToolSelectionAccuracy are listed). Removing the row and swapping the example to Builtin.Correctness is correct.

Promote placeholder fix: restorePlaceholderKeys correctly inverts the local placeholder→ARN map by reusing the same resolveComponentKeyForJsonPath resolver the deploy/recommendation paths use, so behavior stays consistent. Skipping local arn:-prefixed keys and the arn !== key guard (unresolved placeholders) are both right. Pass-through for unmatched service ARNs is a reasonable fallback.

Test quality: Good — the new test only mocks at true I/O boundaries (ConfigIO, getConfigurationBundleVersion) and exercises the real resolver. Not excessively coupled to implementation.

No telemetry needed since this is a bug fix on a path that doesn't currently emit telemetry.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 24, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.20.2.tgz

How to install

gh release download pr-1638-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.20.2.tgz

@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 37.2% 13612 / 36586
🔵 Statements 36.47% 14473 / 39678
🔵 Functions 31.82% 2335 / 7337
🔵 Branches 31.14% 9013 / 28942
Generated in workflow #3822 for commit 2deb3a6 by the Vitest Coverage Report Action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/s PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants