[FE Fix]: Re-enable full-page playground for evaluator workflows #4474

Open
ardaerzin wants to merge 6 commits into release/v0.100.7 from fe-fix/app-workflow-router-unification-regression-fix

Conversation

@ardaerzin
Contributor

Summary

PR #4384 disabled EVALUATOR_FULL_PAGE_NAV_ENABLED because the app-style playground was a regression for evaluators (lost the upstream-app connection) and app-scoped observability defaulted to "invocation" instead of "annotation" for evaluator workflows. This change addresses both blockers and re-enables the flow by default.

Playground

  • added app chaining for evaluator workflows
  • minor UI fixes

Observability

  • fixed and improved filtering for evaluator workflows

QA follow-up

  • full app-pages router tests for evaluator workflows, checked against the reasons we disabled this feature after its initial release

Demo

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Playground
- ConfigureEvaluatorPage: upstream app workflow can be connected via
  EntityPicker (skip-variant adapter, filtered to non-evaluator
  non-feedback workflows). Disconnect affordance on the picker
  trigger and as a popup footer.
- Standalone evaluator runs no longer require an upstream app
  (TestsetDropdown is always available; runDisabled gate removed).
- Playground chain traces now write evaluator references
  (evaluator / evaluator_variant / evaluator_revision slots) so the
  per-evaluator observability page can find them.
- EntityPicker search bar respects a new parentLabel option so app
  pickers no longer show "Search evaluator..."
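
To make the evaluator-reference bullet above concrete, here is a minimal sketch of how the three slots could be filled for a chain stage. The `Reference` and `EvaluatorRevisionInfo` shapes and the `buildEvaluatorReferences` name are assumptions made for illustration; the PR's own helper is `buildEvaluatorSelfReferences`, and its signature is not shown on this page. Only the slot names come from the description.

```typescript
// Hypothetical shapes for illustration; the real agenta types are not shown here.
interface Reference {
    id?: string
    slug?: string
}

interface EvaluatorRevisionInfo {
    revisionId: string
    workflowId?: string
    workflowSlug?: string
    variantId?: string
    variantSlug?: string
}

type EvaluatorSlot = "evaluator" | "evaluator_variant" | "evaluator_revision"
type EvaluatorReferences = Partial<Record<EvaluatorSlot, Reference>>

// Keep only the fields that are set; return undefined when neither is known.
function toReference(id?: string, slug?: string): Reference | undefined {
    if (!id && !slug) return undefined
    const ref: Reference = {}
    if (id) ref.id = id
    if (slug) ref.slug = slug
    return ref
}

// Fill the evaluator / evaluator_variant / evaluator_revision slots,
// omitting any slot for which no identifier is available.
function buildEvaluatorReferences(info: EvaluatorRevisionInfo): EvaluatorReferences {
    const refs: EvaluatorReferences = {evaluator_revision: {id: info.revisionId}}
    const evaluator = toReference(info.workflowId, info.workflowSlug)
    const variant = toReference(info.variantId, info.variantSlug)
    if (evaluator) refs.evaluator = evaluator
    if (variant) refs.evaluator_variant = variant
    return refs
}
```

A trace written with these slots can then be matched by a `references.evaluator` filter on the per-evaluator observability page, which is the lookup the bullet describes.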

Observability filters
- Per-workflow-kind trace_type default extracted into
  @agenta/entities (defaultTraceTypeForWorkflow): annotation for
  evaluators, invocation otherwise. Pure helper unit-tested with
  vitest.
- References scope filter adapts to the effective trace_type:
  evaluators with trace_type=annotation pin to references.evaluator,
  invocation pins to references.application, and "no trace_type"
  ORs across both slots so all traces mentioning the evaluator
  surface.
- Dialog reconciliation: live label flip while editing trace_type
  in the filter dialog ("Application ID" / "Evaluator ID") via an
  opt-in reconcileFilterRows callback on Filters; observability
  page provides an evaluator-workflow-aware reconciler.
- Filter persistence across reloads: per-app via atomWithStorage
  under "agenta:observability:filters", with __global__ fallback
  for project-level pages. Both userFilters and traceTypeChoice
  share one packed storage atom.
- Cleaner state machine for trace_type intent: tagged union
  (default / value / cleared) replaces the dual-atom dance that
  could silently revert.
- application_id URL param dropped for evaluator workflows; the
  query is gated on workflow context being settled to avoid
  firing with the wrong scope.
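
The trace_type rules in this list can be sketched as three small pure functions. This is an illustration under stated assumptions, not the code in `@agenta/entities`: the `WorkflowLike` shape, the `effectiveTraceType` and `referenceSlotsFor` names, and the handling of non-evaluator workflows (always `application`) are hypothetical. The default rule, the tagged union kinds, and the slot mapping for evaluators follow the description.

```typescript
type TraceType = "invocation" | "annotation"

// Hypothetical minimal workflow shape; only the evaluator flag matters here.
interface WorkflowLike {
    isEvaluator: boolean
}

// User intent for the trace_type filter, modelled as a tagged union.
type TraceTypeChoice =
    | {kind: "default"} // follow the per-workflow default
    | {kind: "value"; value: TraceType} // explicit user pick
    | {kind: "cleared"} // user removed the filter

// annotation for evaluators, invocation otherwise.
function defaultTraceTypeForWorkflow(workflow: WorkflowLike | null): TraceType {
    return workflow?.isEvaluator ? "annotation" : "invocation"
}

// Resolve the intent to a concrete trace_type, or null for "no trace_type".
function effectiveTraceType(
    choice: TraceTypeChoice,
    workflow: WorkflowLike | null,
): TraceType | null {
    if (choice.kind === "value") return choice.value
    if (choice.kind === "cleared") return null
    return defaultTraceTypeForWorkflow(workflow)
}

// Which references.* slots the permanent scope filter should match.
function referenceSlotsFor(workflow: WorkflowLike, traceType: TraceType | null): string[] {
    if (!workflow.isEvaluator) return ["application"] // assumption for app workflows
    if (traceType === "annotation") return ["evaluator"]
    if (traceType === "invocation") return ["application"]
    return ["application", "evaluator"] // no trace_type: OR across both slots
}
```

Keeping "cleared" as its own kind, rather than encoding it as a missing value, is what lets a deliberate removal of the filter be told apart from "not chosen yet, use the default".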

Tests
- vitest unit tests for defaultTraceTypeForWorkflow.
- Playwright acceptance for full-page playground: post-create
  nav, row click for LLM and declarative evaluators, direct URL,
  sidebar switcher; fixes the previously broken
  select-app-and-run test for the new flow.
@vercel

vercel Bot commented May 28, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | May 31, 2026 6:12pm |

@coderabbitai

coderabbitai Bot commented May 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR enables Phase 5 evaluator full-page playground navigation, adds app connect/disconnect flows derived from the playground node graph, and refactors observability trace_type defaults, persisted filter composition, and UI reconciliation to be workflow-aware.

Changes

Evaluator Full-Page Navigation & Observability Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Trace type defaults helper and workflow schema exports | web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts, web/packages/agenta-entities/src/workflow/core/index.ts, web/packages/agenta-entities/src/workflow/index.ts, web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts, web/packages/agenta-entities/src/workflow/core/schema.ts | Adds defaultTraceTypeForWorkflow plus types and unit tests; documents workflow_slug/workflow_variant_slug usage for evaluator reference chains. |
| Feature flag and router navigation | web/oss/src/state/workflow/flags.ts, web/oss/src/components/PlaygroundRouter/index.tsx | Enables EVALUATOR_FULL_PAGE_NAV_ENABLED and routes non-feedback evaluator workflows to ConfigureEvaluatorPage. |
| App connection state management | web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts | Derives selected app label from playground node graph; persists selection only after graph mutations succeed; adds disconnectAppFromEvaluatorAtom. |
| Evaluator header UI with app controls | web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx | Adds disconnect button (picker footer and standalone icon), handleDisconnect, computes hasAppSelected, and renders TestsetDropdown unconditionally with updated comment. |
| ConfigureEvaluatorPage for full-page mode | web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx | Removes run-disabled gating and inline picker prompt; wires handleAppSelect; sets parentLabel: "Application" in app workflow adapter. |
| Evaluators registry row-click simplified | web/oss/src/components/Evaluators/index.tsx | Simplifies row-click routing to use EVALUATOR_FULL_PAGE_NAV_ENABLED && workflowId. |
| Drawer navigation and post-create flow | web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx, web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx | Restores persisted app via connectApp (no direct label writes), removes explicit label clears, adds parentLabel: "Application", and simplifies post-create navigation gate. |
| Sidebar evaluator switcher gating | web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx | Builds switcher from nonArchivedEvaluatorsAtom gated by the feature flag and adds Evaluators group conditionally. |
| Trace type state persistence and derivation | web/oss/src/state/newObservability/atoms/controls.ts | Adds TraceTypeChoice, persists per-app/per-tab filters, and exposes effectiveTraceTypeAtomFamily derived using workflow defaults. |
| Filter regeneration and scope composition | web/oss/src/state/newObservability/atoms/controls.ts, web/oss/src/state/newObservability/atoms/queries.ts | filtersAtomFamily composes permanent scope filter + derived trace_type row + persisted user filters; tracesQueryAtom uses effectiveAppId and waits for workflow context resolution. |
| Filter UI reconciliation for evaluators | web/oss/src/components/Filters/Filters.tsx, web/oss/src/components/Filters/types.d.ts, web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts, web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx | Adds reconcileFilterRows display-only projection; fieldAdapter adds referenceCategory and de-duplicates UI values; ObservabilityHeader supplies reconciler to remap reference categories based on derived trace_type. |
| Workflow adapter labeling | web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts, web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx | Adds parentLabel option (defaults to "Evaluator"); applied as "Application" in evaluator contexts to adjust UI copy and messages. |
| Evaluator trace reference construction | web/packages/agenta-playground/src/state/execution/executionRunner.ts | Adds buildEvaluatorSelfReferences and merges evaluator self-references into non-root stage references. |
| Playwright test coverage | web/oss/tests/playwright/acceptance/evaluators/tests.ts, web/oss/tests/playwright/acceptance/evaluators/index.ts | Exports new test constants and template name; rewrites and expands acceptance tests to exercise full-page playground flows and navigation gating. |
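
The per-app filter persistence summarised in this walkthrough (one storage key, a `__global__` fallback, and `userFilters` plus `traceTypeChoice` packed together) might look roughly like the sketch below. It is a simplified stand-in, not the PR's implementation: the PR uses `atomWithStorage`, while this sketch uses a tiny in-memory storage so it runs outside a browser. The record-keyed-by-app layout and the `loadFilters` / `saveFilters` / `scopeKey` names are assumptions.

```typescript
const STORAGE_KEY = "agenta:observability:filters"
const GLOBAL_SCOPE = "__global__"

// Both pieces of state travel together in one packed value.
interface PackedFilterState {
    userFilters: unknown[]
    traceTypeChoice: {kind: "default"} | {kind: "cleared"} | {kind: "value"; value: string}
}

type PackedStore = Record<string, PackedFilterState>

// Minimal storage interface (a subset of the browser Storage API).
interface KeyValueStorage {
    getItem(key: string): string | null
    setItem(key: string, value: string): void
}

function createMemoryStorage(): KeyValueStorage {
    const data = new Map<string, string>()
    return {
        getItem: (key) => data.get(key) ?? null,
        setItem: (key, value) => {
            data.set(key, value)
        },
    }
}

// Project-level pages have no app id and fall back to the global scope.
function scopeKey(appId: string | null | undefined): string {
    return appId || GLOBAL_SCOPE
}

function readAll(storage: KeyValueStorage): PackedStore {
    const raw = storage.getItem(STORAGE_KEY)
    if (!raw) return {}
    try {
        return JSON.parse(raw) as PackedStore
    } catch {
        return {} // corrupted payload: start fresh rather than crash
    }
}

function loadFilters(storage: KeyValueStorage, appId?: string | null): PackedFilterState {
    return readAll(storage)[scopeKey(appId)] ?? {userFilters: [], traceTypeChoice: {kind: "default"}}
}

function saveFilters(storage: KeyValueStorage, appId: string | null, state: PackedFilterState): void {
    const all = readAll(storage)
    all[scopeKey(appId)] = state
    storage.setItem(STORAGE_KEY, JSON.stringify(all))
}
```

Packing both values into one record means a reload restores the user's filters and their trace_type intent in a single read, so the two cannot drift apart.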

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Agenta-AI/agenta#4384: Related predecessor implementing earlier gating and follow-ups for evaluator full-page playground enablement.
  • Agenta-AI/agenta#4229: Related changes touching EvaluatorsRegistry row interactions and navigation behavior.
  • Agenta-AI/agenta#4265: Related observability/defaults updates used by this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 60.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description is clearly related to the changeset, providing context about the regressions, the fixes implemented, and areas of change (playground app chaining, observability filtering). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly identifies the main change: re-enabling the full-page playground feature for evaluator workflows, which is the primary objective addressed across all modified files. |


@ardaerzin ardaerzin marked this pull request as ready for review May 28, 2026 11:20
@dosubot dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and Frontend labels May 28, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts (1)

165-178: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Persisted app selection can get stale on failed connect/disconnect edge paths.

persistedAppSelectionAtom is written before the primary-node swap succeeds, and disconnect exits early without clearing persisted state when no downstream node is found. That can rehydrate an app selection that is no longer actually connected.

Proposed fix
 export const connectAppToEvaluatorAtom = atom(
@@
-        // Persist across sessions. The picker display label is derived from
-        // the depth-0 node's `label` via `selectedAppLabelAtom`, so no extra
-        // write needed here.
-        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
-
         // Replace primary node with app
         const nodeId = set(playgroundController.actions.changePrimaryNode, {
             type: "workflow",
             id: appRevisionId,
             label: appLabel,
         })
 
         if (!nodeId) return
+        // Persist only after graph mutation succeeds.
+        set(persistedAppSelectionAtom, {appRevisionId, appLabel})
@@
 export const disconnectAppFromEvaluatorAtom = atom(null, (get, set) => {
     const nodes = get(playgroundController.selectors.nodes())
     const downstreamEvaluator = nodes.find((n) => n.depth > 0)
-    if (!downstreamEvaluator) return
+    if (!downstreamEvaluator) {
+        set(persistedAppSelectionAtom, null)
+        return
+    }

Also applies to: 208-225


ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 0b9012d and 048d662.

📒 Files selected for processing (25)
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/EvaluatorPlaygroundHeader.tsx
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/atoms.ts
  • web/oss/src/components/Evaluators/components/ConfigureEvaluator/index.tsx
  • web/oss/src/components/Evaluators/index.tsx
  • web/oss/src/components/Filters/Filters.tsx
  • web/oss/src/components/Filters/types.d.ts
  • web/oss/src/components/Playground/Components/PlaygroundVariantConfig/assets/PlaygroundVariantConfigHeader.tsx
  • web/oss/src/components/PlaygroundRouter/index.tsx
  • web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx
  • web/oss/src/components/WorkflowRevisionDrawerWrapper/index.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/CreateEvaluatorDrawer/index.tsx
  • web/oss/src/components/pages/observability/assets/filters/fieldAdapter.ts
  • web/oss/src/components/pages/observability/components/ObservabilityHeader/index.tsx
  • web/oss/src/state/newObservability/atoms/controls.ts
  • web/oss/src/state/newObservability/atoms/queries.ts
  • web/oss/src/state/workflow/flags.ts
  • web/oss/tests/playwright/acceptance/evaluators/index.ts
  • web/oss/tests/playwright/acceptance/evaluators/tests.ts
  • web/packages/agenta-entities/src/workflow/core/index.ts
  • web/packages/agenta-entities/src/workflow/core/schema.ts
  • web/packages/agenta-entities/src/workflow/core/traceTypeDefault.ts
  • web/packages/agenta-entities/src/workflow/index.ts
  • web/packages/agenta-entities/tests/unit/traceTypeDefault.test.ts
  • web/packages/agenta-entity-ui/src/selection/adapters/workflowRevisionRelationAdapter.ts
  • web/packages/agenta-playground/src/state/execution/executionRunner.ts

Comment thread web/oss/src/components/PlaygroundRouter/index.tsx Outdated
Comment thread web/oss/src/components/Sidebar/components/WorkflowEntityCard.tsx Outdated
Comment thread web/oss/src/state/newObservability/atoms/controls.ts
Comment thread web/oss/tests/playwright/acceptance/evaluators/index.ts Outdated
CodeRabbit flagged 5 issues on the evaluator-full-page rollout PR.
This commit addresses each:

1. PlaygroundRouter — `is_feedback` evaluators skip the full-page swap.
   `workflowKind === "evaluator"` was too broad. Human/feedback
   evaluators are drawer-only in /evaluators (they capture human input,
   they don't run), so routing them to ConfigureEvaluatorPage produced
   a run-controls UI for a workflow with nothing to run. Added a
   `flags.is_feedback` exclusion next to the workflowKind check.

2. Sidebar — switcher filters out `is_feedback` evaluators.
   `nonArchivedEvaluatorsAtom` only filters by `deleted_at` and
   includes human evaluators; the switcher was exposing entries that,
   when clicked, would land on the (now-correctly-gated) generic
   <Playground /> for a feedback workflow. Filtered the list at the
   switcher boundary.

3. controls.ts — handle array-valued `trace_type` for in/not_in.
   The dialog dispatches `{operator: "in", value: ["annotation"]}` for
   the IN operator family, but the intent setter only normalized
   scalars — so the user's choice was silently dropped to
   `{kind: "cleared"}`. Normalize to an array, filter to enum values,
   and collapse single-value arrays back to a scalar. Multi-value
   selections (which mean "no filter" for a 2-value enum) still map
   to `cleared`.

4. Playwright — drop stale `[data-row-key]` poll in select-app-and-run.
   The test asserted post-create navigation to /apps/<id>/playground
   AFTER polling for the new row in the evaluators table — but the
   redirect wins first, the table disappears, and the poll became a
   timing-dependent failure. Removed the registry-side wait;
   evaluator-in-registry assertion is covered by the
   post-create-row-click test alongside.

5. ConfigureEvaluator/atoms.ts — fix persistedAppSelectionAtom race.
   `connectAppToEvaluatorAtom` persisted the app selection BEFORE
   `changePrimaryNode` ran, so a failed swap (returns `null` with no
   primary to swap from) left a stale localStorage record that the
   next mount re-hydrated into a phantom "connected" state. Moved the
   persist call to after both graph mutations succeed.
   `disconnectAppFromEvaluatorAtom` early-returned on no-downstream
   without clearing the persisted state, allowing the same phantom
   record to survive a disconnect attempt. Clear it on that branch
   too.
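
Item 3 above describes a small normalization step, which can be sketched as follows. This is an illustration only: the `normalizeTraceTypeIntent` name and its `unknown` input are assumptions, and the real setter in `controls.ts` is not shown on this page. The rules (wrap into an array, keep only enum values, collapse a single value to a scalar, treat anything else as `cleared`) follow the commit message.

```typescript
type TraceType = "invocation" | "annotation"

type TraceTypeChoice =
    | {kind: "default"}
    | {kind: "value"; value: TraceType}
    | {kind: "cleared"}

const TRACE_TYPES: readonly string[] = ["invocation", "annotation"]

function isTraceType(value: unknown): value is TraceType {
    return typeof value === "string" && TRACE_TYPES.includes(value)
}

// Accepts the scalar form ("annotation") and the array form that the
// in / not_in operators dispatch (["annotation"]).
function normalizeTraceTypeIntent(raw: unknown): TraceTypeChoice {
    // 1. Normalize to an array.
    const candidates: unknown[] = Array.isArray(raw) ? raw : [raw]
    // 2. Keep only values that belong to the enum.
    const valid = candidates.filter(isTraceType)
    // 3. Exactly one valid value collapses back to a scalar choice.
    const [only] = valid
    if (valid.length === 1 && only !== undefined) {
        return {kind: "value", value: only}
    }
    // Zero or several values are treated as "no filter".
    return {kind: "cleared"}
}
```

With this shape, `{operator: "in", value: ["annotation"]}` resolves to the same intent as a scalar `"annotation"`, which is the case that was previously dropped to `cleared`.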

No behavior change for the happy-path full-page flow; these changes all
address narrow edge cases the reviewer flagged.
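
The gating from items 1 and 2 can be expressed as two predicates. This is a hedged sketch: the `WorkflowSummary` shape and the `usesFullPageEvaluator` / `switcherEvaluators` names are hypothetical, and the real checks live in `PlaygroundRouter` and `WorkflowEntityCard`. The flag name, the `is_feedback` exclusion, and the `deleted_at` filter come from the commit message.

```typescript
// Flag name from the PR; true reflects "re-enabled by default".
const EVALUATOR_FULL_PAGE_NAV_ENABLED = true

// Hypothetical minimal workflow shape for illustration.
interface WorkflowSummary {
    id: string
    kind: "app" | "evaluator"
    flags?: {is_feedback?: boolean}
    deleted_at?: string | null
}

// Feedback (human) evaluators capture input rather than run, so they
// stay out of the full-page ConfigureEvaluatorPage route.
function usesFullPageEvaluator(
    workflow: WorkflowSummary,
    flagEnabled: boolean = EVALUATOR_FULL_PAGE_NAV_ENABLED,
): boolean {
    return flagEnabled && workflow.kind === "evaluator" && !workflow.flags?.is_feedback
}

// Entries offered by the sidebar switcher: non-archived, non-feedback evaluators.
function switcherEvaluators(
    workflows: WorkflowSummary[],
    flagEnabled: boolean = EVALUATOR_FULL_PAGE_NAV_ENABLED,
): WorkflowSummary[] {
    if (!flagEnabled) return []
    return workflows.filter(
        (w) => w.kind === "evaluator" && !w.deleted_at && !w.flags?.is_feedback,
    )
}
```

Applying the same `is_feedback` exclusion in both places keeps the switcher from listing an entry that the router would then refuse to open in full-page mode.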
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

…ssion-fix

Resolves a single conflict in
`web/packages/agenta-entities/src/workflow/core/schema.ts` —
release v0.100.4 added `artifact_slug` / `variant_slug` to the
revision schema alongside the `workflow_slug` /
`workflow_variant_slug` fields this branch had introduced for
emitting evaluator references on playground chain runs.

Both sides added `workflow_slug` and `workflow_variant_slug`
with overlapping intent; resolution keeps all four fields
and merges the two doc comments into one that covers both
purposes (parent-workflow identification for ID-less callers
+ evaluator chain-trace emission).

No source behavior change — schema is additive on both sides.
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@github-actions
Contributor

github-actions Bot commented May 28, 2026

Railway Preview Environment

Preview URL: https://gateway-production-7809.up.railway.app/w
Image tag: pr-4474-9351b9e
Status: Failed
Railway logs: Open logs
Logs: View workflow run
Updated at: 2026-06-01T12:53:03.175Z

@junaway junaway changed the base branch from main to release/v0.100.7 May 29, 2026 12:46
@mmabrouk mmabrouk marked this pull request as draft June 1, 2026 07:08
@mmabrouk mmabrouk marked this pull request as ready for review June 1, 2026 07:09
@ardaerzin ardaerzin changed the title from "fix(frontend): re-enable full-page playground for evaluator workflows" to "[FE Fix]: Re-enable full-page playground for evaluator workflows" Jun 1, 2026
Contributor

Why did we make changes to Observability?

Contributor Author

UX improvement for filters

Member

@mmabrouk mmabrouk left a comment

@ardaerzin
Contributor Author

Failed to build -> https://github.com/Agenta-AI/agenta_cloud/actions/runs/26751245314/job/78839925341

fail is not related to a frontend change tho

Labels

Frontend, Improvement, size:XXL, tests

4 participants