Skip to content

[Test] Web acceptance stability#4506

Open
bekossy wants to merge 6 commits into
release/v0.100.9from
test-/-web-acceptance-stability
Open

[Test] Web acceptance stability#4506
bekossy wants to merge 6 commits into
release/v0.100.9from
test-/-web-acceptance-stability

Conversation

@bekossy
Copy link
Copy Markdown
Member

@bekossy bekossy commented May 31, 2026

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Demo

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

@vercel
Copy link
Copy Markdown

vercel Bot commented May 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 1, 2026 9:27am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ff7be0d0-c7ef-49a5-97d5-d94474c2a12b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR refactors Playwright acceptance tests to reduce network-response dependencies and adds comprehensive tests for the "Use API" documentation drawer. Changes include member-invite modal helpers, prompt-registry UI navigation, new Use API snippet tests for variants and deployments, and a slug validation assertion in evaluator creation.

Changes

Use API Snippet Tests

Layer / File(s) Summary
Acceptance test helpers and variants scenario
web/oss/tests/playwright/acceptance/use-api/index.ts (lines 1–223)
Imports, acceptance tags, and helper functions (deployFirstVariantToDevelopment, openVariantUseApiDrawer, openDeploymentUseApiDrawer, switchToTypescriptTab) for managing drawers and variant deployments. First test validates TypeScript snippet content on the Variants registry page, asserting the snippet includes application_variant_ref and axios.post.
Deployments scenario and export
web/oss/tests/playwright/acceptance/use-api/index.ts (lines 224–297)
Second test creates a completion app, deploys its first variant to Development, then validates the Use API drawer on the Deployments page includes environment_ref and axios.post. Exports useApiTests suite.
Feature file and spec registration
web/oss/tests/playwright/acceptance/features/use-api.feature, web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts, web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts, web/oss/tests/playwright/10-use-api.ts
Cucumber feature file defines two TypeScript scenarios. EE and OSS specs register the useApiTests suite via test.describe. OSS test index exports the suite for test discovery.

Test Refactoring to UI-Based Waiting

Layer / File(s) Summary
Members invite modal helpers
web/ee/tests/playwright/acceptance/members/index.ts
Removes waitForInviteResponse. Adds openInviteMembersModal (retry loop + email input visibility) and submitInviteMembersModal (form submission + modal dismissal). Updates invitePendingMember and test to use modal-based flow instead of network response waiting.
Prompt registry UI navigation
web/oss/tests/playwright/acceptance/prompt-registry/index.ts
Narrows PromptRegistryApiHelpers to remove waitForApiResponse. Refactors openWorkflowRevisionsPage to navigate UI only. Rewrites openFirstPublishedWorkflowRevision to click version labels and extract revisionId from URL. Removes API response state from test scenario.
Evaluators slug assertion
web/oss/tests/playwright/acceptance/evaluators/tests.ts
Adds assertion in createHumanEvaluatorFromDrawer that the "unique slug" input matches the provided evaluatorName, with a 5-second timeout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Agenta-AI/agenta#4308: Both PRs modify the Members invite Playwright flow in the EE tests, refactoring how invite submission is synchronized; this PR removes network-wait helpers in favor of modal-based ones.
  • Agenta-AI/agenta#4458: Overlaps with the EE members invite refactoring by replacing network/response waits with UI readiness checks in the same file.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is a template with empty sections and placeholder text; it does not provide substantive information about what was changed or why. Fill in the template sections with actual details: describe the refactored invite flow and new use-api tests, explain why the changes improve stability, and document what was tested.
Title check ❓ Inconclusive The title '[Test] Web acceptance stability' is vague and generic, using non-descriptive terms that don't convey specific information about the actual changes made. Consider a more specific title like '[Test] Refactor member invite flow and add use-api snippets tests' that reflects the actual changes.
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test-/-web-acceptance-stability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bekossy bekossy marked this pull request as ready for review May 31, 2026 20:04
@dosubot dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. Frontend tests labels May 31, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
web/oss/tests/playwright/acceptance/use-api/index.ts (1)

111-143: 💤 Low value

Consolidate the two drawer-open helpers.

openVariantUseApiDrawer and openDeploymentUseApiDrawer are identical except for the button locator. Consider parameterizing to remove the duplicated drawer-resolution block.

♻️ Proposed consolidation
-const openVariantUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.locator('[data-tour="api-code-button"]')
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
-
-const openDeploymentUseApiDrawer = async (page: any) => {
-    await page.waitForLoadState("networkidle")
-    const useApiButton = page.getByRole("button", {name: "Use API"}).first()
-    await expect(useApiButton).toBeVisible({timeout: 15000})
-    await expect(useApiButton).toBeEnabled({timeout: 5000})
-    await useApiButton.click()
-
-    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
-        hasText: "How to use API",
-    })
-    await expect(drawer).toBeVisible({timeout: 20000})
-    return drawer
-}
+const openUseApiDrawer = async (page: any, useApiButton: any) => {
+    await expect(useApiButton).toBeVisible({timeout: 15000})
+    await expect(useApiButton).toBeEnabled({timeout: 5000})
+    await useApiButton.click()
+
+    const drawer = page.locator(".ant-drawer-content-wrapper").filter({
+        hasText: "How to use API",
+    })
+    await expect(drawer).toBeVisible({timeout: 20000})
+    return drawer
+}
+
+const openVariantUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.locator('[data-tour="api-code-button"]'))
+
+const openDeploymentUseApiDrawer = (page: any) =>
+    openUseApiDrawer(page, page.getByRole("button", {name: "Use API"}).first())
web/oss/tests/playwright/acceptance/evaluators/tests.ts (1)

354-355: ⚡ Quick win

Update slug assertion to account for real slug transformation (UI uses slugify)

The slug field in CreateEvaluator is set from the (debounced) evaluator name via a slugify helper (toLowerCase() and replace(/[^a-z0-9_\-]+/g, "-"), plus trimming/collapsing dashes). So the assertion should not assume a no-op transformation in general.

That said, the current tests pass evaluatorName values like e2e-human-eval-${Date.now()} (already lowercase and hyphen-safe), so it matches the slug output today. For robustness, assert against slugify(evaluatorName) instead of evaluatorName.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 685200ae-6c11-485f-8a7f-e0a67aa8fa47

📥 Commits

Reviewing files that changed from the base of the PR and between a269527 and 58e6bcb.

📒 Files selected for processing (8)
  • web/ee/tests/playwright/acceptance/members/index.ts
  • web/ee/tests/playwright/acceptance/use-api/use-api.spec.ts
  • web/oss/tests/playwright/10-use-api.ts
  • web/oss/tests/playwright/acceptance/evaluators/tests.ts
  • web/oss/tests/playwright/acceptance/features/use-api.feature
  • web/oss/tests/playwright/acceptance/prompt-registry/index.ts
  • web/oss/tests/playwright/acceptance/use-api/index.ts
  • web/oss/tests/playwright/acceptance/use-api/use-api.spec.ts

Comment on lines +53 to +77
const openInviteMembersModal = async (page: any) => {
const inviteButton = page.getByRole("button", {name: "Invite Members"}).first()
await expect(inviteButton).toBeVisible({timeout: 20000})
await expect(inviteButton).toBeEnabled()

const inviteModal = page.getByRole("dialog", {name: "Invite Members"})
const emailInput = inviteModal.getByPlaceholder("member@organization.com")

for (let attempt = 0; attempt < 2; attempt++) {
await inviteButton.click()

if (
await inviteModal
.waitFor({state: "visible", timeout: 5000})
.then(() => true)
.catch(() => false)
) {
await expect(emailInput).toBeVisible({timeout: 20000})
return {inviteModal, emailInput}
}
}

await expect(emailInput).toBeVisible({timeout: 10000})
return {inviteModal, emailInput}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the email input timeout consistent in both code paths.

The email input visibility check uses a 20-second timeout inside the retry loop (line 70) but only a 10-second timeout in the fallback path (line 75). This inconsistency could cause the fallback to fail prematurely if the modal opens on the second attempt but the email input takes 11-19 seconds to render.

🔧 Proposed fix to unify timeout
     }
 
-    await expect(emailInput).toBeVisible({timeout: 10000})
+    await expect(emailInput).toBeVisible({timeout: 20000})
     return {inviteModal, emailInput}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const openInviteMembersModal = async (page: any) => {
const inviteButton = page.getByRole("button", {name: "Invite Members"}).first()
await expect(inviteButton).toBeVisible({timeout: 20000})
await expect(inviteButton).toBeEnabled()
const inviteModal = page.getByRole("dialog", {name: "Invite Members"})
const emailInput = inviteModal.getByPlaceholder("member@organization.com")
for (let attempt = 0; attempt < 2; attempt++) {
await inviteButton.click()
if (
await inviteModal
.waitFor({state: "visible", timeout: 5000})
.then(() => true)
.catch(() => false)
) {
await expect(emailInput).toBeVisible({timeout: 20000})
return {inviteModal, emailInput}
}
}
await expect(emailInput).toBeVisible({timeout: 10000})
return {inviteModal, emailInput}
}
const openInviteMembersModal = async (page: any) => {
const inviteButton = page.getByRole("button", {name: "Invite Members"}).first()
await expect(inviteButton).toBeVisible({timeout: 20000})
await expect(inviteButton).toBeEnabled()
const inviteModal = page.getByRole("dialog", {name: "Invite Members"})
const emailInput = inviteModal.getByPlaceholder("member@organization.com")
for (let attempt = 0; attempt < 2; attempt++) {
await inviteButton.click()
if (
await inviteModal
.waitFor({state: "visible", timeout: 5000})
.then(() => true)
.catch(() => false)
) {
await expect(emailInput).toBeVisible({timeout: 20000})
return {inviteModal, emailInput}
}
}
await expect(emailInput).toBeVisible({timeout: 20000})
return {inviteModal, emailInput}
}

Comment on lines +62 to +70
const revisionsRadio = page.getByRole("radio", {name: "Revisions"})
const revisionsControl = page
.locator(".ant-radio-button-wrapper")
.filter({hasText: "Revisions"})
.first()
await expect(revisionsControl).toBeVisible({timeout: 15000})
if (!(await revisionsRadio.isChecked())) {
await revisionsControl.click()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard revisionsRadio against strict-mode violations.

revisionsControl is scoped with .first(), but revisionsRadio is not. getByRole's name option matches by substring (case-insensitive) by default, so if more than one element resolves, revisionsRadio.isChecked() will throw a strict-mode error. Add .first() for consistency with revisionsControl.

🛡️ Proposed fix
-    const revisionsRadio = page.getByRole("radio", {name: "Revisions"})
+    const revisionsRadio = page.getByRole("radio", {name: "Revisions"}).first()

Comment on lines +111 to +143
const openVariantUseApiDrawer = async (page: any) => {
await page.waitForLoadState("networkidle")
const useApiButton = page.locator('[data-tour="api-code-button"]')
await expect(useApiButton).toBeVisible({timeout: 15000})
await expect(useApiButton).toBeEnabled({timeout: 5000})
await useApiButton.click()

const drawer = page.locator(".ant-drawer-content-wrapper").filter({
hasText: "How to use API",
})
await expect(drawer).toBeVisible({timeout: 20000})
return drawer
}

/**
* Opens the "How to use API" drawer from the Deployments tab.
* Uses the generic primary "Use API" button (no data-tour on the deployments one).
* Waits for networkidle first — DeploymentsDashboard is a dynamic() import whose
* onClick setter may not be wired yet if clicked too early.
*/
const openDeploymentUseApiDrawer = async (page: any) => {
await page.waitForLoadState("networkidle")
const useApiButton = page.getByRole("button", {name: "Use API"}).first()
await expect(useApiButton).toBeVisible({timeout: 15000})
await expect(useApiButton).toBeEnabled({timeout: 5000})
await useApiButton.click()

const drawer = page.locator(".ant-drawer-content-wrapper").filter({
hasText: "How to use API",
})
await expect(drawer).toBeVisible({timeout: 20000})
return drawer
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Is page.waitForLoadState("networkidle") discouraged in current Playwright documentation, and what is the recommended alternative?

💡 Result:

Yes. In the current Playwright documentation, waiting for the page to reach the networkidle load state is explicitly discouraged for testing/readiness checks. Playwright’s waitUntil documentation for page.goto()/load-state waiting states that waitUntil: "networkidle" / waitForLoadState("networkidle") is “DISCOURAGED” and that it “Don’t use this method for testing, rely on web assertions to assess readiness instead.” [1] Recommended alternative: use assertions (web-first assertions) or wait for specific user-visible/UI conditions (e.g., expect locators like toBeVisible()) rather than waiting for the network to become idle. Playwright’s best-practices guidance emphasizes using auto-waiting, web-first assertions for resilient readiness checks. [2][3]

Citations:


Avoid waitForLoadState("networkidle") in Playwright drawer helpers (web/oss/tests/playwright/acceptance/use-api/index.ts, Lines 112 & 132). Playwright best-practices explicitly discourage using the networkidle load state for test/readiness synchronization because it can be unreliable (e.g., long-polling/streaming can prevent “idle”). Since the helpers already rely on auto-waiting expect(...).toBeVisible() / toBeEnabled() plus a concrete drawer visibility assertion, remove the explicit await page.waitForLoadState("networkidle") and synchronize on the existing UI signals instead.

@bekossy bekossy changed the base branch from main to release/v0.100.9 May 31, 2026 20:14
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 31, 2026

Railway Preview Environment

Preview URL https://gateway-production-ccf9.up.railway.app/w
Project agenta-oss-pr-4506
Image tag pr-4506-c9adf0c
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-06-01T09:31:09.146Z

…iness

Without --max-time/--connect-timeout, curl could hang indefinitely when a
Railway preview accepted the TCP connection but stalled before sending an
HTTP response. This caused the wait-for-readiness job to block for hours
instead of cycling through its 30-attempt loop.

- Add --max-time 10 --connect-timeout 5 to each curl attempt so the loop
  is always bounded (~10 min max across both URL checks).
- Add timeout-minutes: 25 to the job as a defence-in-depth backstop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n reinstall

playwright install --with-deps chromium was downloading 170MB and running
apt-get for hundreds of X11/font packages on every run, taking ~17 minutes.
This left almost no time for the auth bootstrap within the job timeout.

Cache ~/.cache/ms-playwright keyed on the tests package.json hash so the
browser binary is restored on cache hits (subsequent runs). playwright install
still runs after the cache restore — it detects the binary is present and
skips the download, but still verifies/installs any missing system deps via
apt which is fast when packages are already cached by the runner.

Also bumps the job timeout from 25 to 30 minutes to give the first (cold)
run enough headroom.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Frontend size:XS This PR changes 0-9 lines, ignoring generated files. tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant