Add Playwright E2E tests for stack functionality (daily, dummy-provider, secret-free) #12

@graphaelli

Goal

Add browser-level Playwright E2E tests that verify the stack actually works (not just boots). PR #11's smoke test proves the stack starts + responds on HTTP; this covers the real user flows: login, an agent chat that queries ClickHouse via the real MCP server, Langfuse trace creation, and the feedback→Langfuse score path.

Runs daily + manual only (not on PRs) — the smoke test remains the PR-time launch gate; this heavier suite catches :latest image drift and functional regressions on a schedule.

Key design decision: dummy inference provider (secret-free)

A local OpenAI-compatible mock fakes only the inference; the MCP server, ClickHouse, and Langfuse stay real. No ANTHROPIC_API_KEY, no token spend, no flakiness.

This still produces real Langfuse traces (generation observation + real MCP→ClickHouse tool spans) and exercises scoring — the feedback→score bridge is upstream (packages/api/src/langfuse/feedback.ts, PR danny-avila/LibreChat#13544), so the public librechat:latest image already contains it. Tradeoff: token/cost and model name in traces are the mock's, and real frontier-model tool-selection isn't covered (not a "does the stack work" concern).
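
As a rough illustration of what "OpenAI-compatible" means for the streaming path, the sketch below builds the server-sent-events frames for a plain-text reply. The helper names (`sseFrame`, `streamText`), the fixed `id`, and the zero `created` timestamp are placeholders invented for this example, not part of the planned server; only the `data: …` framing, the `chat.completion.chunk` object type, and the closing `[DONE]` sentinel come from the OpenAI wire format.

```typescript
// One streamed chunk in the OpenAI chat-completions wire format (subset only).
interface ChatChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: { role?: "assistant"; content?: string };
    finish_reason: "stop" | null;
  }>;
}

// Serialize one payload as a server-sent-events frame ("data: ...", blank line).
function sseFrame(payload: ChatChunk | "[DONE]"): string {
  const body = payload === "[DONE]" ? payload : JSON.stringify(payload);
  return `data: ${body}\n\n`;
}

// Build the whole frame sequence for a text answer: a role chunk, one chunk
// per piece of content, a stop chunk, then the [DONE] sentinel.
function streamText(pieces: string[], model = "mock-model"): string[] {
  // Placeholder id/timestamp: assumptions for the sketch, not required values.
  const base = {
    id: "chatcmpl-mock",
    object: "chat.completion.chunk" as const,
    created: 0,
    model,
  };
  const frames: string[] = [];
  frames.push(
    sseFrame({ ...base, choices: [{ index: 0, delta: { role: "assistant" }, finish_reason: null }] }),
  );
  for (const piece of pieces) {
    frames.push(
      sseFrame({ ...base, choices: [{ index: 0, delta: { content: piece }, finish_reason: null }] }),
    );
  }
  frames.push(sseFrame({ ...base, choices: [{ index: 0, delta: {}, finish_reason: "stop" }] }));
  frames.push(sseFrame("[DONE]"));
  return frames;
}
```

A real handler would write each frame to the response with `Content-Type: text/event-stream`; whether LibreChat needs further fields (for example `usage`) is something to confirm against the live stack.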

Components to build

  1. e2e/mock-llm/server.js — ~150-line Node (built-ins only) OpenAI-compatible server: GET /v1/models + streaming POST /v1/chat/completions with adaptive tool-calling (inspect the request tools[], find the ClickHouse query tool by name pattern, emit a tool_calls delta running SELECT 1, then a final text answer echoing the tool result). Adaptive lookup avoids hardcoding MCP tool names.

  2. docker-compose.e2e.yml (override; prod config untouched) — adds the mock-llm service and repoints LibreChat at a test config via CONFIG_PATH=/app/librechat.e2e.yaml (new mount path = no conflict), with depends_on: mock-llm.

  3. e2e/librechat.e2e.yaml — copy of librechat.yaml + a MockLLM custom endpoint (baseURL: http://mock-llm:8080/v1, models.default: [mock-model], fetch: false), retaining endpoints.agents, interface.mcpServers.use: true, and the ClickHouse-Local MCP server.

  4. Playwright project: e2e/package.json (@playwright/test), playwright.config.ts (chromium, baseURL http://localhost:3080, setup auth project, html reporter, no webServer), setup/auth.setup.ts (login via POST /api/auth/login as admin@admin.com/password → storageState), lib/langfuse.ts (poll the public API with base64(pk:sk) Basic auth; ingestion is async, so poll rather than assert once). .gitignore the artifacts.

  5. Specs:

    • librechat.spec.ts — login page, authed new-chat UI loads, ClickHouse-Local selectable in the agent/MCP picker.
    • langfuse.spec.ts — Langfuse UI login (init user), "Default Project" dashboard + Traces view load.
    • roundtrip.spec.ts — pick MockLLM/agent w/ ClickHouse-Local → send prompt → assert reply contains the SELECT 1 result + tool invocation shown → assert a new Langfuse trace (generation + MCP spans) via the public API.
    • scoring.spec.ts — after a chat, click 👍/👎 → poll /api/public/scores and assert a score is attached to the trace (exercises PR #13544 end-to-end).
  6. .github/workflows/e2e.yml: schedule (daily) + workflow_dispatch only. Steps: bash scripts/generate-env.sh → docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600 → setup-node → cd e2e && npm ci && npx playwright install --with-deps chromium → npx playwright test → upload e2e/playwright-report on failure → always down -v. On scheduled failure, open/update a de-duped tracking issue (reuse the report-daily-failure pattern from smoke-test.yml).
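
The "poll, don't assert once" idea behind lib/langfuse.ts could look roughly like the sketch below. All names (`basicAuth`, `pollUntil`, `findScore`) are hypothetical, the timeouts are arbitrary defaults, and the assumed response shape (`{ data: [...] }` with a `traceId` on each score) should be checked against the Langfuse public API before relying on it.

```typescript
// Build the Basic auth header value from a Langfuse public/secret key pair.
function basicAuth(publicKey: string, secretKey: string): string {
  return "Basic " + btoa(`${publicKey}:${secretKey}`);
}

// Call `probe` repeatedly until it yields a non-null value or the timeout passes.
async function pollUntil<T>(
  probe: () => Promise<T | null>,
  { timeoutMs = 30_000, intervalMs = 1_000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical probe: look for a score attached to a given trace.
// Assumes the list endpoint returns { data: [...] } and each score has a traceId.
async function findScore(
  baseUrl: string,
  auth: string,
  traceId: string,
): Promise<{ traceId: string } | null> {
  const res = await fetch(`${baseUrl}/api/public/scores`, {
    headers: { Authorization: auth },
  });
  if (!res.ok) return null; // treat a failed request as "not there yet"
  const body = (await res.json()) as { data?: Array<{ traceId: string }> };
  return body.data?.find((score) => score.traceId === traceId) ?? null;
}
```

In a spec this would be used as `await pollUntil(() => findScore(baseUrl, auth, traceId))`, so a slow ingestion shows up as a timeout error rather than as a flaky one-shot assertion.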

Reuse

  • Launch pattern from .github/workflows/smoke-test.yml.
  • LibreChat selectors/auth from ~/src/ch/librechat/e2e/ and ~/src/ch/librechat/client: login getByLabel('Email'|'Password'), getByTestId('login-button'); chat getByTestId('text-input'|'send-button'|'messages-view'|'nav-new-chat-button').

Implementation notes / unknowns

  • Finalize agent/MCP-picker + 👍/👎 selectors via npx playwright codegen --test-id-attribute=data-testid http://localhost:3080/c/new against the live UI.
  • The mock must track the OpenAI streaming/tool-call wire format and how LibreChat passes MCP tools (adaptive lookup de-risks this).
  • Possible follow-up: pin the LibreChat image tag to reduce :latest drift.
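
To make the adaptive lookup concrete, here is a minimal sketch of the decision the mock could make per request: if the last message is a tool result, answer in text; otherwise find a query-like tool in `tools[]` and ask for `SELECT 1`. The name pattern, the `query` argument name, the call id, and every function name here are assumptions for illustration; the actual tool names and argument schema that LibreChat forwards for the ClickHouse MCP server still need to be confirmed.

```typescript
// Subset of an OpenAI `tools[]` request entry that the lookup needs.
interface ToolDef {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
}

interface Message {
  role: string;
  content?: unknown;
}

// Assumed pattern: any tool whose name mentions "query" or "sql".
const QUERY_TOOL_PATTERN = /query|sql/i;

function findQueryTool(tools: ToolDef[] | undefined): ToolDef | undefined {
  return tools?.find(
    (tool) => tool.type === "function" && QUERY_TOOL_PATTERN.test(tool.function.name),
  );
}

// Delta asking the client to run the tool; a real stream would follow it with
// a chunk whose finish_reason is "tool_calls". The argument name is a guess.
function toolCallDelta(tool: ToolDef, argName = "query") {
  return {
    role: "assistant" as const,
    tool_calls: [
      {
        index: 0,
        id: "call_mock_1",
        type: "function" as const,
        function: {
          name: tool.function.name,
          arguments: JSON.stringify({ [argName]: "SELECT 1" }),
        },
      },
    ],
  };
}

type Move =
  | { kind: "tool_call"; delta: ReturnType<typeof toolCallDelta> }
  | { kind: "text"; text: string };

// Decide what the mock should stream back for this request.
function nextMove(req: { tools?: ToolDef[]; messages: Message[] }): Move {
  const last = req.messages[req.messages.length - 1];
  if (last?.role === "tool") {
    // Second turn: echo the tool result so the spec can assert on it.
    const result =
      typeof last.content === "string" ? last.content : JSON.stringify(last.content) ?? "";
    return { kind: "text", text: `Query result: ${result}` };
  }
  const tool = findQueryTool(req.tools);
  if (tool) return { kind: "tool_call", delta: toolCallDelta(tool) };
  return { kind: "text", text: "No query tool was offered." };
}
```

Keying the second turn on the last message's role keeps the mock stateless, which should also behave sensibly if a conversation contains several tool round trips.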

Local verification

cd ~/src/ch/agentic-data-stack
bash scripts/generate-env.sh
docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600
cd e2e && npm ci && npx playwright install --with-deps chromium
npx playwright test
cd .. && docker compose -f docker-compose.yml -f docker-compose.e2e.yml down -v
