Goal
Add browser-level Playwright E2E tests that verify the stack actually works (not just boots). PR #11's smoke test proves the stack starts + responds on HTTP; this covers the real user flows: login, an agent chat that queries ClickHouse via the real MCP server, Langfuse trace creation, and the feedback→Langfuse score path.
Runs daily + manual only (not on PRs) — the smoke test remains the PR-time launch gate; this heavier suite catches :latest image drift and functional regressions on a schedule.
Key design decision: dummy inference provider (secret-free)
A local OpenAI-compatible mock fakes only the inference; the MCP server, ClickHouse, and Langfuse stay real. No ANTHROPIC_API_KEY, no token spend, and no flakiness from a remote model provider.
This still produces real Langfuse traces (generation observation + real MCP→ClickHouse tool spans) and exercises scoring — the feedback→score bridge is upstream (packages/api/src/langfuse/feedback.ts, PR danny-avila/LibreChat#13544), so the public librechat:latest image already contains it. Tradeoff: token/cost and model name in traces are the mock's, and real frontier-model tool-selection isn't covered (not a "does the stack work" concern).
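As a rough sketch of what "fakes only the inference" involves, the mock needs to serve little more than a model list and a server-sent-event stream in the OpenAI chat-completions chunk format. The helper names (`modelsList`, `sseFrame`, `streamTextAnswer`) and the fixed `chatcmpl-mock` id are illustrative assumptions, not the contents of the final server.js.

```typescript
// Sketch only: helper names are hypothetical. Payload shapes follow the
// OpenAI-compatible /v1/models and streaming /v1/chat/completions formats.

interface TextChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: { role?: "assistant"; content?: string };
    finish_reason: "stop" | null;
  }>;
}

// Response body for GET /v1/models.
function modelsList(modelId: string) {
  return {
    object: "list",
    data: [{ id: modelId, object: "model", created: 0, owned_by: "mock" }],
  };
}

// Wrap one JSON payload as a server-sent-event frame.
function sseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Frames for a complete streamed text answer: content, stop marker, [DONE].
function streamTextAnswer(model: string, text: string): string[] {
  const base = {
    id: "chatcmpl-mock", // ASSUMPTION: any stable id is acceptable to the client
    object: "chat.completion.chunk" as const,
    created: Math.floor(Date.now() / 1000),
    model,
  };
  const content: TextChunk = {
    ...base,
    choices: [{ index: 0, delta: { role: "assistant", content: text }, finish_reason: null }],
  };
  const stop: TextChunk = {
    ...base,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
  };
  return [sseFrame(content), sseFrame(stop), "data: [DONE]\n\n"];
}
```

A handler for POST /v1/chat/completions would write these frames in order with `Content-Type: text/event-stream`. Sending the whole answer in one content chunk keeps the mock deterministic.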
Components to build
- e2e/mock-llm/server.js: ~150-line Node (built-ins only) OpenAI-compatible server exposing GET /v1/models and a streaming POST /v1/chat/completions with adaptive tool-calling (inspect the request tools[], find the ClickHouse query tool by name pattern, emit a tool_calls delta running SELECT 1, then a final text answer echoing the tool result). Adaptive lookup avoids hardcoding MCP tool names.
- docker-compose.e2e.yml (override; prod config untouched): adds the mock-llm service and repoints LibreChat at a test config via CONFIG_PATH=/app/librechat.e2e.yaml (new mount path = no conflict), with depends_on: mock-llm.
- e2e/librechat.e2e.yaml: copy of librechat.yaml + a MockLLM custom endpoint (baseURL: http://mock-llm:8080/v1, models.default: [mock-model], fetch: false), retaining endpoints.agents, interface.mcpServers.use: true, and the ClickHouse-Local MCP server.
- Playwright project e2e/: package.json (@playwright/test), playwright.config.ts (chromium, baseURL http://localhost:3080, setup auth project, html reporter, no webServer), setup/auth.setup.ts (login via POST /api/auth/login as admin@admin.com/password → storageState), lib/langfuse.ts (poll the public API with base64(pk:sk) Basic auth; ingestion is async, so poll rather than assert once). Add the generated artifacts to .gitignore.
- Specs:
  - librechat.spec.ts: login page, authed new-chat UI loads, ClickHouse-Local selectable in the agent/MCP picker.
  - langfuse.spec.ts: Langfuse UI login (init user), "Default Project" dashboard + Traces view load.
  - roundtrip.spec.ts: pick MockLLM/agent w/ ClickHouse-Local → send prompt → assert the reply contains the SELECT 1 result and the tool invocation is shown → assert a new Langfuse trace (generation + MCP spans) via the public API.
  - scoring.spec.ts: after a chat, click 👍/👎 → poll /api/public/scores and assert a score is attached to the trace (exercises PR #13544 end-to-end).
- .github/workflows/e2e.yml: schedule (daily) + workflow_dispatch only. Steps: bash scripts/generate-env.sh → docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600 → setup-node → cd e2e && npm ci && npx playwright install --with-deps chromium → npx playwright test → upload e2e/playwright-report on failure → always down -v. On scheduled failure, open/update a de-duped tracking issue (reuse the report-daily-failure pattern from smoke-test.yml).
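Because Langfuse ingestion is asynchronous, lib/langfuse.ts will need a poll-until helper rather than a single assertion. The following is a minimal sketch under stated assumptions: the names `basicAuthHeader`, `pollUntil`, and `findScoreForTrace` are hypothetical, and the response shape of /api/public/scores (a `data` array whose entries carry a `traceId`) is assumed rather than confirmed here.

```typescript
// Sketch only: names and the assumed response shape are illustrative.

// Header value for Basic auth against the Langfuse public API:
// base64("publicKey:secretKey"). btoa is a global in Node 16+.
function basicAuthHeader(publicKey: string, secretKey: string): string {
  return `Basic ${btoa(`${publicKey}:${secretKey}`)}`;
}

// Call `probe` repeatedly until it yields a value or the timeout elapses.
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  opts: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  const { timeoutMs = 60_000, intervalMs = 2_000 } = opts;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// ASSUMPTION: GET /api/public/scores returns { data: [{ traceId, ... }] }.
async function findScoreForTrace(
  baseUrl: string,
  publicKey: string,
  secretKey: string,
  traceId: string,
): Promise<{ traceId?: string } | undefined> {
  const res = await fetch(`${baseUrl}/api/public/scores`, {
    headers: { Authorization: basicAuthHeader(publicKey, secretKey) },
  });
  if (!res.ok) return undefined; // treat errors as "not ingested yet" and retry
  const body = (await res.json()) as { data?: Array<{ traceId?: string }> };
  return body.data?.find((score) => score.traceId === traceId);
}
```

In scoring.spec.ts this could be used as `await pollUntil(() => findScoreForTrace(url, pk, sk, traceId))`, which makes the test fail with a clear timeout message instead of racing the ingestion pipeline.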
Reuse
- Launch pattern from .github/workflows/smoke-test.yml.
- LibreChat selectors/auth from ~/src/ch/librechat/e2e/ and ~/src/ch/librechat/client: login getByLabel('Email'|'Password'), getByTestId('login-button'); chat getByTestId('text-input'|'send-button'|'messages-view'|'nav-new-chat-button').
Implementation notes / unknowns
- Finalize agent/MCP-picker + 👍/👎 selectors via npx playwright codegen --test-id-attribute=data-testid http://localhost:3080/c/new against the live UI.
- The mock must track the OpenAI streaming/tool-call wire format and how LibreChat passes MCP tools (adaptive lookup de-risks this).
- Possible follow-up: pin the LibreChat image tag to reduce :latest drift.
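The adaptive lookup and the tool-call wire format mentioned above might be sketched as follows. This is not the final mock: the name pattern, the fallback argument name `query`, and the helper names `findQueryTool` and `toolCallChunks` are assumptions to be checked against what LibreChat actually sends in tools[].

```typescript
// Sketch only: pattern, argument name, and helper names are assumptions.

// One entry of the request `tools[]` array in the chat-completions format.
interface ToolDef {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: { required?: string[] };
  };
}

interface ToolChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    delta: {
      role?: "assistant";
      tool_calls?: Array<{
        index: number;
        id: string;
        type: "function";
        function: { name: string; arguments: string };
      }>;
    };
    finish_reason: "tool_calls" | null;
  }>;
}

// ASSUMPTION: the ClickHouse query tool has "query" or "select" in its name.
const QUERY_TOOL_PATTERN = /query|select/i;

// Find the query tool by pattern instead of hardcoding its MCP name.
function findQueryTool(tools: ToolDef[] | undefined): ToolDef | undefined {
  return (tools ?? []).find((t) => QUERY_TOOL_PATTERN.test(t.function.name));
}

// Chunks that ask the client to run `sql` with the given tool: one delta
// carrying the call, then a closing chunk with finish_reason "tool_calls".
function toolCallChunks(model: string, tool: ToolDef, sql: string): ToolChunk[] {
  // ASSUMPTION: the first required parameter takes the SQL text.
  const argName = tool.function.parameters?.required?.[0] ?? "query";
  const base = {
    id: "chatcmpl-mock",
    object: "chat.completion.chunk" as const,
    created: Math.floor(Date.now() / 1000),
    model,
  };
  const call = {
    index: 0,
    id: "call_mock_1",
    type: "function" as const,
    function: { name: tool.function.name, arguments: JSON.stringify({ [argName]: sql }) },
  };
  return [
    { ...base, choices: [{ index: 0, delta: { role: "assistant", tool_calls: [call] }, finish_reason: null }] },
    { ...base, choices: [{ index: 0, delta: {}, finish_reason: "tool_calls" }] },
  ];
}
```

Each chunk would be sent as a `data:` SSE frame followed by `data: [DONE]`. If no tool matches, the mock can fall back to a plain text answer so the failure surfaces in the spec assertion rather than as a server error.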
Local verification
cd ~/src/ch/agentic-data-stack
bash scripts/generate-env.sh
docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600
cd e2e && npm ci && npx playwright install --with-deps chromium
npx playwright test
cd .. && docker compose -f docker-compose.yml -f docker-compose.e2e.yml down -v