Add Playwright E2E tests for stack functionality (daily, dummy-provider, secret-free) #12

## Goal

Add browser-level **Playwright E2E tests** that verify the stack actually *works* (not just boots). PR #11's smoke test proves the stack starts + responds on HTTP; this suite covers the real user flows: login, an agent chat that queries **ClickHouse via the real MCP server**, **Langfuse trace creation**, and the **feedback→Langfuse score** path.

Runs **daily + manual only** (not on PRs) — the smoke test remains the PR-time launch gate; this heavier suite catches `:latest` image drift and functional regressions on a schedule.

## Key design decision: dummy inference provider (secret-free)

A local OpenAI-compatible **mock** fakes only the *inference*; the MCP server, ClickHouse, and Langfuse stay **real**. No `ANTHROPIC_API_KEY`, no token spend, and no flakiness from a live model provider.

This still produces real Langfuse traces (generation observation + real MCP→ClickHouse tool spans) and exercises scoring. The feedback→score bridge is upstream (`packages/api/src/langfuse/feedback.ts`, PR danny-avila/LibreChat#13544), so the public `librechat:latest` image already contains it. Tradeoff: token counts, cost, and model name in traces are the mock's, and real frontier-model tool selection isn't covered (not a "does the stack work" concern).

## Components to build

1. **`e2e/mock-llm/server.js`** — ~150-line Node (built-ins only) OpenAI-compatible server: `GET /v1/models` + streaming `POST /v1/chat/completions` with **adaptive tool-calling** (inspect the request `tools[]`, find the ClickHouse query tool by name pattern, emit a `tool_calls` delta running `SELECT 1`, then a final text answer echoing the tool result). Adaptive lookup avoids hardcoding MCP tool names.

2. **`docker-compose.e2e.yml`** (override; prod config untouched) — adds the `mock-llm` service and repoints LibreChat at a test config via `CONFIG_PATH=/app/librechat.e2e.yaml` (new mount path = no conflict), with `depends_on: mock-llm`.

3. **`e2e/librechat.e2e.yaml`** — copy of `librechat.yaml` + a `MockLLM` custom endpoint (`baseURL: http://mock-llm:8080/v1`, `models.default: [mock-model]`, `fetch: false`), retaining `endpoints.agents`, `interface.mcpServers.use: true`, and the `ClickHouse-Local` MCP server.

4. **Playwright project `e2e/`** — `package.json` (`@playwright/test`), `playwright.config.ts` (chromium, `baseURL` `http://localhost:3080`, `setup` auth project, html reporter, no `webServer`), `setup/auth.setup.ts` (login via `POST /api/auth/login` as `admin@admin.com`/`password` → `storageState`), `lib/langfuse.ts` (poll public API with `base64(pk:sk)` Basic auth — **ingestion is async, poll don't assert-once**). `.gitignore` the artifacts.

5. **Specs:**
   - `librechat.spec.ts`: login page, authed new-chat UI loads, `ClickHouse-Local` selectable in the agent/MCP picker.
   - `langfuse.spec.ts`: Langfuse UI login (init user), "Default Project" dashboard + Traces view load.
   - `roundtrip.spec.ts`: pick MockLLM/agent with ClickHouse-Local → send a prompt → assert the reply contains the `SELECT 1` result and the tool invocation is shown → assert a new Langfuse trace (generation + MCP spans) exists via the public API.
   - `scoring.spec.ts`: after a chat, click 👍/👎 → poll `/api/public/scores` and assert a score is attached to the trace (exercises danny-avila/LibreChat#13544 end-to-end).

6. **`.github/workflows/e2e.yml`** — `schedule` (daily) + `workflow_dispatch` only. Steps: `bash scripts/generate-env.sh` → `docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600` → `setup-node` → `cd e2e && npm ci && npx playwright install --with-deps chromium` → `npx playwright test` → upload `e2e/playwright-report` on failure → always `down -v`. On scheduled failure, open/update a de-duped tracking issue (reuse the `report-daily-failure` pattern from `smoke-test.yml`).
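
The "poll, don't assert once" rule for `lib/langfuse.ts` could look roughly like the sketch below. It is a minimal illustration, not the final helper: the names `basicAuthHeader`, `pollUntil`, and `waitForScore` are hypothetical, the timeouts are placeholder values, and the response shape (`data[]` entries carrying a `traceId`) is an assumption to confirm against the Langfuse public API.

```typescript
interface PollOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

/** Build the Basic auth header value from a Langfuse public/secret key pair. */
function basicAuthHeader(publicKey: string, secretKey: string): string {
  // btoa only handles Latin-1 input; assumed fine because the keys are ASCII.
  return `Basic ${btoa(`${publicKey}:${secretKey}`)}`;
}

/** Re-run `probe` until it yields a value, or throw once the deadline would be exceeded. */
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  { timeoutMs = 60000, intervalMs = 2000 }: PollOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`pollUntil: no result within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

/** Hypothetical usage: wait until some score references the given trace. */
async function waitForScore(
  baseUrl: string,
  publicKey: string,
  secretKey: string,
  traceId: string,
): Promise<{ traceId?: string }> {
  return pollUntil(async () => {
    const res = await fetch(`${baseUrl}/api/public/scores`, {
      headers: { Authorization: basicAuthHeader(publicKey, secretKey) },
    });
    if (!res.ok) return undefined; // treat errors as "not ingested yet"
    // Assumed response shape: { data: [{ traceId, ... }] }
    const body = (await res.json()) as { data?: Array<{ traceId?: string }> };
    return body.data?.find((score) => score.traceId === traceId);
  });
}
```

Filtering client-side keeps the sketch independent of whichever query parameters the scores endpoint supports; a server-side filter would be cheaper if one is available.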

## Reuse
- Launch pattern from `.github/workflows/smoke-test.yml`.
- LibreChat selectors/auth from `~/src/ch/librechat/e2e/` and `~/src/ch/librechat/client`: login `getByLabel('Email'|'Password')`, `getByTestId('login-button')`; chat `getByTestId('text-input'|'send-button'|'messages-view'|'nav-new-chat-button')`.

## Implementation notes / unknowns
- Finalize agent/MCP-picker + 👍/👎 selectors via `npx playwright codegen --test-id-attribute=data-testid http://localhost:3080/c/new` against the live UI.
- The mock must track the OpenAI streaming/tool-call wire format and how LibreChat passes MCP tools (adaptive lookup de-risks this).
- Possible follow-up: pin the LibreChat image tag to reduce `:latest` drift.
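
The adaptive lookup and the tool-call wire format mentioned above can be sketched as pure functions, independent of the HTTP layer. This is a sketch under assumptions: the name pattern, the `query` argument key, and the helper names (`findQueryTool`, `sseChunk`, `planResponse`) are hypothetical and need checking against the tools LibreChat actually sends for `ClickHouse-Local`.

```typescript
// Only the request fields this sketch reads.
interface ToolDef {
  type: string;
  function: { name: string; description?: string };
}

interface ChatMessage {
  role: string;
  content?: string | null;
}

// Assumed pattern; confirm against the real MCP tool names.
const QUERY_TOOL_PATTERN = /query|sql/i;

/** Return the name of the first function tool that looks like a query tool. */
function findQueryTool(tools: ToolDef[] = []): string | undefined {
  const match = tools.find(
    (t) => t.type === "function" && QUERY_TOOL_PATTERN.test(t.function.name),
  );
  return match?.function.name;
}

/** Wrap one chat.completion.chunk payload in SSE framing. */
function sseChunk(delta: object, finishReason: string | null = null): string {
  const payload = {
    id: "chatcmpl-mock",
    object: "chat.completion.chunk",
    created: 0,
    model: "mock-model",
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

/** First turn: emit a tool call. After a tool result arrives: echo it as text. */
function planResponse(messages: ChatMessage[], tools: ToolDef[] = []): string[] {
  const last = messages[messages.length - 1];
  const toolName = findQueryTool(tools);
  if (last?.role !== "tool" && toolName) {
    const call = {
      index: 0,
      id: "call_mock_1",
      type: "function",
      // The `query` argument key is an assumption about the tool's schema.
      function: { name: toolName, arguments: JSON.stringify({ query: "SELECT 1" }) },
    };
    return [
      sseChunk({ role: "assistant", tool_calls: [call] }),
      sseChunk({}, "tool_calls"),
      "data: [DONE]\n\n",
    ];
  }
  const text =
    last?.role === "tool" ? `Tool result: ${last.content ?? ""}` : "No query tool available.";
  return [sseChunk({ role: "assistant", content: text }), sseChunk({}, "stop"), "data: [DONE]\n\n"];
}
```

A `POST /v1/chat/completions` handler would then write each returned frame to a `text/event-stream` response. Keeping the decision logic pure makes it unit-testable without starting the stack.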

## Local verification
```bash
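# From the repo checkout: generate local env config, then start the stack with the E2E override
cd ~/src/ch/agentic-data-stack
bash scripts/generate-env.sh
docker compose -f docker-compose.yml -f docker-compose.e2e.yml up -d --wait --wait-timeout 600
# Install test dependencies and the Chromium browser, then run the suite from e2e/
cd e2e && npm ci && npx playwright install --with-deps chromium
npx playwright test
# Tear down the stack and remove its volumes
cd .. && docker compose -f docker-compose.yml -f docker-compose.e2e.yml down -v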
```

