
fix(status): handle Serverless Elasticsearch in elastic status #398

Open: MattDevy wants to merge 1 commit into main from fix/status-serverless-elasticsearch

MattDevy commented Jun 3, 2026

Problem

elastic status reports Elasticsearch as down on Serverless projects, even when the project is healthy:

Context: cli-demo

  Elasticsearch  https://cli-demo-f41c2c.es.us-east-1.aws.elastic.cloud  ✗  request failed (410)
  Kibana         https://cli-demo-f41c2c.kb.us-east-1.aws.elastic.cloud  ✓  available (9.5.0)

Root cause

The Elasticsearch probe calls GET /_cluster/health (src/status/checks.ts). Serverless removes cluster-level APIs and answers that endpoint with 410 Gone, which the generic HTTP classifier reported as request failed (410). Kibana's /api/status exists on Serverless, so Kibana already showed ✓.

Confirmed against a live Serverless project:

  Endpoint                      Status
  GET /                         200 (returns version.number, build_flavor: serverless)
  GET /_cluster/health          410
  GET /_security/_authenticate  200
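The fallback depends only on what GET / returns. A minimal sketch of extracting those fields, assuming the standard root response layout where build_flavor sits inside the version object next to number; the helper parseRootResponse and the RootInfo type are hypothetical names, not code from this PR:

```typescript
// Hypothetical shape for what the fallback reads from GET /.
interface RootInfo {
  version: string
  serverless: boolean
}

// Assumes the standard root response layout: { version: { number, build_flavor, ... } }.
function parseRootResponse(body: unknown): RootInfo | undefined {
  if (typeof body !== 'object' || body === null) return undefined
  const version = (body as { version?: unknown }).version
  if (typeof version !== 'object' || version === null) return undefined
  const { number: versionNumber, build_flavor: buildFlavor } = version as {
    number?: unknown
    build_flavor?: unknown
  }
  // A missing version.number is left for the caller to report as "unexpected response".
  if (typeof versionNumber !== 'string') return undefined
  return { version: versionNumber, serverless: buildFlavor === 'serverless' }
}
```

Returning undefined instead of throwing keeps the "missing version.number" case an ordinary failed check rather than a crash.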

Fix

  • pingService now surfaces the HTTP status code on failure.
  • checkElasticsearch keeps the single /_cluster/health request for stateful clusters. On a 410 specifically, it falls back to GET / (served on Serverless) and reads version.number.
  • EsCheckOk becomes a discriminated union: EsCheckStateful { flavor, status, nodes } | EsCheckServerless { flavor, version }.
  • The formatter renders Serverless as serverless (<version>); stateful output is unchanged.

Only a 410 triggers the fallback. Other failures (401, 503, network) are still reported as failures, and stateful clusters are completely unaffected.
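The fix can be sketched as follows. The names pingService, checkElasticsearch, EsCheckStateful and EsCheckServerless come from this PR, but the bodies, the FetchLike type, the header-based auth and the exact error strings are assumptions for illustration, not the code in src/status/checks.ts:

```typescript
// Hypothetical fetch signature, kept minimal so the sketch needs no DOM or Node typings.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// pingService reports the HTTP status on failure; status is absent for network errors.
type PingResult =
  | { ok: true; body: unknown }
  | { ok: false; status?: number; error: string }

interface EsCheckStateful { ok: true; flavor: 'stateful'; url: string; status: string; nodes: number }
interface EsCheckServerless { ok: true; flavor: 'serverless'; url: string; version: string }
type EsCheckOk = EsCheckStateful | EsCheckServerless
type EsCheck = EsCheckOk | { ok: false; url: string; error: string }

async function pingService(
  baseUrl: string,
  path: string,
  headers: Record<string, string>,
  fetchFn: FetchLike,
): Promise<PingResult> {
  try {
    const res = await fetchFn(baseUrl.replace(/\/+$/, '') + path, { headers })
    if (!res.ok) return { ok: false, status: res.status, error: `request failed (${res.status})` }
    return { ok: true, body: await res.json() }
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
}

async function checkElasticsearch(
  url: string,
  headers: Record<string, string>,
  fetchFn: FetchLike,
): Promise<EsCheck> {
  // Stateful clusters: a single request, as before.
  const health = await pingService(url, '/_cluster/health', headers, fetchFn)
  if (health.ok) {
    const body = (health.body ?? {}) as { status?: unknown; number_of_nodes?: unknown }
    const status = body.status
    const nodes = body.number_of_nodes
    if (typeof status !== 'string' || typeof nodes !== 'number') {
      return { ok: false, url, error: 'unexpected response' }
    }
    return { ok: true, flavor: 'stateful', url, status, nodes }
  }

  // Only 410 Gone triggers the fallback; 401, 503 and network errors stay failures.
  if (health.status !== 410) return { ok: false, url, error: health.error }

  // Serverless still serves GET /, which carries the build version.
  const root = await pingService(url, '/', headers, fetchFn)
  if (!root.ok) return { ok: false, url, error: root.error }
  const version = (root.body as { version?: { number?: unknown } } | null)?.version?.number
  if (typeof version !== 'string') return { ok: false, url, error: 'unexpected response' }
  return { ok: true, flavor: 'serverless', url, version }
}
```

Keeping fetchFn injectable is what lets tests drive the 410 → GET / path without a live project.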

Result

Context: cli-demo

  Elasticsearch  https://cli-demo-f41c2c.es.us-east-1.aws.elastic.cloud  ✓  serverless (9.5.0)
  Kibana         https://cli-demo-f41c2c.kb.us-east-1.aws.elastic.cloud  ✓  available (9.5.0)

Exit code is now 0.
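A sketch of how the formatter can branch on the discriminated union, with a matching exit-code rule. The union is repeated in simplified form so the block stands alone; describeEs, exitCodeFor, and the choice of 1 as the failure exit code are assumptions, since the PR only states that the exit code is now 0:

```typescript
// Simplified copies of the PR's union, without url and other fields.
type EsCheckOk =
  | { ok: true; flavor: 'stateful'; status: string; nodes: number }
  | { ok: true; flavor: 'serverless'; version: string }
type EsCheck = EsCheckOk | { ok: false; error: string }

// Status column for the Elasticsearch row; stateful output keeps its old form.
function describeEs(check: EsCheck): string {
  if (!check.ok) return `✗  ${check.error}`
  if (check.flavor === 'stateful') return `✓  ${check.status} (${check.nodes} nodes)`
  return `✓  serverless (${check.version})`
}

// Hypothetical exit-code rule: 0 only when every service check succeeded.
function exitCodeFor(checks: ReadonlyArray<{ ok: boolean }>): number {
  return checks.every((c) => c.ok) ? 0 : 1
}
```

Because the flavor tag drives the branch, adding a third deployment type would surface as a type error in the formatter rather than as silently wrong output.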

Test plan

  • tsc -b clean
  • All 45 status tests pass (test/status/*.test.ts)
  • New tests: 410 → serverless fallback success, missing version.number → unexpected response, root failure propagation (401), serverless rendering
  • Verified live against a Serverless project (output above)
  • Stateful path / output unchanged (existing tests green)

Serverless projects remove cluster-level APIs, so `GET /_cluster/health`
returns 410 Gone. The status probe surfaced this as `request failed (410)`
and marked Elasticsearch as down, even though the project was healthy.

On a 410 from `_cluster/health`, fall back to `GET /` (served on
Serverless) and report the build version. Stateful clusters keep their
single-request path and existing `green (3 nodes)` output unchanged.

EsCheckOk is now a discriminated union (stateful | serverless); the
formatter renders Serverless as `serverless (<version>)`.

github-actions Bot commented Jun 3, 2026

MegaLinter analysis: Success

  Descriptor     Linter      Files  Fixed  Errors  Warnings  Elapsed time
  ✅ COPYPASTE   jscpd       yes           no      no        9.26s
  ✅ REPOSITORY  gitleaks    yes           no      no        63.44s
  ✅ REPOSITORY  git_diff    yes           no      no        0.47s
  ✅ REPOSITORY  secretlint  yes           no      no        33.51s
  ✅ REPOSITORY  trivy       yes           no      no        18.06s
  ✅ TYPESCRIPT  eslint      4             0       0         3.6s

See detailed reports in MegaLinter artifacts
Set VALIDATE_ALL_CODEBASE: true in mega-linter.yml to validate all sources, not only the diff

Review comment on src/status/checks.ts:

  ): Promise<EsCheck> {
    const result = await pingService(block.url, '/_cluster/health', block.auth, fetchFn)
-   if (!result.ok) return { ok: false, url: block.url, error: result.error }
+   if (!result.ok) {

My only suggestion: if we have any hints that a cluster might be serverless (its hostname, perhaps, or if the commandProfile is set to serverless), we should run the serverless check first to reduce the chances of sending a request that's doomed to fail.
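One way to act on this suggestion, sketched under assumptions: ProbeHint, the 'skip' outcome (meaning "this probe does not apply to this flavor") and checkWithHint are hypothetical names, and only the commandProfile hint is modelled because no hostname pattern that reliably identifies Serverless is established here:

```typescript
// Hypothetical hint: the reviewer mentions commandProfile; the field name and the
// 'serverless' value are assumptions about the context shape.
interface ProbeHint {
  commandProfile?: string
}

// 'skip' means "this probe does not apply to this flavor": a 410 from
// /_cluster/health, or a non-serverless build_flavor from GET /.
type ProbeOutcome =
  | { kind: 'ok'; label: string }
  | { kind: 'skip' }
  | { kind: 'fail'; error: string }
type Probe = () => Promise<ProbeOutcome>

async function checkWithHint(hint: ProbeHint, stateful: Probe, serverless: Probe): Promise<ProbeOutcome> {
  // Try the probe most likely to succeed first, so the other request is usually never sent.
  const order = hint.commandProfile === 'serverless' ? [serverless, stateful] : [stateful, serverless]
  for (const probe of order) {
    const outcome = await probe()
    // A success or a real failure (401, 503, network) ends the check immediately.
    if (outcome.kind !== 'skip') return outcome
  }
  return { kind: 'fail', error: 'unexpected response' }
}
```

With no hint, the stateful probe still runs first, so a stateful cluster whose _cluster/health succeeds is checked with a single request, as in the PR.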
