Skip to content

fix(ci): harden Submit docs URLs to IndexNow step#833

Merged
nehagup merged 4 commits intomainfrom
fix/docs-indexnow-harden
Apr 14, 2026
Merged

fix(ci): harden Submit docs URLs to IndexNow step#833
nehagup merged 4 commits intomainfrom
fix/docs-indexnow-harden

Conversation

@slayerjain
Copy link
Copy Markdown
Member

Summary

Three functional bugs in the existing Submit docs URLs to IndexNow step in main.yml have been silently broken since #827 landed:

  1. Hardcoded fallback key is stale: ${INDEXNOW_KEY:-9ded60b43b8b40578588fc42b04c3e5d}. That key was deleted from landing/public/ in keploy/landing#186 because Bing was rejecting it with HTTP 403 UserForbiddedToAccessSite. Every push-run of main.yml since feat: combined SEO/GEO/AEO + IndexNow auto-submission + build fix #827 has been hitting 403 in silence. Reproduced just now:
    curl -sS -X POST https://api.indexnow.org/indexnow \
      -H 'Content-Type: application/json' \
      -d '{"host":"keploy.io","key":"9ded60b43b8b40578588fc42b04c3e5d",...}'
    {"errorCode":"UserForbiddedToAccessSite",...}
    HTTP 403
    
  2. continue-on-error: true + no ::error:: annotation + no exit 1 meant the failure was completely invisible. Step went green, no alerting.
  3. No URL filtering: the sitemap contains ~800 entries including 412 auto-generated /tags/ pages and 184 legacy /docs/{1,2,3}.0.0/ pages (which robots.txt already disallows). Only ~206 are current-version docs worth submitting.

Changes

  • Default INDEXNOW_KEY to feeeddbe661a4686ba6e4d5cb663b2cb — the key actually served from keploy.io/<key>.txt. The secrets.INDEXNOW_KEY override still takes priority for rotations.
  • Add keyLocation to the payload.
  • Filter sitemap URLs: drop /tags/ pages and legacy /docs/{1,2,3}.0.0/ paths before submitting. Reduces submission from ~800 → ~206, all on currently-supported paths.
  • set -euo pipefail, capture CURL_EXIT separately from HTTP_CODE, exit non-zero with ::error:: on real failure (curl exit != 0 or HTTP non-2xx), print response body plus a Troubleshooting block containing a keyLocation check and a ready-to-paste POST reproducer. 2xx (including 202 accepted-pending-verification) is an explicit exit 0.
  • Keep continue-on-error: true so a failure never blocks the deploy, but ::error:: annotations now surface failures in the Actions UI instead of hiding them.

Also mirrors the hardening pattern already shipped in keploy/blog-website#366.

Test plan

  • Dry-ran the filter locally against the live docs/sitemap.xml (after gunzip) → 206 current-version URLs, 0 tag pages, 0 legacy version pages
  • Submitted a real 20-URL POST to https://api.indexnow.org/indexnow with the new key + keyLocationHTTP 200
  • Verified YAML parses cleanly
  • INDEXNOW_KEY secret already set on keploy/docs (separate, so Option A from the diagnosis is in effect right now — the next main.yml run should hit HTTP 200 even before this PR merges)
  • Merge and confirm the next push-run on main.yml logs Submitted <N> URLs to IndexNow (HTTP 200) instead of silently 403'ing
  • Bing Webmaster Tools → IndexNow tab should show the new submission within ~24h

🤖 Generated with Claude Code

The existing step had three functional problems that made it silently
broken since PR #827 landed:

1. Hardcoded fallback key `9ded60b43b8b40578588fc42b04c3e5d` was the
   stale key deleted from landing/public/ in keploy/landing#186 because
   Bing was rejecting it with HTTP 403 UserForbiddedToAccessSite. Every
   push-run since PR #827 has been hitting 403 in silence.

2. `continue-on-error: true` combined with a plain `echo` on failure
   (no `::error::` annotation, no `exit 1`) meant the failing step
   reported green in the Actions UI — no signal, no alerting.

3. No `keyLocation` in the payload, no CURL_EXIT capture, no filter for
   legacy versioned docs or auto-generated tag pages (so ~800 URLs were
   submitted, mostly low-value /tags/ and /docs/{1,2,3}.0.0/ pages that
   robots.txt already disallows).

Changes:
- Default INDEXNOW_KEY to feeeddbe661a4686ba6e4d5cb663b2cb (the key
  actually served from keploy.io/<key>.txt). Secret override still
  works for future rotations.
- Add keyLocation to the payload.
- Filter sitemap URLs: drop /tags/ pages and legacy /docs/{1,2,3}.0.0/
  before submitting — reduces from ~800 to ~206 URLs, all on supported
  paths. Verified locally with a real POST, HTTP 200.
- set -euo pipefail, capture CURL_EXIT separately from HTTP_CODE, exit
  non-zero with ::error:: on real failure, print response body +
  troubleshooting block with keyLocation check and a ready-to-paste
  reproducer. 2xx (including 202 accepted-pending-verification) is an
  explicit exit 0.
- Keep continue-on-error: true so the IndexNow step never blocks the
  deploy, but the ::error:: annotation now surfaces failures in the UI
  instead of hiding them.

Dry-run locally against the live sitemap-derived URL list returned
HTTP 200 with 20 real docs URLs.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 14, 2026 15:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens the GitHub Actions “Submit docs URLs to IndexNow” step so IndexNow failures are visible (without blocking deploys) and submissions are limited to the desired subset of docs URLs.

Changes:

  • Defaults INDEXNOW_KEY to a new key (still overridable via secrets.INDEXNOW_KEY) and adds keyLocation to the IndexNow payload.
  • Filters sitemap-derived URLs to exclude /tags/ pages and legacy versioned docs paths before submitting.
  • Improves shell robustness and failure reporting (pipefail, curl exit vs HTTP code, ::error:: annotations, troubleshooting output).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/main.yml Outdated
URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" \
| grep -v '/tags/' \
| grep -vE '/docs/(1|2|3)\.[0-9]+\.[0-9]+/' \
| sort -u)
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With set -euo pipefail, this URLS=$(grep ... | grep ... | ...) pipeline will cause the script to exit immediately if grep finds no matches (exit status 1), so the subsequent if [ -z "$URLS" ] skip logic may never run. Make the pipeline non-fatal (e.g., append || true to the command substitution, or temporarily disable -e around the parse) so an empty sitemap/filter result is handled as an intentional skip rather than a step failure.

Suggested change
| sort -u)
| sort -u || true)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 514e127: appended || true to the command substitution so the filter pipeline can return a non-empty exit without aborting the step. Verified locally with a fixture where every URL in the sitemap matched the drop filters — the [ -z "$URLS" ] skip branch is now reachable and fires cleanly.

Comment on lines 72 to 88
if [ ! -f "$SITEMAP" ]; then
echo "Sitemap not found at $SITEMAP, skipping IndexNow"
echo "::warning::Sitemap not found at $SITEMAP, skipping IndexNow submission"
exit 0
fi

URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" || true)
# Filter to the current (unversioned) docs only:
# - drop auto-generated /tags/ pages (low-value for search indexing)
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
# robots.txt disallows them and they're marked noindex anyway
URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" \
| grep -v '/tags/' \
| grep -vE '/docs/(1|2|3)\.[0-9]+\.[0-9]+/' \
| sort -u)

if [ -z "$URLS" ]; then
echo "No URLs found in sitemap, skipping"
echo "::warning::No current-version docs URLs parsed from $SITEMAP, skipping"
exit 0
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow annotation uses ::warning:: for non-error conditions (missing sitemap / no URLs). This repo’s logging guidelines disallow adding warnings; consider using a plain echo or ::notice:: instead so the Actions UI doesn’t surface these as warnings while still providing visibility.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 514e127: changed both non-error skip paths (missing sitemap, no URLs after filter) from ::warning:: to ::notice::. The ::error:: annotations on actual IndexNow failures (curl-exit / non-2xx) are retained — those are still real errors that should surface.

Comment thread .github/workflows/main.yml Outdated
Comment on lines +79 to +80
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
# robots.txt disallows them and they're marked noindex anyway
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says legacy versioned docs are excluded because robots.txt disallows them and they are noindex, but the repository’s static/robots.txt currently allows all paths (except /cgi-bin/) and there’s no matching noindex rule found. Please update the comment to reflect the real reason for filtering (e.g., keeping submissions focused on preferred doc paths) so future maintainers aren’t misled.

Suggested change
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
# robots.txt disallows them and they're marked noindex anyway
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) to keep
# submissions focused on the preferred current doc paths

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 514e127: replaced the misleading "robots.txt disallows / noindex" rationale with the real one ("keep submissions focused on the preferred current doc paths"). The old comment was written assuming #832's nuanced robots.txt would land first, but that's a separate PR and this file has to stand on its own.

Comment thread .github/workflows/main.yml Outdated
Comment on lines +99 to +102
HTTP_CODE=$(curl -sS -o /tmp/indexnow-resp.txt -w "%{http_code}" \
-X POST "https://api.indexnow.org/indexnow" \
-H "Content-Type: application/json" \
-d "{\"host\":\"keploy.io\",\"key\":\"$KEY\",\"urlList\":$JSON_URLS}" || echo "000")
-d "{\"host\":\"${INDEXNOW_HOST}\",\"key\":\"${INDEXNOW_KEY}\",\"keyLocation\":\"${INDEXNOW_KEY_LOC}\",\"urlList\":${JSON_URLS}}") || CURL_EXIT=$?
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response body is written to a fixed path (/tmp/indexnow-resp.txt) and never cleaned up. Using a unique temp file (e.g., mktemp) and a trap cleanup makes the step more robust and avoids any chance of collisions or stale data being read later in the script.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 514e127: introduced RESPONSE_FILE=$(mktemp) at the top of the run script plus trap 'rm -f "$RESPONSE_FILE"' EXIT so every exit path (happy path, skip, curl failure, non-2xx) cleans up. Verified the trap fires on EXIT with a local dry-run. Also swapped the two call sites (curl -o and the non-2xx cat) to use $RESPONSE_FILE — no more hardcoded /tmp/indexnow-resp.txt.

…ect filter rationale, mktemp+trap

Four fixes from the Copilot review on PR #833:

1. `|| true` on the URL extraction pipeline. Under `set -euo pipefail`,
   if the filter chain (`grep -oP | grep -v | grep -vE | sort -u`)
   produced no matches, grep's exit 1 would abort the step before the
   `[ -z "$URLS" ]` skip branch could run. Trailing `|| true` makes the
   empty-result path intentionally non-fatal and the skip branch is
   now reachable.

2. `::warning::` → `::notice::` for the two non-error skip paths
   (missing sitemap, no URLs after filter). Per repo guidelines, the
   Actions UI shouldn't surface these as warnings; they're expected
   no-op conditions.

3. Corrected the filter comment. The previous wording claimed legacy
   versioned docs were excluded "because robots.txt disallows them and
   they're marked noindex anyway", but the current static/robots.txt
   on main doesn't have that disallow and there's no noindex rule for
   those paths. Replaced with the real reason: keeping submissions
   focused on the preferred current doc paths.

4. Replaced the fixed `/tmp/indexnow-resp.txt` path with a `mktemp`-
   generated RESPONSE_FILE and a `trap ... EXIT` cleanup, so repeat
   runs can't collide and stale data can't leak into later reads.

Dry-ran all three edge cases locally (all-filtered-out, normal URLs,
missing sitemap) + verified the trap fires on EXIT.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/main.yml Outdated
Comment on lines 104 to 105
JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))')

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With set -euo pipefail, if jq is missing or fails to parse/build JSON_URLS, the step will exit immediately without the ::error:: annotation / troubleshooting output that you provide for curl/HTTP failures. Consider explicitly checking jq availability and/or handling a non-zero exit from the jq command to emit a ::error:: with a next step (e.g., install jq / verify sitemap format) before exiting non-zero.

Suggested change
JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))')
if ! command -v jq >/dev/null 2>&1; then
echo "::error::jq is required to build the IndexNow request payload. Install jq or use a runner image that includes jq, then rerun this workflow."
exit 1
fi
if ! JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))'); then
echo "::error::Failed to build the IndexNow JSON payload from sitemap URLs. Verify the sitemap-derived URL list is valid and rerun the workflow."
exit 1
fi

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8033df7: added explicit command -v jq check and wrapped the JSON_URLS=$(... jq ...) assignment in if ! ... then ... exit 1; fi so both failure modes (missing jq, jq parse error) emit ::error:: annotations with actionable next steps before exiting, instead of aborting silently under set -euo pipefail.

Comment thread .github/workflows/main.yml Outdated
cat "$RESPONSE_FILE" 2>/dev/null || true
echo ""
echo "Troubleshooting:"
echo " 1. Verify key file is publicly reachable: curl -sS $INDEXNOW_KEY_LOC"
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The troubleshooting line prints curl -sS $INDEXNOW_KEY_LOC without quoting. Quoting the URL in the echoed command makes the copy/paste reproducer more robust (and avoids surprises if the value ever contains shell-special characters).

Suggested change
echo " 1. Verify key file is publicly reachable: curl -sS $INDEXNOW_KEY_LOC"
echo " 1. Verify key file is publicly reachable: curl -sS \"$INDEXNOW_KEY_LOC\""

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8033df7: quoted $INDEXNOW_KEY_LOC (and the IndexNow API URL on the next line) in the printed troubleshooting reproducer so a copy/paste from the CI log stays shell-safe even if the key or host ever contains a shell-special character.

Two Copilot review follow-ups:

1. Guard the payload build under `set -euo pipefail`: explicitly check
   `command -v jq` and the `jq` command's exit status, emitting a
   `::error::` annotation on either failure before `exit 1`. Previously
   a missing or broken jq would abort the step immediately without the
   actionable error message, leaving an unannotated "process exited 1"
   in the Actions UI.

2. Quote `$INDEXNOW_KEY_LOC` and the IndexNow API URL in the printed
   troubleshooting reproducer so a copy/paste of the line from the CI
   log remains shell-safe even if the key or host ever contains a
   shell-special character.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/main.yml Outdated
echo "Failed to build JSON URL list, skipping"
exit 0
if ! JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))'); then
echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Inspect the URLs above for unexpected characters and rerun the workflow."
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ::error:: message says “Inspect the URLs above…”, but the script never prints the URL list (only the count). On a jq failure, the log won’t actually contain the URLs to inspect. Either print a small sample of the parsed URLs before running jq (or dump them to the log on failure), or adjust the error text to point to the sitemap/path filtering logic instead of “URLs above”.

Suggested change
echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Inspect the URLs above for unexpected characters and rerun the workflow."
echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Review the sitemap contents and the /tags/ and versioned /docs/ filtering logic in this step for unexpected input, then rerun the workflow."

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 91c9c66: added a First 5 URLs: sample log immediately after Submitting $URL_COUNT docs URLs to IndexNow (printed via printf '%s\n' "$URLS" | head -5 | sed 's/^/ /') so every run — including jq failures — leaves actionable URL context in the Actions log. The jq-failure error message now references that sample ("See the 'First 5 URLs' sample logged above") instead of the non-existent "URLs above".

Comment thread .github/workflows/main.yml Outdated
Comment on lines +83 to +86
# Filter to the current (unversioned) docs only:
# - drop auto-generated /tags/ pages (low-value for search indexing)
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) to keep
# submissions focused on the preferred current doc paths
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says this filters to “current (unversioned) docs only”, but the actual filter only excludes versioned paths for majors 1–3; it will still include any versioned docs like /docs/4.0.0/ (and higher) if they appear in the sitemap. Please either (a) update the comment to match the actual intent (e.g., exclude only unmaintained versions 1–3), or (b) broaden the regex to exclude all versioned doc paths if you truly want unversioned-only URLs.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 91c9c66 — went with option (b): broadened the regex from /docs/(1|2|3)\.[0-9]+\.[0-9]+/ to /docs/[0-9]+\.[0-9]+\.[0-9]+/, and updated the inline comment to say "drop any versioned doc path like /docs/N.M.P/ so only the current (unversioned) docs are submitted". This keeps the rule correct when a future v5/v6 archive lands alongside today's v1-v3 legacy versions without requiring a follow-up edit. Verified locally against four fixtures (/docs/about/, /docs/1.0.0/tags/foo/, /docs/4.0.0/running-keploy/, /docs/12.3.45/test/) — only the unversioned URL is kept.

Two review follow-ups:

1. Tighten the versioned-docs filter from (1|2|3) to [0-9]+ so the
   regex stays correct when a future v5/v6 archive lands alongside
   today's v1-v3 legacy versions. Verified against fixtures including
   /docs/1.0.0/, /docs/4.0.0/, /docs/12.3.45/ — all dropped.

2. Print the first 5 parsed URLs before the jq step so a jq parse
   failure leaves something actionable in the Actions log. Updated the
   jq-failure error message to reference the "First 5 URLs" sample
   instead of the non-existent "URLs above" artifact.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nehagup nehagup merged commit 12aa3c3 into main Apr 14, 2026
10 of 11 checks passed
@nehagup nehagup deleted the fix/docs-indexnow-harden branch April 14, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants