fix(ci): harden Submit docs URLs to IndexNow step #833
Conversation
The existing step had three functional problems that made it silently broken since PR #827 landed:

1. The hardcoded fallback key `9ded60b43b8b40578588fc42b04c3e5d` was the stale key deleted from `landing/public/` in keploy/landing#186 because Bing was rejecting it with `HTTP 403 UserForbiddedToAccessSite`. Every push-run since PR #827 has been hitting 403 in silence.
2. `continue-on-error: true` combined with a plain `echo` on failure (no `::error::` annotation, no `exit 1`) meant the failing step reported green in the Actions UI — no signal, no alerting.
3. No `keyLocation` in the payload, no `CURL_EXIT` capture, and no filter for legacy versioned docs or auto-generated tag pages (so ~800 URLs were submitted, mostly low-value `/tags/` and `/docs/{1,2,3}.0.0/` pages that robots.txt already disallows).

Changes:

- Default `INDEXNOW_KEY` to `feeeddbe661a4686ba6e4d5cb663b2cb` (the key actually served from `keploy.io/<key>.txt`). The secret override still works for future rotations.
- Add `keyLocation` to the payload.
- Filter sitemap URLs: drop `/tags/` pages and legacy `/docs/{1,2,3}.0.0/` paths before submitting; this reduces the list from ~800 to ~206 URLs, all on supported paths. Verified locally with a real POST, HTTP 200.
- `set -euo pipefail`, capture `CURL_EXIT` separately from `HTTP_CODE`, exit non-zero with `::error::` on real failure, and print the response body plus a troubleshooting block with a `keyLocation` check and a ready-to-paste reproducer. 2xx (including 202 accepted-pending-verification) is an explicit `exit 0`.
- Keep `continue-on-error: true` so the IndexNow step never blocks the deploy, but the `::error::` annotation now surfaces failures in the UI instead of hiding them.

A local dry run against the live sitemap-derived URL list returned HTTP 200 with 20 real docs URLs.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
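The success/failure classification the description outlines (curl transport failures and non-2xx HTTP codes are real errors; any 2xx, including 202, is success) can be sketched as a small helper. The function name and echoed labels here are illustrative, not taken from the workflow file:

```shell
# classify_indexnow_result CURL_EXIT HTTP_CODE
# Mirrors the step's logic: a curl transport failure or a non-2xx status
# is a real error; any 2xx (including 202 accepted-pending-verification)
# counts as success.
classify_indexnow_result() {
  curl_exit="$1"
  http_code="$2"
  if [ "$curl_exit" -ne 0 ]; then
    echo "curl-error"   # transport-level failure (DNS, TLS, timeout, ...)
  else
    case "$http_code" in
      2??) echo "ok" ;;          # 200 OK, 202 accepted-pending-verification
      *)   echo "http-error" ;;  # e.g. 403 from a stale key
    esac
  fi
}
```

In the step itself, `curl-error` and `http-error` paths would emit a `::error::` annotation and `exit 1`, while `ok` exits 0.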
Pull request overview
Hardens the GitHub Actions “Submit docs URLs to IndexNow” step so IndexNow failures are visible (without blocking deploys) and submissions are limited to the desired subset of docs URLs.
Changes:
- Defaults `INDEXNOW_KEY` to a new key (still overridable via `secrets.INDEXNOW_KEY`) and adds `keyLocation` to the IndexNow payload.
- Filters sitemap-derived URLs to exclude `/tags/` pages and legacy versioned docs paths before submitting.
- Improves shell robustness and failure reporting (pipefail, curl exit vs HTTP code, `::error::` annotations, troubleshooting output).
```shell
URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" \
  | grep -v '/tags/' \
  | grep -vE '/docs/(1|2|3)\.[0-9]+\.[0-9]+/' \
  | sort -u)
```
With set -euo pipefail, this URLS=$(grep ... | grep ... | ...) pipeline will cause the script to exit immediately if grep finds no matches (exit status 1), so the subsequent if [ -z "$URLS" ] skip logic may never run. Make the pipeline non-fatal (e.g., append || true to the command substitution, or temporarily disable -e around the parse) so an empty sitemap/filter result is handled as an intentional skip rather than a step failure.
```diff
-  | sort -u)
+  | sort -u || true)
```
Fixed in 514e127: appended `|| true` to the command substitution so the filter pipeline can return a non-zero exit status without aborting the step. Verified locally with a fixture where every URL in the sitemap matched the drop filters — the `[ -z "$URLS" ]` skip branch is now reachable and fires cleanly.
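The failure mode behind this fix is easy to reproduce in a few lines; the sample URL below is illustrative:

```shell
# Reproduces the failure mode: under `set -euo pipefail` a filter pipeline
# that matches nothing exits 1 (grep's status survives via pipefail), and
# the variable assignment inherits that status, aborting the script.
set -euo pipefail

SITEMAP_URLS="https://keploy.io/docs/tags/foo/"   # everything matches the drop filter

# Without the trailing `|| true`, this assignment would terminate the script here.
URLS=$(printf '%s\n' "$SITEMAP_URLS" | grep -v '/tags/' | sort -u || true)

if [ -z "$URLS" ]; then
  RESULT="skipped"      # the [ -z "$URLS" ] skip branch is now reachable
else
  RESULT="submitting"
fi
```

Dropping the `|| true` from the assignment makes the script exit before the `if` is ever reached.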
```diff
 if [ ! -f "$SITEMAP" ]; then
-  echo "Sitemap not found at $SITEMAP, skipping IndexNow"
+  echo "::warning::Sitemap not found at $SITEMAP, skipping IndexNow submission"
   exit 0
 fi

-URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" || true)
+# Filter to the current (unversioned) docs only:
+# - drop auto-generated /tags/ pages (low-value for search indexing)
+# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
+#   robots.txt disallows them and they're marked noindex anyway
+URLS=$(grep -oP '<loc>\K[^<]+' "$SITEMAP" \
+  | grep -v '/tags/' \
+  | grep -vE '/docs/(1|2|3)\.[0-9]+\.[0-9]+/' \
+  | sort -u)

 if [ -z "$URLS" ]; then
-  echo "No URLs found in sitemap, skipping"
+  echo "::warning::No current-version docs URLs parsed from $SITEMAP, skipping"
   exit 0
```
The workflow annotation uses ::warning:: for non-error conditions (missing sitemap / no URLs). This repo’s logging guidelines disallow adding warnings; consider using a plain echo or ::notice:: instead so the Actions UI doesn’t surface these as warnings while still providing visibility.
Fixed in 514e127: changed both non-error skip paths (missing sitemap, no URLs after filter) from ::warning:: to ::notice::. The ::error:: annotations on actual IndexNow failures (curl-exit / non-2xx) are retained — those are still real errors that should surface.
```shell
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
#   robots.txt disallows them and they're marked noindex anyway
```
This comment says legacy versioned docs are excluded because robots.txt disallows them and they are noindex, but the repository’s static/robots.txt currently allows all paths (except /cgi-bin/) and there’s no matching noindex rule found. Please update the comment to reflect the real reason for filtering (e.g., keeping submissions focused on preferred doc paths) so future maintainers aren’t misled.
```diff
-# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) since
-#   robots.txt disallows them and they're marked noindex anyway
+# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) to keep
+#   submissions focused on the preferred current doc paths
```
Fixed in 514e127: replaced the misleading "robots.txt disallows / noindex" rationale with the real one ("keep submissions focused on the preferred current doc paths"). The old comment was written assuming #832's nuanced robots.txt would land first, but that's a separate PR and this file has to stand on its own.
```diff
 HTTP_CODE=$(curl -sS -o /tmp/indexnow-resp.txt -w "%{http_code}" \
   -X POST "https://api.indexnow.org/indexnow" \
   -H "Content-Type: application/json" \
-  -d "{\"host\":\"keploy.io\",\"key\":\"$KEY\",\"urlList\":$JSON_URLS}" || echo "000")
+  -d "{\"host\":\"${INDEXNOW_HOST}\",\"key\":\"${INDEXNOW_KEY}\",\"keyLocation\":\"${INDEXNOW_KEY_LOC}\",\"urlList\":${JSON_URLS}}") || CURL_EXIT=$?
```
The response body is written to a fixed path (/tmp/indexnow-resp.txt) and never cleaned up. Using a unique temp file (e.g., mktemp) and a trap cleanup makes the step more robust and avoids any chance of collisions or stale data being read later in the script.
Fixed in 514e127: introduced RESPONSE_FILE=$(mktemp) at the top of the run script plus trap 'rm -f "$RESPONSE_FILE"' EXIT so every exit path (happy path, skip, curl failure, non-2xx) cleans up. Verified the trap fires on EXIT with a local dry-run. Also swapped the two call sites (curl -o and the non-2xx cat) to use $RESPONSE_FILE — no more hardcoded /tmp/indexnow-resp.txt.
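A minimal sketch of the mktemp-plus-trap pattern described in this fix. The subshell stands in for the workflow step's shell, so the trap's cleanup can be observed from outside:

```shell
# Unique temp file + EXIT trap: cleanup runs on every exit path (success,
# skip, failure), and parallel runs can't collide on a fixed /tmp path.
TMP_PATH=$(
  RESPONSE_FILE=$(mktemp)
  trap 'rm -f "$RESPONSE_FILE"' EXIT
  echo "indexnow response body" > "$RESPONSE_FILE"   # stand-in for the curl -o write
  echo "$RESPONSE_FILE"                              # report the path to the parent shell
)
# Once the subshell has exited, its EXIT trap has already deleted the file.
```

After the command substitution returns, `$TMP_PATH` names a file that no longer exists, which is exactly the guarantee the workflow step relies on.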
…ect filter rationale, mktemp+trap

Four fixes from the Copilot review on PR #833:

1. `|| true` on the URL extraction pipeline. Under `set -euo pipefail`, if the filter chain (`grep -oP | grep -v | grep -vE | sort -u`) produced no matches, grep's exit 1 would abort the step before the `[ -z "$URLS" ]` skip branch could run. A trailing `|| true` makes the empty-result path intentionally non-fatal, so the skip branch is now reachable.
2. `::warning::` → `::notice::` for the two non-error skip paths (missing sitemap, no URLs after filter). Per repo guidelines, the Actions UI shouldn't surface these as warnings; they're expected no-op conditions.
3. Corrected the filter comment. The previous wording claimed legacy versioned docs were excluded "because robots.txt disallows them and they're marked noindex anyway", but the current static/robots.txt on main doesn't have that disallow and there's no noindex rule for those paths. Replaced with the real reason: keeping submissions focused on the preferred current doc paths.
4. Replaced the fixed `/tmp/indexnow-resp.txt` path with a `mktemp`-generated RESPONSE_FILE and a `trap ... EXIT` cleanup, so repeat runs can't collide and stale data can't leak into later reads.

Dry-ran all three edge cases locally (all-filtered-out, normal URLs, missing sitemap) and verified the trap fires on EXIT.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```shell
JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))')
```
With set -euo pipefail, if jq is missing or fails to parse/build JSON_URLS, the step will exit immediately without the ::error:: annotation / troubleshooting output that you provide for curl/HTTP failures. Consider explicitly checking jq availability and/or handling a non-zero exit from the jq command to emit a ::error:: with a next step (e.g., install jq / verify sitemap format) before exiting non-zero.
```diff
-JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))')
+if ! command -v jq >/dev/null 2>&1; then
+  echo "::error::jq is required to build the IndexNow request payload. Install jq or use a runner image that includes jq, then rerun this workflow."
+  exit 1
+fi
+if ! JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))'); then
+  echo "::error::Failed to build the IndexNow JSON payload from sitemap URLs. Verify the sitemap-derived URL list is valid and rerun the workflow."
+  exit 1
+fi
```
Fixed in 8033df7: added explicit command -v jq check and wrapped the JSON_URLS=$(... jq ...) assignment in if ! ... then ... exit 1; fi so both failure modes (missing jq, jq parse error) emit ::error:: annotations with actionable next steps before exiting, instead of aborting silently under set -euo pipefail.
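A standalone sketch of that guard, assuming the same jq invocation as the step (the two sample URLs are illustrative):

```shell
# Guarding the payload build under `set -euo pipefail`: both failure modes
# (jq missing, jq failing) get an actionable ::error:: annotation instead
# of an unannotated abort.
set -euo pipefail

URLS='https://keploy.io/docs/
https://keploy.io/docs/running-keploy/'

if ! command -v jq >/dev/null 2>&1; then
  echo "::error::jq is required to build the IndexNow payload."
  exit 1
fi
if ! JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))'); then
  echo "::error::Failed to build the IndexNow JSON payload."
  exit 1
fi
URL_COUNT=$(printf '%s' "$JSON_URLS" | jq 'length')   # here: 2 URLs
```

The `if !` wrappers matter: under `set -e`, a bare failing assignment would kill the script before either `echo` could run.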
```shell
cat "$RESPONSE_FILE" 2>/dev/null || true
echo ""
echo "Troubleshooting:"
echo "  1. Verify key file is publicly reachable: curl -sS $INDEXNOW_KEY_LOC"
```
The troubleshooting line prints curl -sS $INDEXNOW_KEY_LOC without quoting. Quoting the URL in the echoed command makes the copy/paste reproducer more robust (and avoids surprises if the value ever contains shell-special characters).
```diff
-echo "  1. Verify key file is publicly reachable: curl -sS $INDEXNOW_KEY_LOC"
+echo "  1. Verify key file is publicly reachable: curl -sS \"$INDEXNOW_KEY_LOC\""
```
Fixed in 8033df7: quoted $INDEXNOW_KEY_LOC (and the IndexNow API URL on the next line) in the printed troubleshooting reproducer so a copy/paste from the CI log stays shell-safe even if the key or host ever contains a shell-special character.
Two Copilot review follow-ups:

1. Guard the payload build under `set -euo pipefail`: explicitly check `command -v jq` and the `jq` command's exit status, emitting a `::error::` annotation on either failure before `exit 1`. Previously a missing or broken jq would abort the step immediately without the actionable error message, leaving an unannotated "process exited 1" in the Actions UI.
2. Quote `$INDEXNOW_KEY_LOC` and the IndexNow API URL in the printed troubleshooting reproducer so a copy/paste of the line from the CI log remains shell-safe even if the key or host ever contains a shell-special character.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```diff
-echo "Failed to build JSON URL list, skipping"
-exit 0
+if ! JSON_URLS=$(printf '%s\n' "$URLS" | jq -R -s 'split("\n") | map(select(length > 0))'); then
+  echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Inspect the URLs above for unexpected characters and rerun the workflow."
```
This ::error:: message says “Inspect the URLs above…”, but the script never prints the URL list (only the count). On a jq failure, the log won’t actually contain the URLs to inspect. Either print a small sample of the parsed URLs before running jq (or dump them to the log on failure), or adjust the error text to point to the sitemap/path filtering logic instead of “URLs above”.
```diff
-echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Inspect the URLs above for unexpected characters and rerun the workflow."
+echo "::error::Failed to build the IndexNow JSON payload from the sitemap URL list. Review the sitemap contents and the /tags/ and versioned /docs/ filtering logic in this step for unexpected input, then rerun the workflow."
```
Fixed in 91c9c66: added a First 5 URLs: sample log immediately after Submitting $URL_COUNT docs URLs to IndexNow (printed via printf '%s\n' "$URLS" | head -5 | sed 's/^/ /') so every run — including jq failures — leaves actionable URL context in the Actions log. The jq-failure error message now references that sample ("See the 'First 5 URLs' sample logged above") instead of the non-existent "URLs above".
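That sample-logging pattern, sketched standalone with illustrative URLs:

```shell
# Log a small indented sample of the parsed URLs so a later jq failure
# leaves actionable context in the Actions log.
URLS='https://keploy.io/docs/a/
https://keploy.io/docs/b/
https://keploy.io/docs/c/
https://keploy.io/docs/d/
https://keploy.io/docs/e/
https://keploy.io/docs/f/'

echo "First 5 URLs:"
SAMPLE=$(printf '%s\n' "$URLS" | head -5 | sed 's/^/  /')   # indent each line by two spaces
echo "$SAMPLE"
```

With six input URLs, the sample holds exactly the first five, each prefixed with two spaces.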
```shell
# Filter to the current (unversioned) docs only:
# - drop auto-generated /tags/ pages (low-value for search indexing)
# - drop legacy versioned docs (/docs/1.0.0/, 2.0.0/, 3.0.0/) to keep
#   submissions focused on the preferred current doc paths
```
The comment says this filters to “current (unversioned) docs only”, but the actual filter only excludes versioned paths for majors 1–3; it will still include any versioned docs like /docs/4.0.0/ (and higher) if they appear in the sitemap. Please either (a) update the comment to match the actual intent (e.g., exclude only unmaintained versions 1–3), or (b) broaden the regex to exclude all versioned doc paths if you truly want unversioned-only URLs.
Fixed in 91c9c66 — went with option (b): broadened the regex from /docs/(1|2|3)\.[0-9]+\.[0-9]+/ to /docs/[0-9]+\.[0-9]+\.[0-9]+/, and updated the inline comment to say "drop any versioned doc path like /docs/N.M.P/ so only the current (unversioned) docs are submitted". This keeps the rule correct when a future v5/v6 archive lands alongside today's v1-v3 legacy versions without requiring a follow-up edit. Verified locally against four fixtures (/docs/about/, /docs/1.0.0/tags/foo/, /docs/4.0.0/running-keploy/, /docs/12.3.45/test/) — only the unversioned URL is kept.
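The broadened filter can be exercised locally against fixtures like those; only the unversioned URL should survive:

```shell
# Fixtures mirroring the ones described in the fix: one current docs page,
# one legacy tag page, and two versioned docs paths (current and future majors).
FIXTURES='https://keploy.io/docs/about/
https://keploy.io/docs/1.0.0/tags/foo/
https://keploy.io/docs/4.0.0/running-keploy/
https://keploy.io/docs/12.3.45/test/'

# Same filter chain as the step, with the broadened /docs/N.M.P/ regex.
KEPT=$(printf '%s\n' "$FIXTURES" \
  | grep -v '/tags/' \
  | grep -vE '/docs/[0-9]+\.[0-9]+\.[0-9]+/' \
  | sort -u || true)
```

Because the version pattern is `[0-9]+` in each position, a future `/docs/5.0.0/` or `/docs/12.3.45/` archive is dropped without any follow-up edit to the workflow.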
Two review follow-ups:

1. Broaden the versioned-docs filter from `(1|2|3)` to `[0-9]+` so the regex stays correct when a future v5/v6 archive lands alongside today's v1-v3 legacy versions. Verified against fixtures including /docs/1.0.0/, /docs/4.0.0/, /docs/12.3.45/ — all dropped.
2. Print the first 5 parsed URLs before the jq step so a jq parse failure leaves something actionable in the Actions log. Updated the jq-failure error message to reference the "First 5 URLs" sample instead of the non-existent "URLs above" artifact.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Summary
The existing `Submit docs URLs to IndexNow` step in `main.yml` has been silently broken since #827 landed, due to three functional bugs:

1. The hardcoded fallback key `${INDEXNOW_KEY:-9ded60b43b8b40578588fc42b04c3e5d}`. That key was deleted from `landing/public/` in keploy/landing#186 because Bing was rejecting it with `HTTP 403 UserForbiddedToAccessSite`. Every push-run of `main.yml` since "feat: combined SEO/GEO/AEO + IndexNow auto-submission + build fix" #827 has been hitting 403 in silence. Reproduced just now.
2. `continue-on-error: true` + no `::error::` annotation + no `exit 1` meant the failure was completely invisible. The step went green, no alerting.
3. The submission included auto-generated `/tags/` pages and 184 legacy `/docs/{1,2,3}.0.0/` pages (which `robots.txt` already disallows). Only ~206 are current-version docs worth submitting.

Changes

- Defaults `INDEXNOW_KEY` to `feeeddbe661a4686ba6e4d5cb663b2cb` — the key actually served from `keploy.io/<key>.txt`. The `secrets.INDEXNOW_KEY` override still takes priority for rotations.
- Adds `keyLocation` to the payload.
- Filters sitemap URLs: drops `/tags/` pages and legacy `/docs/{1,2,3}.0.0/` paths before submitting. Reduces submission from ~800 → ~206, all on currently-supported paths.
- `set -euo pipefail`, captures `CURL_EXIT` separately from `HTTP_CODE`, exits non-zero with `::error::` on real failure (curl exit != 0 or HTTP non-2xx), prints the response body plus a Troubleshooting block containing a `keyLocation` check and a ready-to-paste POST reproducer. `2xx` (including `202` accepted-pending-verification) is an explicit `exit 0`.
- Keeps `continue-on-error: true` so a failure never blocks the deploy, but `::error::` annotations now surface failures in the Actions UI instead of hiding them.

Also mirrors the hardening pattern already shipped in keploy/blog-website#366.
Test plan
- Parsed `docs/sitemap.xml` (after gunzip) → 206 current-version URLs, 0 tag pages, 0 legacy version pages.
- Real POST to `https://api.indexnow.org/indexnow` with the new key + `keyLocation` → HTTP 200.
- `INDEXNOW_KEY` secret already set on `keploy/docs` (separate, so Option A from the diagnosis is in effect right now — the next `main.yml` run should hit HTTP 200 even before this PR merges).
- After merge: `main.yml` logs `Submitted <N> URLs to IndexNow (HTTP 200)` instead of silently 403'ing.

🤖 Generated with Claude Code