[FE fix] Providers and models table improvements + Railway deploy diagnostics #4511

Closed: mmabrouk wants to merge 13 commits into release/v0.100.9 from ci-diag/4497-providers-models-table

@mmabrouk (Member) commented Jun 1, 2026

Combines #4497 with the Railway preview-deploy diagnostics from #4509 (cherry-picked: d97cf94 + 63136af).

Purpose: run #4497's Railway preview deploy with the improved diagnostics so a failure surfaces the real cause instead of a bare 'Process completed with exit code 1':

  • ERR traces with the failing command + file:line call stack
  • railway errors go to stderr instead of /dev/null
  • on a failed deploy, the key services' Railway logs are pulled into the job
  • setup/deploy logs persisted as artifacts + step summaries

Secrets are redacted from every diagnostic path.

Base PR: #4497 • Diagnostics PR: #4509

ardaerzin and others added 11 commits May 29, 2026 13:52
Failed Railway preview deploys reported only 'Process completed with exit
code 1'. The real cause was either swallowed by >/dev/null in the deploy
scripts or lived only in the Railway dashboard, which CI never surfaced.

- Add install_error_trap to bootstrap/configure/deploy-from-images so a
  failure prints the command, exit code, and a file:line call stack.
- railway_call now prints failures to stderr instead of stdout, so callers
  that send stdout to /dev/null still surface the underlying railway error.
- Add dump_railway_logs: on a failed deploy, pull the tail of key services'
  Railway logs (Postgres, alembic, api, ...) into the job.
- Persist setup/deploy output to a log file, upload it as an artifact, and
  write a step summary with the log tail on failure.
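
The stderr change in the second bullet can be illustrated with a minimal POSIX shell sketch. `call_sketch` is a hypothetical stand-in, not the PR's `railway_call`; it only shows why writing the failure to stderr keeps it visible when a caller discards stdout.

```sh
# Hypothetical stand-in for a CLI wrapper; not the PR's railway_call.
call_sketch() {
  sketch_status=0
  # Capture stdout and stderr of the wrapped command together.
  sketch_output=$("$@" 2>&1) || sketch_status=$?
  if [ "$sketch_status" -ne 0 ]; then
    # Failure details go to stderr, so `>/dev/null` in the caller cannot hide them.
    printf 'command failed (exit %s): %s\n' "$sketch_status" "$sketch_output" >&2
    return "$sketch_status"
  fi
  printf '%s\n' "$sketch_output"
}

# The caller discards stdout, yet the failure message still appears on stderr.
call_sketch sh -c 'echo "service not found"; exit 3' >/dev/null || echo "caller saw exit $?"
```

On success the captured output is passed through on stdout unchanged, so callers that parse it are unaffected.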

Diagnostics only; no deploy logic changes. The artifact-upload steps are
marked continue-on-error so they can never fail an otherwise-passing job.

(cherry picked from commit d97cf94)

Address CodeRabbit review. The ERR handler printed $BASH_COMMAND and
railway_call printed railway's output on failure. configure.sh passes real
secret values as CLI args to 'railway variable set' (POSTGRES_PASSWORD,
AGENTA_AUTH_KEY/CRYPT_KEY, *_API_KEY), so a failure could emit plaintext
secrets into the uploaded deploy-log artifact.

Add _railway_redact (masks KEY=value for PASSWORD/TOKEN/SECRET/KEY keys and
scheme://user:password@host) and apply it to every diagnostic path:
railway_call failure + rate-limit output, the ERR handler command, and
dump_railway_logs. The success path stays unredacted so callers that parse
'variable list -k' output (e.g. resolve_postgres_password) keep working.
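
A rough sketch of the two masking rules described above, written as a `sed` filter. `redact_sketch` and its exact patterns are assumptions for illustration; the PR's `_railway_redact` may match keys and URLs differently.

```sh
# Hypothetical sketch of the masking rules; not the PR's _railway_redact.
# Rule 1: NAME=value where NAME contains PASSWORD, TOKEN, SECRET or KEY.
# Rule 2: the password part of scheme://user:password@host.
redact_sketch() {
  sed -E \
    -e 's/([A-Za-z0-9_]*(PASSWORD|TOKEN|SECRET|KEY)[A-Za-z0-9_]*)=[^[:space:]]+/\1=***/g' \
    -e 's#([A-Za-z][A-Za-z0-9+.-]*://[^:/@[:space:]]+):[^@[:space:]]+@#\1:***@#g'
}

echo 'railway variable set POSTGRES_PASSWORD=hunter2 PORT=5432' | redact_sketch
echo 'connecting to postgres://agenta:s3cret@db:5432/app' | redact_sketch
```

A filter like this would be piped only onto failure and log-dump output, which matches the commit's note that the success path stays unredacted for callers that parse it.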

(cherry picked from commit 63136af)

@dosubot (bot) added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Jun 1, 2026
vercel Bot commented Jun 1, 2026

The latest updates on your projects.

Project: agenta-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Jun 1, 2026 7:38pm

coderabbitai Bot commented Jun 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

github-actions Bot (Contributor) commented Jun 1, 2026

Railway Preview Environment

Status: Destroyed (PR closed)
Updated at: 2026-06-02T08:45:42.617Z

railway_call previously retried only rate-limits. Railway's API also
intermittently times out write mutations (notably 'variable set'),
reproduced locally at ~20% even single-threaded and independent of env
size. With ~15 variable-set calls per deploy and no retry, a deploy
succeeded only ~3.5% of the time (0.8^15), i.e. it failed ~96% of the time.
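
As a quick check of that arithmetic: 0.8^15 is the chance that all 15 calls succeed, so the failure rate is its complement. The sketch below assumes the calls fail independently; the helper name and the 3-attempt case are illustrative assumptions, not values taken from the PR.

```sh
# deploy_success_rate P_FAIL CALLS ATTEMPTS
# Probability that every one of CALLS independent calls succeeds when each
# call fails with probability P_FAIL and is tried up to ATTEMPTS times.
deploy_success_rate() {
  awk -v p="$1" -v n="$2" -v a="$3" 'BEGIN { printf "%.3f\n", (1 - p ^ a) ^ n }'
}

deploy_success_rate 0.2 15 1   # no retry: prints 0.035, so ~96% of deploys fail
deploy_success_rate 0.2 15 3   # assumed 3 attempts per call: prints 0.886
```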

Retry policy:
- rate-limit (429): always retried (request was rejected, not processed)
- transient network/timeout: retried ONLY for idempotent commands; a
  timed-out create (init/add/environment new/volume add) may have
  succeeded server-side, so it is not blind-retried (avoids duplicate
  projects/services/volumes); rate-limit retries still apply to them
- deterministic errors (not found/unauthorized): fail fast, no retry
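
The policy above can be sketched as a small decision function. This is a hedged illustration: `should_retry`, `is_create_command`, and the error-class labels are hypothetical names, and treating every command outside the create list as idempotent is a simplifying assumption. How the PR's `railway_call` actually classifies errors is not shown in this conversation.

```sh
# Hypothetical helpers; names and error-class labels are illustrative only.
# Creates may have succeeded server-side after a timeout, so they are not
# blindly retried on transient errors.
is_create_command() {
  case "$1" in
    init|init\ *|add|add\ *|"environment new"*|"volume add"*) return 0 ;;
    *) return 1 ;;
  esac
}

# should_retry ERROR_CLASS COMMAND  -> exit 0 means "retry"
should_retry() {
  case "$1" in
    rate_limit) return 0 ;;   # request was rejected, never processed
    transient)                # network error or timeout
      if is_create_command "$2"; then return 1; else return 0; fi ;;
    *) return 1 ;;            # deterministic errors fail fast
  esac
}

for cmd in "variable set FOO=bar" "volume add"; do
  if should_retry transient "$cmd"; then echo "retry: $cmd"; else echo "no retry: $cmd"; fi
done
```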

Failure output stays redacted. Covered by unit tests.

(cherry picked from commit 060441a)

dump_railway_logs ran after the 'deploy-from-images.sh | tee $log_file'
pipeline, so its Railway service-log tails only reached the live Actions
log, not the uploaded artifact or the step-summary tail. If the live log
truncates, the root-cause dump was lost again. Tee the dump into $log_file
so it lands in the artifact and the summary. (Addresses CodeRabbit review.)
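
A minimal sketch of the fix described above, with placeholder functions standing in for the deploy script and the log dump (the real script names and log path are not reproduced here):

```sh
# Hypothetical stand-ins; the PR's real functions and paths are not shown here.
log_file=$(mktemp)
deploy_sketch() { echo "deploy output"; }
dump_logs_sketch() { echo "tail of service logs"; }

# Deploy output goes to both the live log and the file.
deploy_sketch | tee "$log_file"
# Appending the dump with `tee -a` puts it in the same file, so an artifact
# built from $log_file also contains the root-cause logs.
dump_logs_sketch | tee -a "$log_file"
```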

(cherry picked from commit da990a8)
Labels: ci/cd, Frontend, size:L (this PR changes 100-499 lines, ignoring generated files)

3 participants