Make make up work end-to-end + real success-rate canary #7

Merged: ykstorm merged 4 commits into main from fix/real-e2e on Jun 22, 2026

ykstorm (Owner) commented Jun 19, 2026

What changed

  • Self-contained /metrics demo workload (apps/demo, Express + prom-client) emitting http_requests_total{service,code,method,path}, which is the canary subject. No private buyerchat dependency; anyone who clones the repo can run it. Includes a ServiceMonitor.
  • Ordered bootstrap.sh with kubectl wait gates: kind → tigera-operator → Calico → nodes Ready → namespaces → sealed-secrets → SealedSecrets → ArgoCD → app-of-apps → Synced. The workload and SealedSecret namespaces, which previously did not match, are now unified.
  • Real canary analysis: the helm/demo AnalysisTemplate computes sum(rate(http_requests_total{code=~"2.."}[2m]))/sum(rate(http_requests_total[2m])) from Prometheus, replacing the old up{} liveness check.
  • README honesty: ~12–15 min (not 10), single-node (not 3-node), and the 'dashboards pre-imported' claim is dropped. Added docs/CLAIM_AUDIT.md.
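
For reviewers who want to sanity-check the success-rate ratio by hand, here is a minimal sketch that evaluates the same expression through the Prometheus HTTP API. Only the PromQL expression comes from this PR; the function names and the PROM_URL default (a local port-forward on 9090) are assumptions for illustration.

```shell
# Hypothetical helpers; only the PromQL expression is taken from the PR text.

# Print the 2xx success-rate expression for a given rate window (default 2m).
success_rate_query() {
  window="${1:-2m}"
  printf 'sum(rate(http_requests_total{code=~"2.."}[%s]))/sum(rate(http_requests_total[%s]))' \
    "$window" "$window"
}

# Run an instant query against Prometheus and print the raw JSON response.
# PROM_URL is an assumed port-forward target, not a value from the repo.
query_success_rate() {
  curl -sfG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$(success_rate_query "$@")"
}
```

With Prometheus reachable, query_success_rate prints an instant-vector response whose sample value is the ratio. If the demo has served no traffic inside the window, the result is likely empty or NaN rather than a number, so the analysis needs requests flowing before it can pass.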

Static verification (all green)

  • demo app: node --check OK, unit tests 2/2 pass
  • helm lint helm/demo → 0 failed
  • helm template helm/demo -f values.dev.yaml → renders Rollout + AnalysisTemplate (real query) + Service + ServiceMonitor; default values → Deployment (mutual exclusion correct)
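
The mutual-exclusion check on the rendered chart can be scripted rather than eyeballed. The sketch below is one way to do it, assuming each rendered manifest puts its kind on a top-level "kind: X" line; check_exclusive and has_kind are hypothetical helper names, not part of the repo.

```shell
# Succeed if the manifest stream on stdin has a top-level "kind: <name>" line.
has_kind() {
  grep -q "^kind: $1\$"
}

# Verify that a rendered manifest contains one kind and not the other.
# $1 = kind that must be present, $2 = kind that must be absent.
check_exclusive() {
  rendered="$(cat)"
  printf '%s\n' "$rendered" | has_kind "$1" || { echo "missing $1" >&2; return 1; }
  if printf '%s\n' "$rendered" | has_kind "$2"; then
    echo "unexpected $2" >&2
    return 1
  fi
}

# Intended usage, mirroring the two renders described above:
#   helm template helm/demo -f values.dev.yaml | check_exclusive Rollout Deployment
#   helm template helm/demo                    | check_exclusive Deployment Rollout
```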

LIVE VERIFY PENDING: make up has not yet been run on a cluster. A follow-up will attach the live kubectl get pods -A output and a canary-rollout screenshot (and correct any ingress hostnames to what actually resolves) before merge.

- Self-contained demo workload (Express + prom-client) emitting
  http_requests_total{service,code,method,path} as the canary subject
  (no private buyerchat dependency), with a ServiceMonitor.
- Rewrite bootstrap.sh in dependency order with kubectl-wait gates
  (tigera-operator -> Calico -> namespaces -> sealed-secrets -> ArgoCD
  -> app-of-apps), and unify the workload + SealedSecret namespace.
- AnalysisTemplate now computes a real 2xx success-rate ratio from
  Prometheus instead of an up{} liveness check.
- README honesty: ~12-15 min (not 10), single-node (not 3), drop the
  'dashboards pre-imported' claim. Add docs/CLAIM_AUDIT.md.

Static-verified (helm lint/template, demo unit tests). Live make-up
verification is a separate gated step.
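
The "install, then wait on a gate" pattern behind the reordered bootstrap can be sketched as follows. This is not the repo's bootstrap.sh: gated_step is a hypothetical helper, and the manifest name, namespace, and timeouts in the commented usage are placeholders.

```shell
# Run an install command, then block on its readiness gate.
# $1 = label, $2 = install command, $3 = gate command (both run via sh -c).
# Returns non-zero as soon as either the install or the gate fails.
gated_step() {
  echo "==> $1"
  sh -c "$2" || { echo "step failed: $1" >&2; return 1; }
  sh -c "$3" || { echo "gate failed or timed out: $1" >&2; return 1; }
}

# Chaining with && preserves dependency order and stops at the first failure.
# Placeholder usage (file name, namespace and timeouts are assumptions):
#   gated_step "nodes" "true" \
#     "kubectl wait --for=condition=Ready nodes --all --timeout=300s" &&
#   gated_step "argocd" "kubectl apply -n argocd -f argocd-install.yaml" \
#     "kubectl wait -n argocd --for=condition=Available deployment --all --timeout=300s"
```

The point of the gate is that a later step never starts against a component that is installed but not yet ready, which is the failure mode an unordered script tends to hit.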

ykstorm added 2 commits June 21, 2026 03:34
- sealed-secrets: the bitnami-labs.github.io/sealed-secrets Helm index
  404s; install the controller from the upstream release manifest instead.
- kube-prometheus-stack: grafana can take >300s to become Available on a
  resource-constrained cluster; raise the helm --wait timeout to 600s.
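
A rough sketch of what these two fixes might look like as shell functions. The release name (kps), namespace (monitoring), chart repo alias, and the controller.yaml asset name are assumptions; the sealed-secrets release tag is deliberately a required argument, because this PR does not state which release is pinned.

```shell
# Install kube-prometheus-stack and wait up to 600s for resources to be ready.
# Release name, namespace and repo alias are assumed, not taken from the repo.
install_kps() {
  "${HELM:-helm}" upgrade --install kps prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace \
    --wait --timeout "${KPS_TIMEOUT:-600s}"
}

# Install the sealed-secrets controller from an upstream release manifest
# instead of the Helm index. $1 = release tag (required, no default).
install_sealed_secrets() {
  [ -n "${1:-}" ] || { echo "usage: install_sealed_secrets <release-tag>" >&2; return 2; }
  "${KUBECTL:-kubectl}" apply -f \
    "https://github.com/bitnami-labs/sealed-secrets/releases/download/$1/controller.yaml"
}
```

Making the timeout overridable through KPS_TIMEOUT leaves room for slower machines without another edit to the script.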

ykstorm (Owner, Author) commented Jun 20, 2026

Live verify — partial (real bugs fixed, full canary blocked by local RAM)

Ran make up against local Docker (kind, single node). It got the cluster up — node Ready, Calico, sealed-secrets, ingress-nginx, cert-manager, kube-prometheus-stack (Prometheus healthy), ArgoCD, and the demo Rollout all installed — and surfaced two real bugs that static review missed, now fixed in this PR:

  1. sealed-secrets: the bitnami-labs.github.io/sealed-secrets Helm index returns 404; switched to installing the controller from the upstream release manifest.
  2. kube-prometheus-stack: grafana can take >300s to become Available; raised the helm --wait timeout to 600s.

Full canary money-shot (10→100 with success-rate analysis) is blocked by host resources, not code: Docker Desktop here is allocated 3.5 GB, and below ~4 GB the controllers (argo-rollouts, kps-operator, tigera-operator, argocd-repo-server) lose their API watches and crash-loop (clean exit 0, not OOM) — so the Rollout controller can't drive the canary. Documented the ~6 GB floor in the README.

To finish: bump Docker Desktop memory to ≥6 GB + restart, re-run make up, then capture the canary progression + Prometheus success-rate. Holding the merge until that lands.
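
Since the failure mode here is a quiet crash-loop rather than an OOM kill, a pre-flight memory check before make up may save a wasted run. The sketch below is an assumption-laden example, not something in this PR: mem_ok and preflight_memory are hypothetical names, and the threshold is compared in decimal GB, which leaves some slack because the daemon typically reports a little less than the configured allocation.

```shell
# Succeed if $1 (bytes) is at least $2 decimal GB. awk avoids 32-bit overflow.
mem_ok() {
  awk -v bytes="$1" -v gb="$2" 'BEGIN { exit !(bytes >= gb * 1000000000) }'
}

# Ask the Docker daemon for its total memory and compare it with the floor.
# MIN_GB defaults to the ~6 GB floor mentioned above.
preflight_memory() {
  total="$(docker info --format '{{.MemTotal}}')" || return 1
  if mem_ok "$total" "${MIN_GB:-6}"; then
    echo "memory OK: $total bytes"
  else
    echo "Docker reports $total bytes; need at least ${MIN_GB:-6} GB" >&2
    return 1
  fi
}
```

Calling preflight_memory at the top of the bootstrap would turn the 3.5 GB case into an immediate, readable error.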

ykstorm merged commit 4abeab8 into main on Jun 22, 2026 (4 checks passed)
ykstorm deleted the fix/real-e2e branch on June 22, 2026 23:24