Make `make up` work end-to-end + real success-rate canary #7
- Self-contained demo workload (Express + prom-client) emitting
http_requests_total{service,code,method,path} as the canary subject
(no private buyerchat dependency), with a ServiceMonitor.
- Rewrite bootstrap.sh in dependency order with kubectl-wait gates
(tigera-operator -> Calico -> namespaces -> sealed-secrets -> ArgoCD
-> app-of-apps), and unify the workload + SealedSecret namespace.
- AnalysisTemplate now computes a real 2xx success-rate ratio from
Prometheus instead of an up{} liveness check.
- README honesty: ~12-15 min setup (not 10), single-node cluster (not 3
  nodes); drop the 'dashboards pre-imported' claim. Add docs/CLAIM_AUDIT.md.
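The success-rate ratio that the AnalysisTemplate asks Prometheus for can be sketched without a cluster. The JavaScript below is a hypothetical, dependency-free model, not code from this PR: `LabelledCounter` stands in for a prom-client counter carrying the `service`, `code`, `method`, and `path` labels, and `successRate` divides the per-second increase of the 2xx series by that of all series over a window. Unlike PromQL `rate()`, it does not extrapolate to the window edges, and the 0.95 threshold is an assumed value, not the one in the chart.

```javascript
// Hypothetical, dependency-free model of the canary metric and its analysis.
// The real demo uses prom-client; this only illustrates the arithmetic behind
//   sum(rate(http_requests_total{code=~"2.."}[2m]))
//     / sum(rate(http_requests_total[2m]))

// A counter keyed by label values, standing in for a prom-client Counter.
class LabelledCounter {
  constructor(labelNames) {
    this.labelNames = labelNames;
    this.series = new Map(); // JSON of label values -> { labels, value }
  }

  inc(labels, by = 1) {
    const picked = {};
    for (const name of this.labelNames) picked[name] = String(labels[name] ?? "");
    const key = JSON.stringify(this.labelNames.map((name) => picked[name]));
    const entry = this.series.get(key);
    if (entry) entry.value += by;
    else this.series.set(key, { labels: picked, value: by });
  }

  // Copy the current values so later inc() calls don't change the snapshot.
  snapshot() {
    return new Map(
      [...this.series].map(([key, e]) => [key, { labels: e.labels, value: e.value }])
    );
  }
}

// Counter increase with reset handling: a drop means the process restarted,
// so the post-restart value is taken as the increase (as PromQL does).
function increase(before, after) {
  return after >= before ? after - before : after;
}

// Ratio of 2xx request rate to total request rate between two snapshots.
// Series missing from `start` are assumed to have started at 0.
// Returns NaN when there was no traffic, like 0/0 in PromQL.
function successRate(start, end, windowSeconds) {
  let ok = 0;
  let total = 0;
  for (const [key, { labels, value }] of end) {
    const perSecond = increase(start.get(key)?.value ?? 0, value) / windowSeconds;
    total += perSecond;
    // PromQL regex matchers are fully anchored, so code=~"2.." means ^2..$.
    if (/^2..$/.test(labels.code)) ok += perSecond;
  }
  return total === 0 ? NaN : ok / total;
}

// Usage: 97 successes and 3 server errors in a 120 s (2m) window.
const requests = new LabelledCounter(["service", "code", "method", "path"]);
const t0 = requests.snapshot();
for (let i = 0; i < 97; i++) requests.inc({ service: "demo", code: 200, method: "GET", path: "/" });
for (let i = 0; i < 3; i++) requests.inc({ service: "demo", code: 500, method: "GET", path: "/" });
const t1 = requests.snapshot();

const ratio = successRate(t0, t1, 120);
const passes = ratio >= 0.95; // assumed threshold, not taken from the chart
console.log(ratio.toFixed(2), passes); // 0.97 true
```

The window length cancels out of the ratio, which is why the query divides two `rate()` sums rather than comparing raw counts against a fixed number: the result stays in [0, 1] regardless of traffic volume.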
Static-verified (helm lint/template, demo unit tests). Live `make up`
verification is a separate gated step.
- sealed-secrets: the bitnami-labs.github.io/sealed-secrets Helm index 404s; install the controller from the upstream release manifest instead.
- kube-prometheus-stack: Grafana can take >300s to become Available on a resource-constrained cluster; raise the helm --wait timeout to 600s.
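The kube-prometheus-stack fix above is a wait-gate timeout problem, the same pattern the rewritten bootstrap.sh uses with `kubectl wait` to order its steps. The real script is shell; the JavaScript below is only a hypothetical model of that contract (block until a condition holds, or fail at a deadline, as `kubectl wait --timeout` and `helm --wait --timeout` do). `waitFor`, `runInOrder`, and `simulatedStep` are illustrative names, and the millisecond timings are scaled-down stand-ins for the 300s/600s helm timeouts. It shows the failure mode behind the Grafana fix: a gate shorter than a component's startup time fails even though nothing is broken.

```javascript
// Hypothetical model of ordered, wait-gated bootstrap steps (not bootstrap.sh).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll a readiness check until it passes or the deadline expires.
async function waitFor(isReady, { timeoutMs, intervalMs }) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await isReady()) return;
    if (Date.now() >= deadline) throw new Error(`timed out after ${timeoutMs} ms`);
    await sleep(intervalMs);
  }
}

// Apply each step, then block on its gate before starting the next one,
// so e.g. the CNI is Ready before anything that needs pod networking.
async function runInOrder(steps) {
  const completed = [];
  for (const step of steps) {
    await step.apply();
    await waitFor(step.isReady, step); // reads step.timeoutMs / step.intervalMs
    completed.push(step.name);
  }
  return completed;
}

// A fake component that reports Ready `startupMs` after it is applied.
function simulatedStep(name, startupMs, timeoutMs) {
  let readyAt = Infinity;
  return {
    name,
    timeoutMs,
    intervalMs: 5,
    apply: async () => { readyAt = Date.now() + startupMs; },
    isReady: async () => Date.now() >= readyAt,
  };
}

// Usage: the same slow component fails behind a short gate and passes behind a longer one.
runInOrder([simulatedStep("calico", 10, 1000), simulatedStep("grafana", 300, 50)])
  .catch((err) => console.log("short gate:", err.message));
runInOrder([simulatedStep("calico", 10, 1000), simulatedStep("grafana", 300, 1000)])
  .then((done) => console.log("long gate:", done.join(" -> ")));
```

Gating each step on readiness, rather than sleeping a fixed amount, keeps the fast path fast while still tolerating slow starts; the only tuning knob left is the timeout, which is exactly what the 600s change adjusts.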
Live verify: partial (real bugs fixed, full canary blocked by local RAM)
Ran
The full canary demo (10→100 with success-rate analysis) is blocked by host resources, not code. Docker Desktop here is allocated 3.5 GB, and below ~4 GB the controllers (argo-rollouts, kps-operator, tigera-operator, argocd-repo-server) lose their API watches and crash-loop (clean exit 0, not OOM), so the Rollout controller can't drive the canary. The ~6 GB floor is now documented in the README. To finish: bump Docker Desktop memory to ≥6 GB, restart, and re-run
…localtest.me hosts
What changed
- `/metrics` demo workload (`apps/demo`, Express + prom-client) emitting `http_requests_total{service,code,method,path}`, the canary subject. No private buyerchat dependency; anyone who clones the repo can run it. Ships with a ServiceMonitor.
- `bootstrap.sh` with `kubectl wait` gates: kind → tigera-operator → Calico → nodes Ready → namespaces → sealed-secrets → SealedSecrets → ArgoCD → app-of-apps → Synced. Namespace mismatch (workload vs SealedSecret) unified.
- `helm/demo` AnalysisTemplate computes `sum(rate(http_requests_total{code=~"2.."}[2m])) / sum(rate(http_requests_total[2m]))` from Prometheus, not the old `up{}` liveness check.
- `docs/CLAIM_AUDIT.md` added.

Static verification (all green)
- `node --check` OK, unit tests 2/2 pass
- `helm lint helm/demo` → 0 failed
- `helm template helm/demo -f values.dev.yaml` → renders Rollout + AnalysisTemplate (real query) + Service + ServiceMonitor; default values → Deployment (mutual exclusion correct)

⏳ LIVE VERIFY PENDING: `make up` has not yet been run on a cluster. A follow-up will attach the live `kubectl get pods -A` output and a canary-rollout screenshot (and correct any ingress hostnames to what actually resolves) before merge.