docs: sync all doc areas with engineering-specs as-built behavior #220

Draft
lucaiz wants to merge 4 commits into develop from docs/spec-sync-audit

lucaiz (Contributor) commented Jul 3, 2026

Full audit of docs.sleakops.com content against the as-built feature specs in engineering-specs/features/ (10 doc areas, EN + ES kept equivalent). Every fix is grounded in a spec claim or verified code behavior; facts that are real but internal-only were deliberately left out — listed at the bottom so the judgment calls are reviewable.

109 files changed. Highlights per area:

Cluster

  • cluster/index.mdx: removed leaked AI-assistant chatter accidentally pasted into a live FAQ; rewrote the stale "how upgrades work" FAQ to point at the Upgrades panel; fixed the ES table, which wrongly listed NodePool fields (Max Memory/CPU) as Cluster-creation fields.
  • addons/index.mdx: documented custom Helm-chart addons ("Add custom addon"), Export/Import (JSON + import-from-cluster), uninstall semantics, automatic dependency install; added KEDA + Headlamp to the optional list; fixed stale "Deploy" button wording → "Install".
  • addons/loki.mdx: deprecation warning (loki is deprecated, lokiv2 is the successor).
  • nodepools/*: added edit/delete guardrails (capacity edits can rotate nodes and are rejected below current usage; can't delete internal/build/last pool or one assigned to a Project Environment; no create/edit while provisioning or shut down).
  • shutdown-cluster.mdx: manual Stop/Play requires Scheduled Shutdown to be configured first; node pools restored as-is on power-on.
  • Deleted es/cluster/addons.mdx — orphaned legacy duplicate of es/cluster/addons/index.mdx (same slug, stale content, unmaintained since the original i18n commit).

Domain

  • Fixed the wrong "Alias Scenario A" claim (alias under a managed parent reuses the certificate automatically — no validation records; the DNS routing record remains manual) across index/delegation/setup.
  • Added CAA-failure and alias-validation troubleshooting entries; new FAQs (immutable names, what's deletable, "Cert. Pending" quirk, pending-deployment approval after cert issuance); fixed EN-only heading/numbering copy-paste bugs in setup.mdx.

Environment

  • Fixed the false "environments can't be edited" FAQ (domain can be changed); documented four previously-missing features: Clone, Change Domain, Validation Checklist download, Export; documented root-environment creation via the provider base domain.

Project (core / workloads / Dockertron)

  • Kaniko → BuildKit corrections; fixed the wrong "only Branch/Dockerfile Path are editable" FAQ; added the Method (Docker/Buildpacks) field; trimmed a section duplicating access_config.mdx.
  • access_config.mdx: 10-extra-policies cap; changes require Created/Error state.
  • volumes.mdx: filesystem selection step, 1–1000 GB range, detach-on-next-deploy note.
  • Workload pages: removed a duplicated FAQ (job), added Jobs-can't-be-edited/rerun-only + build-gating FAQs; fixed several ES copy-paste/translation defects ("el servicio" on non-service pages, untranslated closing lines, "Projecto"/"ProjectEnv" → "Proyecto").
  • project/dockertron.mdx + powered-ai/dockertron.mdx: fixed the nonexistent "Projects > Configuration > Dockertron" menu claim (real entry points: "Dockertron IA" header button / empty-Projects prompt); PR delivery is conditional (flag defaults off) — file viewer is the always-available output; no in-place retry; removed a stale "Document version 1.0" footer.

Project (build / chart)

  • build/build.mdx: fixed wrong defaults (Branch defaults to the Project's configured branch, not the environment name; Tag defaults to the commit hash, plus latest); added Cache and Deploy? controls; documented the CLI's client-side 180-min --wait cutoff as independent of the server-side timeout.
  • chart/*: documented the async helm template validation flow (background validation → Error state + notification → auto-recovery), Deploy toggles, editing/removing dependencies, and the namespace-matching rule for extra templates; removed an unverifiable roadmap promise.
  • Deleted es/project/build/index.mdx — stray stale duplicate of build.mdx (no EN counterpart, same sidebar position, Kaniko-era content repeating the exact wrong defaults fixed above).

Dependencies

Worst rot found in the audit — copy-paste between engine pages:

  • opensearch-aws.mdx had SQS's entire config table; memcached-aws.mdx had Redis text and the wrong default port (11121 → 11211); sqs-aws.mdx (ES) had OpenSearch fields appended and a Multi-AZ FAQ that doesn't apply.
  • Fixed false claims on the RDS pages: engine version upgrades ARE supported (major ones via the pending-changes confirmation), replicas and storage ARE editable; added missing fields across engines (Parameter Group Mode, PITR, Deletion Protection, create-from-snapshot, DB Name) and engine-specific caveats (Oracle: no snapshot restore).
  • s3bucket-aws.mdx: added Versioning / Intelligent-Tiering / KMS fields, live name-availability check, and a new "Importing existing data" section (Import Bucket / DataSync) — previously undocumented.
  • index.mdx: catalog was missing Aurora, MariaDB, Oracle, MSK, DocumentDB; guide list was missing 5 links; removed ~60 ES-only lines of unverifiable invented claims (automatic credential rotation, TLS-by-default).
  • faqs.mdx: documented the pending-changes (awaiting approval) confirm/revert flow and dependency cloning — both core flows with zero prior docs.

Deployment / Var Groups

  • Removed the false "Var Group deletion forces a deployment" claim; documented deployment cancellation and failed-deploy log access; corrected Release triggers + documented the minor/patch version-bump rule.
  • vargroup/index.mdx: delete-modal deploy switch, replicated-environments warning, Dependency-owned groups not directly deletable, reveal-vs-write permissions, one-dedicated-vargroup-per-workload, same-cluster Replicate To constraint.

Provider / Network / User

  • provider/*: org reuse on re-onboarding; Security account hidden from listings; two missing error cases in common-errors (account creation still processing / account suspended-closed); Initial-state providers deletable; deletion blocked by dependency deletion protection.
  • network/index.mdx: added per-environment CIDR table (10.120/10.110/10.130); corrected peering to the real hub-and-spoke topology (Management ↔ Dev, Management ↔ Prod — Dev and Prod are NOT peered); dropped an unverified Transit Gateway claim; VPN described accurately (provisioned per account alongside its first cluster).
  • user/index.mdx: no password field on the create form (set-password email instead); replaced an invented "users managed outside SleakOps" section with the real AWS/VPN-only member capability; documented Reset AWS Password / Get Pritunl Credentials, immutable fields, kubeconfig regeneration on role change.
  • user/vpn.mdx: removed the false "24-hour URI validity" claim (code investigation: no TTL exists — profile URI is fetched fresh per request); replaced with credential-handling caution.
  • es/user/aws_console_authentication.mdx: body was entirely untranslated English — translated.
  • es/cluster/addons/otel.mdx: same — translated.
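
The corrected peering topology for network/index.mdx can be sketched as a small reachability check. This is only an illustration: the network labels and the `is_peered` helper are hypothetical, and the sketch relies on VPC peering being non-transitive, so Dev and Prod cannot reach each other through Management.

```python
# Hub-and-spoke peering as described for network/index.mdx:
# Management <-> Dev and Management <-> Prod; Dev and Prod are NOT peered.
# The names below are illustrative labels, not SleakOps identifiers.
PEERINGS = {
    frozenset({"management", "dev"}),
    frozenset({"management", "prod"}),
}


def is_peered(a: str, b: str) -> bool:
    """Return True only if a direct peering exists between the two networks.

    VPC peering is non-transitive, so only direct links count.
    """
    return frozenset({a, b}) in PEERINGS


if __name__ == "__main__":
    for pair in [("management", "dev"), ("management", "prod"), ("dev", "prod")]:
        print(pair, is_peered(*pair))
```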

Upgrades / Subscribe / Powered-AI

  • upgrades.mdx: documented the type-specific Upgrade tab for EKS Cluster Upgrades (target version + support window, downtime report, changelog, readiness-report CTA) that landed in the spec after the May alignment (docs(upgrades): align Upgrades page with feature spec #186).
  • subscribe-using-aws.mdx: sidebar path is Settings > Billing > Subscription; button label fixed to the code-verified "Link AWS subscription with this account"; linking replaces the current subscription; documented that the new subscription requires SleakOps-side activation (no automatic path).
  • powered-ai/autodiagnostic.mdx: corrected trigger list (Clusters/Services/Dependencies/Deployments/Builds — "Projects" doesn't exist); new FAQ scoping the Kubernetes-upgrade readiness check to EKS Cluster Upgrade migrations.
  • powered-ai/index.mdx: Dockertron card no longer overclaims "fully deployed application".

Deliberately left out (internal-only, per-spec facts with no customer surface)

  • FSM states/transitions, Celery task & queue names, Pulumi module internals, canvas/callback wiring — across all areas.
  • Feature-flag / subscription-quota gating mechanics (HasAccessFeatureBySubscription, clone.environment, DOCKERTRON_PR_ENABLED, autodiagnostic flag names) — plan-dependent entitlement plumbing; docs state role restrictions in plain language only.
  • Admin/Django-admin-only workflows (VPN regeneration, fixture reloads, CSV exports, Grafana password recovery action).
  • Known internal bugs tracked in the specs' improvement reports where the customer instruction doesn't change (Volumes breadcrumb 404, extra_values toast, environment domain-rename task gap, dockertron external_id quirk).
  • Unshipped / in-flight spec content (essential-addons conversion SLEAK-6093, addon domain scoping SLEAK-4002 in-flight, NodePool override-YAML proposal, sleakops.com/managed label draft).
  • Undiscoverable-by-design surfaces (/deployments/add unlinked route, restore-status endpoint with no UI).
  • Bitnami image scan, EC2 quota auto-requests, internal notification/paging behavior.

Flagged for follow-up (not fixed here)

  • Stale screenshots: attach-to-sleakops.png (old button text) and subscription-menu.png (pre-Billing-group sidebar) need retakes; several new sections carry TODO: screenshot markers.
  • Dockertron page overlap: project/dockertron.mdx vs powered-ai/dockertron.mdx still overlap in scope (factual contradictions fixed; consolidation is an IA decision).
  • No spec exists for the Infrastructure Chat (powered-ai/conversation.mdx) — page left as-is; chatbot has no as-built spec in engineering-specs yet.
  • DocumentDB & RabbitMQ have real config surfaces but no dedicated dependency guide pages.
  • Application Catalog / presets (/projects/presets/*) is substantial and completely undocumented (deliberately not squeezed into the Dockertron page since it's not an AI feature).
  • content/tutorials/*/networking-vpc.mdx repeats the removed Transit Gateway claim (tutorials were out of scope for this pass).
  • build_resources.mdx frontmatter sidebar_label still says "Deploy Build Resources" vs the real console card "Deploy and Build Resources" (body fixed; renaming sidebar labels was out of scope).
  • Readiness-report prompt (AUTODIAGNOSTIC_UNMANAGED_MANIFESTS_READINESS) may still be pending in Langfuse — the new FAQ documents intended behavior; worth confirming it's operational.

🤖 Generated with Claude Code

Full EN+ES audit of the 10 documented areas (cluster, domain,
environment, project, provider, network, user, powered-ai,
subscribe-using-aws, upgrades) against the as-built feature specs in
engineering-specs/features/. Fixes stale claims (wrong defaults,
removed features described as current, copy-paste rot between
dependency engine pages), documents previously-missing customer-facing
features (environment clone/export, addon import-export, pending-changes
flow, S3 import, EKS upgrade drawer tab), removes two orphaned stale ES
duplicate pages, and translates two ES pages that were still in English.

Internal-only spec facts (FSM states, Celery/Pulumi internals,
feature-flag plumbing, admin-only workflows) are deliberately excluded;
the PR description lists each area's exclusions for review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai Bot commented Jul 3, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

lucaiz and others added 2 commits July 3, 2026 15:47
…arity

Review findings on #220: the dependency spec lists restore_database for
aurora-postgresql too, so the same FAQ added to postgresql-aws.mdx now
also exists on the Aurora page (EN+ES); the ES Validation Checklist
description was missing "dependencias" vs the EN list.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Six independent verifiers re-checked the full PR diff against the specs,
fixtures, and code (~180 claims). Fixes:

- user/vpn: restore the 24-hour URI validity — the TTL is real (Pritunl
  key-link Mongo TTL index, 86400s default in the pinned version); the
  earlier removal was wrong.
- vargroup/deployment: deletion always redeploys the projects that use
  the group (backend ignores the modal's deploy switch on destroy);
  restore the forced-deploy bullet and reword the delete FAQ. Reveal is
  account-scoped; mount path unique per project.
- chart_dependencies: Edit updates values only (version is immutable in
  the drawer); last-dependency removal skips validation.
- domain: alias certificate reuse requires the parent in the same
  account; deploy-toggle wording no longer promises a queued approval.
- environment: domain change doesn't provision DNS zone/certificate on
  rename paths; provider (not account) base domain makes an env root;
  Change Domain is an edit button, clone destination "even" not "only"
  a different cluster/account.
- powered-ai/dockertron: real empty-state prompt text vs creation-form
  banner; entry points land on the hub (no auto-open drawer); PR flag is
  platform-global.
- volumes: deletion triggers an immediate redeploy by default.
- project: Buildpacks builder image isn't auto-detected; arch change
  only queues a deployment if some exist; autodiagnostic covers Workers.
- dependency: S3 KMS uses the AWS-managed aws/s3 key (none is created);
  DataSync role is customer-created via the CFN quick-create link; MSK
  client auth optional (defaults unauthorized, lowercase values);
  restore targets are PostgreSQL-family only; PITR/snapshot mutual
  exclusion on mysql/postgres too.
- cluster: nodepool delete guard is the last non-internal pool; LokiV2
  is self-service; provider suspended-account remedy is AWS Support.
- style: ES voseo/tuteo/usted register normalized in new content; ES
  mistranslation "por vos" fixed; stray ES-only sentence removed.

Full Docusaurus build green (onBrokenLinks: throw).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

lucaiz (Contributor, Author) commented Jul 5, 2026

Adversarial re-validation (second pass, different model): 6 independent verifiers re-checked the entire PR diff against the specs, fixtures, and code (~180 claims verified). Result: ~12 confirmed errors, all fixed in 04c5f51d1.

The two most important findings (reversals of the first pass):

  • user/vpn: the 24-hour validity of the profile URI is real. It lives in Pritunl's code (Mongo TTL index users_key_link, default 86400 s in the pinned version v1.32.4400.99), not in core. The first pass removed it because it found no TTL in our code; restored.
  • vargroup / more_on_deployment: deleting a Variable Group always redeploys the projects that use it (the backend ignores the modal's deploy switch on destroy). The original "forced deployment" bullet was correct; restored, and the FAQ corrected.

The rest: a same-account qualifier for alias certificate reuse, an Environment's domain change does not provision a zone/certificate on renames, Edit on chart-dependencies only edits values, the S3 KMS key is the AWS-managed one (aws/s3), the real Dockertron entry points, the nodepool delete guard (last non-internal pool), and register normalization (voseo/tuteo/usted) in the new ES content. Full Docusaurus build green.

…ainst code

Third pass: three validators checked the doc areas the engineering-specs
don't describe, directly against console@main (Release 2.12.0),
core@main, and the chatbot repo.

Workloads (console forms + core serializers):
- terminationGracePeriod default is 120, not 30; Timeout Seconds row had
  the Initial Delay description; 130% limit example rounds to 666Mi;
  added missing Replicas rows (webservice default 2, worker default 1)
  and dropped worker's copy-pasted "minimum of 2 replicas" claim.
- worker/cronjob/hook/job docs each skipped a real form step (Settings
  with Grace Period) — steps added and renumbered; hooks offer exactly 4
  events (pre/post upgrade, pre/post rollback); cron times are UTC;
  added the Concurrency Policy row; job monitoring caps at 24 hours (not
  ~30 min) and marks the Job failed on timeout; jobs run, they don't
  "deploy".
- Health-check failure unroutes traffic (readiness probe only); it does
  not restart the pod.
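
As a worked example of the 130% limit arithmetic above: 666Mi is consistent with a 512Mi request. The 512Mi base is an assumption (the commit message does not state it), as is rounding to the nearest Mi; `limit_from_request` is a hypothetical helper, not a SleakOps function.

```python
def limit_from_request(request_mi: int, percent: int = 130) -> int:
    """Scale a memory request (in Mi) by a percentage, rounded to the nearest Mi.

    Integer arithmetic avoids floating-point surprises; the 512Mi input used
    below is a hypothetical value chosen because it reproduces 666Mi.
    """
    return (request_mi * percent + 50) // 100


if __name__ == "__main__":
    # 512 * 1.30 = 665.6, which rounds to 666
    print(f"{limit_from_request(512)}Mi")
```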

Conversation / AI chat (chatbot repo + Lambda IAM):
- Live infrastructure access is gated (allowlist/flag, default off) —
  most users get knowledge-base answers; removed an example promising
  env-var access (secrets are blocked by IAM/RBAC construction); the
  real guarantee is read-only execution (it cannot modify infra), not
  "no commands, everything needs confirmation"; data scope is
  account-wide read-only with K8s Secrets / Secrets Manager / KMS / SSM
  explicitly denied.

Access flows (console + core):
- The "Get Access" drawer no longer exists — AWS access is an inline
  dashboard card ("Get AWS and VPN Access") with an AWS Account Switcher
  button; kubeconfig buttons are Download/Copy Kubeconfig; the Headlamp
  tip keeps its no-Kubeconfig/Lens angle but now notes the VPN
  requirement; Dockerfile path examples must start with "./" (except
  GitLab) as the form enforces; build-args Textmode ignores spaces
  around "=".
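
The two form rules in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the console's actual validation code: the function names, the provider strings, and the handling of malformed lines are all hypothetical.

```python
def is_valid_dockerfile_path(path: str, provider: str) -> bool:
    """Hypothetical check: paths must start with "./" unless the repo is on GitLab."""
    if provider.lower() == "gitlab":
        return bool(path)
    return path.startswith("./")


def parse_build_args(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring spaces around the first "="."""
    args: dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines in this sketch
        key, value = line.split("=", 1)
        args[key.strip()] = value.strip()
    return args


if __name__ == "__main__":
    print(is_valid_dockerfile_path("./Dockerfile", "github"))
    print(parse_build_args("NODE_ENV = production\nPORT=8080"))
```

Splitting on the first "=" only keeps values that themselves contain "=" intact.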

Full Docusaurus build green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

lucaiz (Contributor, Author) commented Jul 5, 2026

Third pass: code-first validation (core/console/chatbot → docs) in 65b83b0da. The two previous passes were spec-first; this one validates the areas the specs don't cover directly against the code:

  • Workloads (console@main forms + core serializers): the terminationGracePeriod default is 120, not 30; worker/cronjob/hook/job skipped a real form step (Settings with Grace Period), now added and renumbered; hooks offer exactly 4 events; cron is UTC; Job monitoring cuts off at 24 h (not ~30 min) and marks the Job as failed; the health check only unroutes traffic (readiness probe), it does not restart the pod; Replicas rows added.
  • powered-ai/conversation.mdx (first validation of this page, which has no spec): live infrastructure access is gated (allowlist/flag, default off); removed an example that promised reading environment variables (secrets are blocked by IAM/RBAC construction); the real guarantee is read-only execution (it cannot modify infra), not "does not run commands".
  • Access flows: the documented "Get Access" drawer no longer exists (removed in console #1770); today it is the inline "Get AWS and VPN Access" card with the AWS Account Switcher button; the Dockerfile path examples had to start with ./ (the form rejected them as documented); the Headlamp tip now clarifies that it requires VPN.

Screenshots pending (text already corrected): USER-get-access.png and USER-account-switcher.png show the retired UI. Docusaurus build green. With this, the PR is validated at three levels: specs → docs, adversarial verification of the diff, and real code → docs.
