fix(openbao): improve install command#430
Conversation
There was a problem hiding this comment.
Pull request overview
This PR enhances the oms install openbao workflow to support deploying OpenBao into a configurable Kubernetes namespace while avoiding Helm ownership conflicts when the (cluster-scoped) Bank-Vaults Operator is already installed elsewhere. It also adjusts the install flow to create the target namespace explicitly, improves DR-restore handling of unseal keys, and introduces an interactive confirmation for destructive fresh initialization.
Changes:
- Add
--namespace/-n,--age-key-file/-k, and--yes/-yflags to the CLI and document them. - Update the OpenBao installer to be namespace-aware, skip operator redeploy when already present cluster-wide, ensure namespace creation before applying the Vault CR, and refine initialization/cleanup logic.
- Enable
preFlightChecks: truein the Vault CR manifest and update installer tests accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
NOTICE |
Fix jsonpatch/v2 license URL. |
internal/tmpl/NOTICE |
Keep NOTICE template in sync with corrected license URL. |
internal/installer/openbao.go |
Namespace-aware install flow, operator redeploy avoidance, DR restore secret handling, new pod readiness wait, expanded cleanup. |
internal/installer/openbao_test.go |
Update/add tests for new operator deployment behavior and manifest expectations. |
internal/installer/manifests/openbao/vault-cr.yaml |
Enable bank-vaults unseal preFlightChecks. |
docs/oms_install_openbao.md |
Document new CLI flags (--namespace, --age-key-file, --yes). |
cli/cmd/install_openbao.go |
Wire new flags, set SOPS key file env var, add destructive-operation confirmation prompt. |
Comments suppressed due to low confidence (1)
internal/installer/openbao.go:618
hasExistingDeploymentlists PVCs in the target namespace. If the namespace doesn't exist yet (common for a new--namespace), the List call can return a NotFound error and currently bubbles up, aborting the install during the confirmation check. It should treat a NotFound namespace as “no existing deployment” and return (false, nil).
// Vault CR gone but PVCs may linger (e.g. CR was manually deleted).
pvcList, err := o.Clientset.CoreV1().PersistentVolumeClaims(o.Config.Namespace).List(
o.ctx, metav1.ListOptions{LabelSelector: "vault_cr=openbao"},
)
if err != nil {
return false, fmt.Errorf("listing PVCs: %w", err)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
3614c52 to
8ac7141
Compare
Signed-off-by: Jcing95 <23337729+Jcing95@users.noreply.github.com>
42eba86 to
6044373
Compare
- Fix retry_join and BAO_CLUSTER_ADDR/BAO_API_ADDR DNS format: the headless service segment was missing, producing addresses of the form openbao-N.<namespace>.svc.cluster.local instead of the correct openbao-N.openbao.<namespace>.svc.cluster.local. Introduce openBaoHeadlessService constant to avoid repeating the string. - Default cfg.Namespace to DefaultOpenBaoNamespace in NewOpenBaoInstaller before constructing the Helm client, so the client is never initialised with an empty namespace when --namespace is omitted. - Wait for stale PVCs to be fully deleted after issuing the Delete calls in CleanStaleInstallState. Add waitForPVCsGone helper (mirrors waitForVaultPodsGone) to poll until no vault_cr=openbao PVCs remain, preventing races between terminating PVCs and new StatefulSet creation. - Correct --yes flag description: confirmation is only prompted when an existing deployment is detected without a DR backup, not on every fresh install. Update both the CLI flag and the docs page.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Track which resources were actually deleted and report them. On a fresh install into a new namespace where nothing existed, the previous message 'Stale install state cleaned (CR, pods, PVCs, unseal secret)' was unconditionally printed even when every delete was a no-op (NotFound). Now logs either: 'Cleaned stale resources: Vault CR, 2 PVC(s), unseal secret' or 'No stale install state found in namespace "my-new-ns"'
OliverTrautvetter
left a comment
There was a problem hiding this comment.
Some comments to clean the code a bit
| _, err = secretsClient.Create(o.ctx, secret, metav1.CreateOptions{}) | ||
| if err != nil && !k8serrors.IsAlreadyExists(err) { | ||
| return fmt.Errorf("creating unseal secret from backup: %w", err) | ||
| } |
There was a problem hiding this comment.
When Create fails with AlreadyExists, we return nil without updating the existing secret's data. But I think the secret can have wrong/empty data
| // 2. Wait for all vault pods to terminate | ||
| // 3. Delete PVCs (removes stale Raft data that would confuse initialization) | ||
| // 4. Delete the unseal-keys Secret | ||
| func (o *OpenBaoInstaller) CleanStaleInstallState() error { |
There was a problem hiding this comment.
This function has not tests?
| err = o.Logger.Step("Cleaning stale install state", o.CleanStaleInstallState) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to remove stale unseal keys: %w", err) | ||
| return fmt.Errorf("failed to clean stale install state: %w", err) | ||
| } | ||
| } | ||
|
|
||
| err = o.Logger.Step("Ensuring namespace exists", func() error { |
There was a problem hiding this comment.
So we clean before we ensure that it exists? Sounds slightly weird 🤔 maybe we should ensure first?
| if nsErr == nil { | ||
| rel, err := o.Helm.FindRelease(o.Config.Namespace, cfg.ReleaseName) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
we should wrap it like
return fmt.Errorf("finding release %s in namespace %s: %w", cfg.ReleaseName, o.Config.Namespace, err)
| return nil | ||
| } | ||
|
|
||
| fmt.Printf("\nWARNING: No DR backup found at: %s\n", c.Opts.DRBackupPath) |
There was a problem hiding this comment.
We use log. Can we change all the fmt.Print stuff to log.Print
| corev1 "k8s.io/api/core/v1" | ||
| ) | ||
|
|
||
| // Test helpers — exported only during `go test` so external test packages |
There was a problem hiding this comment.
Looks like at least backupUnsealKeys is missing and not tested
| // handles the case where the bank-vaults operator deletes or recreates the | ||
| // secret during Vault CR reconciliation — we simply re-apply it until the | ||
| // operator settles and the sidecar can successfully unseal. | ||
| func (o *OpenBaoInstaller) WaitForInitialization() error { |
There was a problem hiding this comment.
no new tests for the new logic in "WaitForInitialization"
| return fmt.Errorf("failed to read confirmation: %w", err) | ||
| } | ||
| if strings.TrimSpace(strings.ToLower(input)) != "yes" { | ||
| return fmt.Errorf("aborted: type 'yes' to continue or pass --yes") |
There was a problem hiding this comment.
Confusing message. If it is aborted, how can we continue. Aborted sounds like a big error, but it is just a typo / failed confirm
| if c.Opts.AgeKeyFile != "" { | ||
| if err := os.Setenv("SOPS_AGE_KEY_FILE", c.Opts.AgeKeyFile); err != nil { | ||
| return fmt.Errorf("setting SOPS_AGE_KEY_FILE: %w", err) |
There was a problem hiding this comment.
changes global process state permanently.
We should pass the key file path through the OpenBaoInstallerConfig
| if delErr != nil && !k8serrors.IsNotFound(delErr) { | ||
| return fmt.Errorf("deleting Vault CR: %w", delErr) | ||
| } | ||
| if delErr == nil { | ||
| cleaned = append(cleaned, "Vault CR") | ||
| } | ||
|
|
||
| if err := o.waitForVaultPodsGone(); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Why do we need to wait for a clear if Vault CR didn't exist.
We should only wait for pods when a Vault CR was actually deleted
Summary
--namespace/-nflag tooms install openbaoto allow deploying OpenBao to a custom Kubernetes namespace (defaults tovaultfor backward compatibility)os.Setenvreturn value flagged by the linter