Skip to content

fix(backup): skip stanza-create on standby to enable prefer-standby backups#1

Open
robrigo wants to merge 1 commit into
mainfrom
feat/skip-stanza-create-on-standby
Open

fix(backup): skip stanza-create on standby to enable prefer-standby backups#1
robrigo wants to merge 1 commit into
mainfrom
feat/skip-stanza-create-on-standby

Conversation

@robrigo

@robrigo robrigo commented Jun 1, 2026

Copy link
Copy Markdown

Problem

The Backup RPC (internal/cnpgi/instance/backup.go) calls CreatePgbackrestStanza unconditionally before every backup. pgbackrest stanza-create requires a primary PostgreSQL connection — it reads the live cluster's control data to initialize the stanza metadata in the repo. On a standby it fails with pgbackrest exit code 56 (unable to find primary cluster), which aborts the whole backup.

The practical effect: ScheduledBackup/Backup with target: prefer-standby (or standby) is unusable. Any cluster that wants to offload the read-heavy backup I/O from its primary is forced back to target: primary.

This is the upstream issue we filed: operasoftware#83.

Fix

Skip stanza-create when the instance running the backup is a standby.

This is safe: the stanza is initialized by the primary — both its own backups and WAL archiving's CheckWalArchiveDestination run against the same repo — so by the time a standby backup runs, the stanza already exists. stanza-create is only genuinely required on the primary.

Role detection uses the standby.signal file in PGDATA — the same signal PostgreSQL itself uses to start in standby mode, and the same mechanism CloudNativePG uses to distinguish a replica from a primary (pkg/utils / instance manager). This means no database connection or credentials are needed at backup time. A missing/odd PGDATA falls back to the primary path (stanza-create runs), preserving today's behavior.

isStandby, err := instanceIsStandby(b.PGDataPath)
// ...
if isStandby {
    // skip — stanza is initialized by the primary
} else {
    backupCmd.CreatePgbackrestStanza(...)
}

Tests

  • Adds the internal/cnpgi/instance package's first unit-test suite (Ginkgo/Gomega).
  • instanceIsStandby covered for: primary (no signal → false), standby (signal present → true), and missing-PGDATA (→ false, falls back to primary path).
  • go build ./..., go vet, gofmt -l, and the non-infra unit suites all pass. (The internal/controller envtest suite and test/e2e require a live cluster / envtest binaries and are unaffected by this change.)

Motivation

pgBackRest backups are the single largest recurring read-IO event on our primary data volumes (they pin the volume to ~100% utilization twice daily). Offloading them to a replica via prefer-standby needs this fix.

Upstream

We'll offer this (or an equivalent) to upstream against operasoftware#83. This fork PR lets us build a FACINGS image and validate on staging in the meantime.

…ackups

The Backup RPC called CreatePgbackrestStanza unconditionally. pgbackrest
stanza-create needs a primary connection (it reads the live cluster's control
data to initialize stanza metadata in the repo), so on a standby it fails with
exit code 56 ("unable to find primary cluster") and aborts the whole backup —
making target: prefer-standby / standby backups unusable (issue operasoftware#83).

Skip stanza-create when the instance is a standby. Safe because the stanza is
already initialized by the primary (its own backups + WAL archiving's
CheckWalArchiveDestination run against the same repo), so by the time a standby
backup runs the stanza exists. Role is detected via the standby.signal file in
PGDATA — the same signal PostgreSQL and CNPG itself use — so no DB connection or
credentials are needed at backup time. A missing/odd PGDATA falls back to the
primary path (stanza-create runs), preserving current behavior.

Adds the instance package's first unit-test suite (Ginkgo) covering
instanceIsStandby: primary, standby, and missing-PGDATA cases.

Motivation: pgBackRest backups are the largest recurring read-IO event on our
primary data volumes; offloading them to a replica needs this fix.

Refs: operasoftware#83

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the instance-side Backup RPC to avoid running pgbackrest stanza-create on standby instances, enabling target: prefer-standby / standby backups to succeed by relying on stanza initialization performed on the primary.

Changes:

  • Detect standby instances via PGDATA/standby.signal and skip stanza-create when running on a standby.
  • Add a new instanceIsStandby helper to encapsulate role detection.
  • Introduce the first unit test suite for internal/cnpgi/instance (Ginkgo/Gomega) and cover instanceIsStandby.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
internal/cnpgi/instance/backup.go Skips stanza-create on standby; adds instanceIsStandby helper for role detection.
internal/cnpgi/instance/backup_test.go Adds unit tests for instanceIsStandby behavior across primary/standby/missing PGDATA.
internal/cnpgi/instance/suite_test.go Adds Ginkgo suite bootstrap for the internal/cnpgi/instance package.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +198 to +200
func instanceIsStandby(pgDataPath string) (bool, error) {
return fileutils.FileExists(filepath.Join(pgDataPath, "standby.signal"))
}
Comment on lines +50 to +58
It("treats a non-existent PGDATA as a primary rather than erroring", func() {
// FileExists returns (false, nil) when the directory does not exist, so a
// missing/odd PGDATA path falls back to the primary path (stanza-create
// runs) instead of aborting the backup.
isStandby, err := instanceIsStandby(filepath.Join(pgData, "does-not-exist"))
Expect(err).NotTo(HaveOccurred())
Expect(isStandby).To(BeFalse())
})
})
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants