fix(backup): skip stanza-create on standby to enable prefer-standby backups#1
Open
robrigo wants to merge 1 commit into
Open
fix(backup): skip stanza-create on standby to enable prefer-standby backups#1robrigo wants to merge 1 commit into
robrigo wants to merge 1 commit into
Conversation
…ackups
The Backup RPC called CreatePgbackrestStanza unconditionally. pgbackrest
stanza-create needs a primary connection (it reads the live cluster's control
data to initialize stanza metadata in the repo), so on a standby it fails with
exit code 56 ("unable to find primary cluster") and aborts the whole backup —
making target: prefer-standby / standby backups unusable (issue operasoftware#83).
Skip stanza-create when the instance is a standby. Safe because the stanza is
already initialized by the primary (its own backups + WAL archiving's
CheckWalArchiveDestination run against the same repo), so by the time a standby
backup runs the stanza exists. Role is detected via the standby.signal file in
PGDATA — the same signal PostgreSQL and CNPG itself use — so no DB connection or
credentials are needed at backup time. A missing/odd PGDATA falls back to the
primary path (stanza-create runs), preserving current behavior.
Adds the instance package's first unit-test suite (Ginkgo) covering
instanceIsStandby: primary, standby, and missing-PGDATA cases.
Motivation: pgBackRest backups are the largest recurring read-IO event on our
primary data volumes; offloading them to a replica needs this fix.
Refs: operasoftware#83
There was a problem hiding this comment.
Pull request overview
This PR updates the instance-side Backup RPC to avoid running pgbackrest stanza-create on standby instances, enabling target: prefer-standby / standby backups to succeed by relying on stanza initialization performed on the primary.
Changes:
- Detect standby instances via
PGDATA/standby.signaland skipstanza-createwhen running on a standby. - Add a new
instanceIsStandbyhelper to encapsulate role detection. - Introduce the first unit test suite for
internal/cnpgi/instance(Ginkgo/Gomega) and coverinstanceIsStandby.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/cnpgi/instance/backup.go | Skips stanza-create on standby; adds instanceIsStandby helper for role detection. |
| internal/cnpgi/instance/backup_test.go | Adds unit tests for instanceIsStandby behavior across primary/standby/missing PGDATA. |
| internal/cnpgi/instance/suite_test.go | Adds Ginkgo suite bootstrap for the internal/cnpgi/instance package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+198
to
+200
| func instanceIsStandby(pgDataPath string) (bool, error) { | ||
| return fileutils.FileExists(filepath.Join(pgDataPath, "standby.signal")) | ||
| } |
Comment on lines
+50
to
+58
| It("treats a non-existent PGDATA as a primary rather than erroring", func() { | ||
| // FileExists returns (false, nil) when the directory does not exist, so a | ||
| // missing/odd PGDATA path falls back to the primary path (stanza-create | ||
| // runs) instead of aborting the backup. | ||
| isStandby, err := instanceIsStandby(filepath.Join(pgData, "does-not-exist")) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(isStandby).To(BeFalse()) | ||
| }) | ||
| }) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The
BackupRPC (internal/cnpgi/instance/backup.go) callsCreatePgbackrestStanzaunconditionally before every backup. pgbackreststanza-createrequires a primary PostgreSQL connection — it reads the live cluster's control data to initialize the stanza metadata in the repo. On a standby it fails with pgbackrest exit code 56 (unable to find primary cluster), which aborts the whole backup.The practical effect:
ScheduledBackup/Backupwithtarget: prefer-standby(orstandby) is unusable. Any cluster that wants to offload the read-heavy backup I/O from its primary is forced back totarget: primary.This is the upstream issue we filed: operasoftware#83.
Fix
Skip
stanza-createwhen the instance running the backup is a standby.This is safe: the stanza is initialized by the primary — both its own backups and WAL archiving's
CheckWalArchiveDestinationrun against the same repo — so by the time a standby backup runs, the stanza already exists.stanza-createis only genuinely required on the primary.Role detection uses the
standby.signalfile in PGDATA — the same signal PostgreSQL itself uses to start in standby mode, and the same mechanism CloudNativePG uses to distinguish a replica from a primary (pkg/utils/ instance manager). This means no database connection or credentials are needed at backup time. A missing/odd PGDATA falls back to the primary path (stanza-create runs), preserving today's behavior.Tests
internal/cnpgi/instancepackage's first unit-test suite (Ginkgo/Gomega).instanceIsStandbycovered for: primary (no signal → false), standby (signal present → true), and missing-PGDATA (→ false, falls back to primary path).go build ./...,go vet,gofmt -l, and the non-infra unit suites all pass. (Theinternal/controllerenvtest suite andtest/e2erequire a live cluster /envtestbinaries and are unaffected by this change.)Motivation
pgBackRest backups are the single largest recurring read-IO event on our primary data volumes (they pin the volume to ~100% utilization twice daily). Offloading them to a replica via
prefer-standbyneeds this fix.Upstream
We'll offer this (or an equivalent) to upstream against operasoftware#83. This fork PR lets us build a FACINGS image and validate on staging in the meantime.