K8SPSMDB-1363: Snapshot-based backups#2247

Merged: hors merged 70 commits into main from K8SPSMDB-1363 on Mar 10, 2026

@mayankshah1607 (Member) commented Feb 18, 2026

CHANGE DESCRIPTION

This PR adds support for volume snapshot (external) backups in the Percona Server MongoDB Operator. Users can create CSI-based volume snapshots of MongoDB data volumes and restore from them, without using object storage.

Changes Overview

1. New Backup Type: external

A new backup type external is added for volume snapshot backups. It uses the Kubernetes CSI Volume Snapshot API instead of PBM object storage.

Supported backup types:

  • logical – logical (mongodump-style) backups
  • physical – physical backups to object storage
  • incremental / incremental-base – incremental backups
  • external – volume snapshot backups (new)

2. CRD and API Changes

PerconaServerMongoDBBackup

Spec:

  • type: new enum value external
  • volumeSnapshotClass: name of the VolumeSnapshotClass used for snapshot backups; optional in the schema, but required when type is external
  • storageName: optional when type is external (no object storage)

Status:

  • snapshots: array of SnapshotInfo with replsetName and snapshotName for each replset

PerconaServerMongoDBRestore

Status:

  • conditions: array of metav1.Condition for snapshot restore phases:
    • PBMAgentConfiguredForSnapshot
    • ReplsetPVCsRestoredFromSnapshot
    • PBMAgentAwaitingRestoreFinish
    • PBMRestoreFinishing
    • PBMRestoreFinished
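
The condition types above read like consecutive phases of a snapshot restore. The following is a minimal sketch of how a caller might find the next pending phase. It assumes the phases run in the listed order, which the description does not state, and it uses a local `Condition` struct as a stand-in for `metav1.Condition`. `nextPhase` is a hypothetical helper, not part of the operator.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition (assumption:
// only Type and Status matter for this sketch).
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// restorePhases lists the condition types from the PR description.
// Treating list order as execution order is an assumption.
var restorePhases = []string{
	"PBMAgentConfiguredForSnapshot",
	"ReplsetPVCsRestoredFromSnapshot",
	"PBMAgentAwaitingRestoreFinish",
	"PBMRestoreFinishing",
	"PBMRestoreFinished",
}

// nextPhase returns the first phase whose condition is not yet "True",
// or "" when every phase has completed.
func nextPhase(conds []Condition) string {
	done := make(map[string]bool, len(conds))
	for _, c := range conds {
		if c.Status == "True" {
			done[c.Type] = true
		}
	}
	for _, p := range restorePhases {
		if !done[p] {
			return p
		}
	}
	return ""
}

func main() {
	conds := []Condition{
		{Type: "PBMAgentConfiguredForSnapshot", Status: "True"},
		{Type: "ReplsetPVCsRestoredFromSnapshot", Status: "False"},
	}
	fmt.Println(nextPhase(conds)) // ReplsetPVCsRestoredFromSnapshot
}
```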

PerconaServerMongoDB (scheduled backup tasks)

BackupTaskSpec:

  • type: new enum value external
  • volumeSnapshotClass: name of the VolumeSnapshotClass; optional in the schema, but required when type is external
  • For external tasks, storageName is not required

3. Examples

On-demand backup (PerconaServerMongoDBBackup)

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: my-snapshot-backup
spec:
  type: external
  clusterName: my-cluster
  volumeSnapshotClass: csi-gce-pd-snapshot-class  # Required for external type

Restore from snapshot backup

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: my-snapshot-restore
spec:
  clusterName: my-cluster
  backupName: my-snapshot-backup  # References the external backup

Or with backupSource (e.g. cross-cluster restore):

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: my-snapshot-restore
spec:
  clusterName: my-cluster
  backupSource:
    type: external
    snapshots:
      - replsetName: rs0
        snapshotName: my-cluster-rs0-0
      - replsetName: rs0
        snapshotName: my-cluster-rs0-1
      - replsetName: rs0
        snapshotName: my-cluster-rs0-2

Scheduled snapshot backup task (PerconaServerMongoDB)

spec:
  backup:
    enabled: true
    storages:
      s3-us-west:
        type: s3
        s3:
          bucket: my-bucket
          credentialsSecret: backup-s3
    tasks:
      - name: daily-snapshot
        enabled: true
        schedule: "0 2 * * *"
        type: external
        volumeSnapshotClass: csi-gce-pd-snapshot-class
        retention:
          count: 7
          type: count
          deleteFromStorage: true
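
The `retention` block in the task above keeps a fixed number of backups. As a rough illustration of count-based retention in general (not the operator's actual cleanup code), the sketch below returns the backups that fall outside the newest `keep` entries. The `backup` type, `expired`, and `sampleBackups` are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// backup is a hypothetical record of one finished backup.
type backup struct {
	Name    string
	Created time.Time
}

// expired returns the names of backups that exceed the retention count,
// oldest first. The newest `keep` backups are retained.
func expired(backups []backup, keep int) []string {
	if keep < 0 {
		keep = 0
	}
	sorted := append([]backup(nil), backups...) // copy; leave the caller's slice untouched
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Created.Before(sorted[j].Created)
	})
	if len(sorted) <= keep {
		return nil
	}
	var names []string
	for _, b := range sorted[:len(sorted)-keep] {
		names = append(names, b.Name)
	}
	return names
}

// sampleBackups builds n daily backups, one per day, for demonstration.
func sampleBackups(n int) []backup {
	start := time.Date(2026, 3, 1, 2, 0, 0, 0, time.UTC)
	var out []backup
	for i := 0; i < n; i++ {
		out = append(out, backup{
			Name:    fmt.Sprintf("daily-snapshot-%d", i),
			Created: start.AddDate(0, 0, i),
		})
	}
	return out
}

func main() {
	// Nine backups with count: 7 leaves the two oldest to delete.
	fmt.Println(expired(sampleBackups(9), 7)) // [daily-snapshot-0 daily-snapshot-1]
}
```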

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Copilot AI review requested due to automatic review settings February 18, 2026 11:13
@pull-request-size pull-request-size Bot added the size/XL 500-999 lines label Feb 18, 2026

Copilot AI left a comment

Pull request overview

This PR implements snapshot-based backups for Percona Server for MongoDB, adding support for Kubernetes VolumeSnapshots as an alternative backup mechanism. The implementation introduces a new backup executor interface to handle different backup types (managed vs snapshot-based), integrates with the Kubernetes CSI snapshot API, and updates the CRD to support the new external backup type with volume snapshot configuration.

Changes:

  • Introduces backupExecutor interface to support multiple backup implementations (managed and snapshot-based)
  • Adds VolumeSnapshot support for external backups via new snapshot.go controller
  • Updates API types to include VolumeSnapshotClass field and SnapshotInfo status
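
To make the executor split concrete, here is a minimal sketch of an interface with two implementations and a selection function. The selection rule (external type plus a non-empty VolumeSnapshotClass, otherwise the managed path) follows the behavior a later review comment describes. The method set, the `snapshotBackups` type name, and `selectExecutor` are assumptions for illustration; the PR's real `backupExecutor` interface may look different.

```go
package main

import "fmt"

// backupSpec holds only the two fields that drive executor selection here.
type backupSpec struct {
	Type                string
	VolumeSnapshotClass *string
}

// backupExecutor is a hypothetical shape; the PR's real interface may differ.
type backupExecutor interface {
	Name() string
}

type managedBackups struct{}  // PBM-managed backups to object storage
type snapshotBackups struct{} // CSI volume snapshot backups (name assumed)

func (managedBackups) Name() string  { return "managed" }
func (snapshotBackups) Name() string { return "snapshot" }

// selectExecutor picks the snapshot executor only for external backups
// that name a VolumeSnapshotClass; everything else uses the managed path.
func selectExecutor(spec backupSpec) backupExecutor {
	if spec.Type == "external" && spec.VolumeSnapshotClass != nil && *spec.VolumeSnapshotClass != "" {
		return snapshotBackups{}
	}
	return managedBackups{}
}

func main() {
	class := "csi-gce-pd-snapshot-class"
	fmt.Println(selectExecutor(backupSpec{Type: "external", VolumeSnapshotClass: &class}).Name()) // snapshot
	fmt.Println(selectExecutor(backupSpec{Type: "logical"}).Name())                                // managed
}
```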

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pkg/apis/psmdb/v1/perconaservermongodbbackup_types.go | Adds external backup type, VolumeSnapshotClass field, and SnapshotInfo struct to track volume snapshots |
| pkg/controller/perconaservermongodbbackup/snapshot.go | New file implementing snapshot-based backup logic including VolumeSnapshot creation and reconciliation |
| pkg/controller/perconaservermongodbbackup/backup.go | Refactors existing backup logic into managedBackups type and introduces backupExecutor interface |
| pkg/controller/perconaservermongodbbackup/psmdb_backup_controller.go | Updates controller to select backup executor based on backup type and configuration |
| pkg/psmdb/backup/pbm.go | Adds GetBackupByName and FinishBackup methods, wraps credentials with MaskedString for security |
| pkg/naming/naming.go | Adds VolumeSnapshotName function to generate snapshot resource names |
| deploy/rbac.yaml, deploy/cw-rbac.yaml | Grants operator permissions to create and manage VolumeSnapshot resources |
| config/crd/bases/.yaml, deploy/.yaml | Updates CRD definitions to include new backup type and snapshot fields |
| cmd/manager/main.go | Registers VolumeSnapshot v1 API scheme |
| go.mod, go.sum | Updates Go version and PBM dependency version, adds kubernetes-csi/external-snapshotter client |
| deploy/bundle.yaml | Contains deployment configuration with modified operator image reference |

Comment on lines +72 to +75
type SnapshotInfo struct {
	NodeName     string `json:"nodeName,omitempty"`
	SnapshotName string `json:"snapshotName,omitempty"`
}
Copilot AI, Feb 18, 2026:

The SnapshotInfo struct lacks documentation. Add a comment describing what this struct represents (e.g., "SnapshotInfo contains information about a volume snapshot created for a MongoDB node during an external backup").

Comment on lines 109 to 112
 func (p *PerconaServerMongoDBBackup) CheckFields() error {
-	if len(p.Spec.StorageName) == 0 {
+	if len(p.Spec.StorageName) == 0 && p.Spec.Type != defs.ExternalBackup {
 		return fmt.Errorf("spec storageName field is empty")
 	}
Copilot AI, Feb 18, 2026:

The CheckFields method allows external backups without a VolumeSnapshotClass, but the controller only creates snapshot backups when both Type is ExternalBackup AND VolumeSnapshotClass is set. This could lead to a confusing scenario where a user creates an external backup without a VolumeSnapshotClass, and it falls through to the default managed backup path. Consider adding validation to ensure that if Type is ExternalBackup, VolumeSnapshotClass must be specified, or document this behavior clearly.

Comment thread pkg/psmdb/backup/pbm.go
Comment on lines +22 to +25
// VolumeSnapshotClass is the name of the VolumeSnapshotClass to use for snapshot based backups.
// This may be specified only when type is `external`.
// +kubebuilder:validation:Optional
VolumeSnapshotClass *string `json:"volumeSnapshotClass,omitempty"`
Copilot AI, Feb 18, 2026:

The documentation comment on lines 22-23 states that VolumeSnapshotClass "may be specified only when type is external", but there's no validation enforcing this constraint. Either add validation in the CheckFields method to ensure this is only set when Type is ExternalBackup, or add a kubebuilder validation marker to enforce this at the CRD level.
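
Both validation gaps raised in these comments (VolumeSnapshotClass missing on an external backup, and VolumeSnapshotClass set on a non-external one) could be closed in a single field check. The sketch below is one possible shape, not the check the PR ended up with. `backupSpec` and `checkFields` are simplified, hypothetical stand-ins for the real spec type and `CheckFields` method, and the error messages for the two new cases are invented.

```go
package main

import (
	"errors"
	"fmt"
)

// backupSpec mirrors only the fields discussed in this thread.
type backupSpec struct {
	Type                string
	StorageName         string
	ClusterName         string
	VolumeSnapshotClass *string
}

// checkFields is a hypothetical, stricter variant of CheckFields.
func checkFields(s backupSpec) error {
	external := s.Type == "external"
	hasClass := s.VolumeSnapshotClass != nil && *s.VolumeSnapshotClass != ""

	switch {
	case s.ClusterName == "":
		return errors.New("spec clusterName is empty")
	case external && !hasClass:
		return errors.New("spec volumeSnapshotClass is required when type is external")
	case !external && hasClass:
		return errors.New("spec volumeSnapshotClass may be set only when type is external")
	case !external && s.StorageName == "":
		return errors.New("spec storageName field is empty")
	}
	return nil
}

func main() {
	class := "csi-gce-pd-snapshot-class"
	// Valid: external backup with a snapshot class and no storageName.
	fmt.Println(checkFields(backupSpec{Type: "external", ClusterName: "my-cluster", VolumeSnapshotClass: &class}))
	// Invalid: external backup without a snapshot class.
	fmt.Println(checkFields(backupSpec{Type: "external", ClusterName: "my-cluster"}))
}
```

The same constraint could instead be enforced at admission time with a CRD-level validation rule; the Go check is shown only because it matches the code under review.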

@pull-request-size pull-request-size Bot added size/XXL 1000+ lines and removed size/XL 500-999 lines labels Feb 18, 2026
@github-actions github-actions Bot added tests dependencies Pull requests that update a dependency file labels Feb 18, 2026
Comment on lines +10 to +12
"github.com/percona/percona-backup-mongodb/pbm/ctrl"
"github.com/percona/percona-backup-mongodb/pbm/defs"
pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors"
Contributor:

[goimports-reviser] reported by reviewdog 🐶

Suggested change
"github.com/percona/percona-backup-mongodb/pbm/ctrl"
"github.com/percona/percona-backup-mongodb/pbm/defs"
pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors"

Comment thread cmd/manager/main.go
Contributor:

[goimports-reviser] reported by reviewdog 🐶

// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
_ "k8s.io/client-go/plugin/pkg/client/auth"

Copilot AI review requested due to automatic review settings February 18, 2026 18:40

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

pkg/apis/psmdb/v1/perconaservermongodbbackup_types.go:133

  • The CheckFields validation does not verify that VolumeSnapshotClass is specified when Type is external. This could allow users to create external backup resources without specifying the required VolumeSnapshotClass, leading to failures later when the operator tries to create snapshots. Consider adding validation to require VolumeSnapshotClass when Type is ExternalBackup.
func (p *PerconaServerMongoDBBackup) CheckFields() error {
	if len(p.Spec.StorageName) == 0 && p.Spec.Type != defs.ExternalBackup {
		return fmt.Errorf("spec storageName field is empty")
	}
	if len(p.Spec.GetClusterName()) == 0 {
		return fmt.Errorf("spec clusterName is empty")
	}
	if string(p.Spec.Type) == "" {
		p.Spec.Type = defs.LogicalBackup
	}
	if string(p.Spec.Compression) == "" {
		p.Spec.Compression = compress.CompressionTypeGZIP
	}
	return nil

Comment on lines +30 to +32
if bcp.Spec.Type == defs.ExternalBackup {
	// TODO: should we check that snapshots exist?
	return nil
Copilot AI, Feb 18, 2026:

For external (snapshot-based) backups, the validation skips checking if snapshots exist (as noted in the TODO). This could allow restore operations to proceed when the required VolumeSnapshots are missing or not ready, which would cause the restore to fail later in the process. Consider implementing a validation check to verify that all required snapshots exist and are in a ready state before allowing the restore to proceed.

Comment thread deploy/rbac.yaml
- get
- list
- watch
- create
Copilot AI, Feb 18, 2026:

The RBAC permissions for VolumeSnapshots are missing delete and update verbs. While the operator creates snapshots during backup, it may also need to clean them up when backups are deleted (as part of the finalizer logic), and potentially update snapshot metadata. Consider adding delete permission at minimum for proper cleanup. Review whether update and patch permissions are also needed for snapshot management.

Suggested change
- create
- create
- update
- patch
- delete

Name: snapshotName,
}
pvc.SetAnnotations(map[string]string{
naming.AnnotationRestoreName: snapshotName,
Copilot AI, Feb 18, 2026:

The annotation value is set to the snapshot name instead of the restore name. This is inconsistent with the check on line 423, where it compares restoreName == restore.Name. This means that if the snapshot name doesn't match the restore name, the PVC will be deleted and recreated unnecessarily. Consider using restore.Name here to ensure consistency.

Comment on lines +280 to +288
sfs.Spec.Template.Spec.Containers[0].Command = []string{"/opt/percona/pbm-agent"}
sfs.Spec.Template.Spec.Containers[0].Args = []string{
	"restore-finish",
	restore.Status.PBMname,
	"-c", "/etc/pbm/pbm_config.yaml",
	"--rs", "$(MONGODB_REPLSET)",
	"--node", "$(POD_NAME).$(SERVICE_NAME)-$(MONGODB_REPLSET).$(NAMESPACE).svc.cluster.local",
	// "--db-config", "/etc/pbm/db-config.yaml", // TODO
}
Copilot AI, Feb 18, 2026:

Potential index out of bounds error. The code assumes that sfs.Spec.Template.Spec.Containers[0] exists, but there's no check to verify that the Containers slice has at least one element. If the StatefulSet has no containers (which would be unusual but possible in error scenarios), this will panic. Consider adding a length check or finding the container by name instead of assuming index 0.
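
A lookup by name, as suggested here, can be sketched in a few lines. The `container` struct is a simplified stand-in for `corev1.Container`, `findContainer` is a hypothetical helper, and the container names `mongod` and `backup-agent` are assumptions made for the example.

```go
package main

import "fmt"

// container is a simplified stand-in for corev1.Container.
type container struct {
	Name    string
	Command []string
}

// findContainer returns a pointer to the named container inside the slice,
// so the caller can modify it in place, or nil when no container matches.
func findContainer(cs []container, name string) *container {
	for i := range cs {
		if cs[i].Name == name {
			return &cs[i]
		}
	}
	return nil
}

func main() {
	containers := []container{{Name: "mongod"}, {Name: "backup-agent"}}

	// Patch the command only when the container actually exists.
	if c := findContainer(containers, "mongod"); c != nil {
		c.Command = []string{"/opt/percona/pbm-agent"}
	}
	fmt.Println(containers[0].Command) // [/opt/percona/pbm-agent]

	// An empty container list yields nil instead of a panic.
	fmt.Println(findContainer(nil, "mongod") == nil) // true
}
```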

Copilot AI review requested due to automatic review settings February 19, 2026 11:24
"sigs.k8s.io/controller-runtime/pkg/client"
logf "sigs.k8s.io/controller-runtime/pkg/log"

api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1"
Contributor:

[goimports-reviser] reported by reviewdog 🐶

Suggested change
api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1"
"github.com/percona/percona-backup-mongodb/pbm/ctrl"
"github.com/percona/percona-backup-mongodb/pbm/defs"
pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors"
api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1"

Namespace: cr.Namespace,
},
}
err := r.client.Delete(ctx, snapshot)
Contributor:

it'd be nice to log here that we are deleting the snapshot

Comment thread pkg/controller/perconaservermongodbrestore/snapshots.go
Comment on lines +562 to +564
if err := r.client.Delete(ctx, observedPVC); err != nil {
	return false, errors.Wrapf(err, "delete pvc %s", pvcName)
}
Contributor:

I don't see any difference in the logic :/ Doesn't it still delete all the PVCs first and recreate?

Current logic looks like:

for pvc in pvcs:
  get pvc
  if get fails -> recreate
  if get succeeds -> delete

get will succeed for all PVCs the first time the for loop runs
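
One way to avoid deleting every PVC on the first pass is to decide per PVC based on a marker that records which restore produced it. The sketch below shows that decision in isolation. It is an illustration of the idea, not the logic the PR adopted. The annotation key is invented (the real constant is `naming.AnnotationRestoreName`, whose value is not shown in this thread), and `pvc`, `action`, and `decide` are hypothetical.

```go
package main

import "fmt"

// annotationRestoreName is a made-up key used only in this sketch.
const annotationRestoreName = "example.com/restore-name"

// pvc is a simplified stand-in for a PersistentVolumeClaim.
type pvc struct {
	Annotations map[string]string
}

type action string

const (
	actionCreate action = "create" // PVC is absent: create it from the snapshot
	actionKeep   action = "keep"   // PVC already belongs to this restore
	actionDelete action = "delete" // PVC predates this restore: remove it first
)

// decide picks the next step for one PVC. A nil PVC means the get call
// reported "not found".
func decide(existing *pvc, restoreName string) action {
	switch {
	case existing == nil:
		return actionCreate
	case existing.Annotations[annotationRestoreName] == restoreName:
		return actionKeep
	default:
		return actionDelete
	}
}

func main() {
	restored := &pvc{Annotations: map[string]string{annotationRestoreName: "my-snapshot-restore"}}
	stale := &pvc{}

	fmt.Println(decide(nil, "my-snapshot-restore"))      // create
	fmt.Println(decide(restored, "my-snapshot-restore")) // keep
	fmt.Println(decide(stale, "my-snapshot-restore"))    // delete
}
```

Note that the marker has to hold the restore name for the `keep` branch to ever match, which is the same inconsistency flagged in the earlier comment about the annotation value.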

return false, nil
}

if err := r.client.Delete(ctx, observedPVC); err != nil {
Contributor:

it'd be nice to log here that we are deleting the PVC

Copilot AI review requested due to automatic review settings March 9, 2026 06:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 54 changed files in this pull request and generated 5 comments.


mode: requireTLS
backup:
enabled: true
image: perconalab/percona-server-mongodb-operator:main-backup"
Copilot AI, Mar 9, 2026:

The backup image value has an extra trailing double-quote (...:main-backup"), which will be treated as part of the image reference and likely cause image pull failures in this E2E test. Remove the stray quote or set the intended image reference.

Suggested change
image: perconalab/percona-server-mongodb-operator:main-backup"
image: perconalab/percona-server-mongodb-operator:main-backup

Comment thread deploy/cw-rbac.yaml
- get
- list
- watch
- create
Copilot AI, Mar 9, 2026:

The ClusterRole in cw-rbac grants create/get/list/watch for VolumeSnapshots but not delete. The operator deletes VolumeSnapshots during backup finalization (and likely during cleanup), so missing delete will cause RBAC forbidden errors in cluster-wide deployments. Add delete to the verbs for snapshot.storage.k8s.io/volumesnapshots here (mirroring deploy/rbac.yaml).

Suggested change
- create
- create
- delete

Comment thread deploy/cw-bundle.yaml
Comment on lines +26751 to +26759
- apiGroups:
  - snapshot.storage.k8s.io
  resources:
  - volumesnapshots
  verbs:
  - get
  - list
  - watch
  - create
Copilot AI, Mar 9, 2026:

The cw-bundle ClusterRole rule for snapshot.storage.k8s.io/volumesnapshots is missing the delete verb. The operator code deletes VolumeSnapshots (e.g., during backup finalizer cleanup), so this bundle will fail with RBAC forbidden in cluster-wide installs. Include delete in the verbs list here (keep in sync with deploy/bundle.yaml and deploy/rbac.yaml).

egegunes previously approved these changes Mar 9, 2026
gkech previously approved these changes Mar 9, 2026
Copilot AI review requested due to automatic review settings March 9, 2026 11:29
@mayankshah1607 mayankshah1607 dismissed stale reviews from gkech and egegunes via 92ec618 March 9, 2026 11:29
@mayankshah1607 mayankshah1607 requested review from egegunes and gkech March 9, 2026 11:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 53 out of 54 changed files in this pull request and generated 2 comments.


Comment on lines +314 to +316
if sfs.Spec.Replicas != nil && *sfs.Spec.Replicas == 0 {
	return nil
}
Copilot AI, Mar 9, 2026:

In snapshot-restore scale-down reconciliation, the early return when sfs.Spec.Replicas is already 0 skips configuring the pod template (pbm-agent command/args, probes, volumes/mounts). If a StatefulSet is already scaled down (manually or by a prior step) the restore can proceed with an unmodified template and later scale-up will start mongod instead of pbm-agent restore-finish. Consider removing this shortcut and instead patching when the template is not yet in the expected restore-finish configuration (e.g., check container command/args) regardless of replicas count.
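
The check suggested here, deciding by the template's state rather than by the replica count alone, might look like the sketch below. The command path and the `restore-finish` argument come from the snippet quoted earlier in this review; the `container` struct, `configuredForRestoreFinish`, and `needsPatch` are hypothetical, and `mongod` as the original command is an assumption.

```go
package main

import "fmt"

// container is a simplified stand-in for corev1.Container.
type container struct {
	Command []string
	Args    []string
}

// configuredForRestoreFinish reports whether the container already runs
// pbm-agent with the restore-finish subcommand.
func configuredForRestoreFinish(c container) bool {
	return len(c.Command) > 0 && c.Command[0] == "/opt/percona/pbm-agent" &&
		len(c.Args) > 0 && c.Args[0] == "restore-finish"
}

// needsPatch returns true unless the StatefulSet is both scaled to zero
// and already carrying the restore-finish template.
func needsPatch(replicas *int32, c container) bool {
	scaledDown := replicas != nil && *replicas == 0
	return !scaledDown || !configuredForRestoreFinish(c)
}

func main() {
	zero := int32(0)
	mongod := container{Command: []string{"mongod"}}
	agent := container{
		Command: []string{"/opt/percona/pbm-agent"},
		Args:    []string{"restore-finish", "my-backup"},
	}

	// Scaled down by hand but still running mongod: must be patched.
	fmt.Println(needsPatch(&zero, mongod)) // true
	// Scaled down and already reconfigured: nothing to do.
	fmt.Println(needsPatch(&zero, agent)) // false
}
```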

Comment on lines +112 to +118
if len(bcp.Status.Snapshots) > 0 {
	if err := r.validateSnapshotExistence(ctx, bcp); err != nil {
		return errors.Wrap(err, "validate snapshot existence")
	}
}

return nil
Copilot AI, Mar 9, 2026:

validateExternalBackup silently accepts external backups with an empty .status.snapshots list. Since the restore flow later requires snapshots per replset (and the controller only allows restores when the backup is Ready), this should fail fast with a clear validation error (or explicitly wait) when snapshots are missing, rather than letting the restore proceed and error deeper in reconciliation.
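
A fail-fast version of this validation could reject an empty snapshot list up front and then check each replset. The sketch below assumes a hypothetical `ready` callback in place of a real VolumeSnapshot lookup; `snapshotInfo` is a simplified stand-in for the status type, and the function signature and error messages are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotInfo is a simplified stand-in for the backup status entry.
type snapshotInfo struct {
	ReplsetName  string
	SnapshotName string
}

// validateExternalBackup fails fast when snapshots are missing.
// ready is a hypothetical lookup that reports whether a VolumeSnapshot
// exists and is ready to use.
func validateExternalBackup(snapshots []snapshotInfo, replsets []string, ready func(name string) bool) error {
	if len(snapshots) == 0 {
		return errors.New("external backup has no snapshots in status")
	}
	byReplset := make(map[string][]string)
	for _, s := range snapshots {
		byReplset[s.ReplsetName] = append(byReplset[s.ReplsetName], s.SnapshotName)
	}
	for _, rs := range replsets {
		names := byReplset[rs]
		if len(names) == 0 {
			return fmt.Errorf("no snapshot recorded for replset %q", rs)
		}
		for _, n := range names {
			if !ready(n) {
				return fmt.Errorf("snapshot %q for replset %q is missing or not ready", n, rs)
			}
		}
	}
	return nil
}

func main() {
	ready := func(name string) bool { return name == "my-cluster-rs0-0" }
	snaps := []snapshotInfo{{ReplsetName: "rs0", SnapshotName: "my-cluster-rs0-0"}}

	fmt.Println(validateExternalBackup(snaps, []string{"rs0"}, ready)) // <nil>
	fmt.Println(validateExternalBackup(nil, []string{"rs0"}, ready))   // external backup has no snapshots in status
}
```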

stg, err := r.getPBMStorage(ctx, cluster, cr)
if err != nil {
	return errors.Wrap(err, "get storage")
}
Contributor:

If I understand correctly, for external backups getPBMStorage will fail since there's no object storage.

Member Author:

Good catch, fixed in b00ffe3

Comment thread deploy/backup/backup.yaml
clusterName: my-cluster-name
storageName: s3-us-west
# volumeSnapshotClass: YOUR-VOLUME-SNAPSHOT-CLASS
# type: physical
Contributor:

Can we improve this file, or create a separate file for external backups?
As I understand it, we don't need storageName for an external backup, and having all options in one file can be confusing.

Member Author:

As discussed in the DS, let's keep 1 file for simplicity

}
return nil
case len(cr.Status.Snapshots) > 0:
if err := r.deleteVolumeSnapshots(ctx, cr); err != nil {
Contributor:

Why do we use len here, and not
case cr.Status.Type == defs.ExternalBackup?

Member Author:

No particular reason, just to be consistent with the other checks. It should yield the same result nevertheless. Do you think we need to check the type?

Contributor:

It's minor, but yes. In other places we checked type, so it looks reasonable.

Contributor:

I believe that, for consistency with the other checks, it could be case cr.Status.Snapshots != nil
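
The two slice checks discussed in this thread are not equivalent in Go: a slice can be non-nil and still empty. The short sketch below shows where they diverge. `snapshotInfo` is a simplified stand-in for the status type, and both helper functions are hypothetical.

```go
package main

import "fmt"

// snapshotInfo is a simplified stand-in for the status entry type.
type snapshotInfo struct{ ReplsetName, SnapshotName string }

// hasSnapshotsByLen is true only when at least one snapshot is recorded.
func hasSnapshotsByLen(s []snapshotInfo) bool { return len(s) > 0 }

// hasSnapshotsByNil is true for any allocated slice, including an empty one.
func hasSnapshotsByNil(s []snapshotInfo) bool { return s != nil }

func main() {
	var unset []snapshotInfo   // nil slice
	empty := []snapshotInfo{}  // non-nil slice with zero elements

	fmt.Println(hasSnapshotsByLen(unset), hasSnapshotsByNil(unset)) // false false
	fmt.Println(hasSnapshotsByLen(empty), hasSnapshotsByNil(empty)) // false true
}
```

With the `len` form, an empty list never enters the snapshot-deletion branch; with the `!= nil` form it can, so the choice matters whenever the status field may hold an empty, non-nil slice.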

@JNKPercona (Collaborator):

Test Name Result Time
arbiter passed 00:11:07
balancer passed 00:18:44
cross-site-sharded passed 00:18:32
custom-replset-name passed 00:10:18
custom-tls passed 00:13:55
custom-users-roles passed 00:10:08
custom-users-roles-sharded passed 00:11:31
data-at-rest-encryption passed 00:12:37
data-sharded passed 00:23:32
demand-backup passed 00:15:25
demand-backup-eks-credentials-irsa passed 00:00:08
demand-backup-fs passed 00:22:58
demand-backup-if-unhealthy passed 00:10:36
demand-backup-incremental-aws passed 00:12:02
demand-backup-incremental-azure passed 00:11:53
demand-backup-incremental-gcp-native passed 00:11:43
demand-backup-incremental-gcp-s3 passed 00:10:57
demand-backup-incremental-minio passed 00:25:34
demand-backup-incremental-sharded-aws passed 00:19:15
demand-backup-incremental-sharded-azure passed 00:18:25
demand-backup-incremental-sharded-gcp-native passed 00:17:43
demand-backup-incremental-sharded-gcp-s3 passed 00:17:37
demand-backup-incremental-sharded-minio passed 00:27:57
demand-backup-physical-parallel passed 00:08:25
demand-backup-physical-aws passed 00:12:54
demand-backup-physical-azure passed 00:11:42
demand-backup-physical-gcp-s3 passed 00:11:36
demand-backup-physical-gcp-native passed 00:11:55
demand-backup-physical-minio passed 00:21:06
demand-backup-physical-minio-native passed 00:26:32
demand-backup-physical-minio-native-tls passed 00:19:35
demand-backup-physical-sharded-parallel passed 00:11:20
demand-backup-physical-sharded-aws passed 00:19:12
demand-backup-physical-sharded-azure passed 00:17:49
demand-backup-physical-sharded-gcp-native passed 00:17:57
demand-backup-physical-sharded-minio passed 00:17:10
demand-backup-physical-sharded-minio-native passed 00:17:35
demand-backup-sharded passed 00:26:25
demand-backup-snapshot passed 00:36:17
disabled-auth passed 00:16:26
expose-sharded passed 00:33:50
finalizer passed 00:09:59
ignore-labels-annotations passed 00:07:41
init-deploy passed 00:13:03
ldap passed 00:08:22
ldap-tls passed 00:12:54
limits passed 00:06:06
liveness passed 00:09:03
mongod-major-upgrade passed 00:11:57
mongod-major-upgrade-sharded passed 00:22:16
monitoring-2-0 passed 00:24:53
monitoring-pmm3 passed 00:25:45
multi-cluster-service passed 00:13:06
multi-storage passed 00:18:37
non-voting-and-hidden passed 00:16:10
one-pod passed 00:07:34
operator-self-healing-chaos passed 00:12:26
pitr passed 00:31:43
pitr-physical passed 01:01:52
pitr-sharded passed 00:20:58
pitr-to-new-cluster passed 00:25:59
pitr-physical-backup-source passed 00:54:11
preinit-updates passed 00:05:09
pvc-auto-resize passed 00:13:49
pvc-resize passed 00:16:20
recover-no-primary passed 00:26:33
replset-overrides passed 00:18:20
replset-remapping passed 00:17:13
replset-remapping-sharded passed 00:17:21
rs-shard-migration passed 00:14:01
scaling passed 00:11:01
scheduled-backup passed 00:17:23
security-context passed 00:06:58
self-healing-chaos passed 00:15:08
service-per-pod passed 00:18:47
serviceless-external-nodes passed 00:07:15
smart-update passed 00:08:05
split-horizon passed 00:13:56
stable-resource-version passed 00:04:37
storage passed 00:07:30
tls-issue-cert-manager passed 00:29:58
unsafe-psa passed 00:07:59
upgrade passed 00:09:44
upgrade-consistency passed 00:06:21
upgrade-consistency-sharded-tls passed 00:53:05
upgrade-sharded passed 00:19:12
upgrade-partial-backup passed 00:15:39
users passed 00:17:06
users-vault passed 00:13:05
version-service passed 00:26:13
Summary Value
Tests Run 90/90
Job Duration 02:41:17
Total Test Time 25:43:31

commit: b00ffe3
image: perconalab/percona-server-mongodb-operator:PR-2247-b00ffe36c

@hors hors merged commit b58045d into main Mar 10, 2026
21 checks passed
@hors hors deleted the K8SPSMDB-1363 branch March 10, 2026 08:39
Labels

dependencies Pull requests that update a dependency file size/XXL 1000+ lines tests

7 participants