Conversation
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
There was a problem hiding this comment.
Pull request overview
This PR implements snapshot-based backups for Percona Server for MongoDB, adding support for Kubernetes VolumeSnapshots as an alternative backup mechanism. The implementation introduces a new backup executor interface to handle different backup types (managed vs snapshot-based), integrates with the Kubernetes CSI snapshot API, and updates the CRD to support the new external backup type with volume snapshot configuration.
Changes:
- Introduces
backupExecutorinterface to support multiple backup implementations (managed and snapshot-based) - Adds VolumeSnapshot support for external backups via new
snapshot.gocontroller - Updates API types to include
VolumeSnapshotClassfield andSnapshotInfostatus
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/apis/psmdb/v1/perconaservermongodbbackup_types.go | Adds external backup type, VolumeSnapshotClass field, and SnapshotInfo struct to track volume snapshots |
| pkg/controller/perconaservermongodbbackup/snapshot.go | New file implementing snapshot-based backup logic including VolumeSnapshot creation and reconciliation |
| pkg/controller/perconaservermongodbbackup/backup.go | Refactors existing backup logic into managedBackups type and introduces backupExecutor interface |
| pkg/controller/perconaservermongodbbackup/psmdb_backup_controller.go | Updates controller to select backup executor based on backup type and configuration |
| pkg/psmdb/backup/pbm.go | Adds GetBackupByName and FinishBackup methods, wraps credentials with MaskedString for security |
| pkg/naming/naming.go | Adds VolumeSnapshotName function to generate snapshot resource names |
| deploy/rbac.yaml, deploy/cw-rbac.yaml | Grants operator permissions to create and manage VolumeSnapshot resources |
| config/crd/bases/.yaml, deploy/.yaml | Updates CRD definitions to include new backup type and snapshot fields |
| cmd/manager/main.go | Registers VolumeSnapshot v1 API scheme |
| go.mod, go.sum | Updates Go version and PBM dependency version, adds kubernetes-csi/external-snapshotter client |
| deploy/bundle.yaml | Contains deployment configuration with modified operator image reference |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| type SnapshotInfo struct { | ||
| NodeName string `json:"nodeName,omitempty"` | ||
| SnapshotName string `json:"snapshotName,omitempty"` | ||
| } |
There was a problem hiding this comment.
The SnapshotInfo struct lacks documentation. Add a comment describing what this struct represents (e.g., "SnapshotInfo contains information about a volume snapshot created for a MongoDB node during an external backup").
| func (p *PerconaServerMongoDBBackup) CheckFields() error { | ||
| if len(p.Spec.StorageName) == 0 { | ||
| if len(p.Spec.StorageName) == 0 && p.Spec.Type != defs.ExternalBackup { | ||
| return fmt.Errorf("spec storageName field is empty") | ||
| } |
There was a problem hiding this comment.
The CheckFields method allows external backups without a VolumeSnapshotClass, but the controller only creates snapshot backups when both Type is ExternalBackup AND VolumeSnapshotClass is set. This could lead to a confusing scenario where a user creates an external backup without a VolumeSnapshotClass, and it falls through to the default managed backup path. Consider adding validation to ensure that if Type is ExternalBackup, VolumeSnapshotClass must be specified, or document this behavior clearly.
| // VolumeSnapshotClass is the name of the VolumeSnapshotClass to use for snapshot based backups. | ||
| // This may be specified only when type is `external`. | ||
| // +kubebuilder:validation:Optional | ||
| VolumeSnapshotClass *string `json:"volumeSnapshotClass,omitempty"` |
There was a problem hiding this comment.
The documentation comment on lines 22-23 states that VolumeSnapshotClass "may be specified only when type is external", but there's no validation enforcing this constraint. Either add validation in the CheckFields method to ensure this is only set when Type is ExternalBackup, or add a kubebuilder validation marker to enforce this at the CRD level.
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
| "github.com/percona/percona-backup-mongodb/pbm/ctrl" | ||
| "github.com/percona/percona-backup-mongodb/pbm/defs" | ||
| pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors" |
There was a problem hiding this comment.
[goimports-reviser] reported by reviewdog 🐶
| "github.com/percona/percona-backup-mongodb/pbm/ctrl" | |
| "github.com/percona/percona-backup-mongodb/pbm/defs" | |
| pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors" |
There was a problem hiding this comment.
[goimports-reviser] reported by reviewdog 🐶
percona-server-mongodb-operator/cmd/manager/main.go
Lines 10 to 13 in 603f322
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
603f322 to
7230fea
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
pkg/apis/psmdb/v1/perconaservermongodbbackup_types.go:133
- The CheckFields validation does not verify that
VolumeSnapshotClassis specified whenTypeisexternal. This could allow users to create external backup resources without specifying the required VolumeSnapshotClass, leading to failures later when the operator tries to create snapshots. Consider adding validation to require VolumeSnapshotClass when Type is ExternalBackup.
func (p *PerconaServerMongoDBBackup) CheckFields() error {
if len(p.Spec.StorageName) == 0 && p.Spec.Type != defs.ExternalBackup {
return fmt.Errorf("spec storageName field is empty")
}
if len(p.Spec.GetClusterName()) == 0 {
return fmt.Errorf("spec clusterName is empty")
}
if string(p.Spec.Type) == "" {
p.Spec.Type = defs.LogicalBackup
}
if string(p.Spec.Compression) == "" {
p.Spec.Compression = compress.CompressionTypeGZIP
}
return nil
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if bcp.Spec.Type == defs.ExternalBackup { | ||
| // TODO: should we check that snapshots exist? | ||
| return nil |
There was a problem hiding this comment.
For external (snapshot-based) backups, the validation skips checking if snapshots exist (as noted in the TODO). This could allow restore operations to proceed when the required VolumeSnapshots are missing or not ready, which would cause the restore to fail later in the process. Consider implementing a validation check to verify that all required snapshots exist and are in a ready state before allowing the restore to proceed.
| - get | ||
| - list | ||
| - watch | ||
| - create |
There was a problem hiding this comment.
The RBAC permissions for VolumeSnapshots are missing delete and update verbs. While the operator creates snapshots during backup, it may also need to clean them up when backups are deleted (as part of the finalizer logic), and potentially update snapshot metadata. Consider adding delete permission at minimum for proper cleanup. Review whether update and patch permissions are also needed for snapshot management.
| - create | |
| - create | |
| - update | |
| - patch | |
| - delete |
| Name: snapshotName, | ||
| } | ||
| pvc.SetAnnotations(map[string]string{ | ||
| naming.AnnotationRestoreName: snapshotName, |
There was a problem hiding this comment.
The annotation value is set to the snapshot name instead of the restore name. This is inconsistent with the check on line 423, where it compares restoreName == restore.Name. This means that if the snapshot name doesn't match the restore name, the PVC will be deleted and recreated unnecessarily. Consider using restore.Name here to ensure consistency.
| sfs.Spec.Template.Spec.Containers[0].Command = []string{"/opt/percona/pbm-agent"} | ||
| sfs.Spec.Template.Spec.Containers[0].Args = []string{ | ||
| "restore-finish", | ||
| restore.Status.PBMname, | ||
| "-c", "/etc/pbm/pbm_config.yaml", | ||
| "--rs", "$(MONGODB_REPLSET)", | ||
| "--node", "$(POD_NAME).$(SERVICE_NAME)-$(MONGODB_REPLSET).$(NAMESPACE).svc.cluster.local", | ||
| // "--db-config", "/etc/pbm/db-config.yaml", // TODO | ||
| } |
There was a problem hiding this comment.
Potential index out of bounds error. The code assumes that sfs.Spec.Template.Spec.Containers[0] exists, but there's no check to verify that the Containers slice has at least one element. If the StatefulSet has no containers (which would be unusual but possible in error scenarios), this will panic. Consider adding a length check or finding the container by name instead of assuming index 0.
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
d66f69d to
9713dbc
Compare
| "sigs.k8s.io/controller-runtime/pkg/client" | ||
| logf "sigs.k8s.io/controller-runtime/pkg/log" | ||
|
|
||
| api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1" |
There was a problem hiding this comment.
[goimports-reviser] reported by reviewdog 🐶
| api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1" | |
| "github.com/percona/percona-backup-mongodb/pbm/ctrl" | |
| "github.com/percona/percona-backup-mongodb/pbm/defs" | |
| pbmErrors "github.com/percona/percona-backup-mongodb/pbm/errors" | |
| api "github.com/percona/percona-server-mongodb-operator/pkg/apis/psmdb/v1" |
| Namespace: cr.Namespace, | ||
| }, | ||
| } | ||
| err := r.client.Delete(ctx, snapshot) |
There was a problem hiding this comment.
it'd be nice to log here that we are deleting the snapshot
| if err := r.client.Delete(ctx, observedPVC); err != nil { | ||
| return false, errors.Wrapf(err, "delete pvc %s", pvcName) | ||
| } |
There was a problem hiding this comment.
I don't see any difference in the logic :/ Doesn't it still delete all the PVCs first and recreate?
Current logic looks like:
for pvc in pvcs:
get pvc
if get fails -> recreate
if get succeeds -> delete
get will succeed for all PVCs when the for loop executed first
| return false, nil | ||
| } | ||
|
|
||
| if err := r.client.Delete(ctx, observedPVC); err != nil { |
There was a problem hiding this comment.
it'd be nice to log here that we are deleting the PVC
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 53 out of 54 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| mode: requireTLS | ||
| backup: | ||
| enabled: true | ||
| image: perconalab/percona-server-mongodb-operator:main-backup" |
There was a problem hiding this comment.
The backup image value has an extra trailing double-quote (...:main-backup"), which will be treated as part of the image reference and likely cause image pull failures in this E2E test. Remove the stray quote or set the intended image reference.
| image: perconalab/percona-server-mongodb-operator:main-backup" | |
| image: perconalab/percona-server-mongodb-operator:main-backup |
| mode: requireTLS | ||
| backup: | ||
| enabled: true | ||
| image: perconalab/percona-server-mongodb-operator:main-backup" |
There was a problem hiding this comment.
The backup image value has an extra trailing double-quote (...:main-backup"), which will be treated as part of the image reference and likely cause image pull failures in this E2E test. Remove the stray quote or set the intended image reference.
| image: perconalab/percona-server-mongodb-operator:main-backup" | |
| image: perconalab/percona-server-mongodb-operator:main-backup |
| mode: requireTLS | ||
| backup: | ||
| enabled: true | ||
| image: perconalab/percona-server-mongodb-operator:main-backup" |
There was a problem hiding this comment.
The backup image value has an extra trailing double-quote (...:main-backup"), which will be treated as part of the image reference and likely cause image pull failures in this E2E test. Remove the stray quote or set the intended image reference.
| image: perconalab/percona-server-mongodb-operator:main-backup" | |
| image: perconalab/percona-server-mongodb-operator:main-backup |
| - get | ||
| - list | ||
| - watch | ||
| - create |
There was a problem hiding this comment.
The ClusterRole in cw-rbac grants create/get/list/watch for VolumeSnapshots but not delete. The operator deletes VolumeSnapshots during backup finalization (and likely during cleanup), so missing delete will cause RBAC forbidden errors in cluster-wide deployments. Add delete to the verbs for snapshot.storage.k8s.io/volumesnapshots here (mirroring deploy/rbac.yaml).
| - create | |
| - create | |
| - delete |
| - apiGroups: | ||
| - snapshot.storage.k8s.io | ||
| resources: | ||
| - volumesnapshots | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - create |
There was a problem hiding this comment.
The cw-bundle ClusterRole rule for snapshot.storage.k8s.io/volumesnapshots is missing the delete verb. The operator code deletes VolumeSnapshots (e.g., during backup finalizer cleanup), so this bundle will fail with RBAC forbidden in cluster-wide installs. Include delete in the verbs list here (keep in sync with deploy/bundle.yaml and deploy/rbac.yaml).
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 53 out of 54 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if sfs.Spec.Replicas != nil && *sfs.Spec.Replicas == 0 { | ||
| return nil | ||
| } |
There was a problem hiding this comment.
In snapshot-restore scale-down reconciliation, the early return when sfs.Spec.Replicas is already 0 skips configuring the pod template (pbm-agent command/args, probes, volumes/mounts). If a StatefulSet is already scaled down (manually or by a prior step) the restore can proceed with an unmodified template and later scale-up will start mongod instead of pbm-agent restore-finish. Consider removing this shortcut and instead patching when the template is not yet in the expected restore-finish configuration (e.g., check container command/args) regardless of replicas count.
| if len(bcp.Status.Snapshots) > 0 { | ||
| if err := r.validateSnapshotExistence(ctx, bcp); err != nil { | ||
| return errors.Wrap(err, "validate snapshot existence") | ||
| } | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
validateExternalBackup silently accepts external backups with an empty .status.snapshots list. Since the restore flow later requires snapshots per replset (and the controller only allows restores when the backup is Ready), this should fail fast with a clear validation error (or explicitly wait) when snapshots are missing, rather than letting the restore proceed and error deeper in reconciliation.
| stg, err := r.getPBMStorage(ctx, cluster, cr) | ||
| if err != nil { | ||
| return errors.Wrap(err, "get storage") | ||
| } |
There was a problem hiding this comment.
If I understand correctly, for external backups getPBMStorage will fail since there's no object storage.
| clusterName: my-cluster-name | ||
| storageName: s3-us-west | ||
| # volumeSnapshotClass: YOUR-VOLUME-SNAPSHOT-CLASS | ||
| # type: physical |
There was a problem hiding this comment.
Can we improve this file? or create separate file for external?
As I understand we don't need storageName for external backup and all options in one file can be confusing.
There was a problem hiding this comment.
As discussed in the DS, let's keep 1 file for simplicity
| } | ||
| return nil | ||
| case len(cr.Status.Snapshots) > 0: | ||
| if err := r.deleteVolumeSnapshots(ctx, cr); err != nil { |
There was a problem hiding this comment.
Why we use here len, not
case cr.Status.Type == defs.ExternalBackup
There was a problem hiding this comment.
No particular reason, just to be consistent with the other checks. It should yield the same result nevertheless. Do you think we need to check the type?
There was a problem hiding this comment.
It's minor, but yes. In other places we checked type, so it looks reasonable.
There was a problem hiding this comment.
I believe consistency with other checks it could be case cr.Status.Snapshots != nil
Signed-off-by: Mayank Shah <mayank.shah@percona.com>
commit: b00ffe3 |
CHANGE DESCRIPTION
This PR adds support for volume snapshot (external) backups in the Percona Server MongoDB Operator. Users can create CSI-based volume snapshots of MongoDB data volumes and restore from them, without using object storage.
Changes Overview
1. New Backup Type:
externalA new backup type
externalis added for volume snapshot backups. It uses the Kubernetes CSI Volume Snapshot API instead of PBM object storage.Supported backup types:
logical– logical (mongodump-style) backupsphysical– physical backups to object storageincremental/incremental-base– incremental backupsexternal– volume snapshot backups (new)2. CRD and API Changes
PerconaServerMongoDBBackup
Spec:
type: new enum valueexternalvolumeSnapshotClass: (optional) name of theVolumeSnapshotClassused for snapshot backups; required whentypeisexternalstorageName: optional whentypeisexternal(no object storage)Status:
snapshots: array ofSnapshotInfowithreplsetNameandsnapshotNamefor each replsetPerconaServerMongoDBRestore
Status:
conditions: array ofmetav1.Conditionfor snapshot restore phases:PBMAgentConfiguredForSnapshotReplsetPVCsRestoredFromSnapshotPBMAgentAwaitingRestoreFinishPBMRestoreFinishingPBMRestoreFinishedPerconaServerMongoDB (scheduled backup tasks)
BackupTaskSpec:
type: new enum valueexternalvolumeSnapshotClass: (optional) name of theVolumeSnapshotClass; required whentypeisexternalexternaltasks,storageNameis not required3. Examples
Demand backup (PerconaServerMongoDBBackup)
Restore from snapshot backup
Or with
backupSource(e.g. cross-cluster restore):Scheduled snapshot backup task (PerconaServerMongoDB)
CHECKLIST
Jira
Needs Doc) and QA (Needs QA)?Tests
compare/*-oc.yml)?Config/Logging/Testability