
CLOUD-727 fix OOMKill in demand-backup-if-unhealthy #2309

Merged

hors merged 1 commit into main from fix-main-build on Apr 22, 2026
Conversation

@egegunes
Contributor

CHANGE DESCRIPTION

Problem:
The demand-backup-if-unhealthy E2E test was failing because its mongod containers were OOMKilled.

Cause:
The memory limit in the test's PerconaServerMongoDB CR (300Mi) was too low for the workload, so the kernel killed containers that exceeded it.

Solution:
Remove the CPU/memory resources.limits from the demand-backup-if-unhealthy E2E CR config, keeping the existing resources.requests unchanged.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support the oldest and newest supported MongoDB versions?
  • Does the change support the oldest and newest supported Kubernetes versions?

Copilot AI review requested due to automatic review settings April 10, 2026 07:22
@pull-request-size pull-request-size Bot added the size/XS 0-9 lines label Apr 10, 2026
@github-actions github-actions Bot added the tests label Apr 10, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to prevent OOMKills in the demand-backup-if-unhealthy E2E scenario by adjusting the PerconaServerMongoDB CR used by that test.

Changes:

  • Removed CPU/memory resources.limits from the demand-backup-if-unhealthy E2E CR config.
  • Kept the existing CPU/memory resources.requests unchanged.
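
For context, the resulting resources section of the test CR would look roughly like this (a sketch reconstructed from the diff context below; the surrounding keys and exact indentation are assumptions, not the verbatim file contents):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  # limits: removed by this PR, so the container can no longer be
  # OOMKilled for exceeding a per-container memory limit (node-level
  # memory pressure can still evict or kill the pod)
```

Without limits, the pod falls into the Burstable QoS class instead of Guaranteed, which is the trade-off the review comment below discusses.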


```yaml
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 100Mi
```

Copilot AI Apr 10, 2026


This test CR now sets only resources.requests and removes resources.limits. Most other e2e CR configs in this repo set both requests and limits (e.g., e2e-tests/conf/some-name-rs0.yml and e2e-tests/demand-backup-incremental-aws/conf/some-name.yml). Dropping limits can make the e2e test less portable (namespaces with a LimitRange that requires limits will reject the workload) and can allow unbounded memory usage on CI nodes. Consider keeping limits but raising the memory limit to avoid OOMKills (and potentially adjusting requests to a more realistic baseline).
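
The LimitRange concern can be illustrated with a sketch. A namespace admin could apply something like the following (the name and values here are illustrative, not from this repo); containers that omit limits then get the defaults injected, and containers requesting more than `max` are rejected at admission:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range   # illustrative name
spec:
  limits:
    - type: Container
      default:            # injected as limits when a container omits them
        memory: 512Mi
      defaultRequest:     # injected as requests when omitted
        memory: 256Mi
      max:                # containers asking for more than this are rejected
        memory: 1Gi
```

In such a namespace, the limit-less test CR would not run unbounded: it would silently pick up the 512Mi default, which could reintroduce the OOMKill this PR is trying to avoid.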

Suggested change:

```diff
-      memory: 100Mi
+      memory: 100Mi
+    limits:
+      cpu: 300m
+      memory: 1Gi
```

@JNKPercona
Collaborator

Test Name Result Time
arbiter passed 00:00:00
balancer passed 00:00:00
cross-site-sharded passed 00:00:00
custom-replset-name passed 00:00:00
custom-tls passed 00:00:00
custom-users-roles passed 00:00:00
custom-users-roles-sharded passed 00:00:00
data-at-rest-encryption passed 00:00:00
data-sharded passed 00:00:00
demand-backup passed 00:00:00
demand-backup-eks-credentials-irsa passed 00:00:00
demand-backup-fs passed 00:00:00
demand-backup-if-unhealthy passed 00:00:00
demand-backup-incremental-aws passed 00:00:00
demand-backup-incremental-azure passed 00:00:00
demand-backup-incremental-gcp-native passed 00:00:00
demand-backup-incremental-gcp-s3 passed 00:00:00
demand-backup-incremental-minio passed 00:00:00
demand-backup-incremental-sharded-aws passed 00:00:00
demand-backup-incremental-sharded-azure passed 00:00:00
demand-backup-incremental-sharded-gcp-native passed 00:00:00
demand-backup-incremental-sharded-gcp-s3 passed 00:00:00
demand-backup-incremental-sharded-minio passed 00:00:00
demand-backup-logical-minio-native-tls passed 00:00:00
demand-backup-physical-parallel passed 00:00:00
demand-backup-physical-aws passed 00:00:00
demand-backup-physical-azure passed 00:00:00
demand-backup-physical-gcp-s3 passed 00:00:00
demand-backup-physical-gcp-native passed 00:00:00
demand-backup-physical-minio passed 00:00:00
demand-backup-physical-minio-native passed 00:00:00
demand-backup-physical-minio-native-tls passed 00:00:00
demand-backup-physical-sharded-parallel passed 00:00:00
demand-backup-physical-sharded-aws passed 00:00:00
demand-backup-physical-sharded-azure passed 00:00:00
demand-backup-physical-sharded-gcp-native passed 00:00:00
demand-backup-physical-sharded-minio passed 00:00:00
demand-backup-physical-sharded-minio-native passed 00:00:00
demand-backup-sharded passed 00:00:00
demand-backup-snapshot passed 00:00:00
demand-backup-snapshot-vault passed 00:00:00
disabled-auth passed 00:00:00
expose-sharded passed 00:00:00
finalizer passed 00:00:00
ignore-labels-annotations passed 00:00:00
init-deploy passed 00:00:00
ldap passed 00:00:00
ldap-tls passed 00:00:00
limits passed 00:00:00
liveness passed 00:00:00
mongod-major-upgrade passed 00:00:00
mongod-major-upgrade-sharded passed 00:00:00
monitoring-2-0 passed 00:00:00
monitoring-pmm3 passed 00:00:00
multi-cluster-service passed 00:00:00
multi-storage passed 00:00:00
non-voting-and-hidden passed 00:00:00
one-pod passed 00:00:00
operator-self-healing-chaos passed 00:00:00
pitr passed 00:00:00
pitr-physical passed 00:00:00
pitr-sharded passed 00:00:00
pitr-to-new-cluster passed 00:00:00
pitr-physical-backup-source passed 00:00:00
preinit-updates passed 00:00:00
pvc-auto-resize passed 00:00:00
pvc-resize passed 00:00:00
recover-no-primary passed 00:00:00
replset-overrides passed 00:00:00
replset-remapping passed 00:00:00
replset-remapping-sharded passed 00:00:00
rs-shard-migration passed 00:00:00
scaling passed 00:00:00
scheduled-backup passed 00:00:00
security-context passed 00:00:00
self-healing-chaos passed 00:00:00
service-per-pod passed 00:00:00
serviceless-external-nodes passed 00:00:00
smart-update passed 00:00:00
split-horizon passed 00:00:00
stable-resource-version passed 00:00:00
storage passed 00:00:00
tls-issue-cert-manager passed 00:00:00
unsafe-psa passed 00:00:00
upgrade passed 00:00:00
upgrade-consistency passed 00:00:00
upgrade-consistency-sharded-tls passed 00:00:00
upgrade-sharded passed 00:00:00
upgrade-partial-backup passed 00:15:42
users passed 00:00:00
users-vault passed 00:00:00
version-service passed 00:00:00
Summary Value
Tests Run 92/92
Job Duration 00:43:47
Total Test Time 00:15:42

commit: 4eef1de
image: perconalab/percona-server-mongodb-operator:PR-2309-4eef1def

@egegunes egegunes marked this pull request as ready for review April 20, 2026 07:09
@hors hors changed the title from "Fix OOMKill in demand-backup-if-unhealthy" to "CLOUD-727 fix OOMKill in demand-backup-if-unhealthy" Apr 22, 2026
@hors hors merged commit fe0605f into main Apr 22, 2026
24 checks passed
@hors hors deleted the fix-main-build branch April 22, 2026 10:18

4 participants