K8SPSMDB-1504 Fix MongoDB Connection Leaks in PBM Operations #2098

Merged
hors merged 5 commits into main from fix_con
Oct 29, 2025

@hors (Collaborator) commented Oct 28, 2025

CHANGE DESCRIPTION

Problem:

In several places in the code we call pbm, err := backup.NewPBM(ctx, r.client, cr) but never call pbm.Close(ctx). As a result, connections pile up and eventually lead to OOMKilled.

Related issue: https://forums.percona.com/t/percona-operator-for-mongodb-endlessly-spawning-connections-until-oomkilled/39634

Screenshot 2025-10-28 at 18 19 45

Cause:
The PBM connection created by backup.NewPBM(ctx, r.client, cr) is never released with pbm.Close(ctx), so every such call leaves its connections open.

Solution:
Closing each PBM connection with pbm.Close(ctx) once it is no longer needed keeps the connection count stable.

Screenshot 2025-10-28 at 19 04 48

Also adjusted the init-deploy e2e test to verify that the number of connections does not grow beyond a sanity limit.
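The idea behind such a sanity check can be sketched as follows. This is not the e2e test itself: how the test samples the connection count and which limit it uses are not shown in this description, so checkConnections, the sample values and the limit of 100 are all hypothetical.

```go
package main

import "fmt"

// checkConnections returns an error for the first sampled connection count
// that exceeds the sanity limit. Samples and limit are supplied by the caller.
func checkConnections(samples []int, limit int) error {
	for i, n := range samples {
		if n > limit {
			return fmt.Errorf("sample %d: %d connections exceed sanity limit %d", i, n, limit)
		}
	}
	return nil
}

func main() {
	const limit = 100 // hypothetical limit, not the value used by the real test

	stable := []int{40, 42, 41, 43}     // made-up samples: count stays flat
	growing := []int{40, 80, 160, 320} // made-up samples: count keeps growing

	fmt.Println("stable: ", checkConnections(stable, limit))
	fmt.Println("growing:", checkConnections(growing, limit))
}
```

A fixed upper bound is a coarse check, but it is enough to catch unbounded growth of the kind described in the problem statement.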

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size (bot) added the size/XS (0-9 lines) label Oct 28, 2025
@hors changed the title from "fix con" to "K8SPSMDB-1504 Fix MongoDB Connection Leaks in PBM Operations" Oct 28, 2025
@pull-request-size (bot) added the size/L (100-499 lines) label and removed the size/XS (0-9 lines) label Oct 28, 2025
@github-actions (bot) added the tests label Oct 28, 2025
@egegunes previously approved these changes Oct 29, 2025
@gkech requested a review from @egegunes October 29, 2025 08:38
@gkech marked this pull request as ready for review October 29, 2025 08:49
@JNKPercona (Collaborator)

Test Name Result Time
arbiter passed 00:00:00
balancer passed 00:00:00
cross-site-sharded passed 00:00:00
custom-replset-name passed 00:00:00
custom-tls passed 00:00:00
custom-users-roles passed 00:00:00
custom-users-roles-sharded passed 00:00:00
data-at-rest-encryption passed 00:00:00
data-sharded passed 00:00:00
demand-backup passed 00:00:00
demand-backup-eks-credentials-irsa passed 00:00:00
demand-backup-fs passed 00:00:00
demand-backup-if-unhealthy passed 00:00:00
demand-backup-incremental passed 00:00:00
demand-backup-incremental-sharded passed 00:00:00
demand-backup-physical-parallel passed 00:00:00
demand-backup-physical-aws passed 00:00:00
demand-backup-physical-azure passed 00:00:00
demand-backup-physical-gcp-s3 passed 00:00:00
demand-backup-physical-gcp-native passed 00:00:00
demand-backup-physical-minio passed 00:00:00
demand-backup-physical-sharded-parallel passed 00:00:00
demand-backup-physical-sharded-aws passed 00:00:00
demand-backup-physical-sharded-azure passed 00:00:00
demand-backup-physical-sharded-gcp-native passed 00:00:00
demand-backup-physical-sharded-minio passed 00:00:00
demand-backup-sharded passed 00:00:00
expose-sharded passed 00:00:00
finalizer passed 00:00:00
ignore-labels-annotations passed 00:07:10
init-deploy passed 00:00:00
ldap passed 00:00:00
ldap-tls passed 00:00:00
limits passed 00:00:00
liveness passed 00:00:00
mongod-major-upgrade passed 00:00:00
mongod-major-upgrade-sharded passed 00:00:00
monitoring-2-0 passed 00:00:00
monitoring-pmm3 passed 00:00:00
multi-cluster-service passed 00:00:00
multi-storage passed 00:00:00
non-voting-and-hidden passed 00:00:00
one-pod passed 00:00:00
operator-self-healing-chaos passed 00:00:00
pitr passed 00:00:00
pitr-physical passed 00:00:00
pitr-sharded passed 00:00:00
pitr-to-new-cluster passed 00:00:00
pitr-physical-backup-source passed 00:00:00
preinit-updates passed 00:00:00
pvc-resize passed 00:00:00
recover-no-primary passed 00:00:00
replset-overrides passed 00:00:00
rs-shard-migration passed 00:00:00
scaling passed 00:00:00
scheduled-backup passed 00:00:00
security-context passed 00:00:00
self-healing-chaos passed 00:00:00
service-per-pod passed 00:00:00
serviceless-external-nodes passed 00:00:00
smart-update passed 00:00:00
split-horizon passed 00:00:00
stable-resource-version passed 00:00:00
storage passed 00:00:00
tls-issue-cert-manager passed 00:00:00
upgrade passed 00:00:00
upgrade-consistency passed 00:00:00
upgrade-consistency-sharded-tls passed 00:00:00
upgrade-sharded passed 00:00:00
upgrade-partial-backup passed 00:00:00
users passed 00:00:00
version-service passed 00:00:00
Summary Value
Tests Run 72/72
Job Duration 00:36:43
Total Test Time 00:07:10

commit: 8ff553e
image: perconalab/percona-server-mongodb-operator:PR-2098-8ff553e4

@hors merged commit 36d1ae1 into main Oct 29, 2025
17 checks passed
@hors deleted the fix_con branch October 29, 2025 13:29
egegunes pushed a commit that referenced this pull request Oct 29, 2025
* fix con

* fix lint and add test

* log pbm connection closing error & improve error message in e2e test

* uniform error messages for e2e test

---------

Co-authored-by: George Kechagias <george.kechagias@percona.com>