CI: Run some tests with compute-sanitizer #566

Merged
leofang merged 23 commits into NVIDIA:main from carterbox:dching/add-compute-sanitizer-to-ci
Apr 29, 2025

Conversation

@carterbox

@carterbox carterbox commented Apr 22, 2025

Contributor

Description

Runs the Python 3.12 pytest suite under compute-sanitizer to check for memory issues and CUDA API errors.
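
As a rough illustration of what running pytest "under" compute-sanitizer can look like, here is a minimal sketch. The helper names (`build_sanitizer_command`, `run_under_sanitizer`) and the default values are hypothetical and are not taken from this PR's workflow files; the sketch assumes the `compute-sanitizer` executable is on `PATH` and relies only on its `--tool` and `--error-exitcode` options.

```python
import shutil
import subprocess
import sys


def build_sanitizer_command(pytest_args, tool="memcheck", error_exitcode=1):
    """Return an argv list that runs pytest under compute-sanitizer.

    Hypothetical helper: the tool and exit code chosen by the actual CI
    workflow may differ.
    """
    return [
        "compute-sanitizer",
        "--tool", tool,                           # e.g. memcheck, racecheck
        "--error-exitcode", str(error_exitcode),  # exit code used when errors are detected
        sys.executable, "-m", "pytest", *pytest_args,
    ]


def run_under_sanitizer(pytest_args):
    """Run the tests under compute-sanitizer.

    Returns the process exit code, or None when compute-sanitizer is not
    installed on this machine.
    """
    if shutil.which("compute-sanitizer") is None:
        return None
    return subprocess.run(build_sanitizer_command(pytest_args)).returncode
```

For example, `run_under_sanitizer(["cuda_bindings/tests"])` would launch the bindings tests under the memcheck tool on a machine that has the CUDA toolkit installed.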

closes #565
closes #562

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot

copy-pr-bot Bot commented Apr 22, 2025

Contributor

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@leofang

leofang commented Apr 22, 2025

Member

FYI right now we use the mini-CTK approach in the CI:

So compute-sanitizer is currently not available in the CI. But I assume it can be grabbed easily.

@cryos is refactoring our CI (#555). I suggest we perhaps add another standalone pipeline for running compute sanitizer?

@leofang leofang requested review from cryos and leofang April 22, 2025 19:25
@leofang leofang added the P1 (Medium priority - Should do) and CI/CD (CI/CD infrastructure) labels Apr 22, 2025
@cryos

cryos commented Apr 22, 2025

Collaborator

I was going to look at tools like this next; that is a great point and something I can factor in. Looking at the proposal here, picking out a test run would be reasonable. I know there are other tools we would like to run too.

@carterbox

Contributor Author

/ok to test f27e1f4

@leofang

leofang commented Apr 22, 2025

Member

I doubt commit f27e1f4 would work -- we'll see: #571.

@carterbox

Contributor Author

/ok to test 05a7068

@cryos cryos linked an issue Apr 23, 2025 that may be closed by this pull request
@cryos cryos removed a link to an issue Apr 23, 2025
@carterbox

Contributor Author

/ok to test dde857b

@carterbox

carterbox commented Apr 23, 2025

Contributor Author

Yay! The linux-64 tests are failing for the correct reason! (The compute sanitizer returns a non-zero exit code because it has detected issues.)

https://github.com/NVIDIA/cuda-python/actions/runs/14628267586/job/41045781037?pr=566

Windows tests are failing because I have partially disabled them.
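
To make "failing for the correct reason" easier to check from a CI log, one option is to give the sanitizer an error exit code that pytest itself never returns. The sketch below is hypothetical: the value `99` and the function name `explain_exit` are illustrative assumptions, and it assumes compute-sanitizer passes the application's own exit code through when it detects nothing. The pytest codes 0 to 5 are the ones pytest documents.

```python
# Exit codes documented by pytest.
PYTEST_EXIT_MEANINGS = {
    0: "all tests passed",
    1: "some tests failed",
    2: "test run interrupted",
    3: "pytest internal error",
    4: "pytest usage error",
    5: "no tests collected",
}

# Hypothetical value that would be passed as --error-exitcode; chosen so it
# cannot be confused with any pytest exit code above.
SANITIZER_ERROR_EXITCODE = 99


def explain_exit(returncode, sanitizer_code=SANITIZER_ERROR_EXITCODE):
    """Map a job's exit code to a short human-readable explanation."""
    if returncode == sanitizer_code:
        return "compute-sanitizer detected issues"
    return PYTEST_EXIT_MEANINGS.get(
        returncode, f"unexpected exit code {returncode}"
    )
```

With this convention, a job that exits with 99 failed because of the sanitizer, while an exit code of 1 points at an ordinary test failure.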

@carterbox

Contributor Author

/ok to test a1ea51e

@carterbox carterbox force-pushed the dching/add-compute-sanitizer-to-ci branch from a1ea51e to 0430930 Compare April 24, 2025 20:26
@carterbox

Contributor Author

/ok to test 0430930

@leofang leofang added the cuda.bindings Everything related to the cuda.bindings module label Apr 25, 2025
@carterbox

Contributor Author

/ok to test 9c25910

@carterbox

Contributor Author

/ok to test 7fb013e

@carterbox carterbox marked this pull request as ready for review April 25, 2025 22:48
@copy-pr-bot

copy-pr-bot Bot commented Apr 25, 2025

Contributor

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@carterbox carterbox requested a review from leofang April 25, 2025 22:48
Comment thread cuda_bindings/docs/source/environment_variables.md Outdated
Comment thread .github/workflows/test-wheel-linux.yml Outdated
Comment thread .github/workflows/test-wheel-linux.yml Outdated
Comment thread cuda_bindings/tests/test_cuda.py Outdated
@carterbox

Contributor Author

/ok to test 675c41f

@carterbox carterbox requested review from leofang and rwgk April 28, 2025 16:30
@carterbox

Contributor Author

/ok to test 5abd6e3

rwgk previously approved these changes Apr 28, 2025

@rwgk rwgk left a comment

Contributor

Awesome, I'm really happy that we'll be routinely testing with the compute sanitizer.

Comment thread .github/workflows/test-wheel-linux.yml Outdated
@carterbox

Contributor Author

/ok to test 53a01cb

@rwgk rwgk left a comment

Contributor

Thanks @carterbox!

@leofang leofang left a comment

Member

Very neat, thanks!

@cryos are you OK if we merge this now, or if you'd rather wait until we merge the CI refactoring (#555)?

@kkraus14

Collaborator

@cryos are you OK if we merge this now, or if you'd rather wait until we merge the CI refactoring (#555)?

Marcus is on PTO until May 5, so let's merge this and we can sync with him once he's back 😄

@leofang leofang merged commit 034ffbf into NVIDIA:main Apr 29, 2025
@leofang

leofang commented Apr 29, 2025

Member

Right, he told me about it but I forgot... let's merge now.

@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels

  • CI/CD: CI/CD infrastructure
  • cuda.bindings: Everything related to the cuda.bindings module
  • cuda.core: Everything related to the cuda.core module
  • P1: Medium priority - Should do

Projects

None yet

5 participants