Fix: Separate CPU and GPU LLVM optimization pipelines before GPU lowering #793
Open
BI71317 wants to merge 3 commits into exaloop:develop
Resolves the problem described in #792.
What this PR Fixes
This PR changes the pipeline structure so that CPU and GPU modules are separated before their respective optimization flows are applied.
2-pass Optimization
Like the CPU module, the GPU module now goes through 2-pass optimization.
This is not only for performance.
When only 1-pass optimization is run, un-inlined GVs are retained, and these seem to produce invalid PTX.
2-pass
It seems that 2-pass optimization inlines the GVs, and in this case the generated code works correctly.
ApplyGPUTransformation
ApplyGPUTransformation does a lot of work. It clones the module, separates the CPU and GPU modules, runs the NVPTX passes, cleans up the CPU module, and applies patches.
All of these jobs live in a single function, so I think splitting its logic was unavoidable in order to optimize each module separately.
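To make the intended structure easier to picture, here is a minimal toy sketch of the split-then-optimize flow. It is not the actual Codon or LLVM code: ToyModule, splitHostAndDevice, optimize, lowerToPTX, cleanupHost and runPipeline are all hypothetical names, and each step only records that it ran. The ordering of the CPU cleanup relative to CPU optimization is also an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an LLVM module (hypothetical): a name plus a log of applied steps.
struct ToyModule {
  std::string name;
  std::vector<std::string> steps;
};

// Hypothetical step 1: clone the combined module and split it into CPU and GPU parts.
std::pair<ToyModule, ToyModule> splitHostAndDevice(const ToyModule &combined) {
  ToyModule cpu{combined.name + ".cpu", combined.steps};
  ToyModule gpu{combined.name + ".gpu", combined.steps};
  cpu.steps.push_back("split");
  gpu.steps.push_back("split");
  return {cpu, gpu};
}

// Hypothetical step 2: run the optimization pipeline `passes` times on one module.
// The default of 2 mirrors the 2-pass optimization described above.
void optimize(ToyModule &m, int passes = 2) {
  for (int i = 1; i <= passes; ++i)
    m.steps.push_back("opt#" + std::to_string(i));
}

// Hypothetical step 3: GPU-only lowering, standing in for the NVPTX passes.
void lowerToPTX(ToyModule &gpu) { gpu.steps.push_back("lower-ptx"); }

// Hypothetical step 4: CPU-only cleanup and patching.
void cleanupHost(ToyModule &cpu) { cpu.steps.push_back("cleanup"); }

// Driver: the modules are separated first, then each one gets its own flow.
std::pair<ToyModule, ToyModule> runPipeline(const ToyModule &combined) {
  std::pair<ToyModule, ToyModule> parts = splitHostAndDevice(combined);
  ToyModule &cpu = parts.first;
  ToyModule &gpu = parts.second;
  optimize(gpu);    // GPU module: two optimization passes before lowering
  lowerToPTX(gpu);
  optimize(cpu);    // CPU module: its own two optimization passes
  cleanupHost(cpu);
  return parts;
}
```

Calling runPipeline on a ToyModule named "demo" returns a CPU module and a GPU module whose step logs show that each was optimized twice, independently of the other, after the split.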