Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement transeq in omp backend #27
Implement transeq in omp backend #27
Changes from 12 commits
ef3d02c
9a98a7f
03754b7
29f9f6a
e805ac0
732f4da
237c61c
b30271f
813804c
33a24aa
f7bc7e3
f11ec2e
3a327b9
1646e53
b20b503
6379a5d
ddaaf7d
aba7b1a
6b63417
e97c287
33345c0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to omit subroutine name here? On github it broke the syntax highlighting for me in the subroutine below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is never a requirement (maybe it used to be in older standards?). It is weird indeed that github syntax highlight doesn't handle it. I can add it if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realised that here we're writing 3 field sized arrays into main memory unnecessarily. It is potentially increasing the runtime %20.
In the second phase of the algorithm here we pass a part of the
du
,dud
, andd2u
into der_univ_subs, and they're all rewritten in place. Then later we combine them inrhs
for the final result. Ideally, we wantdu
,dud
, andd2u
to be read once andrhs
to be written only once. However because of the way der_univ_subs work, the updated data indu
arrays after der_univ_subs call gets written in the main memory, even though we don't need this at all.There are three ways we can fix this
du
,dud
, andd2u
arrays into(SZ, n)
sized temporary arrays. Then we pass temporary arrays into der_univ_subs, and at the end we use these temporaries to obtain finalrhs
. This is the easiest solution but it may not be the best in terms of performance.du
arrays as we do now, and pass a small temporary array as the output one. Becausedu
arrays will be input arrays no data will be written in main memory. Then we can combine the temporaries to getrhs
.du
,dud
, andd2u
, and write the final resultrhs
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#40 is relevant here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, I will have a think about it, but indeed having it in a new PR focusing on performance makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think best way to move forward is implementing the first strategy and checking how much of the peak BW we get. If its a good utilisation then maybe we don't need to bother with a new der_univ_subs at all.