ADPs in structure factor calculation, continued #36
Thanks for the feedback! Benchmark (1000 q-vectors, 1000 atoms, 1000 rotations) for the original cpuscatter function prior to including ADPs:
$ time ./cputest
CPP OUTPUT:
0.000000
0.000000
real 0m11.691s
user 0m11.682s
sys 0m0.008s
For the original ADP implementation, it clocked in at:
$ time ./cputest
CPP OUTPUT:
0.000000
0.000000
real 0m18.371s
user 0m18.366s
sys 0m0.004s
I revised this function so that the Debye-Waller factors are pre-computed in the first nested loop (over q-vectors) rather than in the third, innermost loop. However, the speedup is marginal: about 10% faster than the original ADP implementation, and still about 41% slower than the version without ADPs:
$ time ./cputest
CPP OUTPUT:
0.000000
0.000000
real 0m16.515s
user 0m16.511s
sys 0m0.003s
I only revised the CPU code, as a comment in cpp_scatter.cu indicates that caching pre-computed Debye-Waller factors could pose memory problems for the GPU version.
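For reference, the loop restructuring described above can be sketched roughly as follows. This is a minimal illustration, not the actual cpuscatter code: the function name `scatter_hoisted`, the `Vec3` and `Mat3` types, the unit form factors, and the use of isotropic ADPs (`u_iso`, with the Debye-Waller factor taken as exp(-U_iso |q|^2 / 2)) are all assumptions made for the example. Under the isotropic assumption the factor depends only on |q|^2, which a rotation does not change, so it can be computed once per q-vector and atom and reused across all rotations.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
using Mat3 = std::array<double, 9>;  // row-major 3x3 rotation matrix

// Hypothetical stand-in for cpuscatter: returns, for each q-vector, the
// intensity |F|^2 summed over all rotations. Assumes unit form factors and
// isotropic ADPs (u_iso[a] is the mean-square displacement of atom a).
std::vector<double> scatter_hoisted(const std::vector<Vec3>& q,
                                    const std::vector<Vec3>& r,
                                    const std::vector<double>& u_iso,
                                    const std::vector<Mat3>& rot) {
    assert(r.size() == u_iso.size());
    std::vector<double> intensity(q.size(), 0.0);
    std::vector<double> dw(r.size());  // buffer reused for every q-vector

    for (std::size_t i = 0; i < q.size(); ++i) {
        // Loop 1 (q-vectors): compute each atom's Debye-Waller factor once.
        const double q2 = q[i].x * q[i].x + q[i].y * q[i].y + q[i].z * q[i].z;
        for (std::size_t a = 0; a < r.size(); ++a)
            dw[a] = std::exp(-0.5 * u_iso[a] * q2);

        for (const Mat3& m : rot) {
            // Loop 2 (rotations): apply this rotation to the q-vector.
            const Vec3 qr{m[0] * q[i].x + m[1] * q[i].y + m[2] * q[i].z,
                          m[3] * q[i].x + m[4] * q[i].y + m[5] * q[i].z,
                          m[6] * q[i].x + m[7] * q[i].y + m[8] * q[i].z};
            double re = 0.0, im = 0.0;
            for (std::size_t a = 0; a < r.size(); ++a) {
                // Loop 3 (atoms): no exp() here, only the cached factor.
                const double phase = qr.x * r[a].x + qr.y * r[a].y + qr.z * r[a].z;
                re += dw[a] * std::cos(phase);
                im += dw[a] * std::sin(phase);
            }
            intensity[i] += re * re + im * im;
        }
    }
    return intensity;
}
```

The cache here is a single per-atom buffer that is overwritten for each q-vector, so the extra memory is one double per atom. The innermost loop still evaluates a sine and a cosine per atom per rotation, so hoisting the exponential removes only part of the per-iteration cost. With anisotropic ADPs the factor would in general depend on the rotated q-vector, and this hoisting would not apply as written.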