ADPs in structure factor calculation, continued #36

Open: wants to merge 6 commits into base: apeck12-diffuse

apeck12 commented Dec 6, 2017

Thanks for the feedback! Here is the benchmark (1000 q-vectors, 1000 atoms, 1000 rotations) for the original cpuscatter function, before ADPs were included:

$ time ./cputest
CPP OUTPUT:
0.000000
0.000000

real 0m11.691s
user 0m11.682s
sys 0m0.008s

The original ADP implementation clocked in at:

$ time ./cputest
CPP OUTPUT:
0.000000
0.000000

real 0m18.371s
user 0m18.366s
sys 0m0.004s

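I revised this function so that the Debye-Waller factors are pre-computed in the first nested loop (which loops over q-vectors) rather than in the third nested loop. However, the speedup is marginal: about 10% faster than the original ADP implementation, and still roughly 40% slower than the version without ADPs: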

$ time ./cputest
CPP OUTPUT:
0.000000
0.000000

real 0m16.515s
user 0m16.511s
sys 0m0.003s

I only revised the CPU code, because a comment in cpp_scatter.cu indicates that caching pre-computed Debye-Waller factors could pose memory problems for the GPU version.
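
For reference, the loop restructuring described above can be sketched roughly as follows. This is a minimal illustration, not the actual cpuscatter code: the names `Atom`, `scatter_inner`, `scatter_hoisted`, and `max_abs_diff` are hypothetical, the scattering factor is taken as q-independent, the loop nesting (q-vectors, then rotations, then atoms) is assumed, and the ADPs are assumed isotropic with a Debye-Waller factor of the form exp(-|q|^2 U_iso / 2).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<double, 9>;  // row-major rotation matrix

struct Atom {
    Vec3 r;        // position
    double f;      // q-independent scattering factor (simplifying assumption)
    double u_iso;  // isotropic ADP (assumption; the real ADPs may be anisotropic)
};

static double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

static Vec3 rotate(const Mat3& m, const Vec3& v) {
    return {m[0] * v[0] + m[1] * v[1] + m[2] * v[2],
            m[3] * v[0] + m[4] * v[1] + m[5] * v[2],
            m[6] * v[0] + m[7] * v[1] + m[8] * v[2]};
}

// Assumed isotropic form: T = exp(-|q|^2 * u_iso / 2).
static double debye_waller(const Vec3& q, double u_iso) {
    return std::exp(-0.5 * dot(q, q) * u_iso);
}

// Variant 1: the factor is recomputed in the innermost (atom) loop,
// i.e. once per (q, rotation, atom) triple.
std::vector<double> scatter_inner(const std::vector<Vec3>& qs,
                                  const std::vector<Mat3>& rots,
                                  const std::vector<Atom>& atoms) {
    std::vector<double> out(qs.size(), 0.0);
    for (std::size_t iq = 0; iq < qs.size(); ++iq) {
        for (const Mat3& rot : rots) {
            std::complex<double> amp(0.0, 0.0);
            for (const Atom& a : atoms) {
                const double w = a.f * debye_waller(qs[iq], a.u_iso);
                amp += w * std::polar(1.0, dot(qs[iq], rotate(rot, a.r)));
            }
            out[iq] += std::norm(amp);  // |amplitude|^2 summed over rotations
        }
    }
    return out;
}

// Variant 2: the factor is hoisted into the q-vector loop, so it is
// computed once per (q, atom) pair and reused for every rotation.
std::vector<double> scatter_hoisted(const std::vector<Vec3>& qs,
                                    const std::vector<Mat3>& rots,
                                    const std::vector<Atom>& atoms) {
    std::vector<double> out(qs.size(), 0.0);
    std::vector<double> weight(atoms.size());  // scratch: one value per atom
    for (std::size_t iq = 0; iq < qs.size(); ++iq) {
        for (std::size_t ia = 0; ia < atoms.size(); ++ia) {
            weight[ia] = atoms[ia].f * debye_waller(qs[iq], atoms[ia].u_iso);
        }
        for (const Mat3& rot : rots) {
            std::complex<double> amp(0.0, 0.0);
            for (std::size_t ia = 0; ia < atoms.size(); ++ia) {
                amp += weight[ia] *
                       std::polar(1.0, dot(qs[iq], rotate(rot, atoms[ia].r)));
            }
            out[iq] += std::norm(amp);
        }
    }
    return out;
}

// Largest element-wise difference, for checking that both variants agree.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        m = std::fmax(m, std::fabs(a[i] - b[i]));
    }
    return m;
}
```

In this sketch the hoisted variant needs only one scratch value per atom, which is cheap on the CPU. Hoisting removes one exponential per inner iteration but leaves the complex phase term in place, which may be one reason the measured gain is small.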
