-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Integration for CP2K #383
base: main
Are you sure you want to change the base?
Conversation
…ooks like. Still working on making it a more seemless integration with pymatgen, but it seems to be close to functional at least.
avoid using reservation fws.
…instead of reservation, then we can just specify the number of images there will be in each calculation and the env variable.
…d happen if starting a difficult-to-converge calculation from a previous WAVECAR) then it needs to handle bytes instead of strings for the file contents.
testing, I need to push it so I can test on cluster a bit.
…. This seems to be the first version that runs everything (create workflow, run, stich together, push to database). There will sitll be testing that needs to be done though.
…ross different Cp2k executions could be problematic. Probably the best way to deal with it is to assert a universal project name with each workflow. I don't like this, because its not as flexible, but it fixes the problem.
successfully. Might have to decide on universal file naming in order to avoid these problems in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nwinner Thank you for submitting this!
I know this is a work in progress but I made some comments on the PR so we can hopefully clean this up as you go along.
Would you also be able to run the cp2k module through the black python formatter? Some of your lines are rather long so this will enforce PEP8 formatting of your code.
You can install black from pip:
pip install black
And run it on the cp2k module using a max line length of 80 (presuming you are in the atomate root directory) via:
black atomate/cp2k -l 80
logger = get_logger(__name__) | ||
|
||
|
||
class Cp2kCalcDb(CalcDb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different to VASPCalcDb
at all? If not, we should probably rename VASPCalcDb
something more general and then it can be used in the CP2K module also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not different at all I think, a more general VASPCalcDb would be a good idea.
atomate/cp2k/drones.py
Outdated
from __future__ import division, print_function, unicode_literals, absolute_import | ||
|
||
""" | ||
This Drone tries to produce a more sensible task dictionary than the default VaspToDbTaskDrone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove these comments as they aren't relevant.
atomate/cp2k/drones.py
Outdated
@@ -0,0 +1,463 @@ | |||
# coding: utf-8 | |||
|
|||
from __future__ import division, print_function, unicode_literals, absolute_import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atomate does not support python 2 any more, so you can remove all the __future__
imports from all of the cp2k module files.
atomate/cp2k/drones.py
Outdated
|
||
def post_process(self, dir_name, d): | ||
""" | ||
Post-processing for various files other than the vasprun.xml and OUTCAR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update this docstring.
pass | ||
elif defuse_unsuccessful == "fizzle": | ||
raise RuntimeError( | ||
"VaspToDb indicates that job is not successful " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update VaspToDb
t.append(UpdateStructureFromPrevCalc(prev_calc_loc=prev_calc_loc)) | ||
|
||
# if prev calc directory is being REPEATED, copy files | ||
if prev_calc_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me how prev_calc_loc and prev_calc_dir are interacting here? The docstring for previous_calc_dir
says:
prev_calc_dir (str): Path to a previous calculation to copy from.
but that doesn't happen in the code. Can you clarify how prev_calc_loc
and prev_calc_dir
should be used?
from __future__ import division, print_function, unicode_literals, absolute_import | ||
|
||
""" | ||
This module defines tasks for writing vasp input sets for various types of vasp calculations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update please.
@explicit_serialize | ||
class WriteCp2kFromPrevious(FiretaskBase): | ||
|
||
optional_params = ['cp2k_input_params', 'prev_calc_loc', 'original_input_filename', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cp2k_input_params
and new_input_filename
are unused here.
class UpdateStructureFromPrevCalc(FiretaskBase): | ||
""" | ||
Using the location of a previous calculation. The CP2K output parser will | ||
get the final structure from a previous calculation and update this FW's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you are updating the structure by updating the firework spec rather than either:
- Just writing the structure to file
- Doing the structure updating as part of
WriteCp2kFromPrevious
.
I think option 2 makes the most sense. If you only want to copy the structure and nothing else from the previous calculation, you could add options to that effect in WriteCp2kFromPrevious
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Alex, number 2 probably does make more sense. Number 1 isn't the case because cp2k doesn't use the same formatting as VASP where a separate file holds the structure. Generally cp2k has everything in a single input file. This can actually be modified, but its not a cp2k handles stuff by default so i didn't want to complicate that until a later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok great. Let's go with option 2!
atomate/cp2k/fireworks/core.py
Outdated
fw_name = "{}-{}".format(structure.composition.reduced_formula if structure else "unknown", name) | ||
|
||
cp2k_input_set = cp2k_input_set or StaticSet(structure, **cp2k_input_set_params) | ||
cp2k_input_set.project_name = name # Currently, this assertion is needed to keep all FWs in a workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the problem you are trying to solve by using project_name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CP2K output files are named according to the "PROJECT_NAME" variable inside the input file. Notably, the wavefunction and DOS files have the project name prepended in their filename. So, in order to deal with this we would need to either (a) check for the project name every time you try to find a file (like a restart file) to copy over, (b) use a universal project name so that the files still have the same format but with a standard prefix, (c) override the file name in the cp2k input file so that it adheres to some agreed upon overall file name. I chose to go with (b) as the easiest one, but I'm open to changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note we're going to have the same question for my (perpetual backburner project) of CASTEP support, so interested in whatever you settle on. I might pick the seed name to be the firework id, unless that has issues I'm not anticipating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for clarifying that. With option b) are you using the same project name for EVERY calculation or is it specific to the calculation type? Maybe what I am describing is option c?
Are there any downsides to using a fixed project name, and just doing the calculations in different directories? That way there is never any ambiguity over which files to load. it seems like this would remove a lot of the complexity and make the parsing and file copying more similar to how it is handled in the VASP module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently option (b) uses the same name for the workflow. So you could name the files according to "Static" for static WFs as an example. Just to give you a sense of what the output files are like. If you did that you would get a directory like this (for Si):
launcher-X:
- Static-ALPHA_k1-1.pdos # elemental dos for atom kind 1, Spin up channel
- Static-BETA_k1-1.pdos # elemental dos for atom kind 1, Spin down channel
- Static-ALPHA_list1-1.pdos # Site projected dos onto site list 1, spin up
- Static-ALPHA_list2-1.pdos # Site projected dos onto site list 2, spin up
- Static-BETA_list1-1.pdos # Site projected dos onto site list 1, spin down
- Static-BETA_list2-1.pdos # Site projected dos onto site list 2, spin down
- Static-RESTART.wfn # Restart wavefunction file
- Static-RESTART.wfn.bak-1 # Backup wavefunction file (dumped intermittently)
- Static-v_hartree-1_0.cube. # Hartree (electrostatic potential)
- Static-ELECTRON_DENSITY-1_0.cube # Electron density
- Static-SPIN_DENSITY-1_0.cube # spin density
So you can see that it changes the naming, but its pretty repeatable. Something I've been doing in the pymatgen output parsing is globbing to find files containing, say, "pdos", to identify the pdos files and globbing for "RESTART" to get the wavefunction file name. We could think about doing this instead of trying to record what the file name is going to be.
Drones needs to use num_sites of structure object.
of of Danny Broberg's implementation for VASP.
utils: return entire scaling matrix
implemented properly.
Summary
Integration of CP2K into the atomate framework. This WIP PR is here so that maintainers can have a sense of what I'm proposing to add.
New firetasks, fireworks, workflows, and database/drone classes for cp2k. Meant to be used in conjunction with the new pymatgen.io.cp2k module (also a work in progress).
TODO (if any)
More workflows/fireworks are needed. There have also been a few issues with gluing tasks with file exchange. The drone also needs to include the optional Dos, bader, and cube parsing that will be needed.