Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Integration for CP2K #383

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open

Conversation

nwinner
Copy link

@nwinner nwinner commented Mar 15, 2020

Summary

Integration of CP2K into the atomate framework. This WIP PR is here so that maintainers can have a sense of what I'm proposing to add.

New firetasks, fireworks, workflows, and database/drone classes for cp2k. Meant to be used in conjunction with the new pymatgen.io.cp2k module (also a work in progress).

TODO (if any)

More workflows/fireworks are needed. There have also been a few issues with gluing tasks with file exchange. The drone also needs to include the optional Dos, bader, and cube parsing that will be needed.

nwinner added 30 commits April 22, 2019 15:49
…ooks like.

Still working on making it a more seemless integration with pymatgen, but it seems to be close to functional at least.
…instead of reservation, then we can just specify the number of images there will be in each calculation and the env variable.
…d happen if starting a difficult-to-converge calculation from a previous WAVECAR) then it needs to handle

bytes instead of strings for the file contents.
testing, I need to push it so I can test on cluster a bit.
…. This seems to be the first version that runs everything (create workflow, run, stich together, push to database). There will sitll be testing that needs to be done though.
…ross different Cp2k executions could be problematic. Probably the best way to deal with it is to assert a universal project name with each workflow. I don't like this, because its not as flexible, but it fixes the problem.
successfully. Might have to decide on universal file naming in order
to avoid these problems in the future.
Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nwinner Thank you for submitting this!

I know this is a work in progress but I made some comments on the PR so we can hopefully clean this up as you go along.

Would you also be able to run the cp2k module through the black python formatter? Some of your lines are rather long so this will enforce PEP8 formatting of your code.

You can install black from pip:

pip install black

And run it on the cp2k module using a max line length of 80 (presuming you are in the atomate root directory) via:

black atomate/cp2k -l 80

logger = get_logger(__name__)


class Cp2kCalcDb(CalcDb):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different to VASPCalcDb at all? If not, we should probably rename VASPCalcDb something more general and then it can be used in the CP2K module also.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not different at all I think, a more general VASPCalcDb would be a good idea.

from __future__ import division, print_function, unicode_literals, absolute_import

"""
This Drone tries to produce a more sensible task dictionary than the default VaspToDbTaskDrone.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove these comments as they aren't relevant.

@@ -0,0 +1,463 @@
# coding: utf-8

from __future__ import division, print_function, unicode_literals, absolute_import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atomate does not support python 2 any more, so you can remove all the __future__ imports from all of the cp2k module files.


def post_process(self, dir_name, d):
"""
Post-processing for various files other than the vasprun.xml and OUTCAR.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update this docstring.

pass
elif defuse_unsuccessful == "fizzle":
raise RuntimeError(
"VaspToDb indicates that job is not successful "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update VaspToDb

t.append(UpdateStructureFromPrevCalc(prev_calc_loc=prev_calc_loc))

# if prev calc directory is being REPEATED, copy files
if prev_calc_dir:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how prev_calc_loc and prev_calc_dir are interacting here? The docstring for previous_calc_dir says:

prev_calc_dir (str): Path to a previous calculation to copy from.

but that doesn't happen in the code. Can you clarify how prev_calc_loc and prev_calc_dir should be used?

from __future__ import division, print_function, unicode_literals, absolute_import

"""
This module defines tasks for writing vasp input sets for various types of vasp calculations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update please.

@explicit_serialize
class WriteCp2kFromPrevious(FiretaskBase):

optional_params = ['cp2k_input_params', 'prev_calc_loc', 'original_input_filename',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp2k_input_params and new_input_filename are unused here.

class UpdateStructureFromPrevCalc(FiretaskBase):
"""
Using the location of a previous calculation. The CP2K output parser will
get the final structure from a previous calculation and update this FW's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you are updating the structure by updating the firework spec rather than either:

  1. Just writing the structure to file
  2. Doing the structure updating as part of WriteCp2kFromPrevious.

I think option 2 makes the most sense. If you only want to copy the structure and nothing else from the previous calculation, you could add options to that effect in WriteCp2kFromPrevious.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alex, number 2 probably does make more sense. Number 1 isn't the case because cp2k doesn't use the same formatting as VASP where a separate file holds the structure. Generally cp2k has everything in a single input file. This can actually be modified, but its not a cp2k handles stuff by default so i didn't want to complicate that until a later time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great. Let's go with option 2!

fw_name = "{}-{}".format(structure.composition.reduced_formula if structure else "unknown", name)

cp2k_input_set = cp2k_input_set or StaticSet(structure, **cp2k_input_set_params)
cp2k_input_set.project_name = name # Currently, this assertion is needed to keep all FWs in a workflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem you are trying to solve by using project_name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CP2K output files are named according to the "PROJECT_NAME" variable inside the input file. Notably, the wavefunction and DOS files have the project name prepended in their filename. So, in order to deal with this we would need to either (a) check for the project name every time you try to find a file (like a restart file) to copy over, (b) use a universal project name so that the files still have the same format but with a standard prefix, (c) override the file name in the cp2k input file so that it adheres to some agreed upon overall file name. I chose to go with (b) as the easiest one, but I'm open to changing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note we're going to have the same question for my (perpetual backburner project) of CASTEP support, so interested in whatever you settle on. I might pick the seed name to be the firework id, unless that has issues I'm not anticipating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for clarifying that. With option b) are you using the same project name for EVERY calculation or is it specific to the calculation type? Maybe what I am describing is option c?

Are there any downsides to using a fixed project name, and just doing the calculations in different directories? That way there is never any ambiguity over which files to load. it seems like this would remove a lot of the complexity and make the parsing and file copying more similar to how it is handled in the VASP module?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently option (b) uses the same name for the workflow. So you could name the files according to "Static" for static WFs as an example. Just to give you a sense of what the output files are like. If you did that you would get a directory like this (for Si):

launcher-X:

  • Static-ALPHA_k1-1.pdos # elemental dos for atom kind 1, Spin up channel
  • Static-BETA_k1-1.pdos # elemental dos for atom kind 1, Spin down channel
  • Static-ALPHA_list1-1.pdos # Site projected dos onto site list 1, spin up
  • Static-ALPHA_list2-1.pdos # Site projected dos onto site list 2, spin up
  • Static-BETA_list1-1.pdos # Site projected dos onto site list 1, spin down
  • Static-BETA_list2-1.pdos # Site projected dos onto site list 2, spin down
  • Static-RESTART.wfn # Restart wavefunction file
  • Static-RESTART.wfn.bak-1 # Backup wavefunction file (dumped intermittently)
  • Static-v_hartree-1_0.cube. # Hartree (electrostatic potential)
  • Static-ELECTRON_DENSITY-1_0.cube # Electron density
  • Static-SPIN_DENSITY-1_0.cube # spin density

So you can see that it changes the naming, but its pretty repeatable. Something I've been doing in the pymatgen output parsing is globbing to find files containing, say, "pdos", to identify the pdos files and globbing for "RESTART" to get the wavefunction file name. We could think about doing this instead of trying to record what the file name is going to be.

Base automatically changed from master to main March 10, 2021 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants