Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto-calculate ADDED_MOS based on basissets #131

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
auto-calculate ADDED_MOS based on basissets
dev-zero committed Apr 9, 2021
commit a8d4572b96e32d8ac3f89d4ce85015d6cd53d4f2
17 changes: 16 additions & 1 deletion aiida_cp2k/calculations/__init__.py
Original file line number Diff line number Diff line change
@@ -22,6 +22,7 @@
validate_pseudos_namespace,
write_basissets,
write_pseudos,
estimate_added_mos,
)
from ..utils import Cp2kInput

@@ -139,7 +140,7 @@ def prepare_for_submission(self, folder):
:return: `aiida.common.datastructures.CalcInfo` instance
"""

# pylint: disable=too-many-statements,too-many-branches
# pylint: disable=too-many-statements,too-many-branches,too-many-locals

# Create cp2k input file.
inp = Cp2kInput(self.inputs.parameters.get_dict())
@@ -167,6 +168,20 @@ def prepare_for_submission(self, folder):
self.inputs.structure if 'structure' in self.inputs else None)
write_basissets(inp, self.inputs.basissets, folder)

# if we have both basissets and structure we can start helping the user :)
if 'basissets' in self.inputs and 'structure' in self.inputs:
try:
scf_section = inp.get_section_dict('FORCE_EVAL/DFT/SCF')
except (KeyError, NotImplementedError):
pass # if not found or multiple FORCE_EVAL, do nothing (yet)
else:
if 'SMEAR' in scf_section and 'ADDED_MOS' not in scf_section:
# now is our time to shine!
added_mos = estimate_added_mos(self.inputs.basissets, self.inputs.structure)
inp.add_keyword('FORCE_EVAL/DFT/SCF/ADDED_MOS', added_mos)
self.logger.info(f'The FORCE_EVAL/DFT/SCF/ADDED_MOS was added'
f' with an automatically estimated value of {added_mos}')

if 'pseudos' in self.inputs:
validate_pseudos(inp, self.inputs.pseudos, self.inputs.structure if 'structure' in self.inputs else None)
write_pseudos(inp, self.inputs.pseudos, folder)
25 changes: 25 additions & 0 deletions aiida_cp2k/utils/datatype_helpers.py
Original file line number Diff line number Diff line change
@@ -160,6 +160,31 @@ def validate_basissets(inp, basissets, structure):
kind_sec["ELEMENT"] = bset.element


def estimate_added_mos(basissets, structure, fraction=0.3):
"""Calculate an estimate for ADDED_MOS based on used basis sets"""
Copy link
Member

@ltalirz ltalirz May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dev-zero! I think adding a guess for ADDED_MOS for smearing calculations is a great idea.

Could you perhaps document the reasoning behind the choices made in the documentation of this function?

In particular, I suspect that you may want to make the value of the smearing an input of this function, since the number of orbitals that are necessary is very much determined by it.

Do you have some test data on what energy range above the highest occupied MO will be spanned as a function of the number of added MOs (measured in % of the basis functions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can for sure explain it a bit better, but in the end it is an empiric fraction.
Sure, you can most likely come up with a better one (or even a better motivation for the current one) given the type of smearing and the temperature, but I have no plans or time for it atm. This might also be a good idea for the common-workflow project since there you do the initial structure before doing the rest, hence you could gather some info from that first calculation to narrow it down for the rest of them (with the risk of skewing your result if done wrong).

No test data unfortunately (unless you want to start parsing the kp-wfn files), but if you want I can gather it when doing some more runs.

Copy link
Member

@ltalirz ltalirz May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can for sure explain it a bit better, but in the end it is an empiric fraction.

I think that would make sense.
By the way, if I remember well, then quantum espresso by default adds 10% of the number of electrons (or occupied orbitals).

While this is also approximate of course, to me the number of electrons is the more physically meaningful quantity here - as long as the basis sets are good enough to represent the electronic states in question, the number of ADDED MOS to use should be independent of the basis set size.

This might also be a good idea for the common-workflow project

Indeed. For a given system, smearing value and smearing method, the guess for ADDED_MOS should be independent of the code.

No test data unfortunately (unless you want to start parsing the kp-wfn files), but if you want I can gather it when doing some more runs.

I think that would make sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively I'd have said that making the number of added MOS dependent on the basis functions rather than just the number electrons is due to the fact that we're using a localized basis set. But, any value is empirical and might be wrong.
To make it a bit more systematic I've added a new keyword to the relevant &PRINT section in CP2K which outputs the number of occupied MOs (aggregated as max over all kpoints) on each SCF cycle (one could already print occ, but only for each KP and spin) and the option to directly add all MOs (set ADDED_MOS to -1) The idea is that for something like the aiida-common-workflows the first calculation will be run with all MOs and reporting turned on and the scaled structures will run with that max.
The question is what to do for aiida-cp2k alone: we can leave it completely to the user, or with a recent enough version of CP2K add the -1 if undefined, or add some empirical value as proposed here.

Copy link
Member

@ltalirz ltalirz Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively I'd have said that making the number of added MOS dependent on the basis functions rather than just the number electrons is due to the fact that we're using a localized basis set.

May I ask why?

To make my argument as clearly as possible, let's consider the case where your calculation is already converged as a function of the basis set size.
If you now go ahead and double the size of your basis set, the number of MOs to add should remain exactly the same - it depends only on the smearing function you use and on the electronic structure of your system.
However, if you double the size of your system (and thus also the number of electrons), the number of added MOs should double.

the number of occupied MOs (aggregated as max over all kpoints)

Occupied here means: occupation above some relevance cutoff (say 1e-3), correct?
Yes, I agree this is very useful to know.

the option to directly add all MOs (set ADDED_MOS to -1) The idea is that for something like the aiida-common-workflows the first calculation will be run with all MOs and reporting turned on and the scaled structures will run with that max.
The question is what to do for aiida-cp2k alone: we can leave it completely to the user, or with a recent enough version of CP2K add the -1 if undefined, or add some empirical value as proposed here.

I see. Taking all possible orbitals allowed by the basis is certainly a "safe" solution. For systems where it comes essentially for free, do it, but considering that the number of electrons is typically quite substantially below the number of basis functions, I suspect that for large systems the overhead both in compute time and max. memory needed will be prohibitive.

I strongly suspect that an empirical rule of adding some percentage of the number of (fully) occupied orbitals is going to be enough for most cases (after all, the occupied states tend to span several eVs while half an eV would already be a very high smearing).
As mentioned before, you may want to make the smearing value a parameter in the guess for the number of added MOs.

If you want, I can check the QE source to see exactly what they ended up with.


symbols = [structure.get_kind(s.kind_name).get_symbols_string() for s in structure.sites]
n_mos = 0

# We are currently overcounting in the following cases:
# * if we get a mix of ORB basissets for the same chemical symbol but different sites
# * if we get multiple basissets for one element (merged within CP2K)

for label, bset in _unpack(basissets):
try:
_, bstype = label.split("_", maxsplit=1)
except ValueError:
bstype = "ORB"

if bstype != "ORB": # ignore non-ORB basissets
continue

n_mos += symbols.count(bset.element) * bset.n_orbital_functions

# at least one additional MO per site, otherwise a fraction of the total number of orbital functions
return max(len(symbols), int(fraction * n_mos))


def write_basissets(inp, basissets, folder):
"""Writes the unified BASIS_SETS file with the used basissets"""
_write_gdt(inp, basissets, folder, "BASIS_SET_FILE_NAME", "BASIS_SETS")
68 changes: 68 additions & 0 deletions aiida_cp2k/utils/input_generator.py
Original file line number Diff line number Diff line change
@@ -56,6 +56,74 @@ def add_keyword(self, kwpath, value, override=True, conflicting_keys=None):

Cp2kInput._add_keyword(kwpath, value, self._params, ovrd=override, cfct=conflicting_keys)

@staticmethod
def _stringify_path(kwpath):
"""Stringify a kwpath argument"""
if isinstance(kwpath, str):
return kwpath

assert isinstance(kwpath, Sequence), "path is neither Sequence nor String"
return "/".join(kwpath)

def get_section_dict(self, kwpath=""):
"""Get a copy of a section from the current input structure

Args:

kwpath: Can be a single keyword, a path with `/` as divider for sections & key,
or a sequence with sections and key.
"""

section = self._get_section_or_kw(kwpath)

if not isinstance(section, Mapping):
raise TypeError(f"Section '{self._stringify_path(kwpath)}' requested, but keyword found")

return deepcopy(section)

def get_keyword_value(self, kwpath):
"""Get the value of a keyword from the current input structure

Args:

kwpath: Can be a single keyword, a path with `/` as divider for sections & key,
or a sequence with sections and key.
"""

keyword = self._get_section_or_kw(kwpath)

if isinstance(keyword, Mapping):
raise TypeError(f"Keyword '{self._stringify_path(kwpath)}' requested, but section found")

return keyword

def _get_section_or_kw(self, kwpath):
"""Retrieve either a section or a keyword given a path"""

if isinstance(kwpath, str):
kwpath = kwpath.split("/") # convert to list of sections if string

# get a copy of the path in a mutable sequence
# accept any case, but internally we use uppercase
# strip empty strings to accept leading "/", "//", etc.
path = [k.upper() for k in kwpath if k]

# start with a reference to the root of the parameters
current = self._params

try:
while path:
current = current[path.pop(0)]
except KeyError:
raise KeyError(f"Section '{self._stringify_path(kwpath)}' not found in parameters")
except TypeError:
if isinstance(current, Sequence) and not isinstance(current, str):
raise NotImplementedError(f"Repeated sections are not yet supported when retrieving data"
f" with path '{self._stringify_path(kwpath)}'")
raise

return current

def render(self):
output = [self.DISCLAIMER]
self._render_section(output, deepcopy(self._params))
83 changes: 83 additions & 0 deletions test/test_gaussian_datatypes.py
Original file line number Diff line number Diff line change
@@ -778,3 +778,86 @@ def test_without_kinds(cp2k_code, cp2k_basissets, cp2k_pseudos, clear_database):

_, calc_node = run_get_node(CalculationFactory("cp2k"), **inputs)
assert calc_node.exit_status == 0


def test_added_mos(cp2k_code, cp2k_basissets, cp2k_pseudos, clear_database): # pylint: disable=unused-argument
"""Testing CP2K with the Basis Set stored in gaussian.basisset and a smearing section but no predefined ADDED_MOS"""

structure = StructureData(cell=[[4.00759, 0.0, 0.0], [-2.003795, 3.47067475, 0.0],
[3.06349683e-16, 5.30613216e-16, 5.00307]],
pbc=True)
structure.append_atom(position=(-0.00002004, 2.31379473, 0.87543719), symbols="H")
structure.append_atom(position=(2.00381504, 1.15688001, 4.12763281), symbols="H")
structure.append_atom(position=(2.00381504, 1.15688001, 3.37697219), symbols="H")
structure.append_atom(position=(-0.00002004, 2.31379473, 1.62609781), symbols="H")

# parameters
parameters = Dict(
dict={
'GLOBAL': {
'RUN_TYPE': 'ENERGY',
},
'FORCE_EVAL': {
'METHOD': 'Quickstep',
'DFT': {
"XC": {
"XC_FUNCTIONAL": {
"_": "PBE",
},
},
"MGRID": {
"CUTOFF": 100.0,
"REL_CUTOFF": 10.0,
},
"QS": {
"METHOD": "GPW",
"EXTRAPOLATION": "USE_GUESS",
},
"SCF": {
"EPS_SCF": 1e-05,
"MAX_SCF": 3,
"MIXING": {
"METHOD": "BROYDEN_MIXING",
"ALPHA": 0.4,
},
"SMEAR": {
"METHOD": "FERMI_DIRAC",
"ELECTRONIC_TEMPERATURE": 300.0,
},
},
"KPOINTS": {
"SCHEME": "MONKHORST-PACK 2 2 1",
"FULL_GRID": False,
"SYMMETRY": False,
"PARALLEL_GROUP_SIZE": -1,
},
},
},
})

options = {
"resources": {
"num_machines": 1,
"num_mpiprocs_per_machine": 1
},
"max_wallclock_seconds": 1 * 3 * 60,
}

inputs = {
"structure": structure,
"parameters": parameters,
"code": cp2k_code,
"metadata": {
"options": options,
},
"basissets": {label: b for label, b in cp2k_basissets.items() if label == "H"},
"pseudos": {label: p for label, p in cp2k_pseudos.items() if label == "H"},
}

_, calc_node = run_get_node(CalculationFactory("cp2k"), **inputs)

assert calc_node.exit_status == 0

# check that the ADDED_MOS keyword was added within the calculation
with calc_node.open("aiida.inp") as fhandle:
assert any("ADDED_MOS" in line for line in fhandle), "ADDED_MOS not found in the generated CP2K input file"
39 changes: 39 additions & 0 deletions test/test_input_generator.py
Original file line number Diff line number Diff line change
@@ -185,3 +185,42 @@ def test_invalid_preprocessor():
inp = Cp2kInput({"@SET": "bar"})
with pytest.raises(ValueError):
inp.render()


def test_get_keyword_value():
"""Test get_keyword_value()"""
inp = Cp2kInput({"FOO": "bar", "A": {"KW1": "val1"}})
assert inp.get_keyword_value("FOO") == "bar"
assert inp.get_keyword_value("/FOO") == "bar"
assert inp.get_keyword_value("A/KW1") == "val1"
assert inp.get_keyword_value("/A/KW1") == "val1"
assert inp.get_keyword_value(["A", "KW1"]) == "val1"
with pytest.raises(TypeError):
inp.get_keyword_value("A")
with pytest.raises(KeyError):
inp.get_keyword_value("B")


def test_get_section_dict():
"""Test get_section_dict()"""
orig_dict = {"FOO": "bar", "A": {"KW1": "val1"}}
inp = Cp2kInput(orig_dict)
assert inp.get_section_dict("/") == orig_dict
assert inp.get_section_dict("////") == orig_dict
assert inp.get_section_dict("") == orig_dict
assert inp.get_section_dict() == orig_dict
assert inp.get_section_dict("/") is not orig_dict # make sure we get a distinct object
assert inp.get_section_dict("A") == orig_dict["A"]
assert inp.get_section_dict("/A") == orig_dict["A"]
assert inp.get_section_dict(["A"]) == orig_dict["A"]
with pytest.raises(TypeError):
inp.get_section_dict("FOO")
with pytest.raises(KeyError):
inp.get_section_dict("BAR")


def test_get_section_dict_repeated():
"""Test NotImplementedError for repeated sections in get_section_dict()"""
inp = Cp2kInput({"FOO": [{"KW1": "val1_1"}, {"KW1": "val1_2"}]})
with pytest.raises(NotImplementedError):
inp.get_keyword_value("/FOO/KW1")