Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: minimal MTP training script #5

Merged
merged 29 commits into from
Mar 13, 2024
Merged

PR: minimal MTP training script #5

merged 29 commits into from
Mar 13, 2024

Conversation

sblackburn86
Copy link
Collaborator

The script trains and evaluates a MTP based on LAMMPS configurations passed as arguments. The current main in train_mtp.py depends on LAMMPS output. A clean interface with args parser needs to be done. The evaluate outputs dataframe in pandas format, so we land in python-friendly territory.

A lot of kwargs for MLIP needs to be investigated and added to the script. The other major TODO is the write the method that converts the quasi-binary output from MLIP to a format usable by LAMMPS (not all lines are binary in the output - most are utf-8 already). maml has a script for mlip-2, I put in the code as comments. Note that this conversion was done in the train method. I disagree with that choice, so I move it to its own method.

I am ambivalent to the choice of using a scratch dir with monty. This is from maml. I chose to keep it to simplify things, but we could revisit this decision.

@sblackburn86 sblackburn86 requested a review from rousseab March 8, 2024 15:47
# Mock self.write_cfg to simulate creating a config file without file operations
mocker.patch.object(MTPWithMLIP3, "write_cfg", return_value="mock_filename.cfg")

# mocker.patch("crystal_diffusion.models.mtp.itertools.chain", return_value=[1, 2, 3])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest deleting this commented code. Yagni.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted


atoms_filename = "train.cfgs"

with ScratchDir("."): # create a tmpdir - deleted afterew
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a typo in the comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

mocker.patch("os.path.exists", return_value=True)

# Mock the ScratchDir context manager to do nothing
mocker.patch("crystal_diffusion.models.mtp.ScratchDir", mocker.MagicMock())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtp.train writes a min_dist file to disk. By mocking the ScratchDir, this min_dist file remains in the test folder after the test.

I suggest not mocking the ScratchDir; that way, the side effect files will get deleted along with the scratch folder at the end of the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

from crystal_diffusion.models.mtp import MTPWithMLIP3


class MockStructure:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a style nitpick, so feel free to ignore.
I would tend to reserve the word "Mock" for things that come from the Mock library (or the mocker fixture). Here, I would call this "FakeStructure", or "StructureStub"...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed, that's a good idea



@pytest.fixture
def mock_structure():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I would call this fake_structure: it is a real Structure object, not something derived from the Mock class.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

return instance


@pytest.mark.parametrize("mock_subprocess", [0]) # Here, 0 simulates a successful subprocess return code
Copy link
Collaborator

@rousseab rousseab Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mock_subprocess parameter can only be 0, or else the test fails. I suggest creating a separate fixture, like

@pytest.fixture
def successful_ subrocess_code():
return 0

and use that instead of a pytest.mark.parametrize.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


# Mock subprocess.Popen for evaluate method's call
# Mock subprocess.Popen to simulate an external call to `mlp` command
mock_popen = mocker.patch("subprocess.Popen")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common to test_evaluate and test_train_method. I would hide that in a fixture.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


# Mock the external dependencies and method calls within the MTPWithMLIP3.train method
@pytest.mark.parametrize("mock_subprocess", [0]) # Here, 0 simulates a successful subprocess return code
def test_train_method(mocker, mock_subprocess):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_train_method -> test_train, to be consistent with test_evaluate below?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name changed

# calculate_grade is the method to get the forces, energy & maxvol values
cmd = [self.mlp_command, "calculate_grade", self.fitted_mtp, original_file, predict_file]
predict_file += '.0' # added by mlp...
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as p: # run mlp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is duplicated between train and evaluate. I would refactor that as a "private" _method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored


Args:
filename: name of mlp output file to be parsed.
nbh_grade (optional): if True, add the nbh_grades in the resulting dataframe. Defaults to False.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "nbh_grade" mean? Is "nbh" an acronym? A quick explanation here would be useful.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nbh stands for neighborhood. mlip does either structure or neighborhood to compute the gamma factor. Let's stick to nbh (default) for now. We could revisit this later.
Added information in the arg description.

outputs = d["outputs"]
pos_arr = np.array(outputs["position"])
force_arr = np.array(outputs["forces"])
n_atom = force_arr.shape[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a num_atoms field in the docs dictionary. A few sanity check asserts on array dimensions here would help prevent the user from shooting themselves in the foot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Added assert for that

mtp_file_path = os.path.join(self.mlp_templates, unfitted_mtp)
shutil.copyfile(mtp_file_path, os.path.join(os.getcwd(), unfitted_mtp))
commands = [self.mlp_command, "mindist", atoms_filename]
with open("min_dist", "w") as f, subprocess.Popen(commands, stdout=f) as p:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Popen could be hidden in a _method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hidden, it is

# f"--bfgs-conv-tol={bfgs_conv_tol}",
# f"--weighting={weighting}",
]
with subprocess.Popen(cmds_list, stdout=subprocess.PIPE) as p:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Popen could be hidden away in a _method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be hidden now


# TODO list of yaml files should come from an external call
# yaml dump file
lammps_yaml = ['lammps_scripts/Si/si-custom/dump.si-300-1.yaml']
Copy link
Collaborator

@rousseab rousseab Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files are not in the repo. Maybe we could add them to the examples folder so that the main below is runnable from a clean repo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the files in examples/local/mtp_examples
These might be removed later. I'll want to have a 'real' interface to data sources at some point.

Copy link
Collaborator

@rousseab rousseab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look too carefully at the mlip3_missing_files: I assumed they are copied directly from the MLIP-2 repo. There is a CPP file in there: if you changed anything in there, I think it should be highlighted at the top of that file.

Otherwise, I make various suggestions for changes. We can have a live chat in the afternoon if useful.

crystal_diffusion/train_mtp.py Show resolved Hide resolved
crystal_diffusion/train_mtp.py Show resolved Hide resolved
return mtp_inputs


def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest mlip_cmd_path -> mlip_folder_path to avoid suggesting that the 'path-to-executable' is what is needed here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return mtp_inputs


def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think train_inputs should be a namedtuple https://docs.python.org/3/library/collections.html#collections.namedtuple.

That's a pretty lightweight way of "typing", ie telling the user exactly what is expected as input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

return mtp


def evaluate_mtp(eval_inputs: Dict[str, Any], mtp: MTPWithMLIP3) -> Tuple[pd.DataFrame, pd.DataFrame]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eval_inputs should be a namedtuple.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed



def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:
"""Create and evaluate an MTP potential.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say "create and train an MTP potential"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed



def evaluate_mtp(eval_inputs: Dict[str, Any], mtp: MTPWithMLIP3) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""Create and evaluate an MTP potential.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say "Evaluate a trained MTP potential"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

crystal_diffusion/models/mtp.py Show resolved Hide resolved
crystal_diffusion/train_mtp.py Show resolved Hide resolved
gt_energy = df_orig.groupby('structure_index').agg({'energy': 'mean', 'atom_index': 'count'})
gt_energy = (gt_energy['energy'] / gt_energy['atom_index']).to_numpy()

predicted_forces = df_predict.groupby('structure_index').agg({'fx': 'sum', 'fy': 'sum', 'fz': 'sum',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sum on forces seems like a bad idea to me. It will lead to "cancellation of errors". When we have a single structure with $N$ atoms, I think we want

MAE = $\frac{1}{3N} \sum_{\alpha} \sum_{i} |f^\alpha_i -\hat{f}^\alpha_i|$.

What is implemented seems to be
BAD_MAE = $\frac{1}{3N} \sum_{\alpha} |\sum_{i} (f^\alpha_i -\hat{f}^\alpha_i)|$ (!?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't catch that mistake when I was writing it, you are right, it was a bad implementation.
I wrote a simpler version that takes the MAE over atoms & directions. The contribution of a structure depends on the nature of atoms making it.

I doubt these metrics really mattter for us in the long run. I reused what was in maml, but you should define our own. And refactor the code so the metrics can be outside the entry point.

Copy link
Collaborator

@rousseab rousseab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things and we are good to go.

gt_energy = df_orig.groupby('structure_index').agg({'energy': 'mean', 'atom_index': 'count'})
gt_energy = (gt_energy['energy'] / gt_energy['atom_index']).to_numpy()

predicted_forces = (df_predict[['fx', 'fy', 'fz']].to_numpy().flatten())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the added parentheses are needed here:

predicted_forces = (df_predict[['fx', 'fy', 'fz']].to_numpy().flatten())

can be

predicted_forces = df_predict[['fx', 'fy', 'fz']].to_numpy().flatten()

df_predict = pd.DataFrame({
'structure_index': [0, 0, 1, 1],
'atom_index': [0, 1, 0, 1],
'energy': [1.05, 0.95, 3.1, 2.9], # Energy has a slight variation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this line with

'energy': [1.1, 1.1, 3.1, 3.1], # Energy has a slight variation

and the test fails. See comment below.

})

# Calculate expected MAE for energy and forces
expected_mae_energy = mean_absolute_error(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from what is done in the function being tested. The function being tested computes the MAE on the energy per atom, whereas this computes the MAE on the total energy. The test still passes because the test data produces an MAE of zero for both possibilities.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest changing the test data to be more "random".

@sblackburn86 sblackburn86 merged commit 113a433 into main Mar 13, 2024
1 check passed
@sblackburn86 sblackburn86 deleted the train_mtp branch March 13, 2024 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants