PR: minimal MTP training script #5

sblackburn86 · 2024-03-08T14:44:46Z

The script trains and evaluates an MTP based on LAMMPS configurations passed as arguments. The current main in train_mtp.py depends on LAMMPS output; a clean interface with an argument parser still needs to be written. The evaluate method outputs a pandas DataFrame, so we land in Python-friendly territory.
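A minimal sketch of what such an argparse interface could look like. Every flag name here is hypothetical, loosely borrowed from names that come up in this review (lammps_yaml, mlip_folder_path, save_dir); none of them exist in train_mtp.py yet.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line of a hypothetical train_mtp entry point."""
    parser = argparse.ArgumentParser(description="Train and evaluate an MTP potential with MLIP-3.")
    # Hypothetical flags: none of these exist in train_mtp.py yet.
    parser.add_argument("--lammps_yaml", nargs="+", required=True,
                        help="LAMMPS yaml dump file(s) holding the training configurations.")
    parser.add_argument("--mlip_folder_path", required=True,
                        help="Path to the MLIP-3 folder (not the mlp executable itself).")
    parser.add_argument("--save_dir", default=".",
                        help="Directory where the fitted potential is saved.")
    return parser.parse_args(argv)
```

Passing argv explicitly, e.g. `parse_args(["--lammps_yaml", "dump.si-300-1.yaml", "--mlip_folder_path", "/path/to/mlip-3"])`, keeps the parser easy to unit test without touching sys.argv.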

Many of the MLIP kwargs still need to be investigated and added to the script. The other major TODO is to write the method that converts the quasi-binary output from MLIP to a format usable by LAMMPS (not all lines in the output are binary; most are already UTF-8). maml has a script for mlip-2, which I put in the code as comments. Note that this conversion was done in the train method. I disagree with that choice, so I moved it to its own method.

I am ambivalent about using a scratch dir with monty; this comes from maml. I chose to keep it to simplify things, but we could revisit this decision.


rousseab · 2024-03-11T12:24:24Z

tests/models/test_mtp.py

+    # Mock self.write_cfg to simulate creating a config file without file operations
+    mocker.patch.object(MTPWithMLIP3, "write_cfg", return_value="mock_filename.cfg")
+
+    # mocker.patch("crystal_diffusion.models.mtp.itertools.chain", return_value=[1, 2, 3])


I suggest deleting this commented-out code. YAGNI.

rousseab · 2024-03-11T12:36:53Z

crystal_diffusion/models/mtp.py

+
+        atoms_filename = "train.cfgs"
+
+        with ScratchDir("."):  # create a tmpdir - deleted afterew


There is a typo in the comment.

rousseab · 2024-03-11T12:46:05Z

tests/models/test_mtp.py

+    mocker.patch("os.path.exists", return_value=True)
+
+    # Mock the ScratchDir context manager to do nothing
+    mocker.patch("crystal_diffusion.models.mtp.ScratchDir", mocker.MagicMock())


mtp.train writes a min_dist file to disk. Because ScratchDir is mocked, this min_dist file remains in the test folder after the test.

I suggest not mocking the ScratchDir; that way, the side effect files will get deleted along with the scratch folder at the end of the test.
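A minimal sketch of why the real scratch directory cleans up after itself. It uses a stdlib stand-in (tempfile.TemporaryDirectory plus os.chdir) rather than monty, so it only approximates what ScratchDir does; scratch_dir and fake_train are hypothetical names.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def scratch_dir() -> Iterator[str]:
    """Stand-in for a scratch dir: work inside a temp dir that is deleted on exit."""
    previous_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            yield tmp
        finally:
            # Leave the directory before TemporaryDirectory deletes it.
            os.chdir(previous_cwd)


def fake_train() -> str:
    """Mimic the side effect of mtp.train: write a min_dist file in the current directory."""
    with scratch_dir() as tmp:
        with open("min_dist", "w") as f:
            f.write("0.0\n")
        return os.path.join(tmp, "min_dist")
```

The returned path no longer exists once fake_train returns. Replacing scratch_dir with a do-nothing context manager (which is effectively what mocking ScratchDir with a MagicMock does) would leave min_dist in whatever directory the test runs from.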

rousseab · 2024-03-11T12:58:29Z

tests/models/test_mtp.py

+from crystal_diffusion.models.mtp import MTPWithMLIP3
+
+
+class MockStructure:


This is more of a style nitpick, so feel free to ignore.
I would tend to reserve the word "Mock" for things that come from the Mock library (or the mocker fixture). Here, I would call this "FakeStructure", or "StructureStub"...

changed, that's a good idea

rousseab · 2024-03-11T13:03:13Z

tests/models/test_mtp.py

+
+
+@pytest.fixture
+def mock_structure():


nitpick: I would call this fake_structure: it is a real Structure object, not something derived from the Mock class.

rousseab · 2024-03-11T13:08:33Z

tests/models/test_mtp.py

+    return instance
+
+
+@pytest.mark.parametrize("mock_subprocess", [0])  # Here, 0 simulates a successful subprocess return code


This mock_subprocess parameter can only be 0, or else the test fails. I suggest creating a separate fixture, like

@pytest.fixture
def successful_subprocess_code():
    return 0

and use that instead of a pytest.mark.parametrize.

rousseab · 2024-03-11T13:21:46Z

tests/models/test_mtp.py

+
+    # Mock subprocess.Popen for evaluate method's call
+    # Mock subprocess.Popen to simulate an external call to `mlp` command
+    mock_popen = mocker.patch("subprocess.Popen")


This is common to test_evaluate and test_train_method. I would hide that in a fixture.
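A sketch of a shared helper both tests could use, written with the standard library's unittest.mock; make_fake_popen and run_mlp are hypothetical names. The important detail is configuring `__enter__`, because the code under test uses Popen as a context manager.

```python
import subprocess
from typing import List, Tuple
from unittest import mock


def make_fake_popen(returncode: int = 0, stdout: bytes = b"") -> mock.MagicMock:
    """Build a MagicMock that can replace subprocess.Popen, including as a context manager."""
    process = mock.MagicMock()
    process.returncode = returncode
    process.communicate.return_value = (stdout, b"")
    fake_popen = mock.MagicMock()
    # `with subprocess.Popen(...) as p:` binds p to whatever __enter__ returns.
    fake_popen.return_value.__enter__.return_value = process
    return fake_popen


def run_mlp(cmd: List[str]) -> Tuple[bytes, int]:
    """Toy caller using the same Popen pattern as train and evaluate."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as p:
        out, _ = p.communicate()
    return out, p.returncode
```

With pytest-mock, a fixture could simply return `mocker.patch("subprocess.Popen", make_fake_popen())`, and the return code could come from a dedicated fixture instead of a parametrize mark.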

rousseab · 2024-03-11T13:22:29Z

tests/models/test_mtp.py

+
+# Mock the external dependencies and method calls within the MTPWithMLIP3.train method
+@pytest.mark.parametrize("mock_subprocess", [0])  # Here, 0 simulates a successful subprocess return code
+def test_train_method(mocker, mock_subprocess):


test_train_method -> test_train, to be consistent with test_evaluate below?

name changed


rousseab · 2024-03-11T13:47:02Z

crystal_diffusion/models/mtp.py

+            # calculate_grade is the method to get the forces, energy & maxvol values
+            cmd = [self.mlp_command, "calculate_grade", self.fitted_mtp, original_file, predict_file]
+            predict_file += '.0'  # added by mlp...
+            with subprocess.Popen(cmd, stdout=subprocess.PIPE) as p:  # run mlp


This block is duplicated between train and evaluate. I would refactor that as a "private" _method.
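A sketch of that private method, assuming it only needs the command list and an optional output file (which would also cover the mindist call that writes min_dist). MLPRunner is a hypothetical stand-in for MTPWithMLIP3, _run_mlp is a hypothetical name, and raising on a non-zero return code is a design choice not taken from the PR.

```python
import subprocess
import sys
from typing import List, Optional


class MLPRunner:
    """Hypothetical stand-in for MTPWithMLIP3, holding only the shared subprocess logic."""

    def _run_mlp(self, cmd: List[str], output_path: Optional[str] = None) -> str:
        """Run an mlp command and return its stdout; raise if it exits with a non-zero code.

        If output_path is given, stdout goes to that file instead (as for the mindist call).
        """
        if output_path is None:
            with subprocess.Popen(cmd, stdout=subprocess.PIPE) as p:
                stdout, _ = p.communicate()
        else:
            with open(output_path, "w") as f, subprocess.Popen(cmd, stdout=f) as p:
                p.communicate()
            stdout = b""
        if p.returncode != 0:
            raise RuntimeError(f"command {' '.join(cmd)} exited with code {p.returncode}")
        return stdout.decode("utf-8")


if __name__ == "__main__":
    # Usage: any executable works for a smoke test; here, the current Python interpreter.
    print(MLPRunner()._run_mlp([sys.executable, "-c", "print('ok')"]))
```

Centralizing the call also gives the tests a single place to patch.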

rousseab · 2024-03-11T13:49:03Z

crystal_diffusion/models/mtp.py

+
+        Args:
+            filename: name of mlp output file to be parsed.
+            nbh_grade (optional): if True, add the nbh_grades in the resulting dataframe. Defaults to False.


what does "nbh_grade" mean? Is "nbh" an acronym? A quick explanation here would be useful.

nbh stands for neighborhood. MLIP computes the gamma factor either per structure or per neighborhood. Let's stick to nbh (the default) for now; we could revisit this later.
Added information in the arg description.

rousseab · 2024-03-11T14:28:42Z

crystal_diffusion/models/mtp.py

+            outputs = d["outputs"]
+            pos_arr = np.array(outputs["position"])
+            force_arr = np.array(outputs["forces"])
+            n_atom = force_arr.shape[0]


There's a num_atoms field in the docs dictionary. A few sanity-check asserts on the array dimensions here would help keep users from shooting themselves in the foot.

good catch. Added assert for that
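The kind of check being discussed, sketched under the assumption that num_atoms sits at the top level of the document dict d (the actual key layout is not shown in this diff); extract_positions_and_forces is a hypothetical name.

```python
from typing import Any, Dict, Tuple

import numpy as np


def extract_positions_and_forces(d: Dict[str, Any]) -> Tuple[np.ndarray, np.ndarray]:
    """Read positions and forces from one parsed document and check their shapes."""
    outputs = d["outputs"]
    pos_arr = np.array(outputs["position"])
    force_arr = np.array(outputs["forces"])
    n_atom = d["num_atoms"]  # assumption: num_atoms sits at the top level of the document
    assert pos_arr.shape == (n_atom, 3), f"positions have shape {pos_arr.shape}, expected ({n_atom}, 3)"
    assert force_arr.shape == (n_atom, 3), f"forces have shape {force_arr.shape}, expected ({n_atom}, 3)"
    return pos_arr, force_arr
```

Note that assert statements are skipped when Python runs with -O; raising a ValueError instead would keep the check in that case.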

rousseab · 2024-03-11T14:34:45Z

crystal_diffusion/models/mtp.py

+            mtp_file_path = os.path.join(self.mlp_templates, unfitted_mtp)
+            shutil.copyfile(mtp_file_path, os.path.join(os.getcwd(), unfitted_mtp))
+            commands = [self.mlp_command, "mindist", atoms_filename]
+            with open("min_dist", "w") as f, subprocess.Popen(commands, stdout=f) as p:


This Popen could be hidden in a _method.

hidden, it is

rousseab · 2024-03-11T14:35:15Z

crystal_diffusion/models/mtp.py

+                # f"--bfgs-conv-tol={bfgs_conv_tol}",
+                # f"--weighting={weighting}",
+            ]
+            with subprocess.Popen(cmds_list, stdout=subprocess.PIPE) as p:


This Popen could be hidden away in a _method.

should be hidden now

rousseab · 2024-03-11T14:44:34Z

crystal_diffusion/train_mtp.py

+
+# TODO list of yaml files should come from an external call
+# yaml dump file
+lammps_yaml = ['lammps_scripts/Si/si-custom/dump.si-300-1.yaml']


These files are not in the repo. Maybe we could add them to the examples folder so that the main below is runnable from a clean repo.

added the files in examples/local/mtp_examples
These might be removed later. I'll want to have a 'real' interface to data sources at some point.


rousseab

I didn't look too carefully at the mlip3_missing_files: I assumed they are copied directly from the MLIP-2 repo. There is a CPP file in there: if you changed anything in it, I think that should be highlighted at the top of the file.

Otherwise, I've made various suggestions for changes. We can have a live chat in the afternoon if useful.


rousseab · 2024-03-11T14:57:29Z

crystal_diffusion/train_mtp.py

+    return mtp_inputs
+
+
+def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:


I suggest mlip_cmd_path -> mlip_folder_path to avoid suggesting that the 'path-to-executable' is what is needed here.

rousseab · 2024-03-11T15:00:15Z

crystal_diffusion/train_mtp.py

+    return mtp_inputs
+
+
+def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:


I think train_inputs should be a namedtuple https://docs.python.org/3/library/collections.html#collections.namedtuple.

That's a pretty lightweight way of "typing", i.e. telling the user exactly what is expected as input.
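A sketch of the namedtuple version. The field names are hypothetical and should mirror whatever keys train_mtp currently reads from the dict; an EvalInputs namedtuple for evaluate_mtp would follow the same pattern.

```python
from collections import namedtuple

# Hypothetical field names: they should mirror the keys train_mtp currently reads from its dict.
TrainInputs = namedtuple("TrainInputs", ["structure", "energy", "forces", "stress"])


def count_structures(train_inputs: TrainInputs) -> int:
    """Attribute access makes the expected fields explicit at the call site."""
    assert len(train_inputs.structure) == len(train_inputs.energy), "one energy per structure"
    return len(train_inputs.structure)
```

A missing field now fails at construction with a TypeError instead of as a KeyError deep inside train_mtp. typing.NamedTuple offers the same behavior with per-field type annotations, which pytype can then check.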

rousseab · 2024-03-11T15:02:22Z

crystal_diffusion/train_mtp.py

+    return mtp
+
+
+def evaluate_mtp(eval_inputs: Dict[str, Any], mtp: MTPWithMLIP3) -> Tuple[pd.DataFrame, pd.DataFrame]:


I think eval_inputs should be a namedtuple.

rousseab · 2024-03-11T15:03:03Z

crystal_diffusion/train_mtp.py

+
+
+def train_mtp(train_inputs: Dict[str, Any], mlip_cmd_path: str, save_dir: str) -> MTPWithMLIP3:
+    """Create and evaluate an MTP potential.


Should this say "create and train an MTP potential"?

rousseab · 2024-03-11T15:03:25Z

crystal_diffusion/train_mtp.py

+
+
+def evaluate_mtp(eval_inputs: Dict[str, Any], mtp: MTPWithMLIP3) -> Tuple[pd.DataFrame, pd.DataFrame]:
+    """Create and evaluate an MTP potential.


Should this say "Evaluate a trained MTP potential"?


rousseab · 2024-03-11T15:20:36Z

crystal_diffusion/train_mtp.py

+    gt_energy = df_orig.groupby('structure_index').agg({'energy': 'mean', 'atom_index': 'count'})
+    gt_energy = (gt_energy['energy'] / gt_energy['atom_index']).to_numpy()
+
+    predicted_forces = df_predict.groupby('structure_index').agg({'fx': 'sum', 'fy': 'sum', 'fz': 'sum',


Summing the forces seems like a bad idea to me: it will lead to cancellation of errors. For a single structure with $N$ atoms, I think we want

MAE = $\frac{1}{3N} \sum_{\alpha} \sum_{i} |f^\alpha_i -\hat{f}^\alpha_i|$.

What is implemented seems to be
BAD_MAE = $\frac{1}{3N} \sum_{\alpha} |\sum_{i} (f^\alpha_i -\hat{f}^\alpha_i)|$ (!?)

I didn't catch that mistake when I wrote it; you are right, it was a bad implementation.
I wrote a simpler version that takes the MAE over atoms and directions. The contribution of a structure therefore depends on the atoms it is made of.

I doubt these metrics really matter for us in the long run. I reused what was in maml, but you should define our own metrics, and refactor the code so the metrics live outside the entry point.
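The cancellation is easy to reproduce with numpy on a single hypothetical two-atom structure in which both atoms are off by 0.5 in x, with opposite signs. force_mae implements MAE and bad_force_mae implements BAD_MAE from the formulas in the comment; both names are illustrative.

```python
import numpy as np


def force_mae(f_true: np.ndarray, f_pred: np.ndarray) -> float:
    """MAE = 1/(3N) sum_alpha sum_i |f_i^alpha - fhat_i^alpha| for one structure of shape (N, 3)."""
    return float(np.abs(f_true - f_pred).mean())


def bad_force_mae(f_true: np.ndarray, f_pred: np.ndarray) -> float:
    """Sum the errors over atoms first, then take absolute values: opposite errors cancel."""
    n_atoms = f_true.shape[0]
    return float(np.abs((f_true - f_pred).sum(axis=0)).sum() / (3 * n_atoms))


f_true = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
f_pred = np.array([[1.5, 0.0, 0.0], [-1.5, 0.0, 0.0]])  # both atoms off by 0.5 in x, opposite signs
print(force_mae(f_true, f_pred))      # 0.16666666666666666
print(bad_force_mae(f_true, f_pred))  # 0.0
```

The flattened version, `df_predict[['fx', 'fy', 'fz']].to_numpy().flatten()`, computes the same quantity as force_mae across all atoms, provided both DataFrames list atoms in the same order.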


rousseab

Just a few minor things and we are good to go.

rousseab · 2024-03-13T15:05:22Z

crystal_diffusion/train_mtp.py

+    gt_energy = df_orig.groupby('structure_index').agg({'energy': 'mean', 'atom_index': 'count'})
+    gt_energy = (gt_energy['energy'] / gt_energy['atom_index']).to_numpy()
+
+    predicted_forces = (df_predict[['fx', 'fy', 'fz']].to_numpy().flatten())


I don't think the added parentheses are needed here:

predicted_forces = (df_predict[['fx', 'fy', 'fz']].to_numpy().flatten())

can be

predicted_forces = df_predict[['fx', 'fy', 'fz']].to_numpy().flatten()

rousseab · 2024-03-13T15:21:02Z

tests/models/test_mtp.py

+    df_predict = pd.DataFrame({
+        'structure_index': [0, 0, 1, 1],
+        'atom_index': [0, 1, 0, 1],
+        'energy': [1.05, 0.95, 3.1, 2.9],  # Energy has a slight variation


Replace this line with

'energy': [1.1, 1.1, 3.1, 3.1], # Energy has a slight variation

and the test fails. See comment below.

rousseab · 2024-03-13T15:21:53Z

tests/models/test_mtp.py

+    })
+
+    # Calculate expected MAE for energy and forces
+    expected_mae_energy = mean_absolute_error(


This is different from what is done in the function being tested: the function computes the MAE on the energy per atom, whereas this computes the MAE on the total energy. The test still passes only because the test data gives an MAE of zero in both cases.

I suggest changing the test data to be more "random".
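One way to make the data discriminate, sketched with hypothetical values (not the PR's actual test data): give the two structures different atom counts and non-zero errors, so the per-atom and total-energy MAEs cannot agree by coincidence. As in the function under test, each row is assumed to repeat its structure's total energy.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: structures with 2 and 3 atoms.
df_orig = pd.DataFrame({
    "structure_index": [0, 0, 1, 1, 1],
    "atom_index": [0, 1, 0, 1, 2],
    "energy": [2.0, 2.0, 6.0, 6.0, 6.0],  # total energy of the structure, repeated on each atom row
})
df_predict = df_orig.assign(energy=[2.4, 2.4, 5.1, 5.1, 5.1])


def total_and_per_atom_energy(df: pd.DataFrame):
    """Return (total energy, energy per atom) per structure, same grouping as the tested code."""
    grouped = df.groupby("structure_index").agg({"energy": "mean", "atom_index": "count"})
    total = grouped["energy"].to_numpy()
    return total, total / grouped["atom_index"].to_numpy()


gt_total, gt_per_atom = total_and_per_atom_energy(df_orig)
pred_total, pred_per_atom = total_and_per_atom_energy(df_predict)
mae_total = np.mean(np.abs(gt_total - pred_total))           # approximately (0.4 + 0.9) / 2 = 0.65
mae_per_atom = np.mean(np.abs(gt_per_atom - pred_per_atom))  # approximately (0.2 + 0.3) / 2 = 0.25
```

A test asserting against mae_per_atom now fails if the function under test silently switches to total energies, and vice versa.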

sblackburn-mila added 17 commits February 27, 2024 15:05

minimal script to train a MTP from lammps. Beware the README to run on a mac

781a04f

rewrite for mlip3

cb66544

flake8

5cd18c3

mlip3 instructions

aa8c888

add dependencies

76f3a22

refactor to clean up script

193e765

fixed typo in kwarg default

a55747b

comment out a unused piece of code

49dde3d

basic unit test for train_mtp

944101d

file rename

8457fa1

unit tests

02bbf41

update the readme mtp file

66a30b4

missing blank line

90e2349

line length 121 - obviously

59002f1

extra whiteline

e911272

adding pytest-mock to requirements

b80bc34

typing fixes

35c5551

sblackburn86 requested a review from rousseab March 8, 2024 15:47

rousseab reviewed Mar 11, 2024

mlip3_missing_files/mlp_commands.cpp

rousseab reviewed Mar 11, 2024

crystal_diffusion/train_mtp.py

rousseab requested changes Mar 11, 2024

sblackburn-mila added 11 commits March 12, 2024 09:00

first batch of code review

4fe53b5

second batch of code review

0b6e90a

adding new unit tests for train_mtp to refactor later

c13472c

code review part 3

3f81f0f

flake8

c6e6e8c

missing period

2c7077d

isort

ad2e425

pytype fixes

be5f345

pytype & namedtuple funtime

99f7ea1

flake8

890eda1

unit test fix for namedtuple

597c83a

rousseab reviewed Mar 13, 2024

tests/models/test_mtp.py

rousseab requested changes Mar 13, 2024

fixing mae on energy

d9f0756

rousseab approved these changes Mar 13, 2024

sblackburn86 merged commit 113a433 into main Mar 13, 2024
1 check passed

sblackburn86 deleted the train_mtp branch March 13, 2024 20:31

