This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Equi(Kit)Script for workflows that compute, transform, join representations and estimate properties #10

Draft · wants to merge 17 commits into main
Conversation

agoscinski
Collaborator

Couldn't manage to run it so far because I still get the ValueError: Trying to set the EQUISTORE library path twice error, even though I installed the same equistore version for both equistore and rascaline. I put several messages about the design, marked with the tag COMMENT, in the code.

I called it for now EquiKitScript to emphasize that these scripts use scikit-learn-like transformers and estimators with fit and transform/predict functions.

Even though I use it in the code of this draft, I am not sure if having a default argument for parameter_keys is a smart idea (we had a debate); I am okay with skipping it for now. From the perspective of the EquiKitScript, I just wanted a default estimator, but we can also do without one.

@agoscinski agoscinski changed the title EquiScript a pipeline of computing transforming, joining representations and estimating properties Equi(Kit)Script for workflows that compute, transform, join representations and estimate properties Feb 12, 2023
Collaborator

@PicoCentauri PicoCentauri left a comment


Thanks @agoscinski! I like this base class design and it is already in very good shape. I have some comments, and overall we should split the files:

base.py: EquiScriptBase
multispectra.py: MultiSpectraScript
lode.py: LodeScript

Comment on lines 62 to 64
# TODO(philip) can we make a good default alpha parameter out of parameter_keys?
if alpha is None:
    raise NotImplementedError("Ridge still needs a good default alpha value")
Collaborator

@PicoCentauri PicoCentauri Feb 13, 2023


The easiest would be to choose 1. Generally, we can allow floats or TensorMaps as input. This behavior is in line with the operations in equistore, where we also allow either a float or a TensorMap.
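A minimal sketch of that float-or-TensorMap dispatch, with a plain dict of numpy arrays standing in for a TensorMap so it runs without equistore; the helper name `broadcast_alpha` is hypothetical:

```python
import numpy as np
from typing import Dict, Union

# Dict[str, np.ndarray] stands in for an equistore TensorMap in this sketch.
TensorMapLike = Dict[str, np.ndarray]

def broadcast_alpha(alpha: Union[float, TensorMapLike],
                    X: TensorMapLike) -> TensorMapLike:
    """Turn a scalar alpha into a per-block regularizer matching X's blocks."""
    if isinstance(alpha, (int, float)):
        # One scalar regularizes every property of every block identically.
        return {key: np.full(block.shape[1], float(alpha))
                for key, block in X.items()}
    # Otherwise alpha is already block-structured; check it matches X.
    if alpha.keys() != X.keys():
        raise ValueError("alpha blocks do not match X blocks")
    return alpha
```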


def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]:
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return a list of floats if we just allow one parameter_key?
Collaborator


Sorry, this is wrong. It only returns a single float.

Suggested change
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return a list of floats if we just allow one parameter_key?
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> float:
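As a sketch of the corrected signature, a score that returns one float per call could be an RMSE over the selected parameter key; numpy arrays stand in for the TensorMap blocks here, and the metric choice is an assumption, not the equisolve implementation:

```python
import numpy as np

def score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error over one parameter key, as a single float."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```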


class EquiKitScript(metaclass=ABCMeta):
"""
An EquiScript is a merge of a representation calculator and a ML model.
Collaborator


Suggested change
An EquiScript is a merge of a representation calculator and a ML model.
An EquiScript is a merge of a representation calculator, operations on these and an ML model.

"""
An EquiScript is a merge of a representation calculator and a ML model.

EquiKitScript supports scikit-learn like transformers and estimators that
Collaborator


Suggested change
EquiKitScript supports scikit-learn like transformers and estimators that
EquiKitScript supports equisolve transformers and estimators that

#)])
#self.estimator = Ridge(parameter_keys=parameter_keys, alpha=empty_tm)

def fit(self, X: Tuple[TensorMap, ...], y: TensorMap, **kwargs) -> TEquiKitScript:
Collaborator


Tuple[TensorMap, ...] is the same as List[TensorMap]? 🤯
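For reference, the two annotations express the same "any number of TensorMaps" intent but are not identical: `Tuple[T, ...]` is a variable-length immutable tuple, `List[T]` a mutable list. A quick illustration with ints standing in for TensorMaps:

```python
from typing import List, Tuple

def take_tuple(x: Tuple[int, ...]) -> int:
    """Accepts a tuple of any length, all elements the same type."""
    return sum(x)

def take_list(x: List[int]) -> int:
    """Accepts a mutable list of the same element type."""
    return sum(x)

# At runtime both work on any iterable of ints, but a static checker
# (e.g. mypy) flags take_tuple([1, 2]) and take_list((1, 2)) as errors.
```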

Comment on lines 144 to 145
# TODO make the error message nicer; double-check whether the metaclass is basically useless and does not already do this
raise NotImplementedError("compute function not implemented")
Collaborator


Suggested change
# TODO make the error message nicer; double-check whether the metaclass is basically useless and does not already do this
raise NotImplementedError("compute function not implemented")
raise NotImplementedError("Only implemented in child classes")

raise NotImplementedError("join function not implemented")

@abstractmethod
def compute(self, **kwargs) -> Tuple[TensorMap, ...]:
Collaborator


The **kwargs should be explained in the docstring. Something like:

Parameters forwarded to the compute function of a calculator.
For a rascaline calculator these are given here: 

https://luthaf.fr/rascaline/latest/references/api/python/calculators.html#rascaline.calculators.CalculatorBase.compute
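A sketch of what that forwarding and docstring might look like, with a stub standing in for a rascaline calculator (the class and parameter names here are illustrative, not the actual equisolve code):

```python
class StubCalculator:
    """Stands in for a rascaline calculator in this sketch."""

    def compute(self, systems, **kwargs):
        # A real calculator would build a representation here; the stub
        # just records what was forwarded to it.
        return {"systems": systems, "options": kwargs}

class Script:
    def __init__(self, calculator):
        self.calculator = calculator

    def compute(self, systems, **kwargs):
        """Compute the representation.

        :param kwargs: parameters forwarded to the ``compute`` function of
            the calculator, e.g. ``gradients=["positions"]`` for rascaline.
        """
        return self.calculator.compute(systems, **kwargs)
```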


COMMENT The logic is a bit unnecessarily entangled with atomistic data, but I would not bother with it for now
"""
def __init__(self, hypers, *, feature_aggregation="mean", transformer_X=None, transformer_y=None, estimator=None):
Collaborator


I think hypers should be a dictionary with the key being the calculator name and the value the hypers. With this we are tightly bound to rascaline, but this is fine for me!

And do I get this correctly that I can pass a transformer as transformer_X and transformer_y and the class will apply it automatically? This is super nice!
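The hypers layout suggested above might look like the following sketch; the hyperparameter names loosely follow rascaline's SOAP calculators but should be treated as illustrative, and the stub class is a placeholder:

```python
# One entry per calculator class name; the value is that calculator's hypers.
hypers = {
    "SoapRadialSpectrum": {"cutoff": 5.0, "max_radial": 8},
    "SoapPowerSpectrum": {"cutoff": 5.0, "max_radial": 8, "max_angular": 6},
}

def build_calculators(hypers, registry):
    """Instantiate each named calculator with its hyperparameters."""
    return [registry[name](**h) for name, h in hypers.items()]

class _StubCalculator:
    """Stands in for a rascaline calculator class in this sketch."""
    def __init__(self, **h):
        self.hypers = h

registry = {name: _StubCalculator for name in hypers}
```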

Collaborator Author


> I think hypers should be a dictionary with the key being the calculator name and the value the hypers. With this we are tightly bound to rascaline, but this is fine for me!

That is the case at the moment (sorry, it is nowhere documented; I wanted to give an example, but because of the issue with the equistore library import I did not manage to).

> And do I get this correctly that I can pass a transformer as transformer_X and transformer_y and the class will apply it automatically? This is super nice!

In the fit and forward functions it is done like this:

if self._transformer_y is not None:
y = self._transformer_y.transform(y)

X = self.join(X)
Collaborator


I think join should only be called here when there is more than one X, something like `if len(X) > 1`. With this:

Suggested change
X = self.join(X)
if len(X) > 1:
    try:
        X = self.join(X)
    except NotImplementedError as e:
        raise NotImplementedError("More than one X, but the join function is not implemented!") from e
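Note that `NotImplemented` is a sentinel value, not an exception class, so it can be neither raised nor caught; the guard only works with `NotImplementedError`. A runnable sketch of the suggested logic (class and method names are placeholders):

```python
from typing import Tuple

class ScriptWithoutJoin:
    """Placeholder for an EquiKitScript subclass that lacks a join."""

    def join(self, X: Tuple[object, ...]) -> object:
        raise NotImplementedError("join function not implemented")

    def _maybe_join(self, X: Tuple[object, ...]) -> object:
        # Join only when there is more than one representation.
        if len(X) == 1:
            return X[0]
        try:
            return self.join(X)
        except NotImplementedError as e:
            raise NotImplementedError(
                "More than one X, but the join function is not implemented!"
            ) from e
```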

self._parameter_keys = self.parameter_keys

@abstractmethod
def join(self, X: Tuple[TensorMap, ...]) -> TensorMap:
Collaborator


Maybe make this a private method? The user will never use this function, only forward, score and predict.

@PicoCentauri
Collaborator

Another remark: we also need a save and a load function for usage in i-PI.
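A minimal pickle-based shape such save/load functions could take; whether pickle is the right serialization for equistore objects is open (the tensor_map_to_dict conversion elsewhere in this PR suggests an extra step is needed), so treat this as a sketch, not a design:

```python
import pickle

class Saveable:
    """Placeholder model holding a plain-dict state for serialization."""

    def __init__(self, state=None):
        self.state = state or {}

    def save(self, path: str) -> None:
        # Dump the picklable state; TensorMaps would need conversion first.
        with open(path, "wb") as f:
            pickle.dump(self.state, f)

    @classmethod
    def load(cls, path: str) -> "Saveable":
        with open(path, "rb") as f:
            return cls(state=pickle.load(f))
```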

@@ -0,0 +1,227 @@
{
"cells": [
{
Contributor


didn't we say that we didn't want notebooks in the examples?

Xi = self.script.compute(systems=self.atoms, gradients=["positions"])
y_pred = self.script.forward(Xi) # implicitly done in score function
energy = y_pred.block().values[0][0]
forces = np.array(y_pred.block().gradient("positions").data.reshape(-1, 3))
Contributor


forces are -grad
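Indeed, forces are the negative gradient of the energy, so the snippet above is missing a minus sign. A toy numeric check on a harmonic energy E = 1/2 k x^2, where F = -k x (numpy only; the equistore objects above are not reproduced here):

```python
import numpy as np

def energy(x: np.ndarray, k: float = 2.0) -> float:
    """Toy harmonic energy E = 1/2 k |x|^2."""
    return 0.5 * k * float(np.dot(x, x))

def forces(x: np.ndarray, k: float = 2.0) -> np.ndarray:
    # F = -dE/dx = -k x: note the minus sign the review points out.
    return -k * x
```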

to regularize each property differently.
:param sample_weight: sample weights
:param rcond: Cut-off ratio for small singular values during the fit. For
the purposes of rank determination, singular values are treated as
zero if they are smaller than ``rcond`` times the largest singular
value in "coefficient" matrix.
value in "weightsficient" matrix.
Contributor


I see now: find->coef; replace->weights

@@ -204,19 +204,19 @@ def fit(
)
weights_blocks.append(weights_block)

# convert coefs to dictionary allowing dump of an instance in a pickle file
self._coef = tensor_map_to_dict(TensorMap(X.keys, coef_blocks))
# convert weightsficients to a dictionary allowing pickle dump of an instance
Contributor


magnificent

@@ -100,9 +100,15 @@ def test_ridge(self, num_properties, num_targets):
clf = Ridge(parameter_keys="values")
clf.fit(X=X, y=y, alpha=alpha, sample_weight=sw)

<<<<<<< HEAD
Contributor


mmmmm

@agoscinski
Collaborator Author

A new prototype based on feedback from this PR is now in #43.

@agoscinski agoscinski closed this Mar 21, 2023
@agoscinski
Collaborator Author

I don't want to delete this branch yet because there might still be some code that I want to copy over to the new prototype, and keeping it open is a better reminder to delete it as soon as this is done.

@agoscinski agoscinski reopened this Mar 21, 2023
* updating the notebook to include PCA of features and CV

* update the score function for the script module to be more flexible

* fix a bug when creating an alpha tensormap in the linear model by
  slicing from the input X array, when a sample with label 0 does not
  exist

* add a rmspe function