This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Equi(Kit)Script for workflows that compute, transform, join representations and estimate properties #10

Draft · wants to merge 17 commits into main
Conversation

agoscinski
Collaborator

Couldn't manage to run it so far because I still get the ValueError: Trying to set the EQUISTORE library path twice error, even though I installed the same equistore version for both equistore and rascaline. I put several messages about the design, marked with the tag COMMENT, in the code.

I called it for now EquiKitScript to emphasize that these scripts use scikit-learn-like transformers and estimators with fit and transform/predict functions.

Even though I use it in the code of this draft, I am not sure if having a default argument for parameter_keys is a smart idea (we had a debate); I am okay with skipping it for now. From the perspective of the EquiKitScript, I just wanted a default estimator, but we can also do without one.

@agoscinski agoscinski changed the title EquiScript a pipeline of computing transforming, joining representations and estimating properties Equi(Kit)Script for workflows that compute, transform, join representations and estimate properties Feb 12, 2023
Collaborator

@PicoCentauri PicoCentauri left a comment


Thanks @agoscinski! I like this base class design and it is already in very good shape. I have some comments, and overall we should split the files:

base.py: EquiScriptBase
multispectra.py: MultiSpectraScript
lode.py: LodeScript

Comment on lines 62 to 64
# TODO(philip) can we make a good default alpha parameter out of parameter_keys?
if alpha is None:
    raise NotImplementedError("Ridge still needs a good default alpha value")
Collaborator

@PicoCentauri PicoCentauri Feb 13, 2023


The easiest would be to choose 1. Generally, we can allow floats or TensorMaps as input. This behavior is in line with the operations in equistore, where we also allow either a float or a TensorMap.
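A minimal sketch of that float-or-TensorMap dispatch, with a plain dict of numpy arrays standing in for a TensorMap so it runs without equistore; the helper name `broadcast_alpha` is hypothetical:

```python
import numpy as np
from typing import Dict, Union

# Dict[str, np.ndarray] stands in for an equistore TensorMap in this sketch.
TensorMapLike = Dict[str, np.ndarray]

def broadcast_alpha(alpha: Union[float, TensorMapLike],
                    X: TensorMapLike) -> TensorMapLike:
    """Turn a scalar alpha into a per-block regularizer matching X's blocks."""
    if isinstance(alpha, (int, float)):
        # One scalar regularizes every property of every block identically.
        return {key: np.full(block.shape[1], float(alpha))
                for key, block in X.items()}
    # Otherwise alpha is already block-structured; check it matches X.
    if alpha.keys() != X.keys():
        raise ValueError("alpha blocks do not match X blocks")
    return alpha
```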


def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]:
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return a list of floats if we just allow one parameter_key?
Collaborator


Sorry, this is wrong. It only returns a single float.

Suggested change
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return a list of floats if we just allow one parameter_key?
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> float:
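As a sketch of the corrected signature, a score that returns one float per call could be an RMSE over the selected parameter key; numpy arrays stand in for the TensorMap blocks here, and the metric choice is an assumption, not the equisolve implementation:

```python
import numpy as np

def score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error over one parameter key, as a single float."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```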


class EquiKitScript(metaclass=ABCMeta):
"""
An EquiScript is a merge of a representation calculator and a ML model.
Collaborator


Suggested change
An EquiScript is a merge of a representation calculator and a ML model.
An EquiScript is a merge of a representation calculator, operations on these and an ML model.

"""
An EquiScript is a merge of a representation calculator and a ML model.

EquiKitScript supports scikit-learn like transformers and estimators that
Collaborator


Suggested change
EquiKitScript supports scikit-learn like transformers and estimators that
EquiKitScript supports equisolve transformers and estimators that

#)])
#self.estimator = Ridge(parameter_keys=parameter_keys, alpha=empty_tm)

def fit(self, X: Tuple[TensorMap, ...], y: TensorMap, **kwargs) -> TEquiKitScript:
Collaborator


Tuple[TensorMap, ...] is the same as List[TensorMap]? 🤯
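For reference, the two annotations express the same "any number of TensorMaps" intent but are not identical: `Tuple[T, ...]` is a variable-length immutable tuple, `List[T]` a mutable list. A quick illustration with ints standing in for TensorMaps:

```python
from typing import List, Tuple

def take_tuple(x: Tuple[int, ...]) -> int:
    """Accepts a tuple of any length, all elements the same type."""
    return sum(x)

def take_list(x: List[int]) -> int:
    """Accepts a mutable list of the same element type."""
    return sum(x)

# At runtime both work on any iterable of ints, but a static checker
# (e.g. mypy) flags take_tuple([1, 2]) and take_list((1, 2)) as errors.
```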

Comment on lines 144 to 145
# TODO make the error message nicer; double-check whether the metaclass is basically useless and does not already do this
raise NotImplementedError("compute function not implemented")
Collaborator


Suggested change
# TODO make the error message nicer; double-check whether the metaclass is basically useless and does not already do this
raise NotImplementedError("compute function not implemented")
raise NotImplementedError("Only implemented in child classes")

raise NotImplementedError("join function not implemented")

@abstractmethod
def compute(self, **kwargs) -> Tuple[TensorMap, ...]:
Collaborator


The **kwargs should be explained in the docstring. Something like:

Parameters forwarded to the compute function of a calculator.
For a rascaline calculator these are given here: 

https://luthaf.fr/rascaline/latest/references/api/python/calculators.html#rascaline.calculators.CalculatorBase.compute
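A sketch of what that forwarding and docstring might look like, with a stub standing in for a rascaline calculator (the class and parameter names here are illustrative, not the actual equisolve code):

```python
class StubCalculator:
    """Stands in for a rascaline calculator in this sketch."""

    def compute(self, systems, **kwargs):
        # A real calculator would build a representation here; the stub
        # just records what was forwarded to it.
        return {"systems": systems, "options": kwargs}

class Script:
    def __init__(self, calculator):
        self.calculator = calculator

    def compute(self, systems, **kwargs):
        """Compute the representation.

        :param kwargs: parameters forwarded to the ``compute`` function of
            the calculator, e.g. ``gradients=["positions"]`` for rascaline.
        """
        return self.calculator.compute(systems, **kwargs)
```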


COMMENT The logic is a bit unnecessarily entangled with atomistic data, but I would not bother with it for now
"""
def __init__(self, hypers, *, feature_aggregation="mean", transformer_X=None, transformer_y=None, estimator=None):
Collaborator


I think hypers should be a dictionary with the key being the calculator name and the value the hypers. With this we are tightly bound to rascaline, but this is fine for me!

And do I get this correctly that I can pass a transformer as transformer_X and transformer_y and the class will apply it automatically? This is super nice!
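The hypers layout suggested above might look like the following sketch; the hyperparameter names loosely follow rascaline's SOAP calculators but should be treated as illustrative, and the stub class is a placeholder:

```python
# One entry per calculator class name; the value is that calculator's hypers.
hypers = {
    "SoapRadialSpectrum": {"cutoff": 5.0, "max_radial": 8},
    "SoapPowerSpectrum": {"cutoff": 5.0, "max_radial": 8, "max_angular": 6},
}

def build_calculators(hypers, registry):
    """Instantiate each named calculator with its hyperparameters."""
    return [registry[name](**h) for name, h in hypers.items()]

class _StubCalculator:
    """Stands in for a rascaline calculator class in this sketch."""
    def __init__(self, **h):
        self.hypers = h

registry = {name: _StubCalculator for name in hypers}
```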

Collaborator Author


> I think hypers should be a dictionary with the key being the calculator name and the value the hypers. With this we are tightly bound to rascaline, but this is fine for me!

That is the case at the moment (sorry, it is nowhere documented; I wanted to give an example, but because of the issue with the equistore library import I did not manage to).

> And do I get this correctly that I can pass a transformer as transformer_X and transformer_y and the class will apply it automatically? This is super nice!

In the fit and forward functions it is done like this:

if self._transformer_y is not None:
y = self._transformer_y.transform(y)

X = self.join(X)
Collaborator


I think join should only be called here when there is more than one X, something like `if len(X) > 1`. With this:

Suggested change
X = self.join(X)
if len(X) > 1:
    try:
        X = self.join(X)
    except NotImplementedError as e:
        raise NotImplementedError("More than one X, but the join function is not implemented!") from e
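Note that `NotImplemented` is a sentinel value, not an exception class, so it can be neither raised nor caught; the guard only works with `NotImplementedError`. A runnable sketch of the suggested logic (class and method names are placeholders):

```python
from typing import Tuple

class ScriptWithoutJoin:
    """Placeholder for an EquiKitScript subclass that lacks a join."""

    def join(self, X: Tuple[object, ...]) -> object:
        raise NotImplementedError("join function not implemented")

    def _maybe_join(self, X: Tuple[object, ...]) -> object:
        # Join only when there is more than one representation.
        if len(X) == 1:
            return X[0]
        try:
            return self.join(X)
        except NotImplementedError as e:
            raise NotImplementedError(
                "More than one X, but the join function is not implemented!"
            ) from e
```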

self._parameter_keys = self.parameter_keys

@abstractmethod
def join(self, X: Tuple[TensorMap, ...]) -> TensorMap:
Collaborator


Maybe make this a private method? The user will never use this function, only forward, score and predict.

@PicoCentauri
Collaborator

Another remark: we also need a save and a load function for usage in i-PI.
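A minimal pickle-based shape such save/load functions could take; whether pickle is the right serialization for equistore objects is open (the tensor_map_to_dict conversion elsewhere in this PR suggests an extra step is needed), so treat this as a sketch, not a design:

```python
import pickle

class Saveable:
    """Placeholder model holding a plain-dict state for serialization."""

    def __init__(self, state=None):
        self.state = state or {}

    def save(self, path: str) -> None:
        # Dump the picklable state; TensorMaps would need conversion first.
        with open(path, "wb") as f:
            pickle.dump(self.state, f)

    @classmethod
    def load(cls, path: str) -> "Saveable":
        with open(path, "rb") as f:
            return cls(state=pickle.load(f))
```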

@@ -0,0 +1,227 @@
{
"cells": [
{
Contributor


didn't we say that we didn't want notebooks in the examples?

Xi = self.script.compute(systems=self.atoms, gradients=["positions"])
y_pred = self.script.forward(Xi) # implicitly done in score function
energy = y_pred.block().values[0][0]
forces = np.array(y_pred.block().gradient("positions").data.reshape(-1, 3))
Contributor


forces are -grad
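Indeed, forces are the negative gradient of the energy, so the snippet above is missing a minus sign. A toy numeric check on a harmonic energy E = 1/2 k x^2, where F = -k x (numpy only; the equistore objects above are not reproduced here):

```python
import numpy as np

def energy(x: np.ndarray, k: float = 2.0) -> float:
    """Toy harmonic energy E = 1/2 k |x|^2."""
    return 0.5 * k * float(np.dot(x, x))

def forces(x: np.ndarray, k: float = 2.0) -> np.ndarray:
    # F = -dE/dx = -k x: note the minus sign the review points out.
    return -k * x
```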

to regularize each property differently.
:param sample_weight: sample weights
:param rcond: Cut-off ratio for small singular values during the fit. For
the purposes of rank determination, singular values are treated as
zero if they are smaller than ``rcond`` times the largest singular
value in "coefficient" matrix.
value in "weightsficient" matrix.
Contributor


I see now: find->coef; replace->weights

@@ -204,19 +204,19 @@ def fit(
)
weights_blocks.append(weights_block)

# convert coefs to dictionary allowing dump of an instance in a pickle file
self._coef = tensor_map_to_dict(TensorMap(X.keys, coef_blocks))
# convert weightsficients to a dictionary allowing pickle dump of an instance
Contributor


magnificent

@@ -100,9 +100,15 @@ def test_ridge(self, num_properties, num_targets):
clf = Ridge(parameter_keys="values")
clf.fit(X=X, y=y, alpha=alpha, sample_weight=sw)

<<<<<<< HEAD
Contributor


mmmmm

@agoscinski
Collaborator Author

A new prototype based on feedback from this PR is now in #43.

@agoscinski agoscinski closed this Mar 21, 2023
@agoscinski
Collaborator Author

I don't want to delete this branch yet because there might still be some code that I want to copy over to the new prototype, and keeping it open is a better reminder to delete it as soon as this is done.

@agoscinski agoscinski reopened this Mar 21, 2023
* updating the notebook to include PCA of features and CV

* update the score function for the script module to be more flexible

* fix a bug when creating an alpha tensormap in the linear model by
  slicing from the input X array, when a sample with label 0 does not
  exist

* add a rmspe function