-
Notifications
You must be signed in to change notification settings - Fork 1
Equi(Kit)Script for workflows that compute, transform, join representations and estimate properties #10
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @agoscinski ! I like this base class design and it is already in a very good shape. I have some comments and overall we should split the files:
base.py
: EquiScriptBase
multispectra.py
: MultiSpectraScrip
lode.py
: LodeScript
# TODO(philip) can we make a good default alpha parameter out of paramater_keys? | ||
if alpha is None: | ||
raise NotImplemented("Ridge still needs a good default alpha value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest would be to choose 1
. Generally, we can allow float
s or TensorMap
s as input. This behavior is inline with the operations in equstiore where we also allow either a float
or a Tensormap
.
|
||
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: | ||
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return list of floats if we just allow one paramater_key? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is wrong. It only retrurns a single float.
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> List[float]: # COMMENT why does it return list of floats if we just allow one paramater_key? | |
def score(self, X: TensorMap, y: TensorMap, parameter_key: str) -> float: |
|
||
class EquiKitScript(metaclass=ABCMeta): | ||
""" | ||
An EquiScript is a merge of a representation calculator and a ML model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An EquiScript is a merge of a representation calculator and a ML model. | |
An EquiScript is a merge of a representation calculator, operations on these and an ML model. |
""" | ||
An EquiScript is a merge of a representation calculator and a ML model. | ||
|
||
EquiKitScript supports scikit-learn like transformers and estimators that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EquiKitScript supports scikit-learn like transformers and estimators that | |
EquiKitScript supports equisolve transformers and estimators that |
#)]) | ||
#self.estimator = Ridge(parameter_keys=parameter_keys, alpha=empty_tm) | ||
|
||
def fit(self, X: Tuple[TensorMap, ...], y: TensorMap, **kwargs) -> TEquiKitScript: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple[TensorMap, ...]
is the same as List[TensorMap]
? 🤯
# TODO make error message sound nicer, double check if Metaclass is basically useless and does not do this | ||
raise NotImplemented("compute function not implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO make error message sound nicer, double check if Metaclass is basically useless and does not do this | |
raise NotImplemented("compute function not implemented") | |
raise NotImplemented("Only implemented in child classes") |
raise NotImplemented("join function not implemented") | ||
|
||
@abstractmethod | ||
def compute(self, **kwargs) -> Tuple[TensorMap, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The **kwargs should be explained in the docstring. Something like:
Parameters forwarded to the compute function of a calculator.
For a rascaline calculator these are given here:
https://luthaf.fr/rascaline/latest/references/api/python/calculators.html#rascaline.calculators.CalculatorBase.compute
|
||
COMMENT The logic is a bit unnecessary entangled with atomistic data, but I would not bother with it for now | ||
""" | ||
def __init__(self, hypers, *, feature_aggregation="mean", transformer_X=None, transformer_y=None, estimator=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hypers
should be dictionary with the key being the calculator name and the value the hypers. This this we are tightly bound to rascaline but this is fine for me!
And do I get this correctly that I can give a transformer transformer_X
and transformer_y
and the class will do this automatically? This is super nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hypers should be dictionary with the key being the calculator name and the value the hypers. This this we are tightly bound to rascaline but this is fine for me!
that is the case at the moment (sorry nowhere documented, wanted to give example, but because of the issue with the equistore libraries import I did not manage
And do I get this correctly that I can give a transformer transformer_X and transformer_y and the class will do this automatically? This is super nice!
in the fit
and forward
function it is done
if self._transformer_y is not None: | ||
y = self._transformer_y.transform(y) | ||
|
||
X = self.join(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think join should only called here something like
if len(X) > 1. With this
X = self.join(X) | |
if len(X) > 1: | |
try: | |
X = self.join(X) | |
except NotImplemented as e: | |
raise NotImplemented('More than one X. But join function is not implemented!') from e |
self._parameter_keys = self.parameter_keys | ||
|
||
@abstractmethod | ||
def join(self, X: Tuple[TensorMap, ...]) -> TensorMap: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as private method? The user will never use this function but only forward
, score
and predict
.
Another remark is that we also need a |
@@ -0,0 +1,227 @@ | |||
{ | |||
"cells": [ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't we said that we didn't want notebooks in the examples?
Xi = self.script.compute(systems=self.atoms, gradients=["positions"]) | ||
y_pred = self.script.forward(Xi) # implicitely done in score function | ||
energy = y_pred.block().values[0][0] | ||
forces = np.array(y_pred.block().gradient("positions").data.reshape(-1, 3)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forces are -grad
to regulerize each property differently. | ||
:param sample_weight: sample weights | ||
:param rcond: Cut-off ratio for small singular values during the fit. For | ||
the purposes of rank determination, singular values are treated as | ||
zero if they are smaller than ``rcond`` times the largest singular | ||
value in "coefficient" matrix. | ||
value in "weightsficient" matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see now find->coef; replace->weights
@@ -204,19 +204,19 @@ def fit( | |||
) | |||
weights_blocks.append(weights_block) | |||
|
|||
# convert coefs to dictionary allowing dump of an instance in a pickle file | |||
self._coef = tensor_map_to_dict(TensorMap(X.keys, coef_blocks)) | |||
# convert weightsficients to a dictionary allowing pickle dump of an instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magnificent
@@ -100,9 +100,15 @@ def test_ridge(self, num_properties, num_targets): | |||
clf = Ridge(parameter_keys="values") | |||
clf.fit(X=X, y=y, alpha=alpha, sample_weight=sw) | |||
|
|||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmmm
New prototype from feedback from this PR is now in #43 |
I dont want to delete this branch now because there might be still some code that I want to copy over to the new prototype, and keeping it open is a better reminder to delete it as soon as soon as this is done. |
* updating the notebook to include PCA of features and CV * update the score function for the script module to be more flexible * fix a bug when creating an alpha tensormap in the linear model by slicing from the input X array, when a sample with label 0 is not existing * add a rmspe function
Couldn't manage to run so far because I still have the
ValueError: Trying to set the EQUISTORE library path twice
error, even though I am installed the same equistore version with equistore and rascaline. I put several message about the design marked with the tagCOMMENT
in the code.I called it for now EquiKitScript to emphasize that these scripts use scikit-learn-like transformers and estimators with fit and transform/predict functions.
Even though I request it in the code of this draft, I am not sure if having a default argument for
parameter_keys
is a smart idea (we had a debate), I am okay to skip it for now. From the perspective of the EquiKitScript, I just wanted a default estimator, but we can also not have one.