Remove optional ase dependency #231

Open
agoscinski opened this issue Jun 12, 2024 · 2 comments
Labels: dependencies, good first issue

Comments

@agoscinski
Collaborator

The new ase release breaks some example notebooks. The ase dependency comes from a single dataset, the ROY dataset (see load_roy_dataset()), and can easily be removed as a dependency by storing the arrays that currently live inside the ase frames on their own. We could even try to just use numpy's loadtxt; then we don't need to change the binary data file.
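A minimal sketch of reading the energies without ase while leaving the compressed file untouched. It assumes each frame's comment line carries an `energy=<float>` entry, which is how the extended XYZ format usually stores per-frame info. Because the energies sit in comment lines rather than in a regular table, the sketch swaps `np.loadtxt` for a small hand-written parser. The function name `read_energies` and the two-frame demo file are made up for illustration.

```python
import bz2
import re
import tempfile
from pathlib import Path

import numpy as np

# Assumption: each frame's comment line carries an `energy=<float>` entry.
ENERGY_PATTERN = re.compile(r"\benergy=(\S+)")


def read_energies(path):
    """Collect one energy per frame from a bz2-compressed extended XYZ file."""
    energies = []
    with bz2.open(path, "rt") as handle:
        while True:
            count_line = handle.readline()
            if not count_line.strip():
                break  # end of file (or a trailing blank line)
            n_atoms = int(count_line)
            comment = handle.readline()
            match = ENERGY_PATTERN.search(comment)
            if match is None:
                raise ValueError("no energy entry found in comment line")
            energies.append(float(match.group(1)))
            for _ in range(n_atoms):
                handle.readline()  # skip the atom lines of this frame
    return np.array(energies)


# Tiny synthetic two-frame file, only to exercise the parser.
demo_text = (
    "2\n"
    'Lattice="5 0 0 0 5 0 0 0 5" Properties=species:S:1:pos:R:3 energy=-1.5\n'
    "H 0.0 0.0 0.0\n"
    "H 0.0 0.0 0.7\n"
    "1\n"
    "Properties=species:S:1:pos:R:3 energy=2.25\n"
    "He 1.0 1.0 1.0\n"
)
demo_path = Path(tempfile.mkdtemp()) / "demo.xyz.bz2"
with bz2.open(demo_path, "wt") as handle:
    handle.write(demo_text)

demo_energies = read_energies(demo_path)
```

If the real file quotes its values or uses a different key, the regular expression would need adjusting.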

@agoscinski added the dependencies label Jun 12, 2024
@agoscinski self-assigned this Jun 12, 2024
@PicoCentauri
Collaborator

I would extract the arrays and keep them either as plain ASCII files or in the numpy format. We can keep the xyz file if we like.
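The two storage options can be compared with a short round trip. This is only a sketch: the array values and the file names `energies.txt` and `energies.npy` are placeholders, not the real data or paths.

```python
import tempfile
from pathlib import Path

import numpy as np

# Placeholder values standing in for the extracted energies.
energies = np.array([-1.5, 2.25, 0.125])

out_dir = Path(tempfile.mkdtemp())

# Option 1: plain ASCII, human-readable and easy to diff.
np.savetxt(out_dir / "energies.txt", energies)
from_ascii = np.loadtxt(out_dir / "energies.txt")

# Option 2: NumPy's binary .npy format, compact and bit-exact.
np.save(out_dir / "energies.npy", energies)
from_npy = np.load(out_dir / "energies.npy")
```

Both variants need only numpy at load time, so either would remove the ase import from the loader.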

@PicoCentauri added the good first issue label Jun 13, 2024
@PicoCentauri
Collaborator

Basically, what needs to be done is to save the energies as an .npz file in /src/skmatter/datasets/data and change the code that loads the energies via ase to pure numpy:

def load_roy_dataset():
    """Load and returns the ROY dataset, which contains structures,
    energies and SOAP-derived descriptors for 264 polymorphs of ROY, from [Beran et Al,
    Chemical Science (2022)](https://doi.org/10.1039/D1SC06074K)

    Returns
    -------
    roy_dataset : sklearn.utils.Bunch
        Dictionary-like object, with the following attributes:
        structures : `ase.Atoms` -- the roy structures as ASE objects
        features: `np.array` -- SOAP-derived descriptors for the structures
        energies: `np.array` -- energies of the structures
    """
    module_path = dirname(__file__)
    target_structures = join(module_path, "data", "beran_roy_structures.xyz.bz2")

    try:
        from ase.io import read
    except ImportError:
        raise ImportError("load_roy_dataset requires the ASE package.")

    import bz2

    structures = read(bz2.open(target_structures, "rt"), ":", format="extxyz")
    energies = np.array([f.info["energy"] for f in structures])

    target_features = join(module_path, "data", "beran_roy_features.npz")
    features = np.load(target_features)["feats"]

    return Bunch(structures=structures, features=features, energies=energies)
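A rough sketch of what a numpy-only loader could look like. The file name `beran_roy_features.npz` and the key `feats` come from the function above; the energies file name `beran_roy_energies.npz`, its key `energies`, and the function name `load_roy_arrays` are hypothetical. The sketch returns a plain dict so it runs without scikit-learn; in the library the result would presumably still be wrapped in a `Bunch`. What to do with the `structures` attribute is left open here.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_roy_arrays(data_dir):
    """Load ROY features and energies with numpy only (no ase import)."""
    data_dir = Path(data_dir)
    # "beran_roy_features.npz" / "feats" match the existing loader;
    # "beran_roy_energies.npz" / "energies" are hypothetical names.
    features = np.load(data_dir / "beran_roy_features.npz")["feats"]
    energies = np.load(data_dir / "beran_roy_energies.npz")["energies"]
    return {"features": features, "energies": energies}


# Stand-in data so the sketch runs without the real files.
demo_dir = Path(tempfile.mkdtemp())
np.savez(demo_dir / "beran_roy_features.npz", feats=np.zeros((3, 4)))
np.savez(
    demo_dir / "beran_roy_energies.npz",
    energies=np.array([-1.5, 2.25, 0.125]),
)

roy = load_roy_arrays(demo_dir)
```

The energies file would be written once, for example with `np.savez`, from the values currently read through ase.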
