Add featurizer #102

Merged
merged 52 commits into from
Sep 13, 2023
naik-aakash (Collaborator) commented Apr 25, 2023

Enhancement > Featurizer for ML #96

Added featurizers for ML

Todo

  • Add FeaturizeLobsterpy class for lightweight jsons (of manuscript)
  • Add FeaturizeCOXX class (fingerprints, w_icohp, moment features)
  • Refactor FeaturizeCOXX class methods
  • Add FeaturizeCharges class (Ionicity)
  • Add tests for FeaturizeLobsterpy
  • Update test of FeaturizeLobsterpy to also check exception case
  • Add tests for FeaturizeCOXX
  • Add tests for FeaturizeCharges

naik-aakash added the enhancement (New feature or request) label Apr 25, 2023

naik-aakash (Collaborator, Author) commented May 26, 2023

coverage run -m pytest
coverage combine
coverage report

These three commands need to be run in sequence to get the correct coverage value (the low coverage we are seeing is caused by the multiprocessing module).
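The order matters because, with multiprocessing, each worker process writes its own coverage data file, and the report only reflects all of them after `coverage combine` has merged them. Below is a minimal sketch of a helper script that runs the three steps in sequence and stops at the first failure. It assumes the project's coverage configuration already writes one data file per process (for example via coverage's `parallel` option); the `run_coverage_steps` name and its injectable `runner` argument are hypothetical, added only so the sequencing can be checked without actually invoking coverage.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# The three steps from the comment above, in the order they must run.
COVERAGE_STEPS: list[list[str]] = [
    ["coverage", "run", "-m", "pytest"],
    ["coverage", "combine"],
    ["coverage", "report"],
]


def run_coverage_steps(
    steps: Sequence[Sequence[str]] = COVERAGE_STEPS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[list[str]]:
    """Run each step in order and return the commands that were executed.

    Hypothetical helper: raises on the first step that exits non-zero.
    """
    executed: list[list[str]] = []
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError on failure,
        # so "combine" and "report" never run after a failed test session.
        runner(list(cmd), check=True)
        executed.append(list(cmd))
    return executed


if __name__ == "__main__":
    try:
        run_coverage_steps()
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```

Stopping on a failed `coverage run` is a design choice of this sketch; if a report is wanted even when tests fail, the first step would need `check=False`.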

JaGeo (Owner) commented May 26, 2023

I will check again, thanks. However, we could still increase the coverage further. By refactoring, you might be able to get even more: there are some code pieces that are called multiple times.


def _get_lobsterpy_cba_dict(self) -> dict:
    """
    This function uses lobsterpy.cohp.analyze.Analysis class to generate a python dictionary object
JaGeo (Owner):

method?

naik-aakash (Collaborator, Author):

Thanks! Changed it now to a static method.

COXX nth moment in eV
"""
if e_range:
    coxx = coxx[(energies >= self.e_range[0]) & (energies <= self.e_range[-1])]
JaGeo (Owner):

It probably works, but I would still make sure this comparison is okay.
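The moment features computed in this snippet can be illustrated with a small NumPy sketch. It assumes the COXX center is the weighted mean energy, ∫E·COXX dE / ∫COXX dE, and that the n-th moment is taken about that center and normalised the same way; this is an illustrative formulation, not necessarily the exact one used in lobsterpy. The function name `nth_moment`, the trapezoidal integration, and the `tol` margin on the energy window (one way to keep grid points that sit exactly on a boundary from being dropped by floating-point round-off, which is the kind of comparison issue raised above) are all assumptions.

```python
from __future__ import annotations

import numpy as np


def _trapezoid(y: np.ndarray, x: np.ndarray) -> float:
    # Plain trapezoidal rule; avoids depending on np.trapz / np.trapezoid,
    # whose names differ between NumPy versions.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def nth_moment(
    coxx: np.ndarray,
    energies: np.ndarray,
    n: int,
    e_range: tuple[float, float] | None = None,
    tol: float = 1e-8,
) -> float:
    """Hypothetical n-th moment of a COXX curve about its center (eV**n).

    n = 1 returns the center itself (the COXX-weighted mean energy).
    """
    coxx = np.asarray(coxx, dtype=float)
    energies = np.asarray(energies, dtype=float)
    if e_range is not None:
        e_min, e_max = e_range
        # Filter coxx AND energies with the same mask so they stay aligned;
        # the small tolerance keeps points lying on the window edges.
        mask = (energies >= e_min - tol) & (energies <= e_max + tol)
        coxx, energies = coxx[mask], energies[mask]

    norm = _trapezoid(coxx, energies)
    center = _trapezoid(energies * coxx, energies) / norm
    if n == 1:
        return center
    return _trapezoid((energies - center) ** n * coxx, energies) / norm
```

For example, for a Gaussian-shaped curve `np.exp(-E**2)` on a fine grid, the center comes out as approximately 0 and the second moment as approximately 0.5, matching the analytic variance.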

if ids:
    df = pd.DataFrame(index=[ids])
else:
    ids = os.path.basename(os.path.dirname(self.path_to_coxxcar))
JaGeo (Owner):

In many parts of the code, we now use "Path". Could you update this?

naik-aakash (Collaborator, Author), Jun 2, 2023:

Made it consistent now
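For reference, the `os.path.basename(os.path.dirname(...))` expression in the snippet above has a direct `pathlib.Path` counterpart: the name of the directory that contains the COXXCAR file. This is a minimal sketch, assuming the identifier should fall back to that parent-directory name when no `ids` is given. The helper name `default_id` is hypothetical, and the pandas `DataFrame` construction from the snippet is left out to keep the example dependency-free.

```python
from __future__ import annotations

from pathlib import Path


def default_id(path_to_coxxcar: str | Path, ids: str | None = None) -> str:
    """Hypothetical helper: return `ids` if given, else the parent dir name."""
    if ids:
        return ids
    # Path(...).parent.name is the pathlib equivalent of
    # os.path.basename(os.path.dirname(...)).
    return Path(path_to_coxxcar).parent.name


print(default_id("/data/mp-123/COHPCAR.lobster"))  # mp-123
```

Both forms also agree on a bare file name such as `"COHPCAR.lobster"`, where each yields an empty string.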

JaGeo (Owner) commented May 26, 2023

Could you also add documentation of the new installation process?

naik-aakash (Collaborator, Author):

> Could you also add documentation of the new installation process?

I will do it in another PR and link it to the open issue #117.

naik-aakash changed the title from "[WIP] Add featurizer" to "Add featurizer" on Jun 5, 2023
naik-aakash (Collaborator, Author):

Hi @JaGeo, I think the issues with numerical stability should now be fixed, and more tests have been added to improve the coverage. If any more changes are necessary, I would be happy to address them 😃

naik-aakash (Collaborator, Author):

Also, I will try to add orbital-wise COXX features (center, width, skew, kurtosis, and fingerprints) as well, but these will be added in another PR once this one is merged.

naik-aakash mentioned this pull request Jul 31, 2023
@@ -78,67 +72,6 @@ disable=print-statement,
useless-suppression,
deprecated-pragma,
use-symbolic-message-instead,
apply-builtin,
JaGeo (Owner):

@naik-aakash, is there any reason why we have to remove all of them, or is this an issue with not having the latest main branch merged?

naik-aakash (Collaborator, Author):

Ah, I removed them because they seem to be deprecated and made the lint workflow print a lot of warnings.

JaGeo (Owner) left a comment:

I looked through the main implementations. For the featurization of the COHPs and further features, we might want to run some more tests later on.

JaGeo merged commit e4e317d into JaGeo:main on Sep 13, 2023 (21 checks passed)
naik-aakash deleted the add-featurizer branch on June 20, 2024, 06:55
Labels: enhancement (New feature or request)
Linked issues (may be closed by merging this pull request): Featurizer for ML
Participants: 2