-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling Optional dependencies for the analysis module #1159
Comments
I do have mixed feeling on the question. On one hand, I like the battery included view you are pushing for. On the other end, having dependencies for analyses automatically added to the main dependencies means we have to be more careful about what analyses we accept in the library. For instance, ENCORE was the first module to introduce a dependency to sklearn. Because sklearn was just an optional requirement, we did not really bothered about it. Would it be introduced as a main requirement for the library, we would have had to weight it the new dependency. |
Yeah I was thinking the same as @jbarnoud , it's nice to be able to import whatever in analysis and not worry about what it is because it's only optional. Maybe we can reassess what's considered "simple" to find/install and expand the whitelist of what's required, something like either the anaconda package list or the scipy stack |
I understand that. But with packages from the standard scientific python stack I'm pretty OK. Odds are one either already uses them or has to use them in the future anyway. I'm mostly put off by our arbitrary decision to say that some dependencies are hard to install and should be optional and others are just fine (even though they have dependencies of their own that might pull in other libraries, see griddataformats eg #1147) |
This has also come up in #1145. Using a batteries included approach I wouldn't need to add joblib as a dependency. With the current approach it would be better to require joblib (pure python) to use 95% of encore and say if sklearn is installed you can use the last 5% as well. |
A fundamental problem is that of dependencies of dependencies as in gridDataFormats. I agree with @kain88-de that it's not straightforward to assess what should count as simple. (gridDataFormats has essentially been grand-fathered in... and I am not overly happy about its scipy dependency but that's a different thing.) My gut feeling was summarized by @jbarnoud #1159 (comment) – but maybe that really harks back to the olden pre-whl/pre-conda days. |
As a clear set of rules for development how about the following if a dependency should be optional?
|
I would be fine with what @kain88-de suggests. Like @orbeckst I mostly form my opinion on dependencies pre-wheel and lot of things are easier now. As we account more and more for anaconda when making decisions, I would just like us to keep in mind that not everybody uses conda and that it shall not become mandatory. |
We could also say that all packages in the scipy-stack are fine. That are less but they include the most prominent packages that one will use (matplotlib / scipy / scikits). See https://www.scipy.org/about.html |
The scipy-stack is not really that well defined: the core packages are clear enough
but the other packages are "too many to list here". Although h5py, pytables, cython, scikit-image, scikit-learn all are well established and (I think) fairly widely used, I am less sure about chaco and mayavi (which is cool, but depends on VTK and is more niche). I think we can easily agree on the core packages being dependencies for the analysis module. For the rest I don't really see the SciPy: other packages giving very useful guidance: ultimately we have to decide what we will consider canonical analysis modules. |
In any case, that means that scipy is an acceptable main dependancy, doesn't it? Shall we upgrade it from the full to the minimal install? |
In my opinion, we can upgrade
|
@MDAnalysis/coredevs can we decide to upgrade the following packages to required dependencies (this will solve #1361):
We can decide on other packages to add later. Please vote with 👍 or 👎 . |
@richardjgowers can you explain your 😕 "confused" comment? |
@orbeckst just 50/50 about it, so I've read it and have no strong opinion |
I agree for scipy. But where is matplotlib used? |
All four do somewhat complicated/idiosyncratic plotting. Perhaps we can get rid of these plotting functions in the future or better shield them from top-level imports but until then I would take the path of least resistance and just include |
Good for me then. |
Also, I didn't forget that idea of adapting |
For right now I am preparing a PR that upgrades |
I updated the List of core module dependencies (and removed any mentioning of scipy as optional). |
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
- require scipy and matplotlib in setup.py (fixes #1361) - scipy and matplotlib are imported at top in analysis - updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - removed conditional skipping of tests when scipy or matplotlib are missing
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes #1159 - fixes #1361
make scipy and matplotlib required dependencies (#1159)
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361
Converted test_units to use parametrize Some hacking of test_distances to use pytest features Completed moving test_distances to pytest style rewrote test_transformations into pytest style make scipy and matplotlib full dependencies (MDAnalysis#1159) scipy and matplotlib are imported at top in analysis - updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361 removed conditional skipping of tests when scipy or matplotlib are missing minor clean ups
Converted test_units to use parametrize Some hacking of test_distances to use pytest features Completed moving test_distances to pytest style rewrote test_transformations into pytest style make scipy and matplotlib full dependencies (MDAnalysis#1159) scipy and matplotlib are imported at top in analysis - updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361 removed conditional skipping of tests when scipy or matplotlib are missing minor clean ups
Converted test_units to use parametrize Some hacking of test_distances to use pytest features Completed moving test_distances to pytest style rewrote test_transformations into pytest style make scipy and matplotlib full dependencies (MDAnalysis#1159) scipy and matplotlib are imported at top in analysis - updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361 removed conditional skipping of tests when scipy or matplotlib are missing minor clean ups
- updated all modules - removed any code that guards against scipy or matplotlib import - conforms to style guide https://github.com/MDAnalysis/mdanalysis/wiki/Style-Guide#module-imports-in-mdanalysisanalysis - fixes MDAnalysis#1159 - fixes MDAnalysis#1361
It has come up recently how optional dependencies are treated and how we should deal with dependencies in mda.analysis. The current situation seems to be that require any package that is imported in core or analysis with the exception of a few (matplotlib/scipy/sklearn/netcdf4) which are deemed to be hard to install.
Personally I don't think the installation is a problem anymore today. Either the packages are in the distribution package manager or you can use conda or even pip which installs wheels so no compilation should be needed. A short google search also doesn't list any installation problem posts/blogs on the web that are younger then 2014.
I would say that the if one installed MDAnalysis everything should just work (that happens with the conda installation). This will also mean we can get rid of the optional imports and include guards.
I don't like the current approach. Where the library will work 90% but I may not be able to use a feature in the moment when I really need it but have to internet connection. It's also not really clear to a user what parts won't work. The best we do right now is to give warnings when an analysis module is imported.
I would like to hear other opinions as well.
The text was updated successfully, but these errors were encountered: