Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages #562

Closed
wants to merge 5 commits into from

Conversation

ogrisel
Copy link

@ogrisel ogrisel commented Jan 16, 2024

This should solve the problem of empty __path__ attributes for packages imported from an editable install.

Fixes #557.

This solves the problem of empty __path__ attributes for packages
imported from an editable install.

Fixes mesonbuild#557.
@ogrisel
Copy link
Author

ogrisel commented Jan 16, 2024

/cc @lesteve @rgommers @dnicolodi.

@ogrisel ogrisel changed the title BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages Jan 16, 2024
@ogrisel ogrisel changed the title BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages BUG: empty __spec__.submodule_search_locations and __path__ values for editable packages Jan 16, 2024
@rgommers
Copy link
Contributor

Thanks @ogrisel! That's a nicely concise fix. I'm not too familiar with this code, but the description at https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec.submodule_search_locations seems clear enough about this being correct.

I can confirm that it fixes the issue with the SciPy test - it now correctly checks all submodules rather than the test silently passing without doing anything.

@rgommers rgommers added the bug Something isn't working label Jan 16, 2024
@dnicolodi
Copy link
Member

NAK. This does not work. There is a reason why I decided to do not populate the __path__ module attribute.

@ogrisel
Copy link
Author

ogrisel commented Jan 16, 2024

What is the reason? There is no inline comment , nor any failing test when i changed that line of code.

@dnicolodi
Copy link
Member

__path__ is defined to be "the list of locations where the package’s submodules will be found", where "locations" is interpreted by most code to be a list of directories. However, in editable installations there is not a correspondence between the filesystem location of sources and their place in the module structure. Not even the file names need to match the module names. Therefore, setting __path__ in the way this patch does violates the definition of what __path__ is. You omitted testing __path__ for the extension module in the test package. If you would have tested it, you would have discovered that it points to a different directory than for the other modules in the same package.

@ogrisel
Copy link
Author

ogrisel commented Jan 16, 2024

You omitted testing path for the extension module in the test package. If you would have tested it, you would have discovered that it points to a different directory than for the other modules in the same package.

I don't understand. The extension module (I assume you mean complex.test which stems from complex/test.pyx) is not a package, hence it should not have a __path__ attribute at all, right?

@dnicolodi
Copy link
Member

Yes, you are right. However, you cannot find the complex.test module in complex.__path__. This breaks the definition of path.

@ogrisel
Copy link
Author

ogrisel commented Jan 16, 2024

However, you cannot find the complex.test module in complex.path. This breaks the definition of path.

We could populate the __path__ list with the two folders: the regular one with the Python source tree and the one with the compiled extensions, right?

@ogrisel ogrisel force-pushed the fix-editable-__path__ branch from 691ff43 to bdf056d Compare January 16, 2024 20:25
@dnicolodi
Copy link
Member

We could populate the __path__ list with the two folders: the regular one with the Python source tree and the one with the compiled extensions, right?

There is no defined relation between the organization of the modules on the filesystem and the logical structure in the package. The files can be rename on installation, moved from one folder to another, etc. Code that assumes it can use the location of the modules on the filesystem or their name to infer the structure of the package is broken.

@rgommers
Copy link
Contributor

The files can be rename on installation, moved from one folder to another, etc. Code that assumes it can use the location of the modules on the filesystem or their name to infer the structure of the package is broken.

It's possible indeed to rename/move on install, but it's rare. So this does fix things for the common case, without downsides I think (?). It's worrying that code in SciPy that I thought was working was actually silently broken before with an empty list, so this does seem like an improvement. Whether it works or not depends on the package's build definitions right, as far as I can tell it isn't possible to break it via some build/install setting.

@dnicolodi
Copy link
Member

It does not work for extension modules. I assume that scipy wants pkgutil.walk_packages() to return extension modules too.

@rgommers
Copy link
Contributor

It does not work for extension modules. I assume that scipy wants pkgutil.walk_packages() to return extension modules too.

Hmm yeah, you are correct. My comment about it being rare doesn't hold when walking the build dir to pick up extension modules - it's quite a bit more common for those to land in a different install location.

So this really won't work, and we're back to the iter_modules conversation on gh-557 I guess.

@ogrisel
Copy link
Author

ogrisel commented Jan 19, 2024

@dnicolodi I iterated on this PR to address the problem of the discoverability of compiled submodules in editable mode. Let me know what you think. In particular, I am not sure if submodules of namespace "packages" should be made discoverable by pkgutil.walk_packages one way or another but I am not sure it's possible in general.

@dnicolodi
Copy link
Member

All extension modules are built into the root of the build directory, thus this approach still does not work if there are extension modules installed in different sub-packages. There also can be other files that have an extension that matches the one used to load python modules or python extension modules in the build directory. With this approach all these would be interpreted as being part of the package. As I mentioned in #557, there is a solution to make pkgutil.walk_packages() work, but I haven't hat time yet to implement it. Listing filesystems locations in __path__ is not a good starting point.

@ogrisel
Copy link
Author

ogrisel commented Jan 19, 2024

All extension modules are built into the root of the build directory, thus this approach still does not work if there are extension modules installed in different sub-packages.

This is not true. The source folder tree structure is still present to avoid name collisions in case compiled modules with the same local name exist in two different subpackages of the same top level package. For instance, here is the filesystem structure of scikit-learn when built in editable mode with meson:

$ find build -name "*.so"
build/cp311/sklearn/tree/_utils.cpython-311-darwin.so
build/cp311/sklearn/tree/_splitter.cpython-311-darwin.so
build/cp311/sklearn/tree/_tree.cpython-311-darwin.so
build/cp311/sklearn/tree/_criterion.cpython-311-darwin.so
build/cp311/sklearn/metrics/cluster/_expected_mutual_info_fast.cpython-311-darwin.so
build/cp311/sklearn/metrics/_dist_metrics.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_argkmin_classmode.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_base.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_distances_reduction/_argkmin.cpython-311-darwin.so
build/cp311/sklearn/metrics/_pairwise_fast.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/_bitset.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/histogram.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/_binning.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/common.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/_predictor.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/_gradient_boosting.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/utils.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_hist_gradient_boosting/splitting.cpython-311-darwin.so
build/cp311/sklearn/ensemble/_gradient_boosting.cpython-311-darwin.so
build/cp311/sklearn/cluster/_k_means_elkan.cpython-311-darwin.so
build/cp311/sklearn/cluster/_k_means_common.cpython-311-darwin.so
build/cp311/sklearn/cluster/_k_means_minibatch.cpython-311-darwin.so
build/cp311/sklearn/cluster/_k_means_lloyd.cpython-311-darwin.so
build/cp311/sklearn/cluster/_dbscan_inner.cpython-311-darwin.so
build/cp311/sklearn/cluster/_hdbscan/_reachability.cpython-311-darwin.so
build/cp311/sklearn/cluster/_hdbscan/_linkage.cpython-311-darwin.so
build/cp311/sklearn/cluster/_hdbscan/_tree.cpython-311-darwin.so
build/cp311/sklearn/cluster/_hierarchical_fast.cpython-311-darwin.so
build/cp311/sklearn/feature_extraction/_hashing_fast.cpython-311-darwin.so
build/cp311/sklearn/__check_build/_check_build.cpython-311-darwin.so
build/cp311/sklearn/_loss/_loss.cpython-311-darwin.so
build/cp311/sklearn/datasets/_svmlight_format_fast.cpython-311-darwin.so
build/cp311/sklearn/linear_model/_sag_fast.cpython-311-darwin.so
build/cp311/sklearn/linear_model/_sgd_fast.cpython-311-darwin.so
build/cp311/sklearn/linear_model/_cd_fast.cpython-311-darwin.so
build/cp311/sklearn/utils/_openmp_helpers.cpython-311-darwin.so
build/cp311/sklearn/utils/_random.cpython-311-darwin.so
build/cp311/sklearn/utils/_vector_sentinel.cpython-311-darwin.so
build/cp311/sklearn/utils/_isfinite.cpython-311-darwin.so
build/cp311/sklearn/utils/_heap.cpython-311-darwin.so
build/cp311/sklearn/utils/_sorting.cpython-311-darwin.so
build/cp311/sklearn/utils/_weight_vector.cpython-311-darwin.so
build/cp311/sklearn/utils/_cython_blas.cpython-311-darwin.so
build/cp311/sklearn/utils/sparsefuncs_fast.cpython-311-darwin.so
build/cp311/sklearn/utils/_fast_dict.cpython-311-darwin.so
build/cp311/sklearn/utils/arrayfuncs.cpython-311-darwin.so
build/cp311/sklearn/utils/murmurhash.cpython-311-darwin.so
build/cp311/sklearn/utils/_seq_dataset.cpython-311-darwin.so
build/cp311/sklearn/utils/_typedefs.cpython-311-darwin.so
build/cp311/sklearn/svm/_newrand.cpython-311-darwin.so
build/cp311/sklearn/svm/_libsvm.cpython-311-darwin.so
build/cp311/sklearn/svm/_liblinear.cpython-311-darwin.so
build/cp311/sklearn/svm/_libsvm_sparse.cpython-311-darwin.so
build/cp311/sklearn/manifold/_utils.cpython-311-darwin.so
build/cp311/sklearn/manifold/_barnes_hut_tsne.cpython-311-darwin.so
build/cp311/sklearn/_isotonic.cpython-311-darwin.so
build/cp311/sklearn/preprocessing/_target_encoder_fast.cpython-311-darwin.so
build/cp311/sklearn/preprocessing/_csr_polynomial_expansion.cpython-311-darwin.so
build/cp311/sklearn/decomposition/_cdnmf_fast.cpython-311-darwin.so
build/cp311/sklearn/decomposition/_online_lda_fast.cpython-311-darwin.so
build/cp311/sklearn/neighbors/_ball_tree.cpython-311-darwin.so
build/cp311/sklearn/neighbors/_kd_tree.cpython-311-darwin.so
build/cp311/sklearn/neighbors/_partition_nodes.cpython-311-darwin.so
build/cp311/sklearn/neighbors/_quad_tree.cpython-311-darwin.so

And here is the output of walk_packages on the sklearn top level package:

>>> from pkgutil import walk_packages
>>> for info in walk_packages(sklearn.__path__, prefix="sklearn."):
...     print(info.name)
... 
sklearn.__check_build
sklearn.__check_build._check_build
sklearn._build_utils
sklearn._build_utils.openmp_helpers
sklearn._build_utils.pre_build_helpers
sklearn._build_utils.tempita
sklearn._build_utils.version
sklearn._config
sklearn._distributor_init
sklearn._loss
sklearn._loss.link
sklearn._loss.loss
sklearn._loss.tests
sklearn._loss.tests.test_link
sklearn._loss.tests.test_loss
sklearn._loss._loss
sklearn._min_dependencies
sklearn.base
sklearn.calibration
sklearn.cluster
sklearn.cluster._affinity_propagation
sklearn.cluster._agglomerative
sklearn.cluster._bicluster
sklearn.cluster._birch
sklearn.cluster._bisect_k_means
sklearn.cluster._dbscan
sklearn.cluster._feature_agglomeration
sklearn.cluster._hdbscan
sklearn.cluster._hdbscan.hdbscan
sklearn.cluster._hdbscan.tests
sklearn.cluster._hdbscan.tests.test_reachibility
sklearn.cluster._hdbscan._linkage
sklearn.cluster._hdbscan._reachability
sklearn.cluster._hdbscan._tree
sklearn.cluster._kmeans
sklearn.cluster._mean_shift
sklearn.cluster._optics
sklearn.cluster._spectral
sklearn.cluster.tests
sklearn.cluster.tests.common
sklearn.cluster.tests.test_affinity_propagation
sklearn.cluster.tests.test_bicluster
sklearn.cluster.tests.test_birch
sklearn.cluster.tests.test_bisect_k_means
sklearn.cluster.tests.test_dbscan
sklearn.cluster.tests.test_feature_agglomeration
sklearn.cluster.tests.test_hdbscan
sklearn.cluster.tests.test_hierarchical
sklearn.cluster.tests.test_k_means
sklearn.cluster.tests.test_mean_shift
sklearn.cluster.tests.test_optics
sklearn.cluster.tests.test_spectral
sklearn.cluster._dbscan_inner
sklearn.cluster._hierarchical_fast
sklearn.cluster._k_means_common
sklearn.cluster._k_means_elkan
sklearn.cluster._k_means_lloyd
sklearn.cluster._k_means_minibatch
sklearn.compose
sklearn.compose._column_transformer
sklearn.compose._target
sklearn.compose.tests
sklearn.compose.tests.test_column_transformer
sklearn.compose.tests.test_target
sklearn.conftest
sklearn.covariance
sklearn.covariance._elliptic_envelope
sklearn.covariance._empirical_covariance
sklearn.covariance._graph_lasso
sklearn.covariance._robust_covariance
sklearn.covariance._shrunk_covariance
sklearn.covariance.tests
sklearn.covariance.tests.test_covariance
sklearn.covariance.tests.test_elliptic_envelope
sklearn.covariance.tests.test_graphical_lasso
sklearn.covariance.tests.test_robust_covariance
sklearn.cross_decomposition
sklearn.cross_decomposition._pls
sklearn.cross_decomposition.tests
sklearn.cross_decomposition.tests.test_pls
sklearn.datasets
sklearn.datasets._arff_parser
sklearn.datasets._base
sklearn.datasets._california_housing
sklearn.datasets._covtype
sklearn.datasets._kddcup99
sklearn.datasets._lfw
sklearn.datasets._olivetti_faces
sklearn.datasets._openml
sklearn.datasets._rcv1
sklearn.datasets._samples_generator
sklearn.datasets._species_distributions
sklearn.datasets._svmlight_format_io
sklearn.datasets._twenty_newsgroups
sklearn.datasets.data
sklearn.datasets.descr
sklearn.datasets.images
sklearn.datasets.tests
sklearn.datasets.tests.data
sklearn.datasets.tests.data.openml
sklearn.datasets.tests.data.openml.id_1
sklearn.datasets.tests.data.openml.id_1119
sklearn.datasets.tests.data.openml.id_1590
sklearn.datasets.tests.data.openml.id_2
sklearn.datasets.tests.data.openml.id_292
sklearn.datasets.tests.data.openml.id_3
sklearn.datasets.tests.data.openml.id_40589
sklearn.datasets.tests.data.openml.id_40675
sklearn.datasets.tests.data.openml.id_40945
sklearn.datasets.tests.data.openml.id_40966
sklearn.datasets.tests.data.openml.id_42074
sklearn.datasets.tests.data.openml.id_42585
sklearn.datasets.tests.data.openml.id_561
sklearn.datasets.tests.data.openml.id_61
sklearn.datasets.tests.data.openml.id_62
sklearn.datasets.tests.test_20news
sklearn.datasets.tests.test_arff_parser
sklearn.datasets.tests.test_base
sklearn.datasets.tests.test_california_housing
sklearn.datasets.tests.test_common
sklearn.datasets.tests.test_covtype
sklearn.datasets.tests.test_kddcup99
sklearn.datasets.tests.test_lfw
sklearn.datasets.tests.test_olivetti_faces
sklearn.datasets.tests.test_openml
sklearn.datasets.tests.test_rcv1
sklearn.datasets.tests.test_samples_generator
sklearn.datasets.tests.test_svmlight_format
sklearn.datasets._svmlight_format_fast
sklearn.decomposition
sklearn.decomposition._base
sklearn.decomposition._dict_learning
sklearn.decomposition._factor_analysis
sklearn.decomposition._fastica
sklearn.decomposition._incremental_pca
sklearn.decomposition._kernel_pca
sklearn.decomposition._lda
sklearn.decomposition._nmf
sklearn.decomposition._pca
sklearn.decomposition._sparse_pca
sklearn.decomposition._truncated_svd
sklearn.decomposition.tests
sklearn.decomposition.tests.test_dict_learning
sklearn.decomposition.tests.test_factor_analysis
sklearn.decomposition.tests.test_fastica
sklearn.decomposition.tests.test_incremental_pca
sklearn.decomposition.tests.test_kernel_pca
sklearn.decomposition.tests.test_nmf
sklearn.decomposition.tests.test_online_lda
sklearn.decomposition.tests.test_pca
sklearn.decomposition.tests.test_sparse_pca
sklearn.decomposition.tests.test_truncated_svd
sklearn.decomposition._cdnmf_fast
sklearn.decomposition._online_lda_fast
sklearn.discriminant_analysis
sklearn.dummy
sklearn.ensemble
sklearn.ensemble._bagging
sklearn.ensemble._base
sklearn.ensemble._forest
sklearn.ensemble._gb
sklearn.ensemble._hist_gradient_boosting
sklearn.ensemble._hist_gradient_boosting.binning
sklearn.ensemble._hist_gradient_boosting.gradient_boosting
sklearn.ensemble._hist_gradient_boosting.grower
sklearn.ensemble._hist_gradient_boosting.predictor
sklearn.ensemble._hist_gradient_boosting.tests
sklearn.ensemble._hist_gradient_boosting.tests.test_binning
sklearn.ensemble._hist_gradient_boosting.tests.test_bitset
sklearn.ensemble._hist_gradient_boosting.tests.test_compare_lightgbm
sklearn.ensemble._hist_gradient_boosting.tests.test_gradient_boosting
sklearn.ensemble._hist_gradient_boosting.tests.test_grower
sklearn.ensemble._hist_gradient_boosting.tests.test_histogram
sklearn.ensemble._hist_gradient_boosting.tests.test_monotonic_contraints
sklearn.ensemble._hist_gradient_boosting.tests.test_predictor
sklearn.ensemble._hist_gradient_boosting.tests.test_splitting
sklearn.ensemble._hist_gradient_boosting.tests.test_warm_start
sklearn.ensemble._hist_gradient_boosting._binning
sklearn.ensemble._hist_gradient_boosting._bitset
sklearn.ensemble._hist_gradient_boosting._gradient_boosting
sklearn.ensemble._hist_gradient_boosting._predictor
sklearn.ensemble._hist_gradient_boosting.common
sklearn.ensemble._hist_gradient_boosting.histogram
sklearn.ensemble._hist_gradient_boosting.splitting
sklearn.ensemble._hist_gradient_boosting.utils
sklearn.ensemble._iforest
sklearn.ensemble._stacking
sklearn.ensemble._voting
sklearn.ensemble._weight_boosting
sklearn.ensemble.tests
sklearn.ensemble.tests.test_bagging
sklearn.ensemble.tests.test_base
sklearn.ensemble.tests.test_common
sklearn.ensemble.tests.test_forest
sklearn.ensemble.tests.test_gradient_boosting
sklearn.ensemble.tests.test_iforest
sklearn.ensemble.tests.test_stacking
sklearn.ensemble.tests.test_voting
sklearn.ensemble.tests.test_weight_boosting
sklearn.ensemble._gradient_boosting
sklearn.exceptions
sklearn.experimental
sklearn.experimental.enable_halving_search_cv
sklearn.experimental.enable_hist_gradient_boosting
sklearn.experimental.enable_iterative_imputer
sklearn.experimental.tests
sklearn.experimental.tests.test_enable_hist_gradient_boosting
sklearn.experimental.tests.test_enable_iterative_imputer
sklearn.experimental.tests.test_enable_successive_halving
sklearn.externals
sklearn.externals._arff
sklearn.externals._packaging
sklearn.externals._packaging._structures
sklearn.externals._packaging.version
sklearn.externals._scipy
sklearn.externals._scipy.sparse
sklearn.externals._scipy.sparse.csgraph
sklearn.externals._scipy.sparse.csgraph._laplacian
sklearn.externals.conftest
sklearn.feature_extraction
sklearn.feature_extraction._dict_vectorizer
sklearn.feature_extraction._hash
sklearn.feature_extraction._stop_words
sklearn.feature_extraction.image
sklearn.feature_extraction.tests
sklearn.feature_extraction.tests.test_dict_vectorizer
sklearn.feature_extraction.tests.test_feature_hasher
sklearn.feature_extraction.tests.test_image
sklearn.feature_extraction.tests.test_text
sklearn.feature_extraction.text
sklearn.feature_extraction._hashing_fast
sklearn.feature_selection
sklearn.feature_selection._base
sklearn.feature_selection._from_model
sklearn.feature_selection._mutual_info
sklearn.feature_selection._rfe
sklearn.feature_selection._sequential
sklearn.feature_selection._univariate_selection
sklearn.feature_selection._variance_threshold
sklearn.feature_selection.tests
sklearn.feature_selection.tests.test_base
sklearn.feature_selection.tests.test_chi2
sklearn.feature_selection.tests.test_feature_select
sklearn.feature_selection.tests.test_from_model
sklearn.feature_selection.tests.test_mutual_info
sklearn.feature_selection.tests.test_rfe
sklearn.feature_selection.tests.test_sequential
sklearn.feature_selection.tests.test_variance_threshold
sklearn.gaussian_process
sklearn.gaussian_process._gpc
sklearn.gaussian_process._gpr
sklearn.gaussian_process.kernels
sklearn.gaussian_process.tests
sklearn.gaussian_process.tests._mini_sequence_kernel
sklearn.gaussian_process.tests.test_gpc
sklearn.gaussian_process.tests.test_gpr
sklearn.gaussian_process.tests.test_kernels
sklearn.impute
sklearn.impute._base
sklearn.impute._iterative
sklearn.impute._knn
sklearn.impute.tests
sklearn.impute.tests.test_base
sklearn.impute.tests.test_common
sklearn.impute.tests.test_impute
sklearn.impute.tests.test_knn
sklearn.inspection
sklearn.inspection._partial_dependence
sklearn.inspection._pd_utils
sklearn.inspection._permutation_importance
sklearn.inspection._plot
sklearn.inspection._plot.decision_boundary
sklearn.inspection._plot.partial_dependence
sklearn.inspection._plot.tests
sklearn.inspection._plot.tests.test_boundary_decision_display
sklearn.inspection._plot.tests.test_plot_partial_dependence
sklearn.inspection.tests
sklearn.inspection.tests.test_partial_dependence
sklearn.inspection.tests.test_pd_utils
sklearn.inspection.tests.test_permutation_importance
sklearn.isotonic
sklearn.kernel_approximation
sklearn.kernel_ridge
sklearn.linear_model
sklearn.linear_model._base
sklearn.linear_model._bayes
sklearn.linear_model._coordinate_descent
sklearn.linear_model._glm
sklearn.linear_model._glm._newton_solver
sklearn.linear_model._glm.glm
sklearn.linear_model._glm.tests
sklearn.linear_model._glm.tests.test_glm
sklearn.linear_model._huber
sklearn.linear_model._least_angle
sklearn.linear_model._linear_loss
sklearn.linear_model._logistic
sklearn.linear_model._omp
sklearn.linear_model._passive_aggressive
sklearn.linear_model._perceptron
sklearn.linear_model._quantile
sklearn.linear_model._ransac
sklearn.linear_model._ridge
sklearn.linear_model._sag
sklearn.linear_model._stochastic_gradient
sklearn.linear_model._theil_sen
sklearn.linear_model.tests
sklearn.linear_model.tests.test_base
sklearn.linear_model.tests.test_bayes
sklearn.linear_model.tests.test_common
sklearn.linear_model.tests.test_coordinate_descent
sklearn.linear_model.tests.test_huber
sklearn.linear_model.tests.test_least_angle
sklearn.linear_model.tests.test_linear_loss
sklearn.linear_model.tests.test_logistic
sklearn.linear_model.tests.test_omp
sklearn.linear_model.tests.test_passive_aggressive
sklearn.linear_model.tests.test_perceptron
sklearn.linear_model.tests.test_quantile
sklearn.linear_model.tests.test_ransac
sklearn.linear_model.tests.test_ridge
sklearn.linear_model.tests.test_sag
sklearn.linear_model.tests.test_sgd
sklearn.linear_model.tests.test_sparse_coordinate_descent
sklearn.linear_model.tests.test_theil_sen
sklearn.linear_model._cd_fast
sklearn.linear_model._sag_fast
sklearn.linear_model._sgd_fast
sklearn.manifold
sklearn.manifold._isomap
sklearn.manifold._locally_linear
sklearn.manifold._mds
sklearn.manifold._spectral_embedding
sklearn.manifold._t_sne
sklearn.manifold.tests
sklearn.manifold.tests.test_isomap
sklearn.manifold.tests.test_locally_linear
sklearn.manifold.tests.test_mds
sklearn.manifold.tests.test_spectral_embedding
sklearn.manifold.tests.test_t_sne
sklearn.manifold._barnes_hut_tsne
sklearn.manifold._utils
sklearn.metrics
sklearn.metrics._base
sklearn.metrics._classification
sklearn.metrics._pairwise_distances_reduction
sklearn.metrics._pairwise_distances_reduction._dispatcher
sklearn.metrics._pairwise_distances_reduction._argkmin
sklearn.metrics._pairwise_distances_reduction._argkmin_classmode
sklearn.metrics._pairwise_distances_reduction._base
sklearn.metrics._pairwise_distances_reduction._datasets_pair
sklearn.metrics._pairwise_distances_reduction._middle_term_computer
sklearn.metrics._pairwise_distances_reduction._radius_neighbors
sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode
sklearn.metrics._plot
sklearn.metrics._plot.confusion_matrix
sklearn.metrics._plot.det_curve
sklearn.metrics._plot.precision_recall_curve
sklearn.metrics._plot.regression
sklearn.metrics._plot.roc_curve
sklearn.metrics._plot.tests
sklearn.metrics._plot.tests.test_common_curve_display
sklearn.metrics._plot.tests.test_confusion_matrix_display
sklearn.metrics._plot.tests.test_det_curve_display
sklearn.metrics._plot.tests.test_precision_recall_display
sklearn.metrics._plot.tests.test_predict_error_display
sklearn.metrics._plot.tests.test_roc_curve_display
sklearn.metrics._ranking
sklearn.metrics._regression
sklearn.metrics._scorer
sklearn.metrics.cluster
sklearn.metrics.cluster._bicluster
sklearn.metrics.cluster._supervised
sklearn.metrics.cluster._unsupervised
sklearn.metrics.cluster.tests
sklearn.metrics.cluster.tests.test_bicluster
sklearn.metrics.cluster.tests.test_common
sklearn.metrics.cluster.tests.test_supervised
sklearn.metrics.cluster.tests.test_unsupervised
sklearn.metrics.cluster._expected_mutual_info_fast
sklearn.metrics.pairwise
sklearn.metrics.tests
sklearn.metrics.tests.test_classification
sklearn.metrics.tests.test_common
sklearn.metrics.tests.test_dist_metrics
sklearn.metrics.tests.test_pairwise
sklearn.metrics.tests.test_pairwise_distances_reduction
sklearn.metrics.tests.test_ranking
sklearn.metrics.tests.test_regression
sklearn.metrics.tests.test_score_objects
sklearn.metrics._dist_metrics
sklearn.metrics._pairwise_fast
sklearn.mixture
sklearn.mixture._base
sklearn.mixture._bayesian_mixture
sklearn.mixture._gaussian_mixture
sklearn.mixture.tests
sklearn.mixture.tests.test_bayesian_mixture
sklearn.mixture.tests.test_gaussian_mixture
sklearn.mixture.tests.test_mixture
sklearn.model_selection
sklearn.model_selection._plot
sklearn.model_selection._search
sklearn.model_selection._search_successive_halving
sklearn.model_selection._split
sklearn.model_selection._validation
sklearn.model_selection.tests
sklearn.model_selection.tests.common
sklearn.model_selection.tests.test_plot
sklearn.model_selection.tests.test_search
sklearn.model_selection.tests.test_split
sklearn.model_selection.tests.test_successive_halving
sklearn.model_selection.tests.test_validation
sklearn.multiclass
sklearn.multioutput
sklearn.naive_bayes
sklearn.neighbors
sklearn.neighbors._base
sklearn.neighbors._classification
sklearn.neighbors._graph
sklearn.neighbors._kde
sklearn.neighbors._lof
sklearn.neighbors._nca
sklearn.neighbors._nearest_centroid
sklearn.neighbors._regression
sklearn.neighbors._unsupervised
sklearn.neighbors.tests
sklearn.neighbors.tests.test_ball_tree
sklearn.neighbors.tests.test_graph
sklearn.neighbors.tests.test_kd_tree
sklearn.neighbors.tests.test_kde
sklearn.neighbors.tests.test_lof
sklearn.neighbors.tests.test_nca
sklearn.neighbors.tests.test_nearest_centroid
sklearn.neighbors.tests.test_neighbors
sklearn.neighbors.tests.test_neighbors_pipeline
sklearn.neighbors.tests.test_neighbors_tree
sklearn.neighbors.tests.test_quad_tree
sklearn.neighbors._ball_tree
sklearn.neighbors._kd_tree
sklearn.neighbors._partition_nodes
sklearn.neighbors._quad_tree
sklearn.neural_network
sklearn.neural_network._base
sklearn.neural_network._multilayer_perceptron
sklearn.neural_network._rbm
sklearn.neural_network._stochastic_optimizers
sklearn.neural_network.tests
sklearn.neural_network.tests.test_base
sklearn.neural_network.tests.test_mlp
sklearn.neural_network.tests.test_rbm
sklearn.neural_network.tests.test_stochastic_optimizers
sklearn.pipeline
sklearn.preprocessing
sklearn.preprocessing._data
sklearn.preprocessing._discretization
sklearn.preprocessing._encoders
sklearn.preprocessing._function_transformer
sklearn.preprocessing._label
sklearn.preprocessing._polynomial
sklearn.preprocessing._target_encoder
sklearn.preprocessing.tests
sklearn.preprocessing.tests.test_common
sklearn.preprocessing.tests.test_data
sklearn.preprocessing.tests.test_discretization
sklearn.preprocessing.tests.test_encoders
sklearn.preprocessing.tests.test_function_transformer
sklearn.preprocessing.tests.test_label
sklearn.preprocessing.tests.test_polynomial
sklearn.preprocessing.tests.test_target_encoder
sklearn.preprocessing._csr_polynomial_expansion
sklearn.preprocessing._target_encoder_fast
sklearn.random_projection
sklearn.semi_supervised
sklearn.semi_supervised._label_propagation
sklearn.semi_supervised._self_training
sklearn.semi_supervised.tests
sklearn.semi_supervised.tests.test_label_propagation
sklearn.semi_supervised.tests.test_self_training
sklearn.svm
sklearn.svm._base
sklearn.svm._bounds
sklearn.svm._classes
sklearn.svm.tests
sklearn.svm.tests.test_bounds
sklearn.svm.tests.test_sparse
sklearn.svm.tests.test_svm
sklearn.svm._liblinear
sklearn.svm._libsvm
sklearn.svm._libsvm_sparse
sklearn.svm._newrand
sklearn.tests
sklearn.tests.metadata_routing_common
sklearn.tests.random_seed
sklearn.tests.test_base
sklearn.tests.test_build
sklearn.tests.test_calibration
sklearn.tests.test_check_build
sklearn.tests.test_common
sklearn.tests.test_config
sklearn.tests.test_discriminant_analysis
sklearn.tests.test_docstring_parameters
sklearn.tests.test_docstrings
sklearn.tests.test_dummy
sklearn.tests.test_init
sklearn.tests.test_isotonic
sklearn.tests.test_kernel_approximation
sklearn.tests.test_kernel_ridge
sklearn.tests.test_metadata_routing
sklearn.tests.test_metaestimators
sklearn.tests.test_metaestimators_metadata_routing
sklearn.tests.test_min_dependencies_readme
sklearn.tests.test_multiclass
sklearn.tests.test_multioutput
sklearn.tests.test_naive_bayes
sklearn.tests.test_pipeline
sklearn.tests.test_public_functions
sklearn.tests.test_random_projection
sklearn.tree
sklearn.tree._classes
sklearn.tree._export
sklearn.tree._reingold_tilford
sklearn.tree.tests
sklearn.tree.tests.test_export
sklearn.tree.tests.test_monotonic_tree
sklearn.tree.tests.test_reingold_tilford
sklearn.tree.tests.test_tree
sklearn.tree._criterion
sklearn.tree._splitter
sklearn.tree._tree
sklearn.tree._utils
sklearn.utils
sklearn.utils._arpack
sklearn.utils._array_api
sklearn.utils._available_if
sklearn.utils._bunch
sklearn.utils._encode
sklearn.utils._estimator_html_repr
sklearn.utils._joblib
sklearn.utils._mask
sklearn.utils._metadata_requests
sklearn.utils._mocking
sklearn.utils._param_validation
sklearn.utils._plotting
sklearn.utils._pprint
sklearn.utils._response
sklearn.utils._set_output
sklearn.utils._show_versions
sklearn.utils._tags
sklearn.utils._testing
sklearn.utils.class_weight
sklearn.utils.deprecation
sklearn.utils.discovery
sklearn.utils.estimator_checks
sklearn.utils.extmath
sklearn.utils.fixes
sklearn.utils.graph
sklearn.utils.metadata_routing
sklearn.utils.metaestimators
sklearn.utils.multiclass
sklearn.utils.optimize
sklearn.utils.parallel
sklearn.utils.random
sklearn.utils.sparsefuncs
sklearn.utils.stats
sklearn.utils.tests
sklearn.utils.tests.test_arpack
sklearn.utils.tests.test_array_api
sklearn.utils.tests.test_arrayfuncs
sklearn.utils.tests.test_bunch
sklearn.utils.tests.test_class_weight
sklearn.utils.tests.test_cython_blas
sklearn.utils.tests.test_cython_templating
sklearn.utils.tests.test_deprecation
sklearn.utils.tests.test_encode
sklearn.utils.tests.test_estimator_checks
sklearn.utils.tests.test_estimator_html_repr
sklearn.utils.tests.test_extmath
sklearn.utils.tests.test_fast_dict
sklearn.utils.tests.test_fixes
sklearn.utils.tests.test_graph
sklearn.utils.tests.test_metaestimators
sklearn.utils.tests.test_mocking
sklearn.utils.tests.test_multiclass
sklearn.utils.tests.test_murmurhash
sklearn.utils.tests.test_optimize
sklearn.utils.tests.test_parallel
sklearn.utils.tests.test_param_validation
sklearn.utils.tests.test_plotting
sklearn.utils.tests.test_pprint
sklearn.utils.tests.test_random
sklearn.utils.tests.test_response
sklearn.utils.tests.test_seq_dataset
sklearn.utils.tests.test_set_output
sklearn.utils.tests.test_shortest_path
sklearn.utils.tests.test_show_versions
sklearn.utils.tests.test_sparsefuncs
sklearn.utils.tests.test_stats
sklearn.utils.tests.test_tags
sklearn.utils.tests.test_testing
sklearn.utils.tests.test_typedefs
sklearn.utils.tests.test_utils
sklearn.utils.tests.test_validation
sklearn.utils.tests.test_weight_vector
sklearn.utils.validation
sklearn.utils._cython_blas
sklearn.utils._fast_dict
sklearn.utils._heap
sklearn.utils._isfinite
sklearn.utils._openmp_helpers
sklearn.utils._random
sklearn.utils._seq_dataset
sklearn.utils._sorting
sklearn.utils._typedefs
sklearn.utils._vector_sentinel
sklearn.utils._weight_vector
sklearn.utils.arrayfuncs
sklearn.utils.murmurhash
sklearn.utils.sparsefuncs_fast
sklearn._built_with_meson
sklearn._isotonic

so both Python and compiled extensions are recursively discovered successfully using this strategy.

There also can be other files that have an extension that matches the one used to load python modules or python extension modules in the build directory. With this approach all these would be interpreted as being part of the package.

I agree in theory but will that ever be the case under the build/ folder of a meson-python editable build?

@ogrisel
Copy link
Author

ogrisel commented Jan 19, 2024

But I can try to update this PR to add iter_modules method to the MesonpyMetaFinder class instead.

EDIT: the proper way to do it seems to implement this iter_modules method on a path entry finder for mesonpy and register it via path_hook class method in sys.path_hooks similarly to was is done by default for the regular FileFinder class. pkgutil.get_importer (used indirectly by pkgutil.walk_packages/iter_modules) would then be able to look it up for a given package file system path.

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

EDIT2: but looking again at the code of pkgutil.walk_packages, customizing iter_modules alone would not be enough, as walk_packages still relies on the __path__ attribute of a subpackage to do its recursive call. But customizing iter_modules should make it possible to filter out files that should not be considered submodules of a given package.

@dnicolodi
Copy link
Member

This is not true.

It depends on how the build definition is structured. If all extension modules are defined in a meson.build in the root project, the build files are all created in the root of the build directory. What we need is not something that works for sklearn but a general solution.

It would be polite if you could at least consider that the maintainer of a project to which you are contributing spent a bit more time than you thinking about the problem space a bit more than you. If they tell you that an approach to solve an issue does not work, it would be nice if you could at least try to understand the objections.

@dnicolodi
Copy link
Member

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

__path__ makes sense only within the python import system implementation. Relaying on it to point to any specific filesystem folder (or to a filesystem folder at all) is bogus and always will be. It breaks zip imports, bundled python applications, and whatnot. I see the fact that meson-python editable installs breaks this wrong expectation as a positive thing that will move developers away from using the __path__ attribute for things for which it is not designed.

@dnicolodi
Copy link
Member

I agree in theory but will that ever be the case under the build/ folder of a meson-python editable build?

Sure. For example if a project generates a version.py module with the project version, or a shared library on platforms where both extension modules and shared libraries have the same filename extension.

@ogrisel
Copy link
Author

ogrisel commented Jan 19, 2024

It depends on how the build definition is structured. If all extension modules are defined in a meson.build in the root project, the build files are all created in the root of the build directory. What we need is not something that works for sklearn but a general solution.

Alright, sorry I had not considered it was possible to do otherwise.

It would be polite if you could at least consider that the maintainer of a project to which you are contributing spent a bit more time than you thinking about the problem space a bit more than you. If they tell you that an approach to solve an issue does not work, it would be nice if you could at least try to understand the objections.

Sorry.

@eli-schwartz
Copy link
Member

Still, I find the definition of empty __path__ package attributes a potential source of surprise for many code-base, beyond the objective of making pkgutil.walk_packages work as intended.

As a dunder attribute it is formally reserved by the implementation. The only thing you are allowed to do with pure semantics here is to use __path__ according to its documentation, to mark a package vs a module and to optionally have it as an empty list OR use it to instruct the import machinery where to search for submodules.

Any code that takes this dunder attribute and doesn't fully implement the semantics of the import machinery in question is automatically broken. This includes detecting cases where it is empty and recognizing that a different import machinery finder must be used if available.

Of course, using this dunder attribute is a convenient hack, but many things are a convenient hack. Emphasis on "hack".

@ogrisel
Copy link
Author

ogrisel commented Jan 23, 2024

The only thing you are allowed to do with pure semantics here is to use path according to its documentation, to mark a package vs a module and to optionally have it as an empty list OR use it to instruct the import machinery where to search for submodules.

Any code that takes this dunder attribute and doesn't fully implement the semantics of the import machinery in question is automatically broken. This includes detecting cases where it is empty and recognizing that a different import machinery finder must be used if available.

According to the above interpretation pkgutil.walk_packages, which is part of the standard library, would be considered broken: indeed it only relies on __path__ to recursively find n+1-level sub-packages. A custom iter_modules implementation alone would not fix this as iter_modules is meant to return direct submodules, not submodules of subpackages.

As far as I know the import machinery does not specify any other way to discover subpackages transitively.

I checked that pkgutil.walk_packages works as expected on the following test zip package structure:

$ zipinfo level0_package.zip
Archive:  level0_package.zip
Zip file size: 1304 bytes, number of entries: 6
drwxr-xr-x  2.0 unx        0 bx stor 24-Jan-23 18:16 level0_package/
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:15 level0_package/level1_module.py
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:15 level0_package/__init__.py
drwxr-xr-x  2.0 unx        0 bx stor 24-Jan-23 18:17 level0_package/level1_package/
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:16 level0_package/level1_package/__init__.py
-rw-r--r--  1.0 unx        0 bX stor 24-Jan-23 18:17 level0_package/level1_package/level2_module.py
6 files, 0 bytes uncompressed, 0 bytes compressed:  0.0%
>>> import sys
>>> sys.path.insert(0, "level0_package.zip")
>>> import level0_package
>>> level0_package.__path__
['level0_package.zip/level0_package']
>>> from pkgutil import walk_packages
>>> for info in walk_packages(level0_package.__path__, prefix="level0_package."):
...     print(info.name)
... 
level0_package.level1_module
level0_package.level1_package
level0_package.level1_package.level2_module

>>> import level0_package.level1_package
>>> level0_package.level1_package.__path__
['level0_package.zip/level0_package/level1_package']

and furthermore to confirm that iter_modules only iterates on direct descendants:

>>> from pkgutil import iter_modules
>>> for info in iter_modules(level0_package.__path__, prefix="level0_package."):
...     print(info.name)
... 
level0_package.level1_module
level0_package.level1_package

If you agree with my edited comment above (#562 (comment)), I think it's possible to compute __path__ attribute for meson-python editable installs as done in this PR to get pkg.walk_packages to work as expected while also registering a custom file entry finder with a specific iter_modules implementation that would avoid introducing false positive submodules in case all compiled extensions of meson-python installed project are located in a shared folder (assuming no name collisions).

I can try to update this PR accordingly.

@dnicolodi
Copy link
Member

The proper fix is in #569.

@dnicolodi dnicolodi closed this Feb 4, 2024
@ogrisel ogrisel deleted the fix-editable-__path__ branch February 20, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

package __path__ is an empty list
4 participants