Description
I'm encountering an error when installing ludwig[distributed] in a Jupyter Notebook environment running on a Dataproc cluster. The installation seems to proceed normally until it attempts to install scikit-learn, at which point the process fails.
Steps to Reproduce
1. Launch a Dataproc cluster with a Jupyter Notebook environment.
2. Open a new Jupyter Notebook within the cluster.
3. Execute the command:
!pip install ludwig[distributed]
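To keep the complete pip output (the failing step alone prints 2269 lines) and to make sure the install targets the notebook kernel's own interpreter, the same command can be run from Python. This is a minimal sketch: `build_install_command` and `run_and_log` are hypothetical helper names, and the log file name is an assumption. Passing `only_binary=["scikit-learn"]` adds pip's `--only-binary` option. With that option pip should report a resolution error instead of compiling scikit-learn from source, which may help show which requirement pushes it to an old release.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_install_command(requirement: str, only_binary: list[str] | None = None) -> list[str]:
    """Build a pip command for the interpreter running this kernel (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install", requirement]
    if only_binary:
        # pip's --only-binary accepts comma-separated project names and
        # refuses to build those projects from source distributions.
        cmd.append("--only-binary=" + ",".join(only_binary))
    return cmd


def run_and_log(cmd: list[str], log_path: str = "pip-install.log") -> int:
    """Run cmd, write its stdout followed by its stderr to log_path, return the exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    Path(log_path).write_text(result.stdout + result.stderr)
    return result.returncode
```

For example, `run_and_log(build_install_command("ludwig[distributed]"))` writes the whole output to `pip-install.log`, which can then be attached to the issue.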
Error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [2269 lines of output]
Partial import of sklearn during the build process.
setup.py:128: DeprecationWarning:
`numpy.distutils` is deprecated since NumPy 1.23.0, as a result
of the deprecation of `distutils` itself. It will be removed for
Python >= 3.12. For older Python versions it will remain present.
It is recommended to use `setuptools < 60.0` for those Python versions.
For more details, see:
https://numpy.org/devdocs/reference/distutils_status_migration.html
1. Declare '_subtract_histograms' as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
2. Use an 'int' return type on '_subtract_histograms' to allow an error code to be returned.
sklearn/ensemble/_hist_gradient_boosting/splitting.pyx:920:14: Cannot assign type 'int (const void *, const void *) except? -1 nogil' to 'int (*)(const void *, const void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of 'compare_cat_infos'.
Traceback (most recent call last):
File "/tmp/pip-build-env-357_itq6/overlay/lib/python3.11/site-packages/Cython/Build/Dependencies.py", line 1345, in cythonize_one_helper
File "/opt/conda/miniconda3/lib/python3.11/multiprocessing/pool.py", line 774, in get
raise self._value
Cython.Compiler.Errors.CompileError: sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
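The 'noexcept' and "Exception values are incompatible" wording in this error comes from Cython 3.x. Cython 3 changed cdef functions with C return types so they propagate exceptions by default. The traceback shows Cython running inside pip's isolated build environment (/tmp/pip-build-env-...), so it is presumably the latest Cython, not a version the older scikit-learn sources were written for. One commonly cited workaround for this kind of breakage in older source builds is to pin Cython below 3 in a pip constraints file and export it as PIP_CONSTRAINT. pip reads that variable as its `--constraint` option, and the pip process that installs build dependencies inherits it. This is an untested sketch. The `cython<3` pin and the file name are assumptions, and `write_constraints` and `env_with_constraints` are hypothetical helpers. Even with the pin, building scikit-learn from source needs a C/C++ toolchain and may still fail for other reasons.

```python
import os
from pathlib import Path

# Assumed pin: keep pip's isolated build environment on Cython 0.29.x.
DEFAULT_BUILD_PINS = ("cython<3",)


def write_constraints(path, pins=DEFAULT_BUILD_PINS):
    """Write one requirement specifier per line, in pip constraints-file format."""
    path = Path(path)
    path.write_text("\n".join(pins) + "\n")
    return path


def env_with_constraints(path, base_env=None):
    """Return a copy of the environment with PIP_CONSTRAINT pointing at path.

    The variable is inherited by the pip subprocess that populates the
    isolated build environment, so the pin applies to build dependencies too.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_CONSTRAINT"] = str(Path(path).resolve())
    return env
```

The returned dict would be passed as `env=` to something like `subprocess.run([sys.executable, "-m", "pip", "install", "ludwig[distributed]"], env=...)`.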
Environment
Dataproc Cluster: image-version 2.2-debian12
Jupyter Notebook (testing the package in a Jupyter notebook on the Dataproc cluster)
Python Version: 3.11.8
Install commands tried: !pip install ludwig[distributed] and !pip install ludwig
Package Repository: nexus.onedev.xxxx.biz, a private repository configured for installing packages ("xxxx" masks my organization's name for security)
Additional context
I also tried other environments: Python 3.10.8 and Python 3.8.15, by downgrading the image version of the Dataproc cluster (GCP).
Hi @ayyappagundu, we seem to have some major problems with our dependencies. I'm trying to get them under control and hope we can fix this in the near future. Thanks for the ticket.
Hi @ayyappagundu, we are moving from requirements.txt to pyproject.toml with Poetry and Hatch, and we also need to pin down a lot of dependencies. That is costing us some time, but we are making good progress.
In addition, we will probably have to overhaul our distribution backend, as we are using outdated versions of Ray. Is it important that you distribute your task, or can you also run it on a single compute node?
Error
Collecting scikit-learn (from ludwig[distributed])
Using cached https://nexus.onedev.neustar.biz/repository/ds-pypi-group/packages/scikit-learn/1.5.2/scikit_learn-1.5.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
Using cached https://nexus.onedev.neustar.biz/repository/ds-pypi-group/packages/scikit-learn/1.5.1/scikit_learn-1.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
Using cached https://nexus.onedev.neustar.biz/repository/ds-pypi-group/packages/scikit-learn/1.2.0/scikit_learn-1.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.5 MB)
Using cached https://nexus.onedev.neustar.biz/repository/ds-pypi-group/packages/scikit-learn/1.1.3/scikit_learn-1.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (32.0 MB)
Using cached https://nexus.onedev.xxxx.biz/repository/ds-pypi-group/packages/scikit-learn/1.1.2/scikit-learn-1.1.2.tar.gz (7.0 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [2269 lines of output]
Partial import of sklearn during the build process.
setup.py:128: DeprecationWarning:
1. Declare '_subtract_histograms' as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
2. Use an 'int' return type on '_subtract_histograms' to allow an error code to be returned.
Error compiling Cython file:
...
if n_used_bins <= 1:
free(cat_infos)
return
sklearn/ensemble/_hist_gradient_boosting/splitting.pyx:920:14: Cannot assign type 'int (const void *, const void *) except? -1 nogil' to 'int (*)(const void *, const void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to the type of 'compare_cat_infos'.
Traceback (most recent call last):
File "/tmp/pip-build-env-357_itq6/overlay/lib/python3.11/site-packages/Cython/Build/Dependencies.py", line 1345, in cythonize_one_helper
Cython.Compiler.Errors.CompileError: sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
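The "Using cached" lines show pip stepping through scikit-learn 1.5.2, 1.5.1, 1.2.0 and 1.1.3, all prebuilt cp311 wheels, before settling on 1.1.2. That release is only offered as a source tarball, so pip has to compile it, and the Cython error is raised while pip prepares its metadata. The descent through older releases suggests the resolver was backtracking to satisfy some other constraint in the dependency tree, though the log does not say which one. Below is a minimal sketch for pulling that sequence out of a saved pip log. It assumes the "Using cached <url> (<size>)" line format shown here, handles only .whl and .tar.gz files, and `classify_candidates` and `first_source_build` are hypothetical helpers, not part of pip.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    version: str
    kind: str  # "wheel" (prebuilt) or "sdist" (must be compiled)
    filename: str


# Captures the last path component of the URL in a pip "Using cached <url> (<size>)" line.
_CACHED = re.compile(r"Using cached \S*/(?P<file>[^/\s]+)")


def classify_candidates(log_lines: list[str], project: str = "scikit-learn") -> list[Candidate]:
    """Return the distributions of `project` that pip reports using, in log order."""
    # Wheel file names use '_' in place of '-' in the project name; sdists may keep '-'.
    prefixes = (project + "-", project.replace("-", "_") + "-")
    found = []
    for line in log_lines:
        match = _CACHED.search(line)
        if match is None:
            continue
        filename = match.group("file")
        prefix = next((p for p in prefixes if filename.startswith(p)), None)
        if prefix is None:
            continue  # a file belonging to some other project
        rest = filename[len(prefix):]
        if filename.endswith(".whl"):
            kind, version = "wheel", rest.split("-", 1)[0]
        elif filename.endswith(".tar.gz"):
            kind, version = "sdist", rest[: -len(".tar.gz")]
        else:
            continue  # other archive formats are not handled in this sketch
        found.append(Candidate(version, kind, filename))
    return found


def first_source_build(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the first candidate pip would have to build from source, if any."""
    return next((c for c in candidates if c.kind == "sdist"), None)
```

Applied to the "Using cached" lines in this log, the sketch would list the four cp311 wheels followed by the 1.1.2 sdist, and `first_source_build` would return the 1.1.2 entry.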