There are two ways in which you can get started with dislib. You can perform a manual installation, or you can download our ready-to-use docker image.
dislib currently requires:
- PyCOMPSs >= 2.8
- scikit-learn >= 1.0.2
- scipy >= 1.3.0
- numpy == 1.23.1
- cvxpy >= 1.1.5
- cbor2 >= 5.4.0
Some of the examples also require matplotlib >= 2.2.3 and pandas >= 0.24.2. numpydoc >= 0.8.0 is required to build the documentation. To use GPUs, cupy and/or pytorch are also required.
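As a rough way to compare an existing environment against the requirements listed above, the following sketch reads installed versions with the standard library. The helper names are hypothetical, the version parsing is deliberately simplistic (numeric parts only), and "pycompss" as a distribution name is an assumption: PyCOMPSs may be installed in a way that pip metadata does not see.

```python
import itertools
from importlib import metadata

# Copied from the requirements list above: (distribution, operator, version).
# ASSUMPTION: "pycompss" is the name under which PyCOMPSs is registered.
REQUIREMENTS = [
    ("pycompss", ">=", "2.8"),
    ("scikit-learn", ">=", "1.0.2"),
    ("scipy", ">=", "1.3.0"),
    ("numpy", "==", "1.23.1"),
    ("cvxpy", ">=", "1.1.5"),
    ("cbor2", ">=", "5.4.0"),
]


def parse_version(text):
    """Turn '1.23.1' into (1, 23, 1), stopping at the first non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if digits:
            parts.append(int(digits))
        if len(digits) < len(piece):
            break  # a suffix such as "rc1" ends the numeric part
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings; only '==' and '>=' are supported."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that 2.8 and 2.8.0 compare equal
    b += (0,) * (width - len(b))
    return a == b if op == "==" else a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of (name, status) pairs for the given requirements."""
    report = []
    for name, op, required in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report.append((name, "missing"))
            continue
        ok = satisfies(installed, op, required)
        status = "ok" if ok else f"found {installed}, need {op} {required}"
        report.append((name, status))
    return report
```

Calling check_requirements() and printing each pair gives a quick overview; a "missing" entry for pycompss does not necessarily mean PyCOMPSs is absent, only that no package metadata was found under that name.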
1. Check which PyCOMPSs version to install.
   - The latest dislib release requires PyCOMPSs 2.8 or greater (check here for information about other releases).
2. Install PyCOMPSs following these instructions.
3. Install the latest dislib version with pip3 install dislib.
   - IMPORTANT: dislib requires the pycompss Python module. However, this command will NOT install the module automatically. The module should be available after manually installing PyCOMPSs following the instructions in step 2. For more information on this, see here.
4. You can check that everything works fine by running one of our examples:
   - Download the latest source code here.
   - Extract the contents of the tar package.
     tar xzvf dislib-X.Y.Z.tar.gz
   - Run an example application.
     runcompss --python_interpreter=python3 dislib-X.Y.Z/examples/kmeans.py
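Since installing dislib with pip does not install the pycompss module, it can be useful to confirm that both modules are importable before launching an example. The sketch below only uses the standard library; the helper names are hypothetical and not part of dislib or PyCOMPSs.

```python
import importlib.util


def module_available(name):
    """True if a top-level module called `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def missing_modules(names=("pycompss", "dislib")):
    """Return the subset of `names` that Python cannot locate."""
    return [name for name in names if not module_available(name)]
```

An empty list from missing_modules() suggests the manual installation left both modules on the Python path; if pycompss is reported, revisit the PyCOMPSs installation step.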
- Warning: requires docker version >= 17.12.0-ce
- Follow these instructions:
  - Docker for Mac. Or, if you prefer to use Homebrew.
  - Docker for Ubuntu.
  - Docker for Arch Linux.
  Be aware that the docker package has been renamed from docker to docker-ce for some distributions. Make sure you install the new package.
- Add your user to the docker group to run dislib as a non-root user.
- Check that docker is correctly installed.
  docker --version
  docker ps # this should be empty as no docker processes are yet running.
- Install docker-py.
  pip3 install docker
Once docker and docker-py are installed, install dislib:
pip3 install dislib
This should add the dislib executable to your path.
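The minimum docker version stated in the warning above can also be checked programmatically. This is a minimal sketch, assuming the output of docker --version contains a dotted version number such as "Docker version 20.10.7, build f0df350"; the function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_DOCKER = (17, 12, 0)  # from the warning above: docker >= 17.12.0-ce


def parse_docker_version(output):
    """Extract (major, minor, patch) from the text printed by `docker --version`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(group) for group in match.groups())


def docker_is_recent_enough(output, minimum=MIN_DOCKER):
    """True if the version found in `output` is at least `minimum`."""
    return parse_docker_version(output) >= minimum


def docker_version_output():
    """Run `docker --version`; return None if docker is not on the PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Passing the result of docker_version_output() to docker_is_recent_enough() (after checking it is not None) tells you whether the local docker meets the requirement.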
Initialize dislib where your source code will be (you can re-init anytime). This will allow docker to access your local code and run it inside the container.
Note that the first time, dislib needs to download the docker image from the registry, which may take a while.
# Without a path it operates on the current working directory.
dislib init
# You can also provide a path
dislib init /home/user/replace/path/
Note: running dislib in docker does not work with applications that have a GUI or produce visual plots (such as examples/clustering_comparison.py).
First, clone the dislib repo and check out the release branch vX.Y.Z (the docker version and the dislib code should preferably be the same to avoid inconsistencies):
git clone https://github.com/bsc-wdc/dislib.git
Init the dislib environment in the root of the repo.
Source file paths are resolved from the init directory, which can sometimes be confusing.
As a rule of thumb, initialize the library in the current directory and check that the paths are correct by running the file with python3 path_to/file.py (in this case python3 examples/rf_iris.py).
cd dislib
dislib init
dislib exec examples/rf_iris.py
The log files of the execution can be found at $HOME/.COMPSs.
You can also init the library inside the examples folder. This will mount the examples directory inside the container so you can execute it without adding the path:
cd dislib/examples
dislib init
dislib exec rf_iris.py
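The two cases above differ only in the directory where dislib init was run. The following sketch computes the path to pass to dislib exec for a given init directory. It assumes that only files under the init directory are visible inside the container; the helper name and the /home/user paths are hypothetical.

```python
from pathlib import Path


def path_for_exec(init_dir, script):
    """Return `script` relative to the directory where `dislib init` was run."""
    init_dir = Path(init_dir).resolve()
    script = Path(script).resolve()
    try:
        return script.relative_to(init_dir).as_posix()
    except ValueError:
        # ASSUMPTION: files outside the init directory are not mounted.
        raise ValueError(
            f"{script} is outside the init directory {init_dir}"
        ) from None


# Init in the repo root: the examples/ prefix is needed.
print(path_for_exec("/home/user/dislib", "/home/user/dislib/examples/rf_iris.py"))
# Init inside examples/: the bare file name is enough.
print(path_for_exec("/home/user/dislib/examples", "/home/user/dislib/examples/rf_iris.py"))
```

The first call yields examples/rf_iris.py and the second rf_iris.py, matching the two dislib exec invocations shown above.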
Notebooks can be run using the dislib jupyter command. Run the following snippet from the root of the project:
dislib init
dislib jupyter ./notebooks
An alternative and more flexible way of starting jupyter is to use the dislib run command in the following way:
dislib run jupyter-notebook ./notebooks --ip=0.0.0.0 --allow-root
Access your notebook by ctrl-clicking the link shown on the CLI or by copying and pasting it into the browser (e.g. http://127.0.0.1:8888/?token=TOKEN_VALUE).
If the notebook process is not properly closed, you might get the following warning when trying to start jupyter notebooks again:
The port 8888 is already in use, trying another port.
To fix it, just restart the dislib container with dislib init.
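To see whether a leftover notebook process is still holding the port before starting jupyter again, a small standard-library check like the one below can help. It is only a sketch: it assumes the notebook port is published on the local machine at 127.0.0.1, and the function name is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

If port_in_use(8888) returns True while no notebook should be running, restarting the container as described above is the suggested fix.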
Note: adding more nodes is still in beta. Please report issues, suggestions, or feature requests on GitHub.
To add more computing nodes, you can either let docker create more workers for you or manually create and configure a custom node.
For docker, just specify the number of workers to add. For example, to add 2 docker workers:
dislib components add worker 2
You can check that both new computing nodes are up with:
dislib components list
If you want to add a custom node, it needs to be reachable through ssh without user.
Moreover, dislib will try to copy the working_dir there, so it needs write permissions for the scp.
For example, to add the local machine as a worker node:
dislib components add worker '127.0.0.1:6'
- '127.0.0.1': is the IP used for ssh (can also be a hostname like 'localhost' as long as it can be resolved).
- '6': desired number of available computing units for the new node.
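The 'host:units' format described above can be validated before it is handed to the CLI. The helpers below are hypothetical and not part of dislib; they only illustrate the format, and they do not handle IPv6 literals.

```python
def parse_worker_spec(spec):
    """Split 'host:units' into (host, units), e.g. '127.0.0.1:6'."""
    host, sep, units = spec.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'host:units', got {spec!r}")
    if not units.isdigit() or int(units) < 1:
        raise ValueError(f"computing units must be a positive integer, got {units!r}")
    return host, int(units)


def add_worker_command(spec):
    """Build the argument list for the command shown above, after validating spec."""
    parse_worker_spec(spec)  # raises ValueError on a malformed spec
    return ["dislib", "components", "add", "worker", spec]
```

The returned list can be passed to subprocess.run; building it separately keeps the validation independent of whether dislib is installed.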
Please be aware that dislib components will not list your custom nodes because they are not docker processes, and thus it cannot be verified whether they are up and running.