From 368646e40b31a6dfb925f520e2abc1250b1bdd21 Mon Sep 17 00:00:00 2001
From: Scheib
Date: Tue, 10 Oct 2023 14:30:08 +0200
Subject: [PATCH 1/3] Minor linguistic corrections in the Heat Tutorials.

---
 doc/source/getting_started.rst               |  4 ++--
 doc/source/tutorial_parallel_computation.rst | 12 ++++++------
 ht                                           |  0
 3 files changed, 8 insertions(+), 8 deletions(-)
 create mode 100644 ht

diff --git a/doc/source/getting_started.rst b/doc/source/getting_started.rst
index f2ab8d1097..7953f11ec6 100644
--- a/doc/source/getting_started.rst
+++ b/doc/source/getting_started.rst
@@ -31,12 +31,12 @@ If you do not have a recent installation on you system, you may want to upgrade

     sudo dnf update python3

-If you have new administrator privileges on your system, because you are working on a cluster for example, make sure to check its *user guide*, the module system (``module spider python``) or get in touch with the administrators.
+If you have no administrator privileges on your system, because you are working on a cluster for example, make sure to check its *user guide*, the module system (``module spider python``) or get in touch with the administrators.

 Optional Dependencies
 ^^^^^^^^^^^^^^^^^^^^^

-You can accelerate computations with Heat in different ways. For GPU acceleration ensure that you have a `CUDA `_ installation on your system. Distributed computations require an MPI stack on you computer. We recommend `MVAPICH `_ or `OpenMPI `_. Finally, for parallel data I/O, Heat offers interface to `HDF5 `_ and `NetCDF `_. You can obtain these packages using your operating system's package manager.
+You can accelerate computations with Heat in different ways. For GPU acceleration ensure that you have a `CUDA `_ installation on your system. Distributed computations require an MPI stack on your computer. We recommend `MVAPICH `_ or `OpenMPI `_. Finally, for parallel data I/O, Heat offers interfaces to `HDF5 `_ and `NetCDF `_. You can obtain these packages using your operating system's package manager.

 Installation
 ------------
diff --git a/doc/source/tutorial_parallel_computation.rst b/doc/source/tutorial_parallel_computation.rst
index 3dde428861..684e775cea 100644
--- a/doc/source/tutorial_parallel_computation.rst
+++ b/doc/source/tutorial_parallel_computation.rst
@@ -70,7 +70,7 @@ Distributed Computing
 ---------------------

 .. warning::
-    For the following code examples, make sure to you have `MPI `_ installed.
+    For the following code examples, make sure you have `MPI `_ installed.

 With Heat you can even compute in distributed memory environments with multiple computation nodes, like modern high-performance cluster systems. For this, Heat makes use of the fact that operations performed on multi-dimensional arrays tend to be identical for all data items. Hence, they can be processed in data-parallel manner. Heat partitions the total number of data items equally among all processing nodes. A ``DNDarray`` assumes the role of a virtual overlay over these node-local data portions and manages them for you while offering the same interface. Consequently, operations can now be executed in parallel. Each processing node applies them locally to their own data chunk. If necessary, partial results are communicated and automatically combined behind the scenes for correct global results.

@@ -174,7 +174,7 @@ Output:

 .. code:: text

-    DNDarray([12.], dtype=ht.float32, device=cpu:0, split=None)
+    DNDarray(12., dtype=ht.float32, device=cpu:0, split=None)

 The previously ``split=0`` matrix is ``split=None`` after the reduction operation. Obviously, we can also perform operations between (differently) split ``DNDarrays``.

@@ -191,7 +191,7 @@ Output:

     DNDarray([[1., 2., 3., 4.],
               [1., 2., 3., 4.],
-              [1., 2., 3., 4.]], dtype=ht.float32, device=cpu:0, split=0)
+              [1., 2., 3., 4.]], dtype=ht.float32, device=cpu:0, split=1)

     [0/3] DNDarray([1., 2., 3., 4.], dtype=ht.int32, device=cpu:0, split=None)
     [1/3] DNDarray([1., 2., 3., 4.], dtype=ht.int32, device=cpu:0, split=None)
@@ -200,7 +200,7 @@ Output:
 Technical Details
 ^^^^^^^^^^^^^^^^^

-On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) `_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronizations only occurs for collective MPI calls as well as at the program start and termination.
+On a technical level, Heat is inspired by the so-called `Bulk Synchronous Parallel (BSP) `_ processing model. Computations proceed in a series of hierarchical supersteps, each consisting of a number of node-local computations and subsequent communications. In contrast to the classical BSP model, communicated data is available immediately, rather than after the next global synchronization. In Heat, global synchronization only occurs for collective MPI calls as well as at the program start and termination.

 .. image:: ../images/bsp.svg
     :align: center
@@ -223,13 +223,13 @@ You can start the distributed interactive interpreter by invoking the following


 .. note::
-    The interactive interpreter does only support a subset of all controls commands.
+    The interactive interpreter only supports a subset of all control commands.


 Parallel Performance
 --------------------

-When working with parallel and distributed computation in Heat there are some best practices for you may to know about. The following list covers the major ones.
+When working with parallel and distributed computation in Heat there are some best practices for you to know about. The following list covers the major ones.

 Dos
 ^^^
diff --git a/ht b/ht
new file mode 100644
index 0000000000..e69de29bb2

From b06c8b5bf56e76e7b79f69a1a863008739f218f4 Mon Sep 17 00:00:00 2001
From: Scheib
Date: Wed, 11 Oct 2023 11:46:52 +0200
Subject: [PATCH 2/3] Other corrections in the cluster analysis tutorial.

---
 doc/source/tutorial_clustering.rst | 13 +++++++------
 ht                                 |  0
 2 files changed, 7 insertions(+), 6 deletions(-)
 delete mode 100644 ht

diff --git a/doc/source/tutorial_clustering.rst b/doc/source/tutorial_clustering.rst
index ce6aa61c6b..21b4157065 100644
--- a/doc/source/tutorial_clustering.rst
+++ b/doc/source/tutorial_clustering.rst
@@ -68,8 +68,8 @@ initial centroids.
     c1.balance_()
     c2.balance_()

-    print(f"Number of points assigned to c1: {c1.shape[0]} "
-          f"Number of points assigned to c2: {c2.shape[0]} "
+    print(f"Number of points assigned to c1: {c1.shape[0]} \n"
+          f"Number of points assigned to c2: {c2.shape[0]} \n"
           f"Centroids = {centroids}")

 .. code:: text
@@ -95,7 +95,7 @@ Let's plot the assigned clusters and the respective centroids:

 .. image:: ../images/clustering.png

-We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'
+We can also cluster the data with kmedians. The respective advanced initial centroid sampling is called 'kmedians++'.

 .. code:: python

@@ -110,8 +110,9 @@ We can also cluster the data with kmedians. The respective advanced initial cent
     c1.balance_()
     c2.balance_()

-    print(f"Number of points assigned to c1: {c1.shape[0]}"
-          f"Number of points assigned to c2: {c2.shape[0]}")
+    print(f"Number of points assigned to c1: {c1.shape[0]} \n"
+          f"Number of points assigned to c2: {c2.shape[0]} \n"
+          f"Centroids = {centroids}")

 Plotting the assigned clusters and the respective centroids:

@@ -132,7 +133,7 @@ The Iris Dataset
 ------------------------------
 The _iris_ dataset is a well known example for clustering analysis. It contains 4 measured features for samples
 from three different types of iris flowers. A subset of 150 samples is included in formats h5, csv and netcdf in Heat,
-located under 'heat/heat/datasets/iris.h5', and can be loaded in a distributed manner with Heat's parallel
+located under 'heat/heat/datasets', and can be loaded in a distributed manner with Heat's parallel
 dataloader

 .. code:: python
diff --git a/ht b/ht
deleted file mode 100644
index e69de29bb2..0000000000

From 639e6d652ad8e068ea6d35be8b0dd36a6bf3af09 Mon Sep 17 00:00:00 2001
From: Fabian Hoppe <112093564+mrfh92@users.noreply.github.com>
Date: Wed, 11 Oct 2023 16:51:07 +0200
Subject: [PATCH 3/3] dummy commit to find out why CI got stuck

removed one "=" in getting started...
---
 doc/source/getting_started.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/getting_started.rst b/doc/source/getting_started.rst
index 7953f11ec6..1749c0eb52 100644
--- a/doc/source/getting_started.rst
+++ b/doc/source/getting_started.rst
@@ -1,7 +1,7 @@
 .. _Installation:

 Getting Started
-===============
+==============

 Heat is a Python package for accelerated and distributed tensor computations. Internally, it is based on `PyTorch `_. Consequently, all operating systems that support Python and PyTorch also support a Heat installation. Currently, this list contains at least Linux, MacOS and Windows. However, most of our development is done under Linux and interoperability should therefore be optimal.
