Hybrid Parallel MPI and OpenMP implementations of clustering algorithms. This code was developed for my university thesis.

Lefti97/parallel-clustering-hybrid

Hybrid Parallel Clustering Algorithms

Title: Development and Evaluation of Parallel Clustering Algorithms in Hybrid Environment using OpenMP and MPI

Abstract: This thesis covers the design, development, and evaluation of efficient algorithms for the data clustering problem in three parallel environments: shared memory, distributed memory, and hybrid (massively parallel programming in a combined distributed/shared-memory environment). The selected algorithms are developed in C/C++ and evaluated in a suitable real environment. Individual implementations in OpenMP and/or MPI are developed, as well as combined implementations, for example using MPI+OpenMP and/or MPI+MPI Shared Memory, and corresponding comparative measurements and conclusions are drawn.

For this thesis, parallel implementations were written for the K-means and CURE clustering algorithms. Further parallel clustering implementations may be added to this repo.
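To illustrate the kind of loop that K-means parallelization usually targets, here is a minimal sketch of the point-assignment step with OpenMP. It is not taken from this repository: the function names `squared_distance` and `assign_points`, the row-major array layout, and the returned "changed labels" counter are all assumptions made for the example.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two points of `dim` coordinates. */
static double squared_distance(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    for (size_t d = 0; d < dim; d++) {
        double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

/*
 * Hypothetical assignment step: `points` holds n rows and `centroids`
 * holds k rows, each row being `dim` doubles (row-major, an assumed layout).
 * Every point gets the index of its nearest centroid in `labels`.
 * Returns how many labels changed, which a caller could use as a
 * convergence signal.
 */
long assign_points(const double *points, long n,
                   const double *centroids, int k,
                   size_t dim, int *labels)
{
    assert(n >= 0 && k > 0 && dim > 0);
    long changed = 0;

    /* Each iteration touches only labels[i], so iterations are independent;
       the shared counter is combined with a reduction. */
    #pragma omp parallel for reduction(+:changed) schedule(static)
    for (long i = 0; i < n; i++) {
        int best = 0;
        double best_dist = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double dist = squared_distance(&points[i * dim],
                                           &centroids[c * dim], dim);
            if (dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }
        if (labels[i] != best) {
            labels[i] = best;
            changed++;
        }
    }
    return changed;
}
```

Compiled without `-fopenmp`, the pragma is ignored and the loop runs serially with the same result. In a hybrid setting, one plausible arrangement is for each MPI process to call such a function on its own slice of the points and then combine the per-process counts and centroid sums with `MPI_Allreduce`; whether the code in `src` is organized this way is not stated here.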

Instructions

  • Compile the code in the src folder using the Makefile:
make
  • How to execute:
                          ./Kmeans_Serial.out <filename> <clusters> <distance_threshold>
                          ./Kmeans_OpenMP.out <filename> <clusters> <distance_threshold> <OpenMP threads>
mpirun -n <mpi processes> ./Kmeans_MPI.out    <filename> <clusters> <distance_threshold>
mpirun -n <mpi processes> ./Kmeans_Hybrid.out <filename> <clusters> <distance_threshold> <OpenMP threads>

                          ./Cure_Serial.out <filename> <clusters> <representatives> <shrink fraction>
                          ./Cure_OpenMP.out <filename> <clusters> <representatives> <shrink fraction> <OpenMP threads>
mpirun -n <mpi processes> ./Cure_MPI.out    <filename> <clusters> <representatives> <shrink fraction>
mpirun -n <mpi processes> ./Cure_Hybrid.out <filename> <clusters> <representatives> <shrink fraction> <OpenMP threads>
  • Example Runs:
            ./Kmeans_Serial.out ../inputs/test1K.txt 3 1
            ./Kmeans_OpenMP.out ../inputs/test1K.txt 3 1 4
mpirun -n 4 ./Kmeans_MPI.out    ../inputs/test1K.txt 3 1
mpirun -n 4 ./Kmeans_Hybrid.out ../inputs/test1K.txt 3 1 4

            ./Cure_Serial.out ../inputs/test1K.txt 3 5 0.4
            ./Cure_OpenMP.out ../inputs/test1K.txt 3 5 0.4 4
mpirun -n 4 ./Cure_MPI.out    ../inputs/test1K.txt 3 5 0.4
mpirun -n 4 ./Cure_Hybrid.out ../inputs/test1K.txt 3 5 0.4 4
  • If gnuplot is available, a scatter plot is saved in the src/output folder.
  • A text file with the terminal output is saved in the src/output folder.
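The `<representatives>` and `<shrink fraction>` arguments of the CURE programs correspond to two parameters of the algorithm as it is usually described: each cluster is summarized by a fixed number of representative points, and those points are moved toward the cluster centroid by a fraction alpha. The sketch below shows that shrink step only. It assumes the command-line value plays the role of alpha; the function name `shrink_representatives` and the row-major layout are hypothetical, not taken from the code in `src`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical shrink step: `reps` holds num_reps rows of `dim` doubles
 * (row-major, an assumed layout). Each coordinate x is replaced by
 * x + alpha * (centroid - x), so alpha = 0 leaves the representatives
 * in place and alpha = 1 collapses them onto the centroid.
 */
void shrink_representatives(double *reps, size_t num_reps,
                            const double *centroid, size_t dim,
                            double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    for (size_t r = 0; r < num_reps; r++) {
        for (size_t d = 0; d < dim; d++) {
            double *x = &reps[r * dim + d];
            *x += alpha * (centroid[d] - *x);
        }
    }
}
```

With alpha = 0.4, as in the example runs above, a representative at (10, 0) in a cluster whose centroid is (5, 0) moves to approximately (8, 0).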

Resources

  • Hadjidoukas, Panagiotis & Amsaleg, Laurent. (2006). Parallelization of a Hierarchical Data Clustering Algorithm Using OpenMP. 4315. 289-299. 10.1007/978-3-540-68555-5_24.
  • Zhang, Jing & Wu, Gongqing & Xuegang, Hu & Li, Shiying & Hao, Shuilong. (2011). A Parallel K-Means Clustering Algorithm with MPI. 10.1109/PAAP.2011.17.
