Skip to content

mbenlioglu/parallel-nearest-neighbour

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPU Asisted Parallelization of Nearest Neighbour Calculation

     This program executes sequential and parallel versions of Nearest-Neighbour calculation using both GPU parallelizm and CPU parallelizm, outputs the execution times of sequential and parallel versions

Implemented by:

Getting Started

Data

     There are two sets of 16-dimensional points considered in this application, which will be referred as train set and test set. This application finds points from training set having the shortest Euclidian distances to the points given in test set.

     To represent this two sets, 2 CSV files that have 16 columns of numbers, without any header rows, are considered. One of which will represent the train set, and the other will represent the test set. There are 2 files under ./data/ directory provided, which are used for the results shown here.

Compiling

     The provided Makefile, ./Makefile is to compile codes into executable in Linux environment. Executing following command in the project root will create an executable named "nearestNeighbour" which takes zero, one or three arguments.

$ make

     For Windows use Visual Studio with provided *.sln & *.vxproj files to compile the codes.

Execution

     Compiled executable will run either with zero, one or three parameters. If zero parameters are provided, program will use device 0 as the GPU device and will try to input train and test sets from ./data/train.txt and ./data/test.txt respectively. (which are provided) If 1 parameter is provided that will be used as the GPU device id and data files will be assumed in the same place as before. If 3 parameters are provided, first parameter will be used for the path of train set second will be used for test set and third will be used for the GPU device id.

     The program will run sequential and parallel versions of the algorithm and output the execution times.

Algorithm & Implementation

     To find closest point in train set to a point in test set, Euclidian distances of the point in test set to all points in train set must be checked, but since we are not interested in actual values of distances, we can omit square root part in the calculation and just compare the sums of squares, to reduce the computational complexity.

     In both CPU and GPU parallelizm, closest neighbours of multiple points from test set is calculated concurrently, but since number of cores in GPU is significantly larger than CPU, we can find more neighbours in parallel, compared to CPU. Results can be seen in the following sections.

Observations and performance improvements

     A quick analysis of the provided data revealed that none of the dimensions in neither train nor test set exceeds 15, which means we can represent a dimension with at most 4 bits of information, or an entire point with 64 bits of information. Therefore, the first realization was using regular 32-bit integers for every dimension would waste 8 times more space than we would need, even 8-bit unsigned chars would cause a waste of space. Therefore, bit fields are used to reduce memory consumption and increase the chances of fitting most of the data into fast memories to imporve performance.

Results

     In this section, result tables containing various information about execution results is given.

Reusults Table

Table 1. Execution time of the algortihm on GPU and CPU with different number of threads

About

Nearest neighbour algorithm with GPU assisted parallelizm.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published