This data structure was made for the Advanced Data Structures course at Universidad Católica San Pablo.
The dataset used for this project can be downloaded from here.
It is a .csv file containing a bit over 170k songs from Spotify, classified by
their different attributes.
Once the dataset is downloaded, it should be placed inside a folder named data
at the root of the project:
project_root
├── f CMakeLists.txt
├── d data
│ └── f data.csv <-------- HERE
├── f LICENSE
├── f README.md
└── d src
The project uses only the C++ standard library (STL), so it has no external
dependencies and is easy to compile. Use the CMakeLists.txt file to create your
build folder. If you are on Windows or use an IDE, check its documentation on
how to build a CMake project in your environment. If you are on Linux or Mac,
you can run the following commands from the root of the project to build and
run it:
cmake -B build/ -S .
cd build/
make
./x_tree
Note: From what I've tested, compiling the project with clang++ instead of g++
offers better overall performance (almost double). If you want to use clang++,
run the command export CXX=/path/to/clang++ before running the above commands.
If you are on Linux, /path/to/clang++ is usually /usr/bin/clang++.
Once you run the project, it will start loading the data file and proceed to index all the data points.
Once the data is loaded, you will be prompted to start making kNN queries. You
will be asked for k, the number of nearest neighbors to retrieve, and then for
the query point from which to start the search. Enter the values in the
following order of attributes: valence, acousticness, danceability, duration_ms, energy, explicit, instrumentalness, key, liveness, loudness, mode, popularity, speechiness, tempo.
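To make the query format concrete, here is a minimal sketch of what a kNN query over 14-attribute points computes. It is not the X-tree code from this project: it is a brute-force linear scan that should return the same neighbors an index would, only more slowly. The names Point, kDims, parse_query, squared_distance and knn_brute_force are hypothetical, and the sketch assumes the query values are separated by whitespace and compared with plain (unnormalized) Euclidean distance, which may differ from what the program actually does.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types and names, for illustration only.
// 14 attributes, in the order listed above: valence ... tempo.
constexpr std::size_t kDims = 14;
using Point = std::array<double, kDims>;

// Parses exactly kDims whitespace-separated numbers from `line` into `out`.
// Returns false if fewer than kDims numbers could be read.
bool parse_query(const std::string& line, Point& out) {
    std::istringstream in(line);
    for (double& value : out) {
        if (!(in >> value)) return false;
    }
    return true;
}

// Squared Euclidean distance; enough for ranking, so no sqrt is needed.
double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < kDims; ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of the k points closest to `query`, nearest first.
// If k exceeds the number of points, all points are returned.
std::vector<std::size_t> knn_brute_force(const std::vector<Point>& points,
                                         const Point& query, std::size_t k) {
    std::vector<std::pair<double, std::size_t>> scored;
    scored.reserve(points.size());
    for (std::size_t i = 0; i < points.size(); ++i) {
        scored.emplace_back(squared_distance(points[i], query), i);
    }
    k = std::min(k, scored.size());
    assert(k <= scored.size());  // holds after clamping; guards the iterator math
    // Only the first k entries need to be sorted.
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());
    std::vector<std::size_t> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) result.push_back(scored[i].second);
    return result;
}
```

A caller would read one line of input, pass it to parse_query, and then call knn_brute_force(points, query, k) with the loaded data. Because attributes such as duration_ms and tempo have much larger ranges than valence or energy, an unnormalized distance like this one is dominated by them; whether the project normalizes its data is not stated here.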