This project aims to classify music genres. Music genre classification is an audio signal processing task. Audio is, alongside images and natural language, one of the main domains where deep learning is applied. The GTZAN dataset consists of "wav" audio files. The Librosa library was used to extract features from these files (more on this in the Preprocess section). Several architectures (NN, LSTM, CNN, ...) were built for the classification.
The GTZAN dataset was used. Briefly, it consists of 10 classes (genres), and its CSV files contain many precomputed attributes such as MFCC, Chroma and RMS. The dataset ships two CSV files: one with attributes extracted over 3-second segments and one with attributes extracted over the full 30-second clips.
The Classes of GTZAN (Image by Author)
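For reference, the ten GTZAN genres and the relation between the two CSV granularities can be written down directly. The sketch below is illustrative and not part of the project: `GENRES` and `segment_bounds` are hypothetical names, and it assumes non-overlapping 3-second windows at GTZAN's native 22,050 Hz, with any leftover shorter than one window dropped.

```python
from typing import List, Tuple

# The 10 GTZAN genres; alphabetical order doubles as the integer label.
GENRES = ("blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock")
SAMPLE_RATE = 22050  # GTZAN clips are 22,050 Hz mono


def segment_bounds(num_samples: int, seconds: float,
                   sr: int = SAMPLE_RATE) -> List[Tuple[int, int]]:
    """(start, end) sample indices of consecutive, non-overlapping segments.

    A trailing remainder shorter than one segment is dropped.
    """
    seg = int(seconds * sr)
    return [(s, s + seg) for s in range(0, num_samples - seg + 1, seg)]
```

A 30-second clip (661,500 samples) yields ten 3-second windows, which is why the 3-second CSV has roughly ten times as many rows as the 30-second one.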
As mentioned earlier, Librosa was used for feature extraction. Instead of the precomputed features in the CSV files, I extracted my own: the first 13 MFCCs. Each audio file was read in turn, its MFCCs were extracted, and the MFCCs were stored together with their labels in a JSON file.
The code cell below shows how the MFCCs are extracted.
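import librosa
y, sample_rate = librosa.load(file_path, sr=SAMPLE_RATE)
# y and sr must be passed as keywords in librosa >= 0.10; result shape is (13, frames)
mfcc = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13, n_fft=2048, hop_length=512)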
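The loop described above (read each file, extract 13 MFCCs, store them with their labels in JSON) might look roughly like the following. This is a sketch under assumptions, not the project's script: it assumes a `dataset_dir/<genre>/*.wav` folder layout, uses hypothetical names (`build_dataset`, `librosa_mfcc`, and the `mapping`/`mfcc`/`labels` keys), and takes the extractor as a parameter so the loop can be exercised without audio files.

```python
import json
import os
from typing import Callable, Dict, List

SAMPLE_RATE = 22050  # assumption: GTZAN's native sample rate


def librosa_mfcc(file_path: str) -> List[List[float]]:
    """Return the first 13 MFCCs of one file as a (frames, 13) nested list."""
    import librosa  # imported lazily so the loop can be tested without it

    y, sr = librosa.load(file_path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=2048, hop_length=512)
    return mfcc.T.tolist()  # transpose: one row per frame


def build_dataset(dataset_dir: str, json_path: str,
                  extract: Callable[[str], List[List[float]]] = librosa_mfcc) -> Dict:
    """Walk dataset_dir/<genre>/*.wav, extract features and save them with labels."""
    data = {"mapping": [], "mfcc": [], "labels": []}
    genres = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )
    for label, genre in enumerate(genres):
        data["mapping"].append(genre)  # position in this list == integer label
        genre_dir = os.path.join(dataset_dir, genre)
        for name in sorted(os.listdir(genre_dir)):
            if not name.endswith(".wav"):
                continue  # skip anything that is not a WAV file
            data["mfcc"].append(extract(os.path.join(genre_dir, name)))
            data["labels"].append(label)
    with open(json_path, "w") as fp:
        json.dump(data, fp)
    return data
```

Keeping `mapping` next to the integer labels lets the web application translate the model's predicted index back into a genre name.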
Several architectures were trained, including an ANN, a vanilla LSTM, a stacked LSTM and various CNNs. A CNN performed best, and it was then improved with regularization, normalization and similar techniques. The final model consists of three convolutional layers, each followed by a pooling layer and a normalization layer, plus an output layer. It reaches almost 80% accuracy on the test set. The model architecture can be seen below.
Model Architecture (Image by Author)
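Because every unpadded convolution and every stride-2 pooling step shrinks the MFCC "image", it is worth checking that three blocks still fit into only 13 coefficients. The helper below traces the feature-map shapes. The kernel sizes, "valid" convolutions, stride-2 "same" pooling and 32 filters are assumptions, since the article does not list them, and normalization is treated as shape-preserving. `mfcc_frames` uses librosa's default centered framing, where a signal of n samples gives 1 + n // hop_length frames.

```python
from typing import List, Tuple


def mfcc_frames(num_samples: int, hop_length: int = 512) -> int:
    """Number of MFCC frames librosa returns with its default center=True."""
    return 1 + num_samples // hop_length


def trace_shapes(frames: int, n_mfcc: int,
                 blocks: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Shape (time, coefficients, channels) after each conv -> pool -> norm block.

    Each block is (filters, kernel): a stride-1 'valid' convolution with a
    kernel x kernel window, then pooling with stride 2 and 'same' padding.
    """
    shapes = []
    t, f = frames, n_mfcc
    for filters, kernel in blocks:
        t, f = t - kernel + 1, f - kernel + 1  # 'valid' convolution
        if t < 1 or f < 1:
            raise ValueError(f"kernel {kernel} does not fit the feature map")
        t, f = -(-t // 2), -(-f // 2)  # 'same' pooling, stride 2: ceil(x / 2)
        shapes.append((t, f, filters))  # normalization keeps the shape
    return shapes
```

For a full 30-second clip (1,292 frames) and blocks `[(32, 3), (32, 3), (32, 2)]`, the shapes come out as (645, 6, 32), (322, 2, 32) and (161, 1, 32). With three 3×3 kernels the coefficient axis shrinks 13 → 6 → 2 and the third convolution no longer fits, which is why this sketch uses a smaller final kernel.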
With the deep learning part finished, the next step was the web application, which was built with Flask. It consists of four pages: Home (where the audio file is uploaded), Project (a brief description of the project), About (dedicated to the team) and Contact.
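A minimal Flask skeleton for the four pages could look like the sketch below. The URLs, the inline strings used instead of templates, the accepted extensions and the `predict_genre` stub are all hypothetical; the real application renders its own templates and calls the trained model.

```python
import os
import tempfile

from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
ALLOWED = {".wav", ".mp3"}  # assumption: formats accepted on the Home page


def predict_genre(path: str) -> str:
    """Hypothetical placeholder: load the trained CNN and classify the file."""
    return "unknown"


@app.route("/", methods=["GET", "POST"])
def home():
    if request.method == "POST":
        upload = request.files.get("file")
        if upload is None or upload.filename == "":
            return "No file uploaded", 400
        name = secure_filename(upload.filename)  # strip path tricks from the name
        if os.path.splitext(name)[1].lower() not in ALLOWED:
            return "Unsupported file type", 400
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, name)
            upload.save(path)  # the temporary copy is deleted after prediction
            return f"Predicted genre: {predict_genre(path)}"
    return "Upload an audio file"


@app.route("/project")
def project():
    return "Project description"


@app.route("/about")
def about():
    return "About the team"


@app.route("/contact", methods=["GET", "POST"])
def contact():
    return "Contact form"
```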
Users can contact the team on the Contact page. After a user submits the form, the submitted details are saved in a MySQL database and mailed to a predefined email address.
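The mailing step can be done with Python's standard library. The sketch below is an assumption about how it might work, not the project's code: it builds a notification from the form fields (named after the `contacts` table columns) and sends it over SMTP with STARTTLS. `build_contact_email` and `send_email` are hypothetical names, and the host, port, credentials and addresses are placeholders to be supplied.

```python
import smtplib
from email.message import EmailMessage

FIELDS = ("fullname", "email", "phone_number", "url", "message")


def build_contact_email(form: dict, sender: str, recipient: str) -> EmailMessage:
    """Turn one contact-form submission into an email for the team."""
    msg = EmailMessage()
    msg["Subject"] = f"Contact form: {form['fullname']}"
    msg["From"] = sender
    msg["To"] = recipient
    msg["Reply-To"] = form["email"]  # replying goes straight to the visitor
    # One "field: value" line per form field; missing fields become empty
    msg.set_content("\n".join(f"{key}: {form.get(key) or ''}" for key in FIELDS))
    return msg


def send_email(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send msg through an SMTP server that supports STARTTLS (e.g. port 587)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```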
SQL statement that creates the MySQL table in which the contact data is saved:
CREATE TABLE contacts (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
fullname VARCHAR(30) NOT NULL,
email VARCHAR(30) NOT NULL,
phone_number VARCHAR(50),
url VARCHAR(50),
message VARCHAR(200),
reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
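The statement above only creates the table; each submission is then stored with an `INSERT`. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a database server. Because SQLite has no `INT(6) UNSIGNED`, `AUTO_INCREMENT` or `ON UPDATE` clause, the schema is adapted accordingly. `save_contact` is a hypothetical name. With a MySQL driver such as PyMySQL or mysql-connector-python, the same parameterized query uses `%s` placeholders instead of `?`.

```python
import sqlite3

# SQLite adaptation of the MySQL contacts table above
SCHEMA = """
CREATE TABLE IF NOT EXISTS contacts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    fullname VARCHAR(30) NOT NULL,
    email VARCHAR(30) NOT NULL,
    phone_number VARCHAR(50),
    url VARCHAR(50),
    message VARCHAR(200),
    reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

# Parameterized insert: values are never pasted into the SQL string
INSERT = (
    "INSERT INTO contacts (fullname, email, phone_number, url, message) "
    "VALUES (?, ?, ?, ?, ?)"
)


def save_contact(conn: sqlite3.Connection, form: dict) -> int:
    """Insert one contact-form submission and return its new id."""
    cur = conn.execute(
        INSERT,
        (form["fullname"], form["email"], form.get("phone_number"),
         form.get("url"), form.get("message")),
    )
    conn.commit()
    return cur.lastrowid
```

Letting the database fill in `id` and `reg_date` keeps the application code limited to the fields the user actually typed.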
The model cannot make predictions on MP3 audio files directly. That's why FFmpeg was used (
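How FFmpeg is invoked is not spelled out here; it may be called explicitly or simply installed as an audio decoding backend. One possibility, sketched as an assumption rather than the project's actual code, is to convert an uploaded MP3 to a mono WAV at 22,050 Hz before feature extraction. In the command, `-y` overwrites the output file, `-ac` sets the channel count and `-ar` the sample rate. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def ffmpeg_to_wav_cmd(src: str, dst: str, sr: int = 22050) -> List[str]:
    """Build an ffmpeg command that decodes src and writes a mono WAV at sr Hz."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sr), dst]


def convert_to_wav(src: str, dst: str, sr: int = 22050) -> str:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg is not installed or not on PATH")
    subprocess.run(ffmpeg_to_wav_cmd(src, dst, sr), check=True, capture_output=True)
    return dst
```

Building the argument list separately from running it keeps the command easy to inspect, and passing a list (not a shell string) avoids quoting problems with uploaded file names.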
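1. Clone this repository (or fork it first and clone your fork).
git clone https://github.com/MelihGulum/Music-Genre-Classification.git
cd Music-Genre-Classification
2. Install the project's dependencies.
pip install -r requirements.txt
3. Run the project (the --app and --debug options require Flask 2.2 or later).
flask --app MGC_flask.py --debug run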