Welcome to the Musical Instrument Sound Classifier repository!
This project utilizes machine learning to classify musical instrument sounds using Mel Spectrogram features extracted from audio files.
The repository is structured to facilitate easy exploration, experimentation, and deployment of the classifier.
This project aims to classify sounds of musical instruments such as guitar, piano, drums, and violin. Key highlights include:
- Mel Spectrogram feature extraction for audio preprocessing (a short extraction sketch follows below).
- Pre-trained and custom-trained models for experimentation.
- Progressive improvements documented through various model iterations.
- Deployment-ready server for real-time classification.
- Multiple Models: Six different models, each with unique approaches, are trained and evaluated.
- Visualization: Confusion matrices and model performance metrics are documented.
- Deployment: Dockerized server and web interface for easy deployment.
- Research-Driven Development: Insights and research guiding the development process are documented in docs/research.md and docs/ideas.md.
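As a rough illustration of the preprocessing step, the snippet below computes a log-scaled Mel Spectrogram from a WAV file with torchaudio. The file name, FFT size, hop length, and number of mel bins are placeholder assumptions, not the repository's actual settings:

```python
import torchaudio
import torchaudio.transforms as T

# Load an audio file (the path is an illustrative placeholder).
waveform, sample_rate = torchaudio.load("guitar_sample.wav")

# Convert the waveform to a Mel Spectrogram and then to a dB scale,
# the typical input representation for audio classifiers.
mel_transform = T.MelSpectrogram(sample_rate=sample_rate, n_fft=1024, hop_length=512, n_mels=128)
to_db = T.AmplitudeToDB()

mel_spectrogram = to_db(mel_transform(waveform))
print(mel_spectrogram.shape)  # (channels, n_mels, time_frames)
```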
- Python Environment: Install Python 3.11+.
- PyTorch: Install PyTorch.
- Dependencies: Install the project dependencies with:
pip install -r requirements.txt
- Docker (optional, but recommended): Ensure Docker is installed for deployment.
- Clone this repository:
git clone https://github.com/LMicol/instrument-classifier
- Navigate to the project directory:
cd instrument-classifier/
- Set up the environment and install the dependencies, as sketched below.
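The commands below consolidate these steps, assuming a standard venv workflow on Linux/macOS. The exact PyTorch install command depends on your platform and CUDA version, so check pytorch.org for the right one; the CPU-only install shown here is just an assumption:

```bash
git clone https://github.com/LMicol/instrument-classifier
cd instrument-classifier/

# Create and activate an isolated environment (optional, but keeps dependencies clean).
python3.11 -m venv .venv
source .venv/bin/activate

# Install PyTorch (CPU-only shown as an assumption) and the project requirements.
pip install torch torchaudio
pip install -r requirements.txt
```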
The dataset used in this project can be found at Micol/musical-instruments-sound-dataset on Kaggle. I used this dataset as a base and made some changes using the scripts in src/helpers.
Explore the src/models directory for the Jupyter Notebooks used to train and evaluate the models. Each model has its corresponding training script and saved weights.
To train a model on your machine or experiment with the notebooks, you will need to set up the whole environment. If you just want to test the final model, I recommend using Docker and the web interface.
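For reference, running a trained PyTorch classifier for inference typically looks like the sketch below. The InstrumentCNN class, the commented-out checkpoint name, and the input shape are hypothetical placeholders, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Minimal placeholder architecture; the real models live in the notebooks under src/models.
class InstrumentCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, n_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.flatten(1))

model = InstrumentCNN()
# model.load_state_dict(torch.load("model_weights.pth", map_location="cpu"))  # hypothetical checkpoint name
model.eval()

# Dummy batch shaped like a log-mel spectrogram: (batch, channels, n_mels, time_frames).
dummy_input = torch.randn(1, 1, 128, 256)
with torch.no_grad():
    logits = model(dummy_input)
    print(logits.argmax(dim=1))  # predicted class index
```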
In the src/web folder you'll find a simple web interface for testing the model API with your microphone; I recommend using Firefox for this. If you allow microphone access, the model's response will be highlighted with a red box. If you upload a file instead, the highlight will be a blue box and won't change.
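You can also call the audio server directly, without the web interface. The snippet below is only a sketch: the /predict endpoint path, the field name, and the file name are assumptions, so check the server code in the repository for the actual route:

```python
import requests

# Hypothetical endpoint and field name; verify against the server implementation.
with open("guitar_sample.wav", "rb") as f:
    response = requests.post("http://localhost:8000/predict", files={"file": f})

print(response.status_code, response.json())
```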
For deployment of both the server and the web interface, use the docker-compose.yml file provided in the repository. This will set up two services:
- Web Interface: Runs on port 5000.
- Audio Server: Runs on port 8000.
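The actual docker-compose.yml ships with the repository; the sketch below only illustrates the two-service layout and port mapping described above, and the service names and build paths are assumptions:

```yaml
# Illustrative sketch only; service names and build contexts are assumptions.
services:
  web:
    build: ./src/web
    ports:
      - "5000:5000"
  server:
    build: ./src/server
    ports:
      - "8000:8000"
```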
To deploy, run the following command in the project root directory:
docker-compose up --build
Access the services:
- Web Interface: http://localhost:5000
- API Server: http://localhost:8000
The docs/research.md file contains detailed information about the research conducted to guide model development.
The file is structured in three categories:
- Personal Thoughts: Personal monologues and internal discussions I've had.
- Actions: Things I've done.
- Research Observations: Comments about code, model behavior, and the research overall.
The docs/ideas.md file includes:
- How each model was developed and the idea behind its implementation.
- Results for each model iteration.
- Confusion matrices and performance insights (a sketch of how such a matrix is produced follows this list).
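This is not the repository's evaluation script, just a generic sketch of how a confusion matrix is typically computed and plotted with scikit-learn and matplotlib, assuming you already have true and predicted labels:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Example labels; in practice these come from evaluating a trained model on the test set.
classes = ["guitar", "piano", "drums", "violin"]
y_true = ["guitar", "piano", "drums", "violin", "guitar", "drums"]
y_pred = ["guitar", "piano", "drums", "piano", "guitar", "drums"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classes).plot()
plt.show()
```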
The images directory contains performance metrics for each model.
This project is licensed under the MIT License. See the LICENSE file for details.