The Keyword Spotting project is a real-time system for recognizing specific spoken words or phrases. Using deep learning, it analyzes audio input to identify and classify spoken keywords, making it suitable for applications such as voice-activated assistants and home automation systems.
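At a high level, the pipeline extracts spectral features from short audio frames and feeds them to a small neural classifier. The sketch below illustrates the idea with a minimal PyTorch model over MFCC-like inputs; the keyword list, layer sizes, and feature shape are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative keyword set and input shape (not the project's actual configuration).
KEYWORDS = ["yes", "no", "stop", "go", "_silence_"]

class TinyKWS(nn.Module):
    """Minimal convolutional classifier over a (1, n_mfcc, n_frames) feature map."""
    def __init__(self, n_keywords: int = len(KEYWORDS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_keywords)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return self.classifier(x)

# One MFCC-like frame as a dummy input: (batch, channel, n_mfcc, n_frames).
logits = TinyKWS()(torch.randn(1, 1, 40, 101))
print(KEYWORDS[logits.argmax(dim=1).item()])
```

The hardware used in this project: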
- Sony Spresense main board: The core microcontroller unit for processing.
- Spresense extension board: For additional connectivity and features.
- Microphone: Connected following the Spresense tutorial for using multiple microphone inputs.
- OLED Display (SSD1306): For real-time display of keyword spotting results.
- 32GB Micro SD Card: Used to store the DSP driver for audio processing.
While environment_install.sh takes care of setting up the software environment, it is important to be aware of the key dependencies:
- Python 3.8 or later
- PyTorch
- ONNX
- NumPy
- Additional Python libraries listed in requirements.txt
Install the required Python libraries by executing the following command in your terminal:
```bash
bash ./environment_install.sh
```
Download the dataset from this link and unzip it in the project's root folder. Make sure the resulting data directory structure matches what the project's scripts expect.
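To sanity-check the unzipped data before training, a short script like the one below can list the class folders and count the audio clips. The `data/` path and the per-keyword-folder layout are assumptions here; adjust them to whatever structure the project's scripts actually expect.

```python
from pathlib import Path

# Assumed layout: data/<keyword>/<recording>.wav — adjust to the project's actual structure.
data_root = Path("data")

for class_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
    n_wavs = sum(1 for _ in class_dir.glob("*.wav"))
    print(f"{class_dir.name}: {n_wavs} clips")
```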
Execute the run.sh script to run the entire project, which includes steps for training, testing, and model conversion:

```bash
bash ./run.sh
```
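Among other things, run.sh handles the model conversion. If you ever need to reproduce the ONNX export step by hand (ONNX is listed as a dependency), a minimal PyTorch-to-ONNX sketch looks like the following; the model, checkpoint handling, input shape, and file names are placeholders, since the real architecture is defined by the project's own scripts.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the project's trained network; in practice,
# build the real architecture and load the checkpoint produced by the training step.
model = nn.Sequential(nn.Flatten(), nn.Linear(40 * 101, 12))
model.eval()

# Dummy input matching the assumed feature shape: (batch, channel, n_mfcc, n_frames).
dummy = torch.randn(1, 1, 40, 101)

torch.onnx.export(
    model, dummy, "kws_model.onnx",
    input_names=["features"], output_names=["logits"],
    opset_version=13,
)
```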
Deployment on Arduino is facilitated through the run.sh script, which prepares the model in a compatible format:
- Follow these instructions to set up your Arduino Board Manager: https://github.com/zhenyulincs/Sony-Spresense-Arduino-TFMicro
- Run the run.sh script to convert the model to TensorFlow Lite format and then to a C header file.
- Integrate the generated C header file with your Arduino sketch and upload the sketch to the board.