Skip to content

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

License

Notifications You must be signed in to change notification settings

mgonzs13/whisper_ros

Repository files navigation

whisper_ros

This repository provides a set of ROS 2 packages to integrate whisper.cpp into ROS 2 using audio_common. Besides, silero-vad is used to perform VAD (Voice Activity Detection).

Table of Contents

  1. Related Projects
  2. Installation
  3. Docker
  4. Usage
  5. Demos

Related Projects

  • chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros, to listen to people speech; and llama_ros, to generate responses. The chatbot is controlled by a state machine created with YASMIN.

Installation

To run whisper_ros with CUDA, first, you must install the CUDA Toolkit.

$ cd ~/ros2_ws/src
$ git clone https://github.com/mgonzs13/audio_common.git
$ git clone https://github.com/mgonzs13/whisper_ros.git
$ sudo apt install portaudio19-dev
$ pip3 install -r audio_common/requirements.txt
$ pip3 install -r whisper_ros/requirements.txt
$ cd ~/ros2_ws
$ colcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA

Docker

Build the whisper_ros docker. Additionally, you can choose to build whisper_ros with CUDA (USE_CUDA) and choose the CUDA version (CUDA_VERSION). Remember that you have to use DOCKER_BUILDKIT=0 to compile whisper_ros with CUDA when building the image.

$ DOCKER_BUILDKIT=0 docker build -t whisper_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .

Run the docker container. If you want to use CUDA, you have to install the NVIDIA Container Tollkit and add --gpus all.

$ docker run -it --rm --gpus all whisper_ros

Usage

Run Silero for VAD and Whisper for STT:

$ ros2 launch whisper_bringup whisper.launch.py

Demos

Send a goal action to listen:

$ ros2 action send_goal /whisper/listen whisper_msgs/action/STT "{}"

Or try the example of a whisper client:

$ ros2 run whisper_demos whisper_demo_node