ROS Package bob_whisper_cpp
This ROS node is a basic STT (speech-to-text) voice assistant that accepts voice commands from the microphone and publishes them to a ROS std_msgs/msg/String topic for further processing. It is essentially the command example of the whisper.cpp project, extended with ROS integration.
More information about the original example:
# Run the command ROS Node with default arguments and small model
ros2 run bob_whisper_cpp command -m ./models/ggml-small.en.bin -t 8
This ROS node is also available as a prebuilt Docker image.
See below for how to use it.
The whisper.cpp library has a script to download models.
# assuming you are in the home directory which contains the colcon workspace
mkdir ./models 2>/dev/null
./colcon_ws/src/bob_whisper_cpp/whisper.cpp/models/download-ggml-model.sh base.en ./models
# run the ROS command node with the model
ros2 run bob_whisper_cpp command -m ./models/ggml-base.en.bin -t 16 --prompt "Hey Bob" --prompt-ms 1000 --vad-thold 0.7 --ros-args -r command:=/gpt/gpt_in
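The two steps above are linked by a file-naming pattern: downloading base.en into ./models yields ./models/ggml-base.en.bin. A minimal sketch of a helper that makes this mapping explicit and checks for the file before the node is started. The function name model_file is hypothetical, and the ggml-<name>.bin pattern is inferred only from the examples in this README.

```shell
# Hypothetical helper: map a model name and a directory to the expected file path.
# Assumes the ggml-<name>.bin naming seen in the examples of this README.
model_file() {
  printf '%s/ggml-%s.bin\n' "$2" "$1"
}

MODEL=$(model_file base.en ./models)
if [ -f "$MODEL" ]; then
  echo "model found: $MODEL"
else
  echo "model missing: $MODEL (run the download script first)"
fi
```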
For further information, see here
- This ROS node depends on the whisper.cpp project. Whisper.cpp is integrated as a Git submodule and is downloaded when this package is cloned with the option --recurse-submodules.
- The SDL2 library is used to capture audio from the microphone. The library can be installed like this:
# Install SDL2
sudo apt-get install libsdl2-dev
cd <colcon_ws>/src
git clone --recurse-submodules https://github.com/bob-ros2/bob_whisper_cpp.git
cd ..
# this builds whisper.cpp and the bob_whisper_cpp ROS node
colcon build
. install/setup.bash
Name: command
Type: std_msgs/msg/String
Outputs the detected text.
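To check what the node publishes, the topic can be echoed from a second terminal with the standard ros2 topic echo tool. The sketch below assumes the node runs in the root namespace without remapping, so that the topic resolves to /command. The function name listen_commands is hypothetical.

```shell
# Hypothetical helper: print messages from the node's output topic.
# The default /command assumes no namespace and no topic remapping.
listen_commands() {
  ros2 topic echo "${1:-/command}" std_msgs/msg/String
}
```

Call listen_commands in a shell where the ROS 2 environment is sourced, or pass the remapped topic name as the first argument if the topic was remapped.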
"Guided Mode" allows you to specify a list of commands (i.e. strings) and the transcription will be guided to classify your command into one from the list. This can be useful in situations where a device is listening only for a small subset of commands.
Because the result is restricted to the listed commands, this approach can be very efficient in terms of performance.
# Run in guided mode, the list of allowed commands is in commands.txt
ros2 run bob_whisper_cpp command \
-m ./models/ggml-base.en.bin \
-cmd ./examples/command/commands.txt
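A custom command list can be prepared in the same way. The sketch below assumes the file holds one allowed command per line, as in the example file of whisper.cpp. The file name my_commands.txt and the command words are placeholders.

```shell
# Write a hypothetical command list, one allowed command per line (assumed format).
cat > ./my_commands.txt <<'EOF'
forward
backward
stop
EOF

# Show how many commands the file holds.
echo "commands: $(grep -c . ./my_commands.txt)"
```

The file would then be passed to the node with -cmd ./my_commands.txt in place of the example file.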
The "Activation Prompt" allows you to specify a prefix to identify the beginning of a command. The prefix text will be stripped from the spoken text.
# Run with an activation prompt and remap the ROS topic to another topic name
ros2 run bob_whisper_cpp command \
--model ./models/ggml-base.en.bin \
--threads 8 \
--prompt "Hey Bob" \
--prompt-ms 1000 \
--vad-thold 0.7 \
--ros-args -r command:=/bob/llm/llm_in
This ROS node has no ROS parameters and has to be configured using command-line arguments.
Using --ros-args, e.g. for topic remapping, is still possible. These arguments must be added at the end of the command line.
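The ordering rule can be sketched by assembling the command line in two parts, node options first and ROS arguments last. The variable names and the topic /my/topic are placeholders; the flags are taken from the examples in this README.

```shell
# Node options first (flags taken from the examples in this README).
NODE_ARGS='-m ./models/ggml-base.en.bin -t 8'
# ROS arguments last; /my/topic is a placeholder topic name.
ROS_ARGS='--ros-args -r command:=/my/topic'

# Assemble and print the full command line instead of running it.
CMD="ros2 run bob_whisper_cpp command $NODE_ARGS $ROS_ARGS"
echo "$CMD"
```

Keeping the ROS arguments in a separate variable makes it harder to place a node option after --ros-args by accident.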
# show help
$ ros2 run bob_whisper_cpp command --help
usage: command [options]
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-pms N, --prompt-ms N [5000 ] prompt duration in milliseconds
-cms N, --command-ms N [8000 ] command duration in milliseconds
-c ID, --capture ID [-1 ] capture device ID
-mt N, --max-tokens N [32 ] maximum number of tokens per audio chunk
-ac N, --audio-ctx N [0 ] audio context size (0 - all)
-vth N, --vad-thold N [0.60 ] voice activity detection threshold
-fth N, --freq-thold N [100.00 ] high-pass frequency cutoff
-tr, --translate [false ] translate from source language to english
-ps, --print-special [false ] print special tokens
-pe, --print-energy [false ] print sound energy (for debugging)
-ng, --no-gpu [false ] disable GPU
-fa, --flash-attn [false ] flash attention
-l LANG, --language LANG [en ] spoken language
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] text output file name
-cmd FNAME, --commands FNAME [ ] text file with allowed commands
-p, --prompt [ ] the required activation prompt
-ctx, --context [ ] sample text to help the transcription
--grammar GRAMMAR [ ] GBNF grammar to guide decoding
--grammar-penalty N [100.0 ] scales down logits of nongrammar tokens
--suppress-regex REGEX [ ] regular expression matching tokens to suppress