
bob-ros2/bob_whisper_cpp


ROS Package bob_whisper_cpp

ROS Node Command

This ROS node is a basic speech-to-text (STT) voice assistant that accepts voice commands from the microphone and publishes them to a ROS std_msgs/msg/String topic for further processing. It is essentially the command example of the whisper.cpp project, extended with a ROS integration.
More information about the original example can be found in the whisper.cpp repository under examples/command.

# Run the command ROS node with the small English model and 8 threads
ros2 run bob_whisper_cpp command -m ./models/ggml-small.en.bin -t 8

Docker Support

This ROS node is also available as a pre-built Docker image.
See below for how to use it.

Getting Models

The whisper.cpp library has a script to download models.

# assuming you are in the home directory which contains the colcon workspace
mkdir -p ./models
./colcon_ws/src/bob_whisper_cpp/whisper.cpp/models/download-ggml-model.sh base.en ./models

# run the ROS command node with the model
ros2 run bob_whisper_cpp command -m ./models/ggml-base.en.bin -t 16 --prompt "Hey Bob" --prompt-ms 1000 --vad-thold 0.7 --ros-args -r command:=/gpt/gpt_in
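If you script this setup, a small guard can avoid downloading a model that is already present. The bash sketch below is hypothetical. The function name `ensure_model` and the `WHISPER_MODELS_SCRIPT` variable are made up for illustration. It assumes two things: the download script writes `ggml-<name>.bin` into the target directory, which matches the `ggml-base.en.bin` path used above, and the script lives at the workspace path shown above.

```shell
# Hypothetical helper: download a whisper.cpp ggml model only if it is missing,
# then print the model path so it can be passed to the node via -m.
# Assumption: the download script stores models as <dir>/ggml-<name>.bin.
WHISPER_MODELS_SCRIPT="${WHISPER_MODELS_SCRIPT:-./colcon_ws/src/bob_whisper_cpp/whisper.cpp/models/download-ggml-model.sh}"

ensure_model() {
  local name="$1" dir="${2:-./models}"
  local file="$dir/ggml-$name.bin"
  if [[ ! -f "$file" ]]; then
    mkdir -p "$dir" || return 1
    # Send the download script's output to stderr so stdout only carries the path.
    "$WHISPER_MODELS_SCRIPT" "$name" "$dir" >&2 || return 1
    [[ -f "$file" ]] || return 1   # script ran but no model file appeared
  fi
  printf '%s\n' "$file"
}

# Usage (assumes the workspace layout above):
#   model=$(ensure_model base.en ./models) && ros2 run bob_whisper_cpp command -m "$model" -t 16
```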

For further information about the available models, see the whisper.cpp models documentation.

Dependencies

  • This ROS node depends on the whisper.cpp project. whisper.cpp is integrated as a Git submodule and is downloaded together with this package when it is cloned with the --recurse-submodules option.
  • The SDL2 library is used to capture audio from the microphone. The library can be installed like this:
# Install SDL2
sudo apt-get install libsdl2-dev
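If the package was cloned without `--recurse-submodules`, the `whisper.cpp` directory stays empty and the build fails. A pre-build check can catch this. The bash function below is a hypothetical sketch, and its name is an assumption. It also assumes that a populated submodule contains whisper.cpp's top-level `CMakeLists.txt`. The command it suggests, `git submodule update --init --recursive`, is standard Git for fetching missing submodules.

```shell
# Hypothetical pre-build check: verify that the whisper.cpp submodule was checked out.
# Assumption: a populated submodule contains whisper.cpp's top-level CMakeLists.txt.
check_whisper_submodule() {
  local pkg_dir="${1:-./src/bob_whisper_cpp}"
  if [[ -f "$pkg_dir/whisper.cpp/CMakeLists.txt" ]]; then
    echo "whisper.cpp submodule found in $pkg_dir"
  else
    # An empty submodule directory usually means the clone skipped --recurse-submodules.
    echo "whisper.cpp submodule missing; fetch it with:" >&2
    echo "  git -C $pkg_dir submodule update --init --recursive" >&2
    return 1
  fi
}

# Usage, from the colcon workspace root:
#   check_whisper_submodule src/bob_whisper_cpp && colcon build
```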

Setup Package

cd <colcon_ws>/src
git clone --recurse-submodules https://github.com/bob-ros2/bob_whisper_cpp.git
cd ..
# this builds whisper.cpp and the bob_whisper_cpp ROS node
colcon build
. install/setup.bash

Published ROS Topics

Name: command
Type: std_msgs/msg/String
Outputs the detected text.
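You may want to test whatever consumes the `command` topic without using a microphone. To do that, publish a `std_msgs/msg/String` by hand with the standard `ros2 topic pub` CLI. The bash wrapper below is a hypothetical convenience; its name and the `DRY_RUN` switch are assumptions. It escapes single quotes for the YAML message. It also assumes the node runs in the root namespace, so the topic resolves to `/command`.

```shell
# Hypothetical helper: publish one fake transcription on the node's output topic.
# Set DRY_RUN=1 to print the ros2 command instead of running it.
publish_test_command() {
  local text="$1" topic="${2:-/command}"
  local q="'"
  # YAML single-quoted scalars escape a single quote by doubling it.
  local escaped="${text//$q/$q$q}"
  local -a cmd=(ros2 topic pub --once "$topic" std_msgs/msg/String "{data: '$escaped'}")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   publish_test_command "turn on the light"
#   ros2 topic echo /command    # in another terminal, to watch the real node's output
```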

Guided Mode

"Guided Mode" allows you to specify a list of commands (i.e. strings) and the transcription will be guided to classify your command into one from the list. This can be useful in situations where a device is listening only for a small subset of commands.

Because the transcription only has to choose among a fixed list of commands, this approach can be very efficient in terms of performance.

# Run in guided mode, the list of allowed commands is in commands.txt
ros2 run bob_whisper_cpp command \
-m ./models/ggml-base.en.bin \
-cmd ./examples/command/commands.txt
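The commands file is plain text. A custom list can be generated as shown below, assuming one command per line as in the upstream `examples/command/commands.txt`. `write_commands_file` and `my_commands.txt` are hypothetical names. The trimming and case-insensitive de-duplication are conveniences, not requirements stated by the node.

```shell
# Hypothetical helper: write an allowed-commands file for guided mode (-cmd).
# Assumption: the node expects one command per line.
# Leading/trailing whitespace is trimmed, blank lines are dropped and
# case-insensitive duplicates are removed (the first spelling is kept).
write_commands_file() {
  local out="$1"; shift
  printf '%s\n' "$@" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v '^$' \
    | awk '!seen[tolower($0)]++' > "$out"
}

# Usage:
#   write_commands_file ./my_commands.txt "lights on" "lights off" "stop"
#   ros2 run bob_whisper_cpp command -m ./models/ggml-base.en.bin -cmd ./my_commands.txt
```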

Activation Prompt

The "Activation Prompt" allows you to specify a prefix that identifies the beginning of a command. The prefix text is stripped from the spoken text.

# Run with an activation prompt and remap the ROS topic to another topic name
ros2 run bob_whisper_cpp command \
--model ./models/ggml-base.en.bin \
--threads 8 \
--prompt "Hey Bob" \
--prompt-ms 1000 \
--vad-thold 0.7 \
--ros-args -r command:=/bob/llm/llm_in
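The stripping idea can be illustrated with a simplified sketch. Let N be the number of words in the prompt. The first N words of the transcription are compared with the prompt, and only the remaining words are kept as the command. This is an illustration under stated assumptions, not the node's code. The upstream whisper.cpp `command` example compares the leading words with the prompt using a fuzzy similarity score. This sketch instead uses an exact match after lowercasing and removing punctuation. The function names are hypothetical, and bash 4+ is assumed.

```shell
# Simplified illustration of activation-prompt stripping (hypothetical helpers).
# Unlike the real node, words are compared exactly after lowercasing and
# removing non-alphanumeric characters, not with a fuzzy similarity score.
normalize_word() {
  local w="${1,,}"                   # lowercase (bash 4+)
  printf '%s' "${w//[^[:alnum:]]/}"  # drop punctuation, e.g. "Bob," -> "bob"
}

strip_prompt() {
  local prompt="$1" text="$2"
  local -a p_words t_words
  read -r -a p_words <<< "$prompt"
  read -r -a t_words <<< "$text"
  local n=${#p_words[@]} i
  # Require at least one word after the prompt, otherwise there is no command.
  (( ${#t_words[@]} > n )) || return 1
  for (( i = 0; i < n; i++ )); do
    [[ "$(normalize_word "${p_words[i]}")" == "$(normalize_word "${t_words[i]}")" ]] || return 1
  done
  printf '%s\n' "${t_words[*]:n}"    # everything after the prompt words
}

# Usage:
#   strip_prompt "Hey Bob" "Hey Bob, turn on the light."   # prints: turn on the light.
```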

Command Line Arguments

This ROS node has no ROS parameters and has to be configured using command-line arguments.
Using --ros-args, e.g. for topic remapping, is still possible, but these arguments must be placed after all other command arguments.
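When the node is started from a script, keeping the whisper options and the ROS arguments in separate lists makes it hard to put `--ros-args` in the wrong place. The bash function below is a hypothetical sketch, not part of the package. It prints one argument per line, so it assumes no argument contains a newline.

```shell
# Hypothetical helper: assemble the argument vector for the command node so that
# --ros-args and everything after it always comes last.
# Usage: build_command_argv <whisper options...> [-- <ROS arguments...>]
build_command_argv() {
  local -a whisper=()
  while (( $# > 0 )) && [[ "$1" != "--" ]]; do
    whisper+=("$1")
    shift
  done
  (( $# > 0 )) && shift            # drop the "--" separator if present
  printf '%s\n' ros2 run bob_whisper_cpp command "${whisper[@]}"
  if (( $# > 0 )); then
    printf '%s\n' --ros-args "$@"  # remaining arguments are ROS arguments
  fi
}

# Usage:
#   mapfile -t argv < <(build_command_argv -m ./models/ggml-base.en.bin -t 8 -- -r command:=/bob/llm/llm_in)
#   "${argv[@]}"
```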

# show help
$ ros2 run bob_whisper_cpp command --help

usage: command [options]

options:
  -h,         --help           [default] show this help message and exit
  -t N,       --threads N      [4      ] number of threads to use during computation
  -pms N,     --prompt-ms N    [5000   ] prompt duration in milliseconds
  -cms N,     --command-ms N   [8000   ] command duration in milliseconds
  -c ID,      --capture ID     [-1     ] capture device ID
  -mt N,      --max-tokens N   [32     ] maximum number of tokens per audio chunk
  -ac N,      --audio-ctx N    [0      ] audio context size (0 - all)
  -vth N,     --vad-thold N    [0.60   ] voice activity detection threshold
  -fth N,     --freq-thold N   [100.00 ] high-pass frequency cutoff
  -tr,        --translate      [false  ] translate from source language to english
  -ps,        --print-special  [false  ] print special tokens
  -pe,        --print-energy   [false  ] print sound energy (for debugging)
  -ng,        --no-gpu         [false  ] disable GPU
  -fa,        --flash-attn     [false  ] flash attention
  -l LANG,    --language LANG  [en     ] spoken language
  -m FNAME,   --model FNAME    [models/ggml-base.en.bin] model path
  -f FNAME,   --file FNAME     [       ] text output file name
  -cmd FNAME, --commands FNAME [       ] text file with allowed commands
  -p,         --prompt         [       ] the required activation prompt
  -ctx,       --context        [       ] sample text to help the transcription
  --grammar GRAMMAR            [       ] GBNF grammar to guide decoding
  --grammar-penalty N          [100.0  ] scales down logits of nongrammar tokens
  --suppress-regex REGEX       [       ] regular expression matching tokens to suppress

About

This ROS package integrates whisper.cpp STT into ROS
