[voicerss] Add support for WAV audio format #11916

abrenk · 2021-12-31T15:41:38Z

This pull request adds support for WAV / PCM audio in all frequencies and bit depths supported by VoiceRSS.

Support for MP3, Ogg Vorbis and AAC was kept as it was (44.1 kHz, 16 bit) but could be extended based on this work.

The file-based cache was kept backwards compatible and only adds a format specifier to the file name if it is different from the default.

abrenk · 2021-12-31T19:08:20Z

...hab.voice.voicerss/src/main/java/org/openhab/voice/voicerss/internal/VoiceRSSTTSService.java

@@ -121,22 +135,12 @@ public AudioStream synthesize(String text, Voice voice, AudioFormat requestedFor
        if (!voices.contains(voice)) {
            throw new TTSException("The passed voice is unsupported");
        }
-        boolean isAudioFormatSupported = false;


This check isn't needed here any longer because the methods getApiAudioCodec() and getApiAudioFormat() called later on throw a TTSException if a format isn't supported.

lolodomo · 2022-01-09T11:12:36Z

Does the service let you choose the output format (I do not remember and I have not checked) ?

WAV files are probably bigger, so will take more place in RAM (for RPI) and will mean probably a bigger delay to start the playback. Does it provide a better audio quality in practice ?

Please fix the conflict. Then I will review.

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>

abrenk · 2022-01-09T12:49:21Z

The URL contains the requested audio format (see VoiceRSSCloudImpl.java#L259). Supported formats are listed in the VoiceRSS API documentation in the "Audio Codecs" (MP3,WAV, AAC, OGG, CAF) and "Audio Formats" (8 kHz - 48 kHz, 8 or 16 Bit) sections.

Anything above 8-Bit, 16 kHz is pretty much the same quality, it's just generated speech after all. Below that it sounds muffled.

But it's not a question of quality, rather than what formats the audio sink supports.

In my case I need the audio in WAV, 22 kHz, 16 Bit to stream to my u::Lux switches (smart wall switch with integrated speaker).

I don't think that even a Raspberry Pi would have any trouble with the file size difference. Converting the audio format between audio source and sink would add latency and processing time on the other hand. So it's best to let VoiceRSS generate the needed format.

I have rebased to main (fixing the conflict and updating copyright headers in the two new files).

lolodomo · 2022-01-09T15:18:08Z

But it's not a question of quality, rather than what formats the audio sink supports.

Ok, I understand now the main purpose of this enhancement.

...e.voicerss/src/main/java/org/openhab/voice/voicerss/internal/cloudapi/VoiceRSSCloudImpl.java

lolodomo · 2022-01-09T17:24:20Z

...org.openhab.voice.voicerss/src/main/java/org/openhab/voice/voicerss/tool/CreateTTSCache.java

@@ -106,7 +106,7 @@ private void generateCacheForMessage(String apiKey, String cacheDir, String loca
            return;
        }
        CachedVoiceRSSCloudImpl impl = new CachedVoiceRSSCloudImpl(cacheDir);
-        File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3");
+        File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3", null);


I am a little lost. Why is MP3 hardcoded here ?
And using null as last new parameter can not lead to problems (NPE) ?

More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3 ?

I am a little lost. Why is MP3 hardcoded here ?

"MP3" was already hardcoded before and not parameterized in the commit to "add support for OGG and AAC audio formats" because it's probably unused. So I also didn't touch it.

This file is meant to be called from the CLI, see Caching in README.md. Perhaps it should be deleted as calling the openhab:voice say <text> command in the runtime console has the same effect.

And using null as last new parameter can not lead to problems (NPE) ?

No risk of a NPE: if (!Objects.equals(format, "44khz_16bit_mono")) { ... }

More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3?

Hm. Good question. I think so, if the sink does not support WAV. That's decided in VoiceManagerImpl:

Set<AudioFormat> audioFormats = tts.getSupportedFormats(); ... AudioFormat audioFormat = getBestMatch(audioFormats, sink.getSupportedFormats());

So I think that's not really an implementation concern of the VoiceRSS binding but the openhab-core classes.

I see that it was previously hardcoded with MP3 but I did not remember why. As I was probably the one who added OGG/AAC, if I left MP3 hardcoded here, there is certainly a good reason, I hope :) But I will check that again, considering your comments about the CLI command to build the cache.

I would like to be sure that these changes will not lead to users "loosing" their cache in practice because the binding is now using another codec and other file names in the cache.
Currently, I see that my cached files have all the MP3 extension.

Let me the time to analyse more in details the code and to test your changes.

I made sure not to change the filename for previous uses of the default 44khz_16bit_mono format:

/** * Gets a unique filename [...] suffixed by the format if it is not the default of "44khz_16bit_mono". */ private String getUniqueFilenameForText(String text, String locale, String voice, String format) { ... if (!Objects.equals(format, "44khz_16bit_mono")) { filename += "_" + format; } ... }

Thanks for your review so far.

A lot of audio sinks support MP3 and WAV: javasound, webaudio, chromecast, kodi, squeezebox, ...

VoiceManagerImpl#getPreferredFormat seems to prefer WAV, so I guess you're right that this change might lead to cache misses.

But it would be somewhat unfitting to have special "prefer MP3" code only in this binding. Better to implement something in openhab-core.

There's code duplication between VoiceManagerImpl#getPreferredFormat and AudioFormat#getPreferredFormat, the latter has a strange defaultFrequency = 16384 setting, that is not contained in the list of common sampling frequencies in Wikipedia (the correct value would be 16000 Hz). This is corrected in VoiceManagerImpl to 44100 Hz. I already wrote that bitRate is practically unused. bigEndian only makes sense for WAV with a bit depth of 16 bit but not for 8 bit WAV or MP3 or other compressed formats. Maybe it's time for a little cleanup and refactoring there.

Maybe we should just add an advanced parameter to let the user force an audio format.
By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility + compatibility with the cache tool.
WDYT?

A config option to opt-in to WAV support could be implemented. The return value of getSupportedFormats() could be without WAV in the default case then. But I think this is not something a single binding should do as a special case.

The code that decides what audio format to use is in openhab-core. I can't say if a sink can prefer a format by returning it first in the of list of supported formats. Oh, I see that it's a Set and not a list, so I guess returning a LinkedHashSet would be possible but isn't really part of the contract of AudioSink.

Another possibility is to mention how to batch convert the cache in the release notes:

for i in userdata/voicerss/cache/*.mp3; do sox "$i" "${i%.mp3}.wav"; done

By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility

Listing the preferred audio formats to use as a config option in openhab-core would be a good idea. But would not solve our problem here, because -core preferred wav and should continue to do so when such an option is introduced, I guess.

So if you do not think a remark in the release notes is enough we should start to implement an option for the voicerss binding and maybe later on do something about it in -core. Perhaps even changing AudioSink to return a List. A sink should have the possibility to specify a preference.

What I was thinking of is clearly an option in voicerss. My concern is users who created cached files for voicerss using the provided tool. Currently all these files are MP3 files.

My idea was a binding parameter that would limit the audio formats handled by the binding to a particular audio format, MP3 as an example.

But now I think the best option is just to enhance the tool to let the user select the output audio format for cached files to be created. Like that, to switch to WAV, files, the user will have only to rerurn his command but asking WAV format.

Can you enhance the tool with one additional optional command line argument that will be the audio codec to be used? The call will then use this argument rather than hardcoded string "MP3".

Okay, I will do that!

lolodomo · 2022-01-22T10:58:52Z

@abrenk : any progress ?

lolodomo

LGTM

lolodomo · 2022-01-22T15:52:41Z

In fact, I am going to merge your PR now.
The cache tool can be updated in a separate PR.

lolodomo · 2022-01-22T16:26:51Z

I just tested and even if a MP3 file is still produced in my case, I see "_44khz_16_mono" added to my files.
So your test is probably not working. I will see if I find why.

lolodomo · 2022-01-22T16:39:35Z

I think I found the problem: you were building an audio format code like 44khz_16_mono instead of expected 44khz_16bit_mono.
Fixed.
I think even the audio quality was reduced.

* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>

* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com> Signed-off-by: Nick Waterton <n.waterton@outlook.com>

* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>

* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com> Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>

abrenk added the enhancement An enhancement or new feature for an existing add-on label Dec 31, 2021

abrenk requested a review from JochenHiller as a code owner December 31, 2021 15:41

abrenk force-pushed the voicerss-wav branch from 14ad4a6 to 030b352 Compare December 31, 2021 15:56

abrenk commented Dec 31, 2021

View reviewed changes

abrenk added 2 commits January 9, 2022 13:46

[voicerss] add unit test for supported formats

8e74130

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>

[voicerss] add support for WAV audio format

2ccc45d

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>

abrenk force-pushed the voicerss-wav branch from 030b352 to 2ccc45d Compare January 9, 2022 12:47

lolodomo requested changes Jan 9, 2022

View reviewed changes

lolodomo approved these changes Jan 22, 2022

View reviewed changes

lolodomo merged commit 03b5347 into openhab:main Jan 22, 2022

lolodomo added this to the 3.3 milestone Jan 22, 2022

lolodomo mentioned this pull request Jan 22, 2022

[voicerss] Wrong generated audio format code #12091

Closed

lolodomo mentioned this pull request Jan 29, 2022

[voicerss] Fix and enhance the external tool to prefill the cache #12155

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[voicerss] Add support for WAV audio format #11916

[voicerss] Add support for WAV audio format #11916

abrenk commented Dec 31, 2021

abrenk Dec 31, 2021

lolodomo commented Jan 9, 2022

abrenk commented Jan 9, 2022

lolodomo commented Jan 9, 2022

lolodomo Jan 9, 2022 •

edited

Loading

abrenk Jan 9, 2022

lolodomo Jan 9, 2022 •

edited

Loading

abrenk Jan 9, 2022

abrenk Jan 9, 2022

lolodomo Jan 11, 2022 •

edited

Loading

abrenk Jan 11, 2022

abrenk Jan 11, 2022

lolodomo Jan 11, 2022 •

edited

Loading

abrenk Jan 11, 2022

lolodomo commented Jan 22, 2022

lolodomo left a comment

lolodomo commented Jan 22, 2022

lolodomo commented Jan 22, 2022

lolodomo commented Jan 22, 2022 •

edited

Loading

[voicerss] Add support for WAV audio format #11916

[voicerss] Add support for WAV audio format #11916

Conversation

abrenk commented Dec 31, 2021

abrenk Dec 31, 2021

Choose a reason for hiding this comment

lolodomo commented Jan 9, 2022

abrenk commented Jan 9, 2022

lolodomo commented Jan 9, 2022

lolodomo Jan 9, 2022 • edited Loading

Choose a reason for hiding this comment

abrenk Jan 9, 2022

Choose a reason for hiding this comment

lolodomo Jan 9, 2022 • edited Loading

Choose a reason for hiding this comment

abrenk Jan 9, 2022

Choose a reason for hiding this comment

abrenk Jan 9, 2022

Choose a reason for hiding this comment

lolodomo Jan 11, 2022 • edited Loading

Choose a reason for hiding this comment

abrenk Jan 11, 2022

Choose a reason for hiding this comment

abrenk Jan 11, 2022

Choose a reason for hiding this comment

lolodomo Jan 11, 2022 • edited Loading

Choose a reason for hiding this comment

abrenk Jan 11, 2022

Choose a reason for hiding this comment

lolodomo commented Jan 22, 2022

lolodomo left a comment

Choose a reason for hiding this comment

lolodomo commented Jan 22, 2022

lolodomo commented Jan 22, 2022

lolodomo commented Jan 22, 2022 • edited Loading

lolodomo Jan 9, 2022 •

edited

Loading

lolodomo Jan 9, 2022 •

edited

Loading

lolodomo Jan 11, 2022 •

edited

Loading

lolodomo Jan 11, 2022 •

edited

Loading

lolodomo commented Jan 22, 2022 •

edited

Loading