-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[voicerss] Add support for WAV audio format #11916
Conversation
@@ -121,22 +135,12 @@ public AudioStream synthesize(String text, Voice voice, AudioFormat requestedFor | |||
if (!voices.contains(voice)) { | |||
throw new TTSException("The passed voice is unsupported"); | |||
} | |||
boolean isAudioFormatSupported = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check isn't needed here any longer because the methods getApiAudioCodec() and getApiAudioFormat() called later on throw a TTSException if a format isn't supported.
Does the service let you choose the output format (I do not remember and I have not checked) ? WAV files are probably bigger, so will take more place in RAM (for RPI) and will mean probably a bigger delay to start the playback. Does it provide a better audio quality in practice ? Please fix the conflict. Then I will review. |
Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
The URL contains the requested audio format (see VoiceRSSCloudImpl.java#L259). Supported formats are listed in the VoiceRSS API documentation in the "Audio Codecs" (MP3,WAV, AAC, OGG, CAF) and "Audio Formats" (8 kHz - 48 kHz, 8 or 16 Bit) sections. Anything above 8-Bit, 16 kHz is pretty much the same quality, it's just generated speech after all. Below that it sounds muffled. But it's not a question of quality, rather than what formats the audio sink supports. In my case I need the audio in WAV, 22 kHz, 16 Bit to stream to my u::Lux switches (smart wall switch with integrated speaker). I don't think that even a Raspberry Pi would have any trouble with the file size difference. Converting the audio format between audio source and sink would add latency and processing time on the other hand. So it's best to let VoiceRSS generate the needed format. I have rebased to main (fixing the conflict and updating copyright headers in the two new files). |
Ok, I understand now the main purpose of this enhancement. |
...e.voicerss/src/main/java/org/openhab/voice/voicerss/internal/cloudapi/VoiceRSSCloudImpl.java
Show resolved
Hide resolved
@@ -106,7 +106,7 @@ private void generateCacheForMessage(String apiKey, String cacheDir, String loca | |||
return; | |||
} | |||
CachedVoiceRSSCloudImpl impl = new CachedVoiceRSSCloudImpl(cacheDir); | |||
File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3"); | |||
File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3", null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little lost. Why is MP3 hardcoded here ?
And using null as last new parameter can not lead to problems (NPE) ?
More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little lost. Why is MP3 hardcoded here ?
"MP3" was already hardcoded before and not parameterized in the commit to "add support for OGG and AAC audio formats" because it's probably unused. So I also didn't touch it.
This file is meant to be called from the CLI, see Caching in README.md. Perhaps it should be deleted as calling the openhab:voice say <text>
command in the runtime console has the same effect.
And using null as last new parameter can not lead to problems (NPE) ?
No risk of a NPE: if (!Objects.equals(format, "44khz_16bit_mono")) { ... }
More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3?
Hm. Good question. I think so, if the sink does not support WAV. That's decided in VoiceManagerImpl
:
Set<AudioFormat> audioFormats = tts.getSupportedFormats();
...
AudioFormat audioFormat = getBestMatch(audioFormats, sink.getSupportedFormats());
So I think that's not really an implementation concern of the VoiceRSS binding but the openhab-core classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it was previously hardcoded with MP3 but I did not remember why. As I was probably the one who added OGG/AAC, if I left MP3 hardcoded here, there is certainly a good reason, I hope :) But I will check that again, considering your comments about the CLI command to build the cache.
I would like to be sure that these changes will not lead to users "loosing" their cache in practice because the binding is now using another codec and other file names in the cache.
Currently, I see that my cached files have all the MP3 extension.
Let me the time to analyse more in details the code and to test your changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made sure not to change the filename for previous uses of the default 44khz_16bit_mono format:
/**
* Gets a unique filename [...] suffixed by the format if it is not the default of "44khz_16bit_mono".
*/
private String getUniqueFilenameForText(String text, String locale, String voice, String format) {
...
if (!Objects.equals(format, "44khz_16bit_mono")) {
filename += "_" + format;
}
...
}
Thanks for your review so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of audio sinks support MP3 and WAV: javasound, webaudio, chromecast, kodi, squeezebox, ...
VoiceManagerImpl#getPreferredFormat
seems to prefer WAV, so I guess you're right that this change might lead to cache misses.
But it would be somewhat unfitting to have special "prefer MP3" code only in this binding. Better to implement something in openhab-core.
There's code duplication between VoiceManagerImpl#getPreferredFormat
and AudioFormat#getPreferredFormat
, the latter has a strange defaultFrequency = 16384
setting, that is not contained in the list of common sampling frequencies in Wikipedia (the correct value would be 16000 Hz). This is corrected in VoiceManagerImpl
to 44100 Hz. I already wrote that bitRate is practically unused. bigEndian only makes sense for WAV with a bit depth of 16 bit but not for 8 bit WAV or MP3 or other compressed formats. Maybe it's time for a little cleanup and refactoring there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just add an advanced parameter to let the user force an audio format.
By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility + compatibility with the cache tool.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A config option to opt-in to WAV support could be implemented. The return value of getSupportedFormats()
could be without WAV in the default case then. But I think this is not something a single binding should do as a special case.
The code that decides what audio format to use is in openhab-core. I can't say if a sink can prefer a format by returning it first in the of list of supported formats. Oh, I see that it's a Set
and not a list, so I guess returning a LinkedHashSet
would be possible but isn't really part of the contract of AudioSink
.
Another possibility is to mention how to batch convert the cache in the release notes:
for i in userdata/voicerss/cache/*.mp3; do sox "$i" "${i%.mp3}.wav"; done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility
Listing the preferred audio formats to use as a config option in openhab-core would be a good idea. But would not solve our problem here, because -core preferred wav and should continue to do so when such an option is introduced, I guess.
So if you do not think a remark in the release notes is enough we should start to implement an option for the voicerss binding and maybe later on do something about it in -core. Perhaps even changing AudioSink
to return a List
. A sink should have the possibility to specify a preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was thinking of is clearly an option in voicerss. My concern is users who created cached files for voicerss using the provided tool. Currently all these files are MP3 files.
My idea was a binding parameter that would limit the audio formats handled by the binding to a particular audio format, MP3 as an example.
But now I think the best option is just to enhance the tool to let the user select the output audio format for cached files to be created. Like that, to switch to WAV, files, the user will have only to rerurn his command but asking WAV format.
Can you enhance the tool with one additional optional command line argument that will be the audio codec to be used? The call will then use this argument rather than hardcoded string "MP3".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will do that!
@abrenk : any progress ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In fact, I am going to merge your PR now. |
I just tested and even if a MP3 file is still produced in my case, I see "_44khz_16_mono" added to my files. |
I think I found the problem: you were building an audio format code like |
* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com> Signed-off-by: Nick Waterton <n.waterton@outlook.com>
* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
* [voicerss] add unit test for supported formats * [voicerss] add support for WAV audio format Signed-off-by: Andreas Brenk <mail@andreasbrenk.com> Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>
This pull request adds support for WAV / PCM audio in all frequencies and bit depths supported by VoiceRSS.
Support for MP3, Ogg Vorbis and AAC was kept as it was (44.1 kHz, 16 bit) but could be extended based on this work.
The file-based cache was kept backwards compatible and only adds a format specifier to the file name if it is different from the default.