Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[voicerss] Add support for WAV audio format #11916

Merged
merged 2 commits into from
Jan 22, 2022
Merged

Conversation

abrenk
Copy link
Member

@abrenk abrenk commented Dec 31, 2021

This pull request adds support for WAV / PCM audio in all frequencies and bit depths supported by VoiceRSS.

Support for MP3, Ogg Vorbis and AAC was kept as it was (44.1 kHz, 16 bit) but could be extended based on this work.

The file-based cache was kept backwards compatible and only adds a format specifier to the file name if it is different from the default.

@abrenk abrenk added the enhancement An enhancement or new feature for an existing add-on label Dec 31, 2021
@abrenk abrenk requested a review from JochenHiller as a code owner December 31, 2021 15:41
@@ -121,22 +135,12 @@ public AudioStream synthesize(String text, Voice voice, AudioFormat requestedFor
if (!voices.contains(voice)) {
throw new TTSException("The passed voice is unsupported");
}
boolean isAudioFormatSupported = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isn't needed here any longer because the methods getApiAudioCodec() and getApiAudioFormat() called later on throw a TTSException if a format isn't supported.

@lolodomo
Copy link
Contributor

lolodomo commented Jan 9, 2022

Does the service let you choose the output format (I do not remember and I have not checked) ?

WAV files are probably bigger, so will take more place in RAM (for RPI) and will mean probably a bigger delay to start the playback. Does it provide a better audio quality in practice ?

Please fix the conflict. Then I will review.

abrenk added 2 commits January 9, 2022 13:46
Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
@abrenk
Copy link
Member Author

abrenk commented Jan 9, 2022

The URL contains the requested audio format (see VoiceRSSCloudImpl.java#L259). Supported formats are listed in the VoiceRSS API documentation in the "Audio Codecs" (MP3,WAV, AAC, OGG, CAF) and "Audio Formats" (8 kHz - 48 kHz, 8 or 16 Bit) sections.

Anything above 8-Bit, 16 kHz is pretty much the same quality, it's just generated speech after all. Below that it sounds muffled.

But it's not a question of quality, rather than what formats the audio sink supports.

In my case I need the audio in WAV, 22 kHz, 16 Bit to stream to my u::Lux switches (smart wall switch with integrated speaker).

I don't think that even a Raspberry Pi would have any trouble with the file size difference. Converting the audio format between audio source and sink would add latency and processing time on the other hand. So it's best to let VoiceRSS generate the needed format.

I have rebased to main (fixing the conflict and updating copyright headers in the two new files).

@lolodomo
Copy link
Contributor

lolodomo commented Jan 9, 2022

But it's not a question of quality, rather than what formats the audio sink supports.

Ok, I understand now the main purpose of this enhancement.

@@ -106,7 +106,7 @@ private void generateCacheForMessage(String apiKey, String cacheDir, String loca
return;
}
CachedVoiceRSSCloudImpl impl = new CachedVoiceRSSCloudImpl(cacheDir);
File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3");
File cachedFile = impl.getTextToSpeechAsFile(apiKey, trimmedMsg, locale, voice, "MP3", null);
Copy link
Contributor

@lolodomo lolodomo Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little lost. Why is MP3 hardcoded here ?
And using null as last new parameter can not lead to problems (NPE) ?

More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little lost. Why is MP3 hardcoded here ?

"MP3" was already hardcoded before and not parameterized in the commit to "add support for OGG and AAC audio formats" because it's probably unused. So I also didn't touch it.

This file is meant to be called from the CLI, see Caching in README.md. Perhaps it should be deleted as calling the openhab:voice say <text> command in the runtime console has the same effect.

And using null as last new parameter can not lead to problems (NPE) ?

No risk of a NPE: if (!Objects.equals(format, "44khz_16bit_mono")) { ... }

More generally, using VoiceRSS with your changes, what will be the default codec used ? Will it be MP3 if the sink supports MP3?

Hm. Good question. I think so, if the sink does not support WAV. That's decided in VoiceManagerImpl:

Set<AudioFormat> audioFormats = tts.getSupportedFormats();
...
AudioFormat audioFormat = getBestMatch(audioFormats, sink.getSupportedFormats());

So I think that's not really an implementation concern of the VoiceRSS binding but the openhab-core classes.

Copy link
Contributor

@lolodomo lolodomo Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it was previously hardcoded with MP3 but I did not remember why. As I was probably the one who added OGG/AAC, if I left MP3 hardcoded here, there is certainly a good reason, I hope :) But I will check that again, considering your comments about the CLI command to build the cache.

I would like to be sure that these changes will not lead to users "loosing" their cache in practice because the binding is now using another codec and other file names in the cache.
Currently, I see that my cached files have all the MP3 extension.

Let me the time to analyse more in details the code and to test your changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made sure not to change the filename for previous uses of the default 44khz_16bit_mono format:

/**
 * Gets a unique filename [...] suffixed by the format if it is not the default of "44khz_16bit_mono".
 */
private String getUniqueFilenameForText(String text, String locale, String voice, String format) {
    ...
    if (!Objects.equals(format, "44khz_16bit_mono")) {
        filename += "_" + format;
    }
    ...
}

Thanks for your review so far.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of audio sinks support MP3 and WAV: javasound, webaudio, chromecast, kodi, squeezebox, ...

VoiceManagerImpl#getPreferredFormat seems to prefer WAV, so I guess you're right that this change might lead to cache misses.

But it would be somewhat unfitting to have special "prefer MP3" code only in this binding. Better to implement something in openhab-core.

There's code duplication between VoiceManagerImpl#getPreferredFormat and AudioFormat#getPreferredFormat, the latter has a strange defaultFrequency = 16384 setting, that is not contained in the list of common sampling frequencies in Wikipedia (the correct value would be 16000 Hz). This is corrected in VoiceManagerImpl to 44100 Hz. I already wrote that bitRate is practically unused. bigEndian only makes sense for WAV with a bit depth of 16 bit but not for 8 bit WAV or MP3 or other compressed formats. Maybe it's time for a little cleanup and refactoring there.

Copy link
Contributor

@lolodomo lolodomo Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just add an advanced parameter to let the user force an audio format.
By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility + compatibility with the cache tool.
WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A config option to opt-in to WAV support could be implemented. The return value of getSupportedFormats() could be without WAV in the default case then. But I think this is not something a single binding should do as a special case.

The code that decides what audio format to use is in openhab-core. I can't say if a sink can prefer a format by returning it first in the of list of supported formats. Oh, I see that it's a Set and not a list, so I guess returning a LinkedHashSet would be possible but isn't really part of the contract of AudioSink.

Another possibility is to mention how to batch convert the cache in the release notes:

for i in userdata/voicerss/cache/*.mp3; do sox "$i" "${i%.mp3}.wav"; done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, the binding will support WAV, MP3 , OGG, ... If set to MP3 for example, the binding will only consider MP3 format.
Like that, we have a way to keep backward compatibility

Listing the preferred audio formats to use as a config option in openhab-core would be a good idea. But would not solve our problem here, because -core preferred wav and should continue to do so when such an option is introduced, I guess.

So if you do not think a remark in the release notes is enough we should start to implement an option for the voicerss binding and maybe later on do something about it in -core. Perhaps even changing AudioSink to return a List. A sink should have the possibility to specify a preference.

Copy link
Contributor

@lolodomo lolodomo Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was thinking of is clearly an option in voicerss. My concern is users who created cached files for voicerss using the provided tool. Currently all these files are MP3 files.

My idea was a binding parameter that would limit the audio formats handled by the binding to a particular audio format, MP3 as an example.

But now I think the best option is just to enhance the tool to let the user select the output audio format for cached files to be created. Like that, to switch to WAV, files, the user will have only to rerurn his command but asking WAV format.

Can you enhance the tool with one additional optional command line argument that will be the audio codec to be used? The call will then use this argument rather than hardcoded string "MP3".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will do that!

@lolodomo
Copy link
Contributor

@abrenk : any progress ?

Copy link
Contributor

@lolodomo lolodomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lolodomo
Copy link
Contributor

In fact, I am going to merge your PR now.
The cache tool can be updated in a separate PR.

@lolodomo lolodomo merged commit 03b5347 into openhab:main Jan 22, 2022
@lolodomo lolodomo added this to the 3.3 milestone Jan 22, 2022
@lolodomo
Copy link
Contributor

I just tested and even if a MP3 file is still produced in my case, I see "_44khz_16_mono" added to my files.
So your test is probably not working. I will see if I find why.

@lolodomo
Copy link
Contributor

lolodomo commented Jan 22, 2022

I think I found the problem: you were building an audio format code like 44khz_16_mono instead of expected 44khz_16bit_mono.
Fixed.
I think even the audio quality was reduced.

nemerdaud pushed a commit to nemerdaud/openhab-addons that referenced this pull request Jan 28, 2022
* [voicerss] add unit test for supported formats
* [voicerss] add support for WAV audio format

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
NickWaterton pushed a commit to NickWaterton/openhab-addons that referenced this pull request Apr 27, 2022
* [voicerss] add unit test for supported formats
* [voicerss] add support for WAV audio format

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
Signed-off-by: Nick Waterton <n.waterton@outlook.com>
andan67 pushed a commit to andan67/openhab-addons that referenced this pull request Nov 6, 2022
* [voicerss] add unit test for supported formats
* [voicerss] add support for WAV audio format

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
andrasU pushed a commit to andrasU/openhab-addons that referenced this pull request Nov 12, 2022
* [voicerss] add unit test for supported formats
* [voicerss] add support for WAV audio format

Signed-off-by: Andreas Brenk <mail@andreasbrenk.com>
Signed-off-by: Andras Uhrin <andras.uhrin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature for an existing add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants