Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tts] Cache mechanism #3057

Merged
merged 14 commits into from
Feb 28, 2023
Merged

[tts] Cache mechanism #3057

merged 14 commits into from
Feb 28, 2023

Conversation

dalgwen
Copy link
Contributor

@dalgwen dalgwen commented Aug 8, 2022

Implements a cache mechanism for all TTS services.

Reason :

Online TTS service can be costly, and reducing call to the cloud is always good.
It will also improve user experience (less latency for local services)
Amazon Polly TTS and Google TTS both implement their own mechanism, and I thought that it could be interesting to mutualize on the same code base (and for other services as well)

Functional specification :

Eviction policy is LRU mode.
Cache size is a voice bundle parameter (10 mb default)
You can enable or disable this, system wide, or by TTS service Each TTS service has to enable this.
Default mode is cache enabled. I'm wondering if it should be switched to default off, at least for the beginning, what do you think ? (Activated by the TTSService, so it's opt-in for the dev)
It doesn't wait for the stream to end and can serve data as soon as a small bunch is available (10kb).
A side effect functionnality : this cache can serve several streams concurrently, for the same utterance, with only one call to the TTS.
Another side effect : provide transparently, with a wrapper, the benefit of a FixedLengthAudioStream for playing on sink that requires it (such as all the one based on the openHAB audio servlet).

Technical :

A new generic LRUMediaCache is added in the core.cache package.
This class use a double linked list with head and tail, and a hashmap (a rather classical LRU implementation) a LinkedHashMap in LRU mode.
We use it the with get method, which take a Supplier as an argument. The Supplier must give a LRUMediaCacheEntry, which is an InputStream along with an arbitrary metadata object. The Supplier is of course used only if there is a cache miss.
The InputStream will be read on the fly and stored in a file.
The metadata will be stored thanks to the openhab Storage service. It must be a serializable object.
For subsequent call with the same key, a LRUMediaCacheEntry is given to the caller, transparently using the file on disk (even if not fully completed). It uses under the hood an InputStreamCacheWrapper, responsible for querying the cache entry for data.
It allows several clients to request the same TTSResult LRUMediaCacheEntry without waiting, each getting their own InputStream.

A LRU cache (TTSLRUCacheImpl) implementing a simple interface (TTSCache with a get method) uses this cache implementation, and is provided as an OSGI Component.

TTS service can disable this by a hard coded value in the TTS service implementation. (Default method in the interface returns true=enable) Each TTSService must opt-in (simplest solution is to extend the AbstractCachedTTSService which provide the functionnality transparantly).

The TTS AudioStream result is provided by a supplier (AudioStreamSupplier, which can delay call to the TTS service, thus allowing the cache service to not wait during the cached entry creation). (<-- Using a more advanced locking mechanism to lock only the part needed, and removing some fallback mechanism, allowed to get rid of this class.)

The TTS Results LRUMediaCacheEntry are created when calling the TTS service for the first time, or loaded from the disk at startup. An "info" file is stored alongside the sound file and contains the AudioFormat information. <-- No more .info file, using Storage service.

The AudioStreamFromCache object provides an AudioStream wrapper implementation which is send to the sink responsible for playing. This wrapper override the read() method to call the InputStreamCacheWrapper transparently.

A fallback mechanism is implemented (if the cache mechanism failed for whatever reason, then the TTS is directly called). It has been rewritten for genericity and it is now much simpler (the cost is that a little robustness disappears : now a call after a forced file deletion is not honored. But the next will still be)

Closes #3039

Signed-off-by: Gwendal Roulleau gwendal.roulleau@gmail.com

@dalgwen dalgwen requested a review from a team as a code owner August 8, 2022 15:25
@wborn wborn added the enhancement An enhancement or new feature of the Core label Aug 20, 2022
@wborn wborn requested a review from GiviMAD December 10, 2022 15:13
@GiviMAD
Copy link
Member

GiviMAD commented Dec 10, 2022

This looks great, seems also a great idea to reduce the resource usage on offline services. I have a couple of questions about the cache implementation and tts resolver, I will open a review tomorrow and give it a try.

}
logger.debug("Using TTS cache folder '{}'", cacheFolder.getAbsolutePath());

cleanCacheDirectory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think is ok to move this to a static context so the clean of orphan files is only done on program startup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TTSLRUCacheImpl is instantiated each time the voice config is modified.
If the clean is done one only once, at startup, it won't take into account the new cache size the user could set when he modifies the voice parameters.
He could set a lesser value. The cache has to take this new value and clean again accordingly.

Copy link
Member

@GiviMAD GiviMAD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some comments.
Do you think this could be refactored into an AudioStream Cache that is not tied to a TTS?
Because I think that if you allow to provide a supplier factory into the TTSCache that generate the AudioStreamSupplier based on the properties file it has more or less no real dependency on the tts itself. Do you think this makes sense?

}
int i = 0;
for (; i < len && i < bytesRead.length; i++) {
b[off + i] = bytesRead[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ok for you to implement the InputStream interface in the TTSResult so this class only handles the fallback mechanism?
For me it will look better if this code is inside the result and will allow you to write directly into the input byte[] I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one TTSResult, but many AudioStream can be instanciated from it (even concurrently)
If the TTSResult implements InputStream, it wouldn't be able to be reused (because once closed it is useless)
The TTSResults are made to be stored as LRU entries, and reused at will by creating several AudioStream from it.

(But it's also equally possible that christmas put a toll on my mind and that I don't understand your proposition 😅 ?)

@openhab-bot
Copy link
Collaborator

This pull request has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/mimic-text-to-speech/137040/10

@dalgwen dalgwen force-pushed the tts_cache branch 2 times, most recently from 12772d7 to 94b5cb1 Compare December 30, 2022 17:02
@dalgwen
Copy link
Contributor Author

dalgwen commented Dec 30, 2022

Thanks @GiviMAD for your propositions and review !

Do you think this could be refactored into an AudioStream Cache that is not tied to a TTS?

What do you mean by tied to a TTS ? Do you mean an AudioStreamLRUCache class with an implementation with no reference to the TTSService class ? If so, then yes, it could be.
Do you think such cache could have some usage beside the one here ? if you provide me some use case I can do it.

Copy link
Member

@J-N-K J-N-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank. I have left some comments, but we'll probably need a second round.

Please first have a look at my comment regarding the use of LinkedHashMap as that will probably reduce code size quite a lot.

if (volume != null && !volume.isEmpty()) {
volumePercent = new PercentType(volume);
}
voiceManager.say(text, voiceId, sinkId, volumePercent, enableCache);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding an enable cache parameter here makes it more complicated fr the user. If the cache is enabled for the TTS use it, If it is not enable, don't use it. What would be the use-cache to bypass the cache if it is enabled? (also below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lolodomo provides me with a use case for bypassing the cache :
#3039 (comment)
But simplifying the code and parameters here is also a good objective. I have to say I was not fond of adding the numerous methods for this parameter 😅
What shoud I do ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contribution guidelines say:

We're trying very hard to keep openHAB lean and focused. We don't want it to do everything for everybody.

I don't think we should blow up API to cover every use-case. At maximum this will add one cache-miss (because the oldest entry is removed and would be a miss on next call). Of course the rate increases the more one-time calls you make, but on the other hand: if the majority of your calls is not to be cached, then using the cache makes no sense anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm convinced. And the added complexity of a new parameter is a strong argument to my lazy part.
@lolodomo do you have a counter argument, or is it also OK for you ?

*
* @return
*/
public default boolean isCacheEnabled() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods in interface should not use public, they are public by default. Please also clean-up the other methods in this TTSService. Why should the service hard-disable the cache? Wouldn't it be better to have a configurable service and let the user decide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example : the TTS service already uses its own cache, better suited, and doesn't want the global one ? Or there is a compatibility issue with it ? In this case, if the cache is defaulted to true, a user installing the TTS service and not aware of this subtlety will complain that it is not working.

But, let's be clear : I added this parameter because I wanted to let as many options as possible for other (dev and user), and I didn't want to the cache system to force other TTS. But in fact I'm totally in favor of simplifying the code here and letting only one global parameter, if it is deemed solid enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can default to false, that's not my point. I would prefer a user-configurable option for that, not something in the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, with a twist I propose : the TTSService is now responsible for using the cache or not.
(see below)

* @return A likely unique key identifying the combination of parameters and/or internal state,
* as a string suitable to be part of a filename. This will be used in the cache system to store the result.
*/
public default String getCacheKey(String text, Voice voice, AudioFormat requestedFormat) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should implement such things in default methods.

From the "Oracle Java Tutorial":

Default methods enable you to add new functionality to the interfaces of your libraries and ensure binary compatibility with code written for older versions of those interfaces.

We don't need binary compatibility for OH4, so we should not use such "hacks" here. Are there other common methods that can be re-used for different TTS services? I would prefer a AbstractTTSService then that implements TTSService and can be extended by the implementations.

Copy link
Contributor Author

@dalgwen dalgwen Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see two options here :

  • "can be" extended : if TTS service are not compeled to extends the AbstractTTSService, then I have to check with instanceof and use explicit call to the method for getting "getCacheKey" (or "isCacheEnabled" if we keep it)

  • "must be" extended : no alternative method needed, but every TTS service have to be changed accordingly

WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add an interface CachedTTSService which extends TTSService. This is probably even a better idea, because it can separate the cache methods from the TTSService methods. TTS services that don't want to (or can't) use the cache implement TTSService, others CachedTTSService (similar to PersistenceService and ModifiablePersistenceService). This also removes the need to programatically disable the cache.

Copy link
Contributor Author

@dalgwen dalgwen Jan 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, with a proposal of separation of concerns :
The TTSCache is now an OSGI service (with TTSLRUCacheImpl implementing it). It respects the global parameter to enable the cache or not.
VoiceManagerImpl is left totally untouched, not tied to the cache anymore.
Each TTSService can choose to use the TTSCache or not.
To do so, they can extend a helper Class (AbstractCachedTTSService, which implements the TTSCachedService extending the TTSService). The TTSCache is injected inside and used transparently.

Comment on lines 125 to 132
String containerS = properties.getProperty(PROPERTY_FORMAT_CONTAINER);
String bigEndianS = properties.getProperty(PROPERTY_FORMAT_BIGENDIAN);
String bitDepthS = properties.getProperty(PROPERTY_FORMAT_BITDEPTH);
String bitRateS = properties.getProperty(PROPERTY_FORMAT_BITRATE);
String channelsS = properties.getProperty(PROPERTY_FORMAT_CHANNELS);
String codecS = properties.getProperty(PROPERTY_FORMAT_CODEC);
String frequencyS = properties.getProperty(PROPERTY_FORMAT_FREQUENCY);
String textS = properties.getProperty(PROPERTY_TEXT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to create a record or class holding this information and leave the serialization/deserialization to Gson? Probably performance is not so much an issue here and you could simplify code to about two or three lines here.

Copy link
Contributor Author

@dalgwen dalgwen Jan 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting !
Sadly, deserializing record needs Gson 2.10, and we have Gson 2.9.1
Do you think it could be pushed for openHAB 4 ?

I will use a standard class for the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably when we switch to Karaf 4.4. class is fine for now.

Comment on lines 223 to 225
if (ttsAudioStreamFinal != null && ttsAudioStreamFinal instanceof FixedLengthAudioStream) {
return ((FixedLengthAudioStream) ttsAudioStreamFinal).length();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pattern matching (see above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

ttsResultMap = ttsResultOrderedList.stream()
.collect(Collectors.toMap(TTSResult::getKey, Function.identity()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.collect(Collectors.toMap(TTSResult::getKey, Function.identity()));
.collect(Collectors.toMap(TTSResult::getKey, v -> v));

Copy link
Contributor Author

@dalgwen dalgwen Jan 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strangely enough, if I do this, I have a compilation error about a Null type mismatch ?
(EDIT : But the refactor removes the need for such a Collector)

Comment on lines 42 to 43
TTSResult mockedTTSResult = Mockito.mock(TTSResult.class);
AudioStreamSupplier mockedAudioStreamSupplier = Mockito.mock(AudioStreamSupplier.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make these mocks members of the class:

private @Mock @NonNullByDefault({}) TTSResult ttsResultMock;

and they get mocked automatically if you annotate the class with

@ExtendWith(MockitoExtensions.class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Comment on lines 52 to 53
@NonNullByDefault({})
private @Mock Voice voiceMock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline null-annotations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@GiviMAD
Copy link
Member

GiviMAD commented Jan 1, 2023

Thanks @GiviMAD for your propositions and review !

Do you think this could be refactored into an AudioStream Cache that is not tied to a TTS?

What do you mean by tied to a TTS ? Do you mean an AudioStreamLRUCache class with an implementation with no reference to the TTSService class ? If so, then yes, it could be. Do you think such cache could have some usage beside the one here ? if you provide me some use case I can do it.

I think that the cache mechanism that you have added is really complete and could be useful to cache any kind of data that is tied to a format. I don't have specific examples other than cache the custom sounds of the dialog. But I think that having a media cache is something valuable.

Copy link
Member

@J-N-K J-N-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks much better now, and many thanks for your speedy response. I have left some comments.

One general questions: We already have an infrastructure for storing data, the StorageService, which already takes care of serialization/de-serialization and has a nice interface for handling key-value-pairs with an arbitrary unique String as key. With that your "consistency check" should be very much improved: You can use Storage.getKeys and check if the corresponding voice-files exist and Storage.containsKey to check if there is metadata for the voice-file in the cache folder. As a plus you wouldn't need any file handling for the metadata, just call .get/.put on the storage.

@@ -0,0 +1,55 @@
/**
* Copyright (c) 2010-2022 Contributors to the openHAB project
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright (c) 2010-2022 Contributors to the openHAB project
* Copyright (c) 2010-2023 Contributors to the openHAB project

Please also update the other new files, otherwise the build will fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* are not supported or another error occurs while creating an
* {@link AudioStream}
*/
AudioStream getOrSynthetize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AudioStream getOrSynthetize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat)
AudioStream getOrSynthesize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat)

Can't this be just get?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import org.slf4j.LoggerFactory;

/**
* Each {@link TTSResult} instance can handle several AudioStream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Each {@link TTSResult} instance can handle several AudioStream.
* Each {@link TTSResult} instance can handle several {@link AudioStream}s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* Each {@link TTSResult} instance can handle several AudioStream.
* This class is a wrapper for such functionality, and can
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* This class is a wrapper for such functionality, and can
* This class is a wrapper for such functionality and can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

import org.openhab.core.voice.Voice;

/**
* Custom supplier class to defer synthetizing with a TTS service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Custom supplier class to defer synthetizing with a TTS service
* Custom supplier class to defer synthesizing with a TTS service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in an intermediary commit, but I deleted the file as I simplified the code...

String key = tts.getClass().getSimpleName() + "_" + tts.getCacheKey(text, voice, requestedFormat);
// try to get from cache
TTSResult ttsResult = get(key);
if (ttsResult == null || !ttsResult.getText().equals(text)) { // it's a cache miss or a false positive, we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary, but probably will not hurt. It's just a simple string compare.

private long currentSize = 0;
private boolean completed;

@NonNullByDefault({}) // file channel could not be null when we read or write from the wrapper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, it's @Nullable. You also ensure that when calling by creating a local variable. Please change the annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (and the class is now a generic LRUMediaCacheEntry)

* @param fallbackAudioStreamSupplier If something goes wrong with the cache, this supplier will provide the
* AudioStream directly from the TTS service
*
* @return An @AudioStream that can be used to play sound
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return An @AudioStream that can be used to play sound
* @return An {@link AudioStream} that can be used to play sound

why not name it getAudioStream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.
In fact, it is now a getInputStream() because of the new genericity.

} catch (TTSException e) {
logger.warn("Cannot get or store audio format from the TTS audio service: {}", e.getMessage());
}
audioFormatFinal = audioFormat;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's final, don't reuse it. Either create another object or re-name to something that is not final (e.g. local)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Done by renaming it in "local"

@ExtendWith(MockitoExtension.class)
public class AudioStreamCacheWrapperTest {

TTSResult mockedTTSResult = Mockito.mock(TTSResult.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TTSResult mockedTTSResult = Mockito.mock(TTSResult.class);
private @Mock @NonNullByDefault({}) TTSResult ttsResultMock;

Mockito can do the mocking by itself, no need to call mock. We usually name mocked objects <name>Mock in core. For consistency, please change that (also below, and inline the null-annotation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the tip.

@dalgwen
Copy link
Contributor Author

dalgwen commented Feb 6, 2023

Oops, I did (another) rewrite....
You can blame @GiviMAD for this. He gave me the idea to make the cache generic.
Joking aside, thanks GiviMAD for the idea. I comply to your suggestion because I have a use case in mind (two sinks, pulseaudio and doorbell, use codec conversion before sending audio, and a cache could enhance latency, which is as right now very noticeable)

I also rewrote to use the Storage openhab service, as @J-N-K suggested.
And I did a rebase.

I will update the open post to match the new proposal.

@dalgwen dalgwen force-pushed the tts_cache branch 2 times, most recently from f229298 to fea3693 Compare February 6, 2023 18:23
@GiviMAD
Copy link
Member

GiviMAD commented Feb 6, 2023

Oops, I did (another) rewrite.... You can blame @GiviMAD for this. He gave me the idea to make the cache generic. Joking aside, thanks GiviMAD for the idea. I comply to your suggestion because I have a use case in mind (two sinks, pulseaudio and doorbell, use codec conversion before sending audio, and a cache could enhance latency, which is as right now very noticeable)

Good to know you already found an use for it.
With all the emerging IA services, I think there will be more use cases on the future, an idea that came to my mind will be to creating a DALL·E binding that allows the user to ask for images through a text item, another is an interpreter that allows to ask to ChatGPT. I assume those services will not be cheap so I think a cache like this will help there. Lets see, maybe I'm totally wrong.

Regards!

Implements a cache mechanism for all TTS services.
Eviction policy is LRU mode.
This cache can serve several streams concurrently, for the same utterance, with only one call to the TTS. It doesn't wait for the stream to end and can serve data rapidly.
Cache size is a voice bundle parameter (10 mb default)

Closes openhab#3039

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
And also the volume parameter

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
The LRU cache use a LinkedHashMap instead of a custom implementation

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Using gson instead of java properties for .info files

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
…tion of concerns

Reverts (dropping) the enableCache parameter per request
Removes the cache from the VoiceManager to put it in a dedicated service "TTSCache" (now declared as a separated OSGI component). This service respects the global parameter that allows  TTSService to use the cache.

And it is now the responsability of the TTSService to use the TTSCache or not.
To do so, it can extends the AbstractCachedTTSService, which then transparently use the TTSCache injected.

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Instead of custom .info files
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Copy link
Member

@J-N-K J-N-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting you wait so long, unfortunately we don't get notified on pushes, only on comments. Thanks again for this contribution.

I have left some last comments which I believe can be solved quickly. Please ping me when you are done.

@QueryParam("sinkid") @Parameter(description = "audio sink id") @Nullable String sinkId,
@QueryParam("volume") @Parameter(description = "volume level") @Nullable String volume) {
PercentType volumePercent = null;
if (volume != null && !volume.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (volume != null && !volume.isEmpty()) {
if (volume != null && !volume.isBlank()) {

" " would also not work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public AudioFormatInfo(String text, @Nullable Boolean bigEndian, @Nullable Integer bitDepth,
@Nullable Integer bitRate, @Nullable Long frequency, @Nullable Integer channels, @Nullable String codec,
@Nullable String container) {
super();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
super();

There is not need for a call to the super constructor if the class has no parent.

*
* @author Gwendal Roulleau - Initial contribution
*/
@Component(immediate = false, configurationPid = VoiceManagerImpl.CONFIGURATION_PID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Component(immediate = false, configurationPid = VoiceManagerImpl.CONFIGURATION_PID)
@Component(configurationPid = VoiceManagerImpl.CONFIGURATION_PID)

no need to add default values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @throws IOException when we cannot create the cache directory or if we have not enough space (*2 security margin)
*/
@Modified
protected void activate(Map<String, Object> config) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected void activate(Map<String, Object> config) {
protected void modified(Map<String, Object> config) {

return (AudioStream) fileAndMetadata.getInputStream();
}
} catch (IOException e) {
logger.warn("Cannot get audio from cache, fallback to TTS service");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something the user can do about it? If I understand your code correctly, no harm is done and the user will get the requested value. At DEBUG level it might make sense to either log the full stack trace, or just the message (e.getMesage()), so we get a hint what went wrong.

Suggested change
logger.warn("Cannot get audio from cache, fallback to TTS service");
logger.debug("Cannot get audio from cache, fallback to TTS service.", e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the user would like to be aware of a failure of the cache, as he could investigate, but indeed, it is not mandatory.
Done

Comment on lines 78 to 85
/**
* Simulate a cache miss, then two other hits
* The TTS service is called only once
*
* @throws TTSException
* @throws IOException
*/
@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for @JavaDoc on individual tests. In general the test method's name should explain what is tested (also on other test classes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
*/
@NonNullByDefault
public class DataRetrievalException extends RuntimeException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this extends RuntimeException and not Exception? In most cases it's better to have checked exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like it neither, but I did not find a proper solution.

I made it a runtime exception because, as you can see in the TTSLRUCacheImpl (line 114), it's the only way I can think of to help handling exception from a Supplier that can launch an exception (such as the TTSService)

I can also completely drop this DataRetrievalException and let the service using the cache cope with its own exception on its own term.

Do you have a better idea ?

Copy link
Contributor Author

@dalgwen dalgwen Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, I deleted the DataRetrievalException.
As a result, the code now uses a RuntimeException to wrap the checked TTSException that must be catched inside the supplier get function. (and I just find out that the try / catch was wrongly placed and moved it)
Don't like it either, but it sounds better than a custom RuntimeException with no real use

return currentSize;
}

protected String getKey() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 32 to 33
@NonNullByDefault
@ExtendWith(MockitoExtension.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@NonNullByDefault
@ExtendWith(MockitoExtension.class)
@ExtendWith(MockitoExtension.class)
@NonNullByDefault

you chose this order for the other test classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Copy link
Member

@J-N-K J-N-K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks!

@J-N-K J-N-K added this to the 4.0 milestone Feb 28, 2023
@J-N-K J-N-K merged commit 5544945 into openhab:main Feb 28, 2023
@lolodomo
Copy link
Contributor

lolodomo commented Mar 1, 2023

Can we have a summary of the changes? Is the cache enabled by default? For what TTS services?
Is there a need to adjust some existing TTS services? For example, VoiceRSS has its own cache mechanism.

@dalgwen
Copy link
Contributor Author

dalgwen commented Mar 1, 2023

Hello @lolodomo

First, this pull request add a LRU cache implementation for media files. (it is generic and not only for TTS, I have one idea to use it elsewhere). Its main capabilities are :

  • least recently used files are evicted when threshold is reached
  • can serve several thread concurrently, even if launched at the same time, with only one call to the underlying service, without waiting for the entire file to be downloaded (it means for TTS, saying something simultaneously all around the house with only one computation)

Second, an OSGI service instantiates one LRU cache for TTS files.

This OSGI service is enabled by default (10 MB, we can customize this value or disable it in the "voice" configuration), and provides cache capability but ONLY for TTS services designed for it. It means that, as for now, no TTS services is using it.

The simplest way for a TTS service to use this cache is to extend the AbstractCachedTTSService and implements its abstract method instead of the classical TTSService interface.

I will make a pull request in the next days to add this capability for the MimicTTS service and for the PicoTTS service.

I don't have an opinion about the other TTS services, especially for those already using a custom cache. I would be glad if it is deemed worthy for other TTS services, but meh, "if it works, don't fix it" ? If maintainers think it is worth a shot, I can make some pull request for others.
The main advantage should be to simplify their code, provide a thread safe implementation (not so easy) and to use an application wide voice parameter to configure / enable / disable it.

@lolodomo
Copy link
Contributor

lolodomo commented Mar 2, 2023

Ok, perfect, so nothing is broken.
In VoiceRSS, the cache was mainly to create predefined TTS messages. But it is also used when requesting any TTS. I think the original cache should be kept for predefined messages but for any other TTS message, it should be better to use your new LRU cache. I will study such solution.

Your new cache is a common and unique cache for all openHAB services?

@lolodomo
Copy link
Contributor

lolodomo commented Mar 8, 2023

@dalgwen : why did you set the synthesize method as final in AbstractCachedTTSService ? Because of that, I can't override it.
I was trying to implement the new cache in VoiceRSS, it should be easy I believe, but I need to override this method because I have to consider in higher priority the cache of this voice binding (not for storing new TTS messages but to retrieved predefined TTS messages).

@dalgwen
Copy link
Contributor Author

dalgwen commented Mar 9, 2023

Hello @lolodomo

Sorry for the delay,
I set the synthetize method "final" because I wanted to let the addon developers know that they do not have to implement it.
The synthetizeForCache method is enough.
I just made a PR for a "sample" implementation for mimic TTS
openhab/openhab-addons#14564

@dalgwen
Copy link
Contributor Author

dalgwen commented Mar 9, 2023

You can look at the MimicTTSService.java file for a quick implementation. (the AutoDeleteFileAudioStream is a fix for another bug)
For your information, the getCacheKeymethod is sometimes not even needed.
It is only needed when the tts service use some "hidden" parameter that can affect the sound produced.

@lolodomo
Copy link
Contributor

lolodomo commented Mar 9, 2023

I set the synthetize method "final" because I wanted to let the addon developers know that they do not have to implement it.
The synthetizeForCache method is enough.

Generally, yes, but some cases require to oveeride it.

@dalgwen
Copy link
Contributor Author

dalgwen commented Mar 9, 2023

Indeed, I didn't think about a use case like yours. I wanted the effort for the basic use case to be minimal, but I now hope the implementation choice I made will not be a hindrance for your TTS service. Let me know if I can help you with something.

@lolodomo
Copy link
Contributor

lolodomo commented Mar 9, 2023

I now hope the implementation choice I made will not be a hindrance for your TTS service

Absolutely not. After removing the "final" constraint, your implementation looks very good to me.
I just have to find how to fix my audio stream when your cache is not used., to make it usable with the OH HTTP audio server

splatch pushed a commit to ConnectorIO/copybara-hab-core that referenced this pull request Jul 12, 2023
* [tts] Cache mechanism

Implements a cache mechanism for all TTS services.
Eviction policy is LRU mode.
This cache can serve several streams concurrently, for the same utterance, with only one call to the TTS. It doesn't wait for the stream to end and can serve data rapidly.
Cache size is a voice bundle parameter (10 mb default)

Closes openhab#3039

Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
GitOrigin-RevId: 5544945
@dalgwen dalgwen deleted the tts_cache branch August 21, 2023 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature of the Core
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Implement a cache mecanism in the voice manager
6 participants