-
-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tts] Cache mechanism #3057
[tts] Cache mechanism #3057
Conversation
This looks great, seems also a great idea to reduce the resource usage on offline services. I have a couple of questions about the cache implementation and tts resolver, I will open a review tomorrow and give it a try. |
} | ||
logger.debug("Using TTS cache folder '{}'", cacheFolder.getAbsolutePath()); | ||
|
||
cleanCacheDirectory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think is ok to move this to a static context so the clean of orphan files is only done on program startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TTSLRUCacheImpl is instantiated each time the voice config is modified.
If the clean is done one only once, at startup, it won't take into account the new cache size the user could set when he modifies the voice parameters.
He could set a lesser value. The cache has to take this new value and clean again accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added some comments.
Do you think this could be refactored into an AudioStream Cache that is not tied to a TTS?
Because I think that if you allow to provide a supplier factory into the TTSCache that generate the AudioStreamSupplier based on the properties file it has more or less no real dependency on the tts itself. Do you think this makes sense?
} | ||
int i = 0; | ||
for (; i < len && i < bytesRead.length; i++) { | ||
b[off + i] = bytesRead[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ok for you to implement the InputStream interface in the TTSResult so this class only handles the fallback mechanism?
For me it will look better if this code is inside the result and will allow you to write directly into the input byte[] I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one TTSResult, but many AudioStream can be instanciated from it (even concurrently)
If the TTSResult implements InputStream, it wouldn't be able to be reused (because once closed it is useless)
The TTSResults are made to be stored as LRU entries, and reused at will by creating several AudioStream from it.
(But it's also equally possible that christmas put a toll on my mind and that I don't understand your proposition 😅 ?)
This pull request has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/mimic-text-to-speech/137040/10 |
12772d7
to
94b5cb1
Compare
Thanks @GiviMAD for your propositions and review !
What do you mean by tied to a TTS ? Do you mean an AudioStreamLRUCache class with an implementation with no reference to the TTSService class ? If so, then yes, it could be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank. I have left some comments, but we'll probably need a second round.
Please first have a look at my comment regarding the use of LinkedHashMap
as that will probably reduce code size quite a lot.
if (volume != null && !volume.isEmpty()) { | ||
volumePercent = new PercentType(volume); | ||
} | ||
voiceManager.say(text, voiceId, sinkId, volumePercent, enableCache); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding an enable cache parameter here makes it more complicated fr the user. If the cache is enabled for the TTS use it, If it is not enable, don't use it. What would be the use-cache to bypass the cache if it is enabled? (also below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lolodomo provides me with a use case for bypassing the cache :
#3039 (comment)
But simplifying the code and parameters here is also a good objective. I have to say I was not fond of adding the numerous methods for this parameter 😅
What shoud I do ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contribution guidelines say:
We're trying very hard to keep openHAB lean and focused. We don't want it to do everything for everybody.
I don't think we should blow up API to cover every use-case. At maximum this will add one cache-miss (because the oldest entry is removed and would be a miss on next call). Of course the rate increases the more one-time calls you make, but on the other hand: if the majority of your calls is not to be cached, then using the cache makes no sense anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm convinced. And the added complexity of a new parameter is a strong argument to my lazy part.
@lolodomo do you have a counter argument, or is it also OK for you ?
....media/src/main/java/org/openhab/core/automation/module/media/internal/SayActionHandler.java
Outdated
Show resolved
Hide resolved
* | ||
* @return | ||
*/ | ||
public default boolean isCacheEnabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methods in interface
should not use public
, they are public by default. Please also clean-up the other methods in this TTSService
. Why should the service hard-disable the cache? Wouldn't it be better to have a configurable service and let the user decide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One example : the TTS service already uses its own cache, better suited, and doesn't want the global one ? Or there is a compatibility issue with it ? In this case, if the cache is defaulted to true, a user installing the TTS service and not aware of this subtlety will complain that it is not working.
But, let's be clear : I added this parameter because I wanted to let as many options as possible for other (dev and user), and I didn't want to the cache system to force other TTS. But in fact I'm totally in favor of simplifying the code here and letting only one global parameter, if it is deemed solid enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can default to false
, that's not my point. I would prefer a user-configurable option for that, not something in the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with a twist I propose : the TTSService is now responsible for using the cache or not.
(see below)
* @return A likely unique key identifying the combination of parameters and/or internal state, | ||
* as a string suitable to be part of a filename. This will be used in the cache system to store the result. | ||
*/ | ||
public default String getCacheKey(String text, Voice voice, AudioFormat requestedFormat) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should implement such things in default
methods.
From the "Oracle Java Tutorial":
Default methods enable you to add new functionality to the interfaces of your libraries and ensure binary compatibility with code written for older versions of those interfaces.
We don't need binary compatibility for OH4, so we should not use such "hacks" here. Are there other common methods that can be re-used for different TTS services? I would prefer a AbstractTTSService
then that implements TTSService
and can be extended by the implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I see two options here :
-
"can be" extended : if TTS service are not compeled to extends the AbstractTTSService, then I have to check with instanceof and use explicit call to the method for getting "getCacheKey" (or "isCacheEnabled" if we keep it)
-
"must be" extended : no alternative method needed, but every TTS service have to be changed accordingly
WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also add an interface CachedTTSService
which extends TTSService
. This is probably even a better idea, because it can separate the cache methods from the TTSService
methods. TTS services that don't want to (or can't) use the cache implement TTSService
, others CachedTTSService
(similar to PersistenceService
and ModifiablePersistenceService
). This also removes the need to programatically disable the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with a proposal of separation of concerns :
The TTSCache
is now an OSGI service (with TTSLRUCacheImpl
implementing it). It respects the global parameter to enable the cache or not.
VoiceManagerImpl
is left totally untouched, not tied to the cache anymore.
Each TTSService
can choose to use the TTSCache
or not.
To do so, they can extend a helper Class (AbstractCachedTTSService
, which implements the TTSCachedService
extending the TTSService
). The TTSCache
is injected inside and used transparently.
...s/org.openhab.core.voice/src/main/java/org/openhab/core/voice/internal/VoiceManagerImpl.java
Outdated
Show resolved
Hide resolved
String containerS = properties.getProperty(PROPERTY_FORMAT_CONTAINER); | ||
String bigEndianS = properties.getProperty(PROPERTY_FORMAT_BIGENDIAN); | ||
String bitDepthS = properties.getProperty(PROPERTY_FORMAT_BITDEPTH); | ||
String bitRateS = properties.getProperty(PROPERTY_FORMAT_BITRATE); | ||
String channelsS = properties.getProperty(PROPERTY_FORMAT_CHANNELS); | ||
String codecS = properties.getProperty(PROPERTY_FORMAT_CODEC); | ||
String frequencyS = properties.getProperty(PROPERTY_FORMAT_FREQUENCY); | ||
String textS = properties.getProperty(PROPERTY_TEXT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to create a record
or class
holding this information and leave the serialization/deserialization to Gson
? Probably performance is not so much an issue here and you could simplify code to about two or three lines here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting !
Sadly, deserializing record needs Gson 2.10, and we have Gson 2.9.1
Do you think it could be pushed for openHAB 4 ?
I will use a standard class for the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably when we switch to Karaf 4.4. class is fine for now.
if (ttsAudioStreamFinal != null && ttsAudioStreamFinal instanceof FixedLengthAudioStream) { | ||
return ((FixedLengthAudioStream) ttsAudioStreamFinal).length(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pattern matching (see above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
ttsResultMap = ttsResultOrderedList.stream() | ||
.collect(Collectors.toMap(TTSResult::getKey, Function.identity())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.collect(Collectors.toMap(TTSResult::getKey, Function.identity())); | |
.collect(Collectors.toMap(TTSResult::getKey, v -> v)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strangely enough, if I do this, I have a compilation error about a Null type mismatch ?
(EDIT : But the refactor removes the need for such a Collector)
TTSResult mockedTTSResult = Mockito.mock(TTSResult.class); | ||
AudioStreamSupplier mockedAudioStreamSupplier = Mockito.mock(AudioStreamSupplier.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make these mocks members of the class:
private @Mock @NonNullByDefault({}) TTSResult ttsResultMock;
and they get mocked automatically if you annotate the class with
@ExtendWith(MockitoExtensions.class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks
@NonNullByDefault({}) | ||
private @Mock Voice voiceMock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline null-annotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I think that the cache mechanism that you have added is really complete and could be useful to cache any kind of data that is tied to a format. I don't have specific examples other than cache the custom sounds of the dialog. But I think that having a media cache is something valuable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks much better now, and many thanks for your speedy response. I have left some comments.
One general questions: We already have an infrastructure for storing data, the StorageService
, which already takes care of serialization/de-serialization and has a nice interface for handling key-value-pairs with an arbitrary unique String
as key. With that your "consistency check" should be very much improved: You can use Storage.getKeys
and check if the corresponding voice-files exist and Storage.containsKey
to check if there is metadata for the voice-file in the cache folder. As a plus you wouldn't need any file handling for the metadata, just call .get/.put
on the storage.
@@ -0,0 +1,55 @@ | |||
/** | |||
* Copyright (c) 2010-2022 Contributors to the openHAB project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright (c) 2010-2022 Contributors to the openHAB project | |
* Copyright (c) 2010-2023 Contributors to the openHAB project |
Please also update the other new files, otherwise the build will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* are not supported or another error occurs while creating an | ||
* {@link AudioStream} | ||
*/ | ||
AudioStream getOrSynthetize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AudioStream getOrSynthetize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat) | |
AudioStream getOrSynthesize(TTSCachedService tts, String text, Voice voice, AudioFormat requestedFormat) |
Can't this be just get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
import org.slf4j.LoggerFactory; | ||
|
||
/** | ||
* Each {@link TTSResult} instance can handle several AudioStream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Each {@link TTSResult} instance can handle several AudioStream. | |
* Each {@link TTSResult} instance can handle several {@link AudioStream}s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
/** | ||
* Each {@link TTSResult} instance can handle several AudioStream. | ||
* This class is a wrapper for such functionality, and can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This class is a wrapper for such functionality, and can | |
* This class is a wrapper for such functionality and can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
import org.openhab.core.voice.Voice; | ||
|
||
/** | ||
* Custom supplier class to defer synthetizing with a TTS service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Custom supplier class to defer synthetizing with a TTS service | |
* Custom supplier class to defer synthesizing with a TTS service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in an intermediary commit, but I deleted the file as I simplified the code...
String key = tts.getClass().getSimpleName() + "_" + tts.getCacheKey(text, voice, requestedFormat); | ||
// try to get from cache | ||
TTSResult ttsResult = get(key); | ||
if (ttsResult == null || !ttsResult.getText().equals(text)) { // it's a cache miss or a false positive, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary, but probably will not hurt. It's just a simple string compare.
private long currentSize = 0; | ||
private boolean completed; | ||
|
||
@NonNullByDefault({}) // file channel could not be null when we read or write from the wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, it's @Nullable
. You also ensure that when calling by creating a local variable. Please change the annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (and the class is now a generic LRUMediaCacheEntry)
* @param fallbackAudioStreamSupplier If something goes wrong with the cache, this supplier will provide the | ||
* AudioStream directly from the TTS service | ||
* | ||
* @return An @AudioStream that can be used to play sound |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @return An @AudioStream that can be used to play sound | |
* @return An {@link AudioStream} that can be used to play sound |
why not name it getAudioStream
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
In fact, it is now a getInputStream() because of the new genericity.
} catch (TTSException e) { | ||
logger.warn("Cannot get or store audio format from the TTS audio service: {}", e.getMessage()); | ||
} | ||
audioFormatFinal = audioFormat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's final
, don't reuse it. Either create another object or re-name to something that is not final
(e.g. local
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Done by renaming it in "local"
@ExtendWith(MockitoExtension.class) | ||
public class AudioStreamCacheWrapperTest { | ||
|
||
TTSResult mockedTTSResult = Mockito.mock(TTSResult.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTSResult mockedTTSResult = Mockito.mock(TTSResult.class); | |
private @Mock @NonNullByDefault({}) TTSResult ttsResultMock; |
Mockito can do the mocking by itself, no need to call mock
. We usually name mocked objects <name>Mock
in core. For consistency, please change that (also below, and inline the null-annotation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks for the tip.
Oops, I did (another) rewrite.... I also rewrote to use the Storage openhab service, as @J-N-K suggested. I will update the open post to match the new proposal. |
f229298
to
fea3693
Compare
Good to know you already found an use for it. Regards! |
Implements a cache mechanism for all TTS services. Eviction policy is LRU mode. This cache can serve several streams concurrently, for the same utterance, with only one call to the TTS. It doesn't wait for the stream to end and can serve data rapidly. Cache size is a voice bundle parameter (10 mb default) Closes openhab#3039 Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
And also the volume parameter Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
The LRU cache use a LinkedHashMap instead of a custom implementation Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Using gson instead of java properties for .info files Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
…tion of concerns Reverts (dropping) the enableCache parameter per request Removes the cache from the VoiceManager to put it in a dedicated service "TTSCache" (now declared as a separated OSGI component). This service respects the global parameter that allows TTSService to use the cache. And it is now the responsability of the TTSService to use the TTSCache or not. To do so, it can extends the AbstractCachedTTSService, which then transparently use the TTSCache injected. Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Instead of custom .info files Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for letting you wait so long, unfortunately we don't get notified on pushes, only on comments. Thanks again for this contribution.
I have left some last comments which I believe can be solved quickly. Please ping me when you are done.
@QueryParam("sinkid") @Parameter(description = "audio sink id") @Nullable String sinkId, | ||
@QueryParam("volume") @Parameter(description = "volume level") @Nullable String volume) { | ||
PercentType volumePercent = null; | ||
if (volume != null && !volume.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (volume != null && !volume.isEmpty()) { | |
if (volume != null && !volume.isBlank()) { |
" "
would also not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public AudioFormatInfo(String text, @Nullable Boolean bigEndian, @Nullable Integer bitDepth, | ||
@Nullable Integer bitRate, @Nullable Long frequency, @Nullable Integer channels, @Nullable String codec, | ||
@Nullable String container) { | ||
super(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super(); |
There is not need for a call to the super constructor if the class has no parent.
* | ||
* @author Gwendal Roulleau - Initial contribution | ||
*/ | ||
@Component(immediate = false, configurationPid = VoiceManagerImpl.CONFIGURATION_PID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Component(immediate = false, configurationPid = VoiceManagerImpl.CONFIGURATION_PID) | |
@Component(configurationPid = VoiceManagerImpl.CONFIGURATION_PID) |
no need to add default values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* @throws IOException when we cannot create the cache directory or if we have not enough space (*2 security margin) | ||
*/ | ||
@Modified | ||
protected void activate(Map<String, Object> config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected void activate(Map<String, Object> config) { | |
protected void modified(Map<String, Object> config) { |
return (AudioStream) fileAndMetadata.getInputStream(); | ||
} | ||
} catch (IOException e) { | ||
logger.warn("Cannot get audio from cache, fallback to TTS service"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there something the user can do about it? If I understand your code correctly, no harm is done and the user will get the requested value. At DEBUG level it might make sense to either log the full stack trace, or just the message (e.getMesage()
), so we get a hint what went wrong.
logger.warn("Cannot get audio from cache, fallback to TTS service"); | |
logger.debug("Cannot get audio from cache, fallback to TTS service.", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the user would like to be aware of a failure of the cache, as he could investigate, but indeed, it is not mandatory.
Done
/** | ||
* Simulate a cache miss, then two other hits | ||
* The TTS service is called only once | ||
* | ||
* @throws TTSException | ||
* @throws IOException | ||
*/ | ||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for @JavaDoc on individual tests. In general the test method's name should explain what is tested (also on other test classes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* | ||
*/ | ||
@NonNullByDefault | ||
public class DataRetrievalException extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this extends RuntimeException
and not Exception
? In most cases it's better to have checked exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like it neither, but I did not find a proper solution.
I made it a runtime exception because, as you can see in the TTSLRUCacheImpl (line 114), it's the only way I can think of to help handling exception from a Supplier that can launch an exception (such as the TTSService)
I can also completely drop this DataRetrievalException and let the service using the cache cope with its own exception on its own term.
Do you have a better idea ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, I deleted the DataRetrievalException.
As a result, the code now uses a RuntimeException to wrap the checked TTSException that must be catched inside the supplier get function. (and I just find out that the try / catch was wrongly placed and moved it)
Don't like it either, but it sounds better than a custom RuntimeException with no real use
return currentSize; | ||
} | ||
|
||
protected String getKey() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JavaDoc missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@NonNullByDefault | ||
@ExtendWith(MockitoExtension.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NonNullByDefault | |
@ExtendWith(MockitoExtension.class) | |
@ExtendWith(MockitoExtension.class) | |
@NonNullByDefault |
you chose this order for the other test classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks!
Can we have a summary of the changes? Is the cache enabled by default? For what TTS services? |
Hello @lolodomo First, this pull request add a LRU cache implementation for media files. (it is generic and not only for TTS, I have one idea to use it elsewhere). Its main capabilities are :
Second, an OSGI service instantiates one LRU cache for TTS files. This OSGI service is enabled by default (10 MB, we can customize this value or disable it in the "voice" configuration), and provides cache capability but ONLY for TTS services designed for it. It means that, as for now, no TTS services is using it. The simplest way for a TTS service to use this cache is to extend the AbstractCachedTTSService and implements its abstract method instead of the classical TTSService interface. I will make a pull request in the next days to add this capability for the MimicTTS service and for the PicoTTS service. I don't have an opinion about the other TTS services, especially for those already using a custom cache. I would be glad if it is deemed worthy for other TTS services, but meh, "if it works, don't fix it" ? If maintainers think it is worth a shot, I can make some pull request for others. |
Ok, perfect, so nothing is broken. Your new cache is a common and unique cache for all openHAB services? |
@dalgwen : why did you set the |
Hello @lolodomo Sorry for the delay, |
You can look at the MimicTTSService.java file for a quick implementation. (the AutoDeleteFileAudioStream is a fix for another bug) |
Generally, yes, but some cases require to oveeride it. |
Indeed, I didn't think about a use case like yours. I wanted the effort for the basic use case to be minimal, but I now hope the implementation choice I made will not be a hindrance for your TTS service. Let me know if I can help you with something. |
Absolutely not. After removing the "final" constraint, your implementation looks very good to me. |
* [tts] Cache mechanism Implements a cache mechanism for all TTS services. Eviction policy is LRU mode. This cache can serve several streams concurrently, for the same utterance, with only one call to the TTS. It doesn't wait for the stream to end and can serve data rapidly. Cache size is a voice bundle parameter (10 mb default) Closes openhab#3039 Signed-off-by: Gwendal Roulleau <gwendal.roulleau@gmail.com> GitOrigin-RevId: 5544945
Implements a cache mechanism for all TTS services.
Reason :
Online TTS service can be costly, and reducing call to the cloud is always good.
It will also improve user experience (less latency for local services)
Amazon Polly TTS and Google TTS both implement their own mechanism, and I thought that it could be interesting to mutualize on the same code base (and for other services as well)
Functional specification :
Eviction policy is LRU mode.
Cache size is a voice bundle parameter (10 mb default)
You can enable or disable this, system wide,
or by TTS serviceEach TTS service has to enable this.Default mode is cache enabled.
I'm wondering if it should be switched to default off, at least for the beginning, what do you think ?(Activated by the TTSService, so it's opt-in for the dev)It doesn't wait for the stream to end and can serve data as soon as a small bunch is available (10kb).
A side effect functionnality : this cache can serve several streams concurrently, for the same utterance, with only one call to the TTS.
Another side effect : provide transparently, with a wrapper, the benefit of a FixedLengthAudioStream for playing on sink that requires it (such as all the one based on the openHAB audio servlet).
Technical :
A new generic LRUMediaCache is added in the core.cache package.
This class use
a double linked list with head and tail, and a hashmap (a rather classical LRU implementation)a LinkedHashMap in LRU mode.We use it the with
get
method, which take aSupplier
as an argument. TheSupplier
must give aLRUMediaCacheEntry
, which is anInputStream
along with an arbitrary metadata object. TheSupplier
is of course used only if there is a cache miss.The
InputStream
will be read on the fly and stored in a file.The metadata will be stored thanks to the openhab Storage service. It must be a serializable object.
For subsequent call with the same key, a
LRUMediaCacheEntry
is given to the caller, transparently using the file on disk (even if not fully completed). It uses under the hood anInputStreamCacheWrapper
, responsible for querying the cache entry for data.It allows several clients to request the same
TTSResultLRUMediaCacheEntry
without waiting, each getting their ownInputStream
.A LRU cache (
TTSLRUCacheImpl
) implementing a simple interface (TTSCache
with aget
method) uses this cache implementation, and is provided as an OSGIComponent
.TTS service can disable this by a hard coded value in the TTS service implementation. (Default method in the interface returns true=enable)Each TTSService must opt-in (simplest solution is to extend the AbstractCachedTTSService which provide the functionnality transparantly).The TTS AudioStream result is provided by a supplier ((<-- Using a more advanced locking mechanism to lock only the part needed, and removing some fallback mechanism, allowed to get rid of this class.)AudioStreamSupplier
, which can delay call to the TTS service, thus allowing the cache service to not wait during the cached entry creation).The
TTS ResultsLRUMediaCacheEntry are created when calling the TTS service for the first time, or loaded from the disk at startup.An "info" file is stored alongside the sound file and contains the<-- No more .info file, using Storage service.AudioFormat
information.The AudioStreamFromCache object provides an AudioStream wrapper implementation which is send to the sink responsible for playing. This wrapper override the
read()
method to call theInputStreamCacheWrapper
transparently.A fallback mechanism is implemented (if the cache mechanism failed for whatever reason, then the TTS is directly called). It has been rewritten for genericity and it is now much simpler (the cost is that a little robustness disappears : now a call after a forced file deletion is not honored. But the next will still be)
Closes #3039
Signed-off-by: Gwendal Roulleau gwendal.roulleau@gmail.com