-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rrd4j] Improve event handling #15223
Conversation
@rkoshak Would you like to test that? |
Sure thing but I still don't have a build environment but I think I grabbed the right jar file from Artifactory. It doesn't happen super often so I'll let it run for a few hours and report back if I see the warning again or see something else weird. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal looks good, I have just 2 comments.
@@ -191,24 +191,31 @@ public void store(final Item item, @Nullable final String alias) { | |||
|
|||
long now = System.currentTimeMillis() / 1000; | |||
|
|||
scheduler.schedule(() -> internalStore(name, value, now, true), 0, TimeUnit.SECONDS); | |||
scheduler.schedule(() -> internalStore(name, value, null), 0, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the line just before that set variable now
that is no more used.
...ence.rrd4j/src/main/java/org/openhab/persistence/rrd4j/internal/RRD4jPersistenceService.java
Show resolved
Hide resolved
I let it run overnight and I've seen a few "RRD4J does not support adding past value..." warnings in the logs. It does feel like it happens less often but it's always been sporadic so it's hard to say if that's just a feeling or a fact. It happened twice since 3:14 this morning (that's when the current log rolled over). Both of these warnings occurred in the middle or at the end of the everyMinute batch save. Here are the events as they occurred for one Item and one warning.
It looks like the two events on the Item that occurred when the everyMinute batch had started got delayed to the end of the everyMinute batch and because there were two of them that occurred in the everyMinute batch time the second one got rejected. It does look like the value that was saved (event 04:20:01:124) had it's timestamp moved forward as expected to 1689157206 but the second one (event 04:20:03.144) did not with 1689157205. Does openhab/openhab-core#3597 (comment) have impacts on this? |
I have done some further investigations. There is still an issue if a lot of events are created in a very short time (e.g. by a rule):
There is a conceptual issue with that in core because
That aside: storing with What would be the best case to handle that? "Replacing" the already stored value is not possible with RRD4J. |
Maybe we could do something like a "delayed write" - i.e. keep the value in memory, wait for the next second and then write the latest then available update to the db (and discard any additional ones one might have received inbetween). What I don't understand, though, is what is now different to before? We always had the situation that there are many more updates than rrd4j can persist and we had a solution for it in place - we should figure out how that worked and restore it to behave exactly the same way as before (in terms of what states get persisted with which timestamp). |
It worked because the persistence service blocked the event handler until it successfully wrote a datapoint. Edit: your proposal is fine for number items, but we'll miss quickly changing on/off changes of a switch and the item may look unchanged in persistence also it did change in reality. |
This still didn't prevent new updates from occuring. So what happened with those? I'd assume that writing to the db didn't take a second, but new updates came in much faster? |
It just scheduled a new |
so we essentially persisted all events and made sure there is always a second inbetween? But we persisted them in almost random order...? Ok, if that was the case, maybe we should not try to replicate exactly that behavior...
If something turns on for a fraction of a second only, we can anyhow not correctly persist this - if we extend it to being on for one second, it might be some wrong information already. With the same argument you could say you would want to see that something turned on 10 times within a second and we simply cannot do this. Imho, the most important thing is that the last persisted state is indeed the then valid one. |
Fine. I'll try to implement that. |
I agree. |
I've been thinking on this most of the day and I come back and find a proposed solution. 🎉 I think it's clear that rrd4j is simply unsuitable for rapidly changing data. We might consider adding a warning to that effect to the readme. If we cover the rapidly changing Switch with docs and make sure that the everyMinute batch doesn't cause problems I think that's as good as it gets for rrd4j. |
I hope you can at least agree to that proposal as well! |
Absolutely! It certainly makes a lot more sense than randomly saving the events like it was doing before or throwing warnings into the logs which for most end users are inexplicable and unclear how they can be handled. When events are thrown away because a new one came in before it could be saved, it would be good I think to have a DEBUG level log statement explaining that's what's happening. I wouldn't want it to be WARN because it will freak out a bunch of end users over something most of them wouldn't notice in the first place. Couple that with adding the behavior to the docs and I think we are good. |
Signed-off-by: Jan N. Klug <github@klug.nrw>
Signed-off-by: Jan N. Klug <github@klug.nrw>
Signed-off-by: Jan N. Klug <github@klug.nrw>
I think this should do it. |
I've downloaded the latest jar from Artifactory and will let it run with debug logging for a bit and see what happens. So far so good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you - I've left a few small comments.
@Activate | ||
public RRD4jPersistenceService(final @Reference ItemRegistry itemRegistry) { | ||
this.itemRegistry = itemRegistry; | ||
storeJob = scheduler.scheduleWithFixedDelay(() -> doStore(false), 1, 1, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this line better be placed in the activate() method behind active = true
? Then there wouldn't be the need to set a 1 second delay either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the activate
method can be removed at all. We can also call modified
from the constructor. WDYT?
The 1s delay is not needed at all. I just set it because during the first second there can't be something to write to the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that should work as well.
@@ -190,11 +321,25 @@ public void store(final Item item, @Nullable final String alias) { | |||
} | |||
|
|||
long now = System.currentTimeMillis() / 1000; | |||
storageMap.computeIfAbsent(now, t -> new ConcurrentHashMap<>()).put(name, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be the right place to add debug logging as suggested by @rkoshak?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll check if put
returns something and add a debug log if it does (because that means we have overwritten the value before) AND the old value is different from the new one (because discarding the same value doesn't matter).
...ence.rrd4j/src/main/java/org/openhab/persistence/rrd4j/internal/RRD4jPersistenceService.java
Show resolved
Hide resolved
Signed-off-by: Jan N. Klug <github@klug.nrw>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
* [rrd4j] Improve event handling Signed-off-by: Jan N. Klug <github@klug.nrw> Signed-off-by: Jørgen Austvik <jaustvik@acm.org>
Closes #15220