Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rrd4j] Improve event handling #15223

Merged
merged 5 commits into from
Jul 13, 2023
Merged

[rrd4j] Improve event handling #15223

merged 5 commits into from
Jul 13, 2023

Conversation

J-N-K
Copy link
Member

@J-N-K J-N-K commented Jul 11, 2023

Closes #15220

@J-N-K J-N-K added the bug An unexpected problem or unintended behavior of an add-on label Jul 11, 2023
@J-N-K J-N-K requested a review from a team as a code owner July 11, 2023 16:10
@J-N-K
Copy link
Member Author

J-N-K commented Jul 11, 2023

@rkoshak Would you like to test that?

@rkoshak
Copy link
Contributor

rkoshak commented Jul 11, 2023

Sure thing but I still don't have a build environment but I think I grabbed the right jar file from Artifactory. It doesn't happen super often so I'll let it run for a few hours and report back if I see the warning again or see something else weird.

Copy link
Contributor

@lolodomo lolodomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal looks good, I have just 2 comments.

@@ -191,24 +191,31 @@ public void store(final Item item, @Nullable final String alias) {

long now = System.currentTimeMillis() / 1000;

scheduler.schedule(() -> internalStore(name, value, now, true), 0, TimeUnit.SECONDS);
scheduler.schedule(() -> internalStore(name, value, null), 0, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the line just before that set variable now that is no more used.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 12, 2023

I let it run overnight and I've seen a few "RRD4J does not support adding past value..." warnings in the logs. It does feel like it happens less often but it's always been sporadic so it's hard to say if that's just a feeling or a fact. It happened twice since 3:14 this morning (that's when the current log rolled over).

Both of these warnings occurred in the middle or at the end of the everyMinute batch save. Here are the events as they occurred for one Item and one warning.

Log Time Event
events.log 04:19:55.065 Changed to 75.44
openhab.log 04:19.55.065 Stored 75.44 with timestamp 1689157195
openhab.log 04:20:00.628 everyMinute batch started
events.log 04:20:01.124 Changed to 75.26
events.log 04:20:03.144 Changed to 75.44
openhab.log 04:20:05.236 Stored 75.44 with timestamp 1689157205 (everyMinute save)
openhab.log 04:20:05.797 Stored 75.26 with timestamp 1689157206 (event from 04:20:01.124 save)
openhab.log 04:20:05.846 Discarding 75.44, timestamp 1689157205, last update 1689157206** (event from 04:20:03:144)
openhab.log 04:20:05.866 everyMinute batch ended
events.log 04:20:15.251 Changed to 75.26
openhab.log 04:20:15.250 Stored 75.26 with timestamp 1689157215

It looks like the two events on the Item that occurred when the everyMinute batch had started got delayed to the end of the everyMinute batch and because there were two of them that occurred in the everyMinute batch time the second one got rejected. It does look like the value that was saved (event 04:20:01:124) had it's timestamp moved forward as expected to 1689157206 but the second one (event 04:20:03.144) did not with 1689157205.

Does openhab/openhab-core#3597 (comment) have impacts on this?

@J-N-K
Copy link
Member Author

J-N-K commented Jul 12, 2023

I have done some further investigations. There is still an issue if a lot of events are created in a very short time (e.g. by a rule):

var item = items.getItem("Num");

for (var i=0; i<10; i++) {
  item.postUpdate(i);
}

There is a conceptual issue with that in core because Item is passed to the persistence service and updates that are faster than the event processing:

19:39:58.201 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 9.0 to 0
19:39:58.202 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 0 to 1
19:39:58.202 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=4, Label=Num, Category=) 
19:39:58.202 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=4, Label=Num, Category=) 
19:39:58.203 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 1 to 2
19:39:58.202 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=5, Label=Num, Category=) 
19:39:58.202 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=4, Label=Num, Category=) 
19:39:58.202 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=4, Label=Num, Category=) 
19:39:58.203 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 2 to 3
19:39:58.203 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 3 to 4
19:39:58.204 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=8, Label=Num, Category=) 
19:39:58.204 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 4 to 5
19:39:58.204 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=8, Label=Num, Category=) 
19:39:58.204 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=8, Label=Num, Category=) 
19:39:58.204 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=8, Label=Num, Category=) 
19:39:58.204 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Num (Type=NumberItem, State=9, Label=Num, Category=) 
19:39:58.204 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 5 to 6
19:39:58.204 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 6 to 7
19:39:58.205 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 7 to 8
19:39:58.205 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Num' changed from 8 to 9

That aside: storing with now+1 as timestamp will only help for the first "too fast" update, then we'll have the last>now issue again (because last == now+1). Increasing the timestamp until we find one that is possible feels also wrong, because that may result in very large offsets.

What would be the best case to handle that? "Replacing" the already stored value is not possible with RRD4J.

@kaikreuzer
Copy link
Member

What would be the best case to handle that? "Replacing" the already stored value is not possible with RRD4J.

Maybe we could do something like a "delayed write" - i.e. keep the value in memory, wait for the next second and then write the latest then available update to the db (and discard any additional ones one might have received inbetween).

What I don't understand, though, is what is now different to before? We always had the situation that there are many more updates than rrd4j can persist and we had a solution for it in place - we should figure out how that worked and restore it to behave exactly the same way as before (in terms of what states get persisted with which timestamp).

@J-N-K
Copy link
Member Author

J-N-K commented Jul 12, 2023

It worked because the persistence service blocked the event handler until it successfully wrote a datapoint.

Edit: your proposal is fine for number items, but we'll miss quickly changing on/off changes of a switch and the item may look unchanged in persistence also it did change in reality.

@kaikreuzer
Copy link
Member

blocked the event handler until it successfully wrote a datapoint.

This still didn't prevent new updates from occuring. So what happened with those? I'd assume that writing to the db didn't take a second, but new updates came in much faster?

@J-N-K
Copy link
Member Author

J-N-K commented Jul 12, 2023

It just scheduled a new store with a timestamp one second later until it succeeds. This also messed up the order of the events but it didn't print warnings.

@kaikreuzer
Copy link
Member

so we essentially persisted all events and made sure there is always a second inbetween? But we persisted them in almost random order...? Ok, if that was the case, maybe we should not try to replicate exactly that behavior...

we'll miss quickly changing on/off changes of a switch and the item may look unchanged in persistence also it did change in reality.

If something turns on for a fraction of a second only, we can anyhow not correctly persist this - if we extend it to being on for one second, it might be some wrong information already. With the same argument you could say you would want to see that something turned on 10 times within a second and we simply cannot do this. Imho, the most important thing is that the last persisted state is indeed the then valid one.

@J-N-K
Copy link
Member Author

J-N-K commented Jul 12, 2023

Fine. I'll try to implement that.

@lolodomo
Copy link
Contributor

Imho, the most important thing is that the last persisted state is indeed the then valid one.

I agree.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 12, 2023

I've been thinking on this most of the day and I come back and find a proposed solution. 🎉

I think it's clear that rrd4j is simply unsuitable for rapidly changing data. We might consider adding a warning to that effect to the readme. If we cover the rapidly changing Switch with docs and make sure that the everyMinute batch doesn't cause problems I think that's as good as it gets for rrd4j.

@kaikreuzer
Copy link
Member

I come back and find a proposed solution. 🎉

I hope you can at least agree to that proposal as well!

@rkoshak
Copy link
Contributor

rkoshak commented Jul 13, 2023

I hope you can at least agree to that proposal as well!

Absolutely! It certainly makes a lot more sense than randomly saving the events like it was doing before or throwing warnings into the logs which for most end users are inexplicable and unclear how they can be handled.

When events are thrown away because a new one came in before it could be saved, it would be good I think to have a DEBUG level log statement explaining that's what's happening. I wouldn't want it to be WARN because it will freak out a bunch of end users over something most of them wouldn't notice in the first place. Couple that with adding the behavior to the docs and I think we are good.

Signed-off-by: Jan N. Klug <github@klug.nrw>
J-N-K added 3 commits July 13, 2023 19:00
Signed-off-by: Jan N. Klug <github@klug.nrw>
Signed-off-by: Jan N. Klug <github@klug.nrw>
Signed-off-by: Jan N. Klug <github@klug.nrw>
@J-N-K
Copy link
Member Author

J-N-K commented Jul 13, 2023

I think this should do it.

@rkoshak
Copy link
Contributor

rkoshak commented Jul 13, 2023

I've downloaded the latest jar from Artifactory and will let it run with debug logging for a bit and see what happens. So far so good.

Copy link
Member

@kaikreuzer kaikreuzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you - I've left a few small comments.

@Activate
public RRD4jPersistenceService(final @Reference ItemRegistry itemRegistry) {
this.itemRegistry = itemRegistry;
storeJob = scheduler.scheduleWithFixedDelay(() -> doStore(false), 1, 1, TimeUnit.SECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this line better be placed in the activate() method behind active = true? Then there wouldn't be the need to set a 1 second delay either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the activate method can be removed at all. We can also call modified from the constructor. WDYT?

The 1s delay is not needed at all. I just set it because during the first second there can't be something to write to the database.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should work as well.

@@ -190,11 +321,25 @@ public void store(final Item item, @Nullable final String alias) {
}

long now = System.currentTimeMillis() / 1000;
storageMap.computeIfAbsent(now, t -> new ConcurrentHashMap<>()).put(name, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be the right place to add debug logging as suggested by @rkoshak?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll check if put returns something and add a debug log if it does (because that means we have overwritten the value before) AND the old value is different from the new one (because discarding the same value doesn't matter).

Signed-off-by: Jan N. Klug <github@klug.nrw>
Copy link
Member

@kaikreuzer kaikreuzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@kaikreuzer kaikreuzer added this to the 4.0 milestone Jul 13, 2023
@kaikreuzer kaikreuzer merged commit 571cd63 into openhab:main Jul 13, 2023
@J-N-K J-N-K deleted the rrd4j branch July 13, 2023 20:30
austvik pushed a commit to austvik/openhab-addons that referenced this pull request Mar 27, 2024
* [rrd4j] Improve event handling

Signed-off-by: Jan N. Klug <github@klug.nrw>
Signed-off-by: Jørgen Austvik <jaustvik@acm.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[rrd4j] Persistence warnings
4 participants