Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WatchService tests #3518

Merged
merged 1 commit into from
Apr 4, 2023
Merged

Fix WatchService tests #3518

merged 1 commit into from
Apr 4, 2023

Conversation

J-N-K
Copy link
Member

@J-N-K J-N-K commented Apr 1, 2023

See #3516

Signed-off-by: Jan N. Klug <github@klug.nrw>
@J-N-K J-N-K added the test label Apr 1, 2023
@J-N-K J-N-K requested a review from a team as a code owner April 1, 2023 18:03
@J-N-K J-N-K requested a review from wborn April 1, 2023 18:04
Copy link
Member

@wborn wborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look!

According to the GHA build result testFileInWatchedSubDir is unstable:

[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 67.61 s <<< FAILURE! - in org.openhab.core.internal.service.WatchServiceImplTest
[ERROR] org.openhab.core.internal.service.WatchServiceImplTest.testFileInWatchedSubDir  Time elapsed: 8.051 s  <<< FAILURE!
java.lang.AssertionError: 

Expected: an empty collection
     but: <[Event[path=subDir/testFile, kind=MODIFY]]>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
	at org.openhab.core.internal.service.WatchServiceImplTest.assertNoEvent(WatchServiceImplTest.java:192)
	at org.openhab.core.internal.service.WatchServiceImplTest.testFileInWatchedSubDir(WatchServiceImplTest.java:113)

@J-N-K
Copy link
Member Author

J-N-K commented Apr 2, 2023

I've seen that. But even with a CPU count of 32 and a -T 2C I'm not able to make it fail locally. I'll now re-trigger a few times here, maybe it's platform dependent.

@J-N-K
Copy link
Member Author

J-N-K commented Apr 3, 2023

I have now done more than a dozens of local builds and ten GHA builds. Local builds never failed and both failures of GHA builds occurred on org.openhab.core.thing.tests, so unrelated and after the watch service tests already succeeded.

How shall we proceed?

Copy link
Member

@wborn wborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let's give it a test and see how stable they are. Maybe it only fails 0.01% of the time and we were just unlucky the first time around. 😉

@wborn wborn merged commit ee392e8 into openhab:main Apr 4, 2023
@wborn wborn added this to the 4.0 milestone Apr 4, 2023
@J-N-K J-N-K deleted the watchservice-tests branch April 4, 2023 05:40
@wborn
Copy link
Member

wborn commented Apr 4, 2023

Bad luck stroke again, see:

/~https://github.com/openhab/openhab-core/actions/runs/4604346508

@J-N-K
Copy link
Member Author

J-N-K commented Apr 4, 2023

The storage thing is that the file from which the event originates Event[path=subDir/testFile, kind=MODIFY] is not even created in that test.

According to the documentation of the surefire-plugin tests within one class are not executed in parallel when "reuseForks=false" unless you explicitly specify "parallel=methods", so another test (that creates this file) should not interfere. Also a new temp dir should be created for each test, because @TempDir is only shared if used in @BeforeAll or on static fields.

Edit: sorry. It failed in a different test this time. The file is from that same test. I'll have another look.

@jimtng
Copy link
Contributor

jimtng commented Apr 4, 2023

@J-N-K
Copy link
Member Author

J-N-K commented Apr 4, 2023

Yes, that's the same.

@J-N-K
Copy link
Member Author

J-N-K commented Apr 4, 2023

@wborn Do you think that it is possible that Files.write on an existing file actually does two operations? It truncates the file with standard options and then writes the new content. If truncating is sometimes written to disc, it might be that the WatchServiceImpl indeed gets a MODIFY event, even if the file content after the second write is the same as before.

In that case we should remove that check, it's anyway not checking our implementation but the watcher library.

@jimtng
Copy link
Contributor

jimtng commented Apr 4, 2023

Wouldn't that mean that the watch service sends two MODIFY events, causing things to re-load twice?

@J-N-K
Copy link
Member Author

J-N-K commented Apr 4, 2023

Depends. I believe what happens is:

  1. Create file with content "foo" -> CREATE event
  2. (some time passes)
  3. Write to same file (modify) file with content "bar" -> MODIFY event (which can be the result of DELETE/CREATE, MODIFY or multiple MODIFY events in case truncate/write results in two operations). In any case only one event is emitted, this is the result of Prevent duplicate events in WatchService #3404 which collates all events that arrive within 1s.
  4. (some time passes)
  5. Write to same file, with same content "bar" -> In case of an atomic event, the watcher library detects this is the same content and does not emit any event to our watch service. If truncating the file to zero size and writing the content are two operations, then the watcher library will emit two events, because it first detects a zero size file (which is different from the one before) and then a file with some content (which is different from the zero size file). In that case our watch service receives two MODIFY events (which will be collapsed to one like above).

So unnecessary reloading only occurs when a file is written with the same content twice AND the file system reports two events for that AND there is a time between those writes that is longer than the collapse event timeout of 1s.

@jimtng
Copy link
Contributor

jimtng commented Apr 5, 2023

5. If truncating the file to zero size and writing the content are two operations, then the watcher library will emit two events

Have you tried to log this to check that this was the case?

@J-N-K
Copy link
Member Author

J-N-K commented Apr 5, 2023

Difficult, because it does never happen on my system :-). It may be filesystem dependent. I'll try to add logging and push that so we can see if it fails in CI with logging enabled.

@J-N-K
Copy link
Member Author

J-N-K commented Apr 5, 2023

Bingo. AAAAAAAAAAAAAAAAAAAAAA== is the hash of an empty file. I'll check what we can do to prevent that.

[unnamed] ERROR o.o.c.i.service.WatchServiceImpl - 1680686776123 - DirectoryChangeEvent{eventType=CREATE, isDirectory=true, path=/tmp/junit5515863195763780971/subDir, count=1, rootPath=/tmp/junit5515863195763780971, hash=DIRECTORY}
[unnamed] ERROR o.o.c.i.service.WatchServiceImpl - 1680686776123 - DirectoryChangeEvent{eventType=CREATE, isDirectory=false, path=/tmp/junit5515863195763780971/subDir/testFile2, count=1, rootPath=/tmp/junit5515863195763780971, hash=y4hqdyIj0XjEHmgyu7nWyQ==}
[OH-watchservice-1] ERROR o.o.c.i.service.WatchServiceImpl - Events in queue: [CREATE]
[unnamed] ERROR o.o.c.i.service.WatchServiceImpl - 1680686777629 - DirectoryChangeEvent{eventType=MODIFY, isDirectory=false, path=/tmp/junit5515863195763780971/subDir/testFile2, count=2, rootPath=/tmp/junit5515863195763780971, hash=LjN+p/Zx2paedwVTpBuz9w==}
[OH-watchservice-3] ERROR o.o.c.i.service.WatchServiceImpl - Events in queue: [MODIFY]
[unnamed] ERROR o.o.c.i.service.WatchServiceImpl - 1680686779136 - DirectoryChangeEvent{eventType=MODIFY, isDirectory=false, path=/tmp/junit5515863195763780971/subDir/testFile2, count=1, rootPath=/tmp/junit5515863195763780971, hash=AAAAAAAAAAAAAAAAAAAAAA==}
[unnamed] ERROR o.o.c.i.service.WatchServiceImpl - 1680686779137 - DirectoryChangeEvent{eventType=MODIFY, isDirectory=false, path=/tmp/junit5515863195763780971/subDir/testFile2, count=1, rootPath=/tmp/junit5515863195763780971, hash=LjN+p/Zx2paedwVTpBuz9w==}
[OH-watchservice-4] ERROR o.o.c.i.service.WatchServiceImpl - Events in queue: [MODIFY, MODIFY]

splatch pushed a commit to ConnectorIO/copybara-hab-core that referenced this pull request Jul 12, 2023
Signed-off-by: Jan N. Klug <github@klug.nrw>
GitOrigin-RevId: ee392e8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants