Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/ztimer: fix re-scheduling of timers #20924

Merged
merged 1 commit into from
Oct 19, 2024

Conversation

Enoch247
Copy link
Contributor

@Enoch247 Enoch247 commented Oct 18, 2024

Contribution description

If the timer at the head of a ztimer clock's timer list is re-scheduled (ztimer_set() called on an already set timer) and the timer is no longer at the head after being re-scheduled, clock-ops->set() is never called from inside ztimer_set(), and the underlying timer is left with an ISR scheduled to expire at the timer's old time. The intended behavior is that the clock's lower level timer should always be set to expire at the time of the clocks head timer.

This patch changes ztimer_set() to call _ztimer_update(), which sets the lower level timer according to the current list of timers, rather than setting the timer directly inside of ztimer_set().

This is a fix we might consider back porting. As far as I can tell this bug has always existed in ztimer.

Testing procedure

  • apply patch to create testbench
  • run make -C examples/hello-world/ all term
  • observe that the timers without this fix fire at the wrong time, and that with this fix they fire at the correct time

Testbench patch

diff --git a/examples/hello-world/Makefile b/examples/hello-world/Makefile
index 258d8e9baf..92c9137241 100644
--- a/examples/hello-world/Makefile
+++ b/examples/hello-world/Makefile
@@ -15,4 +15,6 @@ DEVELHELP ?= 1
 # Change this to 0 show compiler invocation lines by default:
 QUIET ?= 1
 
+USEMODULE += ztimer_usec
+
 include $(RIOTBASE)/Makefile.include
diff --git a/examples/hello-world/main.c b/examples/hello-world/main.c
index 213128ac64..f17d23c8e4 100644
--- a/examples/hello-world/main.c
+++ b/examples/hello-world/main.c
@@ -20,6 +20,12 @@
  */
 
 #include <stdio.h>
+#include <ztimer.h>
+
+void cb(void *arg) {
+    uint32_t n = (uint32_t)arg;
+    printf("t%"PRIu32"@%"PRIu32"\n", n, ztimer_now(ZTIMER_USEC));
+}
 
 int main(void)
 {
@@ -28,5 +34,15 @@ int main(void)
     printf("You are running RIOT on a(n) %s board.\n", RIOT_BOARD);
     printf("This board features a(n) %s CPU.\n", RIOT_CPU);
 
+    ztimer_t t1 = {.callback = cb, .arg = (void*)1};
+    ztimer_t t2 = {.callback = cb, .arg = (void*)2};
+
+    cb((void*)0);
+    ztimer_set(ZTIMER_USEC, &t1, 1*1000000);
+    ztimer_set(ZTIMER_USEC, &t2, 2*1000000);
+    //ztimer_remove(ZTIMER_USEC, &t1); // If this line is commented out these timers do not fire at the correct time.
+    ztimer_set(ZTIMER_USEC, &t1, 3*1000000);
+
+    while(1);
     return 0;
 }

Sample of testbench's expected (good) behavior

2024-10-18 15:06:43,015 # RIOT native interrupts/signals initialized.
2024-10-18 15:06:43,016 # RIOT native board initialized.
2024-10-18 15:06:43,017 # RIOT native hardware initialization complete.
2024-10-18 15:06:43,018 # 
2024-10-18 15:06:43,018 # main(): This is RIOT! (Version: 2024.10-devel-323-g45942-fix-ztimer-timer-reschedule)
2024-10-18 15:06:43,018 # Hello World!
2024-10-18 15:06:43,018 # You are running RIOT on a(n) native board.
2024-10-18 15:06:43,019 # This board features a(n) native CPU.
2024-10-18 15:06:43,019 # t0@34
2024-10-18 15:06:44,013 # t2@2000830
2024-10-18 15:06:45,013 # t1@3000279

Sample of testbench's behavior without this fix

2024-10-18 14:46:21,082 # RIOT native interrupts/signals initialized.
2024-10-18 14:46:21,083 # RIOT native board initialized.
2024-10-18 14:46:21,083 # RIOT native hardware initialization complete.
2024-10-18 14:46:21,083 # 
2024-10-18 14:46:21,083 # main(): This is RIOT! (Version: 2024.07-devel-164-g99cef-add-stm32h7.wip+ohsheet.patched+u5-spi)
2024-10-18 14:46:21,084 # Hello World!
2024-10-18 14:46:21,084 # You are running RIOT on a(n) native board.
2024-10-18 14:46:21,084 # This board features a(n) native CPU.
2024-10-18 14:46:21,084 # t0@23
2024-10-18 14:46:21,085 # t2@1000230
2024-10-18 14:46:21,085 # t1@1000248
2024-10-18 14:46:39,207 # 
2024-10-18 14:46:39,207 # native: exiting
2024-10-18 14:46:39,207 # Exiting Pyterm

Issues/PRs references

This bug is possible the cause of the early timeout worked around in this #19965. Perhaps that workaround could be removed?

If the timer at the head of a ztimer clock's timer list is re-scheduled
(ztimer_set() called on an already set timer) and the timer is no longer
at the head after being re-scheduled, clock-ops->set() is never called
from inside ztimer_set(), and the underlying timer is left with an ISR
scheduled to expire at the timer's old time. The intended behavior is
that the clock's lower level timer should always be set to expire at the
time of the clocks head timer.

This patch changes ztimer_set() to call _ztimer_update(), which sets the
lower level timer according to the current list of timers, rather than
setting the timer directly inside of ztimer_set().
@github-actions github-actions bot added Area: timers Area: timer subsystems Area: sys Area: System labels Oct 18, 2024
@maribu maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch labels Oct 18, 2024
@maribu maribu enabled auto-merge October 18, 2024 21:12
@riot-ci
Copy link

riot-ci commented Oct 18, 2024

Murdock results

✔️ PASSED

45942f6 sys/ztimer: fix re-scheduling of timers

Success Failures Total Runtime
10212 0 10214 13m:06s

Artifacts

@maribu maribu added this pull request to the merge queue Oct 18, 2024
@benpicco
Copy link
Contributor

That fix looks pretty straightforward!

Please also add a test case that reproduced the bug.

Merged via the queue into RIOT-OS:master with commit 616e6a5 Oct 19, 2024
29 checks passed
@maribu
Copy link
Member

maribu commented Oct 21, 2024

Backport provided in #20928

@Enoch247 Enoch247 deleted the fix-ztimer-timer-reschedule branch October 21, 2024 12:12
@benpicco benpicco added this to the Release 2024.10 milestone Nov 27, 2024
maribu added a commit to maribu/RIOT that referenced this pull request Dec 30, 2024
This reverts commit e3d0068.

With RIOT-OS#20924 merged, this should
no longer be needed.
@maribu
Copy link
Member

maribu commented Dec 30, 2024

This bug is possible the cause of the early timeout worked around in this #19965. Perhaps that workaround could be removed?

Mostly yes.

However, the timeout in there can also trigger early even when ztimer does not trigger early: The GNRC SOCK implementation uses an mbox to wait for a network message to be received. However, mbox has no timeout. Instead, ztimer will put a message to the mbox when it is triggered. If the mbox message related to a received network message is put into the mbox just before the timeout message, the next one to fetch something from the mbox will get the stale timeout message from the mbox directly.

@maribu
Copy link
Member

maribu commented Dec 31, 2024

I think with #21113 the work around in #19965 could indeed be dropped for good.

maribu added a commit to maribu/RIOT that referenced this pull request Jan 10, 2025
This reverts commit e3d0068, which
added a work around for two bugs:

- ztimer triggering too early (fixed in
  RIOT-OS#20924)
- gnrc_sock_recv() returning when an old "timeout" message is still
  in the message queue (fixed in
  RIOT-OS#21113)

With those bugs fixed, the work around should not longer be needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: sys Area: System Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants