Bluetooth disconnects after about 12 hours #2252

Open
richard9999999999 opened this issue Feb 10, 2025 · 17 comments
@richard9999999999

I have an LE attribute server implemented on a Pico 2 W plus an ESP32-based GATT client. Everything runs correctly and reliably, but only for about 12 hours. After 12 hours, I receive an HCI_EVENT_DISCONNECTION_COMPLETE event on both sides (with reason set to 8 (CONNECTION_TIMEOUT)). After I restart the ESP32 client, I get the next connection, but only for a few seconds (the client is able to read a few attributes during that time), and then it disconnects again. If I restart the client again, I get a connection again, but again only for a few seconds.
To get things "fixed", I need to restart the Pico server side (a restart from the debugger is enough), and then everything works perfectly. But only for the next 12 hours!
I've already spent quite a lot of time debugging the code on both sides, but haven't found the root of the problem. The HCI_EVENT_DISCONNECTION_COMPLETE event does not seem to be sent by the software on either side!
I know that this is not much info from my side, I apologize. However, maybe somebody will immediately know the reason? Could the problem be on the CYW controller side? I've attempted to implement an "on the fly" restart of the controller (after disconnection), but I have not yet been able to initialize the controller correctly so that operation can continue.
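For readers following along, the event described above has a fixed layout in the Bluetooth Core Specification: event code 0x05, a parameter length, then status, a 16-bit little-endian connection handle, and a reason byte. Reason 0x08 is the spec's "Connection Timeout" code, which a controller reports when the link supervision timeout expires, so it is consistent with neither host stack having requested the disconnect. The sketch below decodes a raw event buffer without using BTstack; `disconnect_info_t`, `parse_disconnection_complete` and `disconnect_reason_name` are hypothetical names for illustration, not BTstack APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the Bluetooth Core Specification (HCI event and error codes). */
#define HCI_EVT_DISCONNECTION_COMPLETE 0x05u
#define HCI_ERR_CONNECTION_TIMEOUT     0x08u

/* Hypothetical helper type for this sketch; not part of BTstack. */
typedef struct {
    uint8_t  status;  /* 0x00 means the disconnection completed */
    uint16_t handle;  /* 12-bit connection handle */
    uint8_t  reason;  /* HCI error code explaining the disconnect */
} disconnect_info_t;

/* Raw event layout: [code][param_len][status][handle lo][handle hi][reason]. */
static bool parse_disconnection_complete(const uint8_t *evt, size_t len,
                                         disconnect_info_t *out)
{
    if (evt == NULL || out == NULL || len < 6) {
        return false;
    }
    if (evt[0] != HCI_EVT_DISCONNECTION_COMPLETE || evt[1] < 4) {
        return false;
    }
    out->status = evt[2];
    out->handle = (uint16_t)((evt[3] | (evt[4] << 8)) & 0x0FFF);
    out->reason = evt[5];
    return true;
}

/* Names for a few common reasons; anything else is reported as "OTHER". */
static const char *disconnect_reason_name(uint8_t reason)
{
    switch (reason) {
    case 0x08: return "CONNECTION_TIMEOUT";
    case 0x13: return "REMOTE_USER_TERMINATED_CONNECTION";
    case 0x16: return "CONNECTION_TERMINATED_BY_LOCAL_HOST";
    default:   return "OTHER";
    }
}
```

Logging the decoded reason on both the server and the client each time the event arrives makes it easier to confirm that the same reason is reported on every reconnect attempt.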

@peterharperuk peterharperuk self-assigned this Feb 11, 2025
@peterharperuk
Contributor

Do you have the issue on the develop branch? We have fixed some timer issues on there.

@richard9999999999
Author

I am using the main branch, but with the changes in time_adapter.h from the dev branch applied (I was one of those who reported problems with the repeating timer, and if I remember right there was also another fix done there in dev). I really have spent a lot of time trying to get to the root of the problem, and timers were the first thing I looked at. I found nothing. So it seems like the problem lies inside the firmware of the CYW module?

@peterharperuk
Contributor

Are you using async poll or background thread safe? Do you have any logging enabled that might reveal the problem? Does a debugger show the code is stuck somewhere? Blaming the firmware with little evidence is unwise.

@richard9999999999
Author

richard9999999999 commented Feb 12, 2025

I am using async poll. Logging just shows that HCI_EVENT_DISCONNECTION_COMPLETE is received with reason 8. The debugger shows nothing unusual. I suppose that HCI_EVENT_DISCONNECTION_COMPLETE is emitted by the BT module itself; it always comes after 12 hours of connection. I am not blaming the firmware; it is most probably a feature rather than a bug (a feature I am not aware of). I just wanted to say that the disconnection is not invoked by our software nor by the software on the client side.

@peterharperuk
Contributor

I've had this example running on two Pico Ws for 14 hours: https://github.com/raspberrypi/pico-examples/tree/master/pico_w/bt/standalone. It sends the temp from one device to another, and I modified it to send the temp every 1s. So I can't reproduce this on develop. I'll retest with Pico 2 W in polling mode.

@richard9999999999
Author

Thank you. I have used the following service and characteristic:
PRIMARY_SERVICE, ORG_BLUETOOTH_SERVICE_GENERIC_ACCESS
CHARACTERISTIC, ORG_BLUETOOTH_CHARACTERISTIC_DIGITAL, READ | NOTIFY | INDICATE | DYNAMIC,

Now I have tried setting them as in the temperature reading sample you mentioned:
PRIMARY_SERVICE, ORG_BLUETOOTH_SERVICE_ENVIRONMENTAL_SENSING
CHARACTERISTIC, ORG_BLUETOOTH_CHARACTERISTIC_TEMPERATURE, READ | NOTIFY | INDICATE | DYNAMIC,
Not sure if it can (should) make any difference, so I am waiting until the evening to see.

@peterharperuk
Contributor

Are you running debug or release? i.e. What's your cmake command line?

@richard9999999999
Author

I am using the MinSizeRel release configuration (optimized for minimal image size) with a copy_to_ram image; however, I also tried without copy_to_ram (running from flash), with the same result.
GCC 10.3.0

@richard9999999999
Author

Some more info: the Pico 2 W is overclocked to 200 MHz, and I am sending the data (about 90 bytes) 25 times per second.

@peterharperuk
Contributor

These are quite important things that might have been relevant in your original report!

@peterharperuk
Contributor

I modified the example to run at 200MHz and send the temp every 40ms. It's still running 24 hours later.

@richard9999999999
Author

Thank you for still paying attention to this problem. OK, I've done some experiments in the meantime:
- I switched to a debug build run from flash. No difference observed: data are sent 16 times per second in this build, and it runs for only about 12 hours, so it is the same problem.
- I tried to exclude as much as possible but didn't find the culprit.

However, today I found something. A short recap first:

  1. After the client connection is established, I send a few data sets of various lengths. The lengths of the data are: 2, 220, 32, 33.
  2. Then I send a set of 80 bytes of data 25 times per second (16 times per second in the slow debug build) in a loop.
  3. After about 12 hours (always after the same time), the connection is broken.
  4. The client connects again and exchanges the first data sets of various sizes, as in point 1 above.
  5. After the first 80-byte data set is sent (point 2), the connection is broken again. Points 4 and 5 then repeat forever, until the Pico 2 W server is restarted. Then it works for another 12 hours. Restarting the client doesn't help.

I found the following:

  • The 80-byte data set (after which the connection is broken) is transferred to the client successfully.
  • I stopped the code in the debugger before this data set was sent and modified the variable contents to send another one with a different length (I also tried the longest one, 220 bytes). It got through without the connection being lost. I made various attempts, and it looks like the problem is related to the data length of 80.

I've restarted the Pico server and modified the code to set that data set length to 84 bytes. Now I am waiting for the evening to see if that makes a difference :-)
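A small piece of arithmetic may help when reading the recap above: the release build sends 25 data sets per second and the debug build 16 per second, yet both are reported to fail after about 12 hours. The sketch below works out how many data sets each build has sent by then. `packets_sent` is a hypothetical helper, and "12 hours" is the approximate figure from the report, not a measured value.

```c
#include <stdint.h>

/* Number of data sets sent after `hours` at `rate_hz` data sets per second. */
static uint64_t packets_sent(uint32_t rate_hz, uint32_t hours)
{
    return (uint64_t)rate_hz * 3600u * hours;
}
```

With these inputs, `packets_sent(25, 12)` is 1,080,000 and `packets_sent(16, 12)` is 691,200. Since the two counts differ by more than a third while the failure time stays about the same, the trigger looks more likely to depend on elapsed time than on the number of data sets sent, assuming both runs really did fail at the same point.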

@peterharperuk
Contributor

Some random unhelpful thoughts...

I wonder if it's worth enabling the btstack traces. It will generate a lot of data over 12 hours but it might reveal something if you keep the last x MB of data. Maybe something to try next.
I wonder if using background thread safe is worth trying as it might reveal that you have delays calling the poll function.
Have you reproduced this on develop or are you still using a patched version of master?
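The "keep the last x MB" idea above can be sketched as a ring buffer that overwrites its oldest bytes once full, so that only the trace leading up to the disconnect survives a 12-hour run. This is a minimal sketch, assuming the trace bytes arrive through some callback; how they are obtained (for example via BTstack's HCI dump facility) is left out, and `trace_ring_t` and its functions are hypothetical names rather than BTstack or pico-sdk APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "keep the last N bytes" trace buffer. */
typedef struct {
    uint8_t *data;      /* caller-provided storage */
    size_t   capacity;  /* size of the storage in bytes */
    size_t   head;      /* next write position */
    size_t   used;      /* number of valid bytes, at most capacity */
} trace_ring_t;

static void trace_ring_init(trace_ring_t *r, uint8_t *storage, size_t capacity)
{
    r->data = storage;
    r->capacity = capacity;
    r->head = 0;
    r->used = 0;
}

/* Append bytes, overwriting the oldest data once the buffer is full. */
static void trace_ring_write(trace_ring_t *r, const uint8_t *src, size_t len)
{
    if (r->capacity == 0 || len == 0) {
        return;
    }
    if (len > r->capacity) {
        /* Only the newest `capacity` bytes can be retained. */
        src += len - r->capacity;
        len = r->capacity;
    }
    size_t first = r->capacity - r->head;   /* room before wrapping */
    if (first > len) {
        first = len;
    }
    memcpy(r->data + r->head, src, first);
    memcpy(r->data, src + first, len - first);
    r->head = (r->head + len) % r->capacity;
    r->used = (r->used + len > r->capacity) ? r->capacity : r->used + len;
}

/* Copy retained bytes, oldest first, into dst; returns the number copied. */
static size_t trace_ring_read(const trace_ring_t *r, uint8_t *dst, size_t dst_len)
{
    if (r->capacity == 0) {
        return 0;
    }
    size_t n = r->used < dst_len ? r->used : dst_len;
    size_t start = (r->head + r->capacity - r->used) % r->capacity;
    for (size_t i = 0; i < n; i++) {
        dst[i] = r->data[(start + i) % r->capacity];
    }
    return n;
}
```

The buffer would be written from wherever trace output is produced and dumped once, after HCI_EVENT_DISCONNECTION_COMPLETE is seen. The sketch is not interrupt-safe, so writes and the final read are assumed to happen in the same context.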

@richard9999999999
Author

richard9999999999 commented Feb 14, 2025

Yes, I will keep at it. I am still using the patched master version; using dev could be the next iteration. I am also planning to build a test case (minimal code, but as close as possible to mine in terms of Bluetooth functionality). If that is not successful, I will try to enable the tracing.
These iterations are pretty slow since the problem takes quite a long time to appear. In any case, this time is "deterministic", so I think the typical thread/IRQ-related synchronization or resource-access kind of problem is ruled out (I suppose it would happen after a semi-random time in that case).
Plus, I hope I am on the right track, since I've found that the problem happens when I try to send exactly 80 bytes of data (and not when sending e.g. 33 or 220 bytes), and it does not seem to depend on the data content. I need to do more checks the next time the problem happens.

@richard9999999999
Author

Some update: I've found that the problem happens only when I am sending data of the following lengths:
38, 39, 40, 41, 42, 43, 79, 80, 81, 82, 83, 84 (bytes).
I modified my code to send 88 bytes, and it has now been running for about 26 hours.

@richard9999999999
Author

richard9999999999 commented Feb 16, 2025

...the lengths above are payload lengths (so not packet lengths).
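Until a root cause is found, the reported payload lengths can be expressed as a simple check, with padding as a stopgap similar to the switch to 88 bytes described above. This is a sketch of a workaround under stated assumptions, not a fix: the two ranges are exactly the lengths listed in this thread, both function names are hypothetical, and a length outside the list is not thereby proven safe (only 2, 32, 33, 88 and 220 bytes are reported as working).

```c
#include <stdbool.h>
#include <stddef.h>

/* Payload lengths reported in this thread as triggering the disconnect. */
static bool is_reported_problem_length(size_t payload_len)
{
    return (payload_len >= 38 && payload_len <= 43) ||
           (payload_len >= 79 && payload_len <= 84);
}

/* Hypothetical workaround: round up to the next length not on the list. */
static size_t next_unlisted_length(size_t payload_len)
{
    while (is_reported_problem_length(payload_len)) {
        payload_len++;
    }
    return payload_len;
}
```

A sender could call `next_unlisted_length()` on the intended payload size and zero-fill the extra bytes, provided the receiver knows the real length and the result still fits the negotiated ATT MTU.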

@richard9999999999
Author

Next update:
I tried the dev branch: the same behavior as master.
I tried BT polling mode and also background mode (low-priority IRQ thread): the same behavior.

My clock initialization (not sure if it is related to the problem):

set_sys_clock_khz(200 * 1000, true);

pll_init(pll_usb, 1, 1200 * MHZ, 4, 2);

clock_configure(clk_peri,
        0,
        CLOCKS_CLK_PERI_CTRL_AUXSRC_VALUE_CLKSRC_PLL_USB,
        150000 * 1000,
        150000 * 1000);

stdio_init_all();
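As a sanity check on the numbers in the clock setup above: on RP-series chips the PLL output is the VCO frequency divided by the product of the two post dividers, so a 1200 MHz VCO with post dividers 4 and 2 gives 150 MHz, which matches the 150000 * 1000 passed to clock_configure for clk_peri. The helper below is a hypothetical stand-alone calculation, not a pico-sdk function.

```c
#include <stdint.h>

/* PLL output = VCO frequency / (post_div1 * post_div2); returns 0 on a zero divider. */
static uint32_t pll_output_hz(uint32_t vco_hz, uint32_t post_div1, uint32_t post_div2)
{
    if (post_div1 == 0 || post_div2 == 0) {
        return 0;
    }
    return vco_hz / (post_div1 * post_div2);
}
```

So the source and target frequencies handed to clock_configure are consistent with the pll_usb settings shown. Whether running pll_usb at 150 MHz has side effects elsewhere in the system is a separate question that this calculation does not answer.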
