mbedtls_ssl_handshake() hangs #182

Closed
ole-johan opened this issue Aug 10, 2016 · 12 comments

@ole-johan

I am working on an application based on the 'http_get_mbedtls' example to push data to a server using TLS. It is all working quite well most of the time, but sometimes the call to mbedtls_ssl_handshake() never returns. Other tasks keep running. I can have several hundred successful calls, and then one that hangs.
Should it block like this, or is there a way to add a timeout to this call?

Attached are two log files, one of a successful call and the other when the call hangs.
It seems to happen around here:

ssl_cli.c:2944: client state: 10
...
ssl_tls.c:2429: message length: 6, out_left: 6
HANG

log_success.txt
log_failed.txt

@kanflo
Contributor

kanflo commented Aug 10, 2016

Hi @ole-johan. Without having looked deeper into this, I would say no, it should not hang. It would help if you could narrow down where it hangs: in mbedtls, in lwip, in the Espressif wifi driver, or somewhere else.

@projectgus
Contributor

projectgus commented Aug 11, 2016

If the server end of the socket goes away for some reason then the handshake will block on read() until the TCP layer times out. This takes a while. If you leave it for an extended period (about five minutes, from memory; it's part of the LWIP configuration) then it may eventually fail.

The solution is to call mbedtls_ssl_conf_read_timeout to set a timeout on all TLS-related reads. This should work, and possibly we should add it to the example (PRs welcome!)

@ole-johan
Author

Hi guys, thanks for your response!

I have already added a read timeout like this:

mbedtls_ssl_conf_read_timeout(&conf, 1000); 
...
mbedtls_ssl_set_bio(&ssl, &server_fd, mbedtls_net_send, NULL, mbedtls_net_recv_timeout);

so I know that mbedtls_ssl_read() is not blocking.

If that timeout also applies to the underlying calls during mbedtls_ssl_handshake(), I assume the problem is something else.

@projectgus
Contributor

Yes, it should apply.

If you turn up the mbedtls debugging level (in the main header) you should get some fairly useful debug dumps out of it (you should be able to debug when it's calling in/out of the socket layer, which is probably the most useful thing to look at.)

@ole-johan
Author

I changed to mbedtls_debug_set_threshold(4), which gave a lot of output but nothing that shed more light on the problem. A successful run and a failing run have exactly the same log up until the hang.

However, when I increased the log level, it took a lot more iterations before the hang. It now ran successfully 568 times before hanging; with a lower log level the problem generally occurs sooner. Could be a coincidence...

If there are other ways to increase the debug output, maybe from a lower level, please let me know and I will enable and try.

@rongsaws
Contributor

The log message "Timer Stop Failed" seems to indicate a problem. I remember seeing similar errors when I had higher priority tasks blocking the system timer task, and then everything became unstable. Did you set your task priority higher than 2? I didn't look into the details, but it seems the system timer is implemented as a low priority task, which is kinda problematic.

@ole-johan
Author

I actually had two unrelated tasks with a priority higher than 2. I took them down to 2, and the "Timer Stop Failed" message disappeared. So that was probably a good thing, thanks for that tip!

Unfortunately, the TLS handshake hang problem remains...
I am quite new to this platform, but I would like to debug this further. @projectgus, you mentioned increasing debug output, could you specify how I do that? I assume you mean at lower levels, as I already used the mbedtls_debug_set_threshold(4) setting.

@recursify
Contributor

@ole-johan So comparing the failed log to the successful one, it seems like the failed case stops just before a call to ssl->f_send(). Could it be that the socket is blocking on the write?

What happens when you make the socket non-blocking before the handshake? You'll have to manually calculate the timeout in this case. It will be something like:

mbedtls_net_set_nonblock(&server_fd);
mbedtls_ssl_set_bio(&ssl, &server_fd, mbedtls_net_send, mbedtls_net_recv, NULL);

...

// Note: the handshake takes the SSL context, not the socket context.
while((ret = mbedtls_ssl_handshake(&ssl)) != 0)
{
    if(ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_WANT_WRITE)
    {
        // Actual failure!!
        break;
    }
    // WANT_READ / WANT_WRITE: the socket was not ready, so try again.
}

// Success!
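
The "manually calculate the timeout" step is left open above. As a minimal sketch, assuming millisecond tick counts are available from the platform, the decision after each mbedtls_ssl_handshake() call could be factored into a small helper. The names hs_next_action, hs_action_t and the HS_* constants are hypothetical and not part of mbedTLS; the two error values are stand-ins that mirror MBEDTLS_ERR_SSL_WANT_READ and MBEDTLS_ERR_SSL_WANT_WRITE so that the sketch compiles without the library.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MBEDTLS_ERR_SSL_WANT_READ / MBEDTLS_ERR_SSL_WANT_WRITE so the
 * sketch compiles on its own; real code would include "mbedtls/ssl.h". */
enum { HS_WANT_READ = -0x6900, HS_WANT_WRITE = -0x6880 };

/* Hypothetical result type: what the retry loop should do next. */
typedef enum { HS_DONE, HS_RETRY, HS_FAILED, HS_TIMED_OUT } hs_action_t;

/* Decide what to do after one handshake call on a non-blocking socket.
 * ret is the handshake return value; the times are millisecond tick counts. */
hs_action_t hs_next_action(int ret, uint32_t start_ms, uint32_t now_ms,
                           uint32_t timeout_ms)
{
    assert(timeout_ms > 0);            /* a zero budget would never retry */

    if (ret == 0)
        return HS_DONE;                /* handshake completed */
    if (ret != HS_WANT_READ && ret != HS_WANT_WRITE)
        return HS_FAILED;              /* a real error, stop retrying */

    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - start_ms) >= timeout_ms)
        return HS_TIMED_OUT;

    return HS_RETRY;                   /* socket not ready yet, call again */
}
```

In the loop, the caller would pass in the return value of mbedtls_ssl_handshake(&ssl) and the current tick count, and on HS_RETRY it would probably want to yield briefly (for example with a short task delay) so that lower priority tasks still get to run.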

@ole-johan
Author

I finally have my application running stably for quite some time. It seems to me that the solution has been to reduce the FreeRTOS priority of all my custom tasks to 1. Several of my problems, including this one, have gone away since I set the priority to 1. So maybe not a scientific solution, but I am happy for now.
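
One way to keep this from regressing is a compile-time guard on task priorities. This is only a sketch: the macro names are hypothetical, and the limit of 2 is taken from the observation earlier in this thread that tasks above priority 2 appeared to starve the system timer task, not from any documented constant.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical limit, based only on what was reported in this thread. */
#define TIMER_SAFE_MAX_PRIORITY  2

/* Hypothetical application task priorities, all set to 1 as described. */
#define DATA_PUSH_TASK_PRIORITY  1
#define SENSOR_TASK_PRIORITY     1

/* Fail the build if a task is raised above the limit again. */
static_assert(DATA_PUSH_TASK_PRIORITY <= TIMER_SAFE_MAX_PRIORITY,
              "data push task may starve the system timer task");
static_assert(SENSOR_TASK_PRIORITY <= TIMER_SAFE_MAX_PRIORITY,
              "sensor task may starve the system timer task");
```

The priority macros would then be passed as the priority argument when the tasks are created.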

@nitinrawal

The function mbedtls_ssl_handshake() is hanging. I found the point where it hangs and returns an error. The code snippet is:

if ((pcb->snd_queuelen >= TCP_SND_QUEUELEN) || (pcb->snd_queuelen > TCP_SNDQUEUELEN_OVERFLOW)) {
    LWIP_DEBUGF(TCP_OUTPUT_DEBUG | LWIP_DBG_LEVEL_SEVERE, ("tcp_write: too long queue %"U16_F" (max %"U16_F")\n",
        pcb->snd_queuelen, (u16_t)TCP_SND_QUEUELEN));
    TCP_STATS_INC(tcp.memerr);
    pcb->flags |= TF_NAGLEMEMERR;
    return ERR_MEM;
}

in tcp_out.c.
Here the value of pcb->snd_queuelen is 60449, and I do not know why.

I want to implement a TLS client for the STM32F4xx processor on the raw TCP transport layer rather than on sockets. Could anybody please share project files?

Thanks,
Nitin
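
To see why a queue length of 60449 makes every write fail, the check quoted above can be reproduced in isolation. This is a sketch under assumptions: the limit of 8 is an illustrative value (the real TCP_SND_QUEUELEN depends on the lwIP configuration), the overflow threshold mirrors what lwIP defines for TCP_SNDQUEUELEN_OVERFLOW as far as I recall, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check them against your lwIP headers. */
#define SND_QUEUELEN_LIMIT     8U              /* stands in for TCP_SND_QUEUELEN */
#define SND_QUEUELEN_OVERFLOW  (0xffffU - 3)   /* stands in for TCP_SNDQUEUELEN_OVERFLOW */

static_assert(SND_QUEUELEN_OVERFLOW <= UINT16_MAX,
              "overflow threshold must fit in the 16-bit queue length");

/* Same condition as the quoted tcp_out.c check: returns 1 when the write
 * would be rejected with ERR_MEM, 0 when it would be accepted. */
int snd_queue_too_long(uint16_t queuelen)
{
    return (queuelen >= SND_QUEUELEN_LIMIT) ||
           (queuelen > SND_QUEUELEN_OVERFLOW);
}
```

With any realistic limit, snd_queue_too_long(60449) is true, so the write keeps being rejected until the counter returns to a sane value. A count that large is far beyond what a small queue can hold, which suggests (without confirming) that the pcb was corrupted or reused rather than that the queue actually filled up.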

@ourairquality
Contributor

Would you be able to try it using the branch with the updated lwip in PR #706? Using SSL on the esp8266 is a stretch for memory, so might this just be a result of running low on memory?

@ChenHsiang

ChenHsiang commented Jun 9, 2019 via email


8 participants