
[BUG]Stability needs to be improved with concurrent requests #4

Closed
manhere opened this issue Jan 22, 2025 · 17 comments · Fixed by #7 or #10

@manhere

manhere commented Jan 22, 2025

Description

It seems that with large content (tens to hundreds of KB), if the JS and CSS are packaged together into a single page, the output does not seem to be a big problem. But when the web page uses external JS and CSS files, resulting in multiple concurrent requests, the server gets jammed or returns truncated content.

The files are stored in PROGMEM and gzip compressed. With the approach below, based on the description in the above paragraph:
AsyncWebServerResponse *response = request->beginResponse(200, "application/javascript", VENDOR_JS, VENDOR_JS_SIZE);
the output is completely unusable: the content is truncated.

Trying to use a chunked response instead:

server.on("/static/js/chunk-vendors.ddd1c7e2.js", HTTP_GET, [](AsyncWebServerRequest *request) {
        String len = String(VENDOR_JS_SIZE);
        if(request->header(asyncsrv::T_INM) == len) {
            request->send(304);
            return;
        }
        AsyncWebServerResponse *response = request->beginChunkedResponse("application/javascript", [](uint8_t *buffer, size_t maxLen, size_t index) -> size_t {
            if(index >= VENDOR_JS_SIZE) {
                return 0; // everything sent (check before subtracting: size_t cannot go negative)
            }
            size_t remainingSize = VENDOR_JS_SIZE - index;
            // cap each chunk at 1024 bytes and at whatever fits in the socket buffer
            const size_t chunkSize = min((size_t)1024, min(maxLen, remainingSize));
            memcpy_P(buffer, VENDOR_JS + index, chunkSize);
            return chunkSize;
        });
        response->addHeader(asyncsrv::T_Cache_Control, "public,max-age=600");
        response->addHeader(asyncsrv::T_ETag, len);
        response->addHeader("Content-Encoding", "gzip");
        request->send(response);
    });

This barely works; occasionally the content is still truncated.
Increasing "CONFIG_ASYNC_TCP_QUEUE_SIZE" and "CONFIG_ASYNC_TCP_STACK_SIZE" does not help much either.

Thanks for your help.

Board: esp32dev ("flash_size": "4MB", "maximum_ram_size": 327680)

@manhere manhere changed the title [BUG]Stability needs to be improved with oncurrently requests [BUG]Stability needs to be improved with concurrent requests Jan 22, 2025
@mathieucarbou
Member

@manhere : hello, I've just updated the issue template. Can you please go through it below and give the missing info where applicable? Thanks!

PLEASE MAKE SURE TO READ THE RECOMMENDATIONS BEFORE OPENING A BUG REPORT:

/~https://github.com/ESP32Async/ESPAsyncWebServer?tab=readme-ov-file#important-recommendations-for-build-options

PLEASE READ THE FOLLOWING BEFORE OPENING A BUG REPORT:

If you want to discuss a feature, improvement, or ask a question, please create a discussion here: /~https://github.com/ESP32Async/ESPAsyncWebServer/discussions/categories/general

If you have a crash, regression or found an issue, please give the following information:

  • Which board:
  • Which platform (ESP8266, ESP32):
  • AsyncTCP version (if applicable):
  • AsyncTCP repository location you use (if applicable and if you use one coming from elsewhere):
  • ESPAsyncTCP version (if applicable):
  • ESPAsyncTCP repository location you use (if applicable and if you use one coming from elsewhere):
  • Do you use AsyncTCPSock (y/n):

Please post a complete decoded stack trace below in case of a crash (if applicable):


Please post some code, ino file, minimal reproducible use case (if applicable):

@manhere
Author

manhere commented Jan 22, 2025

board: esp32dev (esp32-wroom-32)
platform (ESP8266, ESP32): ESP32
framework = arduino

Dependency Graph
|-- ESPAsyncWebServer @ 3.6.0
|-- AsyncTCP @ 3.3.2
|-- WiFi @ 2.0.0

The following attachment is a minimal reproduction project:

webpage.zip

thanks

@mathieucarbou
Member

@manhere : and your progmem content is really all gzip encoded right ?

@manhere
Author

manhere commented Jan 22, 2025

@manhere : and your progmem content is really all gzip encoded right ?

yes

=======================================================
The code attached above basically works, but reloading the page results in incomplete downloads, leading to a browser error of "content length mismatch" or "decode failed" (== unzip failed?). The chunked method also gives low performance: "chunk-vendors.ddd1c7e2.js" (169 KB) needs about 15 seconds to load, and adjusting the chunk size causes the program to crash.

Please advise how to improve performance for large file downloads.

thanks

@mathieucarbou
Member

mathieucarbou commented Jan 22, 2025

I found what is happening.

@vortigont implemented in AsyncTCP and ESPAsyncWebServer a sort of throttling for responses that take a long time to commit to the network: when serving files from an SD card (which is slow), a response can block for a long time, which can cause the ESP32 to deadlock or trigger the watchdog.

Refs:

So this is implemented in the chunk code also.

See: 3d3456e#diff-646b25b11691c11dce25529e3abce843f0ba4bd07ab75ec9eee7e72b06dbf13fR388-R392

// for response data we need to control the queue and in-flight fragmentation. Sending small chunks could give low latency,
// but flood asynctcp's queue and fragment socket buffer space for large responses.
// Let's ignore polled acks and acks in case when we have more in-flight data then the available socket buff space.
// That way we could balance on having half the buffer in-flight while another half is filling up, while minimizing events in asynctcp q

In the example above, a 1024-byte chunk size is used, which deliberately increases the number of chunks. But in an app use case, you need to set the chunk size as high as your heap will possibly allow. 1 KB is really low! You are serving a LOT of resources with small chunks, so you increase the concurrency level of chunk serving, which is slower because chunks are served in loops of small buffers, and the requests compete with each other because several long-running transfers have to be served at the same time.

If you look at the logs, you should see a lot of:

 85114][D][WebResponses.cpp:362] _ack(): (chunk) out of in-flight credits

It would be more efficient in your case:

  • to serve the static files sequentially, using one chunk size of 8196 for example.
  • to limit the chunk serving (i.e. some resources like INDEX_HTML do not need it)
  • to increase the chunk buffer - for example, with 8196 there is no such message anymore (a sketch of a larger-chunk handler follows).
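
For illustration, a minimal sketch of the same handler with a larger chunk cap (same VENDOR_JS / VENDOR_JS_SIZE symbols and URL as in the report; the 8192 cap is only an example value, size it to the heap you actually have free):

server.on("/static/js/chunk-vendors.ddd1c7e2.js", HTTP_GET, [](AsyncWebServerRequest *request) {
    AsyncWebServerResponse *response = request->beginChunkedResponse("application/javascript",
        [](uint8_t *buffer, size_t maxLen, size_t index) -> size_t {
            if (index >= VENDOR_JS_SIZE) {
                return 0; // everything has been sent
            }
            // fill as much of the socket buffer as possible, capped at 8192 bytes per call
            const size_t chunkSize = min((size_t)8192, min(maxLen, VENDOR_JS_SIZE - index));
            memcpy_P(buffer, VENDOR_JS + index, chunkSize);
            return chunkSize;
        });
    response->addHeader("Content-Encoding", "gzip");
    request->send(response);
});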

@mathieucarbou mathieucarbou added Type: Question Further information is requested and removed Status: Awaiting triage labels Jan 22, 2025
@mathieucarbou
Member

mathieucarbou commented Jan 22, 2025

@vortigont @me-no-dev : about the use case above... I think _in_flight_credit should only be used in the case of really slow chunk serving (like the bug we fixed with serving from sd card), but maybe not in all cases ?

What do you think ?

Should we just surround the code with a macro, so that the user activates the feature globally only if their app is doing some slow file serving? Or should it be there by default, with a macro to disable it?

@mathieucarbou
Member

mathieucarbou commented Jan 22, 2025

@vortigont : I have opened PR #7 but I will let you check first in case you see a better solution...

@vortigont

I would strongly advise using the built-in static handler for files stored on a filesystem.
For "/static/js/chunk-vendors.ddd1c7e2.js", your code with the String buffer is far from perfect. Also, you do not need the _P functions on ESP32.

@mathieucarbou
Member

@manhere : I added a macro flag: -D ASYNCWEBSERVER_USE_CHUNK_INFLIGHT=0|1, set to 0 by default.
So the throttling code is disabled by default which should solve your issue.

And the few people who need slow downloads with chunked encoding, like file transfers from an SD card, will be able to switch it to 1 and build their application accordingly.
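
For illustration, with PlatformIO such a flag would go into build_flags, e.g. (a sketch only; adjust the value as discussed in this thread):

build_flags =
  ; 0 = keep the chunk in-flight throttling disabled (fast PROGMEM serving)
  ; 1 = enable it for slow chunk sources such as SD cards
  -D ASYNCWEBSERVER_USE_CHUNK_INFLIGHT=0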

Note that the way your code is, it is still subject to some issues long-term, because you are not really increasing response speed by increasing concurrency and lowering the chunk buffer size: you are mostly increasing the workload on the ESP side. If, for example, you happen to have some file download on top of that, you will run into issues.

It would be better to serve 4 requests consecutively with one chunk buffer of 8192 than to serve 4 concurrent chunked requests, each with a buffer of 1024.

Imagine you have 2 or 3 users opening your page at the same time ;-)

@manhere
Author

manhere commented Jan 22, 2025

@mathieucarbou
Thanks for the reply.
As mentioned above, if I change 1024 to 8192, the page stops responding; the browser developer tools' Network tab shows "pending". Not sure how to fix this.

@mathieucarbou
Member

mathieucarbou commented Jan 22, 2025

As mentioned above, if I change 1024 to 8192, the page stops responding; the browser developer tools' Network tab shows "pending". Not sure how to fix this.

I was not able to reproduce that with esp32dev and Chrome, sorry... And with the information we have here in this issue there is nothing we can help with on that point.

Note: for now, can you please update your lib_deps section and point to:

/~https://github.com/ESP32Async/ESPAsyncWebServer/archive/refs/heads/main.zip

to test ?
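
For example, a minimal platformio.ini sketch (env values assumed from the board info above; PlatformIO should pull AsyncTCP in as a declared dependency):

[env:esp32dev]
platform = espressif32
board = esp32dev
framework = arduino
lib_deps =
  /~https://github.com/ESP32Async/ESPAsyncWebServer/archive/refs/heads/main.zip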

@mathieucarbou mathieucarbou self-assigned this Jan 22, 2025
@manhere
Author

manhere commented Jan 23, 2025

I was not able to reproduce that with esp32dev and Chrome, sorry... And with the information we have here in this issue there is nothing we can help with on that point.

Note: for now, can you please update your lib_deps section and point to:

/~https://github.com/ESP32Async/ESPAsyncWebServer/archive/refs/heads/main.zip

to test ?

Thanks for the reply. Using the configuration you provided, the results remain the same. In my real project (using LVGL to refresh the screen showing some sensor parameters, an SD card to log them, and the web server to provide a page for downloading the log file) I still can't set a higher chunk size: the web page is completely unresponsive (at this point the output of ESP.getFreeHeap() is around 60000). Only with the skeleton project (ESPAsyncWebServer ONLY) can I set the chunk size to 8192 and get better performance (the 169 KB JS file is transferred in about 1 second; ESPAsyncWebServer@3.6.0 took 10-14 seconds).

I don't have a lot of experience with ESP32; any suggestions for this situation? Thanks.

@vortigont

In our recent tests we were able to serve files of dozens of megabytes from an SD card without any problems.
And I do not think that chunked encoding is the right way to serve large blobs. The blob size is known in advance, and the server can handle its transfer with a regular response of known size.

@manhere, could you check two options (a sketch of the second one follows the list):

  • try the built-in AsyncProgmemResponse; it was intended for such cases, though its name is confusing on ESP32 since it does not have PROGMEM
  • try whether you can serve this same file from LittleFS with the static handler. It might not be what you want in your project, but just for the sake of testing
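
A minimal sketch of the second option (assumptions: the file has been uploaded to a LittleFS image under /static/, WiFi setup is omitted; if the gzipped copy is stored as chunk-vendors.ddd1c7e2.js.gz, the static handler should serve it with Content-Encoding: gzip on its own):

#include <LittleFS.h>
#include <ESPAsyncWebServer.h>

AsyncWebServer server(80);

void setup() {
  LittleFS.begin();
  // serve everything under /static/ straight from the filesystem;
  // the handler picks its own transfer sizes, no manual chunk callback needed
  server.serveStatic("/static/", LittleFS, "/static/")
        .setCacheControl("public,max-age=600");
  server.begin();
}

void loop() {}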

@manhere
Author

manhere commented Jan 23, 2025

  • try the built-in AsyncProgmemResponse; it was intended for such cases, though its name is confusing on ESP32 since it does not have PROGMEM

That's what I was using in the beginning, and it was completely unusable. Maybe it's because my program is running low on memory... LVGL takes up most of it, and the free heap now has only about 60000 bytes available. It's really depressing.

@mathieucarbou
Member

ESPAsyncWebServer@3.6.0 took 10-14 seconds).

Did you try with ESPAsyncWebServer@3.6.1 ?

@mathieucarbou
Member

  • try the built-in AsyncProgmemResponse; it was intended for such cases, though its name is confusing on ESP32 since it does not have PROGMEM

@manhere : @vortigont is totally right also. Do not use chunked encoding for your embedded flash data.

Use: void send(int code, const char* contentType, const uint8_t* content, size_t len)

If you have other issues with that, it is probably more related to the low heap. send() will serve the data in chunk sizes optimized to match the client pcb's free space (the free lwip TCP buffer), so each write can be larger than the 1024 you have in your code.

But whatever you do you are screwed by the low heap.

  • Using send() => you will run into issues because multiple requests could be concurrently served with a buffer size > 1024
  • Using chunks like you did => you will run into issues because the small chunk buffer slows down request processing and increases the number of "slow" requests to serve, while also accumulating chunk buffers that have to exist at the same time

You really need to refactor your application. If you are using LVGL for display and it takes a LOT of heap, then you need to design your website accordingly. The current one is not suitable at all. Instead, do what the ESP-DASH team does: serve only one blob containing HTML + script + CSS, and then open a WebSocket connection to communicate with the UI. Avoid HTTP requests as much as possible, especially concurrent ones.
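
As an illustration of that recommendation, a sketch reusing the beginResponse variant already shown in the original report (same VENDOR_JS / VENDOR_JS_SIZE symbols; beginResponse is used instead of the bare send() only so the gzip header can be attached):

server.on("/static/js/chunk-vendors.ddd1c7e2.js", HTTP_GET, [](AsyncWebServerRequest *request) {
    // the library picks write sizes matching the client's free TCP buffer space
    AsyncWebServerResponse *response = request->beginResponse(200, "application/javascript", VENDOR_JS, VENDOR_JS_SIZE);
    response->addHeader("Content-Encoding", "gzip");
    response->addHeader(asyncsrv::T_Cache_Control, "public,max-age=600");
    request->send(response);
});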

@mathieucarbou
Member

@manhere : in v3.6.2 you will need to set the flag: -D ASYNCWEBSERVER_USE_CHUNK_INFLIGHT=0 in your case if you keep the current code you have.

See: #7 (comment)
