From 7685921663d4a531310ac9a3eb621aef37819ae1 Mon Sep 17 00:00:00 2001
From: Joshua Pinter
Date: Sun, 25 Oct 2020 19:47:05 -0600
Subject: [PATCH] Read HTML string and generated PDF file in chunks.

When the HTML content and generated PDF files get quite large (how large is too large depends on your system, OS, config and available resources), trying to read all of the content into memory can lead to `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.

Instead of reading this content entirely into memory, it should be read in chunks to reduce memory usage.

After some benchmarking, a chunk size of 1 MB was picked (`1024 * 1024`).

Here are the benchmarks comparing different methods and chunk sizes for different content sizes:

```
13027836 bytes
13.03 MBs
                         user     system        total         real
write:               0.000767   0.004443     0.005210 (   0.005312)
each_char:           5.756789   0.032231     5.789020 (   5.797378)
each_byte:           8.997680   0.067377     9.065057 (   9.179755)
StringIO 1 KB:       0.004029   0.006966     0.010995 (   0.011648)
StringIO 1 MB:       0.016100   0.007118     0.023218 (   0.023509)
StringIO 10 MB:      0.003347   0.006924     0.010271 (   0.010334)
StringIO 100 MB:     0.000456   0.003758     0.004214 (   0.007080)
StringIO 1 GB:       0.000468   0.003787     0.004255 (   0.005037)

706583272 bytes
0.71 GBs
                         user     system        total         real
write:               0.001035   0.285726     0.286761 (   0.324529)
each_char:         362.444086   1.820033   364.264119 ( 365.362415)
each_byte:         548.788409   3.254867   552.043276 ( 553.390843)
StringIO 1 KB:       0.310588   0.331768     0.642356 (   0.697581)
StringIO 1 MB:       0.302101   0.325285     0.627386 (   0.671933)
StringIO 10 MB:      0.254845   0.294017     0.548862 (   0.895430)
StringIO 100 MB:     0.471879   0.429933     0.901812 (   1.181456)
StringIO 1 GB:       0.000471   0.260011     0.260482 (   0.653977)

5577825775 bytes
5.58 GBs
                         user     system        total         real
write:                  ERROR      ERROR        ERROR        ERROR
each_char:        2926.215017  38.658114  2964.873131 (3008.319599)
each_byte:        4305.082576  35.090730  4340.173306 (4363.944091)
StringIO 1 KB:       4.145908   3.962275     8.108183 (   9.490059)
StringIO 1 MB:       3.741062   2.779802     6.520864 (   7.423770)
StringIO 10 MB:      2.916272   2.553926     5.470198 (   6.271349)
StringIO 100 MB:     4.262794   3.007702     7.270496 (  10.986725)
StringIO 1 GB:       2.063459   4.572225     6.635684 (   9.212933)
```

You can see that with the 5.58 GB content size, using `write` didn't even complete. Instead, I received an `Errno::EINVAL` error.

This allows significantly larger PDFs to be generated.

Additionally, instead of just throwing a cryptic `Errno::EINVAL Invalid argument @ io_fread` error, I added a `rescue` that logs an error with a helpful description indicating whether the HTML content or the PDF file is too large.
---
 lib/wicked_pdf.rb | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/lib/wicked_pdf.rb b/lib/wicked_pdf.rb
index 925ed269..8d8e15bf 100644
--- a/lib/wicked_pdf.rb
+++ b/lib/wicked_pdf.rb
@@ -41,16 +41,22 @@ def pdf_from_string(string, options = {})
     options.merge!(WickedPdf.config) { |_key, option, _config| option }
     string_file = WickedPdfTempfile.new('wicked_pdf.html', options[:temp_path])
     string_file.binmode
-    string_file.write(string)
+    string_io = StringIO.new(string)
+    until string_io.eof?
+      string_file.write(string_io.read(1024 * 1024)) # Read 1 MB chunks at a time to avoid `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.
+    end
     string_file.close
 
     pdf = pdf_from_html_file(string_file.path, options)
     pdf
+  rescue Errno::EINVAL => e
+    Rails.logger.error '[wicked_pdf] The HTML file is too large! Try reducing the size or using the return_file option instead.'
+    raise e
   ensure
     string_file.close! if string_file
   end
 
-  def pdf_from_url(url, options = {})
+  def pdf_from_url(url, options = {}) # rubocop:disable Metrics/PerceivedComplexity
     # merge in global config options
     options.merge!(WickedPdf.config) { |_key, option, _config| option }
     generated_pdf_file = WickedPdfTempfile.new('wicked_pdf_generated_file.pdf', options[:temp_path])
@@ -75,11 +81,19 @@ def pdf_from_url(url, options = {})
     end
     generated_pdf_file.rewind
     generated_pdf_file.binmode
-    pdf = generated_pdf_file.read
+
+    pdf = ''
+    until generated_pdf_file.eof?
+      pdf << generated_pdf_file.read(1024 * 1024) # Read 1 MB chunks at a time to avoid `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.
+    end
+
     raise "Error generating PDF\n Command Error: #{err}" if options[:raise_on_all_errors] && !err.empty?
     raise "PDF could not be generated!\n Command Error: #{err}" if pdf && pdf.rstrip.empty?
     pdf
+  rescue Errno::EINVAL => e
+    Rails.logger.error '[wicked_pdf] The PDF file is too large! Try reducing the size or using the return_file option instead.'
+    raise e
   rescue StandardError => e
     raise "Failed to execute:\n#{command}\nError: #{e}"
   ensure