Read HTML string and generated PDF file in chunks.
When the HTML content or the generated PDF file gets quite large (how large is too large depends on your system, OS, configuration and available resources), trying to read all of the content into memory at once can lead to `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.

Instead of reading the content entirely into memory, it should be read in chunks to keep memory usage down.
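
The change boils down to wrapping the string in a `StringIO` and copying it out in fixed-size chunks. Here is a minimal, standalone sketch of that pattern (the `write_in_chunks` helper and `CHUNK_SIZE` constant are illustrative names, not part of the gem):

```ruby
require 'stringio'

CHUNK_SIZE = 1024 * 1024 # 1 MB per chunk

# Copy `content` into `io` one chunk at a time instead of a single io.write(content).
def write_in_chunks(io, content)
  string_io = StringIO.new(content)
  io.write(string_io.read(CHUNK_SIZE)) until string_io.eof?
end
```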

After some benchmarking, a chunk size of 1 MB (`1024 * 1024` bytes) was picked. Here are the benchmarks comparing different methods and chunk sizes across different content sizes (a sketch of the benchmark harness follows the results):

```
13027836 bytes
13.03 MBs
                           user     system      total        real
write:                 0.000767   0.004443   0.005210 (  0.005312)
each_char:             5.756789   0.032231   5.789020 (  5.797378)
each_byte:             8.997680   0.067377   9.065057 (  9.179755)
StringIO 1 KB:         0.004029   0.006966   0.010995 (  0.011648)
StringIO 1 MB:         0.016100   0.007118   0.023218 (  0.023509)
StringIO 10 MB:        0.003347   0.006924   0.010271 (  0.010334)
StringIO 100 MB:       0.000456   0.003758   0.004214 (  0.007080)
StringIO 1 GB:         0.000468   0.003787   0.004255 (  0.005037)

706583272 bytes
0.71 GBs
                           user     system      total        real
write:                 0.001035   0.285726   0.286761 (  0.324529)
each_char:           362.444086   1.820033 364.264119 (365.362415)
each_byte:           548.788409   3.254867 552.043276 (553.390843)
StringIO 1 KB:         0.310588   0.331768   0.642356 (  0.697581)
StringIO 1 MB:         0.302101   0.325285   0.627386 (  0.671933)
StringIO 10 MB:        0.254845   0.294017   0.548862 (  0.895430)
StringIO 100 MB:       0.471879   0.429933   0.901812 (  1.181456)
StringIO 1 GB:         0.000471   0.260011   0.260482 (  0.653977)

5577825775 bytes
5.58 GBs
                            user     system       total         real
write:                     ERROR      ERROR       ERROR        ERROR
each_char:           2926.215017  38.658114 2964.873131 (3008.319599)
each_byte:           4305.082576  35.090730 4340.173306 (4363.944091)
StringIO 1 KB:          4.145908   3.962275    8.108183 (   9.490059)
StringIO 1 MB:          3.741062   2.779802    6.520864 (   7.423770)
StringIO 10 MB:         2.916272   2.553926    5.470198 (   6.271349)
StringIO 100 MB:        4.262794   3.007702    7.270496 (  10.986725)
StringIO 1 GB:          2.063459   4.572225    6.635684 (   9.212933)
```
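
For reference, here is a rough sketch of how a comparison like the one above could be reproduced with Ruby's `Benchmark` module. This is not the exact script used: `large.html` is an assumed pre-generated test file, and `with_tempfile` / `bench_chunked` are illustrative helpers; the labels and chunk sizes simply mirror the table.

```ruby
require 'benchmark'
require 'stringio'
require 'tempfile'

# Yield a fresh binary tempfile and clean it up afterwards.
def with_tempfile
  file = Tempfile.new('bench')
  file.binmode
  yield file
ensure
  file.close! if file
end

# Write `content` to a tempfile in chunks of `chunk_size` bytes and time it.
def bench_chunked(bm, label, content, chunk_size)
  bm.report(label) do
    with_tempfile do |file|
      string_io = StringIO.new(content)
      file.write(string_io.read(chunk_size)) until string_io.eof?
    end
  end
end

content = File.binread('large.html') # assumption: a pre-generated test file
puts "#{content.bytesize} bytes"

Benchmark.bm(20) do |bm|
  bm.report('write:')     { with_tempfile { |f| f.write(content) } }
  bm.report('each_char:') { with_tempfile { |f| content.each_char { |c| f.write(c) } } }
  bm.report('each_byte:') { with_tempfile { |f| content.each_byte { |b| f.write(b.chr) } } }
  bench_chunked(bm, 'StringIO 1 KB:',  content, 1024)
  bench_chunked(bm, 'StringIO 1 MB:',  content, 1024 * 1024)
  bench_chunked(bm, 'StringIO 10 MB:', content, 10 * 1024 * 1024)
end
```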

You can see that with the 5.58 GB content size, using `write` didn't even complete; instead, it raised an `Errno::EINVAL` error.

This change allows significantly larger PDFs to be generated.

Additionally, instead of just throwing a cryptic `Errno::EINVAL Invalid argument @ io_fread` error, I added a `rescue` that logs a helpful message indicating whether the HTML content or the generated PDF file is too large.
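
The shape of that error handling is roughly the following (a simplified sketch; the actual change is in the diff below, and `write_html_in_chunks` is a hypothetical helper, with the caller expected to pass something like `Rails.logger`):

```ruby
require 'stringio'

# Write `content` to `file` in 1 MB chunks, turning a cryptic Errno::EINVAL
# into a logged, human-readable hint before re-raising.
def write_html_in_chunks(file, content, logger:)
  string_io = StringIO.new(content)
  file.write(string_io.read(1024 * 1024)) until string_io.eof?
rescue Errno::EINVAL => e
  logger.error '[wicked_pdf] The HTML file is too large! Try reducing the size or using the return_file option instead.'
  raise e
end
```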
joshuapinter committed Oct 26, 2020
1 parent ee6a5e1 commit 7685921
Showing 1 changed file with 17 additions and 3 deletions.
lib/wicked_pdf.rb

```diff
@@ -41,16 +41,22 @@ def pdf_from_string(string, options = {})
     options.merge!(WickedPdf.config) { |_key, option, _config| option }
     string_file = WickedPdfTempfile.new('wicked_pdf.html', options[:temp_path])
     string_file.binmode
-    string_file.write(string)
+    string_io = StringIO.new(string)
+    until string_io.eof?
+      string_file.write(string_io.read(1024 * 1024)) # Read 1 MB chunks at a time to avoid `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.
+    end
     string_file.close

     pdf = pdf_from_html_file(string_file.path, options)
     pdf
+  rescue Errno::EINVAL => e
+    Rails.logger.error '[wicked_pdf] The HTML file is too large! Try reducing the size or using the return_file option instead.'
+    raise e
   ensure
     string_file.close! if string_file
   end

-  def pdf_from_url(url, options = {})
+  def pdf_from_url(url, options = {}) # rubocop:disable Metrics/PerceivedComplexity
     # merge in global config options
     options.merge!(WickedPdf.config) { |_key, option, _config| option }
     generated_pdf_file = WickedPdfTempfile.new('wicked_pdf_generated_file.pdf', options[:temp_path])
@@ -75,11 +81,19 @@ def pdf_from_url(url, options = {})
     end
     generated_pdf_file.rewind
     generated_pdf_file.binmode
-    pdf = generated_pdf_file.read
+
+    pdf = ''
+    until generated_pdf_file.eof?
+      pdf << generated_pdf_file.read(1024 * 1024) # Read 1 MB chunks at a time to avoid `Errno::EINVAL` errors like `Invalid argument @ io_fread` and `Invalid argument @ io_write`.
+    end
+
     raise "Error generating PDF\n Command Error: #{err}" if options[:raise_on_all_errors] && !err.empty?
     raise "PDF could not be generated!\n Command Error: #{err}" if pdf && pdf.rstrip.empty?

     pdf
+  rescue Errno::EINVAL => e
+    Rails.logger.error '[wicked_pdf] The PDF file is too large! Try reducing the size or using the return_file option instead.'
+    raise e
   rescue StandardError => e
     raise "Failed to execute:\n#{command}\nError: #{e}"
   ensure
```
