
perf: monitor PDF generation performance #247

Closed
getlarge opened this issue Feb 21, 2024 · 6 comments · Fixed by #248
@getlarge (Contributor)

Description

From our usage of S1Seven's platform, we observed that producing a PDF for an average certificate takes 2-3 seconds.
However, we have yet to determine where most of the time is spent and whether the process can be improved.

I suggest running measurements in our example scripts and, eventually, adding thresholds to our integration tests.

@getlarge getlarge added test Adding tests perf labels Feb 21, 2024
@getlarge getlarge self-assigned this Feb 21, 2024
@getlarge getlarge linked a pull request Feb 21, 2024 that will close this issue
@getlarge (Contributor, Author) commented Feb 22, 2024

Benchmark

Here is the outcome of some high-level benchmarks of the PDF generation examples.
I refactored the examples to isolate the following phases:

  • generateContentInSandbox -> running the compiled script that generates the pdfmake Content from the JSON certificate inside the Node.js vm module
  • print -> instantiating PdfPrinter and creating the pdfkit PDFDocument
  • store -> piping the PDFDocument stream to a writable stream that stores the output in a local file, until completion

generate-en10168-pdf-template

| Phase                    | Duration (ms) | Heap memory used (MB) |
| ------------------------ | ------------- | --------------------- |
| before                   | X             | 114.41                |
| generateContentInSandbox | 12.679        | 114.86                |
| print                    | 753.185       | 135.00                |
| store                    | 1294.031      | 125.32                |
| after                    | 2060.447      | 125.34                |

generate-coa-pdf-template

| Phase                    | Duration (ms) | Heap memory used (MB) |
| ------------------------ | ------------- | --------------------- |
| before                   | X             | 116.73                |
| generateContentInSandbox | 35.927        | 121.09                |
| print                    | 665.618       | 142.89                |
| store                    | 1207.106      | 133.04                |
| after                    | 1909.240      | 133.05                |

It is quite surprising that most of the processing time is spent in the store phase, while the pdfkit PDFDocument stream is being read.
I will generate some CPU profiles to check precisely where (in which modules and functions) the time is spent.

@getlarge (Contributor, Author) commented Feb 22, 2024

The flame graph below results from CPU profiling of the store phase. It shows that most of the time is consumed by the function tinf_inflate_block_data, which comes from the tiny-inflate library and decompresses buffers during font processing (!).

[Screenshot: flame graph of the store phase, 2024-02-22 at 10:54:08]

I can now think of the following potential improvement tracks:

  • understand why font processing is so time-consuming (could it be affected by the type/format of the fonts?)
  • disable compression/decompression (this could increase memory consumption, but might be acceptable for the JSON certificates)
  • replace the decompression library with a more performant one
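On the second track: pdfkit exposes a `compress` constructor option that disables deflate compression of the generated PDF's content streams. This is only a configuration sketch, untested here, with an important caveat: the hot spot identified above is decompression of the font files at load time, not compression of the output, so the gain is uncertain.

```javascript
// Configuration sketch (assumption: this repo creates documents via
// pdfmake's PdfPrinter, so this pdfkit option may need to be forwarded
// through pdfmake). `compress: false` disables deflate compression of
// the output content streams.
const PDFDocument = require('pdfkit');

const doc = new PDFDocument({ compress: false });
```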

@getlarge (Contributor, Author)

The most straightforward track is "try another font". I did that, and the results are hard to believe!

generate-coa-pdf-template

| Phase                    | Duration (ms) | Heap memory used (MB) |
| ------------------------ | ------------- | --------------------- |
| before                   | X             | 121.57                |
| generateContentInSandbox | 32.749        | 125.93                |
| print                    | 107.098       | 125.01                |
| store                    | 108.804       | 127.53                |
| after                    | 249.143       | 127.53                |

Yes, you read that correctly: the process is about 7 times faster.
To achieve this result, I simply replaced the current lato-font package with this one.
I understand that lato-font is being replaced, but I encourage you to run this benchmark before making a choice.

I suspect performance could be even better with optimized fonts in the WOFF2 format, but during that trial I ran into a
RangeError [ERR_BUFFER_OUT_OF_BOUNDS]: Attempt to access memory outside buffer bounds thrown by fontkit.

@getlarge (Contributor, Author)

And this is the rendered PDF:
[Attachment: generating.pdf]

The font weights are not exactly the same as before, but it is just a matter of picking the right ones from the @fontsource/lato package.
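For reference, swapping the fonts in pdfmake boils down to pointing the PdfPrinter font descriptors at the new files. A configuration sketch, assuming `@fontsource/lato` is installed; the file names below are illustrative and should be checked against the installed package layout:

```javascript
// Configuration sketch: pdfmake font descriptors pointing at the
// @fontsource/lato files. The exact file names under `files/` are an
// assumption; verify them in the installed package.
const PdfPrinter = require('pdfmake');

const printer = new PdfPrinter({
  Lato: {
    normal: require.resolve('@fontsource/lato/files/lato-latin-400-normal.woff'),
    bold: require.resolve('@fontsource/lato/files/lato-latin-700-normal.woff'),
    italics: require.resolve('@fontsource/lato/files/lato-latin-400-italic.woff'),
    bolditalics: require.resolve('@fontsource/lato/files/lato-latin-700-italic.woff'),
  },
});
```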

@eamon0989 (Member)

Nice job with the benchmarking, remarkable how much of a difference changing the font made! After reading through this, I recommend that the related PR be merged first and that @christophbuehler runs the performance benchmarks against the proposed new font as the time saved is significant!

@christophbuehler (Contributor)

> Nice job with the benchmarking, remarkable how much of a difference changing the font made! After reading through this, I recommend that the related PR be merged first and that @christophbuehler runs the performance benchmarks against the proposed new font as the time saved is significant!

Indeed, the results with a new font are surprising. I was also unable to produce a PDF/A-3 compliant document with that font because of some incorrect measurements. However, to keep things separated, we should merge #241 ASAP and then update the performance monitoring (this branch) to take the most recent code changes (attachments / new font / PDF/A-3) into consideration. Let's not mix responsibilities.
