Switch from athenapdf to pagedjs-cli for HTML to PDF conversion #394
Comments
I tried a few quick tests with the pagedjs-cli Docker image from DockerHub, which corresponds to version 0.0.9. I was able to convert a toy HTML file that had a single header and a single paragraph. However, it hangs if I try to convert
where the page count continued increasing indefinitely until I killed it after 20 min. There's a good chance I'm doing something wrong or that it would work better by building the Docker image locally using their latest version of pagedjs. If anyone wants to test the Docker image, the executable is
I installed pagedjs-cli 0.1.1 from npm:

pagedjs-cli \
  --page-size=A4 \
  --inputs https://manubot.github.io/rootstock/v/97b294802ffcd39071b6e5b8ab59f60faf4be118/ \
  --output output/pagedjs.pdf

Output:
Here's the rendered PDF: pagedjs.pdf. Compare to the athenapdf PDF here, generated from lines 72 to 75 in 97b2948.
Opened upstream issues for the problems:
Thanks for opening those issues.
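For anyone scripting the test above, here is a minimal Python sketch that rebuilds the same pagedjs-cli invocation and runs it with a hard time limit, so a runaway conversion like the one that had to be killed after 20 minutes fails fast instead. The flags are the ones used in the command earlier in this thread; the helper names and the 300-second default are assumptions made for illustration only.

```python
import subprocess


def build_pagedjs_command(input_url, output_path, page_size="A4"):
    """Return the argv list for the pagedjs-cli call shown earlier in the thread."""
    return [
        "pagedjs-cli",
        f"--page-size={page_size}",
        "--inputs", input_url,
        "--output", output_path,
    ]


def run_with_timeout(command, timeout_seconds=300):
    """Run a command; return True on success, False on failure or timeout.

    Hypothetical helper: the 300-second default is an arbitrary guess,
    not a value recommended by pagedjs.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child process before re-raising.
        return False
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return False
    return completed.returncode == 0
```

A call would look like `run_with_timeout(build_pagedjs_command(url, "output/pagedjs.pdf"))`. Only the direct child process is killed on timeout, so any browser processes it spawned may need separate cleanup.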
Here's something I hadn't considered until now: writing our own PDF conversion. It actually might not be as hard as we think... Take a look at this library: https://github.com/Richienb/pdfly/blob/master/index.js All we really need to do is have a way to programmatically open an instance of Chrome (e.g. via Puppeteer) and print a document. https://github.com/westmonroe/pdf-puppeteer#readme (JavaScript)
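To make the "open Chrome and print" idea concrete without committing to Puppeteer, here is a rough Python sketch that shells out to a locally installed Chrome or Chromium using its `--headless` and `--print-to-pdf` command-line switches. This swaps in the browser's own CLI rather than the Puppeteer API that the libraries above use. The candidate binary names and the function names are assumptions, and plain browser printing does not by itself add paged-media features such as footnotes.

```python
import shutil
import subprocess

# Assumed binary names; the actual name varies by platform and install method.
CANDIDATE_BROWSERS = (
    "chromium",
    "chromium-browser",
    "google-chrome",
    "google-chrome-stable",
)


def find_browser(candidates=CANDIDATE_BROWSERS):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_print_command(browser, url, output_path):
    """Return argv asking headless Chrome/Chromium to print a URL to PDF."""
    return [browser, "--headless", f"--print-to-pdf={output_path}", url]


def html_to_pdf(url, output_path, timeout_seconds=120):
    """Print url to output_path; return True if the browser exited cleanly."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("No Chrome or Chromium binary found on PATH")
    command = build_print_command(browser, url, output_path)
    completed = subprocess.run(command, timeout=timeout_seconds)
    return completed.returncode == 0
```

Keeping the command construction separate from the subprocess call makes it possible to check the arguments without having a browser installed.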
That depends on how much functionality you’d like to support. Having a headless browser that generates a PDF is one thing; supporting CSS print features is far more complex (page numbers, cross-references, footnotes, etc.; check the list here). We’ve been working hard on footnotes for the last 6 months or so, so we’re a little behind our timeline, especially as there is a CLI update in the works. The issues you opened are the ones we want to check as soon as footnotes are shipped. Which features would you want to use?
Yes, those features are difficult. Afaik we don't support those features yet, which is why I suggested using Puppeteer. But those features have been requested and are something that the team has wanted to support for a long time, so perhaps using Puppeteer wasn't a good suggestion in the long term. It could be something to switch to in the short term if Athena gives us problems though. Fwiw, of that feature list, I believe the most requested ones were page numbers and footnotes.
It’s a good starting point to see what’s doable :)
Awesome, we’re almost there with that: page numbers already work fine (it’s easy to build a table of contents) :) I’ll come back when our release is testable, so we’ll be able to help you if you want to try it out.
I'd strongly prefer if we could piggyback on an existing project, as I don't think we want the responsibility of maintaining a converter. Athena has worked quite well, but is no longer maintained. I think HTML-to-PDF is a common enough conversion task that we should be able to find existing projects with long-term backing. Time might be best spent contributing features to existing projects if there are small blockers for Manubot's use case. The pagedjs feature list looks impressive. And its affiliation with Cabbage Tree Labs, whose mission is to make publishing more open, is promising. In my comment above, I linked to three issues that were potential blockers for Manubot to adopt pagedjs. I haven't gotten a reply on any of those issues. @julientaq is there a problem with notifications on the PagedMedia GitLab or insufficient developer bandwidth to respond to user feedback? We'd love to switch to pagedjs, and Manubot seems like an ideal use case for it, but we'll need the above issues looked at as well as more confidence that the project will have the resources to deal with user requests and bug reports in a timely fashion.
Noting that the source code for pagedjs has been migrated from
Interestingly, there is also a pagedjs GitHub repo at https://github.com/pagedjs/pagedjs. Not clear if that repo or https://gitlab.coko.foundation/pagedjs/pagedjs is where contributions should occur. @fchasen (active contributor) might know? Also @fchasen any ability to look into the issues we posted?
Hi there! I’m sorry, I completely missed your message (from last year; that’s not really acceptable, I’m sorry!). Basically, our GitLab got completely screwed up by a couple of attacks and issues, and it was so silent that it wasn’t addressed for a while. The GitHub repo was supposed to be a way to handle issues and merge requests coming in from different places, but it’s not working as we’d hoped (so long, interoperability :-/). So yes, we’re back on Coko’s GitLab, which is the right place to manage your issues. I’ll check your issues right now!
@dhimmel do you have an account on gitlab.coko.foundation, so I can add you to the issues?
Please see this issue for another strong reason we need to abandon Athena:

Key points:
Athena is using Electron 3.0.5. The current version of Electron is 18.
Electron 3.0.5 is using Chromium version 66.0.3359.181. The current version of Chrome is ~100.
Something about combining
Athenapdf has worked well but has two problems:
From https://www.pagedjs.org/documentation/02-getting-started-with-paged-js/
Links:
It looks like pagedjs-cli is installed via npm, with a Dockerfile available such that we could also create an image if needed.

First step is to see whether pagedjs-cli has conversion fidelity as good as or better than athenapdf.
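Before comparing fidelity by eye, an automated first pass can at least confirm that each converter produced a file that starts with the PDF header. A minimal sketch, assuming only that PDF files begin with the `%PDF-` marker; the function name is hypothetical, and passing this check says nothing about rendering quality.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if path is an existing file whose first bytes are '%PDF-'."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        # An empty or truncated file yields fewer than 5 bytes and fails here.
        return handle.read(5) == b"%PDF-"
```

Running this on both the athenapdf and pagedjs-cli outputs would catch the case where a converter exits without writing a usable file, leaving the visual comparison for the outputs that pass.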