Render action randomly fails when exporting to PDF: ERROR: Couldn't find open server #45
Comments
This error does not ring a bell to me. @cscheid, have you encountered this error before? @rogerbramon can you share more on the |
This is the first time I've seen this error. Thank you for the report! |
Content is basically pure Markdown and a mermaid diagram using |
I think this could come from that content. For PDF output, the diagram needs to be printed to an image using Chrome or Chromium, and that seems to be what causes the issue (server not found).
If you can set up a test repo that reproduces the problem, that would be really useful for investigating. |
Here you have the test repo, and I was able to reproduce the problem at the 3rd attempt (same code, just re-run jobs). Successful attempt: https://github.com/rogerbramon/test-quarto/actions/runs/3045981181/attempts/2 |
Thanks a lot!
It is not a good sign that this works on some runs and not on others... It looks like an internal issue in the runners when connecting to headless Chrome. 🤔 |
Could it be that Google Chrome sometimes takes a bit longer to spin up? If I get it right, you have a timeout of only 3 seconds. I tried adding a previous step that calls headless Chrome, and the problem does not occur then. That's why a timeout is my guess... Any thoughts?
- name: Check chrome
  run: |
    echo $(which google-chrome)
    $(which google-chrome) --headless --single-process https://www.chromestatus.com
Somewhat disconcertingly, the only hit on Google for that error is another open issue on Quarto: quarto-dev/quarto-cli#1822 |
Oh interesting. Thanks @cscheid! I am still trying to find a clean environment to reproduce the issue, as it is now working in my WSL after I installed the google-chrome deb package. I just don't know what changed. Before that I could reproduce it every time, though... Here it is also an error on Ubuntu in GHA. Maybe this will help us find the reason if I manage to debug this in the workflow directly. |
@cderv, you may use the action-tmate action to get access to the runner system via SSH and debug there. |
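For readers who want to try the tmate suggestion, here is a minimal sketch. It assumes the mxschmitt/action-tmate action at its v3 tag; the checkout version, the `failure()` condition, and the 15-minute limit are illustrative choices, not taken from the test repo.

```yaml
# Steps to place under jobs.<job_id>.steps (layout is an assumption)
- uses: actions/checkout@v3
- uses: quarto-dev/quarto-actions/setup@v2
  with:
    tinytex: true
- uses: quarto-dev/quarto-actions/render@v2
  with:
    to: pdf
# Assumption: open the SSH session only when an earlier step failed,
# so that successful runs are not held open.
- name: Debug over SSH with tmate
  if: ${{ failure() }}
  uses: mxschmitt/action-tmate@v3
  timeout-minutes: 15
```

Note that `failure()` only triggers when a step fails, so this would catch the `Couldn't find open server` error but not a render step that hangs without failing.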
Just to add a bit of info, I've also experienced this problem on macOS a couple of times. But it's much more difficult to reproduce. Not sure if that adds noise or are different issues, but besides the
It's not easy to reproduce, but it's easier to get the error when there's no instance of Google Chrome opened. |
So I managed to reproduce quarto-dev/quarto-cli#1822 locally. The issue there is systematic, not occasional, and is probably due to a missing system requirement for the Chromium that we install from puppeteer. I believe GitHub Actions runners have everything needed, so that is not the issue here, especially since this one is only occasional.
By the way this is the only hit because this is an error thrown by Quarto When the I have made a PR in Quarto so that we can have more information if the error is a failed attempt to run chrome.
We could indeed look into the timeout. However, I think we already make 60 attempts with 50s between each attempt. Maybe quarto-dev/quarto-cli#2499 will reveal a problem with running Chrome itself rather than with the timeout. It should be merged soon and available in a pre-release.
@rogerbramon this error is also thrown by Quarto when the targets are somehow not valid for the Chrome remote interface. That would mean Chrome was launched correctly and Quarto connected correctly to the remote debugging port, but something else is not right in the interaction with the headless browser. If you have an example to share where this happens, that would help |
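One way to check the launch and remote-debugging-port parts by hand is a diagnostic step like the sketch below. This is not how Quarto itself connects. It assumes `google-chrome` and `curl` are on the runner's PATH and that port 9222 is free, and it relies on the DevTools HTTP endpoints `/json/version` and `/json/list` that Chrome exposes when started with `--remote-debugging-port`. The step name and the 30-second budget are arbitrary.

```yaml
# Step to place under jobs.<job_id>.steps (diagnostic sketch only)
- name: Probe Chrome remote debugging
  run: |
    # Assumption: port 9222 is free on the runner.
    google-chrome --headless --remote-debugging-port=9222 about:blank &
    CHROME_PID=$!
    # Poll once per second, for up to 30 seconds, until the endpoint answers.
    for i in $(seq 1 30); do
      if curl -sf http://127.0.0.1:9222/json/version; then
        break
      fi
      sleep 1
    done
    # List the targets a remote-interface client would see;
    # this fails the step if Chrome never came up.
    curl -sf http://127.0.0.1:9222/json/list
    kill "$CHROME_PID"
```

If the loop needs several seconds before the endpoint answers, that would point toward the slow-startup theory. An empty or odd target list would point toward the invalid-targets case described above.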
Thanks @cderv for your time. I tried the latest pre-release version (1.2.134) on the testing repo and, unfortunately, I've not been able to get the error because the render step now hangs (3 of 7 attempts). I had to cancel the workflow after 5min. https://github.com/rogerbramon/test-quarto/actions/runs/3088361081/jobs/4994835499 Regarding the |
That is bad. It is probably a different issue than quarto-dev/quarto-cli#1822 - I'll look into that next; |
Hi @cderv, not sure if it's related to that but with version 1.3 sometimes it hangs forever. |
Oh no... sorry about that. We added printing the stack trace by default when there is an error. Do you have more error output to share, or a link to an action log? Is this happening every time? |
It happens randomly like before, but now it doesn't fail but keeps running, and you need to cancel it. I'm not able to see any log. Using the same test repo, I just updated to use the latest version. You can see that Attempt #1 and Attempt#3 got stuck and I had to cancel them, and Attempt#2 succeeded. I enabled debug logging on the latest attempt, but I don't see many insights. Thanks. |
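Until the cause is found, a hung render can at least be cut short automatically instead of being cancelled by hand. The sketch below uses the standard `timeout-minutes` key on a step. The 10-minute value is an assumption and should be set comfortably above the duration of a healthy render.

```yaml
# Step to place under jobs.<job_id>.steps
- name: Render
  uses: quarto-dev/quarto-actions/render@v2
  # Assumption: a healthy render finishes well within 10 minutes.
  timeout-minutes: 10
  with:
    to: pdf
```

This does not fix the hang. It only turns it into a prompt failure so the job can be re-run sooner.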
That will not be easy to debug. I am surprised we get no log at all, no trace. Thanks for the report again. I don't think that will change anything, because it is probably not the action itself and something on GHA runners with Sorry for the inconvenience, I will try to investigate but not sure where to look exactly. |
Thanks @cderv, what I've noticed is that this issue seems to disappear when adding a step that calls chromium before running quarto:
- name: Check chromium
  run: |
    echo $(which chromium-browser)
    $(which chromium-browser) --headless https://www.chromestatus.com
With this step, the action ran successfully 10 times in a row. However, as soon as I removed it, it started freezing again. HTH |
Oh, that is a really interesting debugging step. 🤔 Can you help me understand your testing further?
Does this step only call chromium and close it? Or do you think it leaves it open for the next step? I wonder what effect this command could have. I don't know if you have the environment needed, but did you observe the freezing on a non-GHA Ubuntu machine? I'll read through the code in quarto with this new information in mind. @cscheid if you have ideas, feel free to chime in. |
@rogerbramon That is super fascinating. I wonder if that check causes some deferred library loading that takes a while, and prevents an eventual race condition. I would be happy keeping the check in our actions. In fact, I wonder if this check would also fix some of the hard-to-track bugs we've been seeing with chromium in Linux on the quarto-cli repo! @cderv What do you think about simply adding that action to our render step? |
Oh, great idea! Not sure why it did not occur to me 😅 I guess it costs nothing to do it, just in case someone needs chrome with Quarto. Sounds good! |
It does cost the time to run it, but that really shouldn't be much. Let's try that! |
I have a new v2 release of the action that includes this fix, for Ubuntu only right now. The macOS and Windows runners need some adjustment.
|
Thank you guys for looking into this. I don't have more info at this moment to answer your questions. I'll need to invest more time. So far, I've only experienced this issue on GHA. Locally, I use Mac. I can try to use devcontainers or codespaces to see if this issue can be reproduced.
Would it make sense to add this workaround to the Setup action instead of the render one? I'm saying this because the Publish action also renders by default and sometimes, depending on the parameters you need, you have to use a shell step to directly run |
Oh indeed... It would probably make sense to add that to the setup action instead so that it covers render and publish. Thanks for the feedback ! |
In my case, I don't use the render action because I need to define extra parameters to the Not sure if anyone is using the render or publish action without the setup, but in my case I find it very useful. I don't want to force anything, just explaining my use case. |
Maybe we should allow that too ?
That makes sense. We could probably expect someone using the publish or render action to have used the setup action in the first place. Maybe we should document this chromium trick also |
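A sketch of what documenting the trick could look like for someone who uses only the setup action and renders from a shell step. The warm-up step is copied from the comment earlier in this thread. The checkout version, the `tinytex` input, and the render command line are assumptions for illustration.

```yaml
# Steps to place under jobs.<job_id>.steps
- uses: actions/checkout@v3
- uses: quarto-dev/quarto-actions/setup@v2
  with:
    tinytex: true
# Warm-up step as reported in this thread (Ubuntu runners only).
- name: Check chromium
  run: |
    echo $(which chromium-browser)
    $(which chromium-browser) --headless https://www.chromestatus.com
# Assumption: render directly so that any extra parameters can be passed.
- name: Render with custom parameters
  run: quarto render --to pdf
```

Later comments in this thread report that the workaround does not fully remove the problem and that it can itself hang after a Chrome update, so treat it as an experiment rather than a fix.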
Would be handy to have a free parameter to add whatever parameters you need. |
Hey, just to let you know that I've been using this workaround for a while, but unfortunately we still experience the problem randomly. |
So even for ubuntu this is not enough ? It seems really related to GHA runners. We are looking at new chrome development that may improve things for Quarto https://developer.chrome.com/blog/chrome-for-testing/ |
I just wanted to report that we were experiencing something similar in the mlr3 book, i.e. the action just times out randomly. When rendering with the
the chapter that uses mermaid ended with
One CI run where this happened can be found here. We are rendering to both html and pdf. What we also observed (not with 100% certainty, as the error is stochastic) is that rendering to html and pdf in two separate CI steps ( We also included the installation of chromium in our CI:
|
Thanks a lot for the detailed explanation.
There was a hang in the CI and it was cancelled because I am not surprised that the chapter with mermaid is the issue. We think this issue is related to using Chrome on GHA runners, but we really don't know what is happening or how to solve it. Initiating Chromium seemed to help for a while (#45 (comment)), but the error still appears. Maybe the version we allow to install with It could be worth a try, to see if you still encounter the issue? |
Thanks for the quick response! For now I will just render the mermaid diagram once and include it as a figure, until it is clear what the bug was and how it can be solved. What was also surprising, in retrospect, is why the
In our particular case the additional information about the knitr engine allowed us to track down the bug. |
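For anyone wanting the same extra engine output, the flag named in the mlr3 commit message referenced further down this page is `--execute-debug`. A minimal sketch of a shell render step using it follows. The step name and the choice of `--to pdf` are assumptions, and this is not the mlr3 book's actual workflow.

```yaml
# Step to place under jobs.<job_id>.steps
- name: Render book with engine debug output
  # --execute-debug prints additional output from the execution engine (here knitr).
  run: quarto render --to pdf --execute-debug
```

Comparing where the debug lines stop for a hanging chapter against a healthy one is how the mlr3 report narrowed the problem down to a single chapter.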
This is mainly a documentation issue; This flag will set a |
Thanks! I have created an issue about this: quarto-dev/quarto-cli#7502 |
For a while, we have suffered from random timeouts when rendering the book. The log output from GHA during the "Render book" stage just ended with:
|.............................................| 100%
output file: advanced_technical_aspects_of_mlr3.knit.md
Error: The operation was canceled.
after hitting the maximum runtime of 6 hours (the success rate was around 50/50).
When rendering with the --execute-debug flag, more log output was given. The log output at the end of rendering the technical chapter (pdf) was:
| |.............................................| 100%
output file: advanced_technical_aspects_of_mlr3.knit.md
[knitr engine]: writing results
[knitr engine]: exiting
Error: The operation was canceled.
For the other chapters, when rendering to pdf, the output was:
|............................................| 100%
output file: preprocessing.knit.md
[knitr engine]: writing results
[knitr engine]: exiting
[knitr engine]: postprocess
[knitr engine]: writing results
[knitr engine]: exiting
--> Something went wrong with the postprocessing, and the bug was identified to be in the technical chapter. The problem was **NOT** the large-scale benchmarking chapter. quarto-dev/quarto-actions#45
we need to deal with initial issue considering new chrome update #45
Initially done because of quarto-dev/quarto-actions#45
FWIW, the Chrome update is causing some problems, and the previous "fix" discussed above (#45 (comment)) is creating issues (it is hanging the workflow). It seems using headless Chrome in CI is not that easy. I am probably going to revert the change that added the above line by default in the render action. It is currently causing hangs in all actions using render. |
Occasionally, quarto-actions/render@v2 fails to run, causing ERROR: Couldn't find open server. I've only experienced this problem when rendering to PDF; HTML output seems fine. Has anyone experienced it?
Tested on:
Ubuntu 20.04 and 22.04
Quarto: latest version (1.1.189) and a pinned one (1.0.37)
Workflow example:
This is the output of all Quarto actions of the workflow when fails: