Discussion about use of https://web.lcrc.anl.gov/ in CI #900
Comments
@mahf708, do you have suggestions (e.g. containers) that we could use for e3sm_diags instead of downloading directly from the LCRC server?
We are seeing time-outs in conda-forge/e3sm_diags-feedstock#38, which are likely to cause ongoing trouble building conda packages.
I was able to get CI to pass on conda-forge/e3sm_diags-feedstock#38 after restarting 3 times.
After 8 attempts, I was finally able to get the conda package to build. Needless to say, this is not sustainable.
I will write a more detailed comment, but I think we should use a container. I can make one for e3sm diags like the others I made for testing in https://github.com/E3SM-Project/containers
It's been on my list of todos to get a generic conda container that has some of our data from the servers... I disabled two workflows (scream defaults and mkatmsrf...) for this very reason.
@xylar yes, this has now become an outstanding issue, and we should find alternatives for hosting the data needed for CI. Does MPAS-Analysis have a similar need, or is it handled differently? @mahf708 it looks like the containers repo already has code for pulling data from the input data directory; it sounds like we can just mimic that to add the diagnostics data.
@chengzhuzhang, this issue doesn't affect MPAS-Analysis because we don't try to do anything so sophisticated in CI. I still run tests manually on Chrysalis as needed.
I had an inclination that running the GH Actions build with Python 3.9-3.12, with each run simultaneously downloading the same data, would get us throttled by LCRC. If a short-term solution is needed, we can make GH Actions run only when a PR is marked as ready for review. A possible alternative solution that was mentioned before is to cache the diagnostics data on GitHub Actions, then update the cache whenever updated diags data is detected on LCRC. It looks like we still need a general solution for Azure Pipelines, though.
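For reference, here is a minimal GitHub Actions sketch of the two ideas above (skip draft PRs, and cache the diagnostics data between runs). The cache path, cache key, and download command are assumptions for illustration, not the actual e3sm_diags workflow:

```yaml
# Hypothetical workflow fragment; paths, keys, and the download command are placeholders.
name: CI
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  test:
    # Skip draft PRs so every push to a work-in-progress branch doesn't hit LCRC.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Restore previously downloaded diagnostics data instead of re-fetching it.
      - uses: actions/cache@v4
        id: diags-cache
        with:
          path: tests/integration/integration_test_data
          key: diags-data-${{ hashFiles('tests/integration/download_data.py') }}
      # Only reach out to LCRC on a cache miss (or when the download script changes).
      - if: steps.diags-cache.outputs.cache-hit != 'true'
        run: python tests/integration/download_data.py
```

Note that GitHub caps Actions cache storage at roughly 10 GB per repository, so this only helps if the diagnostics data fits within that budget.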
I think we really need to make it forbidden to download files from LCRC in CI. It's badly affecting our ability to do other work.
So I think even if we allow it in fewer circumstances, it's still not good enough.
I would like to make a container based on the official conda-forge miniconda container, then add the needed inputdata to it. I will put up a prototype on https://github.com/E3SM-Project/containers in the next few days (I need to collect info about the data needed)
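To illustrate how such a container could be consumed once it exists, a CI job can declare a prebuilt image so the inputdata is already baked in and nothing is fetched from LCRC at test time. The image name, tag, and test command below are purely hypothetical placeholders for whatever ends up in the containers repo:

```yaml
# Sketch only; the image name/tag and the test command are placeholders.
jobs:
  test:
    runs-on: ubuntu-latest
    # Run inside a prebuilt image that already bundles conda and the diagnostics inputdata.
    container:
      image: ghcr.io/e3sm-project/e3sm-diags-ci:latest
    steps:
      - uses: actions/checkout@v4
      # Tests read the baked-in data from the image instead of downloading from LCRC.
      - run: pytest tests/integration
```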
Can we get @rljacob to weigh in just in case he prefers something else? Rob, should we institute a policy that none of our testing should be touching the inputdata server? I doubt it is the sole reason we are seeing issues, but who knows... I am happy to streamline a few containers with everything we need, so that we have no reason to download stuff from the server.
A resolution is offered in #901
Yes, the PR testing that might run several times a day on cloud resources should not be downloading lots of data. "Lots" is somewhat subjective.
We have been getting downloads throttled by LCRC on https://web.lcrc.anl.gov/ because we use too much bandwidth. This is affecting research.
We should seek alternatives to https://web.lcrc.anl.gov/ in our CI, e.g.:
e3sm_diags/tests/integration/download_data.py, line 89 (at commit ca41b0e)