Discussion about use of https://web.lcrc.anl.gov/ in CI #900

Open
xylar opened this issue Nov 21, 2024 · 16 comments · May be fixed by #901

Comments

@xylar
Contributor

xylar commented Nov 21, 2024

We have been getting downloads throttled by LCRC on https://web.lcrc.anl.gov/ because we use too much bandwidth. This is affecting research.

We should seek alternatives to https://web.lcrc.anl.gov/ in our CI, e.g.:

"https://web.lcrc.anl.gov/public/e3sm/e3sm_diags_test_data/integration",

@xylar
Contributor Author

xylar commented Nov 21, 2024

@mahf708, do you have suggestions (e.g. containers) that we could use for e3sm_diags instead of downloading directly from the LCRC server?

@xylar
Contributor Author

xylar commented Nov 21, 2024

We are seeing time-outs in conda-forge/e3sm_diags-feedstock#38, which are likely to cause ongoing trouble building conda packages.

@xylar
Contributor Author

xylar commented Nov 21, 2024

I was able to get CI to pass on conda-forge/e3sm_diags-feedstock#38 after restarting 3 times.

@xylar
Contributor Author

xylar commented Nov 21, 2024

After 8 attempts, I was finally able to get the conda package to build. Needless to say, this is not sustainable.

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=1086951&view=logs&j=c1df603f-8689-50eb-30e7-21f597a4c2a3&t=e484c5e5-b8bf-5dcd-d19a-a6e998d41ead

@mahf708
Contributor

mahf708 commented Nov 21, 2024

I will write a more detailed comment, but I think we should use a container. I can make one for e3sm diags like the others I made for testing in https://github.com/E3SM-Project/containers

@mahf708
Contributor

mahf708 commented Nov 21, 2024

It's been on my list of todos to get a generic conda container that has some of our data from the servers...

I disabled two workflows (scream defaults and mkatmsrf...) for this very reason.

@chengzhuzhang
Contributor

@xylar yes, this has now become an outstanding issue, and we should find alternatives for hosting the data needed for CI. Does MPAS-Analysis have a similar need, or is it handled differently?

@mahf708 it looks like the containers repo already has code for data from the input data directory, so it sounds like we can just mimic that to add the diagnostics data.

@xylar
Contributor Author

xylar commented Nov 21, 2024

@chengzhuzhang, this issue doesn't affect MPAS-Analysis because we don't try to do anything so sophisticated in CI. I still run tests manually on Chrysalis as needed.

@tomvothecoder
Collaborator

tomvothecoder commented Nov 21, 2024

I had a suspicion that running the GH Actions build with Python 3.9-3.12, with each run downloading the same data simultaneously, would get us throttled by LCRC. If a short-term solution is needed, we can make GH Actions run only when a PR is marked as ready for review. A possible alternative that was mentioned before is to cache the diagnostic data on GitHub Actions, then update the cache when updated diags data on LCRC is detected.

It looks like we still need a general solution for azure pipelines though.
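
As a rough sketch of the caching idea (not something anyone in this thread has implemented), the check for "updated diags data" could compare a small checksum manifest against the files already in the CI cache, so that only missing or changed files are fetched. The manifest format (`{relative_path: sha256}`), the function names, and the idea that such a manifest is published alongside the data are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def manifest_digest(manifest: dict[str, str]) -> str:
    """Return a stable SHA-256 digest of a {relative_path: sha256} manifest.

    Hypothetical helper: the digest changes only when the manifest changes,
    so it could serve as a cache key.
    """
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def files_to_download(manifest: dict[str, str], cache_dir: Path) -> list[str]:
    """List files that are missing from the cache or whose checksum differs."""
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        local = cache_dir / rel_path
        if not local.is_file():
            # Not cached yet, so it has to be fetched once.
            stale.append(rel_path)
            continue
        actual = hashlib.sha256(local.read_bytes()).hexdigest()
        if actual != expected:
            # Cached copy is out of date relative to the manifest.
            stale.append(rel_path)
    return stale
```

With something like this, a run whose cache is already current would download nothing from LCRC beyond the manifest itself, and the manifest digest could be used as the `key` of a cache step so the cache is rebuilt only when the data changes.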

@xylar
Contributor Author

xylar commented Nov 21, 2024

I think we really need to make it forbidden to download files from LCRC in CI. It's badly affecting our ability to do other work.

@xylar
Contributor Author

xylar commented Nov 21, 2024

So I think even if we allow it in fewer circumstances, it's still not good enough.

@mahf708
Contributor

mahf708 commented Nov 21, 2024

I would like to make a container based on the official conda-forge miniconda container, then add the needed inputdata to it. I will put up a prototype on https://github.com/E3SM-Project/containers in the next few days (I need to collect info about the data needed)

@tomvothecoder
Collaborator

@xylar Makes sense to me.

@mahf708 Let me know if you'd like me to test it out with e3sm_diags when it is ready.

@mahf708
Contributor

mahf708 commented Nov 22, 2024

Can we get @rljacob to weigh in just in case he prefers something else?

Rob, should we institute a policy that none of our testing should be touching the inputdata server? I doubt it is the sole reason we are seeing issues, but who knows...

I am happy to streamline a few containers with everything we need, so that we have no reason to download stuff from the server

@mahf708 mahf708 linked a pull request Nov 23, 2024 that will close this issue
9 tasks
@mahf708
Contributor

mahf708 commented Nov 23, 2024

A resolution is offered in #901

@rljacob
Member

rljacob commented Nov 26, 2024

Yes, PR testing that might run several times a day on cloud resources should not be downloading lots of data. "Lots" is somewhat subjective.
