
Add a minimal test of entire workflow #11

Open
timtroendle opened this issue Mar 24, 2021 · 11 comments

Comments

@timtroendle
Member

As we are adding changes to this repo from time to time now, it would be good if there was a continuous integration test. For that, the workflow must be 100% automatic, and we should have a configuration that requires minimal downloads and minimal runtime. We can then use a simple GitHub action that runs Snakemake with this configuration (example).
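A minimal sketch of such a GitHub Actions workflow is below. It assumes the workflow's dependencies are described in a conda environment file and that a reduced test configuration exists. The file names `environment.yaml` and `config/minimal.yaml` and the environment name are hypothetical.

```yaml
# .github/workflows/workflow-test.yaml (hypothetical file name)
name: Minimal workflow test

on: [push, pull_request]

jobs:
  test-workflow:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}   # login shell, so the conda environment below is active
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: environment.yaml   # hypothetical path to the workflow's environment
          activate-environment: euro-calliope  # hypothetical environment name
      - name: Run the workflow with the minimal test configuration
        run: snakemake --cores 2 --configfile config/minimal.yaml  # hypothetical config file
```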

@timtroendle
Member Author

Here's a list of things that need to be solved so that we have a 100% automatic, low data, low runtime workflow test:

  • automatic download of ESM data
  • automatic download of EEZ data
  • automatic download of renewables.ninja data
  • download of NUTS at a lower resolution
  • download of LAU at a lower resolution
  • download of SRTM data based on the scope configuration
  • download of GMTED data based on the scope configuration
  • download of WDPA data based on the scope configuration (is this possible?)
  • configurable geospatial resolution
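The scope and resolution items above could come together in a dedicated test configuration. The sketch below is hypothetical throughout. The keys `scope`, `bounds`, and `resolution` are illustrative, not euro-calliope's actual schema. The bounding box roughly covers Luxembourg, as an example of a small scope.

```yaml
# config/minimal.yaml: hypothetical test configuration.
# Every key here is illustrative and not taken from the real configuration schema.
scope:
  countries: [Luxembourg]   # one small country keeps downloads and runtime low
  bounds:                   # bounding box that tile-based downloads (SRTM, GMTED) could be limited to
    x_min: 5.7
    x_max: 6.6
    y_min: 49.4
    y_max: 50.2
resolution:
  nuts: 60M                 # coarsest GISCO generalisation (1:60 million) instead of a detailed one
  spatial: national         # a single spatial resolution instead of all of them
```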

@brynpickering
Member

Lovely idea, but I can't say I have any idea how it would be feasible... We could pre-package a bunch of datasets at lower res / smaller scope, but then we lose out on being able to test any of the workflow rules which act to access these datasets.

For WDPA, see #12 for a code snippet to automatically handle the constantly changing URL, but it looks like you can't choose to download only a section.

@timtroendle
Copy link
Member Author

It seems to be possible to cache downloads in GitHub Actions (up to 5 GB for up to a week). This may help.
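For reference, a cache step using the `actions/cache` action could look like the sketch below. The download directory `data/automatic` is an assumption.

```yaml
steps:
  - uses: actions/checkout@v2
  - name: Cache raw downloads between runs
    uses: actions/cache@v2
    with:
      path: data/automatic            # hypothetical directory holding the workflow's downloads
      key: raw-downloads-${{ runner.os }}
```

`actions/cache` does not overwrite an existing entry under the same key. With a fixed key like this one, the first cached download would therefore keep being restored.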

@brynpickering
Member

Or if we use Azure pipelines for CI, we get "unlimited" cache for up to a week: https://docs.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops
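In Azure Pipelines the equivalent is the `Cache@2` task. The sketch below uses the same hypothetical download directory. It also assumes a configuration file `config/default.yaml`, whose contents Azure hashes into the key because that key segment is a file path.

```yaml
steps:
  - task: Cache@2
    displayName: Cache raw downloads
    inputs:
      # Segments separated by "|"; a segment that is a file path is replaced by the file's hash.
      key: 'raw-downloads | "$(Agent.OS)" | config/default.yaml'
      path: $(Build.SourcesDirectory)/data/automatic   # hypothetical download directory
```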

@timtroendle
Member Author

Then again, I just ran into an error running a euro-calliope workflow built for v7 of the hydro stations dataset, because it used a cached download of v4.

If we were to cache downloads, we'd need to make sure the cache is wiped whenever necessary. I don't see a trivial way of doing so right now.
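One possible way to wipe the cache automatically is to make the cache key depend on the files that define data versions and URLs. Any change to those files then produces a new key and a fresh download. The sketch below assumes the versions live in a hypothetical `config/default.yaml`.

```yaml
steps:
  - uses: actions/checkout@v2   # must run first so hashFiles can read the config
  - name: Cache raw downloads, keyed on the data configuration
    uses: actions/cache@v2
    with:
      path: data/automatic      # hypothetical download directory
      # The key changes whenever the hashed file changes, so a workflow pinned to
      # v7 of a dataset cannot restore a download cached for v4.
      key: raw-downloads-${{ runner.os }}-${{ hashFiles('config/default.yaml') }}
```

If versions are also pinned elsewhere, for example in rule definitions, those files would need to be added as further `hashFiles` patterns. Alternatively, a manually bumped version string could be included in the key.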

@timtroendle
Member Author

In theory, we could also use our own test runners which would make caching easy. @suvayu, you mentioned using own runners in GitHub actions earlier. Do you have any idea how much work that would be? Also, do we have machines that we could use this way?

@suvayu
Member

suvayu commented Apr 21, 2021

@timtroendle I think it is some amount of work (but not an unreasonable amount given the flexibility and control that you gain). These are the hurdles I see:

  1. we need to install the runner application (it doesn't look like this requires admin privileges)
  2. we need the runner to be accessible from the Internet (no VPN)
  3. not clear about the environment setup, e.g. will workflows with the usual pip/conda actions continue to work unmodified (other than the small edit required to specify the runners)?

ETH IT can help with 1 & 2. For 1, doing it ourselves also doesn't seem difficult. I don't know whether ETH security policy will get in the way of 2. As for 3, I have no idea; it seems to be either no work or quite a bit of work, with no middle ground.

Docs
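For reference, a job targets a self-hosted runner through `runs-on`. The sketch below assumes a Linux runner registered for this repository with `snakemake` already on its `PATH`. It sets `clean: false` so that `actions/checkout` skips its default `git clean -ffdx`, which would otherwise delete untracked files such as earlier downloads.

```yaml
jobs:
  test-workflow:
    runs-on: [self-hosted, linux]   # assumption: runner registered with the default labels
    steps:
      - uses: actions/checkout@v2
        with:
          clean: false              # keep untracked files (e.g. previous downloads) between runs
      - name: Run the workflow with the minimal test configuration
        run: snakemake --cores 4 --configfile config/minimal.yaml  # hypothetical config file
```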

@suvayu
Member

suvayu commented Apr 21, 2021

BTW, if someone can deal with the "getting resources & permissions from ETH IT" part, I volunteer for the rest ;)

@timtroendle
Member Author

Thanks a lot @suvayu. I will see what I can do about 1. and 2. Can you clarify 3. a little more? What exactly could be the problem here?

@suvayu
Member

suvayu commented Apr 22, 2021

```yaml
steps:
  - uses: actions/checkout@v2
  - name: Set up Python ${{ matrix.python-version }}
    uses: actions/setup-python@v2
    with:
      python-version: ${{ matrix.python-version }}
```

When a workflow has a step like this, the actions/setup-python action runs, and I don't know:

  1. how many of these actions are supported by the runner application. I would guess this one is, but something like peaceiris/actions-gh-pages might not be. So some work is probably required to separate them into different jobs (most likely straightforward).
  2. for pip it might be simple, but conda or Docker might require local resources or setup.
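The job separation from point 1 could look like the sketch below. The workflow test runs on a self-hosted runner, while the documentation deployment with `peaceiris/actions-gh-pages` stays on a GitHub-hosted runner. The runner labels, the placeholder docs build step, and `config/minimal.yaml` are assumptions.

```yaml
jobs:
  test-workflow:
    runs-on: [self-hosted, linux]   # assumption: self-hosted runner with snakemake on PATH
    steps:
      - uses: actions/checkout@v2
      - run: snakemake --cores 4 --configfile config/minimal.yaml  # hypothetical config file
  deploy-docs:
    needs: test-workflow
    runs-on: ubuntu-latest          # third-party actions keep running on GitHub-hosted machines
    permissions:
      contents: write               # lets GITHUB_TOKEN push to the gh-pages branch
    steps:
      - uses: actions/checkout@v2
      - name: Build documentation (placeholder for the real build command)
        run: mkdir -p site && echo "<h1>Docs</h1>" > site/index.html
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./site
```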

@timtroendle
Member Author

A short update: USYS IT is looking into this right now. If we are very lucky, we may be able to set this up ahead of the lost siblings sprint, which would be a major plus.


3 participants