Simplify conda env yaml files and combine into single env #144

Merged
merged 5 commits into from
Oct 26, 2023
Changes from 4 commits
32 changes: 0 additions & 32 deletions conda-env/ci.yml

This file was deleted.

37 changes: 0 additions & 37 deletions conda-env/proc.yml

This file was deleted.

37 changes: 37 additions & 0 deletions conda-env/prod.yml
@@ -0,0 +1,37 @@
# Conda production environment for datasm operations.
name: datasm_prod
channels:
- conda-forge
- e3sm
TonyB9000 marked this conversation as resolved.
- defaults

We should probably do nodefaults here and across the entire E3SM ecosystem, but I would defer to Xylar to make a recommendation.

https://conda-forge.org/docs/user/tipsandtricks.html#using-multiple-channels

Collaborator Author

Good idea. I need to start replacing defaults with nodefaults across the many conda env yaml files in E3SM repos.
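
For the bulk replacement mentioned above, a small script can do the edit consistently across many env files. The following is only a sketch: the function name `use_nodefaults` is hypothetical, and it assumes each file has a plain `channels:` key followed by a simple `- item` list, as in `prod.yml`. It works line by line so that comments and formatting in the files are preserved.

```python
import re

# Matches a list item that is exactly `defaults`, with any indentation
# and an optional trailing comment.
_DEFAULTS_ITEM = re.compile(r"^(\s*-\s*)defaults(\s*(?:#.*)?)$")


def use_nodefaults(env_text: str) -> str:
    """Return env_text with `defaults` replaced by `nodefaults` under `channels:`.

    Hypothetical helper; assumes a flat `channels:` list as in prod.yml.
    """
    out = []
    in_channels = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped == "channels:":
            in_channels = True
        elif in_channels and stripped and not stripped.startswith(("-", "#")):
            # Any other key (e.g. `dependencies:`) ends the channels block.
            in_channels = False
        if in_channels:
            line = _DEFAULTS_ITEM.sub(r"\1nodefaults\2", line)
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if env_text.endswith("\n") else "")
```

Because only list items under `channels:` are touched, a dependency that happened to be named `defaults` would be left alone, and running the function twice gives the same result as running it once.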

dependencies:
# Base
# ==================
- python >=3.9
- pip
- distributed
- ipdb
- matplotlib
- netcdf4
- numpy >=1.23.0 # This version of numpy includes support for Python 3.11.
- pyyaml
- termcolor
- tqdm
- watchdog
- xarray >=2022.02.0 # This version of Xarray drops support for Python 3.8.
Comment on lines +6 to +20
Collaborator Author

@tomvothecoder tomvothecoder Aug 14, 2023

Packages are considered direct dependencies if they are imported directly by datasm, or if a datasm module call requires an 'optional' dependency of another package (e.g., matplotlib, which xarray needs for plotting).
Update this section by removing or adding any dependencies as needed.

# Required for CWL workflows.
- cwltool >=3.1.20220202173120
- nodejs >=17.4.0
# Used in modules for `extract`, `validate`, and `postprocess` operations.
- nco >=5.1.3
- e3sm_to_cmip >=1.9.1
- zstash >=1.2.0
# Testing
# ==================
- pytest >=7.1.1
# Quality Assurance
# ==================
- black >=22.3.0
- pip:
  - esgcet>=5.2.0
Comment on lines +34 to +35
Collaborator Author

@tomvothecoder tomvothecoder Aug 14, 2023

esgcet>=5.2.0 now includes an -xarray flag to replace autocurator. I removed autocurator as a result, which allowed me to combine proc.yml with pub.yml.

Contributor

Wow - Thanks Tom! This is great. I have some 5/6 different environments (some for dev, some for pub), some customized for v1 or v2 Large Ensemble processing, some with/without e2c v1.10.0rc1 (nee rc2) - and it will be great to cut them down.

I have long-running CMIP6 jobs running (expected to complete around Sept 5) and just finished a publication run, so I'll need to create a new environment (or two) to test these. I don't want to destabilize anything that is currently running.

I need to think about how to test this (publish without actually publishing, etc) or else publish something small.

Collaborator Author

> Wow - Thanks Tom! This is great. I have some 5/6 different environments (some for dev, some for pub), some customized for v1 or v2 Large Ensemble processing, some with/without e2c v1.10.0rc1 (nee rc2) - and it will be great to cut them down.

No problem Tony! I hope this optimizes your workflow so you don't need to manage many different conda environments.

Ideally, you should have just two environments: 1 for production and 1 for development.

  • The production environment should include official, stable releases of packages (no release candidates) since it is meant for production usage.
  • The development environment has more flexibility and can use package release candidates (e.g., e3sm_to_cmip=v1.10.rc1) to test datasm on.

This brings up a good point: we might need a dev.yml to define the development environment.

For the custom environments, it will be harder to version control them if you install packages manually instead of using the yaml file specs (as you are probably aware by now). Also, as I mentioned before, pip installing a local build of e3sm_to_cmip can cause issues too.
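
The "no release candidates in production" rule above can be checked mechanically before an env file is used. The sketch below is an assumption-laden heuristic rather than a full version parser: the function name `prerelease_pins` and the marker list (`rc`, `a`, `b`, `alpha`, `beta`, `dev`) are illustrative, and it only looks at the version part of each spec string.

```python
import re

# Pre-release markers as they commonly appear in version strings
# (e.g. 1.10.0rc1, 2.0a1, 3.1.dev0). This pattern is a heuristic.
_PRERELEASE = re.compile(r"\d[._-]?(?:rc|a|b|alpha|beta|dev)\d*\b", re.IGNORECASE)

# Splits a spec such as "name >=1.2" or "name=1.2" into name and the rest.
_SPEC = re.compile(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)")


def prerelease_pins(specs):
    """Return the dependency specs whose version part looks like a pre-release.

    Hypothetical helper; package names are ignored so that names containing
    digits and letters (e.g. netcdf4) are never flagged.
    """
    flagged = []
    for spec in specs:
        match = _SPEC.match(spec)
        version_part = match.group(2) if match else ""
        if _PRERELEASE.search(version_part):
            flagged.append(spec)
    return flagged
```

Given the dependency strings of a production env file, an empty result means no pin looks like a release candidate. The same function run on a dev.yml would list the pins that are deliberately pre-release.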

Contributor

@TonyB9000 TonyB9000 Aug 18, 2023

@tomvothecoder Given that there has never been a datasm "stable release", this would leave me with only a "dev" environment ...

The scope of datasm is so broad, no regression testing makes headway before operational exigencies demand changes to accommodate new data irregularities. Not to mention - I have scores of big jobs queued up that are pushing against slurm "PENDING (resources)". I suppose I need to develop a "non-slurm" test regime.

Contributor

@tomvothecoder Although the prod.yml includes

- cwltool >=3.1.20220202173120

when I build a prod environment and pip install datasm (and e3sm_to_cmip, FWIW), datasm postprocess fails with

/var/spool/slurmd/job210469/slurm_script: line 7: cwltool: command not found

The environment I built shows (mamba list):

cryptography              41.0.3          py310h75e40e8_0    conda-forge
curl                      8.2.1                hca28451_0    conda-forge
cwl-upgrader              1.2.8              pyhd8ed1ab_0    conda-forge
cwl-utils                 0.28               pyh1d7be83_0    conda-forge
cwlformat                 2022.02.18         pyhd8ed1ab_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge

(not that I ever looked for cwltool before, so I don't know what to expect.)

Contributor

@tomvothecoder This could be my fault (naturally...). I changed "- defaults" to "- nodefaults", having read a comment about that. Now that I've changed it back and rebuilt, cwltool appears... magic!

@mahf708 mahf708 Aug 25, 2023

That's weird... cwltool only exists in conda-forge (i.e., nodefaults shouldn't impact it ...) ... hmmm 🤔

https://anaconda.org/search?q=cwltool

Contributor

@TonyB9000 TonyB9000 Aug 25, 2023

Could be superstition on my part. Other intervening changes may have occurred. I should create another env with "nodefaults" and check again. Maybe the (acme1) base environment is different. (too many variables...)
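
To make that re-check repeatable, one option is to ask the activated environment which executables it actually exposes on `PATH`. This is a minimal sketch: the helper name `missing_executables` is hypothetical, and the tool list is an assumption based on the packages in prod.yml that install command-line programs (`cwltool`, `node` from nodejs, `ncks` from nco).

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed tool list; adjust to whatever the workflow actually invokes.
    missing = missing_executables(["cwltool", "node", "ncks"])
    print("missing:", ", ".join(missing) if missing else "none")
```

Running it both in the interactive shell and from inside the Slurm job script may help separate "the package was never installed" from "the batch job is not using the environment", since a batch script does not necessarily see the same `PATH` as the shell that submitted it.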

I wouldn't worry about it --- whatever gets the job done! 😄

prefix: /opt/miniconda3/envs/datasm_prod
32 changes: 0 additions & 32 deletions conda-env/pub.yml

This file was deleted.