Simplify conda env yaml files and combine into single env #144

tomvothecoder · 2023-08-14T18:43:24Z

Summary of changes

Add prod.yml
- This file is a combination of proc.yml with pub.yml
- Add pip section with esgcet>=5.2.0
- Remove autocurator as a dependency
  - Replaced by esgcet>=5.2.0 with the -xarray flag
  - Closes pub.yml dependency autocurator is not maintained, only supports Linux and Python <=3.8 #143
- Update dependencies using looser constraints
Remove pub.yml and proc.yml
Remove ci.yml -- this repo has no GitHub Actions CI/CD workflows to run builds

Action Items

Build single datasm environment using prod.yml (works on my end -- @tomvothecoder), also works for @TonyB9000
Add or remove dependencies based on needs of datasm -- in progress (@TonyB9000)
Try publishing with esgcet>=5.2.0 and the -xarray command which replaces autocurator -- in progress (@TonyB9000)

- Combine `proc.yml` with `pub.yml` - Remove `autocurator` is a dependency - Update dependencies using looser constraints - Make `esgconfigparser >=1.0.0a1`

tomvothecoder

Hey @TonyB9000, I kickstarted some work to simplify the conda env yaml files. Please review the description and update the dependencies as needed. There are a few action items for you to complete.

tomvothecoder · 2023-08-14T18:53:51Z

conda-env/prod.yml

+dependencies:
+  # Base
+  # ==================
+  - python >=3.9
+  - pip
+  - distributed
+  - ipdb
+  - matplotlib
+  - netcdf4
+  - numpy >=1.23.0 # This version of numpy includes support for Python 3.11.
+  - pyyaml
+  - termcolor
+  - tqdm
+  - watchdog
+  - xarray >=2022.02.0 # This version of Xarray drops support for Python 3.8.


Packages are considered direct dependencies if they are directly imported by datasm, or some datasm module call requires an 'optional' dependency (e.g., xarray with the matplotlib optional dependency for xarray plotting).
Update this section by removing or adding any dependencies as needed.

tomvothecoder · 2023-08-14T18:54:18Z

conda-env/prod.yml

+  - pip:
+    - esgcet>=5.2.0


esgcet>=5.2.0 now includes an -xarray flag to replace autocurator. I removed autocurator as a result, which allowed me to combine proc.yml with pub.yml.

Wow - Thanks Tom! This is great. I have some 5/6 different environments (some for dev, some for pub), some customized for v1 or v2 Large Ensemble processing, some with/without e2c v1.10.0rc1 (nee rc2) - and it will be great to cut them down.

I have long-running CMIP6 jobs running (expected to complete around Sept 5) and just finished a publication run) so I'll need to create a new environment (or two) to test these - don't want to destabilize running stuff...

I need to think about how to test this (publish without actually publishing, etc) or else publish something small.

Wow - Thanks Tom! This is great. I have some 5/6 different environments (some for dev, some for pub), some customized for v1 or v2 Large Ensemble processing, some with/without e2c v1.10.0rc1 (nee rc2) - and it will be great to cut them down.

No problem Tony! I hope this optimizes your workflow so you don't need to manage many different conda environments.

Ideally, you should have just two environments: 1 for production and 1 for development.

The production environment should include official, stable releases of packages (no release candidates) since it is meant for production usage.

The development environment has more flexibility and can use package release candidates (e.g., e3sm_to_cmip=v1.10.rc1) to test datasm on.

This brings a good point that we might need a dev.yml to define the development environment.

For the custom environments, it will be harder to version control if you install packages manually without using the yaml file specs (as you are probably aware of by now). Also as I mentioned before, pip installing a local build of e3sm_to_cmip can cause issues too.

@tomvothecoder Given that there as never been a datasm "stable release", this would leave me with only a "dev" environment ...

The scope of datasm is so broad, no regression testing makes headway before operational exigencies demand changes to accommodate new data irregularities. Not to mention - I have scores of big jobs queued up that are pushing against slurm "PENDING (resources)". I suppose I need to develop a "non-slurm" test regime.

@tomvothecoder Although the prod.yml includes

- cwltool >=3.1.20220202173120

when I build a prod environment and pip install datasm (and e3sm_to_cmip, FWIW), datasm postprocess fails with

/var/spool/slurmd/job210469/slurm_script: line 7: cwltool: command not found

The environment I built shows (mamba list):

cryptography 41.0.3 py310h75e40e8_0 conda-forge curl 8.2.1 hca28451_0 conda-forge cwl-upgrader 1.2.8 pyhd8ed1ab_0 conda-forge cwl-utils 0.28 pyh1d7be83_0 conda-forge cwlformat 2022.02.18 pyhd8ed1ab_0 conda-forge cycler 0.11.0 pyhd8ed1ab_0 conda-forge

(not that I ever looked for cwltool before, so I don't know what to expect.)

@tomvothecoder This could be my fault (naturally...). I changed "- default" to "- nodefault", having read a comment about that. Now that I've changed it back and rebuilt, cwltool appears... magic!

That's weird... cwltool only exists in conda-forge (i.e., nodefaults shouldn't impact it ...) ... hmmm 🤔

https://anaconda.org/search?q=cwltool

Could be superstition on my part. Other intervening changes may have occurred. I should create another env with "nodefaults" and check again. Maybe the (acme1) base environment is different. (too many variables...)

I wouldn't worry about it --- whatever gets the job done! 😄

chengzhuzhang

Glad that the recipes can be consolidated into one. The changes look good to me based on a visual check. We will need Tony's confirmation with a test.

conda-env/prod.yml

Co-authored-by: Tom Vo <[email protected]>

TonyB9000 · 2023-08-16T17:32:38Z

@chengzhuzhang Do you feel it is OK to test the resulting "prod" environment with a publication? Nothing (at present) is publication-authorized, and (AFAIK) the only way to test egspublish is to publish ;). I can retract/delete an E3SM set (and only retract a CMIP6 set).

@tomvothecoder

I have built a new environment based upon the prod.yml. It includes "- e3sm_to_cmip >=1.9.1", but issuing

e3sm_to_cmip --version

produces

Traceback (most recent call last):
  File "/home/bartoletti1/mambaforge/envs/dsm_prod/bin/e3sm_to_cmip", line 6, in <module>
    from e3sm_to_cmip.__main__ import main
  File "/home/bartoletti1/mambaforge/envs/dsm_prod/lib/python3.10/site-packages/e3sm_to_cmip/__main__.py", line 35, in <module>
    np.warnings.filterwarnings("ignore")
  File "/home/bartoletti1/mambaforge/envs/dsm_prod/lib/python3.10/site-packages/numpy/__init__.py", line 328, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'warnings'. Did you mean: 'hanning'?

Also, "pip install ."(datasm) returns:

ERROR: Package 'datasm' requires a different Python: 3.10.12 not in '<3.10,>=3.8' (how do I investigate?)

Finally, if I "pip install ."(e3sm_to_cmip), to obtain v1.10.0rc1, the e2c version works:

Successfully installed e3sm-to-cmip-1.10.0rc1
(dsm_prod) -bash-4.2$ e3sm_to_cmip --version
e3sm_to_cmip 1.10.0rc1

mahf708 · 2023-08-16T18:04:48Z

conda-env/prod.yml

+name: datasm_prod
+channels:
+  - conda-forge
+  - defaults


We should probably do nodefaults here and across the entire E3SM ecosystem, but I would defer to Xylar to make a recommendation.

https://conda-forge.org/docs/user/tipsandtricks.html#using-multiple-channels

Good idea. I need to start replacing defaults with nodefaults across the many conda env yaml files in E3SM repos.

chengzhuzhang · 2023-08-16T18:11:54Z

@chengzhuzhang Do you feel it is OK to test the resulting "prod" environment with a publication? Nothing (at present) is publication-authorized, and (AFAIK) the only way to test egspublish is to publish ;). I can retract/delete an E3SM set (and only retract a CMIP6 set).

Finally, if I "pip install ."(e3sm_to_cmip), to obtain v1.10.0rc1, the e2c version works:

Successfully installed e3sm-to-cmip-1.10.0rc1 (dsm_prod) -bash-4.2$ e3sm_to_cmip --version e3sm_to_cmip 1.10.0rc1

Yes, I think you can test publishing an ensemble from v1 ssp375, because we are only waiting for the final 4 ensembles to be processed, so that we can push all ensembles together. Test publishing already cmorized data should be fine and you don't need to retract.

v1.10.0rc1 fixed the deprecated numpy.warning, though as @tomvothecoder suggested we need to be careful with the fix, see here E3SM-Project/e3sm_to_cmip#205

TonyB9000 · 2023-08-16T19:08:36Z

@tomvothecoder @chengzhuzhang

Datasm has a "setup.py" (written by Sterling I suppose) that I have never knowingly invoked, and it appears to be the source of the error Python: 3.10.12 not in '<3.10,>=3.8'

setup(
    author="datasm developers",
    python_requires=">=3.8,<3.10",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: English",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
    ],
    description="A cli utility to automate complex nested workflows for publishing E3SM data to ESGF.",
    license="MIT License",
    include_package_data=True,
    keywords=["datasm"],
    name="datasm",
    packages=find_packages(include=["datasm", "datasm.*"]),
    package_dir={"datasm": "datasm"},
    entry_points={
        "console_scripts": [
            "datasm = datasm.__main__:main",
        ],
    },
    test_suite="tests",
    # TODO: Update this URL once the repo is renamed.
    url="https://github.com/E3SM-Project/esgfpub",
    version="0.1.0",
    zip_safe=False,
    cmdclass={
        "clean": CleanCommand,
    },
)

I assume pip reads this file. (# TODO -lol)

Can I simply change the line python_requires=">=3.8,<3.10", or must I recalculate these limits somehow?

tomvothecoder · 2023-08-17T22:22:58Z

@tomvothecoder @chengzhuzhang

Datasm has a "setup.py" (written by Sterling I suppose) that I have never knowingly invoked, and it appears to be the source of the error Python: 3.10.12 not in '<3.10,>=3.8'

setup(
    author="datasm developers",
    python_requires=">=3.8,<3.10",
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: English",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
    ],
    description="A cli utility to automate complex nested workflows for publishing E3SM data to ESGF.",
    license="MIT License",
    include_package_data=True,
    keywords=["datasm"],
    name="datasm",
    packages=find_packages(include=["datasm", "datasm.*"]),
    package_dir={"datasm": "datasm"},
    entry_points={
        "console_scripts": [
            "datasm = datasm.__main__:main",
        ],
    },
    test_suite="tests",
    # TODO: Update this URL once the repo is renamed.
    url="https://github.com/E3SM-Project/esgfpub",
    version="0.1.0",
    zip_safe=False,
    cmdclass={
        "clean": CleanCommand,
    },
)

I added a setup.py awhile ago for the chance that we make datasm a Python package hosted on Anaconda. I don't think we are going down that path though.

I assume pip reads this file. (# TODO -lol)

Yes, pip reads this file when you invoke pip install . (Anaconda also uses this file to build conda packages).

Can I simply change the line python_requires=">=3.8,<3.10", or must I recalculate these limits somehow?

I set the constraint to python_requires=">=3.8,<3.10" based on the limitations of the conda proc.yml and pub.yml environments.

The line should be updated to python_requires=">=3.9". The new conda environment (prod.yml) has much more flexibility with the Python version now. We want Python >=3.9 because Python 3.8 is being phased out.

TonyB9000 · 2023-08-18T00:10:53Z

@tomvothecoder Thanks Tom! I've managed a conda/mamba build (and datasm_install) having made the adjustments. (Full-up testing is another matter, but will require a bit more time.) ASIDE: When I recently did "pip install", I added "-vvv" to get more information, and was surprised to see a syntax error in a module that had long ago been "git removed". I poked around and found that "build" and "datasm-egg-info" listed old files. What process is supposed to update these?

tomvothecoder · 2023-08-18T16:19:25Z

@tomvothecoder Thanks Tom! I've managed a conda/mamba build (and datasm_install) having made the adjustments. (Full-up testing is another matter, but will require a bit more time.) ASIDE: When I recently did "pip install", I added "-vvv" to get more information, and was surprised to see a syntax error in a module that had long ago been "git removed". I poked around and found that "build" and "datasm-egg-info" listed old files. What process is supposed to update these?

I have not encountered the issue of old files being present in those directories so I'm not exactly sure how they are handled. I think they usually get overwritten after running pip install ..

Are those directories in the root of the datasm repo and generated after running pip install .? If so, you can probably safely delete them.

tomvothecoder · 2023-09-18T20:27:40Z

From email chain 9/11/23:

Tony received this error when building and using the prod.yml file:

Traceback (most recent call last):
  File "/home/bartoletti1/mambaforge/envs/dsm_prod_test2/bin/esgpublish", line 8, in <module>
    sys.exit(main())
  File "/home/bartoletti1/mambaforge/envs/dsm_prod_test2/lib/python3.10/site-packages/esgcet/pub_internal.py", line 83, in main
    pub = pub_args.get_args()
  File "/home/bartoletti1/mambaforge/envs/dsm_prod_test2/lib/python3.10/site-packages/esgcet/args.py", line 46, in get_args
    parser.add_argument("--version", action="version", version=f"esgpublish v{esgcet.__version__}",help="Print the version and exit")
NameError: name 'esgcet' is not defined
 
For what its worth,  my environment (dsm_prod_test2) lists this esgcet:
 
(dsm_prod_test2) -bash-4.2$ mamba list | grep esgcet
esgcet                    5.2.0                    pypi_0    pypi

Sasha replied that the esgcet/args.py is missing an import statement.

We are waiting on a new version of esgcet to be released from Sasha.

TonyB9000

Looks good to me.

tomvothecoder · 2023-10-26T16:12:31Z

Great, this is a big W and should make your life easier.

Thanks @TonyB9000.

TonyB9000 · 2023-10-26T16:21:15Z

@tomvothecoder Just a side note - during publication of the v2 Large Ensemble datasets, the publisher balked at the CFmon variables. Sasha recognized the problem and was to address it - but it necessitated reverting to the "pub-only" environment (older publisher), albeit only for the CFmon publications. I am ignorant of the range of specific checks that the publisher makes during publication.

chengzhuzhang · 2023-10-26T16:25:00Z

@TonyB9000 could you submit an issue to https://github.com/ESGF/esg-publisher so that this won't slip?

TonyB9000 · 2023-10-26T17:00:25Z

@chengzhuzhang Done (Issue #226)

tomvothecoder added 4 commits August 14, 2023 11:28

Simplify conda env yaml files

a5b5162

- Combine `proc.yml` with `pub.yml` - Remove `autocurator` is a dependency - Update dependencies using looser constraints - Make `esgconfigparser >=1.0.0a1`

Update name of env

d68d41e

Remove esgconfigparser

7d2c529

Remove ci.ym

f1217e6

tomvothecoder commented Aug 14, 2023

View reviewed changes

tomvothecoder changed the title ~~Simplify conda env yaml files~~ Simplify conda env yaml files and combine yaml files together Aug 14, 2023

tomvothecoder changed the title ~~Simplify conda env yaml files and combine yaml files together~~ Simplify conda env yaml files and combine into single env Aug 14, 2023

tomvothecoder requested review from TonyB9000 and chengzhuzhang August 14, 2023 19:07

tomvothecoder assigned TonyB9000 and tomvothecoder Aug 14, 2023

chengzhuzhang approved these changes Aug 14, 2023

View reviewed changes

tomvothecoder commented Aug 14, 2023

View reviewed changes

conda-env/prod.yml Outdated Show resolved Hide resolved

Update conda-env/prod.yml

00299aa

Co-authored-by: Tom Vo <[email protected]>

mahf708 reviewed Aug 16, 2023

View reviewed changes

tomvothecoder marked this pull request as ready for review September 18, 2023 20:44

TonyB9000 approved these changes Oct 25, 2023

View reviewed changes

tomvothecoder merged commit ff5044f into master Oct 26, 2023

tomvothecoder deleted the devops/143-simplify-conda-envs branch October 26, 2023 16:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify conda env yaml files and combine into single env #144

Simplify conda env yaml files and combine into single env #144

tomvothecoder commented Aug 14, 2023 •

edited

Loading

tomvothecoder left a comment

tomvothecoder Aug 14, 2023 •

edited

Loading

tomvothecoder Aug 14, 2023 •

edited

Loading

TonyB9000 Aug 14, 2023

tomvothecoder Aug 17, 2023

TonyB9000 Aug 18, 2023 •

edited

Loading

TonyB9000 Aug 25, 2023

TonyB9000 Aug 25, 2023

mahf708 Aug 25, 2023 •

edited

Loading

TonyB9000 Aug 25, 2023 •

edited

Loading

mahf708 Aug 25, 2023

chengzhuzhang left a comment

TonyB9000 commented Aug 16, 2023

mahf708 Aug 16, 2023

tomvothecoder Aug 17, 2023

chengzhuzhang commented Aug 16, 2023

TonyB9000 commented Aug 16, 2023

tomvothecoder commented Aug 17, 2023

TonyB9000 commented Aug 18, 2023

tomvothecoder commented Aug 18, 2023

tomvothecoder commented Sep 18, 2023 •

edited

Loading

TonyB9000 left a comment

tomvothecoder commented Oct 26, 2023

TonyB9000 commented Oct 26, 2023 •

edited

Loading

chengzhuzhang commented Oct 26, 2023

TonyB9000 commented Oct 26, 2023

Simplify conda env yaml files and combine into single env #144

Simplify conda env yaml files and combine into single env #144

Conversation

tomvothecoder commented Aug 14, 2023 • edited Loading

tomvothecoder left a comment

Choose a reason for hiding this comment

tomvothecoder Aug 14, 2023 • edited Loading

Choose a reason for hiding this comment

tomvothecoder Aug 14, 2023 • edited Loading

Choose a reason for hiding this comment

TonyB9000 Aug 14, 2023

Choose a reason for hiding this comment

tomvothecoder Aug 17, 2023

Choose a reason for hiding this comment

TonyB9000 Aug 18, 2023 • edited Loading

Choose a reason for hiding this comment

TonyB9000 Aug 25, 2023

Choose a reason for hiding this comment

TonyB9000 Aug 25, 2023

Choose a reason for hiding this comment

mahf708 Aug 25, 2023 • edited Loading

Choose a reason for hiding this comment

TonyB9000 Aug 25, 2023 • edited Loading

Choose a reason for hiding this comment

mahf708 Aug 25, 2023

Choose a reason for hiding this comment

chengzhuzhang left a comment

Choose a reason for hiding this comment

TonyB9000 commented Aug 16, 2023

mahf708 Aug 16, 2023

Choose a reason for hiding this comment

tomvothecoder Aug 17, 2023

Choose a reason for hiding this comment

chengzhuzhang commented Aug 16, 2023

TonyB9000 commented Aug 16, 2023

tomvothecoder commented Aug 17, 2023

TonyB9000 commented Aug 18, 2023

tomvothecoder commented Aug 18, 2023

tomvothecoder commented Sep 18, 2023 • edited Loading

TonyB9000 left a comment

Choose a reason for hiding this comment

tomvothecoder commented Oct 26, 2023

TonyB9000 commented Oct 26, 2023 • edited Loading

chengzhuzhang commented Oct 26, 2023

TonyB9000 commented Oct 26, 2023

tomvothecoder commented Aug 14, 2023 •

edited

Loading

tomvothecoder Aug 14, 2023 •

edited

Loading

tomvothecoder Aug 14, 2023 •

edited

Loading

TonyB9000 Aug 18, 2023 •

edited

Loading

mahf708 Aug 25, 2023 •

edited

Loading

TonyB9000 Aug 25, 2023 •

edited

Loading

tomvothecoder commented Sep 18, 2023 •

edited

Loading

TonyB9000 commented Oct 26, 2023 •

edited

Loading