Python3.10 builds are failing #357

Open
Zeitsperre opened this issue Apr 23, 2024 · 12 comments

@Zeitsperre
Collaborator

  • RavenPy version: 0.14.0
  • Python version: 3.10
  • Operating System: Any

I'm not certain why, but all builds of RavenPy running under Python3.10 seem to be failing for a handful of the same tests, those being:

  • test_hindcasting.py::TestHindcasting::test_climpred_hindcast_verif, and
  • test_calibration.py::test_spotpy_calibration[SACSMA]

This might be due to climpred or another library. Will see what can be done.

@tlvu
Collaborator

tlvu commented Apr 24, 2024

Jenkins runs against the latest Jupyter env are also timing out for me; I wonder if it's related. Here is the conda env export change compared with a previous build that still works: Ouranosinc/PAVICS-e2e-workflow-tests@2f8c450

17:50:13  RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb Fx [ 54%]
17:50:13  xxxxxx                                                                   [ 57%]
18:23:36  RavenPy-master/docs/notebooks/12_Performing_hindcasting_experiments.ipynb F [ 57%]
18:23:36  xxxxxxx                                                                  [ 60%]
18:23:36  RavenPy-master/docs/notebooks/Assess_probabilistic_flood_risk.ipynb .... [ 62%]
18:57:13  Fxxx                                                                     [ 64%]
19:06:06  Cancelling nested steps due to timeout

@tlvu
Collaborator

tlvu commented Apr 24, 2024

> Here is the conda env export change compared with a previous build that still works: Ouranosinc/PAVICS-e2e-workflow-tests@2f8c450

Potential suspects:

- pydantic=2.6.4=pyhd8ed1ab_0
+ pydantic=2.7.0=pyhd8ed1ab_0

- rioxarray=0.15.3=pyhd8ed1ab_0
+ rioxarray=0.15.4=pyhd8ed1ab_0

- xesmf=0.8.4=pyhd8ed1ab_1
+ xesmf=0.8.5=pyhd8ed1ab_0
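
A comparison like the one above can be automated. The following is a minimal sketch, assuming the two environments are available as text in the `name=version=build` form produced by `conda env export`. The function names `parse_export` and `changed_packages` are hypothetical helpers, and `example-pkg` is a made-up unchanged entry added only to show that identical pins are filtered out.

```python
def parse_export(text):
    """Map package name -> pinned spec from conda export or diff lines."""
    packages = {}
    for raw in text.splitlines():
        # Drop indentation, YAML list dashes and diff markers ("-", "+").
        line = raw.strip().lstrip("-+ ").strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            # pip-style pin, e.g. "ravenpy==0.13.1"
            name, version = line.split("==", 1)
            packages[name] = version
        elif "=" in line:
            # conda-style pin, e.g. "pydantic=2.6.4=pyhd8ed1ab_0"
            name, _, rest = line.partition("=")
            packages[name] = rest
    return packages


def changed_packages(old_text, new_text):
    """Return {name: (old_spec, new_spec)} for pins that differ."""
    old, new = parse_export(old_text), parse_export(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Pins taken from the comment above; "example-pkg" is hypothetical.
WORKING = """
  - pydantic=2.6.4=pyhd8ed1ab_0
  - rioxarray=0.15.3=pyhd8ed1ab_0
  - xesmf=0.8.4=pyhd8ed1ab_1
  - example-pkg=1.0.0=0
"""
BROKEN = """
  - pydantic=2.7.0=pyhd8ed1ab_0
  - rioxarray=0.15.4=pyhd8ed1ab_0
  - xesmf=0.8.5=pyhd8ed1ab_0
  - example-pkg=1.0.0=0
"""

for name, (before, after) in changed_packages(WORKING, BROKEN).items():
    print(f"{name}: {before} -> {after}")
```

Comparing the full spec (version plus build string) also flags rebuilds of the same version, which a version-only comparison would miss.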

@Zeitsperre
Collaborator Author

@tlvu Do you notice this only happening for Python 3.10?

@tlvu
Collaborator

tlvu commented Apr 24, 2024

> @tlvu Do you notice this only happening for Python 3.10?

I only noticed the failure because the upcoming Jupyter env will be Python 3.10 instead of 3.9. I do not run Jenkins on multiple flavors of Python.

Note that it only started failing with my most recent build. All previous Python 3.10 builds were working fine.

@tlvu
Collaborator

tlvu commented Apr 24, 2024

> Here is the conda env export change compared with a previous build that still works: Ouranosinc/PAVICS-e2e-workflow-tests@2f8c450
>
> Potential suspects:
>
> - pydantic=2.6.4=pyhd8ed1ab_0
> + pydantic=2.7.0=pyhd8ed1ab_0
>
> - rioxarray=0.15.3=pyhd8ed1ab_0
> + rioxarray=0.15.4=pyhd8ed1ab_0
>
> - xesmf=0.8.4=pyhd8ed1ab_1
> + xesmf=0.8.5=pyhd8ed1ab_0

It does not look like it is any of those packages. I took the bad image py310-240419 and downgraded each of those 3 separately. I ran Jenkins separately for each, and they all fail (hang).

But interactively in the JupyterLab env, everything just works. Very weird and not helpful. I don't have any specific error to search for.

@Zeitsperre
Collaborator Author

It's quite frustrating. I can't figure it out either. I've disabled one particularly flaky test in #358 and tried to prevent read access problems, but the issue remains. climpred might be what's culpable here.

@tlvu
Collaborator

tlvu commented Apr 24, 2024

I am getting somewhere. Running only the hanging notebook in Jenkins, I got "Kernel died" while it imports the various modules at the beginning of the notebook! This is so weird.

15:59:09  =================================== FAILURES ===================================
15:59:09  _ RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb::Cell 0 _
15:59:09  Notebook cell execution failed
15:59:09  Cell 0: Timeout of 2000 seconds exceeded while executing cell. Failed to interrupt kernel in 5 seconds, so failing without traceback.
15:59:09  
15:59:09  Input:
15:59:09  import datetime as dt
15:59:09  
15:59:09  from matplotlib import pyplot as plt
15:59:09  
15:59:09  from ravenpy.config import commands as rc
15:59:09  from ravenpy.config.emulators import GR4JCN
15:59:09  from ravenpy.utilities import forecasting
15:59:09  from ravenpy.utilities.testdata import get_file
15:59:09  
15:59:09  =========================== short test summary info ============================
15:59:09  FAILED RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb::Cell 0
15:59:09  ================== 1 failed, 7 xfailed in 2006.10s (0:33:26) ===================
15:59:09  + EXIT_CODE=1
15:59:09  + tr [:upper:] [:lower:]
15:59:09  + echo true
15:59:09  + SAVE_RESULTING_NOTEBOOK=true
15:59:09  + [ xtrue = xtrue ]
15:59:09  + mkdir -p buildout
15:59:09  + basename RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb
15:59:09  + filename=11_Climatological_ESP_forecasting.ipynb
15:59:09  + echo 11_Climatological_ESP_forecasting.ipynb
15:59:09  + sed s/.ipynb$//
15:59:09  + filename=11_Climatological_ESP_forecasting
15:59:09  + [ -e buildout/11_Climatological_ESP_forecasting.output.ipynb ]
15:59:09  + jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=600 --allow-errors --output-dir buildout --output 11_Climatological_ESP_forecasting.output.ipynb RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb
15:59:09  [NbConvertApp] Converting notebook RavenPy-master/docs/notebooks/11_Climatological_ESP_forecasting.ipynb to notebook
15:59:19  [NbConvertApp] ERROR | Kernel died while waiting for execute reply.
15:59:19  Traceback (most recent call last):
15:59:19    File "/opt/conda/envs/birdy/bin/jupyter-nbconvert", line 10, in <module>
15:59:19      sys.exit(main())
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/jupyter_core/application.py", line 283, in launch_instance
15:59:19      super().launch_instance(argv=argv, **kwargs)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/traitlets/config/application.py", line 1075, in launch_instance
15:59:19      app.start()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 420, in start
15:59:19      self.convert_notebooks()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 597, in convert_notebooks
15:59:19      self.convert_single_notebook(notebook_filename)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 563, in convert_single_notebook
15:59:19      output, resources = self.export_single_notebook(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/nbconvertapp.py", line 487, in export_single_notebook
15:59:19      output, resources = self.exporter.from_filename(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 201, in from_filename
15:59:19      return self.from_file(f, resources=resources, **kw)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 220, in from_file
15:59:19      return self.from_notebook_node(
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/notebook.py", line 36, in from_notebook_node
15:59:19      nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 154, in from_notebook_node
15:59:19      nb_copy, resources = self._preprocess(nb_copy, resources)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/exporters/exporter.py", line 353, in _preprocess
15:59:19      nbc, resc = preprocessor(nbc, resc)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/base.py", line 48, in __call__
15:59:19      return self.preprocess(nb, resources)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 102, in preprocess
15:59:19      self.preprocess_cell(cell, resources, index)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbconvert/preprocessors/execute.py", line 123, in preprocess_cell
15:59:19      cell = self.execute_cell(cell, index, store_history=True)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 165, in wrapped
15:59:19      return loop.run_until_complete(inner)
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
15:59:19      return future.result()
15:59:19    File "/opt/conda/envs/birdy/lib/python3.10/site-packages/nbclient/client.py", line 1005, in async_execute_cell
15:59:19      raise DeadKernelError("Kernel died") from None
15:59:19  nbclient.exceptions.DeadKernelError: Kernel died
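
Since the kernel dies on the very first cell, one way to narrow this down is to run each import from that cell in its own fresh interpreter and record how the process ends. The following is a minimal sketch, not something that was run in this thread: `try_import` and `report` are hypothetical helper names, and the import list is copied from the log above. On POSIX systems, a negative return code from `subprocess.run` means the child was terminated by that signal number (for example 11 for a segfault); `-X faulthandler` asks Python to dump a traceback on fatal signals.

```python
import subprocess
import sys

# Imports from cell 0 of the failing notebook, as shown in the log above.
FIRST_CELL_IMPORTS = [
    "import datetime as dt",
    "from matplotlib import pyplot as plt",
    "from ravenpy.config import commands as rc",
    "from ravenpy.config.emulators import GR4JCN",
    "from ravenpy.utilities import forecasting",
    "from ravenpy.utilities.testdata import get_file",
]


def try_import(statement, timeout=300):
    """Run one statement in a fresh interpreter; return (status, detail)."""
    cmd = [sys.executable, "-X", "faulthandler", "-c", statement]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The import hung, which matches the Jenkins timeout symptom.
        return "timeout", f"no result after {timeout}s"
    if proc.returncode == 0:
        return "ok", ""
    if proc.returncode < 0:
        # POSIX only: the child was killed by a signal.
        return "killed", f"signal {-proc.returncode}"
    lines = proc.stderr.strip().splitlines()
    return "error", lines[-1] if lines else ""


def report(statements, timeout=300):
    """Try each statement in isolation and print one line per result."""
    results = []
    for statement in statements:
        status, detail = try_import(statement, timeout)
        print(f"{status:8} {statement} {detail}".rstrip())
        results.append((statement, status, detail))
    return results
```

Calling `report(FIRST_CELL_IMPORTS)` inside the Jenkins container and again inside JupyterLab would show whether a specific import hangs or is killed in only one of the two contexts.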

@Zeitsperre
Collaborator Author

That's essentially the error that we often see when running some of these tests. Look through the failing builds on GitHub, and you'll see that kernels often die randomly. I can't explain it other than my suspicion that it has to do with illegal memory access from an unsafe data read operation.

@tlvu
Collaborator

tlvu commented Apr 24, 2024

> That's essentially the error that we often see when running some of these tests. Look through the failing builds on GitHub, and you'll see that kernels often die randomly. I can't explain it other than my suspicion that it has to do with illegal memory access from an unsafe data read operation.

For Jenkins, this is not random at all. With the latest build, the kernel dies every time, 100% reproducible! With all previous builds, the kernel does not die. The notebook code did not change, so if it is an illegal memory access, then something changed somewhere in the environment, not in the notebook code.

@tlvu
Collaborator

tlvu commented Apr 24, 2024

> > That's essentially the error that we often see when running some of these tests. Look through the failing builds on GitHub, and you'll see that kernels often die randomly. I can't explain it other than my suspicion that it has to do with illegal memory access from an unsafe data read operation.
>
> For Jenkins, this is not random at all. With the latest build, the kernel dies every time, 100% reproducible! With all previous builds, the kernel does not die. The notebook code did not change, so if it is an illegal memory access, then something changed somewhere in the environment, not in the notebook code.

Moreover, it seems to die during the imports at the beginning of the notebook. It would be pretty weird to have an illegal memory access at import time!

Also, why is there no illegal memory access when run interactively in JupyterLab?!

Everything is so weird!
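
One way to approach the Jenkins versus JupyterLab question is to capture a small fingerprint of the runtime in both contexts and diff the two. The following is only a sketch under assumptions: `runtime_fingerprint` and `diff_fingerprints` are hypothetical helpers, and the environment-variable prefixes are a guess at settings that commonly affect native libraries (threading, HDF5, Numba, Matplotlib), not a confirmed cause of this issue.

```python
import json
import os
import platform
import sys
from importlib import metadata

# Assumed-relevant prefixes; adjust to the libraries under suspicion.
WATCHED_PREFIXES = ("OMP_", "MKL_", "OPENBLAS_", "NUMBA_", "HDF5_", "MPL", "CONDA_")


def runtime_fingerprint(packages=()):
    """Collect interpreter, environment, and package details as a dict."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        "env": {
            key: value
            for key, value in sorted(os.environ.items())
            if key.startswith(WATCHED_PREFIXES)
        },
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info["packages"][name] = None
    try:
        import resource  # POSIX only
    except ImportError:
        pass
    else:
        # Address-space limit as [soft, hard]; a low limit can kill processes.
        info["rlimit_as"] = list(resource.getrlimit(resource.RLIMIT_AS))
    return info


def diff_fingerprints(first, second):
    """Return {key: (first_value, second_value)} for top-level differences."""
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(first.keys() | second.keys())
        if first.get(key) != second.get(key)
    }


if __name__ == "__main__":
    # Save this output in each context, then compare the two JSON files.
    print(json.dumps(runtime_fingerprint(["ravenpy", "climpred"]), indent=2))
```

Running the script once under Jenkins and once in a JupyterLab terminal, then passing the two loaded JSON documents to `diff_fingerprints`, would list any top-level setting that differs between them.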

@tlvu
Collaborator

tlvu commented Apr 26, 2024

FYI, the Beta env on PAVICS has the broken env py310-240419 and the Gamma has the working env py310-240411.

@tlvu
Collaborator

tlvu commented May 8, 2024

Just a note that for the Jupyter env, we moved to Python 3.11 and the hanging error is gone!

tlvu added a commit to Ouranosinc/PAVICS-e2e-workflow-tests that referenced this issue May 9, 2024
…t xclim and ravenpy to smooth transition (#121)

# Overview

This new full build has latest of almost everything except `xclim` and
`ravenpy` as intermediate step to smooth transition to `pandas` 2.2 freq
strings changes.

## Changes

- New: save conda env export, DockerHub build logs and Jenkins test
result in the repo to track changes much more easily between releases

- Jenkins: add `SAVE_RESULTING_NOTEBOOK_TIMEOUT` for slow notebooks or
slow machine

- Jupyter env changes:
  - add `conda-pack` so we can export the conda env outside of the docker
    image if we need to run locally without docker
  - upgrade from Python 3.9 to 3.11
  - Relevant changes (alphabetical order):
```diff
-  - birdy=0.8.4=pyh1a96a4e_0
+      - birdhouse-birdy==0.8.7

# major upgrade from v2 to v3
-  - bokeh=2.4.3=pyhd8ed1ab_3
+  - bokeh=3.4.1=pyhd8ed1ab_0

-  - cartopy=0.21.1=py39h6e7ad6e_0
+  - cartopy=0.23.0=py311h320fe9a_0

-  - cf_xarray=0.8.0=pyhd8ed1ab_0
+  - cf_xarray=0.9.0=pyhd8ed1ab_0

-  - cfgrib=0.9.10.4=pyhd8ed1ab_0
+  - cfgrib=0.9.11.0=pyhd8ed1ab_0

-  - cftime=1.6.2=py39h2ae25f5_1
+  - cftime=1.6.3=py311h1f0f07a_0

-  - climpred=2.3.0=pyhd8ed1ab_0
+  - climpred=2.4.0=pyhd8ed1ab_0

-  - clisops=0.9.6=pyh1a96a4e_0
+  - clisops=0.13.0=pyhca7485f_0

-  - dask=2023.5.1=pyhd8ed1ab_0
+  - dask=2024.5.0=pyhd8ed1ab_0

-  - geopandas=0.13.0=pyhd8ed1ab_0
+  - geopandas=0.14.4=pyhd8ed1ab_0

-  - hvplot=0.8.3=pyhd8ed1ab_0
+  - hvplot=0.9.2=pyhd8ed1ab_0

-  - numpy=1.23.5=py39h3d75532_0
+  - numpy=1.24.4=py311h64a7726_0

-  - numba=0.57.0=py39hb75a051_1
+  - numba=0.59.1=py311h96b013e_0

# major upgrade from v1 to v2
-  - pandas=1.3.5=py39hde0f152_0
+  - pandas=2.1.4=py311h320fe9a_0

# major upgrade to v1
-  - panel=0.14.4=pyhd8ed1ab_0
+  - panel=1.4.2=pyhd8ed1ab_0

# major upgrade from v1 to v2
-  - pydantic=1.10.8=py39hd1e30aa_0
+  - pydantic=2.7.1=pyhd8ed1ab_0

# Python 3.9 to 3.11
-  - python=3.9.16=h2782a2a_0_cpython
+  - python=3.11.6=hab00c5b_0_cpython

-  - raven-hydro=0.2.1=py39h8e2dbb5_1
+  - raven-hydro=0.2.4=py311h64a4d7b_0

-  - ravenpy=0.12.1=py39hf3d152e_0
+      - ravenpy==0.13.1

-  - rioxarray=0.14.1=pyhd8ed1ab_0
+  - rioxarray=0.15.5=pyhd8ed1ab_0

-  - roocs-utils=0.6.4=pyh1a96a4e_0
+  - roocs-utils=0.6.8=pyhd8ed1ab_0

-  - scipy=1.9.1=py39h8ba3f38_0
+  - scipy=1.13.0=py311h517d4fd_1

-  - xarray=2023.1.0=pyhd8ed1ab_0
+  - xarray=2023.8.0=pyhd8ed1ab_0

-  - xclim=0.43.0=py39hf3d152e_1
+  - xclim=0.47.0=py311h38be061_0

-  - xesmf=0.7.1=pyhd8ed1ab_0
+  - xesmf=0.8.5=pyhd8ed1ab_0

-  - xskillscore=0.0.24=pyhd8ed1ab_0
+  - xskillscore=0.0.26=pyhd8ed1ab_0

+  - xscen=0.8.2=pyhd8ed1ab_0

+      - figanos==0.3.0

-      - xncml==0.2
+      - xncml==0.4.0

```


## Test

- Deployed as "beta" image in production for bokeh visualization
performance regression testing.
- Manual test notebook
https://github.com/Ouranosinc/PAVICS-landing/blob/master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb
for bokeh visualization performance and it looks fine.
- Jenkins build:
  - Default notebooks, all passed:
    https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/54792e6510adfcd1bb21e1bd31fdfa36c5c634e0/docker/saved_buildout/jenkins-buildlogs-default.txt
  - Raven notebooks, only known `HydroShare_integration.ipynb` failing:
    https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/jenkins-buildlogs-raven.txt


## Related Issue / Discussion

- Matching notebook fixes:
  - Pavics-sdi: PR Ouranosinc/pavics-sdi#321
  - Finch: PR url: None
  - PAVICS-landing: PR Ouranosinc/PAVICS-landing#78
  - RavenPy: PR CSHS-CWRA/RavenPy#356
  - Resolves Ouranosinc/PAVICS-landing#65
  - Resolves Ouranosinc/PAVICS-landing#66

- Deployment to PAVICS:
bird-house/birdhouse-deploy#453

- Jenkins-config changes for new notebooks: PR url: None

- Other issues found while working on this one
  - computationalmodelling/nbval#204
  - jupyterlab-contrib/jupyter-archive#132
  - CSHS-CWRA/RavenPy#357
  - CSHS-CWRA/RavenPy#361
  - CSHS-CWRA/RavenPy#362

- Previous release: PR #134


## Additional Information

Full diff conda env export:

81deb99...931cfc9#diff-e8f2a6a53085ae29bb7cedc701c1d345a330651ae971555e85a5c005e94f4cd9


Full new conda env export:

https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/conda-env-export.yml


DockerHub build log

https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/blob/931cfc924a147d07b59e88badff9f170e852a03b/docker/saved_buildout/docker-buildlogs.txt