R workspaces need to support maap-py and python. #684

Closed
wildintellect opened this issue Feb 24, 2023 · 20 comments

@wildintellect
Collaborator

wildintellect commented Feb 24, 2023

Is your feature request related to a problem? Please describe.
When using the R Stable workspace https://github.com/MAAP-Project/maap-workspaces/blob/develop/base_images/r/docker/Dockerfile
maap-py and Python are not available inside the r-with-gdal environment. This causes issues when a user wants to mix Python and R across notebooks or to make use of maap-py from within R. We need to make sure the r-with-gdal env is usable as either Python or R, and consider moving all the packages to the base env.

Users are struggling to get into the correct environment and find the libraries they expect.

Adding r-reticulate to R installs will also allow users to use the maap-py search and download tools directly from R (the Data team will help write documentation). This is key to using NASA data protected by EDL.

Describe the solution you'd like

  • maap-py, ipykernel and r-reticulate should be present in default environments (and all workspaces that support R)
  • (Optional) Install R into the base conda environment with the R kernel so that multiple conda environments are not required
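
A quick way to confirm the first bullet from the Python side of a workspace is to check that the expected modules resolve in the active environment. The sketch below is only an illustration: the helper name `missing_modules` is hypothetical, and `maap` and `ipykernel` are assumed to be the import names for maap-py and the Python kernel package. r-reticulate is an R package and would need a separate check from R.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for dotted names whose parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names; adjust if the packages expose different modules.
    print(missing_modules(["maap", "ipykernel"]))
```

An empty list would suggest both modules are visible to the interpreter backing the current kernel; it does not show that the R kernel or reticulate are pointed at the same environment.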

Describe alternatives you've considered
Asking users to use different workspaces for Python and for R; this can be even more confusing.

Additional context
This is related to troubleshooting with Laura and Paromita recently with @bsatoriu

@gchang
Collaborator

gchang commented Feb 28, 2023

Will create a new r-py workspace to have both R and python in the same base image.

@gchang gchang added the ADE Algorithm Development Environment Subsystem label Feb 28, 2023
@wildintellect
Collaborator Author

This should become the new default R workspace (with R > 4.0). We should retire the older R workspace option (or discourage its use). If we're doing that, maybe we can revisit the default R packages too?

@gchang
Collaborator

gchang commented Mar 22, 2023

I've updated the R workspace to build as part of figuring out the CI process. Let's discuss at tagup.

@grallewellyn
Collaborator

grallewellyn commented Apr 12, 2023

New R workspace changes were merged into develop, and I can now successfully create a workspace from R stable in the DIT environment. Because the CI process is in flux, I pushed this image using Marjorie's workaround: building the images locally and pushing them to the GitLab machines.
With this new workspace created, I was able to successfully test all the R packages we added. The version of Python is 3.10.10 and the version of R is 4.2.3.
Also, running library("rgdal") gives a warning message:

Please note that rgdal will be retired by the end of 2023,
plan transition to sf/stars/terra functions using GDAL and PROJ
at your earliest convenience.

However, I encountered a bug where my notebook testingRPackages.ipynb isn't opening correctly anymore
Screenshot 2023-04-10 at 11.12.27 AM.png
I can access this notebook in a Basic Stable workspace, but I cannot access it in my R stable workspace, even after changing the kernel to Python 3 (ipykernel).
Here is what I am seeing from testing with other notebooks (with one as simple as `print("hello world")`):

  1. Create a new notebook with an R kernel in an R stable workspace.
  2. You may need to run the first cell twice to get it to run.
  3. Exit out of that Jupyter notebook, then stop and restart your current environment.
  4. You may get this error message:
    Screenshot 2023-04-12 at 10.09.49 AM.png
  5. Go to a different R stable workspace, and you will be able to open the notebook, but it looks like
    Screenshot 2023-04-10 at 11.12.27 AM.png
  6. Go to a basic stable workspace and you will be able to open the notebook fine

Using the Python 3 (ipykernel) kernel in an R stable workspace gives the same "not valid JSON" error, but when you switch to another R stable workspace, you can open the notebook with Python 3 (ipykernel) (but not with an R kernel).

Does anyone have advice for looking into this issue more? Maybe using Kubernetes to look into some logs?
@gchang @marjo-luc

@wildintellect
Collaborator Author

rgdal retirement is a known quantity. Can you point me to the list of default R libraries? We can make a pass at updating that with the UWG. sf, terra, and raster should all be part of the default.

@grallewellyn
Collaborator

By default libraries, do you mean what we are installing with the R base image? If so:

r==4.2 r-rgdal==1.5_32 r-sf==1.0_7 r-irkernel==1.3.2 r-gridExtra==2.3 \
    r-tidyverse==2.0.0 r-randomForest==4.7_1.1 r-raster==3.6_20 r-data.table==1.14.8 r-rlist==0.4.6.2 \
    r-gdalutils==2.0.3.2 r-stringr==1.5.0
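
To compare this list against the packages requested earlier in the thread (sf, terra, raster), the pins can be parsed and checked mechanically. This is a rough sketch: `parse_pins` is a hypothetical helper, and the names in `WANTED` assume the same `r-` prefix naming used in the list above.

```python
def parse_pins(spec_text):
    """Map package name -> pinned version from 'name==version' tokens."""
    pins = {}
    # Treat shell line-continuation backslashes as plain whitespace.
    for token in spec_text.replace("\\", " ").split():
        name, sep, version = token.partition("==")
        if sep:
            pins[name] = version
    return pins


# Package pins copied from the list above.
SPECS = r"""
r==4.2 r-rgdal==1.5_32 r-sf==1.0_7 r-irkernel==1.3.2 r-gridExtra==2.3 \
    r-tidyverse==2.0.0 r-randomForest==4.7_1.1 r-raster==3.6_20 r-data.table==1.14.8 r-rlist==0.4.6.2 \
    r-gdalutils==2.0.3.2 r-stringr==1.5.0
"""

# Assumed conda package names for the libraries requested in the thread.
WANTED = ["r-sf", "r-terra", "r-raster"]

pins = parse_pins(SPECS)
missing = [name for name in WANTED if name not in pins]
print(missing)  # prints ['r-terra']
```

With the list as posted, only terra is absent; sf and raster are already pinned.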

@grallewellyn
Collaborator

Update: I tested 5 notebooks this same way in the /home directory and 2 of them exhibited this behavior... I am not sure if we like those odds

For anyone else curious, there is a bug in the R stable workspace because we are using JupyterLab 3.6.1, whereas basic stable is using JupyterLab 3.4.4 and appears to be working fine.
I might have to change R stable back to 3.4.4 to avoid this bug even though isc needs JupyterLab v3.6.1, but I am hoping this PR gets merged soon.

@gchang
Collaborator

gchang commented May 1, 2023

A UWG member also reported the ydoc issue in JupyterLab 3.6.1 when trying to access the notebook shared-buckets/abarenblitt/GEDI_Subsetting.ipynb. Docker logs show:

[E 2023-05-01 15:35:59.266 ServerApp] Uncaught exception GET /api/yjs/json:notebook:2350bbc9-5a1c-4fdd-afff-7056e7def86c (10.1.177.74)
    HTTPServerRequest(protocol='http', host='ade.maap-project.org', method='GET', uri='/api/yjs/json:notebook:2350bbc9-5a1c-4fdd-afff-7056e7def86c', version='HTTP/1.1', remote_ip='10.1.177.74')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.10/site-packages/tornado/websocket.py", line 944, in _accept_connection
        await open_result
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server_ydoc/handlers.py", line 222, in open
        if self.room.document.source != model["content"]:
      File "/opt/conda/lib/python3.10/site-packages/jupyter_ydoc/ydoc.py", line 26, in source
        return self.get()
      File "/opt/conda/lib/python3.10/site-packages/jupyter_ydoc/ydoc.py", line 166, in get
        metadata=meta["metadata"],
    KeyError: 'metadata'
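
The `KeyError: 'metadata'` above is raised inside jupyter_ydoc, so the cause may not lie in the notebook file itself. Still, one inexpensive thing to rule out is whether an affected `.ipynb` file has the top-level keys that the nbformat v4 schema requires. A minimal sketch, with hypothetical helper names:

```python
import json

# Top-level keys required by the nbformat v4 JSON schema.
REQUIRED_KEYS = ("cells", "metadata", "nbformat", "nbformat_minor")


def missing_top_level_keys(notebook):
    """Return the required top-level keys absent from a parsed notebook dict."""
    return [key for key in REQUIRED_KEYS if key not in notebook]


def check_notebook_file(path):
    """Parse a notebook file and report its missing top-level keys.

    json.load raises json.JSONDecodeError if the file is not valid JSON,
    which is itself a useful signal when a notebook will not open.
    """
    with open(path, encoding="utf-8") as handle:
        return missing_top_level_keys(json.load(handle))
```

For example, `check_notebook_file("GEDI_Subsetting.ipynb")` returning an empty list would suggest the file on disk is structurally fine and that the problem sits in the collaborative document layer instead.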

@grallewellyn
Collaborator

UWG member who reported this issue (Abigail Barenblitt) was in https://ade.maap-project.org/ using a Basic Stable workspace

We previously thought the problem was only for the R stable workspace.

@gchang Since the JupyterLab bug fix has already been merged, did we decide the best way to handle this was to wait for JupyterLab v3.6.4?

@gchang
Collaborator

gchang commented May 11, 2023

Still waiting on 3.6.4

@grallewellyn
Collaborator

JupyterLab 4.0.0 has been deployed, and I am going to test with that version.

@grallewellyn
Collaborator

Relevant issues:
jupyterlab/jupyterlab#14278
jupyterlab/jupyterlab#13930 (comment) (but this one is supposed to be resolved)

We resolved this not by changing the JupyterLab version, but by removing the --collaborative flag during JupyterLab setup.
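
The thread does not show the actual startup command, so the following is only a sketch of the idea: drop `--collaborative` from a JupyterLab launch command before it is executed. The example command and the helper name `without_flag` are hypothetical.

```python
def without_flag(argv, flag="--collaborative"):
    """Return argv with `flag` removed, in bare or `flag=value` form."""
    return [arg for arg in argv if arg != flag and not arg.startswith(flag + "=")]


# Hypothetical launch command; the real one is not shown in this thread.
launch = ["jupyter", "lab", "--ip=0.0.0.0", "--collaborative", "--no-browser"]
print(without_flag(launch))
```

In a Dockerfile or entrypoint script the equivalent change is simply deleting the flag from the command line; the helper only makes the before and after explicit.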

@gchang
Collaborator

gchang commented Feb 1, 2024

Both bullets identified in the original notes have been completed.

@grallewellyn
Collaborator

@oddes Code to help with testing:

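# Load reticulate so R can call Python modules
library(reticulate)
# Import the maap-py client module from the Python environment
maap <- import("maap.maap")
# Create a client pointed at the DIT API host
maap_obj <- maap$MAAP(maap_host='api.dit.maap-project.org')

# Submit a job (no arguments are passed here) and keep the response
result <- maap_obj$submitJob()

# Inspect the response returned by the API
print(result)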

@wildintellect
Collaborator Author

wildintellect commented Feb 1, 2024

Now we just need to open a ticket on https://github.com/MAAP-Project/maap-documentation/issues to create user facing documentation on common usage of:

  • maap-py data access with CMR
  • maap-py job management

@gchang gchang assigned rtapella and unassigned gchang, marjo-luc and grallewellyn Apr 25, 2024
@gchang gchang added this to the 4.0.0 milestone Apr 25, 2024
@gchang
Collaborator

gchang commented Apr 25, 2024

@rtapella to create tix in documentation repo, and close this one.

@rtapella
Collaborator

rtapella commented May 1, 2024

Now we just need to open a ticket on https://github.com/MAAP-Project/maap-documentation/issues to create user facing documentation on common usage of:

  • maap-py data access with CMR
  • maap-py job management

@wildintellect do you mean usage of maap-py from R/reticulate?

@wildintellect
Collaborator Author

@rtapella yes R/reticulate is likely the solution users need if working in an R environment.

@rtapella
Collaborator

rtapella commented May 2, 2024

okay closing this one and using MAAP-Project/maap-documentation#381 for the remainder

@rtapella rtapella closed this as completed May 2, 2024