Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete Python3 update and integrate THREDDS+ORCA #282

Open
wants to merge 79 commits into
base: master
Choose a base branch
from
Open

Conversation

eyvorchuk
Copy link
Contributor

@eyvorchuk eyvorchuk commented Mar 28, 2023

This PR completes the migration of the Python code from Python2 to Python3, and it directs requests made in the raster and hydro station data portals to the OPeNDAP Request Compiler Application (orca), which redirects these requests to the THREDDS server. This means we no longer have to use PyDAP for these portals. This PR makes use of code from the generate-metadata branch in pdp_util and the metadata-requests branch in orca.

Resolves #212.

corviday and others added 30 commits September 15, 2021 17:10
@eyvorchuk eyvorchuk requested review from corviday and rod-glover June 26, 2024 19:08
Copy link
Contributor

@corviday corviday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! The last thing I want to try before Eric leaves is getting it up and running on my workstation, so he can help me out if I have trouble.

@@ -1,7 +1,7 @@
FROM pcic/pdp-base-minimal-unsafe:1.0.0
FROM pcic/pdp-base-minimal-unsafe:pdp-python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "unsafe" mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the pdp-docker project for documentation on that

])
def test_climo_index(
pcic_data_portal, url, title, body_strings):
req = Request.blank(url)
resp = req.get_response(pcic_data_portal)
assert resp.status == '200 OK'
assert resp.content_type == 'text/html'
assert resp.content_length < 0
#assert resp.content_length < 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the deal with the previous version of this line? why would content length have been less than zero at any point? Is this leftover from an intensive debugging process (possibly before you took over)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what this is. I can get rid of it.

@@ -24,18 +24,26 @@ All raster overlay layers are rendered and served by a PCIC-modificiation of the
Pydap
-----

Using Pydap for our OPeNDAP backend server has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write are own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common.
In the past, we used Pydap as our OPeNDAP backend server for all of our data portals, but it is now solely used for the (now deprecated) PCDS portal. Using Pydap has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write our own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could probably shorten this paragraph to just the first sentence or so, since we're not using pydap any more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

https://www.unidata.ucar.edu/software/tds/current/

Using THREDDS has allowed us to mitigate the challenges associated with maintaining the codebase while using Pydap. Despite this, it comes with its own challenges. Most notably, OPenDAP requests have a size limit of 500 MB. To allow users to request larger datasets, we developed an OPeNDAP Request Compiler Application (ORCA), which recursively bisects initial requests larger than 500 MB, sends those smaller requests to THREDDS, and concatenates the returned data before returning that to the user. More information about this application can be found here:
https://github.com/pacificclimate/orca
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to have this in here.

@@ -206,7 +208,7 @@ At present, there are eight pages for which one can retrieve catalogs: ``bc_pris

Metadata and Data
^^^^^^^^^^^^^^^^^
All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `Pydap server <http://www.pydap.org/>`_ which PCIC has heavily modified and rewritten to be able to stream large data requests.
All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `THREDDS server <https://www.unidata.ucar.edu/software/tds/current/>`_ which, when used in conjunction with our OPeNDAP Request Compiler Application (ORCA), allows PCIC to be able to stream large data requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a user downloading data with a script need to combine multiple requests themself (and it needs to be documented here?), or does ORCA do it for them? I think ORCA does it for them, but wanted to check...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ORCA will do it for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming said script amounts to something like wget https://data.pacificclimate.org/data/<portal>/<filename>?<varname>[time_bounds][lat_bounds][lon_bounds].

pdp/error.py Outdated
# * strings - sometimes sent by backend API calls
# * Response objects - not yet clear who is sending these
# TODO: get everyone to send expected generators,
# remove this failsafe code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this comment isn't true now that everything has been fixed. Update?

response_headers = [
("content-type", "text/plain"),
("Location", response_iter.location),
("Access-Control-Allow-Origin", "*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(looks like this has been fixed)

@corviday
Copy link
Contributor

Played with downloading things and looking at DAS and DDS calls from the demo, that seemed to go well.

Copy link
Contributor

@rod-glover rod-glover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small suggestions and a couple of questions.

@@ -24,18 +24,26 @@ All raster overlay layers are rendered and served by a PCIC-modificiation of the
Pydap
-----

Using Pydap for our OPeNDAP backend server has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write are own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common.
In the past, we used Pydap as our OPeNDAP backend server for all of our data portals, but it is now solely used for the (now deprecated) PCDS portal. Using Pydap has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write our own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@@ -1,7 +1,7 @@
FROM pcic/pdp-base-minimal-unsafe:1.0.0
FROM pcic/pdp-base-minimal-unsafe:pdp-python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that this should be replaced by a release tag when that branch has been merged.

docker/local-run/Dockerfile Show resolved Hide resolved
docker/production/Dockerfile Show resolved Hide resolved
@@ -87,7 +87,7 @@ function addToSidebar(idx, dataArray) {
});

link = document.createElement('a');
link.href = pdp.data_root + "/hydro_stn_cmip5/" + dataArray[idx].FileName + '.ascii';
link.href = `${pdp.data_root}/hydro_stn_cmip5/${dataArray[idx].FileName}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we lose .ascii here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because CSV files from THREDDS don't support the .ascii extension. So, they're just downloaded as filename.csv instead of filename.csv.csv or filename.csv.ascii.

@corviday
Copy link
Contributor

corviday commented Sep 5, 2024

I have deployed this branch to beehive, backended by ORCA and THREDDS, and everything seems to be working. Maybe needs a bit more testing, then merge and release the new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python 3 upgrade
3 participants