-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete Python3 update and integrate THREDDS+ORCA #282
base: master
Are you sure you want to change the base?
Conversation
…es from ORCA urls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! The last thing I want to try before Eric leaves is getting it up and running on my workstation, so he can help me out if I have trouble.
@@ -1,7 +1,7 @@ | |||
FROM pcic/pdp-base-minimal-unsafe:1.0.0 | |||
FROM pcic/pdp-base-minimal-unsafe:pdp-python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "unsafe" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the pdp-docker project for documentation on that
]) | ||
def test_climo_index( | ||
pcic_data_portal, url, title, body_strings): | ||
req = Request.blank(url) | ||
resp = req.get_response(pcic_data_portal) | ||
assert resp.status == '200 OK' | ||
assert resp.content_type == 'text/html' | ||
assert resp.content_length < 0 | ||
#assert resp.content_length < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the deal with the previous version of this line? why would content length have been less than zero at any point? Is this leftover from an intensive debugging process (possibly before you took over)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what this is. I can get rid of it.
doc/source/org.rst
Outdated
@@ -24,18 +24,26 @@ All raster overlay layers are rendered and served by a PCIC-modificiation of the | |||
Pydap | |||
----- | |||
|
|||
Using Pydap for our OPeNDAP backend server has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write are own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common. | |||
In the past, we used Pydap as our OPeNDAP backend server for all of our data portals, but it is now solely used for the (now deprecated) PCDS portal. Using Pydap has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write our own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably shorten this paragraph to just the first sentence or so, since we're not using pydap any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
https://www.unidata.ucar.edu/software/tds/current/ | ||
|
||
Using THREDDS has allowed us to mitigate the challenges associated with maintaining the codebase while using Pydap. Despite this, it comes with its own challenges. Most notably, OPenDAP requests have a size limit of 500 MB. To allow users to request larger datasets, we developed an OPeNDAP Request Compiler Application (ORCA), which recursively bisects initial requests larger than 500 MB, sends those smaller requests to THREDDS, and concatenates the returned data before returning that to the user. More information about this application can be found here: | ||
https://github.com/pacificclimate/orca |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to have this in here.
@@ -206,7 +208,7 @@ At present, there are eight pages for which one can retrieve catalogs: ``bc_pris | |||
|
|||
Metadata and Data | |||
^^^^^^^^^^^^^^^^^ | |||
All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `Pydap server <http://www.pydap.org/>`_ which PCIC has heavily modified and rewritten to be able to stream large data requests. | |||
All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `THREDDS server <https://www.unidata.ucar.edu/software/tds/current/>`_ which, when used in conjunction with our OPeNDAP Request Compiler Application (ORCA), allows PCIC to be able to stream large data requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a user downloading data with a script need to combine multiple requests themself (and it needs to be documented here?), or does ORCA do it for them? I think ORCA does it for them, but wanted to check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORCA will do it for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming said script amounts to something like wget https://data.pacificclimate.org/data/<portal>/<filename>?<varname>[time_bounds][lat_bounds][lon_bounds]
.
pdp/error.py
Outdated
# * strings - sometimes sent by backend API calls | ||
# * Response objects - not yet clear who is sending these | ||
# TODO: get everyone to send expected generators, | ||
# remove this failsafe code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this comment isn't true now that everything has been fixed. Update?
response_headers = [ | ||
("content-type", "text/plain"), | ||
("Location", response_iter.location), | ||
("Access-Control-Allow-Origin", "*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(looks like this has been fixed)
Played with downloading things and looking at DAS and DDS calls from the demo, that seemed to go well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small suggestions and a couple of questions.
doc/source/org.rst
Outdated
@@ -24,18 +24,26 @@ All raster overlay layers are rendered and served by a PCIC-modificiation of the | |||
Pydap | |||
----- | |||
|
|||
Using Pydap for our OPeNDAP backend server has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write are own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common. | |||
In the past, we used Pydap as our OPeNDAP backend server for all of our data portals, but it is now solely used for the (now deprecated) PCDS portal. Using Pydap has presented us with a variety of opportunities and challenges. On one hand, development of pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would be otherwise impossible, such as streaming large data responses, having a near-zero memory footprint, and write our own custom data handlers and responses. On the other hand, pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpertual state of fear of upgrade. Pydap is mostly a one man show, which mean works-for-me syndrome is common. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
@@ -1,7 +1,7 @@ | |||
FROM pcic/pdp-base-minimal-unsafe:1.0.0 | |||
FROM pcic/pdp-base-minimal-unsafe:pdp-python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that this should be replaced by a release tag when that branch has been merged.
@@ -87,7 +87,7 @@ function addToSidebar(idx, dataArray) { | |||
}); | |||
|
|||
link = document.createElement('a'); | |||
link.href = pdp.data_root + "/hydro_stn_cmip5/" + dataArray[idx].FileName + '.ascii'; | |||
link.href = `${pdp.data_root}/hydro_stn_cmip5/${dataArray[idx].FileName}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we lose .ascii
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because CSV files from THREDDS don't support the .ascii
extension. So, they're just downloaded as filename.csv
instead of filename.csv.csv
or filename.csv.ascii
.
I have deployed this branch to beehive, backended by ORCA and THREDDS, and everything seems to be working. Maybe needs a bit more testing, then merge and release the new version. |
This PR completes the migration of the Python code from Python2 to Python3, and it directs requests made in the raster and hydro station data portals to the OPeNDAP Request Compiler Application (
orca
), which redirects these requests to the THREDDS server. This means we no longer have to use PyDAP for these portals. This PR makes use of code from thegenerate-metadata
branch inpdp_util
and themetadata-requests
branch inorca
.Resolves #212.