Complete Python3 update and integrate THREDDS+ORCA #282
@@ -24,18 +24,26 @@ All raster overlay layers are rendered and served by a PCIC-modification of the
Pydap
-----

-Using Pydap for our OPeNDAP backend server has presented us with a variety of opportunities and challenges. On one hand, development of Pydap is very modular, dynamic, and open. This has allowed us to easily write custom code to accomplish things that would otherwise be impossible, such as streaming large data responses, having a near-zero memory footprint, and writing our own custom data handlers and responses. On the other hand, Pydap can be a moving target. Pydap's development repository has lived in three different locations since we started, most of the code base is not rigorously tested (until lately), and API changes have been common. Few of our contributions have been upstreamed, which means that we live in a perpetual state of fear of upgrade. Pydap is mostly a one-man show, which means works-for-me syndrome is common.
+In the past, we used Pydap as our OPeNDAP backend server for all of our data portals, but it is now solely used for the (now deprecated) PCDS portal.

-Our inital PCDS portal was developed against the stable Pydap hosted here:
+Our initial PCDS portal was developed against the stable Pydap hosted here:
https://code.google.com/p/pydap/

-Our inital raster portal was developed against the development version of Pydap hosted here:
+Our initial raster portal was developed against the development version of Pydap hosted here:
https://bitbucket.org/robertodealmeida/pydap

But now he's developing on GitHub with a branch that looks pretty similar to the initial stable version:
https://github.com/robertodealmeida/pydap

Where to go? Nobody knows. I fear that we may need to maintain our own fork in perpetuity.
+THREDDS
+-------
+
+In the latest version of the data portal, we have transitioned from serving our raster data and hydro station data via Pydap to our deployment of the THREDDS Data Server (TDS), which is developed and supported by Unidata, a division of the University Corporation for Atmospheric Research (UCAR). More information about this server can be found here:
+https://www.unidata.ucar.edu/software/tds/current/
+
+Using THREDDS has allowed us to mitigate the challenges associated with maintaining the codebase while using Pydap. That said, it comes with its own challenges. Most notably, OPeNDAP requests have a size limit of 500 MB. To allow users to request larger datasets, we developed an OPeNDAP Request Compiler Application (ORCA), which recursively bisects initial requests larger than 500 MB, sends those smaller requests to THREDDS, and concatenates the returned data before returning it to the user. More information about this application can be found here:
+https://github.com/pacificclimate/orca
Review comment: good to have this in here.
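The recursive bisection described above can be sketched roughly as follows. This is an illustration only, not ORCA's actual code; the names `fetch_in_pieces`, `fetch`, and `bytes_per_step` are hypothetical stand-ins:

```python
# Rough sketch of ORCA-style request splitting (NOT the real ORCA code).
# `fetch` stands in for an OPeNDAP request to THREDDS; `bytes_per_step`
# is a hypothetical estimate of the response size per timestep.

MAX_BYTES = 500 * 1024 * 1024  # THREDDS OPeNDAP response size limit (500 MB)

def fetch_in_pieces(start, stop, bytes_per_step, fetch):
    """Fetch timesteps [start, stop), bisecting until each piece fits."""
    # Stop bisecting once the piece fits, or can no longer be split.
    if (stop - start) * bytes_per_step <= MAX_BYTES or stop - start <= 1:
        return [fetch(start, stop)]
    mid = (start + stop) // 2  # bisect the time range
    return (fetch_in_pieces(start, mid, bytes_per_step, fetch)
            + fetch_in_pieces(mid, stop, bytes_per_step, fetch))

# Demo with a dummy fetch that just records the requested ranges; the real
# application would concatenate the returned data in order.
pieces = fetch_in_pieces(0, 1000, 2 * 1024 * 1024, fetch=lambda a, b: (a, b))
print(pieces)  # [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

With 1000 timesteps at 2 MB each (2000 MB total), the range is bisected twice, yielding four in-order pieces of 250 timesteps (500 MB) each.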
Data Interfaces
---------------
@@ -153,6 +153,8 @@ Some of the larger datasets have been packed in accordance with the `netCDF stan

The `scale_factor` and `add_offset` values are documented in the metadata of a packed variable.
+Please note that in the past, we offered an additional "ArcInfo/ASCII Grid" format, which consisted of a Zip archive containing one .asc file and one .prj (projection) file representing a map at each timestamp; however, this format is no longer offered as of the latest version of the data portal.

.. _power-user:

Power user HOWTO
@@ -206,7 +208,7 @@ At present, there are eight pages for which one can retrieve catalogs: ``bc_pris
Metadata and Data
^^^^^^^^^^^^^^^^^
-All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `Pydap server <http://www.pydap.org/>`_ which PCIC has heavily modified and rewritten to be able to stream large data requests.
+All of our multidimensional raster data is made available via `Open-source Project for a Network Data Access Protocol (OPeNDAP) <http://opendap.org/>`_, the specification of which can be found `here <http://www.opendap.org/pdf/ESE-RFC-004v1.2.pdf>`_. Requests are serviced by our deployment of the `THREDDS server <https://www.unidata.ucar.edu/software/tds/current/>`_ which, when used in conjunction with our OPeNDAP Request Compiler Application (ORCA), allows PCIC to stream large data requests.
Review comment: Does a user downloading data with a script need to combine multiple requests themselves (and does it need to be documented here?), or does ORCA do it for them? I think ORCA does it for them, but wanted to check...

Reply: ORCA will do it for them.

Reply: Assuming said script amounts to something like
The *structure* and *attributes* of a dataset can be retrieved using OPeNDAP by making a `DDS or DAS <http://www.opendap.org/api/pguide-html/pguide_6.html>`_ request respectively. For example, to determine how many timesteps are available from one of the BCSD datasets, one can make a DDS request against that dataset as such: ::
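As a small sketch of the mechanics, a DDS request is an ordinary HTTP GET with ``.dds`` appended to the dataset URL. The dataset path below is one that appears elsewhere in this document; the actual request is left commented out since it requires network access:

```python
# Build a DDS request URL: the dataset URL plus a ".dds" suffix.
base = ("https://data.pacificclimate.org/data/downscaled_gcms_archive/"
        "pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MPI-ESM-LR_"
        "historical+rcp26_r3i1p1_19500101-21001231.nc")
dds_url = base + ".dds"
print(dds_url)

# To actually issue the request (requires network access):
# import urllib.request
# print(urllib.request.urlopen(dds_url).read().decode())
```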
@@ -363,61 +365,4 @@ To construct a proper DAP selection, please refer to the `DAP specification <htt
Note that for this example the temperature values are all packed integer values and to obtain the proper value you may need to apply a floating point offset and/or scale factor which are available in the DAS response and the netCDF data response.

-Download multiple variables
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-For users that are interested in downloading multiple variables for a single dataset, this *is* possible for certain datasets. The web user interface does not expose this functionality, but if you are willing to do some scripting or URL hacking, you'll be rewarded with a faster download.
-
-To determine whether your dataset of interest contains multiple variables, check by reading the `Dataset Descriptor Structure (DDS) <http://docs.opendap.org/index.php/UserGuideOPeNDAPMessages>`_. You can get this by making a request to the dataset of interest with the ".dds" suffix appended to the end. E.g. the following DDS request shows that the dataset in question contains 3 independent variables (pr, tasmax, tasmin) and 3 axis variables (lon, lat, time). All of those are requestable in a single request. ::
-
-  james@basalt:~$ curl 'https://data.pacificclimate.org/data/downscaled_gcms_archive/pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MPI-ESM-LR_historical+rcp26_r3i1p1_19500101-21001231.nc.dds'
-  Dataset {
-      Float64 lon[lon = 1068];
-      Float64 lat[lat = 510];
-      Float64 time[time = 55152];
-      Grid {
-        Array:
-          Float32 pr[time = 55152][lat = 510][lon = 1068];
-        Maps:
-          Float64 time[time = 55152];
-          Float64 lat[lat = 510];
-          Float64 lon[lon = 1068];
-      } pr;
-      Grid {
-        Array:
-          Float32 tasmax[time = 55152][lat = 510][lon = 1068];
-        Maps:
-          Float64 time[time = 55152];
-          Float64 lat[lat = 510];
-          Float64 lon[lon = 1068];
-      } tasmax;
-      Grid {
-        Array:
-          Float32 tasmin[time = 55152][lat = 510][lon = 1068];
-        Maps:
-          Float64 time[time = 55152];
-          Float64 lat[lat = 510];
-          Float64 lon[lon = 1068];
-      } tasmin;
-  } pr%2Btasmax%2Btasmin_day_BCCAQ%2BANUSPLIN300%2BMPI-ESM-LR_historical%2Brcp26_r3i1p1_19500101-21001231%2Enc;
-
-To request multiple variables in a single request, you need to use multiple comma-separated variable requests in the query params. That format looks like this: ::
-
-  [dataset_url].[response_extension]?[variable_name_0][subset_spec],[variable_name_1][subset_spec],...
-
-So if the base dataset that you want to download is
-https://data.pacificclimate.org/data/downscaled_gcms_archive/pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MPI-ESM-LR_historical+rcp26_r3i1p1_19500101-21001231.nc,
-and you want the NetCDF response, your extension will be '.nc'.
-
-Assume you just want the first 100 timesteps ([0:99]) and a 50x50 square somewhere in the middle ([250:299][500:549]).
-
-Putting that all together, it will look something like this: ::
-
-  https://data.pacificclimate.org/data/downscaled_gcms_archive/pr+tasmax+tasmin_day_BCCAQ+ANUSPLIN300+MPI-ESM-LR_historical+rcp26_r3i1p1_19500101-21001231.nc.nc?tasmax[0:99][250:299][500:549],tasmin[0:99][250:299][500:549],pr[0:99][250:299][500:549]
-
-It's not quite as easy as clicking a few buttons on the web page, but depending on your use case, you can evaluate whether it's worth your effort to script together these multi-variable requests.
+Please note that in the past, we allowed users to download multiple variables for a single dataset using a single request; however, this functionality is no longer supported as of the latest version of the data portal.
@@ -1,7 +1,8 @@
-FROM pcic/pdp-base-minimal-unsafe:1.0.0
+# TODO: Replace pdp-base-minimal tag with new release tag when this branch of pdp-docker has been merged
+FROM pcic/pdp-base-minimal-unsafe:pdp-python3

Review comment: what does "unsafe" mean here?

Reply: See the pdp-docker project for documentation on that.

Review comment: Add a comment that this should be replaced by a release tag when that branch has been merged.
COPY ./ ${USER_DIR}

-RUN python -m pip install -r requirements.txt -r test_requirements.txt
-RUN python -m pip install sphinx
-RUN python -m pip install .
+RUN python3 -m pip install -r requirements.txt -r test_requirements.txt
+RUN python3 -m pip install sphinx==1.8.5
+RUN python3 -m pip install .
@@ -1,4 +1,5 @@
-FROM pcic/pdp-base-minimal:1.0.0
+# TODO: Replace pdp-base-minimal tag with new release tag when this branch of pdp-docker has been merged
+FROM pcic/pdp-base-minimal:pdp-python3
LABEL Maintainer="Rod Glover <[email protected]>"

USER root
@@ -16,6 +17,6 @@ USER ${USERNAME}
WORKDIR /codebase
ADD *requirements.txt /codebase/

-RUN pip install -r requirements.txt -r test_requirements.txt -r deploy_requirements.txt
+RUN pip3 install -r requirements.txt -r test_requirements.txt -r deploy_requirements.txt

ENTRYPOINT ./docker/local-run/entrypoint.sh
@@ -1,2 +1,4 @@
DSN=postgresql://httpd_meta:[email protected]:5432/pcic_meta
PCDS_DSN=postgresql://httpd:[email protected]:5432/crmp
+ORCA_ROOT=https://services.pacificclimate.org/dev/orca
+THREDDS_ROOT=https://marble-dev01.pcic.uvic.ca/twitcher/ows/proxy/thredds/dodsC/datasets
@@ -2,7 +2,8 @@
# Dockerfile to run the PCIC data portal #
############################################

-FROM pcic/pdp-base-minimal:1.0.0
+# TODO: Replace pdp-base-minimal tag with new release tag when this branch of pdp-docker has been merged
+FROM pcic/pdp-base-minimal:pdp-python3

LABEL Maintainer="James Hiebert <[email protected]>"

USER root
@@ -21,14 +22,14 @@ ADD --chown=${USERNAME}:${GROUPNAME} . ${USER_DIR}/

# Install dependencies. Note: Base image already contains several of the
# heaviest ones.
-RUN pip install -r requirements.txt -r deploy_requirements.txt
+RUN pip3 install -r requirements.txt -r deploy_requirements.txt

# Install and build the docs
# Must pre-install to provide dependencies and version number
# for build_sphinx
-RUN pip install .
-RUN python setup.py build_sphinx
-RUN pip install .
+RUN pip3 install .
+RUN python3 setup.py build_sphinx
+RUN pip3 install .

# gunicorn.conf is set up so that one can tune gunicorn settings when
# running the container by setting an environment variable
Review comment: 👍