diff --git a/.gitignore b/.gitignore index bfb08484..8d1050b8 100644 --- a/.gitignore +++ b/.gitignore @@ -21,4 +21,5 @@ examples/NSIDC/data /.venv/ -external/ \ No newline at end of file +external/ +how-tos/test.nc diff --git a/_quarto.yml b/_quarto.yml index 2ef520fe..c9e7a2ce 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -68,6 +68,8 @@ website: href: how-tos/read_data.qmd - text: "subset data" href: how-tos/subset.qmd + - text: "Store data in the cloud" + href: how-tos/using-s3-storage.ipynb #- text: "reformat data files" # href: how-tos/reformat.qmd #- text: "reproject and regrid" diff --git a/how-tos/using-s3-storage.ipynb b/how-tos/using-s3-storage.ipynb new file mode 100644 index 00000000..7bb936eb --- /dev/null +++ b/how-tos/using-s3-storage.ipynb @@ -0,0 +1,3249 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "39768489-1042-412c-b3f7-6844b2b7346b", + "metadata": {}, + "source": [ + "---\n", + "title: \"Using S3 Bucket Storage in NASA-Openscapes Hub\"\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "555a951e-9764-4d29-b2e7-1ffefafbd8da", + "metadata": {}, + "source": [ + "# Overview\n", + "\n", + "When working in the NASA Openscapes Hub, there are strategies you can use to manage your storage in terms of both cost and performance. The default storage location is the `HOME` directory (`/home/jovyan/`), which is mounted to the compute instance (the cloud computer doing the computations). The Hub uses an [EC2](https://aws.amazon.com/ec2/) compute instance, with the `HOME` directory mounted to [AWS Elastic File System (EFS)](https://aws.amazon.com/efs/) storage. This drive is handy because it persists across server restarts, making it a great place to store your code. However, the `HOME` directory is not a good place to store data: it is expensive, and it can be slow to read from and write to.
\n", + "\n", + "To that end, the Hub provides every user with access to two [AWS S3](https://aws.amazon.com/s3/) buckets - a \"scratch\" bucket for short-term storage, and a \"persistent\" bucket for longer-term storage. S3 buckets offer fast reads and writes, and their storage costs are low compared to storing data in your `HOME` directory. A useful way to think of an S3 bucket in relation to your compute instance is as a cheap but fast external hard drive attached to your expensive laptop. \n", + "\n", + "One other thing to note about these buckets is that all hub users can access each other's user directories. These buckets are accessible only when you are working inside the hub; you can access them using the environment variables:\n", + "\n", + "- `$SCRATCH_BUCKET` pointing to `s3://openscapeshub-scratch/[your-username]`\n", + " - Scratch buckets are designed for storing temporary files, e.g. intermediate results. Objects stored in a scratch bucket are removed 7 days after their creation.\n", + "- `$PERSISTENT_BUCKET` pointing to `s3://openscapeshub-persistent/[your-username]`\n", + " - Persistent buckets are designed for storing data that is used throughout the lifetime of a project. There is no automatic purging of objects in persistent buckets, so it is the responsibility of the hub admin and/or hub users to delete objects when they are no longer needed, to minimize cloud billing costs.\n", + "\n", + "We can interact with these buckets in Python using the packages `boto3` and/or `s3fs`, or in a terminal with the `awsv2` CLI tool. This tutorial will focus on using the `s3fs` package. 
[See this page](https://docs.2i2c.org/admin/howto/manage-object-storage-aws/) for more information on using S3 buckets in a 2i2c hub, and tips on using the `aws` cli tool.\n", + "\n", + "## Reading and writing to the `$SCRATCH_BUCKET`\n", + "\n", + "We will start by accessing the same data we did in the [Earthdata Cloud Clinic](/tutorials/Earthdata-cloud-clinic.ipynb) - reading it into memory as an xarray object and subsetting it." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a422d8df-21b5-4011-99b9-411d731dbbeb", + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": [ + "(function(root) {\n", + " function now() {\n", + " return new Date();\n", + " }\n", + "\n", + " var force = true;\n", + " var py_version = '3.4.0'.replace('rc', '-rc.').replace('.dev', '-dev.');\n", + " var reloading = false;\n", + " var Bokeh = root.Bokeh;\n", + "\n", + " if (typeof (root._bokeh_timeout) === \"undefined\" || force) {\n", + " root._bokeh_timeout = Date.now() + 5000;\n", + " root._bokeh_failed_load = false;\n", + " }\n", + "\n", + " function run_callbacks() {\n", + " try {\n", + " root._bokeh_onload_callbacks.forEach(function(callback) {\n", + " if (callback != null)\n", + " callback();\n", + " });\n", + " } finally {\n", + " delete root._bokeh_onload_callbacks;\n", + " }\n", + " console.debug(\"Bokeh: all callbacks have finished\");\n", + " }\n", + "\n", + " function load_libs(css_urls, js_urls, js_modules, js_exports, callback) {\n", + " if (css_urls == null) css_urls = [];\n", + " if (js_urls == null) js_urls = [];\n", + " if (js_modules == null) js_modules = [];\n", + " if (js_exports == null) js_exports = {};\n", + "\n", + " root._bokeh_onload_callbacks.push(callback);\n", + "\n", + " if (root._bokeh_is_loading > 0) {\n", + " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", + " return null;\n", + " }\n", + " if (js_urls.length === 0 && js_modules.length === 0 && Object.keys(js_exports).length 
=== 0) {\n", + " run_callbacks();\n", + " return null;\n", + " }\n", + " if (!reloading) {\n", + " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", + " }\n", + "\n", + " function on_load() {\n", + " root._bokeh_is_loading--;\n", + " if (root._bokeh_is_loading === 0) {\n", + " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", + " run_callbacks()\n", + " }\n", + " }\n", + " window._bokeh_on_load = on_load\n", + "\n", + " function on_error() {\n", + " console.error(\"failed to load \" + url);\n", + " }\n", + "\n", + " var skip = [];\n", + " if (window.requirejs) {\n", + " window.requirejs.config({'packages': {}, 'paths': {}, 'shim': {}});\n", + " root._bokeh_is_loading = css_urls.length + 0;\n", + " } else {\n", + " root._bokeh_is_loading = css_urls.length + js_urls.length + js_modules.length + Object.keys(js_exports).length;\n", + " }\n", + "\n", + " var existing_stylesheets = []\n", + " var links = document.getElementsByTagName('link')\n", + " for (var i = 0; i < links.length; i++) {\n", + " var link = links[i]\n", + " if (link.href != null) {\n", + "\texisting_stylesheets.push(link.href)\n", + " }\n", + " }\n", + " for (var i = 0; i < css_urls.length; i++) {\n", + " var url = css_urls[i];\n", + " if (existing_stylesheets.indexOf(url) !== -1) {\n", + "\ton_load()\n", + "\tcontinue;\n", + " }\n", + " const element = document.createElement(\"link\");\n", + " element.onload = on_load;\n", + " element.onerror = on_error;\n", + " element.rel = \"stylesheet\";\n", + " element.type = \"text/css\";\n", + " element.href = url;\n", + " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", + " document.body.appendChild(element);\n", + " } var existing_scripts = []\n", + " var scripts = document.getElementsByTagName('script')\n", + " for (var i = 0; i < scripts.length; i++) {\n", + " var script = scripts[i]\n", + " if (script.src != null) {\n", + "\texisting_scripts.push(script.src)\n", + " 
}\n", + " }\n", + " for (var i = 0; i < js_urls.length; i++) {\n", + " var url = js_urls[i];\n", + " if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n", + "\tif (!window.requirejs) {\n", + "\t on_load();\n", + "\t}\n", + "\tcontinue;\n", + " }\n", + " var element = document.createElement('script');\n", + " element.onload = on_load;\n", + " element.onerror = on_error;\n", + " element.async = false;\n", + " element.src = url;\n", + " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", + " document.head.appendChild(element);\n", + " }\n", + " for (var i = 0; i < js_modules.length; i++) {\n", + " var url = js_modules[i];\n", + " if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n", + "\tif (!window.requirejs) {\n", + "\t on_load();\n", + "\t}\n", + "\tcontinue;\n", + " }\n", + " var element = document.createElement('script');\n", + " element.onload = on_load;\n", + " element.onerror = on_error;\n", + " element.async = false;\n", + " element.src = url;\n", + " element.type = \"module\";\n", + " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", + " document.head.appendChild(element);\n", + " }\n", + " for (const name in js_exports) {\n", + " var url = js_exports[name];\n", + " if (skip.indexOf(url) >= 0 || root[name] != null) {\n", + "\tif (!window.requirejs) {\n", + "\t on_load();\n", + "\t}\n", + "\tcontinue;\n", + " }\n", + " var element = document.createElement('script');\n", + " element.onerror = on_error;\n", + " element.async = false;\n", + " element.type = \"module\";\n", + " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", + " element.textContent = `\n", + " import ${name} from \"${url}\"\n", + " window.${name} = ${name}\n", + " window._bokeh_on_load()\n", + " `\n", + " document.head.appendChild(element);\n", + " }\n", + " if (!js_urls.length && !js_modules.length) {\n", + " on_load()\n", + " }\n", + " };\n", + "\n", + " function 
inject_raw_css(css) {\n", + " const element = document.createElement(\"style\");\n", + " element.appendChild(document.createTextNode(css));\n", + " document.body.appendChild(element);\n", + " }\n", + "\n", + " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-3.4.0.min.js\", \"https://cdn.holoviz.org/panel/1.4.1/dist/panel.min.js\"];\n", + " var js_modules = [];\n", + " var js_exports = {};\n", + " var css_urls = [];\n", + " var inline_js = [ function(Bokeh) {\n", + " Bokeh.set_log_level(\"info\");\n", + " },\n", + "function(Bokeh) {} // ensure no trailing comma for IE\n", + " ];\n", + "\n", + " function run_inline_js() {\n", + " if ((root.Bokeh !== undefined) || (force === true)) {\n", + " for (var i = 0; i < inline_js.length; i++) {\n", + "\ttry {\n", + " inline_js[i].call(root, root.Bokeh);\n", + "\t} catch(e) {\n", + "\t if (!reloading) {\n", + "\t throw e;\n", + "\t }\n", + "\t}\n", + " }\n", + " // Cache old bokeh versions\n", + " if (Bokeh != undefined && !reloading) {\n", + "\tvar NewBokeh = root.Bokeh;\n", + "\tif (Bokeh.versions === undefined) {\n", + "\t Bokeh.versions = new Map();\n", + "\t}\n", + "\tif (NewBokeh.version !== Bokeh.version) {\n", + "\t Bokeh.versions.set(NewBokeh.version, NewBokeh)\n", + "\t}\n", + "\troot.Bokeh = Bokeh;\n", + " }} else if (Date.now() < root._bokeh_timeout) {\n", + " setTimeout(run_inline_js, 100);\n", + " } else if (!root._bokeh_failed_load) {\n", + " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", + " root._bokeh_failed_load = true;\n", + " }\n", + " root._bokeh_is_initializing = false\n", + " }\n", + "\n", + " function load_or_wait() {\n", + " // Implement a backoff loop that tries to ensure we do not load multiple\n", + " // versions of Bokeh and its dependencies at the same time.\n", 
+ " // In recent versions we use the root._bokeh_is_initializing flag\n", + " // to determine whether there is an ongoing attempt to initialize\n", + " // bokeh, however for backward compatibility we also try to ensure\n", + " // that we do not start loading a newer (Panel>=1.0 and Bokeh>3) version\n", + " // before older versions are fully initialized.\n", + " if (root._bokeh_is_initializing && Date.now() > root._bokeh_timeout) {\n", + " root._bokeh_is_initializing = false;\n", + " root._bokeh_onload_callbacks = undefined;\n", + " console.log(\"Bokeh: BokehJS was loaded multiple times but one version failed to initialize.\");\n", + " load_or_wait();\n", + " } else if (root._bokeh_is_initializing || (typeof root._bokeh_is_initializing === \"undefined\" && root._bokeh_onload_callbacks !== undefined)) {\n", + " setTimeout(load_or_wait, 100);\n", + " } else {\n", + " root._bokeh_is_initializing = true\n", + " root._bokeh_onload_callbacks = []\n", + " var bokeh_loaded = Bokeh != null && (Bokeh.version === py_version || (Bokeh.versions !== undefined && Bokeh.versions.has(py_version)));\n", + " if (!reloading && !bokeh_loaded) {\n", + "\troot.Bokeh = undefined;\n", + " }\n", + " load_libs(css_urls, js_urls, js_modules, js_exports, function() {\n", + "\tconsole.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", + "\trun_inline_js();\n", + " });\n", + " }\n", + " }\n", + " // Give older versions of the autoload script a head-start to ensure\n", + " // they initialize before we start loading newer version.\n", + " setTimeout(load_or_wait, 100)\n", + "}(window));" + ], + "application/vnd.holoviews_load.v0+json": "(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n var py_version = '3.4.0'.replace('rc', '-rc.').replace('.dev', '-dev.');\n var reloading = false;\n var Bokeh = root.Bokeh;\n\n if (typeof (root._bokeh_timeout) === \"undefined\" || force) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n 
}\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) {\n if (callback != null)\n callback();\n });\n } finally {\n delete root._bokeh_onload_callbacks;\n }\n console.debug(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(css_urls, js_urls, js_modules, js_exports, callback) {\n if (css_urls == null) css_urls = [];\n if (js_urls == null) js_urls = [];\n if (js_modules == null) js_modules = [];\n if (js_exports == null) js_exports = {};\n\n root._bokeh_onload_callbacks.push(callback);\n\n if (root._bokeh_is_loading > 0) {\n console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls.length === 0 && js_modules.length === 0 && Object.keys(js_exports).length === 0) {\n run_callbacks();\n return null;\n }\n if (!reloading) {\n console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n }\n\n function on_load() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n run_callbacks()\n }\n }\n window._bokeh_on_load = on_load\n\n function on_error() {\n console.error(\"failed to load \" + url);\n }\n\n var skip = [];\n if (window.requirejs) {\n window.requirejs.config({'packages': {}, 'paths': {}, 'shim': {}});\n root._bokeh_is_loading = css_urls.length + 0;\n } else {\n root._bokeh_is_loading = css_urls.length + js_urls.length + js_modules.length + Object.keys(js_exports).length;\n }\n\n var existing_stylesheets = []\n var links = document.getElementsByTagName('link')\n for (var i = 0; i < links.length; i++) {\n var link = links[i]\n if (link.href != null) {\n\texisting_stylesheets.push(link.href)\n }\n }\n for (var i = 0; i < css_urls.length; i++) {\n var url = css_urls[i];\n if (existing_stylesheets.indexOf(url) !== -1) {\n\ton_load()\n\tcontinue;\n }\n const element = document.createElement(\"link\");\n element.onload = on_load;\n 
element.onerror = on_error;\n element.rel = \"stylesheet\";\n element.type = \"text/css\";\n element.href = url;\n console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n document.body.appendChild(element);\n } var existing_scripts = []\n var scripts = document.getElementsByTagName('script')\n for (var i = 0; i < scripts.length; i++) {\n var script = scripts[i]\n if (script.src != null) {\n\texisting_scripts.push(script.src)\n }\n }\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n\tif (!window.requirejs) {\n\t on_load();\n\t}\n\tcontinue;\n }\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n for (var i = 0; i < js_modules.length; i++) {\n var url = js_modules[i];\n if (skip.indexOf(url) !== -1 || existing_scripts.indexOf(url) !== -1) {\n\tif (!window.requirejs) {\n\t on_load();\n\t}\n\tcontinue;\n }\n var element = document.createElement('script');\n element.onload = on_load;\n element.onerror = on_error;\n element.async = false;\n element.src = url;\n element.type = \"module\";\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.head.appendChild(element);\n }\n for (const name in js_exports) {\n var url = js_exports[name];\n if (skip.indexOf(url) >= 0 || root[name] != null) {\n\tif (!window.requirejs) {\n\t on_load();\n\t}\n\tcontinue;\n }\n var element = document.createElement('script');\n element.onerror = on_error;\n element.async = false;\n element.type = \"module\";\n console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n element.textContent = `\n import ${name} from \"${url}\"\n window.${name} = ${name}\n window._bokeh_on_load()\n `\n 
document.head.appendChild(element);\n }\n if (!js_urls.length && !js_modules.length) {\n on_load()\n }\n };\n\n function inject_raw_css(css) {\n const element = document.createElement(\"style\");\n element.appendChild(document.createTextNode(css));\n document.body.appendChild(element);\n }\n\n var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-gl-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-3.4.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-3.4.0.min.js\", \"https://cdn.holoviz.org/panel/1.4.1/dist/panel.min.js\"];\n var js_modules = [];\n var js_exports = {};\n var css_urls = [];\n var inline_js = [ function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\nfunction(Bokeh) {} // ensure no trailing comma for IE\n ];\n\n function run_inline_js() {\n if ((root.Bokeh !== undefined) || (force === true)) {\n for (var i = 0; i < inline_js.length; i++) {\n\ttry {\n inline_js[i].call(root, root.Bokeh);\n\t} catch(e) {\n\t if (!reloading) {\n\t throw e;\n\t }\n\t}\n }\n // Cache old bokeh versions\n if (Bokeh != undefined && !reloading) {\n\tvar NewBokeh = root.Bokeh;\n\tif (Bokeh.versions === undefined) {\n\t Bokeh.versions = new Map();\n\t}\n\tif (NewBokeh.version !== Bokeh.version) {\n\t Bokeh.versions.set(NewBokeh.version, NewBokeh)\n\t}\n\troot.Bokeh = Bokeh;\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n }\n root._bokeh_is_initializing = false\n }\n\n function load_or_wait() {\n // Implement a backoff loop that tries to ensure we do not load multiple\n // versions of Bokeh and its dependencies at the same time.\n // In recent versions we use the root._bokeh_is_initializing flag\n // to determine whether there is an ongoing attempt to initialize\n // bokeh, however for backward 
compatibility we also try to ensure\n // that we do not start loading a newer (Panel>=1.0 and Bokeh>3) version\n // before older versions are fully initialized.\n if (root._bokeh_is_initializing && Date.now() > root._bokeh_timeout) {\n root._bokeh_is_initializing = false;\n root._bokeh_onload_callbacks = undefined;\n console.log(\"Bokeh: BokehJS was loaded multiple times but one version failed to initialize.\");\n load_or_wait();\n } else if (root._bokeh_is_initializing || (typeof root._bokeh_is_initializing === \"undefined\" && root._bokeh_onload_callbacks !== undefined)) {\n setTimeout(load_or_wait, 100);\n } else {\n root._bokeh_is_initializing = true\n root._bokeh_onload_callbacks = []\n var bokeh_loaded = Bokeh != null && (Bokeh.version === py_version || (Bokeh.versions !== undefined && Bokeh.versions.has(py_version)));\n if (!reloading && !bokeh_loaded) {\n\troot.Bokeh = undefined;\n }\n load_libs(css_urls, js_urls, js_modules, js_exports, function() {\n\tconsole.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n\trun_inline_js();\n });\n }\n }\n // Give older versions of the autoload script a head-start to ensure\n // they initialize before we start loading newer version.\n setTimeout(load_or_wait, 100)\n}(window));" + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/javascript": [ + "\n", + "if ((window.PyViz === undefined) || (window.PyViz instanceof HTMLElement)) {\n", + " window.PyViz = {comms: {}, comm_status:{}, kernels:{}, receivers: {}, plot_index: []}\n", + "}\n", + "\n", + "\n", + " function JupyterCommManager() {\n", + " }\n", + "\n", + " JupyterCommManager.prototype.register_target = function(plot_id, comm_id, msg_handler) {\n", + " if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n", + " var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n", + " comm_manager.register_target(comm_id, function(comm) {\n", + " 
comm.on_msg(msg_handler);\n", + " });\n", + " } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n", + " window.PyViz.kernels[plot_id].registerCommTarget(comm_id, function(comm) {\n", + " comm.onMsg = msg_handler;\n", + " });\n", + " } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n", + " google.colab.kernel.comms.registerTarget(comm_id, (comm) => {\n", + " var messages = comm.messages[Symbol.asyncIterator]();\n", + " function processIteratorResult(result) {\n", + " var message = result.value;\n", + " console.log(message)\n", + " var content = {data: message.data, comm_id};\n", + " var buffers = []\n", + " for (var buffer of message.buffers || []) {\n", + " buffers.push(new DataView(buffer))\n", + " }\n", + " var metadata = message.metadata || {};\n", + " var msg = {content, buffers, metadata}\n", + " msg_handler(msg);\n", + " return messages.next().then(processIteratorResult);\n", + " }\n", + " return messages.next().then(processIteratorResult);\n", + " })\n", + " }\n", + " }\n", + "\n", + " JupyterCommManager.prototype.get_client_comm = function(plot_id, comm_id, msg_handler) {\n", + " if (comm_id in window.PyViz.comms) {\n", + " return window.PyViz.comms[comm_id];\n", + " } else if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n", + " var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n", + " var comm = comm_manager.new_comm(comm_id, {}, {}, {}, comm_id);\n", + " if (msg_handler) {\n", + " comm.on_msg(msg_handler);\n", + " }\n", + " } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n", + " var comm = window.PyViz.kernels[plot_id].connectToComm(comm_id);\n", + " comm.open();\n", + " if (msg_handler) {\n", + " comm.onMsg = msg_handler;\n", + " }\n", + " } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n", + " var comm_promise = google.colab.kernel.comms.open(comm_id)\n", 
+ " comm_promise.then((comm) => {\n", + " window.PyViz.comms[comm_id] = comm;\n", + " if (msg_handler) {\n", + " var messages = comm.messages[Symbol.asyncIterator]();\n", + " function processIteratorResult(result) {\n", + " var message = result.value;\n", + " var content = {data: message.data};\n", + " var metadata = message.metadata || {comm_id};\n", + " var msg = {content, metadata}\n", + " msg_handler(msg);\n", + " return messages.next().then(processIteratorResult);\n", + " }\n", + " return messages.next().then(processIteratorResult);\n", + " }\n", + " }) \n", + " var sendClosure = (data, metadata, buffers, disposeOnDone) => {\n", + " return comm_promise.then((comm) => {\n", + " comm.send(data, metadata, buffers, disposeOnDone);\n", + " });\n", + " };\n", + " var comm = {\n", + " send: sendClosure\n", + " };\n", + " }\n", + " window.PyViz.comms[comm_id] = comm;\n", + " return comm;\n", + " }\n", + " window.PyViz.comm_manager = new JupyterCommManager();\n", + " \n", + "\n", + "\n", + "var JS_MIME_TYPE = 'application/javascript';\n", + "var HTML_MIME_TYPE = 'text/html';\n", + "var EXEC_MIME_TYPE = 'application/vnd.holoviews_exec.v0+json';\n", + "var CLASS_NAME = 'output';\n", + "\n", + "/**\n", + " * Render data to the DOM node\n", + " */\n", + "function render(props, node) {\n", + " var div = document.createElement(\"div\");\n", + " var script = document.createElement(\"script\");\n", + " node.appendChild(div);\n", + " node.appendChild(script);\n", + "}\n", + "\n", + "/**\n", + " * Handle when a new output is added\n", + " */\n", + "function handle_add_output(event, handle) {\n", + " var output_area = handle.output_area;\n", + " var output = handle.output;\n", + " if ((output.data == undefined) || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", + " return\n", + " }\n", + " var id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", + " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", + " if (id !== undefined) {\n", + " var nchildren = 
toinsert.length;\n", + " var html_node = toinsert[nchildren-1].children[0];\n", + " html_node.innerHTML = output.data[HTML_MIME_TYPE];\n", + " var scripts = [];\n", + " var nodelist = html_node.querySelectorAll(\"script\");\n", + " for (var i in nodelist) {\n", + " if (nodelist.hasOwnProperty(i)) {\n", + " scripts.push(nodelist[i])\n", + " }\n", + " }\n", + "\n", + " scripts.forEach( function (oldScript) {\n", + " var newScript = document.createElement(\"script\");\n", + " var attrs = [];\n", + " var nodemap = oldScript.attributes;\n", + " for (var j in nodemap) {\n", + " if (nodemap.hasOwnProperty(j)) {\n", + " attrs.push(nodemap[j])\n", + " }\n", + " }\n", + " attrs.forEach(function(attr) { newScript.setAttribute(attr.name, attr.value) });\n", + " newScript.appendChild(document.createTextNode(oldScript.innerHTML));\n", + " oldScript.parentNode.replaceChild(newScript, oldScript);\n", + " });\n", + " if (JS_MIME_TYPE in output.data) {\n", + " toinsert[nchildren-1].children[1].textContent = output.data[JS_MIME_TYPE];\n", + " }\n", + " output_area._hv_plot_id = id;\n", + " if ((window.Bokeh !== undefined) && (id in Bokeh.index)) {\n", + " window.PyViz.plot_index[id] = Bokeh.index[id];\n", + " } else {\n", + " window.PyViz.plot_index[id] = null;\n", + " }\n", + " } else if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", + " var bk_div = document.createElement(\"div\");\n", + " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", + " var script_attrs = bk_div.children[0].attributes;\n", + " for (var i = 0; i < script_attrs.length; i++) {\n", + " toinsert[toinsert.length - 1].childNodes[1].setAttribute(script_attrs[i].name, script_attrs[i].value);\n", + " }\n", + " // store reference to server id on output_area\n", + " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", + " }\n", + "}\n", + "\n", + "/**\n", + " * Handle when an output is cleared or removed\n", + " */\n", + "function handle_clear_output(event, handle) 
{\n", + " var id = handle.cell.output_area._hv_plot_id;\n", + " var server_id = handle.cell.output_area._bokeh_server_id;\n", + " if (((id === undefined) || !(id in PyViz.plot_index)) && (server_id !== undefined)) { return; }\n", + " var comm = window.PyViz.comm_manager.get_client_comm(\"hv-extension-comm\", \"hv-extension-comm\", function () {});\n", + " if (server_id !== null) {\n", + " comm.send({event_type: 'server_delete', 'id': server_id});\n", + " return;\n", + " } else if (comm !== null) {\n", + " comm.send({event_type: 'delete', 'id': id});\n", + " }\n", + " delete PyViz.plot_index[id];\n", + " if ((window.Bokeh !== undefined) & (id in window.Bokeh.index)) {\n", + " var doc = window.Bokeh.index[id].model.document\n", + " doc.clear();\n", + " const i = window.Bokeh.documents.indexOf(doc);\n", + " if (i > -1) {\n", + " window.Bokeh.documents.splice(i, 1);\n", + " }\n", + " }\n", + "}\n", + "\n", + "/**\n", + " * Handle kernel restart event\n", + " */\n", + "function handle_kernel_cleanup(event, handle) {\n", + " delete PyViz.comms[\"hv-extension-comm\"];\n", + " window.PyViz.plot_index = {}\n", + "}\n", + "\n", + "/**\n", + " * Handle update_display_data messages\n", + " */\n", + "function handle_update_output(event, handle) {\n", + " handle_clear_output(event, {cell: {output_area: handle.output_area}})\n", + " handle_add_output(event, handle)\n", + "}\n", + "\n", + "function register_renderer(events, OutputArea) {\n", + " function append_mime(data, metadata, element) {\n", + " // create a DOM node to render to\n", + " var toinsert = this.create_output_subarea(\n", + " metadata,\n", + " CLASS_NAME,\n", + " EXEC_MIME_TYPE\n", + " );\n", + " this.keyboard_manager.register_events(toinsert);\n", + " // Render to node\n", + " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", + " render(props, toinsert[0]);\n", + " element.append(toinsert);\n", + " return toinsert\n", + " }\n", + "\n", + " events.on('output_added.OutputArea', 
handle_add_output);\n", + " events.on('output_updated.OutputArea', handle_update_output);\n", + " events.on('clear_output.CodeCell', handle_clear_output);\n", + " events.on('delete.Cell', handle_clear_output);\n", + " events.on('kernel_ready.Kernel', handle_kernel_cleanup);\n", + "\n", + " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", + " safe: true,\n", + " index: 0\n", + " });\n", + "}\n", + "\n", + "if (window.Jupyter !== undefined) {\n", + " try {\n", + " var events = require('base/js/events');\n", + " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", + " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", + " register_renderer(events, OutputArea);\n", + " }\n", + " } catch(err) {\n", + " }\n", + "}\n" + ], + "application/vnd.holoviews_load.v0+json": "\nif ((window.PyViz === undefined) || (window.PyViz instanceof HTMLElement)) {\n window.PyViz = {comms: {}, comm_status:{}, kernels:{}, receivers: {}, plot_index: []}\n}\n\n\n function JupyterCommManager() {\n }\n\n JupyterCommManager.prototype.register_target = function(plot_id, comm_id, msg_handler) {\n if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n comm_manager.register_target(comm_id, function(comm) {\n comm.on_msg(msg_handler);\n });\n } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n window.PyViz.kernels[plot_id].registerCommTarget(comm_id, function(comm) {\n comm.onMsg = msg_handler;\n });\n } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n google.colab.kernel.comms.registerTarget(comm_id, (comm) => {\n var messages = comm.messages[Symbol.asyncIterator]();\n function processIteratorResult(result) {\n var message = result.value;\n console.log(message)\n var content = {data: message.data, comm_id};\n var buffers = []\n for (var buffer of 
message.buffers || []) {\n buffers.push(new DataView(buffer))\n }\n var metadata = message.metadata || {};\n var msg = {content, buffers, metadata}\n msg_handler(msg);\n return messages.next().then(processIteratorResult);\n }\n return messages.next().then(processIteratorResult);\n })\n }\n }\n\n JupyterCommManager.prototype.get_client_comm = function(plot_id, comm_id, msg_handler) {\n if (comm_id in window.PyViz.comms) {\n return window.PyViz.comms[comm_id];\n } else if (window.comm_manager || ((window.Jupyter !== undefined) && (Jupyter.notebook.kernel != null))) {\n var comm_manager = window.comm_manager || Jupyter.notebook.kernel.comm_manager;\n var comm = comm_manager.new_comm(comm_id, {}, {}, {}, comm_id);\n if (msg_handler) {\n comm.on_msg(msg_handler);\n }\n } else if ((plot_id in window.PyViz.kernels) && (window.PyViz.kernels[plot_id])) {\n var comm = window.PyViz.kernels[plot_id].connectToComm(comm_id);\n comm.open();\n if (msg_handler) {\n comm.onMsg = msg_handler;\n }\n } else if (typeof google != 'undefined' && google.colab.kernel != null) {\n var comm_promise = google.colab.kernel.comms.open(comm_id)\n comm_promise.then((comm) => {\n window.PyViz.comms[comm_id] = comm;\n if (msg_handler) {\n var messages = comm.messages[Symbol.asyncIterator]();\n function processIteratorResult(result) {\n var message = result.value;\n var content = {data: message.data};\n var metadata = message.metadata || {comm_id};\n var msg = {content, metadata}\n msg_handler(msg);\n return messages.next().then(processIteratorResult);\n }\n return messages.next().then(processIteratorResult);\n }\n }) \n var sendClosure = (data, metadata, buffers, disposeOnDone) => {\n return comm_promise.then((comm) => {\n comm.send(data, metadata, buffers, disposeOnDone);\n });\n };\n var comm = {\n send: sendClosure\n };\n }\n window.PyViz.comms[comm_id] = comm;\n return comm;\n }\n window.PyViz.comm_manager = new JupyterCommManager();\n \n\n\nvar JS_MIME_TYPE = 'application/javascript';\nvar 
HTML_MIME_TYPE = 'text/html';\nvar EXEC_MIME_TYPE = 'application/vnd.holoviews_exec.v0+json';\nvar CLASS_NAME = 'output';\n\n/**\n * Render data to the DOM node\n */\nfunction render(props, node) {\n var div = document.createElement(\"div\");\n var script = document.createElement(\"script\");\n node.appendChild(div);\n node.appendChild(script);\n}\n\n/**\n * Handle when a new output is added\n */\nfunction handle_add_output(event, handle) {\n var output_area = handle.output_area;\n var output = handle.output;\n if ((output.data == undefined) || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n return\n }\n var id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n if (id !== undefined) {\n var nchildren = toinsert.length;\n var html_node = toinsert[nchildren-1].children[0];\n html_node.innerHTML = output.data[HTML_MIME_TYPE];\n var scripts = [];\n var nodelist = html_node.querySelectorAll(\"script\");\n for (var i in nodelist) {\n if (nodelist.hasOwnProperty(i)) {\n scripts.push(nodelist[i])\n }\n }\n\n scripts.forEach( function (oldScript) {\n var newScript = document.createElement(\"script\");\n var attrs = [];\n var nodemap = oldScript.attributes;\n for (var j in nodemap) {\n if (nodemap.hasOwnProperty(j)) {\n attrs.push(nodemap[j])\n }\n }\n attrs.forEach(function(attr) { newScript.setAttribute(attr.name, attr.value) });\n newScript.appendChild(document.createTextNode(oldScript.innerHTML));\n oldScript.parentNode.replaceChild(newScript, oldScript);\n });\n if (JS_MIME_TYPE in output.data) {\n toinsert[nchildren-1].children[1].textContent = output.data[JS_MIME_TYPE];\n }\n output_area._hv_plot_id = id;\n if ((window.Bokeh !== undefined) && (id in Bokeh.index)) {\n window.PyViz.plot_index[id] = Bokeh.index[id];\n } else {\n window.PyViz.plot_index[id] = null;\n }\n } else if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n var bk_div = document.createElement(\"div\");\n 
bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n var script_attrs = bk_div.children[0].attributes;\n for (var i = 0; i < script_attrs.length; i++) {\n toinsert[toinsert.length - 1].childNodes[1].setAttribute(script_attrs[i].name, script_attrs[i].value);\n }\n // store reference to server id on output_area\n output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n }\n}\n\n/**\n * Handle when an output is cleared or removed\n */\nfunction handle_clear_output(event, handle) {\n var id = handle.cell.output_area._hv_plot_id;\n var server_id = handle.cell.output_area._bokeh_server_id;\n if (((id === undefined) || !(id in PyViz.plot_index)) && (server_id !== undefined)) { return; }\n var comm = window.PyViz.comm_manager.get_client_comm(\"hv-extension-comm\", \"hv-extension-comm\", function () {});\n if (server_id !== null) {\n comm.send({event_type: 'server_delete', 'id': server_id});\n return;\n } else if (comm !== null) {\n comm.send({event_type: 'delete', 'id': id});\n }\n delete PyViz.plot_index[id];\n if ((window.Bokeh !== undefined) & (id in window.Bokeh.index)) {\n var doc = window.Bokeh.index[id].model.document\n doc.clear();\n const i = window.Bokeh.documents.indexOf(doc);\n if (i > -1) {\n window.Bokeh.documents.splice(i, 1);\n }\n }\n}\n\n/**\n * Handle kernel restart event\n */\nfunction handle_kernel_cleanup(event, handle) {\n delete PyViz.comms[\"hv-extension-comm\"];\n window.PyViz.plot_index = {}\n}\n\n/**\n * Handle update_display_data messages\n */\nfunction handle_update_output(event, handle) {\n handle_clear_output(event, {cell: {output_area: handle.output_area}})\n handle_add_output(event, handle)\n}\n\nfunction register_renderer(events, OutputArea) {\n function append_mime(data, metadata, element) {\n // create a DOM node to render to\n var toinsert = this.create_output_subarea(\n metadata,\n CLASS_NAME,\n EXEC_MIME_TYPE\n );\n this.keyboard_manager.register_events(toinsert);\n // Render to node\n var props = {data: data, 
metadata: metadata[EXEC_MIME_TYPE]};\n render(props, toinsert[0]);\n element.append(toinsert);\n return toinsert\n }\n\n events.on('output_added.OutputArea', handle_add_output);\n events.on('output_updated.OutputArea', handle_update_output);\n events.on('clear_output.CodeCell', handle_clear_output);\n events.on('delete.Cell', handle_clear_output);\n events.on('kernel_ready.Kernel', handle_kernel_cleanup);\n\n OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n safe: true,\n index: 0\n });\n}\n\nif (window.Jupyter !== undefined) {\n try {\n var events = require('base/js/events');\n var OutputArea = require('notebook/js/outputarea').OutputArea;\n if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n register_renderer(events, OutputArea);\n }\n } catch(err) {\n }\n}\n" + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.holoviews_exec.v0+json": "", + "text/html": [ + "
\n", + "
\n", + "
\n", + "" + ] + }, + "metadata": { + "application/vnd.holoviews_exec.v0+json": { + "id": "p1002" + } + }, + "output_type": "display_data" + } + ], + "source": [ + "import earthaccess \n", + "import xarray as xr\n", + "import hvplot.xarray #plot\n", + "import os\n", + "import tempfile\n", + "import s3fs # aws s3 access" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "85163eed-9829-4464-a058-1fcbce505739", + "metadata": {}, + "outputs": [], + "source": [ + "auth = earthaccess.login()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d1067f3a-0866-4b79-aa22-f8f0ee9cfd61", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Granules found: 18\n" + ] + } + ], + "source": [ + "data_name = \"SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205\"\n", + "\n", + "results = earthaccess.search_data(\n", + " short_name=data_name,\n", + " cloud_hosted=True,\n", + " temporal=(\"2021-07-01\", \"2021-09-30\"),\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c8081124-bcd6-49c6-a8f6-9de519fa484d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Opening 18 granules, approx size: 0.16 GB\n", + "using endpoint: https://archive.podaac.earthdata.nasa.gov/s3credentials\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "0da73e8a66444a17960860366518a061", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "QUEUEING TASKS | : 0%| | 0/18 [00:00\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.Dataset> Size: 299MB\n",
+       "Dimensions:      (Time: 18, Longitude: 2160, nv: 2, Latitude: 960)\n",
+       "Coordinates:\n",
+       "  * Longitude    (Longitude) float32 9kB 0.08333 0.25 0.4167 ... 359.8 359.9\n",
+       "  * Latitude     (Latitude) float32 4kB -79.92 -79.75 -79.58 ... 79.75 79.92\n",
+       "  * Time         (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-2...\n",
+       "Dimensions without coordinates: nv\n",
+       "Data variables:\n",
+       "    Lon_bounds   (Time, Longitude, nv) float32 311kB dask.array<chunksize=(1, 2160, 2), meta=np.ndarray>\n",
+       "    Lat_bounds   (Time, Latitude, nv) float32 138kB dask.array<chunksize=(1, 960, 2), meta=np.ndarray>\n",
+       "    Time_bounds  (Time, nv) datetime64[ns] 288B dask.array<chunksize=(1, 2), meta=np.ndarray>\n",
+       "    SLA          (Time, Latitude, Longitude) float32 149MB dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>\n",
+       "    SLA_ERR      (Time, Latitude, Longitude) float32 149MB dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>\n",
+       "Attributes: (12/21)\n",
+       "    Conventions:            CF-1.6\n",
+       "    ncei_template_version:  NCEI_NetCDF_Grid_Template_v2.0\n",
+       "    Institution:            Jet Propulsion Laboratory\n",
+       "    geospatial_lat_min:     -79.916664\n",
+       "    geospatial_lat_max:     79.916664\n",
+       "    geospatial_lon_min:     0.083333336\n",
+       "    ...                     ...\n",
+       "    version_number:         2205\n",
+       "    Data_Pnts_Each_Sat:     {"16": 743215, "1007": 674076}\n",
+       "    source_version:         commit 58c7da13c0c0069ae940c33a82bf1544b7d991bf\n",
+       "    SLA_Global_MEAN:        0.06428374482174487\n",
+       "    SLA_Global_STD:         0.0905195660534004\n",
+       "    latency:                final
" + ], + "text/plain": [ + " Size: 299MB\n", + "Dimensions: (Time: 18, Longitude: 2160, nv: 2, Latitude: 960)\n", + "Coordinates:\n", + " * Longitude (Longitude) float32 9kB 0.08333 0.25 0.4167 ... 359.8 359.9\n", + " * Latitude (Latitude) float32 4kB -79.92 -79.75 -79.58 ... 79.75 79.92\n", + " * Time (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-2...\n", + "Dimensions without coordinates: nv\n", + "Data variables:\n", + " Lon_bounds (Time, Longitude, nv) float32 311kB dask.array\n", + " Lat_bounds (Time, Latitude, nv) float32 138kB dask.array\n", + " Time_bounds (Time, nv) datetime64[ns] 288B dask.array\n", + " SLA (Time, Latitude, Longitude) float32 149MB dask.array\n", + " SLA_ERR (Time, Latitude, Longitude) float32 149MB dask.array\n", + "Attributes: (12/21)\n", + " Conventions: CF-1.6\n", + " ncei_template_version: NCEI_NetCDF_Grid_Template_v2.0\n", + " Institution: Jet Propulsion Laboratory\n", + " geospatial_lat_min: -79.916664\n", + " geospatial_lat_max: 79.916664\n", + " geospatial_lon_min: 0.083333336\n", + " ... ...\n", + " version_number: 2205\n", + " Data_Pnts_Each_Sat: {\"16\": 743215, \"1007\": 674076}\n", + " source_version: commit 58c7da13c0c0069ae940c33a82bf1544b7d991bf\n", + " SLA_Global_MEAN: 0.06428374482174487\n", + " SLA_Global_STD: 0.0905195660534004\n", + " latency: final" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds = xr.open_mfdataset(earthaccess.open(results))\n", + "ds" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "893d2d8f-278d-4439-9698-9420d295bd6b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'SLA' (Time: 18, Latitude: 120, Longitude: 156)> Size: 1MB\n",
+       "dask.array<getitem, shape=(18, 120, 156), dtype=float32, chunksize=(1, 120, 156), chunktype=numpy.ndarray>\n",
+       "Coordinates:\n",
+       "  * Longitude  (Longitude) float32 624B 234.6 234.8 234.9 ... 260.1 260.2 260.4\n",
+       "  * Latitude   (Latitude) float32 480B 15.92 16.08 16.25 ... 35.42 35.58 35.75\n",
+       "  * Time       (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-28T...\n",
+       "Attributes:\n",
+       "    units:          m\n",
+       "    long_name:      Sea Level Anomaly Estimate\n",
+       "    standard_name:  sea_surface_height_above_sea_level\n",
+       "    alias:          sea_surface_height_above_sea_level
" + ], + "text/plain": [ + " Size: 1MB\n", + "dask.array\n", + "Coordinates:\n", + " * Longitude (Longitude) float32 624B 234.6 234.8 234.9 ... 260.1 260.2 260.4\n", + " * Latitude (Latitude) float32 480B 15.92 16.08 16.25 ... 35.42 35.58 35.75\n", + " * Time (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-28T...\n", + "Attributes:\n", + " units: m\n", + " long_name: Sea Level Anomaly Estimate\n", + " standard_name: sea_surface_height_above_sea_level\n", + " alias: sea_surface_height_above_sea_level" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds_subset = ds['SLA'].sel(Latitude=slice(15.8, 35.9), Longitude=slice(234.5,260.5)) \n", + "ds_subset" + ] + }, + { + "cell_type": "markdown", + "id": "84f20657-a04a-45cd-a252-340d7b0a0c2a", + "metadata": {}, + "source": [ + "## Home directory\n", + "\n", + "Imagining this `ds_subset` object is now an important intermediate dataset, or the result of a complex analysis and we want to save it. Our default action might be to just save it to our `HOME` directory. This is simple, but we want to avoid this as it incurs significant storage costs, and using this data later will be slow." + ] + }, + { + "cell_type": "markdown", + "id": "1547aaca-cc37-49b4-8379-7186f83fdcec", + "metadata": {}, + "source": [ + "```python\n", + "ds_subset.to_netcdf(\"test.nc\") # avoid writing to home directory like this\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "2527d864-7297-4a98-9a6b-a0712a35a753", + "metadata": {}, + "source": [ + "## Use the s3fs package to interact with our S3 bucket.\n", + "\n", + "[s3fs](https://s3fs.readthedocs.io/en/latest/) is a Python library that allows us to interact with S3 objects in a file-system like manner." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "b9ded82a-9041-4f31-8eab-707e7d151dbf", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "s3://openscapeshub-scratch/ateucher\n", + "s3://openscapeshub-persistent/ateucher\n" + ] + } + ], + "source": [ + "# Create a S3FileSystem class\n", + "s3 = s3fs.S3FileSystem()\n", + "\n", + "# Get scratch and persistent buckets\n", + "scratch = os.environ[\"SCRATCH_BUCKET\"]\n", + "persistent = os.environ[\"PERSISTENT_BUCKET\"]\n", + "\n", + "print(scratch)\n", + "print(persistent)" + ] + }, + { + "cell_type": "markdown", + "id": "d173e6ae-ada8-497c-b395-9b68407d0f01", + "metadata": {}, + "source": [ + "Our user-specific directories in the two buckets aren't actually created until we put something in them, so if we try to check\n", + "for their existence or list their contents before they are created, we will get an error. We will use the `S3FileSystem.touch()` method to place a simple empty file called `.placeholder` in each one to bring them into existence." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "6b4de62d-e6bf-46d8-9aed-28d214ee31e4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-scratch/ateucher/.placeholder']" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s3.touch(f\"{scratch}/.placeholder\")\n", + "\n", + "s3.ls(scratch)" + ] + }, + { + "cell_type": "markdown", + "id": "c96c523a-8ce9-4d18-a0b1-d776b661334b", + "metadata": {}, + "source": [ + "and in our persistent bucket:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "460dfc00-9dfc-4d5a-ac9f-9cb452b4fde9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-persistent/ateucher/.placeholder']" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s3.touch(f\"{persistent}/.placeholder\")\n", + "\n", + "s3.ls(persistent)" + ] + }, + { + "cell_type": "markdown", + "id": "fa5dbcae-b45f-472b-acd1-33b91945e63c", + "metadata": {}, + "source": [ + "(Note that adding these placeholders isn't strictly necessary, as the first time you write anything to these buckets they will be created.)" + ] + }, + { + "cell_type": "markdown", + "id": "686b39a3-345e-44ab-a122-87b24060f7cf", + "metadata": {}, + "source": [ + "## Save dataset as netcdf file in SCRATCH bucket\n", + "\n", + "Next we can save `ds_subset` as a netcdf file in our scratch bucket. 
This involves writing to a temporary local file first, and then uploading that file to the scratch bucket:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "36bf4cee-e487-4917-a0de-46ace731ac03", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-scratch/ateucher/.placeholder',\n", + " 'openscapeshub-scratch/ateucher/test123.nc']" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Where we want to store it:\n", + "scratch_nc_file_path = f\"{scratch}/test123.nc\"\n", + "\n", + "# Create a temporary intermediate file and save it to the bucket\n", + "with tempfile.NamedTemporaryFile(suffix=\".nc\") as tmp:\n", + "    ds_subset.to_netcdf(tmp.name)  # save it to a temporary local file\n", + "    s3.put(tmp.name, scratch_nc_file_path)  # upload that file to the scratch bucket\n", + "\n", + "# Ensure the file is there\n", + "s3.ls(scratch)" + ] + }, + { + "cell_type": "markdown", + "id": "213aefbc-9bd0-41a0-9a9e-b4c4f60cc960", + "metadata": {}, + "source": [ + "And we can open it to ensure it worked:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "99d836c3-2045-4961-b6ff-d7c7bd4df688", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "
<xarray.DataArray 'SLA' (Time: 18, Latitude: 120, Longitude: 156)> Size: 1MB\n",
+       "[336960 values with dtype=float32]\n",
+       "Coordinates:\n",
+       "  * Longitude  (Longitude) float32 624B 234.6 234.8 234.9 ... 260.1 260.2 260.4\n",
+       "  * Latitude   (Latitude) float32 480B 15.92 16.08 16.25 ... 35.42 35.58 35.75\n",
+       "  * Time       (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-28T...\n",
+       "Attributes:\n",
+       "    units:          m\n",
+       "    long_name:      Sea Level Anomaly Estimate\n",
+       "    standard_name:  sea_surface_height_above_sea_level\n",
+       "    alias:          sea_surface_height_above_sea_level
" + ], + "text/plain": [ + " Size: 1MB\n", + "[336960 values with dtype=float32]\n", + "Coordinates:\n", + " * Longitude (Longitude) float32 624B 234.6 234.8 234.9 ... 260.1 260.2 260.4\n", + " * Latitude (Latitude) float32 480B 15.92 16.08 16.25 ... 35.42 35.58 35.75\n", + " * Time (Time) datetime64[ns] 144B 2021-07-05T12:00:00 ... 2021-09-28T...\n", + "Attributes:\n", + " units: m\n", + " long_name: Sea Level Anomaly Estimate\n", + " standard_name: sea_surface_height_above_sea_level\n", + " alias: sea_surface_height_above_sea_level" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds_from_scratch = xr.open_dataarray(s3.open(scratch_nc_file_path))\n", + "\n", + "ds_from_scratch" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "0f3d6a7e-a72e-4a8b-80de-24604e464683", + "metadata": {}, + "outputs": [ + { + "data": {}, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.holoviews_exec.v0+json": "", + "text/html": [ + "
\n", + "
\n", + "
\n", + "" + ], + "text/plain": [ + ":DynamicMap [Time]\n", + " :Image [Longitude,Latitude] (SLA)" + ] + }, + "execution_count": 11, + "metadata": { + "application/vnd.holoviews_exec.v0+json": { + "id": "p1004" + } + }, + "output_type": "execute_result" + } + ], + "source": [ + "ds_from_scratch.hvplot.image(x='Longitude', y='Latitude', cmap='RdBu', clim=(-0.5, 0.5), title=\"Sea Level Anomaly Estimate (m)\")" + ] + }, + { + "cell_type": "markdown", + "id": "e844b0f7-8478-4d54-af8c-75e2b831d05e", + "metadata": {}, + "source": [ + "## Move data to the persistent bucket\n", + "\n", + "If we decide this is a file we want to keep around for a longer time period, we can move it to our persistent bucket. We can even make a subdirectory in our persistent bucket to keep us organized:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "e4ef6258-b7cd-4493-9dd4-7591775fa952", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-scratch/ateucher/.placeholder']" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "persistent_dest_dir = f\"{persistent}/my-analysis-data/\"\n", + "\n", + "# Make directory in persistent bucket\n", + "s3.mkdir(persistent_dest_dir)\n", + "\n", + "# Move the file\n", + "s3.mv(scratch_nc_file_path, persistent_dest_dir)\n", + "\n", + "# Check the scratch and persistent bucket listings:\n", + "s3.ls(scratch)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "d6a41774-297c-4251-bbda-fb701858961f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-persistent/ateucher/.placeholder',\n", + " 'openscapeshub-persistent/ateucher/my-analysis-data']" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s3.ls(persistent)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "b4ec6c5e-6bc8-4cc8-85a6-a39778131129", + "metadata": 
{}, + "outputs": [ + { + "data": { + "text/plain": [ + "['openscapeshub-persistent/ateucher/my-analysis-data/test123.nc']" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s3.ls(persistent_dest_dir)" + ] + }, + { + "cell_type": "markdown", + "id": "e9bccda9-e106-4983-8d9f-020e429a970d", + "metadata": {}, + "source": [ + "## Move existing data from `HOME` to `PERSISTENT_BUCKET`\n", + "\n", + "You may already have some data in your `HOME` directory that you would like to move out to a persistent bucket. You can do that using the `awsv2 s3` command line tool, which is already installed on the hub. You can open a terminal from the Hub Launcher - it will open in your `HOME` directory. You can then use the `awsv2 s3 mv` command to move a file to your bucket.\n", + "\n", + "#### Move a single file from `HOME` to `PERSISTENT_BUCKET`:" + ] + }, + { + "cell_type": "markdown", + "id": "3650259e-0aad-4371-9b97-e9ddbff43e0e", + "metadata": {}, + "source": [ + "```bash\n", + "$ awsv2 s3 mv my-big-file.nc $PERSISTENT_BUCKET/ # The trailing slash is important here\n", + "move: ./my-big-file.nc to s3://openscapeshub-persistent/ateucher/my-big-file.nc\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "da9ce114-74d2-445b-af41-26a0d6d023cc", + "metadata": {}, + "source": [ + "#### Move a directory of data from `HOME` to `PERSISTENT_BUCKET`\n", + "\n", + "List the contents of the local `results-data` directory:\n", + "\n", + "```bash\n", + "$ ls results-data/\n", + "my-big-file1.nc my-big-file2.nc\n", + "```\n", + "\n", + "Use `awsv2 s3 mv` with the `--recursive` flag to move all files in a directory to a new directory in `PERSISTENT_BUCKET`:\n", + "\n", + "```bash\n", + "$ awsv2 s3 mv --recursive results-data $PERSISTENT_BUCKET/results-data/\n", + "move: results-data/my-big-file1.nc to s3://openscapeshub-persistent/ateucher/results-data/my-big-file1.nc\n",
+ "move: results-data/my-big-file2.nc to s3://openscapeshub-persistent/ateucher/results-data/my-big-file2.nc\n", + "```" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}