diff --git a/0_Introduction/TutorialIntroduction.ipynb b/0_Introduction/TutorialIntroduction.ipynb index 60d8e81..9318cd4 100644 --- a/0_Introduction/TutorialIntroduction.ipynb +++ b/0_Introduction/TutorialIntroduction.ipynb @@ -155,7 +155,9 @@ "\n", "All materials used in today's tutorial are available at https://github.com/ARM-DOE/armhpc-tutorial\n", "\n", - "Get a copy: `git clone https://github.com/ARM-DOE/armhpc-tutorial`" + "Get a copy: `git clone https://github.com/ARM-DOE/armhpc-tutorial`\n", + "\n", + "Login: https://arm-jupyter.ornl.gov" ] }, { diff --git a/2_DataMovement/DataMovement.ipynb b/2_DataMovement/DataMovement.ipynb new file mode 100644 index 0000000..c184eb2 --- /dev/null +++ b/2_DataMovement/DataMovement.ipynb @@ -0,0 +1,220 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# **Data Movement**\n", + "- Big data science using ARM data requires access to large volumes of observations.\n", + "- Data search and download can be an daunting and time taking task.\n", + "- ARM provides custom developed services to allow seamless access to ARM data archives.\n", + "- These services are designed for easy use in a programmatic environment.\n", + "- Our custom tools alleviates some of these by:\n", + " - integrating search and download\n", + " - handling authentication, security etc.\n", + " - command line and python API's to enable fully automated workflows" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# **Tools for data movement**\n", + "\n", + "|ARM Live Data Web Service | \"stage_arm_data\"|\n", + "|--------------------|-----------------|\n", + "| Web service |Globus based transfer|\n", + "| Works from any where in the world | Custom tool available only on ARM HPC clusters |\n", + "| Access to all user accessible data | Access to **ALL** ARM data, including raw |\n", + "| Download only | Also allow data movement within ARM clusters and ADC |\n", + "| HTTP download driven by your internet speed|Fast transfers over dedicated Infiniband network|\n", + "| Allow search and download based on simple queries | Allow search and download based on simple queries |\n", + "| Command line, Python API | Command line, Python API |\n", + "| Enable fully portable application | Applications would be limited to ARM cluster |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# **ARM Live Data Web Service**\n", + "\n", + "**URL: https://adc.arm.gov/armlive**\n", + "\n", + "Developed and maintained by: Ranjeet Devarakonda and ADC Web Tools Team\n", + "\n", + "Provides: REST API, Command line (Wget, Curl), Bash scripting, Python API" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## **Using REST API**\n", + "`https://adc.arm.gov/armlive/data/query?user=USER_ID:ACCESS_TOKEN&ds=DATASTREAM&start=START_DATE&end=END_DATE`\n", + "\n", + "## **Command line download using Curl**\n", + "`$ wget 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`\n", + "\n", + "## **Command line download using Wget**\n", + "`$ curl 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`\n", + "\n", + "## **Using Python API**\n", + "\n", + "Installation: `$ pip install git+https://code.ornl.gov/ofg/armlive_getfiles.git`\n", + "\n", + "Execution: `$ getARMFiles -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`\n", + "\n", + "OR \n", + "\n", + "Installation: `$ git clone https://code.ornl.gov/ofg/armlive_getfiles.git`\n", + "\n", + "Execution: `$ python armlive_getfiles/src/getFiles.py -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`\n", + "\n", + "## **Using bash scripting**\n", + "Bash script: `https://adc.arm.gov/armlive/scripts/getFiles.sh`\n", + "\n", + "Execution: `$ bash getFiles.sh example:abcd1234 sgpmetE11.b1 2018-01-01 2018-02-01`" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## **This Python function parses the JSON blob and downloads the responsive files into the output directory.**\n", + "The ARMLive web service returns a JSON blob with download links for archive files based on the datastream, start, and end dates provided. \n", + "\n", + "\n", + "```python\n", + "def download_arm_files(user, token, datastream, start, end, output_directory):\n", + " params = {\n", + " 'user': f'{user}:{token}',\n", + " 'ds': datastream,\n", + " 'start': start,\n", + " 'end': end,\n", + " 'wt': 'json',\n", + " }\n", + "\n", + "# print(params)\n", + " response = requests.get('https://adc.arm.gov/armlive/livedata/query', params=params)\n", + "# print(response.url)\n", + " response = response.json()\n", + "# print(response)\n", + " downloaded_files = []\n", + " for filename in response['files']:\n", + " download_url = f'https://adc.arm.gov/armlive/livedata/saveData?user={user}:{token}&file={filename}'\n", + " file_path = Path(output_directory, filename)\n", + " file_path.parent.mkdir(parents=True, exist_ok=True)\n", + " with requests.get(download_url, stream=True) as r:\n", + " with open(file_path, 'wb') as f:\n", + " shutil.copyfileobj(r.raw, f)\n", + " downloaded_files.append(file_path)\n", + " return downloaded_files\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# stage_arm_data\n", + "\n", + "Developed and maintained by: Zach Price, Jitu Kumar and ARM HPC Team\n", + "\n", + "- stage_arm_data uses Globus protocols to\n", + " - query ARM database\n", + " - identify the files as per search criteria\n", + " - download from ARM archive\n", + " - handles all security and authentications\n", + "- data is always stages in project shared area on Lustre filesystem\n", + "- Currently stage_arm_data is only avaliable from the login and compute nodes of Stratus. \n", + "- They are not available from Jupyter notebook. The recommended workaround in the meantime is to open a web terminal in Jupyter, ssh to stratus.ornl.gov, and stage data.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# stage_arm_data\n", + "\n", + "Add to your environment: `module load stage_arm_data`\n", + "\n", + "Data transfer command: `stage_arm_data --to Stratus --datastream corkasacrcfrhsrhiM1.a1 --start 2019-01-01 00:00:00 --end 2019-02-01 00:00:00`\n", + "\n", + "Files will be staged at: `/lustre/or-hydra/cades-arm/proj-shared/data_transfer/cor/corkasacrcfrhsrhiM1.a1/`\n", + "\n", + "From within a Python script:\n", + "```python\n", + "from stage_arm_data.core import transfer_files\n", + "from stage_arm_data.endpoints import Stratus\n", + "\n", + "constraints = {\n", + " 'start_time': 1552608000,\n", + " 'end_time': 1552694400,\n", + " 'datastream': 'corkasacrcfrhsrhiM1.a1'\n", + "}\n", + "\n", + "transfer_files(constraints, Stratus)\n", + "```\n", + "\n", + "Use `--dry-run=True` option in constraints to list available files without transferring." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/4_Examples/cloud_vap_demo.ipynb b/4_Examples/cloud_vap_demo.ipynb index 275dbf2..a7baa45 100644 --- a/4_Examples/cloud_vap_demo.ipynb +++ b/4_Examples/cloud_vap_demo.ipynb @@ -2,20 +2,20 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Add ARM Live credentials here\n", - "user = \"Jitendra.Kumar\"\n", - "credential = \"3dc7f653f422b302\"\n", + "user = \"your ARM username\"\n", + "credential = \"your credentials\"\n", "\n", "# Get your credentials at https://adc.arm.gov/armlive/" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 17, "metadata": {}, "outputs": [], "source": [ @@ -39,7 +39,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ @@ -116,7 +116,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 19, "metadata": {}, "outputs": [], "source": [ @@ -134,7 +134,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 20, "metadata": {}, "outputs": [], "source": [ @@ -160,7 +160,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 21, "metadata": {}, "outputs": [], "source": [ @@ -186,7 +186,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 22, "metadata": {}, "outputs": [], "source": [ @@ -210,7 +210,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 23, "metadata": {}, "outputs": [], "source": [ @@ -231,7 +231,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 24, "metadata": {}, "outputs": [], "source": [ @@ -248,7 +248,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 25, "metadata": {}, "outputs": [], "source": [ @@ -271,7 +271,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 26, "metadata": {}, "outputs": [], "source": [ @@ -303,7 +303,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 27, "metadata": {}, "outputs": [ { @@ -506,8 +506,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 3min 34s, sys: 23 s, total: 3min 57s\n", - "Wall time: 2min 45s\n" + "CPU times: user 3min 49s, sys: 19.9 s, total: 4min 9s\n", + "Wall time: 2min 58s\n" ] } ], @@ -529,7 +529,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 28, "metadata": {}, "outputs": [ { @@ -559,7 +559,7 @@ "" ] }, - "execution_count": 13, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -573,7 +573,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 29, "metadata": {}, "outputs": [ { @@ -582,7 +582,7 @@ "3" ] }, - "execution_count": 14, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } @@ -593,13 +593,13 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "ae1939419fe444dca364d8a1dab324d8", + "model_id": "d527819023014e72bb34ba497d1e3112", "version_major": 2, "version_minor": 0 },