-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Jitendra (Jitu) Kumar
committed
Jun 11, 2019
1 parent
b3ef3d6
commit 69dd167
Showing
3 changed files
with
245 additions
and
23 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# **Data Movement**\n", | ||
"- Big data science using ARM data requires access to large volumes of observations.\n", | ||
"- Data search and download can be an daunting and time taking task.\n", | ||
"- ARM provides custom developed services to allow seamless access to ARM data archives.\n", | ||
"- These services are designed for easy use in a programmatic environment.\n", | ||
"- Our custom tools alleviates some of these by:\n", | ||
" - integrating search and download\n", | ||
" - handling authentication, security etc.\n", | ||
" - command line and python API's to enable fully automated workflows" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# **Tools for data movement**\n", | ||
"\n", | ||
"|ARM Live Data Web Service | \"stage_arm_data\"|\n", | ||
"|--------------------|-----------------|\n", | ||
"| Web service |Globus based transfer|\n", | ||
"| Works from any where in the world | Custom tool available only on ARM HPC clusters |\n", | ||
"| Access to all user accessible data | Access to **ALL** ARM data, including raw |\n", | ||
"| Download only | Also allow data movement within ARM clusters and ADC |\n", | ||
"| HTTP download driven by your internet speed|Fast transfers over dedicated Infiniband network|\n", | ||
"| Allow search and download based on simple queries | Allow search and download based on simple queries |\n", | ||
"| Command line, Python API | Command line, Python API |\n", | ||
"| Enable fully portable application | Applications would be limited to ARM cluster |" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# **ARM Live Data Web Service**\n", | ||
"\n", | ||
"**URL: https://adc.arm.gov/armlive**\n", | ||
"\n", | ||
"Developed and maintained by: Ranjeet Devarakonda and ADC Web Tools Team\n", | ||
"\n", | ||
"Provides: REST API, Command line (Wget, Curl), Bash scripting, Python API" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"## **Using REST API**\n", | ||
"`https://adc.arm.gov/armlive/data/query?user=USER_ID:ACCESS_TOKEN&ds=DATASTREAM&start=START_DATE&end=END_DATE`\n", | ||
"\n", | ||
"## **Command line download using Curl**\n", | ||
"`$ wget 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`\n", | ||
"\n", | ||
"## **Command line download using Wget**\n", | ||
"`$ curl 'https://adc.arm.gov/armlive/data/saveData?user=example:abcd1234&file=sgpmetE11.b1.20180101.000000.cdf'`\n", | ||
"\n", | ||
"## **Using Python API**\n", | ||
"\n", | ||
"Installation: `$ pip install git+https://code.ornl.gov/ofg/armlive_getfiles.git`\n", | ||
"\n", | ||
"Execution: `$ getARMFiles -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`\n", | ||
"\n", | ||
"OR \n", | ||
"\n", | ||
"Installation: `$ git clone https://code.ornl.gov/ofg/armlive_getfiles.git`\n", | ||
"\n", | ||
"Execution: `$ python armlive_getfiles/src/getFiles.py -u example:abcd1234 -ds sgpmetE11.b1 -s 2018-01-01 -e 2018-02-01`\n", | ||
"\n", | ||
"## **Using bash scripting**\n", | ||
"Bash script: `https://adc.arm.gov/armlive/scripts/getFiles.sh`\n", | ||
"\n", | ||
"Execution: `$ bash getFiles.sh example:abcd1234 sgpmetE11.b1 2018-01-01 2018-02-01`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"## **This Python function parses the JSON blob and downloads the responsive files into the output directory.**\n", | ||
"The ARMLive web service returns a JSON blob with download links for archive files based on the datastream, start, and end dates provided. \n", | ||
"\n", | ||
"\n", | ||
"```python\n", | ||
"def download_arm_files(user, token, datastream, start, end, output_directory):\n", | ||
" params = {\n", | ||
" 'user': f'{user}:{token}',\n", | ||
" 'ds': datastream,\n", | ||
" 'start': start,\n", | ||
" 'end': end,\n", | ||
" 'wt': 'json',\n", | ||
" }\n", | ||
"\n", | ||
"# print(params)\n", | ||
" response = requests.get('https://adc.arm.gov/armlive/livedata/query', params=params)\n", | ||
"# print(response.url)\n", | ||
" response = response.json()\n", | ||
"# print(response)\n", | ||
" downloaded_files = []\n", | ||
" for filename in response['files']:\n", | ||
" download_url = f'https://adc.arm.gov/armlive/livedata/saveData?user={user}:{token}&file={filename}'\n", | ||
" file_path = Path(output_directory, filename)\n", | ||
" file_path.parent.mkdir(parents=True, exist_ok=True)\n", | ||
" with requests.get(download_url, stream=True) as r:\n", | ||
" with open(file_path, 'wb') as f:\n", | ||
" shutil.copyfileobj(r.raw, f)\n", | ||
" downloaded_files.append(file_path)\n", | ||
" return downloaded_files\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# stage_arm_data\n", | ||
"\n", | ||
"Developed and maintained by: Zach Price, Jitu Kumar and ARM HPC Team\n", | ||
"\n", | ||
"- stage_arm_data uses Globus protocols to\n", | ||
" - query ARM database\n", | ||
" - identify the files as per search criteria\n", | ||
" - download from ARM archive\n", | ||
" - handles all security and authentications\n", | ||
"- data is always stages in project shared area on Lustre filesystem\n", | ||
"- Currently stage_arm_data is only avaliable from the login and compute nodes of Stratus. \n", | ||
"- They are not available from Jupyter notebook. The recommended workaround in the meantime is to open a web terminal in Jupyter, ssh to stratus.ornl.gov, and stage data.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"slideshow": { | ||
"slide_type": "slide" | ||
} | ||
}, | ||
"source": [ | ||
"# stage_arm_data\n", | ||
"\n", | ||
"Add to your environment: `module load stage_arm_data`\n", | ||
"\n", | ||
"Data transfer command: `stage_arm_data --to Stratus --datastream corkasacrcfrhsrhiM1.a1 --start 2019-01-01 00:00:00 --end 2019-02-01 00:00:00`\n", | ||
"\n", | ||
"Files will be staged at: `/lustre/or-hydra/cades-arm/proj-shared/data_transfer/cor/corkasacrcfrhsrhiM1.a1/`\n", | ||
"\n", | ||
"From within a Python script:\n", | ||
"```python\n", | ||
"from stage_arm_data.core import transfer_files\n", | ||
"from stage_arm_data.endpoints import Stratus\n", | ||
"\n", | ||
"constraints = {\n", | ||
" 'start_time': 1552608000,\n", | ||
" 'end_time': 1552694400,\n", | ||
" 'datastream': 'corkasacrcfrhsrhiM1.a1'\n", | ||
"}\n", | ||
"\n", | ||
"transfer_files(constraints, Stratus)\n", | ||
"```\n", | ||
"\n", | ||
"Use `--dry-run=True` option in constraints to list available files without transferring." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Oops, something went wrong.