This is a simple and evolving class for learning Python 3.* for use with data analysis, plotting, and some other more scientific processing.
To use Python for this class you will need Python 3.7 or greater and some additional libraries. Installing these libraries can be trivial, but it can sometimes be confusing. Hopefully we can walk through getting you set up with little effort.
This course is designed to work with Jupyter Notebooks. You can run a Jupyter Notebook locally or through a JupyterHub website. The Notebooks are stored in the repo with their cells already executed, so you can use this repo as a reference as well as a training tool. There is no need to memorize the specific syntax; just come back here and look it up. It will come eventually with use.
Jupyter Notebooks are an interactive way to run Python code, with rich text mixed in alongside the code and its output.
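If you are not sure which Python you are running, a quick check along these lines may help. This is only a sketch: the function name `python_is_new_enough` is made up for illustration and is not part of the class materials.

```python
import sys

# Minimum version stated for this class.
REQUIRED = (3, 7)


def python_is_new_enough(version_info=sys.version_info, required=REQUIRED):
    """Return True if the given version meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= required


if python_is_new_enough():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.7 or greater is needed, found:", sys.version.split()[0])
```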
You will need to use the revision control software Git to download the class materials.
All the teaching Notebooks are in a GitHub repo. To get them, enter the following commands in the terminal. (Mac users may be prompted to install the Xcode command line tools; choose install.)
> cd (or change into a different directory to clone files)
> git clone https://github.com/kenkehoe/AtmosphericPythonCourse.git
> cd AtmosphericPythonCourse
> ls (Windows: dir)
As a last check to ensure everything is ready, run one Notebook from the GitHub repo you just cloned. Open test_requirements.ipynb and execute the code block to check that all the libraries you need for the class are installed. If any are missing, follow the instructions it gives.
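For a rough idea of what such a check can look like, here is a minimal sketch. It is not the code in test_requirements.ipynb; the `LIBRARIES` list and the `find_missing` helper are assumptions chosen for illustration, and the Notebook remains the real check.

```python
import importlib.util

# Hypothetical list for illustration; the Notebook defines the real one.
LIBRARIES = ["numpy", "matplotlib", "pandas", "xarray"]


def find_missing(names):
    """Return the names that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(LIBRARIES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries were found.")
```

This only looks a library up without importing it, so it runs quickly but will not catch a library that is installed and broken.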
Eventually, you will want some way to edit the Python files you develop. For now you can ignore this section and come back to it when you want to write a Python script file and execute it from the command line. You can use the text editor of your choosing. Some options include VI (or MacVIM for Mac), Emacs, TextEdit (Mac), TextWrangler (Mac), Notepad (Windows), Notepad++ (Windows), Sublime Text, … It does not matter which one you choose but don’t use something like Microsoft Word. It will not make you happy.
Once you have a basic start to Python you can start working through the advanced libraries. Below is a suggestion for which files to review and the order. You should not save the files after editing, or if you do want to save a change save to a different name. If you do make a change and accidentally save the Notebook files you can revert any of the files back to their original state by using the git command
> git checkout <name of file to revert back>
This will overwrite your current file, including any changes you made, with the version stored in the Git repo. If you have a file that is no longer working and you can't figure out why, you can copy that file to a different name and then check out the original. You can then compare the two to see how they differ and find the error.
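One way to do that comparison from Python is with the standard library's difflib. The sketch below is only an illustration: `show_differences` is a made-up helper, and the two small files it writes are throwaway stand-ins for your copy and the checked-out original.

```python
import difflib
import tempfile
from pathlib import Path


def show_differences(original, edited):
    """Return unified-diff lines between two text files."""
    original_lines = Path(original).read_text().splitlines(keepends=True)
    edited_lines = Path(edited).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(original_lines, edited_lines,
                                     fromfile=str(original),
                                     tofile=str(edited)))


# Demo with two throwaway files standing in for the real ones.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.py"
    edited = Path(tmp) / "my_copy.py"
    original.write_text("x = 1\nprint(x)\n")
    edited.write_text("x = 1\nprint(y)\n")
    diff_lines = show_differences(original, edited)
    print("".join(diff_lines))
```

Lines starting with `-` come from the original and lines starting with `+` from the edited copy. Notebook files are stored as JSON, so a diff of two .ipynb files will likely also show output and metadata changes, not only code.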
There is no required order to go through this class, but this list is a pretty good order. You are free to skip over the sections that do not pertain right now, but you should be exposed to all of these topics at some point so you understand the suggested Python solutions.
- Introduction to Python
- NumPy (the array library that SciPy builds on) - To work with arrays of data efficiently = Scientific_Libraries_Numpy.ipynb
- Plotting with matplotlib = make_plot.ipynb
- Pandas - To store/read data and use some powerful tools with the data = scientific_libraries/Scientific_Libraries_Pandas.ipynb
- Xarray - To store/read data and use some powerful tools with data of more than one dimension = Scientific_Libraries_Xarray.ipynb
- Advanced plotting with matplotlib (also use Pandas and Xarray wrapper around matplotlib) = make_plot_real_data.ipynb
- Advanced Xarray to start working with data and perform analysis = Scientific_Libraries_Xarray_2.ipynb
- Metadata with JSON and YAML = use_json.ipynb & use_yaml.ipynb
- Handling paths and file paths with pathlib.Path = path_stuff.ipynb
- Saving data with Numpy = data_save.ipynb
- Atmospheric data Community Toolkit (ACT)
- ACT Examples
- Download data
- Plot Data
- Reading, QC, formatting, plotting with ACT
- Plot with preprocessing
- Intro to QC and ACT = ACT_QC.ipynb
- Work with and Plot QC = plot_arm_qc.py
- Building your own QC = ACT_build_your_own_QC.ipynb
- Download, read and plot NOAA SurfRad data = plot_surfrad.py
- Download, read and plot ANL Sodar data = plot_sodar.py
- Retrieve stability indices from a sounding
- Cloud Base Height Retrievals
- Example to merge multiple data products into one using ACT.
- ARM Tutorials
- Requested Examples
- Using Python Collections extension for basic data structures = collections.ipynb
- Multiprocessing larger data blocks with dask = Scientific_Libraries_Dask.ipynb
- Creating multiprocessing child processes for faster concurrency = data_multiprocessing.ipynb
- Tip of the iceberg with regular expressions = regular_expressions.ipynb
- Logging = logging.ipynb
- How to set up a working program with a command line script and library functions = Python workflow
- Managing versions and libraries in a larger process = Manage Libraries
- What to do when the code fails = exception_handling.ipynb
- Testing your code = python_testing.ipynb
It is not required, but we encourage using a standard syntax for our Python files. All the Python example code follows PEP 8. Your code will run without following this formatting standard, but for sharing the code with others, getting used to a standard format will make everyone’s life better.
A nice way to see if your code is following the standard style is to use the flake8 command line tool.
> flake8 --max-line-length=115 my_python_program.py
If something does not meet the standards it will give you the line number and a short description of the issue. It just takes practice to understand the style and codes.
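As a small illustration of the kind of thing the style covers, here is a sketch of a function written to follow PEP 8, with a non-conforming version kept in the comments. The function `mean_temp` is made up for this example. For the commented-out version, flake8 would be expected to report items such as E231 (missing whitespace after ',') and E251 (unexpected spaces around keyword / parameter equals).

```python
# Non-conforming version, kept as a comment for comparison:
#   def mean_temp(temps,offset = 0.0):
#       return sum(temps) / len(temps) + offset


def mean_temp(temps, offset=0.0):
    """Return the mean of temps plus an optional offset."""
    return sum(temps) / len(temps) + offset


print(mean_temp([10.0, 20.0, 30.0]))
```

Both versions run and give the same answer; only the spacing differs, which is why a checker is handy for spotting it.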