Quick Consulting Examples

A collection of quick pandas, Python, R, and other coding examples based on real consulting requests.

VoltStats Data Archive - Webscraping example

Scenario: This example scrapes a website containing 10 years of historical, user-generated data, collected through OnStar, on the real-world performance of Chevy Volts.

Geopandas Discovery Project Example - How to create a heat map using geopandas

Scenario: This consulting request came from a Discovery project. An undergraduate student in Econ and Data Science asked for help with data visualization, filed as a debugging or tech support request: "I would like to create a geopandas heat map of India (with coordinates and a legend of certain levels of GDP per capita), but I've never used geopandas before so a little unsure on how to create this mapping. Also unsure if I need to convert to a shape file."
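
As a rough illustration of the kind of map requested, here is a minimal sketch. It is not the notebook's code. The region names, rectangles, and GDP figures are made-up placeholders; real boundaries would normally be loaded with `geopandas.read_file`, which reads shapefiles, GeoJSON, and GeoPackage files directly, so converting to a shapefile first is usually unnecessary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Placeholder geometry: three rectangles standing in for real state boundaries.
regions = gpd.GeoDataFrame(
    {"state": ["Region A", "Region B", "Region C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 2, 2)],
    crs="EPSG:4326",  # longitude/latitude coordinates
)

# Placeholder attribute table (made-up numbers, not real GDP data).
gdp = pd.DataFrame(
    {"state": ["Region A", "Region B", "Region C"],
     "gdp_per_capita": [1200, 2500, 1800]}
)

# Attach the values to the shapes; merging from the GeoDataFrame keeps the geometry.
merged = regions.merge(gdp, on="state", how="left")

# Color each region by its value and draw a colorbar legend.
ax = merged.plot(
    column="gdp_per_capita",
    cmap="OrRd",
    edgecolor="black",
    legend=True,
    legend_kwds={"label": "GDP per capita"},
)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

Coloring whole regions by a value is usually called a choropleth rather than a heat map, which helps when searching the geopandas documentation for further options.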

JoinMulltipleCSV - How to join multiple CSV files into a single Pandas DataFrame based on a join key.

Scenario: Student scores are recorded for each class lecture, with the student email addresses and scores stored in a separate CSV file per lecture.
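
One way to approach this join is sketched below. It assumes each file has `email` and `score` columns, which are guesses about the layout rather than the actual contents of the lecture CSV files, and it uses in-memory strings in place of files so that it runs on its own.

```python
import io
from functools import reduce

import pandas as pd


def join_lecture_scores(sources, key="email"):
    """Outer-join one score table per lecture on a shared key column."""
    frames = []
    for lecture, source in sources.items():
        df = pd.read_csv(source)
        # Name each score column after its lecture so the columns stay distinct.
        frames.append(df.rename(columns={"score": lecture}))
    # An outer join keeps students who missed some lectures (their scores become NaN).
    return reduce(lambda left, right: left.merge(right, on=key, how="outer"), frames)


# Hypothetical stand-ins for per-lecture CSV files.
sources = {
    "lecture4": io.StringIO("email,score\nana@example.edu,9\nbo@example.edu,7\n"),
    "lecture5": io.StringIO("email,score\nbo@example.edu,8\ncy@example.edu,10\n"),
}

scores = join_lecture_scores(sources)
print(scores)
```

With real files, the dictionary values would be file paths instead of `io.StringIO` objects. Switching `how="outer"` to `how="inner"` would keep only students who appear in every lecture.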

StataFileVariableSearch - How to search Stata files that contain matching variable names.

Scenario: Loop over a directory tree containing Stata .dta files. Read each file into a pandas DataFrame and find the files that contain matching variable names. The result is a dictionary that maps each Stata filename to a list of its variable names (either the full list or only the matches we're interested in).
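
The search described above might look roughly like the following sketch. The function name `find_variables`, the `wanted` parameter, and the sample files are all illustrative assumptions and are not taken from the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def find_variables(root, wanted=None):
    """Map each .dta file under `root` to a list of its variable names.

    If `wanted` is given, keep only files with at least one matching name,
    and list only the matching names.
    """
    results = {}
    for path in sorted(Path(root).rglob("*.dta")):
        # convert_categoricals=False skips value-label conversion, which is not needed here.
        columns = list(pd.read_stata(path, convert_categoricals=False).columns)
        if wanted is not None:
            columns = [name for name in columns if name in wanted]
            if not columns:
                continue
        results[str(path)] = columns
    return results


# Build a tiny throwaway directory tree so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "wave2").mkdir()
    pd.DataFrame({"id": [1, 2], "income": [10.0, 20.0]}).to_stata(
        root / "a.dta", write_index=False
    )
    pd.DataFrame({"id": [3], "age": [30]}).to_stata(
        root / "wave2" / "b.dta", write_index=False
    )
    all_vars = find_variables(root)
    matches = find_variables(root, wanted={"income", "age"})

print(matches)
```

Reading every file in full is simple but can be slow for large files. It is used here only because the scenario describes loading each file into a DataFrame.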

See also the related non-notebook scripts. The finddta.py script is essentially a script-based copy of the notebook version above. The scrubdta.py script takes the output of finddta.py as input and produces a Stata file that contains only the columns matching the variables we want to keep, which is useful for de-identifying sensitive data.
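
The column-scrubbing step can be sketched as follows. This shows the general idea and is not the contents of scrubdta.py: the function name `scrub_dta`, the `_scrubbed` filename suffix, and the sample columns are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def scrub_dta(matches, out_dir):
    """Write a copy of each Stata file that keeps only the listed columns.

    `matches` maps a source .dta path to the column names to keep.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src, keep in matches.items():
        # Load only the columns we intend to keep.
        df = pd.read_stata(src, columns=keep, convert_categoricals=False)
        dest = out_dir / f"{Path(src).stem}_scrubbed.dta"
        df.to_stata(dest, write_index=False)
        written.append(dest)
    return written


# Self-contained demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "survey.dta"
    pd.DataFrame(
        {"student_id": [101, 102], "zip_code": [94720, 94704], "score": [9.0, 7.5]}
    ).to_stata(src, write_index=False)

    (out,) = scrub_dta({str(src): ["score"]}, Path(tmp) / "clean")
    kept_columns = list(pd.read_stata(out).columns)

print(kept_columns)
```

Dropping identifying columns is only one part of de-identification. The remaining columns may still need review before the data is shared.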

Wrangle Psych Survey Data - How to manipulate survey data outputs to evaluate distributive qualities of text responses and create matrix "dummy variables".

Scenario: You are presented with survey data containing text string responses to questions. These responses represent combinations (multiple elements per observation) but are separated with a clear structure. The client is looking for a way to evaluate the many combinations of user responses and to structure the data in a form that allows for regression analysis. The script shows how to access and organize string data, exploring the frequency of responses (and combinations of responses) on a sample set of observations. It then pivots the cleaned string data to create dummy variables out of categorical symptom responses.
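
A minimal pandas sketch of this workflow is shown below. The column names, the `;` separator, and the sample responses are invented for illustration, and the original script may take a different approach.

```python
import pandas as pd

# Invented sample: each response lists one or more symptoms separated by ";".
responses = pd.DataFrame(
    {
        "respondent": [1, 2, 3, 4],
        "symptoms": ["fatigue;headache", "headache", "fatigue;headache", "nausea"],
    }
)

# How often does each exact combination occur?
combo_counts = responses["symptoms"].value_counts()

# How often does each individual symptom occur across all responses?
item_counts = responses["symptoms"].str.split(";").explode().value_counts()

# One 0/1 column per symptom, ready to use as regression inputs.
dummies = responses["symptoms"].str.get_dummies(sep=";")
wide = responses.join(dummies)

print(combo_counts)
print(item_counts)
print(wide)
```

`str.get_dummies` assumes the separator is used consistently. Stray whitespace or inconsistent capitalization in the raw text would need cleaning first, otherwise the same symptom ends up in several columns.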

County-Level-Chloropleth-Map

Scenario: Students would like to visualize a metric they have created with a choropleth map. Use the following script to join the data they have assembled with a county-level shapefile of California. Use the tmap package to plot the map for presentation with a few options enabled.

Crop-spatial-points-with-shapefile - Take a raw dataset of spatial points, initialize the CRS, and then crop with a shapefile.

Scenario: You are presented with a large spatial dataset of floral species in the continental United States. The researcher is only concerned with data that falls within the boundaries of the state of Florida. The dataset is presented without a Coordinate Reference System (CRS). Format the raw spatial data with a CRS, and use a shapefile of the state of Florida to keep only the points that land within its boundary.

Lasso-Variable-Importance - Use the tidymodels framework to structure, preprocess, and tune hyperparameters for a lasso regression analysis.

Scenario: A student is hoping to run a lasso regression analysis on some data for their final project in a class. They have been working with the glmnet package but have encountered errors when formatting data for model preparation. Walk through splitting the data and preparing it for modeling, along with bootstrapping and tuning-grid approaches to hyperparameter optimization. Use the vip package to visualize feature importance from the tidymodels object.

Network-Analysis-Visualization - How to visualize a social network with contact tracing data.

Scenario: Take a dataset recording "relationships" between cases and contacts during a COVID-19 outbreak, and implement complex join functions to wrangle and format dataframes so they can be handled by the visNetwork package. Use the formatted dataframes to create an interactive HTML object that illustrates acyclic exposure events from cases to contacts.
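
The example itself is written in R. As a language-neutral illustration of the wrangling step only, the pandas sketch below builds the two tables that visNetwork expects: a nodes table with an `id` column and an edges table with `from` and `to` columns. The input column names `case_id` and `contact_id` and the sample rows are assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Invented contact-tracing links: each row says a case exposed a contact.
links = pd.DataFrame(
    {
        "case_id": ["c1", "c1", "c2"],
        "contact_id": ["c2", "c3", "c4"],
    }
)

# Edges: one directed row per exposure event, from case to contact.
edges = links.rename(columns={"case_id": "from", "contact_id": "to"}).drop_duplicates()

# Nodes: every person who appears as either a case or a contact, listed once.
ids = pd.unique(pd.concat([links["case_id"], links["contact_id"]], ignore_index=True))
nodes = pd.DataFrame({"id": ids})

# Label each person so the network can color cases and contacts differently.
nodes["group"] = nodes["id"].isin(links["case_id"]).map({True: "case", False: "contact"})

print(nodes)
print(edges)
```

A person who is both a contact and a later case (such as `c2` here) is labeled as a case, which is one possible convention among several.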

dlab-consulting/quick-consulting-examples
