Hi, I'm Pat Mendoza, a recent grad from Cornell. I enjoy figuring out ways to integrate and present data in a format that is both aesthetically pleasing and informative. This often involves transforming files in Python, then integrating them and creating visualizations in R.
I've worked in SQL, R, and Python for several years and am currently working on uploading some examples of my work.
Here are a few that I have so far (more upcoming):
- Google Data Analytics Professional Certificate Capstone Project - Tableau Link
- This is the final project for my certificate. I extract viewership data from an API and convert the JSON format into tabular data, which gets loaded onto Kaggle. Then I create a viz in Tableau that allows users to explore the data.
- Extracting the data - here I go through the code that extracts data from MyAnimeList via their API and converts it into tables.
- Cleaning the data - here I go through my cleaning process to ensure that the data is ready for loading into Tableau and Kaggle.
- R
- Mirrorplot - a simple mirrorplot, which can be a good visualization for showing up- and down-regulated genes in an RNA-seq experiment.
- Clustering Samples - clustering is a common exercise to determine how closely samples are related to each other. This shows how samples can be clustered using PCoA and PCA and visualized using ggplot.
- Data Wrangling with tidyr and dplyr - converting and integrating data from multiple sources is often tricky business. Luckily there are some great tools available that make this a breeze.
- Python
- Converting files from non-tabular to tabular format - oftentimes, we come across data that isn't in the form we need to make joins. When that happens, we can convert it using simple Python scripts.
- Data Wrangling with pandas and numpy - this is a replica of the above R data wrangling, but using Python with pandas and numpy in place of tidyr and dplyr.
- Misc Python (in development) - some miscellaneous scripts that I used to aid in RNA-seq generation.
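
To give a flavor of the JSON-to-table conversion mentioned in the capstone project and the Python section above, here is a minimal sketch using only the standard library. The sample records and field names (`id`, `title`, `statistics`) are hypothetical and are not taken from the MyAnimeList API or from the project code; the `flatten` helper is likewise an illustrative assumption.

```python
import csv
import io
import json

# Hypothetical API response; the field names are illustrative assumptions,
# not the real MyAnimeList schema.
raw = '''[
  {"id": 1, "title": "Show A", "statistics": {"watching": 120, "completed": 300}},
  {"id": 2, "title": "Show B", "statistics": {"watching": 45, "completed": 80}}
]'''


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested object: fold its keys into prefixed column names.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


rows = [flatten(record) for record in json.loads(raw)]

# Collect column names in first-seen order so rows with missing keys still fit.
columns = list(dict.fromkeys(key for row in rows for key in row))

# Write the flattened rows as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is available, `pd.json_normalize` does much the same flattening in one call; the version above just shows the idea without any dependencies.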