- Taught by Jose Portilla, course site:
https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/
- This repo contains the Jupyter notebooks for the exercises and projects from this Udemy course.
- The course covers the fundamentals of data science and machine learning using Python. This repo documents my process of learning Python, with the aim of moving on to PySpark and SparkR, alongside Apache Airflow and Apache Kafka.
- Python for Data Analysis - NumPy
- Schema Evolutions
- Python for Data Analysis - Pandas
- Python for Data Visualization - Matplotlib
- Python for Data Visualization - Seaborn
- Linear Regression and other Machine Learning algorithms
- Web Scraping using BeautifulSoup4
- Big data with Spark
This notebook uses the Matplotlib package for data visualization in Python. Matplotlib plays a role similar to that of ggplot2, a widely used data visualization package in R.
https://github.com/dev-pasa/PythonForData/blob/master/MatPlotlibForVisual.ipynb
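As a rough illustration of the kind of plotting the notebook covers, here is a minimal sketch using Matplotlib's object-oriented API. The data, labels, and title are made up for this example and are not taken from the course notebook; the `Agg` backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# Illustrative data, not from the course notebooks
x = [0, 1, 2, 3, 4, 5]
y = [value ** 2 for value in x]

# Create a figure with a single axes, then draw on the axes object
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, color="tab:blue", marker="o", label="y = x^2")

# Label the plot so it is readable on its own
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Quadratic growth")
ax.legend()

fig.tight_layout()
```

In a Jupyter notebook the figure is usually rendered inline; in a script, calling `fig.savefig(...)` with a filename of your choice writes it to disk.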
- Machine Learning with PySpark by Pramod Singh (Apress, 2019).
https://www.amazon.com/Machine-Learning-PySpark-Processing-Recommender/dp/1484241304