Data Analysis with Python

📄 Summary

This course uses Python to explore many different types of data. It covers how to prepare data for analysis, perform simple statistical analysis, create meaningful data visualizations, predict future trends from data, and more. It concludes with a final assignment on predicting the market prices of houses from a detailed dataset. The notebooks here are detailed and collectively show the full process of predictive analysis. Some topics, such as data wrangling, have additional associated notebooks because of the breadth of content covered in the course.

📑 Main Topics

  • Importing datasets

    • Understanding the data
    • Importing and exporting data in Python
  • Working with different file formats

    • A dataset can come in various formats (.csv, .json, .xlsx, etc.) and can be stored in different places, such as on your local machine or online.
    • Data scientists and data engineers therefore need to know the different file formats, the common challenges in handling them, and efficient ways to work with this data in practice.
  • Data wrangling

    • Identifying and handling missing values
    • Data formatting
    • Data normalization
    • Binning
    • Indicator variables
  • Exploratory Data Analysis

    • Summarizing main characteristics of the data
    • Gaining better understanding of the data set
    • Uncovering relationships between the variables
    • Extracting important variables
  • Model Development

    • Simple and Multiple Linear Regression
    • Model Evaluation Using Visualization
    • Polynomial Regression and Pipelines
    • R-squared and MSE for In-Sample Evaluation
    • Prediction and Decision Making
  • Model Evaluation and Refinement

    • Over-fitting, under-fitting and model selection
    • Ridge regression
    • GridSearch
    • Model refinement
  • Auto_EDA_Dataprep

    • DataPrep is an open-source Python library that lets you prepare your data with only a few lines of code.
    • DataPrep can be used to address multiple data-related problems, and it provides features for handling each of them.
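
As a rough illustration of the data wrangling topics listed above (missing values, normalization, binning, and indicator variables), here is a minimal pandas sketch. The toy data, the column names, and the choice of mean imputation and min-max scaling are assumptions made for this example; they are not taken from the course notebooks.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data; the values and column names are illustrative only.
csv_text = """make,fuel,horsepower,price
audi,gas,102,13950
bmw,gas,,16430
toyota,diesel,56,7898
volvo,gas,114,16845
"""

# Importing: read CSV text from an in-memory buffer instead of a file on disk.
df = pd.read_csv(io.StringIO(csv_text))

# Missing values: replace the missing horsepower entry with the column mean.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Normalization: min-max scale price into the range [0, 1].
price = df["price"]
df["price_scaled"] = (price - price.min()) / (price.max() - price.min())

# Binning: split horsepower into three equal-width bins.
edges = np.linspace(df["horsepower"].min(), df["horsepower"].max(), 4)
df["hp_bin"] = pd.cut(
    df["horsepower"],
    bins=edges,
    labels=["low", "medium", "high"],
    include_lowest=True,  # keep the minimum value inside the first bin
)

# Indicator variables: one column per fuel type.
indicators = pd.get_dummies(df["fuel"], prefix="fuel")
df = pd.concat([df, indicators], axis=1)

print(df)
```

Mean imputation and min-max scaling are only one option for each step; dropping rows or using z-scores would follow the same pattern.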

🔑 Key Skills Learned

  • Using the pandas, NumPy, and SciPy libraries for data manipulation
  • Using scikit-learn to build models and make predictions
  • Building machine learning regression models
  • Building data pipelines
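
To give a sense of what the modeling skills above can look like in practice, the following is a minimal scikit-learn sketch of a polynomial regression pipeline evaluated with R-squared and MSE, followed by a ridge regression tuned with a grid search. The synthetic data, the step names, and the candidate `alpha` values are assumptions chosen for illustration, not the course's own setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for this sketch): a noisy quadratic relationship.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pipeline: scale the input, expand to degree-2 polynomial terms, fit a linear model.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("model", LinearRegression()),
])
pipe.fit(X_train, y_train)

# In-sample evaluation uses the training data; the test split checks generalization.
r2_train = r2_score(y_train, pipe.predict(X_train))
r2_test = r2_score(y_test, pipe.predict(X_test))
mse_test = mean_squared_error(y_test, pipe.predict(X_test))

# Refinement: same pipeline with ridge regression, alpha chosen by cross-validation.
ridge_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("model", Ridge()),
])
grid = GridSearchCV(ridge_pipe, {"model__alpha": [0.01, 0.1, 1.0, 10.0]}, cv=4)
grid.fit(X_train, y_train)
best_alpha = grid.best_params_["model__alpha"]

print(f"R^2 train={r2_train:.3f}, R^2 test={r2_test:.3f}, MSE test={mse_test:.3f}")
print(f"best ridge alpha: {best_alpha}")
```

Comparing the training and test scores is a simple way to spot over-fitting: a large gap between them suggests the model is too flexible for the data.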

🛠️ Tools

The following tools were used to complete this certification:

(Python, Jupyter, GitHub, IBM Watson Studio, IBM Cloud Pak)

📖 Libraries

The following Python libraries were used throughout the certification:



🏆 Certificates

To verify the certificates, click the images to follow the links.

Data Analysis with Python. Issued by Coursera. Authorized by IBM. This badge earner has the core skills in data analysis using Python. They can readily clean, visualize, and summarize data using pandas. Using scikit-learn, the earner can develop data pipelines, construct machine learning models for regression, and evaluate these models.