GitHub - tk-randhawa/Predictive-Modelling-and-Clustering: Project for a previous assignment

This project was completed for a previous data science assignment; the assignment instructions are reproduced below. The task was loosely specified and no context on the data was provided, which made it a very interesting project!

INSTRUCTIONS:
• Please complete the below exercises using Python in a Jupyter notebook.
• For the SQL exercises (#3), include your query and results. You do not need to query this dataset directly from the notebook.

1. Consider data set 1 (ds1.csv). The data set comprises features (the five xs) along with three sequences that may or may not be generated from the features (3 ys).
   a. Describe the data set in a few sentences. E.g. What are the distributions of each feature? Summary statistics?
   b. Try to come up with a predictive model, e.g. y = f(x_1, ..., x_n), for each y sequence. Describe your models and how you came up with them. What (if any) are the predictive variables? How good would you say each of your models is?
2. Consider data set 2 (ds2.csv). The data set comprises a set of observations that correspond to multiple groups.
   a. Describe the data in a few sentences.
   b. How would you visualize this data set?
   c. Can you identify the number of groups in the data and assign each row to its group?
   d. Can you create a good visualization of your groupings?
3. Stack Overflow provides a tool at https://data.stackexchange.com/stackoverflow/query/new that allows SQL queries to be run against their data. After reviewing the database schema provided on their site, please answer the questions below by providing both your answer and the query used to derive it.
   a. How many posts were created in 2017?
   b. What post/question received the most answers?
   c. For posts created in 2020, what were the top 10 tags?
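As a rough illustration of exercises 1b and 2c, the sketch below screens features by univariate R^2 and picks a group count by silhouette score. It is not taken from the notebook: the function names (`r_squared`, `screen_features`, `kmeans`, `silhouette`) are hypothetical, the data is synthetic because the contents of ds1.csv and ds2.csv are not described here, and it uses only the standard library so it runs anywhere. A real notebook would more likely lean on pandas and scikit-learn.

```python
import math
import random


def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy ** 2 / (sxx * syy)


def screen_features(features, target):
    """Rank feature names by univariate R^2 against one target sequence."""
    scores = {name: r_squared(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the old centroid for an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated groups."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-ins (NOT the assignment data): y depends on x1 only,
# and the 2-D points form three well-separated blobs.
rng = random.Random(0)
x1 = [rng.uniform(0, 10) for _ in range(200)]
x2 = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 * value + rng.gauss(0, 1) for value in x1]
ranking = screen_features({"x1": x1, "x2": x2}, y)

centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(30)
]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)

print(ranking)
print(best_k, scores)
```

Univariate R^2 only detects linear, single-feature relationships, so a low score does not rule out a nonlinear or interaction effect; plotting each y against each x is a sensible complement. Likewise, the silhouette score favours compact, roughly spherical groups, so other cluster shapes may call for a different method.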

Files in this repository:
• Instructions.pdf
• README.md
• Tejpal_Randhawa_DS_Sample_Project.ipynb
• ds1 (1).csv
• ds2 (1).csv
