added ML track #41

Merged
merged 1 commit into from
Jul 16, 2020
29 changes: 18 additions & 11 deletions soa/tracks/ml/1.md
@@ -1,13 +1,20 @@
## Example
# ML Track
Welcome to the ML track. We hope you're really excited for this.
For starters, we'll brush up on your Python skills, including your understanding of:
- [Numpy](https://numpy.org/)
- [Pandas](https://pandas.pydata.org/)
- [Matplotlib](https://matplotlib.org/)

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%201.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).

Nothing to see here yet. Example code.

<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return '=' == s.replace(' ', '').strip()
</code>
</form>
How do you get the mean of each column in a DataFrame named `df`?
Please write the full command. (The answer is case-sensitive.)
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s == 'df.mean()'
</code>
</form>
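
To see what the quiz command does, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0],
    "weight": [50.0, 60.0, 70.0],
})

# DataFrame.mean() averages down each column by default (axis=0)
# and returns a Series indexed by column name.
column_means = df.mean()
print(column_means)
```

Passing `axis=1` instead would average across each row rather than down each column.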
28 changes: 28 additions & 0 deletions soa/tracks/ml/2.md
@@ -0,0 +1,28 @@
# ML Track - Week 2
We hope you're really excited to get started with actual machine learning. But hold on!
A big problem with machine learning algorithms is that they're not human. They are just a bunch of formulas applied in a loop of conditional statements.
As a result, they cannot directly handle certain types of data, such as strings, and they cannot handle missing values either.
These concepts were introduced in last week's track, and now is the time to study them in depth.
This week we'll learn about:

- One Hot encoding
- Label Encoding
- Normalization
- Dealing with Missing values
- Introduction to Machine learning
- Types of Learning (Supervised, Unsupervised and Reinforcement)
- Application of Machine Learning

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%202.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).
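
As a rough illustration of two of the topics listed above (dealing with missing values and normalization), the sketch below fills a missing entry with the column mean and then applies min-max scaling. The `age` column and its values are made up, and mean imputation is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0]})

# Fill the missing entry with the column mean (NaN is skipped when averaging).
df["age"] = df["age"].fillna(df["age"].mean())

# Min-max normalization: rescale the column to the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age_scaled"] = (df["age"] - age_min) / (age_max - age_min)

print(df)
```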


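Write the command to one-hot encode the column named 'company' of the DataFrame `df` using the pandas function (assume pandas is imported as `pd`).
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    # A Series has no get_dummies() method; the pandas function is pd.get_dummies().
    return s.replace(' ', '').replace('"', "'") == "pd.get_dummies(df['company'])"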
</code>
</form>
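
For reference, here is a minimal sketch of one-hot and label encoding with pandas, assuming pandas is imported as `pd`. The company names are invented. `pd.get_dummies` is the top-level pandas function for one-hot encoding; the label encoding shown here uses pandas category codes, which is one option among several.

```python
import pandas as pd

# Hypothetical data; the company names are made up.
df = pd.DataFrame({"company": ["acme", "globex", "acme"]})

# One-hot encoding: one indicator column per distinct value.
one_hot = pd.get_dummies(df["company"])

# Label encoding: map each distinct value to an integer code.
labels = df["company"].astype("category").cat.codes

print(one_hot)
print(labels.tolist())
```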
24 changes: 24 additions & 0 deletions soa/tracks/ml/3.md
@@ -0,0 +1,24 @@
# ML Track - Week 3
Congratulations on making it this far!
Now that we've covered the preprocessing methods, we can start with machine learning algorithms.
We'll start with **Regression**.
Regression analysis is a supervised method used to predict a **continuous**, **dependent** variable from one or more independent variables.
This week requires prior knowledge of linear, quadratic, and polynomial equations.
This week we'll learn about:

- Linear Regression
- Multiple Linear Regression
- Polynomial Regression

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%203.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).

`mean_squared_error` is a function from which module in scikit-learn?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'metrics'
</code>
</form>
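
To make the idea of regression and its error metric concrete, here is a minimal sketch of simple linear regression using only NumPy. The data points are made up, and the notebook itself may use scikit-learn instead; `sklearn.metrics.mean_squared_error(y_true, y_pred)` computes the same quantity as the `mse` line below.

```python
import numpy as np

# Made-up data that lies exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Least-squares fit of y = slope * x + intercept.
# The design matrix has a column of x values and a column of ones.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Mean squared error: the average squared difference between
# the true values and the predictions.
y_pred = slope * x + intercept
mse = np.mean((y - y_pred) ** 2)

print(slope, intercept, mse)
```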
28 changes: 28 additions & 0 deletions soa/tracks/ml/4.md
@@ -0,0 +1,28 @@
# ML Track - Week 4
This week we are going to learn about a widely used type of supervised machine learning: **Classification**.

Classification is the process of predicting the class of given data points.

For example, spam detection by email service providers is a classification problem. It is a binary classification, since there are only two classes: spam and not spam. A classifier uses training data to learn how the input variables relate to the class.

This week, we will cover the following classification algorithms:

- Support Vector Classifier (SVC)
- Decision Tree Classifier
- Random Forest Classifier
- Voting Classifier

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%202.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).
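
As a rough illustration of the voting idea from the list above, the sketch below combines three hand-written rules by majority ("hard") vote on the spam example. The three rule functions are hypothetical stand-ins; in practice the voters would be trained models such as an SVC, a decision tree, and a random forest.

```python
from collections import Counter

# Three hypothetical rule-based "classifiers"; real ones would be trained models.
def mentions_free(message):
    return "spam" if "free" in message.lower() else "not spam"

def mentions_winner(message):
    return "spam" if "winner" in message.lower() else "not spam"

def shouts(message):
    return "spam" if message.count("!") >= 3 else "not spam"

def hard_vote(classifiers, message):
    """Return the class predicted by the majority of the classifiers."""
    votes = [classify(message) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]

classifiers = [mentions_free, mentions_winner, shouts]
print(hard_vote(classifiers, "You are a WINNER! Claim your free prize"))
print(hard_vote(classifiers, "Meeting moved to 3pm"))
```

With an odd number of voters and two classes, a tie cannot occur, which is one reason an odd number of voters is a common choice for binary problems.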


#### How many estimators does the Random Forest Classifier in the notebook use?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>

def answer(s):
    return s == '200'
</code>
</form>
31 changes: 31 additions & 0 deletions soa/tracks/ml/5.md
@@ -0,0 +1,31 @@
# ML Track - Week 5
Congratulations, you have made it to the midway point!
Now let's learn how well or badly our model is performing, and why.

Topics covered this week:
- Underfitting
- Overfitting
- Bias Variance Trade-off
- Regularization
- Support Vector Machine

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%206.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).

I hope this week has proven useful to you. Let's wind it up with a quick question.

Ques) In terms of the bias-variance trade-off, which of the following is/are substantially more harmful to the test error than the training error? (Input the correct option)
a) Bias
b) Loss
c) Variance
d) Risk


<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'c'
</code>
</form>
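
The gap between training and test error can be sketched with a small experiment: fit a straight line and a degree-9 polynomial to ten noisy samples of a sine curve, then compare the errors. The data, noise level, and polynomial degrees are arbitrary choices made for this illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_sine(x):
    # Underlying signal plus Gaussian noise (the noise level is an arbitrary choice).
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = noisy_sine(x_train)
y_test = noisy_sine(x_test)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

# errors[degree] = (training MSE, test MSE)
errors = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_error, test_error) in errors.items():
    print(f"degree {degree}: train MSE {train_error:.4f}, test MSE {test_error:.4f}")
```

The straight line is too simple to follow the sine curve, so its error stays high on both sets (underfitting, high bias). The degree-9 polynomial passes through all ten training points, so its training error is close to zero, while its error on unseen points is noticeably larger (overfitting, high variance).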
30 changes: 30 additions & 0 deletions soa/tracks/ml/6.md
@@ -0,0 +1,30 @@
# ML Track - Week 6
Congratulations, you have come a long way!
So far we have been working on supervised machine learning, so gear up for the first chapter of unsupervised machine learning: clustering.

Clustering is the task of dividing data points into groups such that points in the same group are more similar to each other than to points in other groups.

This week we are going to dive deep into clustering and cover the following topics:

- What is clustering
- Difference between clustering and classification
- K-means clustering
  - Silhouette Score
- Hierarchical clustering


### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%206.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).

I hope this week has proven useful to you. Let's wind it up with a quick question.
Ques) What is the name of the linkage that we used in Agglomerative Clustering?
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'ward'
</code>
</form>
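
As a rough illustration of the K-means topic listed above, here is a minimal NumPy sketch of the algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The data and the choice of starting centroids are made up, and library implementations such as scikit-learn's `KMeans` add smarter initialization and stopping rules.

```python
import numpy as np

def kmeans(points, centroids, n_iters=10):
    """Plain K-means with fixed starting centroids and a fixed number of iterations."""
    centroids = centroids.astype(float).copy()
    for _ in range(n_iters):
        # Distance from every point to every centroid: shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Assignment step: each point joins its nearest centroid.
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Two made-up, well-separated groups of points.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Start from one point in each group (an arbitrary choice for this example).
labels, centroids = kmeans(points, centroids=points[[0, 3]])
print(labels)
print(centroids)
```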


26 changes: 26 additions & 0 deletions soa/tracks/ml/7.md
@@ -0,0 +1,26 @@
# ML Track - Week 7

Congratulations on making it this far!

This week will introduce you to dimensionality reduction techniques and model selection strategies such as K-fold cross-validation, Grid Search, and Stacking.

Dimensionality reduction means reducing the number of features (columns) in a dataset. Imagine working with a dataset with nearly 20,000 features: having so many features makes it hard to draw insights from the data, and it is not feasible to analyze every variable individually. Hence, we use dimensionality reduction techniques.

Model selection, on the other hand, is the process of choosing one final machine learning model from a collection of candidate models for a training dataset.

Let's start then, shall we?

### Click [here](https://github.com/kabirnagpal/SoA-ML-14/blob/master/week%207.ipynb) to view the Jupyter notebook.
If you don't have a Python environment set up, you can also try the code in [Google Colab](https://colab.research.google.com/).

Question to be answered after you complete your notebook:
Kernel PCA cannot be used for non-linear data. (True / False)
<form method='POST'>
<input name='answer'>
<input type='submit' value='Submit'>
<code class='code_checker'>
def answer(s):
    return s.lower() == 'false'
</code>
</form>
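
To illustrate what reducing the number of columns looks like in practice, here is a minimal sketch of ordinary (linear) PCA using NumPy's SVD. This is not Kernel PCA, which first maps the data through a kernel so that it can capture non-linear structure. The data values are made up.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal directions."""
    # Center each feature (column) at zero.
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T

# Made-up dataset: 5 samples with 3 features each.
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.2],
])

reduced = pca(X, n_components=2)
print(reduced.shape)
```

Each row of `reduced` describes the same sample with two derived features instead of the original three, and the first derived feature carries at least as much variance as the second.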