Writing statistical models and ML algorithms from almost scratch

sourabhv/ml-from-scratch

ML From Scratch

This repository contains implementations of Machine Learning algorithms and models from scratch. The purpose of this repository is to provide a simplified version of each algorithm and to check its results against the corresponding Python library implementation.

Table of Contents

Statistical Models

  • Linear Regression – Basic regression model, foundation for learning.
  • Logistic Regression – Basic classification model.
  • Ridge and Lasso Regression – Regularized versions of Linear Regression.
  • K-Nearest Neighbors (KNN) – Simple non-parametric model for classification/regression.
  • Naive Bayes – Probabilistic model based on Bayes' theorem.
  • K-Means – Basic clustering algorithm.
  • t-SNE – t-Distributed Stochastic Neighbor Embedding, a dimensionality reduction technique.
  • Singular Value Decomposition (SVD) – Matrix factorization for dimensionality reduction.
  • Principal Component Analysis (PCA) – Dimensionality reduction.
  • Decision Trees – Simple interpretable model for classification/regression.
  • Random Forests – Ensemble of Decision Trees, improves accuracy.
  • Support Vector Machines (SVM) – Powerful classification/regression model.
  • XGBoost – State-of-the-art boosting algorithm for classification/regression.
  • Bayesian Networks – Probabilistic graphical model.
  • Markov Decision Processes (MDPs) – Framework for modeling decisions, often used in RL.

Neural Network-Based Models

  • Neural Networks (NNs) – Foundation for deep learning, learning weights and activations.
  • Backpropagation – Essential algorithm for training NNs.
  • Gradient Descent – Optimization method used in NNs.
  • Convolutional Neural Networks (CNNs) – Specialized NNs for image data.
  • Recurrent Neural Networks (RNNs) – NNs for sequential data.
  • Long Short-Term Memory Networks (LSTMs) – Improved RNNs, handling long sequences.
  • Gated Recurrent Units (GRUs) – Another variant of RNNs, simpler than LSTMs.
  • Transformer Networks – State-of-the-art model for sequential data (e.g., NLP).
  • Autoencoders – NNs for unsupervised learning and dimensionality reduction.
  • Generative Adversarial Networks (GANs) – NNs for generating new data.
  • Reinforcement Learning – Learning to take actions in an environment.
  • Q-Learning – Used in Reinforcement Learning; can involve neural nets.

Installation

  1. Clone the repository
  2. Install the required dependencies
  3. Run the desired notebook

FAQs

Why are you doing this?

I am doing this to learn more about the inner workings of Machine Learning algorithms and models. I believe that by implementing these algorithms from scratch, I will have a better understanding of how they work and how they can be improved.

Why not just use libraries like scikit-learn or tensorflow or pytorch?

While libraries like scikit-learn, TensorFlow, and PyTorch are great for building Machine Learning models quickly, they abstract away many of the details of how these models work. Implementing the models from scratch forces me to deal with those details directly, such as the loss function, the update rule, and the numerical edge cases.

How do I know if my implementation is correct?

I compare the results of my implementations with the results of the corresponding Python libraries on the same data. If the results match to within floating-point tolerance, I can be reasonably confident that my implementation is correct.

Can I use this in production?

Send me an email; I can tell you faster ways to get a headache :)

Contributing

If you would like to contribute to this repository, please open an issue or a pull request.

License

This repository is licensed under the MIT License. Do whatever you want with it!
