This is a pick-your-problem style guide I created to educate everyone from my leadership team to my ML engineers on how to work with AI in production settings. This is the stuff you won't learn in most ML/AI courses.
Readings covering a fundamental overview of how Machine Learning Systems are made
A Taxonomy of ML and AI for those who are unfamiliar with the field
Machine Learning Systems Design - Part 1
Machine Learning Systems Design - Part 2
If you're tackling a specific system design stage and want to dive deeper, I'd recommend perusing specific parts of:
and/or its more formal book version
Rules of Machine Learning: | Google Developers
Large Language Models | Full Stack Deep Learning
Focus on the speed at which you can run valid experiments. It is the only way to find a viable model for your problem.
- Your problems will often lie more in the data than the modelling. The fundamentals of data science (shit in, shit out) always apply, so focus on figuring out how to refine the data quality and use a system that focuses on exactly that.
Product Leadership team can stop here
Recent experiences have shown me that AutoML has come a long way in the past five to seven years, especially for Tabular Machine Learning. So my latest recommendation is to use it. Use it first.
- Get your data in order ~ clean | preprocess | shrink/project if needed
- Use AutoML (a minimal sketch follows this list)
- See what baselines it gives - if it works out of the box, I'm happy for you but very jealous! :p
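To make the AutoML step above concrete, here's a minimal sketch using AutoGluon on a tabular problem; the file names and the label column are placeholders for your own data.

```python
# Minimal AutoGluon tabular baseline - file names and the "label" column are placeholders.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # anything pandas can read also works
test_data = TabularDataset("test.csv")

# Fit a suite of models plus ensembles within a 10-minute budget.
predictor = TabularPredictor(label="label").fit(train_data, time_limit=600)

# See what baselines it found, then score on held-out data.
print(predictor.leaderboard(test_data))
print(predictor.evaluate(test_data))
```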
When not to use AutoML right away?
- Sometimes you've got a really complex problem, and/or no one has solved something similar before, and/or a "lot" of data. Here AutoML will probably be inefficient compared to taking a first stab at research to narrow down what to use architecture- and preprocessing-wise.
AutoML tools
Tip: You can use a foundation model like CLIP-ViT or GPTx as a pre-processor to turn almost any data into structured data (embeddings) for downstream tasks as a quick and dirty experiment.
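As a flavour of that trick, here's a minimal sketch using CLIP from the transformers library to turn images into fixed-size embeddings you can hand to any of the tabular tools below; the checkpoint is just one common choice and the coloured squares stand in for your real images.

```python
# Rough sketch: embed images with CLIP, then treat the embeddings as tabular features.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-ins for your real images - replace with Image.open(path) calls.
images = [Image.new("RGB", (224, 224), c) for c in ("red", "blue", "green")]

with torch.no_grad():
    inputs = processor(images=images, return_tensors="pt")
    features = model.get_image_features(**inputs)  # shape: (n_images, 512)

X = features.numpy()  # hand this matrix to AutoGluon, scikit-learn, etc.
print(X.shape)
```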
*Structured ~ Tabular ~ Embedding ~ Preprocessed
- Lazy Predict - Structured
- AutoGluon - Structured | Image | Text | Multimodal | Time series
- H2O - Tabular possibly Structured
- MLJar - Structured - Has auto-feature engineering
- AutoPytorch - Structured
- AutoSklearn - Structured
- TPOT - Structured - Has auto-feature engineering
- TPOT2 - Structured - Has auto-feature engineering
- AutoKeras - Structured | Image | Text | Time Series | Multimodal
- FLAML - Structured
- PyCaret - Structured | Time Series | Text
- AutoGen - LLMs
- TransmogrifyAI - Structured
- Model Search by Google - Structured | Image | Text | Audio | Time Series - Use with care; this is compute-expensive
A number of these - FLAML, AutoGluon, AutoKeras - can also be extended with your own custom models, which aren't limited to tabular data.
Theoretically you can also use any model hub for "AutoML" if you combine it with a sweeping agent.
E.g. Hugging Face AutoTrain + Weights and Biases Sweeps - technically not AutoML, but with that many models available it's very easy to do.
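Here's a minimal sketch of what that pairing can look like with the wandb Python API; the sweep ranges, project name, and training body are all placeholders, and it assumes you're already logged in to Weights and Biases.

```python
# Minimal W&B sweep sketch - hyperparameter ranges and the train() body are placeholders.
import wandb

sweep_config = {
    "method": "bayes",  # or "grid" / "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def train():
    run = wandb.init()
    cfg = wandb.config
    # ... pull a model from the hub and train it with cfg.learning_rate / cfg.batch_size ...
    val_loss = 0.123  # placeholder: compute this on your validation set
    wandb.log({"val_loss": val_loss})
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="hub-model-sweeps")
wandb.agent(sweep_id, function=train, count=20)
```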
Where to look for models/techniques and the like?
- Model Zoos
- Hugging Face and their GitHub
- PyTorch Hub
- Torchvision
- Torchaudio
- Torchtext - Hugging Face is way better for this but just in case
- TIMM - Vision models - Also check their Hugging Face page
- TensorFlow Hub
- Model Zoo.co
- Nvidia Model Zoo
- ONNX Model Zoo
- NVIDIA NGC Model Zoo
- Facebook Research Model Zoo
- Keras Applications
- Caffe Model Zoo
- MXNet/Gluon Model Zoo
- Apple Machine Learning Models
- some more exist...
Please note that the availability and content of these model zoos may vary, so it's always best to refer to the official documentation provided by each platform.
- AI company GitHubs
- LAION
- Ultralytics
- Airbnb
- Facebook AI
- Google AI
- Microsoft AI
- Netflix TechBlog
- and many more…
- AI lab blogs
- CSAIL (MIT) and SAIL (Stanford)
- CMU Blog
- Taiwan AI Labs
- Deepmind blog
- OpenAI Blog
- Synced Review
- BAIR
- and many more…
- AI researchers' personal blogs
- Paper authors' GitHub repos
- GitHub search - but follow the rules below to quickly filter out duds
Where not to look for models?
- Towards Data Science and other unmoderated blogs (unless they link to one of the above)
- Kaggle
- Github Snippets
Rules to figure out what is and is not promising
- Look at whether existing implementations exist
- If no, then I highly recommend finding another architecture. This process can be excruciating and time-consuming, but if you do have to:
- Select a design pattern to write the neural network in - I love a class-based system like PyTorch's, with PyTorch Lightning's prescribed format on top of it
- Read the research paper and see if they've specified all the bits and bobs of the architecture and the training process - if not, email the authors - you may get lucky
- Write the pseudocode - especially the math
- Implement it
- Beginner Advice on Learning to Implement Machine Learning Models
- Courses that help and are great reference material:
- Coursera and Udacity courses tend to hand you everything on a silver platter, so while they're good for the basics they don't help much with this.
- Optimize later - premature optimization is the bane of all good code
- If yes, then:
- Look at code cleanliness first and foremost. Bad AI code is a major pain. Ask me for stories about PyTorch's FasterRCNN being broken and how we lost a month to it.
- Look at Git repository popularity. More people using something often means bugs have been caught or addressed.
- Look at open issues in the git repo.
- Look at if people have used these kinds of models in production already
- Look at if you can find if these models have been used on data of similar complexity and scale as the use case
- Understand the math and the process. Is it actually going to do something meaningful to your data - especially in the case of self-supervised learning. E.g. random cropping images of buttons to learn features in an auto-encoder won’t make sense but doing it for a whole UI image might.
- See if the dataset the model is trained and tested on is publicly available and feasibly downloadable - if not, don't fret too much on this step since the goal is to make it work for your data and your problem.
- Test your implementation on the dataset and see if you can reproduce results in the ballpark (a ~2-5% error difference is fine)
- See the debugging your AI models section below
More on this is covered in the planning section below.
Finding the best way to collect data + Finding the right metric
Understanding and Planning LLMs
So, simple question: should I train and/or invest in working with an LLM?
- I think the answer is "it depends" - but know that it's probably expensive. So see if you can get close with prompt tuning; if you can't, fine-tune it on your own data; then consider model distillation; and only then think about full training.
Also covered further in pre-requisite readings
Make your test dataset before you train anything
Setting up an appropriate baseline is an important step that many teams forget. There are three different baselines that you should think about (a quick sketch follows this list):
- Random baseline: if your model predicts everything randomly, what's the expected performance?
- Human baseline: how well would humans perform on this task?
- Simple heuristic: for example, for the task of recommending the app to use next on your phone, the simplest model would be to recommend your most frequently used app. If this simple heuristic can predict the next app accurately 70% of the time, any model you build has to outperform it significantly to justify the added complexity.
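A quick way to get the first and third of these (the human baseline needs actual humans) is scikit-learn's DummyClassifier; a minimal sketch on a synthetic dataset:

```python
# Cheap baselines with scikit-learn's DummyClassifier on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

random_baseline = DummyClassifier(strategy="uniform").fit(X_train, y_train)
majority_baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

print("random baseline accuracy:     ", random_baseline.score(X_test, y_test))
print("most-frequent-class accuracy: ", majority_baseline.score(X_test, y_test))
# Any real model now has to beat these numbers by a meaningful margin.
```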
Testing LLMs is hard
But here's the best we've been able to figure out - research here is always progressing
It's extremely hard to find good advice or a one-size-fits-all solution for data annotation and what works well, but here are a few resources I've been able to find.
Tagging Guidelines
Labelling Guidelines by Eugene Yan
How to Develop Annotation Guidelines by Prof. Dr. Nils Reiter
Data pipeline integrity
- Great Expectations: Helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. - Your new best friend as a Data Scientist
- Soda Core: Data profiling, testing, and monitoring for SQL accessible data. - Your kinda sorta-best friend
- ydata-quality: Data Quality assessment with one line of code. - is cool but inflexible
- Pandas Profiling (now ydata-profiling): Extends the pandas DataFrame with df.profile_report() for quick data analysis - a minimal sketch follows this list
- DataProfiler: A Python library designed to make data analysis, monitoring and sensitive data detection easy. - Bit tough to use
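To show how light-touch the profiling side can be, here's a minimal sketch with ydata-profiling (the successor to Pandas Profiling, noted above); the toy DataFrame stands in for your real table.

```python
# One-shot data profiling report - the toy DataFrame stands in for your real data.
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.DataFrame({
    "age": [25, 32, None, 40, 29],
    "income": [52000, 64000, 58000, None, 61000],
    "churned": [0, 1, 0, 0, 1],
})

# minimal=True keeps the report fast on wide or large frames.
profile = ProfileReport(df, title="Data quality report", minimal=True)
profile.to_file("data_quality_report.html")  # open in a browser and go hunting
```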
Data tagging platforms
I love using Scale AI for tagging, but if you're looking for something free then Label Studio is a good start.
Section not essential for anyone but MLEs
Understanding common data challenges in training
Understanding training, model selection, and other processes
Section not essential for anyone but MLEs
Full Stack Deep Learning - Lecture 7: Troubleshooting Deep Neural Networks
It's often very useful to set up an internal prototyping/testing interface for any AI model + its data that you plan to deploy.
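One low-effort way to stand such an interface up (my suggestion, not something prescribed by the lectures above) is Gradio; a minimal sketch where predict() wraps whatever model you plan to ship:

```python
# Minimal internal prototyping UI with Gradio - predict() is a placeholder for your model.
import gradio as gr

def predict(text: str) -> str:
    # ... call your real model here ...
    return f"model output for: {text}"

demo = gr.Interface(
    fn=predict,
    inputs="text",
    outputs="text",
    title="Internal model playground",
)
demo.launch()  # share=True gives a temporary public link for teammates
```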
Infrastructure challenges and considerations
Full Stack Deep Learning - Lecture 10: Testing & Explainability
Tools
- ZenoML - Data and model result explainability - very new but simple and great for computer vision
- Netron: Visualizer for neural network, deep learning, and machine learning models.
- Deepchecks: Test Suites for Validating ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.
- Evidently: Interactive reports to analyze ML models during validation or production monitoring.
- I'd also highly recommend some kind of hardware usage monitoring to see if models are actually efficient, e.g. RAM, CPU, GPU % util - most if not all cloud platforms have this.
Deployment checklist
Understanding infrastructure in general
This moves very fast and gets crazier by the month. Just take a look at the MAD: Machine Learning, Artificial Intelligence & Data Landscape for 2023.
But here are the essentials you need to make AI happen smoothly:
- Code versioning
- Model and Artifact versioning
- Data versioning
- Data storage and collection/collation pipeline
- Model training infra - GPU machines, and preferably platforms like Kubeflow or SageMaker
- Experiment Tracking - My go-to is Weights and Biases or TensorBoard, but if you're looking for a more packaged solution MLflow is great (a minimal sketch follows below)
- Monitoring/Logging
- Inference Deployment Mechanism (more below)
There are also a bunch of all-in-one platforms that do all or most of these things, like MLflow, Neptune, SageMaker, Vertex AI, or Polyaxon.
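For the experiment tracking piece, here's a minimal sketch of what the packaged MLflow route looks like; the parameters and metric values are placeholders.

```python
# Minimal MLflow experiment tracking sketch - params and metrics are placeholders.
import mlflow

mlflow.set_experiment("my-model-experiments")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 3e-4)
    mlflow.log_param("batch_size", 32)

    for epoch in range(3):
        # ... train one epoch, compute validation loss ...
        fake_val_loss = 1.0 / (epoch + 1)  # placeholder metric
        mlflow.log_metric("val_loss", fake_val_loss, step=epoch)

# Inspect runs locally with: mlflow ui
```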
Full Stack Deep Learning - Lecture 6: MLOps Infrastructure & Tooling
Deployment
The deployment “stack” - this also keeps moving quite fast but the basic principles remain the same. A quick Google Search doesn't hurt though.
Here's the last thing I saw that showed the latest changes in the landscape: A Shift in ML Deployment by James Detweiler (Felicis VC)
- Know your use-case's deployment platform e.g. Mobile, Web, Edge, etc.
- Find the stack/toolkit/library that works on your platform and with company requirements, e.g. TensorFlow Lite (Mobile/Edge), Google's Vertex AI Prediction (SaaS), TorchServe/TFX (backend and enterprise grade), and BentoML (backend but simpler) - one concrete export sketch follows this list
- Understand your use-case's constraints e.g. real-time for video or batch for recommendation engines
- Optimize for time/cost/performance/hardware
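As one concrete example of the "find a toolkit for your platform" step referenced above, here's a minimal sketch of exporting a PyTorch model to ONNX so that runtimes like ONNX Runtime or TensorRT can serve it; the ResNet and input shape are placeholders for your own model.

```python
# Minimal PyTorch -> ONNX export sketch - the model and input shape are placeholders.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # stand-in for your trained model
dummy_input = torch.randn(1, 3, 224, 224)                 # one example batch

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
# model.onnx can now be loaded by ONNX Runtime, TensorRT, and friends.
```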
Full Stack Deep Learning - Lecture 11: Deployment & Monitoring
UPDATE: Lecture 5: Deployment
Specific deployment system examples, which are short but good
The Neptune AI Blog has some good examples too
LLM - Large Language Models by popular request
I've found that deployment depends on model needs, but Hugging Face has done a great job providing an API interface that "just works".
- Deploying LLMs via Hugging Face (a minimal sketch follows this list)
- Otherwise the same stuff as above
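A minimal sketch of that route, wrapping a transformers pipeline in a small FastAPI service (FastAPI and the tiny gpt2 checkpoint are my placeholders here, not anything Hugging Face mandates):

```python
# Minimal LLM serving sketch: a transformers pipeline behind FastAPI.
# "gpt2" is a tiny placeholder - swap in the model you actually need.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 50

@app.post("/generate")
def generate(prompt: Prompt) -> dict:
    out = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": out[0]["generated_text"]}

# Assuming this file is saved as serve_llm.py, run: uvicorn serve_llm:app --port 8000
```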
How to make sure you know when ML systems fail, and that you can see it when they do
Tools
- Aporia: Observability with customized monitoring and explainability for ML models.
- Gantry: ML Observability platform with analytics, alerting, and human feedback
- Arize: An end-to-end ML observability and model monitoring platform.
- WhyLabs: AI Observability platform - they also have open-source components
- Fiddler: Monitor, explain, and analyze your AI in production.
- Superwise: Fully automated, enterprise-grade model observability in a self-service SaaS platform.
If these platforms don't work for you I recommend making your own pipeline using either the:
- ELK Stack
- Grafana's Stack
- Manifold: A model-agnostic visual debugging tool for machine learning.
- Your own logger + Your own data stores + Your own BI (Metabase, Superset, etc.) - not recommended
I'd also highly recommend some kind of hardware usage monitoring to see if models are actually efficient, e.g. RAM, CPU, GPU % util - most if not all cloud platforms have this.
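If your platform doesn't give you this out of the box, a rough first pass with psutil (plus pynvml if you have NVIDIA GPUs) goes a long way; both libraries are my suggestion, not something required by the stacks above.

```python
# Rough hardware utilisation logger using psutil (+ pynvml for NVIDIA GPUs).
import psutil

try:
    import pynvml
    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
except Exception:
    gpu = None  # no NVIDIA GPU or driver available

for _ in range(5):  # in practice, run this alongside training or inference
    cpu = psutil.cpu_percent(interval=1)
    ram = psutil.virtual_memory().percent
    line = f"CPU {cpu:.0f}% | RAM {ram:.0f}%"
    if gpu is not None:
        util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
        mem = pynvml.nvmlDeviceGetMemoryInfo(gpu)
        line += f" | GPU {util.gpu}% | GPU mem {100 * mem.used / mem.total:.0f}%"
    print(line)
```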
- I follow a few newsletters like The Gradient, TLDR AI and The Batch. Then I augment them with the RSS feeds below
- I tend to sometimes look at Arxiv Sanity
- I look at popular topics on Twitter and the common Hashtags.
- I tend to loosely follow the RSS feeds of the following blogs (I've uploaded the OPML file for this in this repo):
- Machine Learning Blog | ML@CMU | Carnegie Mellon University
- KDnuggets
- Meta Research
- The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
- MachineLearningMastery.com
- Synced
- fast.ai
- MIT News - Computer Science and Artificial Intelligence Laboratory
- The Gradient
- DeepMind
- Paperspace Blog
- PyTorch - Medium
- MLOps Community
- ScienceDaily - Artificial Intelligence
- Taiwan AILabs
- The Official Blog of BigML.com
- Arize AI
- The TensorFlow Blog
- The AI Blog
- PyTorch Website
- The Stanford AI Lab Blog
- Google AI Blog
- TruEra
- OpenAI
- The Berkeley Artificial Intelligence Research Blog
- neptune.ai
- Apple Machine Learning Research
Stanford CS329S Course by Chip Huyen - CS 329S | Syllabus
Full Stack Deep Learning by Josh Tobin and Sergey Karayev
Rules of Machine Learning: | Google Developers
Labelling Guidelines by Eugene Yan
How to Develop Annotation Guidelines by Prof. Dr. Nils Reiter