Commit
docs: update docs
dgcnz committed Oct 31, 2024
1 parent 62de7db commit 052d4e1
Showing 4 changed files with 26 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/buildpdf
@@ -1,2 +1,2 @@
#!/bin/bash
-jb build --path-output _build/pdf src/ --builder pdfhtml
+jb build --path-output _build/pdf src/ --builder pdflatex
6 changes: 5 additions & 1 deletion docs/src/_config.yml
@@ -34,4 +34,8 @@ html:

sphinx:
  config:
-    html_show_copyright: false
+    html_show_copyright: false
+    latex_elements:
+      preamble:
+        \usepackage{etoolbox}
+        \AtBeginEnvironment{figure}{\pretocmd{\hyperlink}{\protect}{}{}}
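The preamble addition uses etoolbox's patching hooks to guard `\hyperlink` inside figure environments during the pdflatex build (MyST cross-references expand to `\hyperlink`, which is fragile in moving arguments such as captions). A minimal standalone sketch of the same two hooks, assuming only the etoolbox package (this is a toy document, not the project's configuration):

```latex
% Sketch of the two etoolbox hooks used in the preamble above.
\documentclass{article}
\usepackage{etoolbox}

% Run extra code at the start of every figure environment:
\AtBeginEnvironment{figure}{\small}

% Prepend code to an existing macro's body; the last two arguments
% are the success and failure branches of the patch:
\pretocmd{\tableofcontents}{\clearpage}{}{}

\begin{document}
\tableofcontents
\begin{figure}[h]
  \centering
  This text is set at the size installed by the hook.
\end{figure}
\end{document}
```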
17 changes: 10 additions & 7 deletions docs/src/part1/problem.md
@@ -5,7 +5,7 @@

## Motivation

-Currently, the most popular approach for deploying an object detection model in a production environment is to use YOLO because it's fast and easy to use. However, to make it usable for your specific task you need to recollect a domain-specific dataset and fine-tune or train the model from scratch. This is a time-consuming and expensive process because such a dataset needs to be comprehensive enough to cover most of the possible scenarios that the model will encounter in the real world (weather conditions, different object scales and textures, etc).
+Currently, the most popular approach for deploying an object detection model in a production environment is to use YOLO {cite}`yolo` because it is fast and easy to use. However, to make it usable for your specific task you need to collect a domain-specific dataset and fine-tune or train the model from scratch. This is a time-consuming and expensive process, because such a dataset needs to be comprehensive enough to cover most of the scenarios the model will encounter in the real world (weather conditions, different object scales and textures, etc.).

Recently, a paradigm shift has emerged in the field, as described by {cite}`bommasani2022`: instead of training a model from scratch for a specific task, you can use a model that was pre-trained on a generic task with massive data and compute as a backbone for your model, and only fine-tune a decoder/head for your specific task (see {numref}`Figure {number} <knowledgetransfer>`). These pre-trained models are called Foundation Models and are great at capturing features that are useful for a wide range of tasks.

@@ -33,14 +33,17 @@ The drawback of using these foundation models is that they are large and computa

## Objectives

-According to {cite}`mcip`, practitioners at Apple do the following when asked to deploy a model to some edge device.
-Find a feasibility model A.
-Find a smaller model B equivalent to A that is suitable for the device.
-Compress model B to reach production-ready model C.
+To ground our problem, we can use the framework described in {cite}`mcip` that Apple engineers follow to deploy machine learning models on their devices. As shown in {numref}`Figure {number} <apple_practice>`, it consists of three steps:

+1. **Find a feasibility model A**: This involves training a model that can achieve the desired accuracy for the task.
+2. **Find a smaller model B equivalent to A that is suitable for the device**: This may involve distilling the knowledge of model A into a smaller and/or more architecturally efficient model B, which can be found through (neural) architecture search.
+3. **Compress model B to reach production-ready model C**: This involves tuning the precision of the model to reduce its latency and memory cost to within the performance budget.
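The knowledge distillation mentioned in step 2 can be sketched as a temperature-scaled soft-target loss. The sketch below uses only the standard library and hypothetical logits (the function names and values are illustrative, not the project's code), assuming the usual Hinton-style formulation:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student soft targets.

    Scaled by T^2 to keep gradient magnitudes comparable across
    temperatures, as in the standard distillation setup.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )

# Hypothetical logits: the student roughly mimics the teacher,
# so the loss is small but nonzero.
teacher = [4.0, 1.0, 0.2]
student = [3.5, 1.2, 0.1]
loss = distillation_loss(teacher, student)
```

In training, this term is typically mixed with the ordinary hard-label loss; here it is shown in isolation.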


:::{figure-md} apple_practice
<img src="apple_practice.png" alt="">

-Caption
+Source: {cite}`mcip`
:::

In this work, we will focus on steps 1 and 3 which will be covered by Part 1 (Finding a Feasibility Model) and Part 2 (Optimization) respectively.
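The precision tuning in step 3 (covered by Part 2) can be sketched as uniform affine quantization: map floats to a small signed-integer range and back, trading a bounded rounding error for lower memory cost. A minimal pure-Python sketch with hypothetical weights (illustrative only, not the project's implementation):

```python
def quantize(weights, num_bits=8):
    """Uniformly quantize a list of floats to signed num_bits integers.

    Returns the integer codes plus the (scale, zero_point) needed to
    map them back to floats.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    codes = [
        max(qmin, min(qmax, round(w / scale) + zero_point))
        for w in weights
    ]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]

# Hypothetical weights: each round-tripped value is within one
# quantization step of the original.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)
```

Production pipelines use per-channel scales and calibration data, but the accuracy/latency trade-off is the same idea.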
10 changes: 10 additions & 0 deletions docs/src/references.bib
@@ -151,4 +151,14 @@ @Article{mae2021
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
+
+@misc{yolo,
+  author  = {Jocher, Glenn and Qiu, Jing and Chaurasia, Ayush},
+  license = {AGPL-3.0},
+  month   = jan,
+  title   = {{Ultralytics YOLO}},
+  url     = {https://github.com/ultralytics/ultralytics},
+  version = {8.0.0},
+  year    = {2023},
+}
