From 052d4e12b0e565a9ac22823f17f38ebf3b50759a Mon Sep 17 00:00:00 2001
From: Diego Canez
Date: Thu, 31 Oct 2024 21:28:37 +0100
Subject: [PATCH] docs: update docs

---
 docs/buildpdf             |  2 +-
 docs/src/_config.yml      |  6 +++++-
 docs/src/part1/problem.md | 17 ++++++++++-------
 docs/src/references.bib   | 10 ++++++++++
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/docs/buildpdf b/docs/buildpdf
index ba19b77..d3d3428 100755
--- a/docs/buildpdf
+++ b/docs/buildpdf
@@ -1,2 +1,2 @@
 #!/bin/bash
-jb build --path-output _build/pdf src/ --builder pdfhtml
+jb build --path-output _build/pdf src/ --builder pdflatex
diff --git a/docs/src/_config.yml b/docs/src/_config.yml
index 508aff9..2e5eada 100644
--- a/docs/src/_config.yml
+++ b/docs/src/_config.yml
@@ -34,4 +34,8 @@ html:
 
 sphinx:
   config:
-    html_show_copyright: false
\ No newline at end of file
+    html_show_copyright: false
+    latex_elements:
+      preamble:
+        \usepackage{etoolbox}
+        \AtBeginEnvironment{figure}{\pretocmd{\hyperlink}{\protect}{}{}}
diff --git a/docs/src/part1/problem.md b/docs/src/part1/problem.md
index 006eaae..7245d9e 100644
--- a/docs/src/part1/problem.md
+++ b/docs/src/part1/problem.md
@@ -5,7 +5,7 @@
 
 ## Motivation
 
-Currently, the most popular approach for deploying an object detection model in a production environment is to use YOLO because it's fast and easy to use. However, to make it usable for your specific task you need to recollect a domain-specific dataset and fine-tune or train the model from scratch. This is a time-consuming and expensive process because such a dataset needs to be comprehensive enough to cover most of the possible scenarios that the model will encounter in the real world (weather conditions, different object scales and textures, etc).
+Currently, the most popular approach for deploying an object detection model in a production environment is to use YOLO {cite}`yolo` because it's fast and easy to use. However, to make it usable for your specific task you need to collect a domain-specific dataset and fine-tune or train the model from scratch. This is a time-consuming and expensive process because such a dataset needs to be comprehensive enough to cover most of the possible scenarios that the model will encounter in the real world (weather conditions, different object scales and textures, etc.).
 
 Recently, a new paradigm shift has emerged in the field as described by {cite}`bommasani2022`: instead of training a model from scratch for a specific task, you can use a model that was pre-trained on a generic task with massive data and compute as a backbone for your model and only fine-tune a decoder/head for your specific task (see {numref}`Figure {number} `). These pre-trained models are called Foundation Models and are great at capturing features that are useful for a wide range of tasks.
 
@@ -33,14 +33,17 @@ The drawback of using these foundation models is that they are large and computa
 
 ## Objectives
 
-According to {cite}`mcip`, practitioners at Apple do the following when asked to deploy a model to some edge device.
-Find a feasibility model A.
-Find a smaller model B equivalent to A that is suitable for the device.
-Compress model B to reach production-ready model C.
+To ground our problem, we can use the framework described by {cite}`mcip` that Apple engineers use to deploy machine learning models on their devices. As we can see in {numref}`Figure {number} <apple_practice>`, it consists of three steps:
+
+1. **Find a feasibility model A**: This involves training a model that can achieve the desired accuracy for the task.
+2. **Find a smaller model B equivalent to A that is suitable for the device**: This may involve distilling the knowledge of model A into a smaller and/or more architecturally efficient model B that can be found through (neural) architecture search.
+3. **Compress model B to reach production-ready model C**: This involves tuning the precision of the model to reduce its latency and memory cost until it meets the performance budget.
 
 :::{figure-md} apple_practice
 
-Caption
-:::
\ No newline at end of file
+Source: {cite}`mcip`
+:::
+
+In this work, we will focus on steps 1 and 3, which will be covered by Part 1 (Finding a Feasibility Model) and Part 2 (Optimization) respectively.
\ No newline at end of file
diff --git a/docs/src/references.bib b/docs/src/references.bib
index 8bc3687..9525a63 100644
--- a/docs/src/references.bib
+++ b/docs/src/references.bib
@@ -151,4 +151,14 @@ @Article{mae2021
   journal = {arXiv:2111.06377},
   title   = {Masked Autoencoders Are Scalable Vision Learners},
   year    = {2021},
+}
+
+@misc{yolo,
+  author  = {Jocher, Glenn and Qiu, Jing and Chaurasia, Ayush},
+  license = {AGPL-3.0},
+  month   = jan,
+  title   = {{Ultralytics YOLO}},
+  url     = {https://github.com/ultralytics/ultralytics},
+  version = {8.0.0},
+  year    = {2023}
 }
\ No newline at end of file
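
For readers of the updated Objectives section, here is a minimal sketch of what step 3 (compressing model B into a production-ready model C by lowering numerical precision) can look like, using post-training dynamic quantization in PyTorch. This is only an illustration under assumed names: `model_b` is a placeholder architecture, not this project's model, and the actual compression technique used in Part 2 may differ.

```python
# Illustrative only: post-training dynamic quantization as one way to realize
# step 3. `model_b` is a placeholder, not the model described in these docs.
import torch
import torch.nn as nn

# Stand-in for "model B": any float32 model obtained after steps 1-2.
model_b = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model_b.eval()

# "Model C": Linear weights stored as int8, reducing memory cost and latency.
model_c = torch.ao.quantization.quantize_dynamic(
    model_b, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement at inference time.
x = torch.randn(1, 512)
print(model_c(x).shape)  # torch.Size([1, 10])
```

Whatever technique is used, the resulting model C would then be re-evaluated against the accuracy and latency/memory budget mentioned in step 3.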