From 913e0519ad8695ae99b563377c6d76d87d50a3a6 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Sun, 10 Mar 2024 13:55:42 +0200
Subject: [PATCH 01/10] add the AD topic

---
 gsoc.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gsoc.md b/gsoc.md
index 3f45910..5aa4b39 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -84,3 +84,29 @@ The ideal candidate should have practical experience with training deep learning
 - A new FluxML package, FluxBenchmarks.jl, that will perform configurable benchmarking across our ML stack.
 - Github Actions integration for FluxBenchmarks.jl to invoke the tool from PRs.
 - A benchmarking suite that will build your experience with different types of ML models and operations across the stack.
+
+
+
+
+## Tape based automated differentiation engine in Julia
+
+Write a new AD engine in julia and integrate it into the FluxML environment.
+
+**Difficulty.** Hard. **Duration.** 350 hours
+
+### Description
+
+TODO: Why is this needed? State of the current ADs. Advantages of tape-based ADs.
+
+**Mentors.** [Marius Drulea](https://github.com/jpsamaroo), [Kyle Daruwalla](https://github.com/darsnack)
+
+### Prerequisites
+
+- Strong knowledge of graph processing algorightms
+- Familiarity with the machine learning methods: forward and backward pass and gradient descent
+- Good programming skills in any of the languages: Julia, Python, C++, Java, C# is required.
+- Julia language is nice to know, but not an absoute requierement.
+
+### Your contributions
+
+TODO

From 7ef5cc6ea71499a700598a43d1d526e5d2c3aa82 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Sun, 10 Mar 2024 20:07:08 +0200
Subject: [PATCH 02/10] fix mentor github page

---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 5aa4b39..9d17411 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -98,7 +98,7 @@ Write a new AD engine in julia and integrate it into the FluxML environment.
 
 TODO: Why is this needed? State of the current ADs. Advantages of tape-based ADs.
 
-**Mentors.** [Marius Drulea](https://github.com/jpsamaroo), [Kyle Daruwalla](https://github.com/darsnack)
+**Mentors.** [Marius Drulea](https://github.com/MariusDrulea), [Kyle Daruwalla](https://github.com/darsnack)
 
 ### Prerequisites
 

From f8197ca8a7828a68b1c04c4d3ac3ed4ff4ed82b5 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Tue, 19 Mar 2024 22:50:43 +0200
Subject: [PATCH 03/10] taped based AD engine, first version

---
 gsoc.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gsoc.md b/gsoc.md
index 9d17411..11422dc 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -86,17 +86,18 @@ The ideal candidate should have practical experience with training deep learning
 - A benchmarking suite that will build your experience with different types of ML models and operations across the stack.
 
 
-
-
 ## Tape based automated differentiation engine in Julia
 
-Write a new AD engine in julia and integrate it into the FluxML environment.
+Write a new AD (automated differentiation) engine in julia and integrate it into the FluxML environment.
+The AD engine will be used for the typical DNN architectures.
 
 **Difficulty.** Hard. **Duration.** 350 hours
 
 ### Description
 
-TODO: Why is this needed? State of the current ADs. Advantages of tape-based ADs.
+The family of AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the LLVM intermediate representation (IR) output of the first compiler pass. They are very complex, takes many months or years to develop and requires specialized knowledge for this. Maintaining these packages is also big pain point: as the original developers often engage in other projects, over the years the community is left with these hard-to-maintain packages. These packages have their advantages of course, but we shall see them more like premium AD packages. They can be used, but we shall always have a baseline AD package which does the job and it's easy to maintain and improve.
+
+In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow and Jax. Despite their simplicity, taped-based ADs are the main tool in such succesfull deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
 
 **Mentors.** [Marius Drulea](https://github.com/MariusDrulea), [Kyle Daruwalla](https://github.com/darsnack)
 
@@ -104,9 +105,12 @@ TODO: Why is this needed? State of the current ADs. Advantages of tape-based ADs
 
 - Strong knowledge of graph processing algorightms
 - Familiarity with the machine learning methods: forward and backward pass and gradient descent
-- Good programming skills in any of the languages: Julia, Python, C++, Java, C# is required.
+- Familiarity with one of the machine learning libraries: FluxML, PyTorch, Tensorflow, Jax
+- Good programming skills in any of the folowing languages is required: Julia, Python, C/C++, Java, C#
 - Julia language is nice to know, but not an absoute requierement.
 
 ### Your contributions
-
-TODO
+- Write a new AD engine. This will lead to a new Julia package, or we can completely replace the content of the old Tracker.jl package.
+- Integrate the new engine in the Julia ML ecosystem: Flux, Lux, ChainRules, NNlib.
+- Write extensive documentation and extensively document the code. This must be a package were the commnity can easily get involved if there will arise a need for it.
+- Provide a youtube video on how to use the package.

From fac81288b8bb979ba999a8337768ece6a200f271 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Tue, 19 Mar 2024 22:52:35 +0200
Subject: [PATCH 04/10] taped based AD engine, first version

---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 11422dc..ad4ad47 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -107,7 +107,7 @@ In this project we aim to solve this problem by using a simple and yet very effe
 - Familiarity with the machine learning methods: forward and backward pass and gradient descent
 - Familiarity with one of the machine learning libraries: FluxML, PyTorch, Tensorflow, Jax
 - Good programming skills in any of the folowing languages is required: Julia, Python, C/C++, Java, C#
-- Julia language is nice to know, but not an absoute requierement.
+- Julia language is nice to know, but not an absoute requierement
 
 ### Your contributions
 - Write a new AD engine. This will lead to a new Julia package, or we can completely replace the content of the old Tracker.jl package.

From d98960004af96bedc42adfc11542469dbc9c88fa Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Wed, 20 Mar 2024 20:40:43 +0200
Subject: [PATCH 05/10] improved description of "low-tech" AD vs current

Co-authored-by: Kyle Daruwalla
---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index ad4ad47..0f6b6d2 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -95,7 +95,7 @@ The AD engine will be used for the typical DNN architectures.
 
 ### Description
 
-The family of AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the LLVM intermediate representation (IR) output of the first compiler pass. They are very complex, takes many months or years to develop and requires specialized knowledge for this. Maintaining these packages is also big pain point: as the original developers often engage in other projects, over the years the community is left with these hard-to-maintain packages. These packages have their advantages of course, but we shall see them more like premium AD packages. They can be used, but we shall always have a baseline AD package which does the job and it's easy to maintain and improve.
+The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers. In this project, we will develop a "lower-tech" tape-based AD engine, in the spirit of Tracker.jl, which will be easier to maintain while offering fewer features than the existing, complex engines.
 
 In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow and Jax. Despite their simplicity, taped-based ADs are the main tool in such succesfull deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
 

From 7ddfd2fa784a79ad94f0c7025dbd52a54074f829 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Wed, 20 Mar 2024 20:40:59 +0200
Subject: [PATCH 06/10] Julia instead of julia

Co-authored-by: Kyle Daruwalla
---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 0f6b6d2..6ee8f41 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -88,7 +88,7 @@ The ideal candidate should have practical experience with training deep learning
 
 ## Tape based automated differentiation engine in Julia
 
-Write a new AD (automated differentiation) engine in julia and integrate it into the FluxML environment.
+Write a new AD (automated differentiation) engine in Julia and integrate it into the FluxML environment.
 The AD engine will be used for the typical DNN architectures.
 
 **Difficulty.** Hard. **Duration.** 350 hours

From 5b019c7fae3d36f13737cff3bd9eba532f760c4a Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Wed, 20 Mar 2024 20:41:19 +0200
Subject: [PATCH 07/10] fix minor typo

Co-authored-by: Kyle Daruwalla
---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 6ee8f41..334d0db 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -97,7 +97,7 @@ The AD engine will be used for the typical DNN architectures.
 
 The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers. In this project, we will develop a "lower-tech" tape-based AD engine, in the spirit of Tracker.jl, which will be easier to maintain while offering fewer features than the existing, complex engines.
 
-In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow and Jax. Despite their simplicity, taped-based ADs are the main tool in such succesfull deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
+In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow, and Jax. Despite their simplicity, taped-based ADs are the main tool in such successful deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
 
 **Mentors.** [Marius Drulea](https://github.com/MariusDrulea), [Kyle Daruwalla](https://github.com/darsnack)
 

From ac7a488fc995ffe16e694a37543370b4b83b8c35 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Wed, 20 Mar 2024 20:47:19 +0200
Subject: [PATCH 08/10] appeared twice

---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 334d0db..6ee577c 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -95,7 +95,7 @@ The AD engine will be used for the typical DNN architectures.
 
 ### Description
 
-The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers. In this project, we will develop a "lower-tech" tape-based AD engine, in the spirit of Tracker.jl, which will be easier to maintain while offering fewer features than the existing, complex engines.
+The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers.
 
 In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow, and Jax. Despite their simplicity, taped-based ADs are the main tool in such successful deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
 

From 93cabe461ddf9ce5fcc9dc84e1c691fca01fb271 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Wed, 20 Mar 2024 20:57:58 +0200
Subject: [PATCH 09/10] mention tapes offer fewer features than existing engines

---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index 6ee577c..ab06bed 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -97,7 +97,7 @@ The AD engine will be used for the typical DNN architectures.
 
 The family of reverse-mode AD engines in Julia consists mostly of Zygote, Enzyme and the upcoming Diffractor. These packages operate on the intermediate representation (IR) output of the compiler. They are very complex, and it takes many months or years to develop the specialized knowledge required to build these tools. As a result, fixing bugs or adding features is a time consuming task for non-expert developers.
 
-In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. Tape based automated differentiation is in use in PyTorch, Tensorflow, and Jax. Despite their simplicity, taped-based ADs are the main tool in such successful deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
+In this project we aim to solve this problem by using a simple and yet very effective approach: tapes. This "lower-tech" tape-based AD engine will be easier to maintain while offering fewer features than the existing, complex engines. Tape based automated differentiation is in use in PyTorch, Tensorflow, and Jax. Despite their simplicity, taped-based ADs are the main tool in such successful deep learning frameworks. While PyTorch, Tensorflow and Jax are monoliths, the FluxML ecosystem consists of several packages and a new AD engine can be added quite easily. We will make use of the excellent ChainRules and NNlib packages and make the AD integrate with Flux.jl and Lux.jl.
 
 **Mentors.** [Marius Drulea](https://github.com/MariusDrulea), [Kyle Daruwalla](https://github.com/darsnack)
 

From a8f31b6df5c5540397906a54b7f8863b91688944 Mon Sep 17 00:00:00 2001
From: MariusDrulea
Date: Thu, 21 Mar 2024 01:29:38 +0200
Subject: [PATCH 10/10] fix minor typo

Co-authored-by: Kyle Daruwalla
---
 gsoc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gsoc.md b/gsoc.md
index ab06bed..6f17d78 100644
--- a/gsoc.md
+++ b/gsoc.md
@@ -112,5 +112,5 @@ In this project we aim to solve this problem by using a simple and yet very effe
 ### Your contributions
 - Write a new AD engine. This will lead to a new Julia package, or we can completely replace the content of the old Tracker.jl package.
 - Integrate the new engine in the Julia ML ecosystem: Flux, Lux, ChainRules, NNlib.
-- Write extensive documentation and extensively document the code. This must be a package were the commnity can easily get involved if there will arise a need for it.
+- Write extensive documentation and extensively document the code. This must be a package were the community can easily get involved if there will arise a need for it.
 - Provide a youtube video on how to use the package.