Merge pull request #1996 from Karthik-d-k/namecase
replace ADAM with Adam and its variants thereof
ToucheSir authored Jun 16, 2022
2 parents 0b01b77 + 7640149 commit 952c4a5
Showing 8 changed files with 92 additions and 83 deletions.
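For orientation, here is a minimal sketch of what the rename means for downstream user code; the learning rate and the commented-out old spelling are illustrative, not taken from the diff.

```julia
using Flux

# Old spelling (pre-commit); after this commit it still resolves,
# but only through the deprecation bindings added in src/deprecations.jl below:
# opt = ADAM(1e-3)

# New, preferred spelling introduced by this commit:
opt = Adam(1e-3)
```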
2 changes: 1 addition & 1 deletion docs/src/models/recurrence.md
@@ -173,7 +173,7 @@ Flux.reset!(m)
[m(x) for x in seq_init]

ps = Flux.params(m)
-opt= ADAM(1e-3)
+opt= Adam(1e-3)
Flux.train!(loss, ps, data, opt)
```
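For readers outside the docs context, here is a self-contained sketch of that training snippet; the model, warm-up sequence, loss, and data below are hypothetical stand-ins, not part of the diff.

```julia
using Flux

m = Chain(RNN(4 => 8), Dense(8 => 1))            # hypothetical recurrent model
seq_init = [rand(Float32, 4) for _ in 1:3]        # warm-up inputs
xs = [rand(Float32, 4) for _ in 1:5]              # one toy sequence
ys = [rand(Float32, 1) for _ in 1:5]
data = [(xs, ys)]

# Sum the per-timestep losses over a sequence
loss(x, y) = sum(Flux.mse(m(xi), yi) for (xi, yi) in zip(x, y))

Flux.reset!(m)
[m(x) for x in seq_init]                          # prime the hidden state

ps = Flux.params(m)
opt = Adam(1e-3)
Flux.train!(loss, ps, data, opt)
```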

2 changes: 1 addition & 1 deletion docs/src/saving.md
@@ -135,6 +135,6 @@ You can store the optimiser state alongside the model, to resume training
exactly where you left off. BSON is smart enough to [cache values](https://github.com/JuliaIO/BSON.jl/blob/v0.3.4/src/write.jl#L71) and insert links when saving, but only if it knows everything to be saved up front. Thus models and optimizers must be saved together to have the latter work after restoring.

```julia
-opt = ADAM()
+opt = Adam()
@save "model-$(now()).bson" model opt
```
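A hedged sketch of the full round trip implied by that paragraph; the model, checkpoint filename, and the commented-out training call are illustrative, not part of the diff.

```julia
using Flux
using BSON: @save, @load

model = Chain(Dense(10 => 5, relu), Dense(5 => 2))   # hypothetical model
opt = Adam()

# ... training happens here ...
@save "model-checkpoint.bson" model opt

# Later: restore both together, so the optimiser's internal state is reused.
@load "model-checkpoint.bson" model opt
# Flux.train!(loss, Flux.params(model), data, opt)   # resume where training left off
```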
18 changes: 9 additions & 9 deletions docs/src/training/optimisers.md
@@ -39,7 +39,7 @@ for p in (W, b)
end
```

-An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `ADAM`.
+An optimiser `update!` accepts a parameter and a gradient, and updates the parameter according to the chosen rule. We can also pass `opt` to our [training loop](training.md), which will update all parameters of the model in a loop. However, we can now easily replace `Descent` with a more advanced optimiser such as `Adam`.
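A minimal sketch of that `update!` pattern with `Adam` dropped in for the plain `Descent`; the parameters, loss, and data below are hypothetical.

```julia
using Flux
using Flux.Optimise: update!

W, b = rand(2, 5), rand(2)                  # hypothetical parameters
loss(x, y) = sum((W * x .+ b .- y) .^ 2)
x, y = rand(5), rand(2)                     # hypothetical data

opt = Adam(1e-3)                            # drop-in replacement for a plain Descent rule
gs = gradient(() -> loss(x, y), Flux.params(W, b))
for p in (W, b)
  update!(opt, p, gs[p])                    # apply Adam's update using the stored moments
end
```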

## Optimiser Reference

@@ -51,15 +51,15 @@ Descent
Momentum
Nesterov
RMSProp
-ADAM
-RADAM
+Adam
+RAdam
AdaMax
-ADAGrad
-ADADelta
+AdaGrad
+AdaDelta
AMSGrad
-NADAM
-ADAMW
-OADAM
+NAdam
+AdamW
+OAdam
AdaBelief
```
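All of the renamed optimisers in that reference list share the same construction pattern; the sketch below uses arbitrary hyperparameters for illustration only.

```julia
# A tuple of the renamed rules, each built from its learning rate (or rho for AdaDelta)
opts = (
  Adam(1e-3), RAdam(1e-3), NAdam(1e-3), AdamW(1e-3), OAdam(1e-3),
  AdaGrad(0.1), AdaDelta(0.9), AdaBelief(1e-3),
)
```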

@@ -182,7 +182,7 @@ WeightDecay
Gradient clipping is useful for training recurrent neural networks, which have a tendency to suffer from the exploding gradient problem. An example usage is

```julia
-opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
+opt = Optimiser(ClipValue(1e-3), Adam(1e-3))
```
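For comparison, a hedged sketch chaining the renamed optimiser with the other composable pieces exported alongside it in this commit; the particular combination and values are illustrative.

```julia
# Clip gradient norms, apply weight decay, then take an Adam step
opt = Optimiser(ClipNorm(1.0), WeightDecay(1e-4), Adam(1e-3))
```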

```@docs
6 changes: 3 additions & 3 deletions src/Flux.jl
@@ -29,9 +29,9 @@ include("optimise/Optimise.jl")
using .Optimise
using .Optimise: @epochs
using .Optimise: skip
-export Descent, ADAM, Momentum, Nesterov, RMSProp,
-ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, OADAM,
-ADAMW, RADAM, AdaBelief, InvDecay, ExpDecay,
+export Descent, Adam, Momentum, Nesterov, RMSProp,
+AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, OAdam,
+AdamW, RAdam, AdaBelief, InvDecay, ExpDecay,
WeightDecay, ClipValue, ClipNorm

using CUDA
9 changes: 9 additions & 0 deletions src/deprecations.jl
@@ -71,3 +71,12 @@ LSTMCell(in::Integer, out::Integer; kw...) = LSTMCell(in => out; kw...)

GRUCell(in::Integer, out::Integer; kw...) = GRUCell(in => out; kw...)
GRUv3Cell(in::Integer, out::Integer; kw...) = GRUv3Cell(in => out; kw...)

+# Optimisers with old naming convention
+Base.@deprecate_binding ADAM Adam
+Base.@deprecate_binding NADAM NAdam
+Base.@deprecate_binding ADAMW AdamW
+Base.@deprecate_binding RADAM RAdam
+Base.@deprecate_binding OADAM OAdam
+Base.@deprecate_binding ADAGrad AdaGrad
+Base.@deprecate_binding ADADelta AdaDelta
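In effect, these bindings keep the old spellings working while steering users toward the new ones; a hedged sketch of the expected behaviour follows (the exact warning wording is approximate).

```julia
using Flux

opt_old = ADAM(1e-3)   # still resolves, but emits a deprecation warning suggesting `Adam`
opt_new = Adam(1e-3)   # preferred spelling, no warning
```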
4 changes: 2 additions & 2 deletions src/optimise/Optimise.jl
@@ -4,8 +4,8 @@ using LinearAlgebra
import ArrayInterface

export train!, update!,
-Descent, ADAM, Momentum, Nesterov, RMSProp,
-ADAGrad, AdaMax, ADADelta, AMSGrad, NADAM, ADAMW,RADAM, OADAM, AdaBelief,
+Descent, Adam, Momentum, Nesterov, RMSProp,
+AdaGrad, AdaMax, AdaDelta, AMSGrad, NAdam, AdamW,RAdam, OAdam, AdaBelief,
InvDecay, ExpDecay, WeightDecay, stop, skip, Optimiser,
ClipValue, ClipNorm
