
Update READMEs and the documentation
Tony-Y committed Nov 12, 2024
1 parent 8c60431 commit f113591
Showing 3 changed files with 33 additions and 12 deletions.
README.md: 31 changes (23 additions, 8 deletions)
@@ -26,15 +26,21 @@ pip install -U pytorch_warmup
## Examples

* [CIFAR10](https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/cifar10) -
- A sample script to train a ResNet20 model on the CIFAR10 dataset using an optimization algorithm with a warmup.
+ A sample script to train a ResNet model on the CIFAR10 dataset using an optimization algorithm with a warmup schedule.
+ Its README presents ResNet20 results obtained using each of AdamW, NAdamW, AMSGradW, and AdaMax
+ together with each of various warmup schedules.
+ In addition, there is a ResNet performance comparison (up to ResNet110) obtained using the SGD algorithm
+ with a linear warmup schedule.
* [EMNIST](https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/emnist) -
- A sample script to train a CNN model on the EMNIST dataset using the Adam algorithm with a warmup.
+ A sample script to train a CNN model on the EMNIST dataset using the AdamW algorithm with a warmup schedule.
+ Its README presents a result obtained using the AdamW algorithm with each of the untuned linear and exponential warmup,
+ and the RAdam warmup.
* [Plots](https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/plots) -
A script to plot effective warmup periods as a function of β₂, and warmup schedules over time.
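
As a rough illustration of what the plots example produces, here is a minimal sketch (not the repository's script) that draws warmup factors over time, assuming the exponential rule `w(t) = 1 - exp(-t / warmup_period)` quoted in the Usage section below and the usual linear rule `w(t) = min(1, t / warmup_period)`:

```python
# A rough sketch (not the repository's plots script): draw the linear and
# exponential warmup factors over time for warmup_period = 2000 steps.
import numpy as np
import matplotlib.pyplot as plt

warmup_period = 2000
t = np.arange(1, 10001)

plt.plot(t, np.minimum(1.0, t / warmup_period), label="linear")
plt.plot(t, 1.0 - np.exp(-t / warmup_period), label="exponential")
plt.xlabel("step")
plt.ylabel("warmup factor w(t)")
plt.legend()
plt.show()
```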

## Usage

- The [Documentation](https://tony-y.github.io/pytorch_warmup/) provides more detailed information on this library, unseen below.
+ The [documentation](https://tony-y.github.io/pytorch_warmup/) provides more detailed information on this library, unseen below.

### Sample Codes

@@ -165,7 +171,7 @@ for epoch in range(epochs+warmup_epochs):

#### Manual Warmup

- The warmup factor `w(t)` depends on the warmup period, which must manually be specified, for `LinearWarmup` and `ExponentialWarmup`.
+ In `LinearWarmup` and `ExponentialWarmup`, the warmup factor `w(t)` depends on the warmup period that must manually be specified.

##### Linear

@@ -175,6 +181,8 @@ The warmup factor `w(t)` depends on the warmup period, which must manually be sp
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)
```

+ For details please refer to [LinearWarmup](https://tony-y.github.io/pytorch_warmup/manual_warmup.html#pytorch_warmup.base.LinearWarmup) in the documentation.
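
As an orientation for how a `LinearWarmup` scheduler is driven inside a training loop, here is a minimal, self-contained sketch following the chaining pattern from the library's documentation; the tiny model and random data are placeholders, not part of the README:

```python
# A minimal sketch, assuming the chaining pattern from the library's documentation;
# the tiny model and random data below are placeholders, not the README's example.
import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)

for step in range(5000):
    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    # dampening() multiplies the scheduled learning rate by the warmup factor w(t).
    with warmup_scheduler.dampening():
        lr_scheduler.step()
```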

##### Exponential

`w(t) = 1 - exp(-t / warmup_period)`
@@ -183,9 +191,11 @@ warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)
warmup_scheduler = warmup.ExponentialWarmup(optimizer, warmup_period=1000)
```

+ For details please refer to [ExponentialWarmup](https://tony-y.github.io/pytorch_warmup/manual_warmup.html#pytorch_warmup.base.ExponentialWarmup) in the documentation.
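
To get a feel for the exponential schedule, this small check (not part of the README) evaluates `w(t) = 1 - exp(-t / warmup_period)` at a few steps for `warmup_period = 1000`:

```python
# A quick numeric check of the exponential warmup factor,
# using w(t) = 1 - exp(-t / warmup_period) with warmup_period = 1000.
import math

warmup_period = 1000
for t in (100, 1000, 3000, 5000):
    w = 1.0 - math.exp(-t / warmup_period)
    print(f"step {t:5d}: w(t) = {w:.3f}")
# Roughly: w(1000) ~ 0.632, w(3000) ~ 0.950, w(5000) ~ 0.993,
# so the factor approaches 1 only asymptotically.
```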

#### Untuned Warmup

- The warmup period is determined by a function of Adam's `beta2` parameter for `UntunedLinearWarmup` and `UntunedExponentialWarmup`.
+ In `UntunedLinearWarmup` and `UntunedExponentialWarmup`, the warmup period is determined by a function of Adam's `beta2` parameter.

##### Linear

@@ -195,6 +205,8 @@ The warmup period is determined by a function of Adam's `beta2` parameter for `U
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```

+ For details please refer to [UntunedLinearWarmup](https://tony-y.github.io/pytorch_warmup/untuned_warmup.html#pytorch_warmup.untuned.UntunedLinearWarmup) in the documentation.
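
For orientation, the untuned linear variant derives its period from `beta2`; the sketch below assumes the rule `warmup_period = 2 / (1 - beta2)` from the untuned-warmup literature and is illustrative only, not the library's exact implementation:

```python
# Illustrative only: for beta2 = 0.999, the assumed rule warmup_period = 2 / (1 - beta2)
# gives about 2000 steps, so UntunedLinearWarmup(optimizer) behaves roughly like
# LinearWarmup(optimizer, warmup_period=2000) for AdamW's default betas.
import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, betas=(0.9, 0.999))

beta2 = optimizer.param_groups[0]['betas'][1]
print("derived warmup_period:", 2 / (1 - beta2))  # ~2000 (up to float rounding)

warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```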

##### Exponential

`warmup_period = 1 / (1 - beta2)`
@@ -203,16 +215,19 @@ warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
warmup_scheduler = warmup.UntunedExponentialWarmup(optimizer)
```

+ For details please refer to [UntunedExponentialWarmup](https://tony-y.github.io/pytorch_warmup/untuned_warmup.html#pytorch_warmup.untuned.UntunedExponentialWarmup) in the documentation.
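
Given the rule above, `beta2 = 0.999` yields a period of about 1000 steps, so the untuned exponential warmup behaves like the manual `ExponentialWarmup(optimizer, warmup_period=1000)` example earlier. A tiny illustrative check (not from the README):

```python
# Illustrative check: with beta2 = 0.999 the rule warmup_period = 1 / (1 - beta2)
# gives about 1000 steps, matching the manual ExponentialWarmup example above.
beta2 = 0.999
print("derived warmup_period:", 1 / (1 - beta2))  # ~1000 (up to float rounding)
```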

#### RAdam Warmup

- The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup`. For details please refer to the
- [Documentation](https://tony-y.github.io/pytorch_warmup/radam_warmup.html) or
- "[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265)."
+ In `RAdamWarmup`, the warmup factor `w(t)` is a complicated function depending on Adam's `beta2` parameter.

```python
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```

+ For details please refer to [RAdamWarmup](https://tony-y.github.io/pytorch_warmup/radam_warmup.html#pytorch_warmup.radam.RAdamWarmup) in the documentation, or
+ "[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265)."

### Apex's Adam

The Apex library provides an Adam optimizer tuned for CUDA devices, [FusedAdam](https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedAdam). The FusedAdam optimizer can be used together with any one of the warmup schedules above. For example:
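
The concrete example is truncated in this diff view; the following is only a hedged sketch of what such a pairing could look like, assuming Apex is installed, a CUDA device is available, and `FusedAdam` accepts the usual Adam-style arguments:

```python
# A sketch only (the README's actual example is not shown in this diff view):
# pairing Apex's FusedAdam with an untuned linear warmup schedule.
import torch
import pytorch_warmup as warmup
from apex.optimizers import FusedAdam  # requires NVIDIA Apex and a CUDA device

model = torch.nn.Linear(10, 2).cuda()
optimizer = FusedAdam(model.parameters(), lr=0.001, betas=(0.9, 0.999), weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```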
docs/index.rst: 10 changes (8 additions, 2 deletions)
@@ -48,10 +48,16 @@ Examples
:target: https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_Approach1_chaining.ipynb

* `CIFAR10 <https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/cifar10>`_ -
- A sample script to train a ResNet20 model on the CIFAR10 dataset using an optimization algorithm with a warmup.
+ A sample script to train a ResNet model on the CIFAR10 dataset using an optimization algorithm with a warmup schedule.
+ Its README presents ResNet20 results obtained using each of AdamW, NAdamW, AMSGradW, and AdaMax
+ together with each of various warmup schedules.
+ In addition, there is a ResNet performance comparison (up to ResNet110) obtained using the SGD algorithm
+ with a linear warmup schedule.

* `EMNIST <https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/emnist>`_ -
- A sample script to train a CNN model on the EMNIST dataset using the Adam algorithm with a warmup.
+ A sample script to train a CNN model on the EMNIST dataset using the AdamW algorithm with a warmup schedule.
+ Its README presents a result obtained using the AdamW algorithm with each of the untuned linear and exponential warmup,
+ and the RAdam warmup.

* `Plots <https://github.com/Tony-Y/pytorch_warmup/tree/master/examples/plots>`_ -
A script to plot effective warmup periods as a function of :math:`\beta_{2}`, and warmup schedules over time.
examples/cifar10/README.md: 4 changes (2 additions, 2 deletions)
@@ -328,7 +328,7 @@ The exponential decay rates of Adam variants are set as $\beta_{1} = 0.9$ and $\
| Expo-5k | `8.60 ± 0.19` | `8.63 ± 0.23` |
| Linear-10k | `8.53 ± 0.09` | `8.43 ± 0.14` |

- ## ResNet Performance Comparison for the SGD Algorithm
+ ## ResNet Performance Comparison

We employ the ResNet20, ResNet32, ResNet44, ResNet56, and ResNet110 architecture for comparison.
The SGD with momentum is used as the optimization algorithm.
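
As an illustration of that setup (not the README's reported configuration; the learning rate, momentum, weight decay, milestones, and warmup period below are placeholders):

```python
# Illustrative setup only: SGD with momentum plus a linear warmup schedule,
# as used for the ResNet comparison. All hyperparameter values here are
# placeholders, not the settings reported in this README.
import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 2)  # stand-in for a ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)
```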
@@ -351,7 +351,7 @@ The other implementation details are described in Supplemental Information above
This bar chart presents the mean values. The error bar indicates the standard deviation.</i>
</p>

- ### SGD with Momentum
+ ### SGD

The top-1 errors are shown as mean ± std.

