Tutorials: Pretraining VGG from Scratch #2971

Open

wants to merge 59 commits into main

Changes from 1 commit

Commits (59)
327738e
modify spelling
woongjoonchoi Mar 5, 2022
c53e7d4
Merge branch 'pytorch:main' into master
woongjoonchoi Jul 13, 2024
6e92650
feat : tutorials
woongjoonchoi Jul 13, 2024
124e01f
Merge branch 'main' into master
svekars Jul 15, 2024
c401bcb
modify : tutorial
woongjoonchoi Jul 16, 2024
981ded5
Merge remote-tracking branch 'origin/master'
woongjoonchoi Jul 16, 2024
32ed435
modify : VGG training from scratch
woongjoonchoi Jul 16, 2024
3ae5d7d
Merge branch 'main' into master
woongjoonchoi Jul 18, 2024
9297a3d
Merge branch 'main' into master
woongjoonchoi Jul 23, 2024
f8dbb6e
modify : pyspellchecker
woongjoonchoi Jul 24, 2024
13709a5
modify :Training VGG froms scratch
woongjoonchoi Jul 24, 2024
b2cd7bf
modify : Pretraining VGG from scrach
woongjoonchoi Jul 24, 2024
af08545
modify : Pretraining VGG from scrach
woongjoonchoi Jul 24, 2024
95b16d4
modify : Pretraining VGG from scratch
woongjoonchoi Jul 24, 2024
2f3f93a
Merge branch 'main' into master
woongjoonchoi Jul 24, 2024
cd53e64
Merge branch 'main' into master
svekars Jul 29, 2024
2c891c9
modify : VGG tutorial
woongjoonchoi Aug 9, 2024
07194b5
Merge remote-tracking branch 'origin/master'
woongjoonchoi Aug 9, 2024
ef6cf15
Merge branch 'main' into master
woongjoonchoi Aug 9, 2024
7938635
Merge branch 'main' into master
woongjoonchoi Aug 20, 2024
f706a64
modify : VGG tutorial
woongjoonchoi Sep 16, 2024
4cb9238
Merge branch 'main' into master
woongjoonchoi Sep 16, 2024
993d32a
Update Pretraining_Vgg_from_scratch.py
svekars Sep 16, 2024
d25d920
Apply suggestions from code review
svekars Sep 16, 2024
e6c7a9b
Apply suggestions from code review
svekars Sep 16, 2024
be72b01
Fix indentation
svekars Sep 16, 2024
c900f66
modify: VGG tutorial
woongjoonchoi Sep 17, 2024
c9723cf
modify: VGG tutorial
woongjoonchoi Sep 17, 2024
2db1099
modify : VGG tutorial
woongjoonchoi Sep 17, 2024
a6e8ca2
Update index.rst
woongjoonchoi Sep 24, 2024
7f58d3c
Update index.rst
svekars Sep 24, 2024
f11d798
Update index.rst
svekars Sep 24, 2024
61f65fd
Merge branch 'main' into master
woongjoonchoi Sep 25, 2024
68920a0
Merge branch 'main' into master
woongjoonchoi Sep 26, 2024
ef33eaa
Fix rendering
svekars Sep 26, 2024
d4bec3a
Update beginner_source/Pretraining_Vgg_from_scratch.py
svekars Sep 26, 2024
a0547f5
Merge branch 'main' into master
woongjoonchoi Oct 3, 2024
787bca7
Merge branch 'main' into master
woongjoonchoi Oct 7, 2024
0187f0f
Merge branch 'main' into master
woongjoonchoi Oct 8, 2024
092bf1a
Merge branch 'pytorch:main' into master
woongjoonchoi Nov 16, 2024
ed09fb2
modify :Pretraing VGG tutorial
woongjoonchoi Nov 16, 2024
a54832b
modify : Pretraining VGG from sctratch
woongjoonchoi Nov 16, 2024
66f5a78
Merge branch 'pytorch:main' into master
woongjoonchoi Nov 20, 2024
066297d
modify : VGG pretraining
woongjoonchoi Nov 20, 2024
bb15e46
Merge remote-tracking branch 'origin2/master'
woongjoonchoi Nov 20, 2024
29d6ee5
Update .ci/docker/requirements.txt
svekars Dec 2, 2024
8819481
Merge branch 'main' into master
svekars Dec 2, 2024
be4c9f1
modify : Pretraining_VGG_from_scratch.rst
woongjoonchoi Dec 3, 2024
36d2ad9
Merge remote-tracking branch 'origin2/master'
woongjoonchoi Dec 3, 2024
49b246c
Merge branch 'main' into master
woongjoonchoi Dec 8, 2024
d5796d7
Merge branch 'main' into master
woongjoonchoi Dec 13, 2024
5ee88a7
Merge branch 'pytorch:main' into master
woongjoonchoi Dec 14, 2024
cda8bba
modify : Pretraining Vgg from scratch
woongjoonchoi Dec 17, 2024
b4d6d9c
modify : Pretraining Vgg from scratch
woongjoonchoi Dec 17, 2024
e8311bd
modify :Pretraining _VGG from scrtach.rst
woongjoonchoi Dec 17, 2024
18dbfc3
Merge branch 'main' into master
svekars Dec 17, 2024
e1b1f4c
Merge branch 'main' into master
woongjoonchoi Dec 25, 2024
c9a1c81
Merge branch 'main' into master
woongjoonchoi Jan 2, 2025
264543e
Merge branch 'main' into master
svekars Mar 7, 2025
modify : pyspellchecker
woongjoonchoi committed Jul 24, 2024
commit f8dbb6e343e8d71f489f3b26a5ac4d35f72d7cbf
77 changes: 30 additions & 47 deletions beginner_source/Pretraining_Vgg_from_scratch.py
@@ -1,5 +1,5 @@
"""
Pretraining VGG from scratch
``Pretraining`` VGG from scratch
============================


@@ -55,7 +55,7 @@
# - We train the model from scratch using only the configuration
# presented in the paper.
#
# - we do not use future method, like BatchNormalization,Adam , He
# - we do not use future method, like Batch normalization,Adam , He
# initialization.
#
# - You can apply to ImageNet Data.
@@ -68,15 +68,15 @@


######################################################################
# Why Vgg is so popluar ?
# Why VGG is so popular ?
Review comment (Contributor):
I suggest renaming this section to Background and possibly beefing up the history behind VGG a tad bit more. Specifically, what was the core innovation that it introduced to become the new SotA? (e.g. deep layers of smaller, consistent 3x3 kernel convolutions replacing the variable-sized filters from AlexNet).

# -----------------------
#


######################################################################
# VGG became a model that attracted attention because it succeeded in
# building deeper layers and dramatically shortening the training time
# compared to alexNet, which was the sota model at the time.:
# compared to alexnet, which was the SOTA model at the time.:
#


@@ -91,12 +91,12 @@
# this configuration will be explained below section.
#

DatasetName = 'Cifar' # Cifar ,Cifar10, Mnist , ImageNet
DatasetName = 'Cifar' # CIFAR ,CIFAR10, MNIST , ImageNet
Review comment (Contributor), suggested change:
DatasetName = 'Cifar' # CIFAR ,CIFAR10, MNIST , ImageNet
DatasetName = 'CIFAR' # CIFAR, CIFAR10, MNIST, ImageNet


## model configuration

num_classes = 100
# CalTech 257 Cifar 100 Cifar10 10 ,Mnist 10 ImageNet 1000
# Caltech 257 CIFAR 100 CIFAR10 10 ,MNIST 10 ImageNet 1000
model_version = None ## you must configure it.

## data configuration
@@ -119,7 +119,7 @@

update_count = int(256/batch_size)
accum_step = int(256/batch_size)
eval_step =26 * accum_step ## CalTech 5 Cifar 5 Mnist 6 , Cifar10 5 ImageNet 26
eval_step =26 * accum_step ## Caltech 5 CIFAR 5 MNIST 6 , CIFAR10 5 ImageNet 26


## model configuration
@@ -147,9 +147,9 @@


######################################################################
# We use ``CIFAR100`` Dataset in this tutorial. In Vgg paper , the authors
# scales image istropically . Then , they apply
# Normalization,RandomCrop,HorizontalFlip . So , we need to override
# We use ``CIFAR100`` Dataset in this tutorial. In VGG paper , the authors
Review comment (Contributor), suggested change:
# We use ``CIFAR100`` Dataset in this tutorial. In VGG paper , the authors
# We use the ``CIFAR100`` dataset in this tutorial. In VGG paper, the authors

# scales image isotropically . Then , they apply
# Normalization,``RandomCrop``,``HorizontalFlip`` . So , we need to override
# CIFAR100 class to apply preprocessing.
#

@@ -168,8 +168,7 @@ def __init__(self,root,transform = None,multi=False,s_max=None,s_min=256,downloa
A.Normalize(mean =(0.5071, 0.4867, 0.4408) , std = (0.2675, 0.2565, 0.2761)),
A.SmallestMaxSize(max_size=self.S),
A.RandomCrop(height =224,width=224),
A.HorizontalFlip(),
# A.RGBShift()
A.HorizontalFlip()
]

)
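
For readers following the preprocessing discussion outside the diff, here is a minimal sketch of what "overriding the CIFAR100 class to apply preprocessing" can look like. The class name `VggCifar100` and the fixed `smallest_side` argument are illustrative, not part of the tutorial, and it assumes `albumentations` (with `ToTensorV2`) is installed.

```python
# Minimal sketch: subclass torchvision's CIFAR100 and run albumentations
# preprocessing inside __getitem__. Names here are illustrative.
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torchvision.datasets import CIFAR100


class VggCifar100(CIFAR100):
    def __init__(self, root, train=True, download=False, smallest_side=256):
        super().__init__(root=root, train=train, download=download)
        # Isotropic rescale to the training scale S, then crop/flip as in the paper.
        self.aug = A.Compose([
            A.Normalize(mean=(0.5071, 0.4867, 0.4408), std=(0.2675, 0.2565, 0.2761)),
            A.SmallestMaxSize(max_size=smallest_side),
            A.RandomCrop(height=224, width=224),
            A.HorizontalFlip(),
            ToTensorV2(),
        ])

    def __getitem__(self, index):
        img, target = self.data[index], self.targets[index]  # HWC uint8 array, int label
        img = self.aug(image=img)["image"]
        return img, target
```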
@@ -216,12 +215,12 @@ def __getitem__(self, index: int) :


######################################################################
# | In Vgg paper, they do experiment over 6 models. model A is 11 layers,
# model B is 13 layers, model C is 16 layers , model D is 16 laeyrs and
# | In VGG paper, they do experiment over 6 models. model A is 11 layers,
# model B is 13 layers, model C is 16 layers , model D is 16 layers and
# model D is 19 layers . you can train all version of models to
# reproduce VGG .
# | ``Config_Channels`` means output channels and ``Config_kernels`` means
# kerenl size.
# kernel size.
#

import torch
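
As a quick reference for the model versions mentioned above, here is one way the per-version channel layout could be written down. This is a sketch based on the VGG paper, not the tutorial's actual ``Config_Channels``; configuration C, which mixes in 1x1 convolutions, is omitted for brevity.

```python
# Output channels per conv layer ("M" = max pooling), 3x3 kernels throughout.
# Each list plus the 3 fully connected layers gives 11/13/16/19 weight layers.
vgg_channels = {
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
```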
@@ -284,8 +283,7 @@ def __init__(self,version , num_classes):
self.num_classes = num_classes
self.linear_out = 4096
self.xavier_count = xavier_count
self.last_xavier= last_xavier ## if >0 , initialize last 3 fully connected noraml distribution
# conv_1_by_1_3_outchannel = num_classes
self.last_xavier= last_xavier ## if >0 , initialize last 3 fully connected normal distribution
self.except_xavier = except_xavier

super().__init__()
@@ -307,8 +305,6 @@ def __init__(self,version , num_classes):
print('weight intialize end')
def forward(self,x):
x = self.feature_extractor(x)
# x= self.avgpool(x) ## If Linear is output, use this
# x= torch.flatten(x,start_dim = 1) ## If Linear is output, use this
x = self.output_layer(x)
x= self.avgpool(x)
x= torch.flatten(x,start_dim = 1)
@@ -318,15 +314,12 @@ def forward(self,x):
@torch.no_grad()
def _init_weights(self,m):

# print(m)
if isinstance(m,nn.Conv2d):
print('-------------')
print(m.kernel_size)
print(m.out_channels)
# if (m.out_channels == self.num_classes or m.out_channels == self.linear_out) and self.last_xavier>0 :
if self.last_xavier>0 and (self.except_xavier is None or self.last_xavier!=self.except_xavier):
print('xavier')
# self.last_xavier-=1
nn.init.xavier_uniform_(m.weight)
elif self.xavier_count >0 :
print('xavier')
@@ -335,10 +328,8 @@ def _init_weights(self,m):
else :
std = 0.1
print(f'normal std : {std}')

torch.nn.init.normal_(m.weight,std=std)
# if (m.out_channels == self.num_classes or m.out_channels == self.linear_out) :
# self.last_xavier+=10

self.last_xavier +=1
if m.bias is not None :
print('bias zero init')
@@ -361,21 +352,21 @@ def _init_weights(self,m):


######################################################################
# When training Vgg , the authors first train model A , then initialized
# When training VGG , the authors first train model A , then initialized
# the weights of other models with the weights of model A. Waiting for
# Model A to be trained takes a long time . The authors mention how to
# train with xavier initialization rather than initializing with the
# train with ``xavier`` initialization rather than initializing with the
Review comment (Contributor):
capitalization nit: xavier -> Xavier throughout

# weights of model A. But, they do not mention how to initialize .
#
# | To Reproduce Vgg , we use xavier initialization method to initialize
# weights. We apply initialization to few first layes and last layers.
# | To Reproduce VGG , we use ``xavier`` initialization method to initialize
# weights. We apply initialization to few first layers and last layers.
# Then , we apply random initialization to other layers.
# | **we must fix stdandrad deviation to 0.1**. If standard deviation is
# | **we must fix standard deviation to 0.1**. If standard deviation is
# larger than 0.1, the weight get NAN values. For stability, we use 0.1
# for standard deviation.
# | The ``front_xavier`` means how many layers we initialize with xavier
# | The ``front_xavier`` means how many layers we initialize with ``xavier``
# initialization in front of layers and The ``last_xavier`` means how
# many layers we initializae with xavier initialization in last of
# many layers we initialize with ``xavier`` initialization in last of
# layers.
#
# In My experiment, we can use ``front_xavier`` = 4 , ``last_xavier``\ =5
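
The initialization rule described in that passage can be summarized in a few lines. This is a sketch of the idea, not the tutorial's ``_init_weights`` method; the helper name ``init_vgg_weights`` is made up for illustration.

```python
import torch.nn as nn


def init_vgg_weights(model, front_xavier=4, last_xavier=5):
    # Xavier-initialize the first `front_xavier` and last `last_xavier` weight
    # layers; everything in between gets a normal init with a fixed std of 0.1.
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    n = len(layers)
    for i, m in enumerate(layers):
        if i < front_xavier or i >= n - last_xavier:
            nn.init.xavier_uniform_(m.weight)
        else:
            nn.init.normal_(m.weight, std=0.1)  # std > 0.1 tends to blow up to NaN
        if m.bias is not None:
            nn.init.zeros_(m.bias)
```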
@@ -406,17 +397,15 @@ def accuracy(output, target, topk=(1,)):

res = []
for k in topk:
# print(f'top {k}')
correct_k = correct[:k].reshape(-1).float().sum(0,keepdim=True)
# res.append(correct_k.mul_(100.0 / batch_size))
res.append(correct_k)
return res


######################################################################
# we initiate model and loss function and optimizer and schedulers. In
# vgg, they use softmax output ,Momentum Optimizer , and Scheduling based
# on accuarcy.
# VGG, they use softmax output ,Momentum Optimizer , and Scheduling based
# on accuracy.
#

model = Model_vgg(model_version,num_classes)
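
For context, here is a sketch of the setup that sentence describes, assuming "Momentum Optimizer" means SGD with momentum and that the scheduler steps on validation accuracy; the hyperparameters shown are illustrative, not the tutorial's.

```python
import torch

criterion = torch.nn.CrossEntropyLoss()  # applies softmax + NLL in one op
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
# Drop the learning rate when top-1 validation accuracy stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.1)

# After each evaluation pass:
#   scheduler.step(val_top1_accuracy)
```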
@@ -440,9 +429,7 @@ def accuracy(output, target, topk=(1,)):
[
A.Normalize(mean =(0.5071, 0.4867, 0.4408) , std = (0.2675, 0.2565, 0.2761)),
A.SmallestMaxSize(max_size=val_data.S),
A.CenterCrop(height =224,width=224),
# A.HorizontalFlip(),
# A.RGBShift()
A.CenterCrop(height =224,width=224)
]

)
@@ -492,7 +479,6 @@ def accuracy(output, target, topk=(1,)):
if i> 0 and i%update_count == 0 :
print(f'Training steps : {i} parameter update loss :{total_loss} ')
if grad_clip is not None:
# print(f'Training steps : {i} parameter grad clip to {grad_clip}')
torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
optimizer.step()
optimizer.zero_grad(set_to_none=True)
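
Pulling the lines above together, the gradient-accumulation pattern looks roughly like this. It reuses the tutorial's names (``model``, ``criterion``, ``optimizer``, ``accum_step``, ``update_count``, ``grad_clip``); ``train_loader`` is an illustrative DataLoader name, and this is a sketch rather than the tutorial's exact loop.

```python
import torch

for i, (images, targets) in enumerate(train_loader):
    out = model(images)
    # Scale the loss so that gradients accumulated over accum_step mini-batches
    # match an effective batch size of 256.
    loss = criterion(out, targets) / accum_step
    loss.backward()
    if i > 0 and i % update_count == 0:
        if grad_clip is not None:
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```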
@@ -594,8 +580,7 @@ def __init__(self,root,transform = None,multi=False,s_max=None,s_min=256,split=N
A.Normalize(),
A.SmallestMaxSize(max_size=self.S),
A.RandomCrop(height =224,width=224),
A.HorizontalFlip(),
# A.RGBShift()
A.HorizontalFlip()
]

)
@@ -644,17 +629,15 @@ def __getitem__(self, index: int) :
[
A.Normalize(),
A.SmallestMaxSize(max_size=val_data.S),
A.CenterCrop(height =224,width=224),
# A.HorizontalFlip(),
# A.RGBShift()
A.CenterCrop(height =224,width=224)
]

)

######################################################################
# Conculsion
# ----------
# We have seen how pretraining VGG from scratch . This Tutorial will be helpful to reproduce another Foundation Model .
# We have seen how ``pretraining`` VGG from scratch . This Tutorial will be helpful to reproduce another Foundation Model .
Review comment (Contributor):
this needs to be expanded a bit, and I'm not sure what is meant by the Foundation Model comment


######################################################################
# More things to try
@@ -668,5 +651,5 @@ def __getitem__(self, index: int) :
# Further Reading
# ---------------

# - `VGG training using python script <https://github.com/woongjoonchoi/DeepLearningPaper-Reproducing/tree/master/Vgg>`__
# - `VGG training using python script <https://github.com/woongjoonchoi/DeepLearningPaper-Reproducing/tree/master/VGG>`__
# - `VGG paper <https://arxiv.org/abs/1409.1556>`__