
An Implementation of VGG

Original article: towardsdatascience.com/an-implementation-of-vgg-dea082804e14

A Tutorial for Beginners

Mina Ghashami · Published in Towards Data Science · 9 min read · Oct 31, 2023


In this post, we look at an implementation of VGG and its training on the STL10 dataset [2, 3].

We reviewed the VGG architecture in a previous post, Image Classification for Beginners (the VGG and ResNet architectures since 2014, on towardsdatascience.com). If you are not familiar with it, please take a look.

In a nutshell,

VGG stands for Visual Geometry Group, a research group at the University of Oxford. In 2014, they designed a deep convolutional neural network architecture for image classification tasks and named it after themselves: VGG [1].

VGGNet comes in several configurations, such as VGG16 (16 layers) and VGG19 (19 layers).

The VGG16 architecture is as follows: it has 13 convolutional layers and 3 fully connected layers.

Image by author
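For reference, recent versions of torchvision ship their own VGG16 implementation; a minimal sketch is below. We build the model from scratch in this post to see every layer explicitly.

import torchvision.models as models

tv_vgg16 = models.vgg16(weights=None)  # weights=None means random initialization
print(tv_vgg16)  # lists the 13 convolutional and 3 fully connected layers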

Model Implementation

Let's implement VGG16 in PyTorch.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

class VGG16(nn.Module):
    def __init__(self, input_channel, num_classes):
        super(VGG16, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(input_channel, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 3 * 3, 4096), nn.ReLU(True), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(True), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )

Note that the implementation is structured around two attributes:

  • features: contains all the convolutional and max-pooling layers

  • classifier: contains the fully connected layers that produce the class scores

Also note that we pass input_channel as an input argument. Set it to 3 for color images and to 1 for grayscale images.

Last but not least, the first fully connected layer is nn.Linear(512 * 3 * 3, 4096). The input dimension is 512 * 3 * 3 because we sized it for 96 * 96 input images: each of the five max-pooling layers halves the spatial dimensions, so 96 shrinks to 96 / 2^5 = 3 on each side. If we pass images of a different size, this value needs to change. For example, for 224 * 224 images, this layer becomes nn.Linear(512 * 7 * 7, 4096).
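As a quick sanity check, the flattened size can be computed directly; flattened_size below is a small hypothetical helper, not part of the model:

# Each of the five MaxPool2d layers halves the spatial dimensions,
# so a 96x96 input shrinks to 96 // 2**5 = 3 on each side.
def flattened_size(image_size, num_pools=5, channels=512):
    return channels * (image_size // 2 ** num_pools) ** 2

print(flattened_size(96))   # 4608 = 512 * 3 * 3
print(flattened_size(224))  # 25088 = 512 * 7 * 7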

Then we implement the *forward()* method (still inside the VGG16 class):

def forward(self, x):
  layer_outputs = []
  # pass the input through every feature layer, recording each intermediate output
  for i in range(len(self.features)):
    x = self.features[i](x)
    layer_outputs.append(x)

  # flatten the feature maps before the fully connected layers
  x = x.view(x.size(0), -1)

  for i in range(len(self.classifier)):
    x = self.classifier[i](x)
    layer_outputs.append(x)

  return x, layer_outputs

Now that the network is complete, let's pass a random tensor through it and watch how its shape changes as it goes through the layers.

vgg_model = VGG16(3, 10)
input_tensor = torch.rand(1, 3, 96, 96)
x, layer_outputs = vgg_model(input_tensor)
for l in layer_outputs:
  print(l.shape)

It prints the following shapes:

torch.Size([1, 64, 96, 96])
torch.Size([1, 64, 96, 96])
torch.Size([1, 64, 96, 96])
torch.Size([1, 64, 96, 96])
torch.Size([1, 64, 48, 48])
torch.Size([1, 128, 48, 48])
torch.Size([1, 128, 48, 48])
torch.Size([1, 128, 48, 48])
torch.Size([1, 128, 48, 48])
torch.Size([1, 128, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 24, 24])
torch.Size([1, 256, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 12, 12])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 6, 6])
torch.Size([1, 512, 3, 3])
torch.Size([1, 4096])
torch.Size([1, 4096])
torch.Size([1, 4096])
torch.Size([1, 4096])
torch.Size([1, 4096])
torch.Size([1, 4096])
torch.Size([1, 10])

So the final output is a 10-dimensional vector, one score per class, indicating which of the 10 classes the image belongs to.
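Strictly speaking, these scores are raw logits: the classifier ends with a linear layer, and no softmax is applied inside the model. If actual probabilities are needed, softmax can be applied to the output; a small sketch reusing the F alias and the x from the snippet above:

# Convert the logits from the previous snippet into a
# probability distribution over the 10 classes.
probs = F.softmax(x, dim=1)
print(probs.sum())  # tensor(1.) up to floating-point error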

Data Transformation: STL10

Now, let's train on the STL10 dataset [2, 3], which is licensed for commercial use. The dataset contains 5000 color images across 10 classes.

Each image is 96x96 pixels, and the 10 classes are as follows:

classes = ('airplane', 'bird', 'car', 'cat', 'deer', 'dog',\
           'horse', 'monkey', 'ship', 'truck')

Let's load the data and look at a few images:

transform = transforms.Compose([
    transforms.ToTensor()
])

trainset = torchvision.datasets.STL10(root = './data', split = 'train', download = True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=len(trainset))

classes = ('airplane', 'bird', 'car', 'cat', 'deer', 'dog',\
           'horse', 'monkey', 'ship', 'truck')

images, target = next(iter(trainloader))

np_images = images.numpy() # convert to numpy

# display one image
plt.imshow(np.transpose(np_images[0], (1, 2, 0)))
plt.title(f'class: {classes[target[0]]}')
plt.axis('off')
plt.show()

# display another image
plt.imshow(np.transpose(np_images[1], (1, 2, 0)))
plt.title(f'class: {classes[target[1]]}')
plt.axis('off')
plt.show()

It prints these images along with their labels:

Image by author

Next, let's normalize the data. To do so, we first compute the mean and standard deviation of each channel:

transform = transforms.Compose([
    transforms.ToTensor()
])

trainset = torchvision.datasets.STL10(root = './data', split = 'train', download = True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=len(trainset))

classes = ('airplane', 'bird', 'car', 'cat', 'deer', 'dog',\
           'horse', 'monkey', 'ship', 'truck')

images, target = next(iter(trainloader))

np_images = images.numpy() # convert to numpy. 

# calculate mean and std for each channel 
mean = np.mean(np_images, axis=(0,2,3)) 
std = np.std(np_images, axis=(0,2,3)) 

Note that in the trainloader we set batch_size = len(trainset), so the entire dataset is loaded at once to compute the mean and standard deviation. Later, when we train the model, we will load the data in mini-batches of 128 images.

As seen above, np_images has shape (5000, 3, 96, 96), i.e., 5000 color images of 96x96 pixels (note the 3 channels, indicating color images). The resulting mean and standard deviation are:

mean = [0.44671103, 0.43980882, 0.40664575]

std = [0.2603408, 0.25657743, 0.2712671]
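As a quick check (a sketch reusing the images batch from above), normalizing with these statistics should bring each channel close to zero mean and unit standard deviation:

# transforms.Normalize also works on a batched (N, C, H, W) tensor
normalize = transforms.Normalize(mean, std)
normalized = normalize(images)
print(normalized.mean(dim=(0, 2, 3)))  # approximately [0, 0, 0]
print(normalized.std(dim=(0, 2, 3)))   # approximately [1, 1, 1]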

We will use this mean and standard deviation to normalize both the training and the test data. Let's define the transforms for each split:

# train transformation
transform_train = transforms.Compose([
    transforms.RandomCrop(96, padding = 4), # we first pad by 4 pixels on each side then crop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.44671103, 0.43980882, 0.40664575), (0.2603408 , 0.25657743, 0.2712671))
])

# test transformation
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.44671103, 0.43980882, 0.40664575), (0.2603408 , 0.25657743, 0.2712671))
])

trainset = torchvision.datasets.STL10(root = './data', split = 'train', download = True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 128, shuffle = True, num_workers = 2)

testset = torchvision.datasets.STL10(root = './data', split = 'test', download = True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size = 256, shuffle = True, num_workers = 2)

In the transform defined on the training data above, you can see that we augment the data by padding each image by 4 pixels on each side, taking a random 96x96 crop, and randomly flipping it horizontally. We augment the data to increase diversity in the training set and force the model to learn better; the sketch below shows the randomness in action.
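Because the transform is stochastic, fetching the same underlying sample twice yields two different tensors; a tiny sketch using the trainset defined above:

# Two reads of the same training sample give different crops/flips.
img_a, label = trainset[0]
img_b, _ = trainset[0]
print(torch.equal(img_a, img_b))  # False (almost surely)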

Training the Model

We start by defining the hyperparameters, namely:

  • the learning rate

  • the learning-rate scheduler

  • the loss function: cross-entropy for classification

  • the optimizer

# instantiate the model
vgg_model = VGG16(input_channel=3, num_classes=10) # STL10 images are color, so 3 input channels
device = 'cuda' if torch.cuda.is_available() else 'cpu'
vgg_model = vgg_model.to(device)

# define hyper-parameters: learning rate, optimizer, scheduler
lr = 0.00001
criterion = nn.CrossEntropyLoss()
vgg_optimizer = optim.SGD(vgg_model.parameters(), lr = lr, momentum=0.9, weight_decay = 5e-4)
vgg_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(vgg_optimizer, T_max = 200)

Next, we define two functions:

Method 1: train_batch: iterates over all batches in the training data, trains the model, computes the loss, and updates the parameters. This method applies backpropagation and accumulates the training loss.

def train_batch(epoch, model, optimizer):
    print("epoch ", epoch)
    model.train()
    train_loss = 0
    correct = 0
    total = 0

    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs, _ = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
    print(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                         % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))

Method 2: the validate_batch function, in which we validate the model on batches from the test loader. Typically, we call this function after each training epoch to measure the model's performance at the end of that epoch. It computes the loss on the test set (i.e., unseen data) and does not perform any backpropagation.

def validate_batch(epoch, model):
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs,_ = model(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    print(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                 % (test_loss/(batch_idx+1), 100.*correct/total, correct, total)) 

Let the actual training begin…

For each epoch, we train the model and then check its performance on the test dataset. We call vgg_scheduler.step() to notify the scheduler to increment its internal counter and update the learning rate; a sketch for monitoring the rate follows the loop below.

start_epoch = 0
for epoch in range(start_epoch, start_epoch+20):
    train_batch(epoch, vgg_model, vgg_optimizer)
    validate_batch(epoch, vgg_model)
    vgg_scheduler.step()
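To monitor the schedule, the current learning rate can be printed inside the loop; get_last_lr() returns one value per parameter group. A variant of the loop above:

start_epoch = 0
for epoch in range(start_epoch, start_epoch+20):
    train_batch(epoch, vgg_model, vgg_optimizer)
    validate_batch(epoch, vgg_model)
    vgg_scheduler.step()
    # the cosine schedule decays the rate from 1e-5 toward 0 over T_max epochs
    print(f"lr after epoch {epoch}: {vgg_scheduler.get_last_lr()[0]:.2e}")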

We see the following performance:

epoch  0
390 391 Loss: 5.506 | Acc: 24.864% (12432/50000)
39 40 Loss: 4.512 | Acc: 49.780% (4978/10000)
epoch  1
390 391 Loss: 5.140 | Acc: 33.226% (16613/50000)
39 40 Loss: 4.156 | Acc: 57.120% (5712/10000)
epoch  2
390 391 Loss: 4.978 | Acc: 36.594% (18297/50000)
39 40 Loss: 3.953 | Acc: 60.450% (6045/10000)
epoch  3
390 391 Loss: 4.908 | Acc: 38.498% (19249/50000)
39 40 Loss: 3.898 | Acc: 69.430% (6943/10000)
epoch  4
390 391 Loss: 4.827 | Acc: 39.982% (19991/50000)
39 40 Loss: 3.631 | Acc: 68.240% (6824/10000)
epoch  5
390 391 Loss: 4.767 | Acc: 40.876% (20438/50000)
39 40 Loss: 3.677 | Acc: 71.260% (7126/10000)
epoch  6
390 391 Loss: 4.686 | Acc: 42.356% (21178/50000)
39 40 Loss: 3.180 | Acc: 73.560% (7356/10000)
epoch  7
390 391 Loss: 4.664 | Acc: 42.606% (21303/50000)
39 40 Loss: 3.259 | Acc: 76.920% (7692/10000)
epoch  8
390 391 Loss: 4.653 | Acc: 43.014% (21507/50000)
39 40 Loss: 3.118 | Acc: 77.150% (7715/10000)
epoch  9
390 391 Loss: 4.606 | Acc: 43.762% (21881/50000)
39 40 Loss: 2.961 | Acc: 75.850% (7585/10000)
epoch  10
390 391 Loss: 4.608 | Acc: 43.802% (21901/50000)
39 40 Loss: 2.840 | Acc: 81.130% (8113/10000)
epoch  11
390 391 Loss: 4.582 | Acc: 44.156% (22078/50000)
39 40 Loss: 2.878 | Acc: 80.810% (8081/10000)
....
...
..

We see that the model reaches 80.8% accuracy on the test set by epoch 11.

Next, let's look at 10 images along with the model's predictions for their labels:

model = vgg_model  # the VGG16 model we just trained

mean = [0.44671103, 0.43980882, 0.40664575]
std = [0.2603408 , 0.25657743, 0.2712671]

# Evaluate the model on random images and display results
for _ in range(10):
    # Get a random test image
    data, target = next(iter(testloader))

    # Get model's predictions
    output, _ = model(data.to(device))
    _, predicted = torch.max(output, 1)

    # Display the image along with predicted and actual labels
    # Unnormalize the image
    display_img = data[0]
    unnormalized_image = display_img.clone()  # Create a copy to avoid modifying the original tensor
    for i in range(3):
      unnormalized_image[i] = (unnormalized_image[i] * std[i]) + mean[i]
    plt.imshow(np.transpose(unnormalized_image.numpy(), (1, 2, 0)))
    plt.title(f'Predicted: {classes[predicted[0]]}, Actual: {classes[target[0]]}')
    plt.axis('off')
    plt.show()

For example, we see the following image of a bird, which the model correctly predicts as a bird.

Image by author

We then see an example of a wrong prediction, where the image is an airplane but VGG predicts it as a bird:

Image by author

This concludes our implementation of the VGG model. We saw that although VGG has a very deep architecture with many parameters, its implementation is quite straightforward thanks to the uniformity of the architecture.
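To get a sense of scale, the parameter count can be computed directly (a sketch using the vgg_model instantiated above):

# For this VGG16 the total is roughly 50 million trainable parameters;
# the first linear layer (4608 x 4096) alone contributes about 18.9M.
num_params = sum(p.numel() for p in vgg_model.parameters() if p.requires_grad)
print(f"{num_params:,} trainable parameters")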

So far, we have reviewed the concepts behind VGG and ResNet as well as the code for VGG. In the next post, we will look at an implementation of ResNet.

If you have any comments, questions, or suggestions, feel free to reach out to me:

Email: [email protected]

LinkedIn: www.linkedin.com/in/minaghashami/

References

  1. Very Deep Convolutional Networks for Large-Scale Image Recognition

  2. pytorch.org/vision/main/generated/torchvision.datasets.STL10.html

  3. cs.stanford.edu/~acoates/stl10/