Build-Your-Own PyTorch Image Classifier via Transfer Learning


Thanks for taking an interest in my project! This originally started as an Image Classifier project I worked on in my Udacity Nanodegree program (highly recommended if you have the time & money!). In this notebook, I'll walk you through how the image classifier works. It does not explain how my command-line application works; it simply walks you through how the classifier is trained and how it makes predictions. Please note...

  • I am not a deep learning expert, nor do I claim to be. This notebook is a way to help me understand the concepts I'm learning.
  • I am writing this under the assumption the reader has minimal knowledge of neural networks, but some knowledge of machine learning concepts.

In this project, we will train an image classifier to recognize different species of flowers, leveraging a pretrained Convolutional Neural Network and the PyTorch framework. You can imagine using something like this in a phone app that tells you the name of the flower your camera is looking at. In practice you'd train this classifier, then export it for use in your application. We'll be using this dataset of 102 flower categories; you can see a few examples below.

The project is broken down into multiple steps:

  • Load and preprocess the image dataset
  • Train the image classifier on your dataset
  • Use the trained classifier to predict image content

Data Preparation


Importing Libraries

In [1]:
%matplotlib inline

import matplotlib.pyplot as plt
from PIL import Image

import numpy as np

import torch
from torch import nn, optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

from collections import OrderedDict
import time,json
from workspace_utils import active_session

Importing Data

In order for the neural network (BTW, whenever I say "model," I mean neural network) to train properly, our images need to be organized into folders named after their class, inside separate training, validation, and testing folders. For example...

  • train
    • (Class Name to predict)
      • (Image File)
      • ...
    • (Class Name to predict)
    • ...
  • valid
  • test

If your images aren't already organized like this, it can easily be done with a package like split-folders, as sketched below.
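Here's a minimal sketch of that (this is an assumption on my part — any script that shuffles images into train/valid/test subfolders works just as well):

# Hypothetical example, not part of the original project (pip install split-folders)
import splitfolders

# Splits a folder of class subfolders into train/valid/test folders (80/10/10)
splitfolders.ratio('raw_flower_images', output='flowers', seed=42, ratio=(0.8, 0.1, 0.1))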

In [2]:
data_dir = 'flowers'
train_dir = data_dir + '/train'
valid_dir = data_dir + '/valid'
test_dir = data_dir + '/test'

Tensor Data Prep

We're going to use pre-trained networks that come with the torchvision library. These networks were trained on the ImageNet dataset. All we have to do is replace the fully-connected classifier and re-train that part on our flower images. But in order for us to use the pretrained networks, we need the data to be prepared in a certain way, which we'll do in the cell below.

There are 3 general requirements that our image data need to meet in order to work with the pre-trained networks...

  1. Resized to 224x224 pixels
  2. In Tensor form
    • In PyTorch, our data needs to be in the Tensor data type so we can load the images and pass them through our networks
    • Tensors can be viewed as NumPy arrays. There are deeper discussions of the difference between matrices and tensors that are way over my head, but for our purposes you can just think of tensors as an extension of matrices/arrays. Run the short snippet after this list to see what I mean.
  3. Normalized to means [0.485, 0.456, 0.406] & standard deviations [0.229, 0.224, 0.225]
    • To explain, image tensors in PyTorch have dimensions CxHxW (Color x Height x Width)
      • Each pixel is described by either one number (for grayscale) or 3 numbers (red, green, blue for color)
        • Each of these numbers needs to be normalized to these means and standard deviations because this is how the ImageNet images were normalized. Otherwise, we'd be comparing apples to oranges.
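
Here's that snippet as a runnable cell — a NumPy array goes straight into a PyTorch tensor:

array = np.array([1, 2, 3])
tensor = torch.from_numpy(array)
print(tensor)    # prints: tensor([1, 2, 3])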

The cell below 1) defines transformations for our data, 2) creates Torch Dataset objects using ImageFolder, and 3) creates DataLoader objects that let us work with our data in batches.

In [3]:
# Transformations literally "transform" our images into tensors that work with our model
# We want our training transformations to have some wacky randomness to them (e.g., RandomHorizontalFlip)
    # This helps our model generalize to new images.
data_transforms_train = transforms.Compose([transforms.Resize(224),
                                        transforms.RandomHorizontalFlip(),
                                        transforms.RandomRotation(30),
                                        transforms.RandomResizedCrop(224),
                                        transforms.ToTensor(),
                                        transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

# We don't want to mess with the images we're using for validation, 
    # so this is a more basic transform with no random mutations
data_transforms_test = transforms.Compose([transforms.Resize(224),
                                        transforms.CenterCrop(224),
                                        transforms.ToTensor(),
                                        transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

# Creating DataSet objects that "hold" our images
image_datasets_train = datasets.ImageFolder(train_dir,data_transforms_train)
image_datasets_valid = datasets.ImageFolder(valid_dir,data_transforms_test)
image_datasets_test = datasets.ImageFolder(test_dir,data_transforms_test)

# Creating DataLoader objects that let us work with our images in batches
train_dataloader = torch.utils.data.DataLoader(image_datasets_train,batch_size=64,shuffle=True)
valid_dataloader = torch.utils.data.DataLoader(image_datasets_valid,batch_size=64,shuffle=True)
test_dataloader = torch.utils.data.DataLoader(image_datasets_test,batch_size=64,shuffle=True)

# ImageFolder automatically maps the name of the image folders holding our data to the index that our model will predict
    # If the first folder with images is "1", our model will predict "0" and we can use this mapping to map "0" to "1"
class_to_idx = image_datasets_train.class_to_idx
    # More on this later...

Label mapping

Our flower image folders aren't named with the actual flower names; they're numbered instead. This json file has the mapping from folder labels to flower names. It was confusing for me to keep track of all the different labels when I worked on this project, but in sum, this is the flow...

  1. The model predicts by giving us an index (the idx, between 0 and 101).
  2. This idx corresponds to one of the folder labels (between 1 and 102).
  3. The folder labels (between 1 and 102) correspond to flower names, which this json file helps us map. (See the short illustration at the end of this section.)

In [4]:
file = 'cat_to_name.json'
In [5]:
with open(file, 'r') as f:
    cat_to_name = json.load(f)
In [6]:
no_output_categories = len(cat_to_name)
In [7]:
cat_to_name
Out[7]:
{'21': 'fire lily',
 '3': 'canterbury bells',
 '45': 'bolero deep blue',
 '1': 'pink primrose',
 '34': 'mexican aster',
 '27': 'prince of wales feathers',
 '7': 'moon orchid',
 '16': 'globe-flower',
 '25': 'grape hyacinth',
 '26': 'corn poppy',
 '79': 'toad lily',
 '39': 'siam tulip',
 '24': 'red ginger',
 '67': 'spring crocus',
 '35': 'alpine sea holly',
 '32': 'garden phlox',
 '10': 'globe thistle',
 '6': 'tiger lily',
 '93': 'ball moss',
 '33': 'love in the mist',
 '9': 'monkshood',
 '102': 'blackberry lily',
 '14': 'spear thistle',
 '19': 'balloon flower',
 '100': 'blanket flower',
 '13': 'king protea',
 '49': 'oxeye daisy',
 '15': 'yellow iris',
 '61': 'cautleya spicata',
 '31': 'carnation',
 '64': 'silverbush',
 '68': 'bearded iris',
 '63': 'black-eyed susan',
 '69': 'windflower',
 '62': 'japanese anemone',
 '20': 'giant white arum lily',
 '38': 'great masterwort',
 '4': 'sweet pea',
 '86': 'tree mallow',
 '101': 'trumpet creeper',
 '42': 'daffodil',
 '22': 'pincushion flower',
 '2': 'hard-leaved pocket orchid',
 '54': 'sunflower',
 '66': 'osteospermum',
 '70': 'tree poppy',
 '85': 'desert-rose',
 '99': 'bromelia',
 '87': 'magnolia',
 '5': 'english marigold',
 '92': 'bee balm',
 '28': 'stemless gentian',
 '97': 'mallow',
 '57': 'gaura',
 '40': 'lenten rose',
 '47': 'marigold',
 '59': 'orange dahlia',
 '48': 'buttercup',
 '55': 'pelargonium',
 '36': 'ruby-lipped cattleya',
 '91': 'hippeastrum',
 '29': 'artichoke',
 '71': 'gazania',
 '90': 'canna lily',
 '18': 'peruvian lily',
 '98': 'mexican petunia',
 '8': 'bird of paradise',
 '30': 'sweet william',
 '17': 'purple coneflower',
 '52': 'wild pansy',
 '84': 'columbine',
 '12': "colt's foot",
 '11': 'snapdragon',
 '96': 'camellia',
 '23': 'fritillary',
 '50': 'common dandelion',
 '44': 'poinsettia',
 '53': 'primula',
 '72': 'azalea',
 '65': 'californian poppy',
 '80': 'anthurium',
 '76': 'morning glory',
 '37': 'cape flower',
 '56': 'bishop of llandaff',
 '60': 'pink-yellow dahlia',
 '82': 'clematis',
 '58': 'geranium',
 '75': 'thorn apple',
 '41': 'barbeton daisy',
 '95': 'bougainvillea',
 '43': 'sword lily',
 '83': 'hibiscus',
 '78': 'lotus lotus',
 '88': 'cyclamen',
 '94': 'foxglove',
 '81': 'frangipani',
 '74': 'rose',
 '89': 'watercress',
 '73': 'water lily',
 '46': 'wallflower',
 '77': 'passion flower',
 '51': 'petunia'}
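
To make that flow concrete, here's a tiny illustration of the full chain from model index to flower name (the specific values are just examples):

# Illustration only: chaining the mappings we now have in memory
idx_to_class = {v: k for k, v in class_to_idx.items()}  # model index -> folder label
folder_label = idx_to_class[0]                           # e.g. '1'
flower_name = cat_to_name[folder_label]                  # e.g. 'pink primrose'
print(flower_name)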

Building & Training the Classifier


Below we will import one of the pre-trained networks that come with torchvision. We can leverage the model's convolutional layer weights and retrain only the fully connected layers to classify our images. I've selected VGG16_bn (bn standing for batch normalization). VGG16_bn is a highly accurate convolutional network, but it is slow to train.

Loading the Pre-Trained Model & Preparing the Classifier

In [8]:
# Defining number of hidden units in our fully connected layer
hidden_units = 4096
In [9]:
# Downloading the pretrained model for us to use
model = models.vgg16_bn(pretrained=True)
In [10]:
# Freezing the feature parameters so they stay static (the convolutional layers)
    # Leveraging the feature parameters that were trained on ImageNet
for param in model.parameters():
    param.requires_grad = False
In [11]:
# This is the old classifier that came with vgg16_bn
model.classifier
Out[11]:
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace)
  (2): Dropout(p=0.5)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace)
  (5): Dropout(p=0.5)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)
In [12]:
# Defining the fully connected layer that will be trained on the flower images
classifier = nn.Sequential(OrderedDict([
                            ('fc1', nn.Linear(25088,hidden_units)),
                            ('relu', nn.ReLU()),
                            ('dropout', nn.Dropout(0.5)),
                            ('fc2', nn.Linear(hidden_units,no_output_categories)),
                            ('output', nn.LogSoftmax(dim=1))
                            ]))
model.classifier = classifier
In [13]:
model.classifier
Out[13]:
Sequential(
  (fc1): Linear(in_features=25088, out_features=4096, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.5)
  (fc2): Linear(in_features=4096, out_features=102, bias=True)
  (output): LogSoftmax()
)

Training & Testing the Network

If a GPU is available, we will use it. The GPU (Graphics Processing Unit) was built to run many matrix operations in parallel, which makes training a deep neural network VERY QUICK. Otherwise, the CPU will take forever (perhaps around 100x slower, depending on the GPU). The cell below assigns the device variable to 'cuda' if a GPU is available, or to 'cpu' otherwise. We will move our model and our data over to the device of choice manually using model.to(device) or data.to(device).

In [14]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
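
As a quick illustration (not part of the original notebook) of what moving things "to the device" looks like:

# Illustration only -- moving a tensor and the model to the chosen device
x = torch.randn(2, 3)   # created on the CPU by default
x = x.to(device)        # now on the GPU if one is available; a no-op on CPU
model.to(device)        # moves all of the model's weights to the same device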

Below is a heapload of code. Feel free to read through it if you want, but if you want the TL;DR version of this code, here it is...

  1. Sets hyperparameters for training (e.g., epochs, learning rate, etc.).
  2. Uses active_session() (a function provided by Udacity) to make sure the GPU-enabled VM I used doesn't go to sleep on me while I'm training.
  3. Loops through epochs
    • One batch is 64 images. The model reports progress every 20 batches (as defined by print_every)
    • After each set of 20 batches, we check our model's progress on the validation data
      • Then we print our training and validation metrics (skip ahead below to see the metrics).
In [66]:
model.to(device)

# Setting training hyperparameters
epochs = 10
optimizer = optim.Adam(model.classifier.parameters(),lr=.001)
criterion = nn.NLLLoss()
print_every = 20
running_loss = running_accuracy = 0
validation_losses, training_losses = [],[]

with active_session():
    for e in range(epochs):
        batches = 0
        # Turning on training mode
        model.train()
        for images,labels in train_dataloader:
            start = time.time()
            batches += 1
            # Moving images & labels to the GPU
            images,labels = images.to(device),labels.to(device)
            # Pushing batch through network, calculating loss & gradient, and updating weights
            log_ps = model.forward(images)
            loss = criterion(log_ps,labels)
            loss.backward()
            optimizer.step()
            # Calculating metrics
            ps = torch.exp(log_ps)
            top_ps, top_class = ps.topk(1,dim=1)
            matches = (top_class == labels.view(*top_class.shape)).type(torch.FloatTensor)
            accuracy = matches.mean()
            # Resetting optimizer gradient & tracking metrics
            optimizer.zero_grad()
            running_loss += loss.item()
            running_accuracy += accuracy.item()
            # Running the model on the validation set every print_every (20) batches
            if batches%print_every == 0:
                end = time.time()
                training_time = end-start
                start = time.time()
                # Setting metrics
                validation_loss = 0
                validation_accuracy = 0
                # Turning on evaluation mode & turning off calculation of gradients
                model.eval()
                with torch.no_grad():
                    for images,labels in valid_dataloader:
                        images,labels = images.to(device),labels.to(device)
                        log_ps = model.forward(images)
                        loss = criterion(log_ps,labels)
                        ps = torch.exp(log_ps)
                        top_ps, top_class = ps.topk(1,dim=1)
                        matches = (top_class == \
                                   labels.view(*top_class.shape)).type(torch.FloatTensor)
                        accuracy = matches.mean()
                        # Tracking validation metrics
                        validation_loss += loss.item()
                        validation_accuracy += accuracy.item()
                
                # Tracking metrics
                end = time.time()
                validation_time = end-start
                training_losses.append(running_loss/print_every)
                validation_losses.append(validation_loss/len(valid_dataloader))
                
                # Printing Results
                print(f'Epoch {e+1}/{epochs} | Batch {batches}')
                print(f'Running Training Loss: {running_loss/print_every:.3f}')
                print(f'Running Training Accuracy: {running_accuracy/print_every*100:.2f}%')
                print(f'Validation Loss: {validation_loss/len(valid_dataloader):.3f}')
                print(f'Validation Accuracy: {validation_accuracy/len(valid_dataloader)*100:.2f}%')
                print(f'Training Time: {training_time:.3f} seconds for {print_every} batches.')
                print(f'Validation Time: {validation_time:.3f} seconds.\n')

                # Resetting metrics & turning on training mode
                running_loss = running_accuracy = 0
                model.train()
Epoch 1/10 | Batch 20
Running Training Loss: 4.631
Running Training Accuracy: 6.80%
Validation Loss: 3.704
Validation Accuracy: 20.38%
Training Time: 1.081 seconds for 20 batches.
Validation Time: 19.859 seconds.

Epoch 1/10 | Batch 40
Running Training Loss: 3.282
Running Training Accuracy: 25.70%
Validation Loss: 2.395
Validation Accuracy: 38.41%
Training Time: 1.108 seconds for 20 batches.
Validation Time: 19.830 seconds.

Epoch 1/10 | Batch 60
Running Training Loss: 2.426
Running Training Accuracy: 37.58%
Validation Loss: 1.739
Validation Accuracy: 52.45%
Training Time: 1.070 seconds for 20 batches.
Validation Time: 19.818 seconds.

Epoch 1/10 | Batch 80
Running Training Loss: 1.995
Running Training Accuracy: 47.27%
Validation Loss: 1.586
Validation Accuracy: 54.03%
Training Time: 1.064 seconds for 20 batches.
Validation Time: 19.888 seconds.

Epoch 1/10 | Batch 100
Running Training Loss: 1.749
Running Training Accuracy: 53.36%
Validation Loss: 1.313
Validation Accuracy: 62.80%
Training Time: 1.099 seconds for 20 batches.
Validation Time: 19.846 seconds.

Epoch 2/10 | Batch 20
Running Training Loss: 1.890
Running Training Accuracy: 63.96%
Validation Loss: 1.093
Validation Accuracy: 68.50%
Training Time: 1.067 seconds for 20 batches.
Validation Time: 19.868 seconds.

Epoch 2/10 | Batch 40
Running Training Loss: 1.498
Running Training Accuracy: 60.39%
Validation Loss: 1.170
Validation Accuracy: 67.84%
Training Time: 1.081 seconds for 20 batches.
Validation Time: 19.867 seconds.

Epoch 2/10 | Batch 60
Running Training Loss: 1.434
Running Training Accuracy: 60.86%
Validation Loss: 1.001
Validation Accuracy: 70.48%
Training Time: 1.104 seconds for 20 batches.
Validation Time: 19.819 seconds.

Epoch 2/10 | Batch 80
Running Training Loss: 1.298
Running Training Accuracy: 63.83%
Validation Loss: 0.949
Validation Accuracy: 73.51%
Training Time: 1.059 seconds for 20 batches.
Validation Time: 20.166 seconds.

Epoch 2/10 | Batch 100
Running Training Loss: 1.182
Running Training Accuracy: 69.06%
Validation Loss: 0.866
Validation Accuracy: 74.77%
Training Time: 1.059 seconds for 20 batches.
Validation Time: 19.839 seconds.

Epoch 3/10 | Batch 20
Running Training Loss: 1.335
Running Training Accuracy: 79.09%
Validation Loss: 0.943
Validation Accuracy: 75.25%
Training Time: 1.067 seconds for 20 batches.
Validation Time: 19.790 seconds.

Epoch 3/10 | Batch 40
Running Training Loss: 1.157
Running Training Accuracy: 68.98%
Validation Loss: 1.203
Validation Accuracy: 68.85%
Training Time: 1.071 seconds for 20 batches.
Validation Time: 19.795 seconds.

Epoch 3/10 | Batch 60
Running Training Loss: 1.170
Running Training Accuracy: 67.81%
Validation Loss: 0.859
Validation Accuracy: 76.16%
Training Time: 1.080 seconds for 20 batches.
Validation Time: 19.801 seconds.

Epoch 3/10 | Batch 80
Running Training Loss: 1.189
Running Training Accuracy: 67.81%
Validation Loss: 0.779
Validation Accuracy: 78.99%
Training Time: 1.103 seconds for 20 batches.
Validation Time: 19.823 seconds.

Epoch 3/10 | Batch 100
Running Training Loss: 1.168
Running Training Accuracy: 67.89%
Validation Loss: 0.789
Validation Accuracy: 78.41%
Training Time: 1.066 seconds for 20 batches.
Validation Time: 19.735 seconds.

Epoch 4/10 | Batch 20
Running Training Loss: 1.166
Running Training Accuracy: 83.39%
Validation Loss: 0.787
Validation Accuracy: 79.58%
Training Time: 1.079 seconds for 20 batches.
Validation Time: 19.825 seconds.

Epoch 4/10 | Batch 40
Running Training Loss: 0.968
Running Training Accuracy: 72.89%
Validation Loss: 0.804
Validation Accuracy: 77.89%
Training Time: 1.063 seconds for 20 batches.
Validation Time: 19.745 seconds.

Epoch 4/10 | Batch 60
Running Training Loss: 1.022
Running Training Accuracy: 73.67%
Validation Loss: 0.836
Validation Accuracy: 77.69%
Training Time: 1.060 seconds for 20 batches.
Validation Time: 19.753 seconds.

Epoch 4/10 | Batch 80
Running Training Loss: 1.044
Running Training Accuracy: 71.41%
Validation Loss: 0.732
Validation Accuracy: 77.86%
Training Time: 1.077 seconds for 20 batches.
Validation Time: 19.721 seconds.

Epoch 4/10 | Batch 100
Running Training Loss: 1.098
Running Training Accuracy: 71.33%
Validation Loss: 0.821
Validation Accuracy: 76.88%
Training Time: 1.092 seconds for 20 batches.
Validation Time: 19.808 seconds.

Epoch 5/10 | Batch 20
Running Training Loss: 1.169
Running Training Accuracy: 82.53%
Validation Loss: 0.732
Validation Accuracy: 79.25%
Training Time: 1.096 seconds for 20 batches.
Validation Time: 19.818 seconds.

Epoch 5/10 | Batch 40
Running Training Loss: 0.986
Running Training Accuracy: 73.44%
Validation Loss: 0.597
Validation Accuracy: 83.28%
Training Time: 1.085 seconds for 20 batches.
Validation Time: 19.991 seconds.

Epoch 5/10 | Batch 60
Running Training Loss: 0.979
Running Training Accuracy: 73.36%
Validation Loss: 0.717
Validation Accuracy: 80.45%
Training Time: 1.065 seconds for 20 batches.
Validation Time: 19.680 seconds.

Epoch 5/10 | Batch 80
Running Training Loss: 1.034
Running Training Accuracy: 72.66%
Validation Loss: 0.697
Validation Accuracy: 79.93%
Training Time: 1.062 seconds for 20 batches.
Validation Time: 19.769 seconds.

Epoch 5/10 | Batch 100
Running Training Loss: 0.973
Running Training Accuracy: 72.50%
Validation Loss: 0.637
Validation Accuracy: 82.01%
Training Time: 1.082 seconds for 20 batches.
Validation Time: 19.773 seconds.

Epoch 6/10 | Batch 20
Running Training Loss: 1.070
Running Training Accuracy: 87.32%
Validation Loss: 0.604
Validation Accuracy: 81.75%
Training Time: 1.064 seconds for 20 batches.
Validation Time: 19.710 seconds.

Epoch 6/10 | Batch 40
Running Training Loss: 0.924
Running Training Accuracy: 73.44%
Validation Loss: 0.665
Validation Accuracy: 81.98%
Training Time: 1.111 seconds for 20 batches.
Validation Time: 19.761 seconds.

Epoch 6/10 | Batch 60
Running Training Loss: 0.944
Running Training Accuracy: 75.00%
Validation Loss: 0.662
Validation Accuracy: 82.60%
Training Time: 1.083 seconds for 20 batches.
Validation Time: 19.760 seconds.

Epoch 6/10 | Batch 80
Running Training Loss: 1.078
Running Training Accuracy: 70.86%
Validation Loss: 0.701
Validation Accuracy: 81.05%
Training Time: 1.069 seconds for 20 batches.
Validation Time: 19.756 seconds.

Epoch 6/10 | Batch 100
Running Training Loss: 0.923
Running Training Accuracy: 74.84%
Validation Loss: 0.720
Validation Accuracy: 81.93%
Training Time: 1.067 seconds for 20 batches.
Validation Time: 19.745 seconds.

Epoch 7/10 | Batch 20
Running Training Loss: 0.997
Running Training Accuracy: 88.75%
Validation Loss: 0.767
Validation Accuracy: 80.61%
Training Time: 1.061 seconds for 20 batches.
Validation Time: 20.044 seconds.

Epoch 7/10 | Batch 40
Running Training Loss: 0.930
Running Training Accuracy: 75.08%
Validation Loss: 0.645
Validation Accuracy: 82.08%
Training Time: 1.064 seconds for 20 batches.
Validation Time: 19.814 seconds.

Epoch 7/10 | Batch 60
Running Training Loss: 0.861
Running Training Accuracy: 76.80%
Validation Loss: 0.682
Validation Accuracy: 82.33%
Training Time: 1.109 seconds for 20 batches.
Validation Time: 19.869 seconds.

Epoch 7/10 | Batch 80
Running Training Loss: 0.853
Running Training Accuracy: 75.00%
Validation Loss: 0.637
Validation Accuracy: 83.85%
Training Time: 1.085 seconds for 20 batches.
Validation Time: 19.879 seconds.

Epoch 7/10 | Batch 100
Running Training Loss: 0.915
Running Training Accuracy: 76.56%
Validation Loss: 0.606
Validation Accuracy: 83.52%
Training Time: 1.068 seconds for 20 batches.
Validation Time: 19.862 seconds.

Epoch 8/10 | Batch 20
Running Training Loss: 0.992
Running Training Accuracy: 87.55%
Validation Loss: 0.789
Validation Accuracy: 79.64%
Training Time: 1.106 seconds for 20 batches.
Validation Time: 19.859 seconds.

Epoch 8/10 | Batch 40
Running Training Loss: 0.890
Running Training Accuracy: 76.09%
Validation Loss: 0.567
Validation Accuracy: 84.76%
Training Time: 1.069 seconds for 20 batches.
Validation Time: 19.849 seconds.

Epoch 8/10 | Batch 60
Running Training Loss: 0.770
Running Training Accuracy: 77.81%
Validation Loss: 0.727
Validation Accuracy: 81.45%
Training Time: 1.073 seconds for 20 batches.
Validation Time: 19.784 seconds.

Epoch 8/10 | Batch 80
Running Training Loss: 0.816
Running Training Accuracy: 78.12%
Validation Loss: 0.686
Validation Accuracy: 82.25%
Training Time: 1.100 seconds for 20 batches.
Validation Time: 19.810 seconds.

Epoch 8/10 | Batch 100
Running Training Loss: 0.842
Running Training Accuracy: 77.58%
Validation Loss: 0.800
Validation Accuracy: 80.59%
Training Time: 1.095 seconds for 20 batches.
Validation Time: 19.901 seconds.

Epoch 9/10 | Batch 20
Running Training Loss: 1.006
Running Training Accuracy: 88.62%
Validation Loss: 0.687
Validation Accuracy: 83.11%
Training Time: 1.081 seconds for 20 batches.
Validation Time: 19.853 seconds.

Epoch 9/10 | Batch 40
Running Training Loss: 0.885
Running Training Accuracy: 76.09%
Validation Loss: 0.764
Validation Accuracy: 81.00%
Training Time: 1.099 seconds for 20 batches.
Validation Time: 19.810 seconds.

Epoch 9/10 | Batch 60
Running Training Loss: 0.851
Running Training Accuracy: 78.83%
Validation Loss: 0.643
Validation Accuracy: 85.50%
Training Time: 1.076 seconds for 20 batches.
Validation Time: 19.783 seconds.

Epoch 9/10 | Batch 80
Running Training Loss: 0.870
Running Training Accuracy: 76.48%
Validation Loss: 0.653
Validation Accuracy: 83.10%
Training Time: 1.067 seconds for 20 batches.
Validation Time: 19.938 seconds.

Epoch 9/10 | Batch 100
Running Training Loss: 0.823
Running Training Accuracy: 77.58%
Validation Loss: 0.662
Validation Accuracy: 83.06%
Training Time: 1.093 seconds for 20 batches.
Validation Time: 19.825 seconds.

Epoch 10/10 | Batch 20
Running Training Loss: 1.050
Running Training Accuracy: 86.72%
Validation Loss: 0.746
Validation Accuracy: 82.63%
Training Time: 1.077 seconds for 20 batches.
Validation Time: 19.861 seconds.

Epoch 10/10 | Batch 40
Running Training Loss: 0.875
Running Training Accuracy: 76.41%
Validation Loss: 0.551
Validation Accuracy: 85.19%
Training Time: 1.086 seconds for 20 batches.
Validation Time: 19.754 seconds.

Epoch 10/10 | Batch 60
Running Training Loss: 0.743
Running Training Accuracy: 80.00%
Validation Loss: 0.657
Validation Accuracy: 83.95%
Training Time: 1.101 seconds for 20 batches.
Validation Time: 19.812 seconds.

Epoch 10/10 | Batch 80
Running Training Loss: 0.880
Running Training Accuracy: 78.59%
Validation Loss: 0.618
Validation Accuracy: 84.07%
Training Time: 1.065 seconds for 20 batches.
Validation Time: 19.797 seconds.

Epoch 10/10 | Batch 100
Running Training Loss: 0.795
Running Training Accuracy: 79.38%
Validation Loss: 0.631
Validation Accuracy: 83.81%
Training Time: 1.074 seconds for 20 batches.
Validation Time: 19.874 seconds.

And just for fun, here's a plot of how our training and validation loss looked throughout training.

In [67]:
plt.figure()
plt.title("Training Summary")
plt.plot(training_losses,label='Training Loss')
plt.plot(validation_losses,label='Validation Loss')
plt.legend()
plt.show()
Out[67]:
<matplotlib.legend.Legend at 0x7f3740cc75f8>

Testing The Network

We set aside test data that our model has never seen, so we can check how the trained model performs on new images.

In [36]:
test_accuracy = 0
# Evaluation mode & no gradient tracking -- we're only measuring accuracy here
model.eval()
with torch.no_grad():
    for images,labels in test_dataloader:
        images,labels = images.to(device),labels.to(device)
        log_ps = model.forward(images)
        ps = torch.exp(log_ps)
        top_ps,top_class = ps.topk(1,dim=1)
        matches = (top_class == labels.view(*top_class.shape)).type(torch.FloatTensor)
        accuracy = matches.mean()
        test_accuracy += accuracy
print(f'Model Test Accuracy: {test_accuracy/len(test_dataloader)*100:.2f}%')
Model Test Accuracy: 82.39%

The little guy did pretty well! Solid B-. Much better than I could do on flower images.

Saving Our Model


PyTorch allows us to save a "checkpoint" that essentially acts as a snapshot of our model. To do this we save the state_dict in the checkpoint: the state dict contains the weights our model learned during training, so we can use the model to predict or test without having to train it again. Note that you could also use a checkpoint to pause training and resume it later by saving your optimizer and scheduler (if learning-rate decay was used) states as well.
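
As a rough sketch of what such a resumable checkpoint could look like (this is an assumption on my part, not what the project's save_model function below does):

# Hypothetical resumable checkpoint: stores the optimizer state alongside the model
resume_point = {'epoch': epochs,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict()}
torch.save(resume_point, 'resume_checkpoint.pth')

# Later, to pick up where we left off:
# resume_point = torch.load('resume_checkpoint.pth')
# model.load_state_dict(resume_point['model_state_dict'])
# optimizer.load_state_dict(resume_point['optimizer_state_dict'])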

In [15]:
# Setting destination directory to None saves the checkpoint in the current directory
dest_dir = None
In [16]:
def save_model(trained_model,hidden_units,output_units,dest_dir,model_arch,class_to_idx):
    '''
    This saves the model's state_dict, as well as a few other details that will help with loading the model for later use.
    '''
    model_checkpoint = {'model_arch':model_arch, 
                    'clf_input':25088,
                    'clf_output':output_units,
                    'clf_hidden':hidden_units,
                    'state_dict':trained_model.state_dict(),
                    'model_class_to_idx':class_to_idx,
                    }
    if dest_dir:
        torch.save(model_checkpoint,dest_dir+"/"+model_arch+"_checkpoint.pth")
        print(f"{model_arch} successfully saved to {dest_dir}")
    else:
        torch.save(model_checkpoint,model_arch+"_checkpoint.pth")
        print(f"{model_arch} successfully saved to current directory as {model_arch}_checkpoint.pth")
In [25]:
save_model(model,hidden_units,no_output_categories,dest_dir,'vgg16_bn',class_to_idx)
vgg16_bn successfully saved to current directory as vgg16_bn_checkpoint.pth

Loading the Checkpoint

In [26]:
checkpoint = 'vgg16_bn_checkpoint.pth'
In [27]:
def load_checkpoint(filepath,device):
    '''
    Inputs...
        filepath: location of the checkpoint file
        device: the torch.device we defined earlier ('cuda:0' or 'cpu')
    
    Returns:
        Model architecture, No. input units, No. output units, No. hidden units,
        state_dict, and the class-to-index mapping
    '''
    # Load directly onto the GPU if that's the device we're using, otherwise onto the CPU
    if device.type == "cuda":
        map_location = lambda storage, loc: storage.cuda()
    else:
        map_location = 'cpu'
    checkpoint = torch.load(f=filepath,map_location=map_location)
    return checkpoint['model_arch'],checkpoint['clf_input'], checkpoint['clf_output'], checkpoint['clf_hidden'],checkpoint['state_dict'],checkpoint['model_class_to_idx']
In [28]:
model_arch,input_units, output_units, hidden_units, state_dict, class_to_idx = load_checkpoint(checkpoint,device)
In [29]:
model.load_state_dict(state_dict)
Out[29]:
IncompatibleKeys(missing_keys=[], unexpected_keys=[])
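
In this notebook the model object is still in memory, so load_state_dict is all we need. In a fresh session you would have to rebuild the model first — here's a minimal sketch of that, assuming a vgg16_bn checkpoint like the one we saved:

# Sketch only: rebuilding the model in a new session from the checkpoint values above
model = models.vgg16_bn(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.classifier = nn.Sequential(OrderedDict([
                        ('fc1', nn.Linear(input_units, hidden_units)),
                        ('relu', nn.ReLU()),
                        ('dropout', nn.Dropout(0.5)),
                        ('fc2', nn.Linear(hidden_units, output_units)),
                        ('output', nn.LogSoftmax(dim=1))]))
model.load_state_dict(state_dict)
model.class_to_idx = class_to_idx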

Retesting Loaded Model

Just for fun :).

In [9]:
test_accuracy = 0
for images,labels in test_dataloader:
    model.eval()
    model.to(device)
    images,labels = images.to(device),labels.to(device)
    log_ps = model.forward(images)
    ps = torch.exp(log_ps)
    top_ps,top_class = ps.topk(1,dim=1)
    matches = (top_class == labels.view(*top_class.shape)).type(torch.FloatTensor)
    accuracy = matches.mean()
    test_accuracy += accuracy
    model.train()
print(f'Model Test Accuracy: {test_accuracy/len(test_dataloader)*100:.2f}%')
Model Test Accuracy: 85.54%

Inference/Prediction


Below we will use our model to predict/infer classes from whatever images we want.

Image Preprocessing


First step: preprocessing the images. Remember when we defined the data transformations for our training/testing data? We have to do the same thing here, but on an image-by-image basis. We will use functions to preprocess each image and predict its classes. That way, all we have to do to classify new images is tell our functions where the image files are, and the data will be preprocessed and passed through our network for us.

We're using PIL to load images. Color channels are typically encoded as integers 0-255, but the model expects floats 0-1, normalized to the same means and standard deviations as before. We're going to convert the values to NumPy arrays that we can work with and manipulate, then convert them to tensors that can be passed into our network.

In [30]:
# This is the location of the image file that we're going to work with
practice_img = './flowers/test/99/image_07833.jpg'
In [23]:
def process_image(image):
    ''' 
    Scales, crops, and normalizes a PIL image for a PyTorch model,

    Input: filepath to image

    Returns: NumPy Array of the image to be passed into the predict function
    '''
    # Open image
    im = Image.open(image).convert('RGB')
    # Resize keeping aspect ratio
    im.thumbnail(size=(256,256))
    # Get dimensions
    width, height = im.size
    # Set new dimensions for center crop
    new_width,new_height = 224,224 
    left = (width - new_width)/2
    top = (height - new_height)/2
    right = (width + new_width)/2
    bottom = (height + new_height)/2
    im = im.crop((left, top, right, bottom))
    # Convert to tensor & normalize
    transf_tens = transforms.ToTensor()
    transf_norm = transforms.Normalize([0.485, 0.456, 0.406],[0.229, 0.224, 0.225])
    tensor = transf_norm(transf_tens(im))
    # Convert to numpy array
    np_im = np.array(tensor)
    return np_im

Udacity provided the imshow function below to help me check my work. This function converts a processed image (the array returned by process_image) back into something matplotlib can display in the notebook. If my process_image function works, running its output through this function should return the original image (except for the cropped-out portions).

In [24]:
def imshow(image, ax=None, title=None):
    """Imshow for Tensor."""
    if ax is None:
        fig, ax = plt.subplots()
        plt.tick_params(
            axis='both',          
            which='both',     
            bottom=False,      
            top=False,
            left=False,         
            labelbottom=False,
            labelleft=False,)
    # We need to move the color channel from the first dimension to the third dimension.
        # PyTorch expects color to be in the 1st dim, but matplotlib expects it in the 3rd!
    image = image.transpose((1, 2, 0))
    
    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean
    # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
    image = np.clip(image, 0, 1)
    
    ax.imshow(image)
    
    return ax

Original Image...

In [25]:
im = Image.open(practice_img)
im
Out[25]:

Preprocessing the image using process_image then undoing the preprocessing with imshow.

In [26]:
imshow(process_image(practice_img))
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x13f1ffd68>

Class Prediction

Now we're going to use a function (class_to_label) that maps the folder labels back to flower names; this is what the json file from the beginning is used for. Next, we'll write a predict function that outputs our model's top-k predictions (in this example, we're having it predict the top 5 possible classes).

In [27]:
def class_to_label(file,classes):
    '''
    Takes a JSON file containing the mapping from class to label and converts it into a dict.
    '''
    with open(file, 'r') as f:
        class_mapping =  json.load(f)
    labels = []
    for c in classes:
        labels.append(class_mapping[c])
    return labels
In [28]:
# The ImageFolder dataset that we used for training in the very beginning has a class_to_idx attribute that
    # Helps us map our index predictions (0-101) to the folder labels (1-102).
        # Note we're using the class_to_label function in the cell above to map folder labels (1-102) to flower names
index_mapping = dict(map(reversed, class_to_idx.items()))
In [29]:
def predict(image_path, model,index_mapping, topk, device):
    ''' Predict the class (or classes) of an image using a trained deep learning model.
    - index_mapping is the dictionary mapping the model's output indices to folder labels (classes)
    '''
    pre_processed_image = torch.from_numpy(process_image(image_path))
    pre_processed_image = torch.unsqueeze(pre_processed_image,0).to(device).float()
    model.to(device)
    model.eval()
    log_ps = model.forward(pre_processed_image)
    ps = torch.exp(log_ps)
    top_ps,top_idx = ps.topk(topk,dim=1)
    list_ps = top_ps.tolist()[0]
    list_idx = top_idx.tolist()[0]
    classes = []
    model.train()
    for x in list_idx:
        classes.append(index_mapping[x])
    return list_ps, classes
In [30]:
def print_predictions(probabilities, classes,image,category_names=None):
    '''
    Prints the top predictions with their probabilities.
    '''
    print(image)
    if category_names:
        labels = class_to_label(category_names,classes)
        for i,(ps,ls,cs) in enumerate(zip(probabilities,labels,classes),1):
            print(f'{i}) {ps*100:.2f}% {ls.title()} | Class No. {cs}')
    else:
        for i,(ps,cs) in enumerate(zip(probabilities,classes),1):
            print(f'{i}) {ps*100:.2f}% Class No. {cs} ')
    print('') 
In [31]:
probabilities,classes = predict(practice_img,model,index_mapping,5,device)
In [32]:
print_predictions(probabilities,classes,practice_img.split('/')[-1],file)
image_07833.jpg
1) 99.99% Bromelia | Class No. 99
2) 0.00% Cyclamen | Class No. 88
3) 0.00% Gazania | Class No. 71
4) 0.00% Water Lily | Class No. 73
5) 0.00% Cautleya Spicata | Class No. 61

Sanity Checking

In [33]:
imshow(process_image(practice_img))
plt.figure()
plt.barh(class_to_label(file,classes),width=probabilities)
plt.title('Model Predictions')
plt.gca().invert_yaxis()
plt.show()

Conclusion

Thanks for taking the time to read through my notebook! Again, this doesn't go into the details of how my command-line application works; it explains the process of training a model and using it to predict classes of images. As you can see, transfer learning is a powerful thing. We successfully took a pre-trained Convolutional Neural Network, modified it, retrained it, and used it to predict 102 different flower species with over 80% accuracy on the test set! The application train.py lets you repeat this process on ANY dataset you want, provided you have enough images to train on.

Cheers!