
Cats vs Dogs Classification

Image classification exercise

I’m not gonna lie, I’m more of a dog person, so I’d better know how to tell the difference. 😂 Here we go.

Introduction

This is an image classification exercise based on an expired Kaggle competition. The code is available in a Jupyter notebook here. You will need to download the data from the Kaggle competition: https://www.kaggle.com/c/dogs-vs-cats/. The labeled dataset contains 25,000 images of dogs and cats (12,500 from each class). I will split it into a training set of 20,000 images and a validation set of 5,000 images, and use the competition's unlabeled test set for the final predictions.

I will first build a vanilla CNN model as the baseline, and then apply transfer learning to further improve performance.

Keras comes prepackaged with many pretrained models. Some of them are:

  • VGGNET: Introduced by Simonyan and Zisserman in their 2014 paper, Very Deep Convolutional Networks for Large-Scale Image Recognition.
  • RESNET: First introduced by He et al. in their 2015 paper, Deep Residual Learning for Image Recognition.
  • INCEPTION: The “Inception” micro-architecture was first introduced by Szegedy et al. in their 2014 paper, Going Deeper with Convolutions.

and more. Detailed explanations of some of these architectures can be found here.
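As a quick illustration (this snippet is mine, not from the notebook), each of these architectures is a one-line constructor in keras.applications, with include_top controlling whether the ImageNet classification head is kept:

# Illustrative only: loading pretrained architectures from keras.applications
# (ImageNet weights are downloaded on first use).
from keras.applications import VGG16, ResNet50, InceptionV3

full_vgg = VGG16(weights='imagenet')     # complete network with the 1000-class head
headless = ResNet50(weights='imagenet',  # convolutional base only,
                    include_top=False)   # reusable as a feature extractor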

Start here

import numpy as np
import pandas as pd
import os, shutil
import random

import keras
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

import cv2
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
PATH = '../input/'
train_dir = '../input/train/'
test_dir = '../input/test/'

Prepare data

train_images = []
train_labels = []

for img in os.listdir(train_dir):
    try:
        # Read each image, resize it to 150x150, and derive the label from the
        # filename (training files are named like 'dog.123.jpg' / 'cat.456.jpg')
        img_r = cv2.imread(os.path.join(train_dir, img), cv2.IMREAD_COLOR)
        train_images.append(np.array(cv2.resize(img_r, (150, 150), interpolation=cv2.INTER_CUBIC)))
        if 'dog' in img:
            train_labels.append(1)
        else:
            train_labels.append(0)
    except Exception as e:
        print('broken image')
broken image
plt.title(train_labels[0])
plt.imshow(train_images[0])
train_df = pd.DataFrame({ 'train_images':train_images,
                         'train_labels':train_labels})
sns.set(style="darkgrid")
ax = sns.countplot(x="train_labels", data = train_df,palette="Set3")
num_dogs = train_df['train_labels'].sum()
print('We have {} cats and {} dogs to train with.'.format(train_df.shape[0]-num_dogs,num_dogs))
We have 12500 cats and 12500 dogs to train with.

We have 12,500 cats and 12,500 dogs to train with, so the dataset is balanced.

train_labels = np.array(train_labels)

X_train, X_test, y_train, y_test = train_test_split(train_images, train_labels, test_size=0.2, random_state=13)

X_train = np.array(X_train)
X_test = np.array(X_test)
print("Train Shape:{}".format(X_train.shape))
print("Test Shape:{}".format(X_test.shape))
Train Shape:(20000, 150, 150, 3)
Test Shape:(5000, 150, 150, 3)
X_train = X_train.reshape(-1, 150, 150, 3)
X_test = X_test.reshape(-1, 150, 150, 3)
print("Training data shape : {0}".format(X_train.shape))
print("Training label shape : {0}".format(y_train.shape))
print("Valid data shape : {0}".format(X_test.shape))
print("Valid label shape : {0}".format(y_test.shape))
Training data shape : (20000, 150, 150, 3)
Training label shape : (20000,)
Valid data shape : (5000, 150, 150, 3)
Valid label shape : (5000,)

The base CNN model

from keras.models import Sequential
from keras import layers
from keras import optimizers
from keras import backend as K
from keras.layers import Dense, Activation, Dropout, Conv2D, MaxPooling2D, Flatten, BatchNormalization
from keras.applications import VGG16
# Creating the convnet model
def build_cnn():
    model = Sequential()

    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid')) # single sigmoid unit for binary cat-vs-dog classification

    return model

Fit the model

model_base = build_cnn()
model_base.name = 'base_cnn'
model_base.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896       
_________________________________________________________________
batch_normalization_1 (Batch (None, 148, 148, 32)      128       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
batch_normalization_2 (Batch (None, 72, 72, 64)        256       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
batch_normalization_3 (Batch (None, 34, 34, 128)       512       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 17, 17, 128)       0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 36992)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               18940416  
_________________________________________________________________
batch_normalization_4 (Batch (None, 512)               2048      
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513       
=================================================================
Total params: 19,037,121
Trainable params: 19,035,649
Non-trainable params: 1,472
_________________________________________________________________
from keras.optimizers import RMSprop
model_base.compile(loss = 'binary_crossentropy',
                   optimizer = optimizers.RMSprop(lr=1e-4),
                   metrics=['accuracy'])

# model_base.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_base = model_base.fit(x=X_train, y=y_train, epochs=20, verbose=1, validation_data=(X_test, y_test))
model_base.save_weights('model_base_weights.h5')
model_base.save('model_base.h5')

Performance of base model

loss, acc = model_base.evaluate(x=X_test, y=y_test)

print('Base CNN Model gives me LogLoss of {}'.format(loss))
print('Base CNN Model gives me Accuracy of {}'.format(acc))
5000/5000 [==============================] - 2s 408us/step
Base CNN Model gives me LogLoss of 0.5596861797451973
Base CNN Model gives me Accuracy of 0.8514
acc = history_base.history['acc']
val_acc = history_base.history['val_acc']
loss = history_base.history['loss']
val_loss = history_base.history['val_loss']

epochs = range(1, len(acc)+1)

fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(12,6))
ax1.plot(epochs, loss, color='blue', label = 'Training loss')
ax1.plot(epochs, val_loss,color='green', label = 'Validation loss')
ax1.set_title('Training and Validation Loss')
ax1.legend()

ax2.plot(epochs, acc,  color='blue',label = 'Training acc')
ax2.plot(epochs, val_acc,color='green',label = 'Validation acc')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()

plt.show()

The curves show clear overfitting, and the model cannot reach a strong accuracy score. The likely reason is that the dataset is limited: even with dropout and batch normalization as regularizers, 20,000 training images are not enough for the convnet to learn robust features. One common remedy is data augmentation with the ImageDataGenerator imported earlier, sketched below.
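Here is a minimal sketch of what that augmentation could look like; the transform ranges and batch size are illustrative assumptions, not settings from the original run.

# Hedged sketch: on-the-fly augmentation with the already-imported
# ImageDataGenerator; the ranges below are assumptions, not notebook settings.
datagen = ImageDataGenerator(rotation_range=20,      # rotate up to 20 degrees
                             width_shift_range=0.1,  # shift horizontally by up to 10%
                             height_shift_range=0.1, # shift vertically by up to 10%
                             horizontal_flip=True)   # randomly mirror left-right

# Each epoch then sees a freshly transformed copy of every training image.
history_aug = model_base.fit_generator(datagen.flow(X_train, y_train, batch_size=32),
                                       steps_per_epoch=len(X_train) // 32,
                                       epochs=20,
                                       validation_data=(X_test, y_test))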

Another remedy is transfer learning, which takes advantage of features learned on ImageNet.

  • There are two ways of using a pre-trained network:
    1. Running the convolutional base over our dataset once, recording its output to a NumPy array on disk, and then using this data as the input to a standalone, densely connected classifier (see the sketch after the next paragraph)
    2. Extending the model by adding a classifier on top and training the combined stack
  • The first method is cheap in terms of computation, since the expensive convolutional base runs over each image only once; for the same reason, we cannot take advantage of data augmentation
  • The second method runs the convolutional base over every batch in every epoch. This lets us exploit image augmentation, but for the same reason it is computationally expensive

For the sake of time and compute, I will keep the convolutional base frozen in this homework, since we have roughly enough data to train a small classifier on top. (In the code below the frozen base is stacked directly into the model rather than precomputing features, so no convolutional weights are updated during training.)
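For reference, here is a minimal sketch of the feature-recording variant of method 1; the extract_features helper and batch size are illustrative assumptions, and VGG16 maps the 150x150 inputs used above to 4x4x512 feature maps.

# Hedged sketch of method 1: run the frozen convolutional base over the
# data once and keep the outputs as plain NumPy features.
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

def extract_features(images, batch_size=32):
    n = len(images)
    features = np.zeros((n, 4, 4, 512))  # VGG16 output shape for 150x150 inputs
    for start in range(0, n, batch_size):
        batch = images[start:start + batch_size].astype('float32') / 255.  # rescale to [0, 1]
        features[start:start + len(batch)] = conv_base.predict(batch)
    return features.reshape(n, -1)  # flatten for a standalone dense classifier

# train_features = extract_features(X_train)
# val_features = extract_features(X_test)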

Transfer Learning VGG16

The loaded model is chopped after the last max-pooling layer of the VGG16 architecture by setting the parameter include_top=False.

# Defining and training the densely connected classifier

def model_transfer_vgg16_chopped():
    model = Sequential()   

    #model.add(ResNet50(include_top=False, pooling=POOLING))
    model.add(VGG16(include_top=False, weights='imagenet',  input_shape=(150,150,3)))

    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation = 'relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1,activation = 'sigmoid'))

    model.layers[0].trainable = False

    model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
                  loss='binary_crossentropy',
                  metrics=['acc'])

    return model
model_transfer = model_transfer_vgg16_chopped()
model_transfer.summary()
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 2s 0us/step
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 4, 4, 512)         14714688  
_________________________________________________________________
flatten_2 (Flatten)          (None, 8192)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 256)               2097408   
_________________________________________________________________
dropout_5 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 257       
=================================================================
Total params: 16,812,353
Trainable params: 2,097,665
Non-trainable params: 14,714,688
_________________________________________________________________
print("The number of trainable weights before freezing the convolutional layer = ",len(model_transfer.trainable_weights))
The number of trainable weights before freezing the convolutional layer =  4
# set callbacks
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

earlystop = EarlyStopping(monitor='val_loss',patience=3)

learning_rate_reduction = ReduceLROnPlateau(monitor='val_loss',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=1e-6)  # must stay below the initial lr (2e-5), or no reduction can ever fire

callbacks = [earlystop, learning_rate_reduction]
history_vgg16 = model_transfer.fit(X_train, y_train,
                              validation_data=(X_test, y_test),
                              epochs = 20,
                             callbacks=callbacks,
                              verbose=1)

Performance of transfer learning model

model_transfer.evaluate(X_test, y_test)
[0.27564358766500857, 0.9672]

With the frozen VGG16 base, validation accuracy climbs from about 85% for the base CNN to roughly 97%.
model_transfer.save_weights('model_transfer_weights.h5')
model_transfer.save('model_transfer.h5')
acc = history_vgg16.history['acc']
val_acc = history_vgg16.history['val_acc']
loss = history_vgg16.history['loss']
val_loss = history_vgg16.history['val_loss']

epochs = range(1, len(acc)+1)

fig, (ax1, ax2) = plt.subplots(1, 2,figsize=(12,6))
ax1.plot(epochs, loss,color='blue', label = 'Training loss')
ax1.plot(epochs, val_loss,color='red', label = 'Validation loss')
ax1.set_title('Training and Validation Log Loss')
ax1.legend()

ax2.plot(epochs, acc,  color='blue',label = 'Training acc')
ax2.plot(epochs, val_acc,color='red',label = 'Validation acc')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()

plt.show()

Test

The model outputs a sigmoid probability for each test image; values at or above 0.5 are labeled dog (1) and values below 0.5 are labeled cat (0), matching the labels assigned when the training data was loaded.

test_images = []
test_filenames = []
for img in os.listdir(test_dir):
    try:
        img_r = cv2.imread(os.path.join(test_dir, img), cv2.IMREAD_COLOR)
        test_images.append(np.array(cv2.resize(img_r, (150, 150), interpolation=cv2.INTER_CUBIC)))
        test_filenames.append(img.split('.')[0])  # test files are named '<id>.jpg'
    except Exception as e:
        print('broken image')
broken image

test_filenames = [int(a) for a in test_filenames]
test_images = np.array(test_images)
test_images = test_images.reshape(-1, 150, 150, 3)
predictions = model_transfer.predict(test_images)
# Clip probabilities away from 0 and 1: log loss penalizes confident wrong
# answers without bound, so clipping caps the worst-case penalty per image.
predictions2 = predictions.clip(min=0.005, max=0.995)
results = pd.DataFrame({"id": test_filenames, "label": predictions2.ravel()})
results.to_csv('submission.csv', index=False)
for i in range(0, 10):
    if predictions[i, 0] >= 0.5:
        print('I am {:.3%} sure this is a Dog'.format(predictions[i, 0]))
    else:
        print('I am {:.3%} sure this is a Cat'.format(1 - predictions[i, 0]))

    plt.imshow(test_images[i])
    plt.show()
I am 100.000% sure this is a Dog
I am 100.000% sure this is a Cat
I am 100.000% sure this is a Cat
I am 100.000% sure this is a Cat
I am 100.000% sure this is a Dog
I am 100.000% sure this is a Dog

Code available on GitHub