Simple VGG-like CNN in the Aerial Cactus Identification contest on Kaggle

Nov 13, 2019

Here I present my final solution submitted to the Aerial Cactus Identification competition on Kaggle. The goal of the competition was to predict whether a cactus is present in each image of a large set of 32 x 32 thumbnails cut from aerial photos of a columnar cactus.

My solution is a fork of the kernel Aerial Cactus — Simple CNN.

CNN model from original kernel:

# create the network
num_filters = 8
input_shape = train_x.shape[1:]
output_shape = 1
# model
m = Sequential()
def cnnNet(m):
    m.add(Conv2D(30, kernel_size=3, activation='relu', input_shape=input_shape))
    m.add(MaxPooling2D(2,2))
    m.add(Conv2D(15, kernel_size=3, activation='relu'))
    m.add(MaxPooling2D(2,2))
    m.add(Dense(7, activation='relu')) # <7 stops working, but higher values do nothing
    m.add(Flatten())
    m.add(Dense(units = output_shape, activation='sigmoid'))
cnnNet(m)
# compile adam with decay and use binary_crossentropy for single category dataset
m.compile(optimizer = 'nadam',
          loss = 'binary_crossentropy', 
          metrics = ['accuracy'])
# show summary
m.summary()

Overview of my solution

I will highlight the most important parts of my final solution.

Preparing training / test data

Splitting train / test dataset:

train_x, test_x, train_y, test_y = train_test_split(all_x, all_y, test_size=0.2, random_state=7)

Here is the data augmenter (kept from the original kernel without changes). It applies random flips along the x and y axes, rotation and zoom, because aerial photos can vary in exactly these ways.

# x,y and rotation data augmentation
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    rotation_range=60,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.2, # zoom images
    horizontal_flip=True,  # randomly flip images
    vertical_flip=True)  # randomly flip images
datagen.fit(train_x)
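To make the flip augmentations concrete, here is a minimal pure-Python sketch of what `horizontal_flip` and `vertical_flip` do to pixel positions. It is an illustration only, not the Keras implementation: the helper functions and the 3x3 `tiny` image are hypothetical, and rotation and zoom (which require interpolation) are left out.

```python
# Pure-Python illustration of the two flip augmentations; not Keras code.

def horizontal_flip(image):
    """Mirror every row left-to-right."""
    return [row[::-1] for row in image]

def vertical_flip(image):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return image[::-1]

# Hypothetical 3x3 single-channel "image", used only for illustration.
tiny = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

flipped_h = horizontal_flip(tiny)             # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
flipped_v = vertical_flip(tiny)               # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
both = vertical_flip(horizontal_flip(tiny))   # equivalent to a 180-degree rotation
```

Because a cactus seen from above has no preferred "up" direction, all of these variants remain valid training examples with the same label.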

Creating model architecture

My final model is a VGG-like CNN (Adrian Rosebrock explains this architecture in chapter 7 of the Practitioner Bundle of his book “Deep Learning for Computer Vision with Python”). The model consists of 3 pairs of Conv2D layers with 3x3 kernels, using 32, 64 and 128 filters per layer in the first, second and third pair respectively. The second pair is followed by a 2x2 MaxPooling2D layer, and a second MaxPooling2D layer sits between the last pair of Conv2D layers and the fully connected layer. The greater depth lets the network learn richer, more distinctive image features, while the BatchNormalization and Dropout layers reduce overfitting during training.

Architecture scheme: CONV(32) => BN => RELU => CONV(32) => BN => RELU => DROPOUT => CONV(64) => BN => RELU => DROPOUT => CONV(64) => BN => RELU => POOL => DROPOUT => CONV(128) => BN => RELU => DROPOUT => CONV(128) => BN => RELU => POOL => DROPOUT => FC(64) => RELU => BN => DROPOUT => FC(1) => SIGMOID
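As a rough check on how this stack shrinks a 32 x 32 thumbnail, the feature-map sizes can be traced by hand. The sketch below assumes the Keras defaults used in this model (unpadded "valid" convolutions with stride 1, and 2x2 pooling with stride 2); the helper names and the 3-channel input are assumptions for illustration.

```python
def conv_out(size, kernel=3):
    """Side length after an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Side length after non-overlapping max pooling."""
    return size // pool

# Conv and pooling layers in model order; BN, ReLU and Dropout keep the shape.
layers = [("conv", 32), ("conv", 32),
          ("conv", 64), ("conv", 64), ("pool", None),
          ("conv", 128), ("conv", 128), ("pool", None)]

size, channels = 32, 3  # 32x32 input; 3 colour channels is an assumption
trace = []
for kind, filters in layers:
    if kind == "conv":
        size, channels = conv_out(size), filters
    else:
        size = pool_out(size)
    trace.append((kind, size, channels))

flattened = size * size * channels  # number of inputs to the first Dense layer
print(trace)
print(flattened)
```

Under these assumptions the side length goes 32 → 30 → 28 → 26 → 24 → 12 → 10 → 8 → 4, so Flatten hands 4 * 4 * 128 = 2048 values to the Dense(64) layer.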

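# imports assumed here (standalone Keras; adjust to tensorflow.keras if needed)
from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          Activation, Dropout, Flatten, Dense)

num_filters = 8
input_shape = train_x.shape[1:]
output_shape = 1
# model
m = Sequential()

def cnnNet(m):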
    # First pair of Conv2D layers
    m.add(Conv2D(32, kernel_size=3, input_shape=input_shape))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    
    m.add(Conv2D(32, kernel_size=3))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    m.add(Dropout(0.25))
    
    # Second pair of Conv2D layers
    m.add(Conv2D(64, kernel_size=3))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    m.add(Dropout(0.25))
    
    m.add(Conv2D(64, kernel_size=3))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    m.add(MaxPooling2D(2,2))
    m.add(Dropout(0.25))
    
    # Third pair of Conv2D layers
    m.add(Conv2D(128, kernel_size=3))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    m.add(Dropout(0.25))
    
    m.add(Conv2D(128, kernel_size=3))
    m.add(BatchNormalization())
    m.add(Activation("relu"))
    
    m.add(MaxPooling2D(2,2))
    m.add(Dropout(0.25))
    
    m.add(Flatten())
    
    m.add(Dense(64, activation='relu'))
    m.add(BatchNormalization())
    m.add(Dropout(0.5))
    m.add(Dense(units = output_shape, activation='sigmoid'))

cnnNet(m)

Compile the model

m.compile(optimizer = 'nadam',
          loss = 'binary_crossentropy',
          metrics = ['accuracy'])

Training model

Finally, we start training the model:

batch_size = 128
history = m.fit_generator(datagen.flow(train_x, train_y,
                          batch_size=batch_size),
                          steps_per_epoch= (train_x.shape[0] // batch_size),
                          epochs = 95,
                          validation_data=(test_x, test_y),
                          workers=4)
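The `steps_per_epoch` expression uses floor division, so each epoch draws a whole number of batches. A small worked example, assuming a hypothetical training set of 14,000 images (the real size of `train_x` is not stated here):

```python
num_train = 14000   # hypothetical number of training images, for illustration
batch_size = 128

steps_per_epoch = num_train // batch_size        # batches drawn per epoch
samples_per_epoch = steps_per_epoch * batch_size # augmented samples seen per epoch
shortfall = num_train - samples_per_epoch        # how far an epoch falls short of a full pass

print(steps_per_epoch, samples_per_epoch, shortfall)  # 109 13952 48
```

With augmentation every batch is randomly transformed, so an epoch that is a few dozen samples short of a full pass makes little practical difference.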

For the final solution I trained the model for 95 epochs. The metrics at the end of training were:

loss: 0.0127 - acc: 0.9963 - val_loss: 0.0114 - val_acc: 0.9960
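For readers unfamiliar with these numbers, `loss` is the mean binary cross-entropy and `acc` is the share of predictions on the correct side of 0.5. The sketch below recomputes both in plain Python on four made-up predictions; the function names, the `eps` clipping value and the sample data are assumptions for illustration, not the Keras source.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(1 for t, p in zip(y_true, y_prob) if (p > threshold) == bool(t))
    return correct / len(y_true)

# Made-up labels (1 = cactus present) and predicted probabilities.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.4]

loss = binary_crossentropy(labels, probs)  # about 0.439
acc = accuracy(labels, probs)              # 0.75: the last sample is misclassified
```

A loss near 0.01, as in the log line above, therefore means the model assigns probabilities very close to the true labels on almost every image.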

My final result: 519th place with a score of 0.9997.


Written by Privalov Vladimir