Hello guys, welcome to this new project. In this project you will learn about Convolutional Neural Networks, and the goal is to detect you whenever you are in front of your laptop. One thing I want to confirm before starting our project: this project is totally different from face detection. I will discuss face detection in another session. And guys, don't worry about the code. 'Why worry, when we have Google's TensorFlow!' You can find the code here, have a look. OK guys, let's start our project.
One thing you have to notice in the figure below is the last layer: we have a single output neuron (yes or no), which says whether you are present in front of the laptop or not.
[Figure: Convolutional Neural Network]
Convolutional Neural Network (CNN)
Why CNNs?
Let us start the session with this question: why CNNs? We have DNNs (Deep Neural Networks) which perform well on images, so why CNNs? To see the reason, let us assume we have an RGB image of shape (64,64,3). This has 12288 (64*64*3) pixel values (also called a 12288-dimensional feature vector). That is not too bad! But as technology advanced, we now find images of shape (1000,1000,3), which give a nearly 3-million-dimensional feature vector.
A 3M-dimensional feature vector leads to a huge number of parameters in a fully connected network (roughly 3 billion weights in the first layer alone, if that layer has 1000 neurons). It is difficult to get enough data to prevent such a neural network from overfitting, and the computational and memory requirements to train a network with three billion parameters are simply infeasible. So we use Convolutional Neural Networks, which share weights across the image.
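To see where that 3-billion figure comes from, here is a quick back-of-the-envelope calculation (a sketch; the 1000-neuron hidden layer is an assumption, not something fixed):

# rough parameter count for a fully connected first layer
height, width, channels = 1000, 1000, 3
input_features = height * width * channels   # 3,000,000 input values
hidden_units = 1000                          # assumed size of the first hidden layer
weights = input_features * hidden_units      # 3,000,000,000 weights
print(f"{input_features:,} inputs -> {weights:,} weights in layer 1 alone")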
Introduction and working
I will explain the working of CNNs in the following steps.
[Figure: Convolution]
- First you need an input image of any shape (let us consider (5,5,3) for now); it may be an RGB image or a grayscale image. You also need a filter of shape (3,3), (5,5), and so on. You can take a filter/kernel of any shape, but these are the most common. Let us use a (3,3) filter for now.
- Now place that filter at the start (top left) of the image and take the dot product of the image pixels and the filter. Then move by one stride and again take the dot product of the image pixels and the filter, and so on.
- This sliding dot product is exactly what convolving the image with the filter means ((5,5,3) * (3,3), where * denotes convolution). Note that here we are using only one filter.
- That's all. Our input image of shape (5,5,3) now becomes a (3,3,1) output (the filter spans all 3 input channels, and one filter gives one output channel). You can notice that the image is shrinking here. We can also say that the CNN is extracting features from the image.
- Notice that pixels at the edges of the image take part in fewer dot products than pixels in the middle. This means we are losing information at the edges of the image. We don't want that! So we add padding.
- To calculate the output shape of the image you can use the formula: output size = floor((n + 2p - f) / s) + 1, where n is the input size, f is the filter size, p is the padding, and s is the stride. For our example: (5 + 2*0 - 3)/1 + 1 = 3.
[Figure: Padding]
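To make the sliding dot product and the output-shape formula concrete, here is a minimal NumPy sketch for a single-channel (5,5) image and one (3,3) filter, with no padding and stride 1 (just an illustration, not part of the project code):

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # a toy (5,5) single-channel image
kernel = np.ones((3, 3)) / 9.0                     # a simple averaging filter

n, f, p, s = 5, 3, 0, 1                            # input size, filter size, padding, stride
out = (n + 2 * p - f) // s + 1                     # the formula above -> 3
result = np.zeros((out, out))
for i in range(out):
    for j in range(out):
        patch = image[i*s : i*s+f, j*s : j*s+f]    # current receptive field
        result[i, j] = np.sum(patch * kernel)      # the "dot product" at this position
print(result.shape)                                # (3, 3), matching the formula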
Pooling
[Figure: Pooling]
Now let's talk about pooling layers. Their goal is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting). Reducing the input image size also makes the neural network tolerate a little bit of image shift.

Just like in convolutional layers, each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a small rectangular receptive field. You must define its size, the stride, and the padding type, just like before. However, a pooling neuron has no weights; all it does is aggregate the inputs using an aggregation function such as the max or mean.

The figure above shows a max pooling layer, which is the most common type of pooling layer. In this example, we use a 2 × 2 pooling kernel, a stride of 2, and no padding. Note that only the max input value in each kernel makes it to the next layer. The other inputs are dropped.
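Here is a tiny NumPy sketch of exactly that: 2 × 2 max pooling with stride 2 on a (4,4) feature map (again just an illustration of the idea):

import numpy as np

fmap = np.array([[1, 3, 2, 1],
                 [4, 8, 5, 3],
                 [7, 2, 9, 4],
                 [6, 1, 3, 5]], dtype=float)

# split into 2x2 blocks, then keep only the max of each block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[8. 5.]
                #  [7. 9.]]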
Flatten
It just reshapes the output features into a 1-D vector. For example, if you have a shrunken feature map of shape (32,32,128) at some specific layer, Flatten turns it into a 131072-dimensional (32*32*128) feature vector.
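A quick NumPy check of that count:

import numpy as np
features = np.zeros((32, 32, 128))
flat = features.reshape(-1)    # what a Flatten layer does to one sample
print(flat.shape)              # (131072,) == 32 * 32 * 128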
Dense
A Dense layer is also known as a fully connected layer: every neuron in it is connected to every output of the previous layer.
Now let's start coding!!!
Import necessary modules
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
Prepare the data
Follow the steps below to make the training data. Take some images (4000 in total) by running the capturing_images.py file: 2000 with you present in front of the webcam and 2000 without you. Store both sets in the Train directory.
Similarly, take some more images (1600 in total) by running the capturing_images.py file: 800 with you and 800 without you in front of the webcam. Store both sets in the Validation directory.
Now make sure that both the Train and Validation directories live inside one parent directory (the code below assumes this parent directory's path is stored in a variable called directory); see the layout sketch below.
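For flow_from_directory (used later) to label the images automatically, each of the Train and Validation directories needs one subfolder per class. The subfolder names below (me, not_me) are just an example; any two names work:

directory/
├── Train/
│   ├── me/         <- 2000 images with you
│   └── not_me/     <- 2000 images without you
└── Validation/
    ├── me/         <- 800 images with you
    └── not_me/     <- 800 images without you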
Build the CNN model
size = 128   # input images will be resized to 128x128

model = models.Sequential()
# four conv + max-pool blocks extract features while shrinking the image
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(size, size, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())                        # feature maps -> 1-D vector
model.add(layers.Dropout(0.5))                     # randomly drop units to reduce overfitting
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))   # single output: present or not
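Before compiling, you can sanity-check the architecture and the per-layer parameter counts:

model.summary()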
Compile the Model
Here we use the RMSprop optimizer and binary cross-entropy as the loss function, because this is a binary classification problem.
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.0003),
              loss='binary_crossentropy',
              metrics=['acc'])
Train the model
train_datagen = ImageDataGenerator(
    rescale=1./255,          # scale pixel values from [0, 255] to [0, 1]
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)
In the above code we apply data augmentation to the training data. Note that we do not apply data augmentation to the validation data; we only rescale it.
But what actually is data augmentation?
Data augmentation expands the training data by applying random transformations to the images: rotating them by some angle, zooming in, flipping horizontally, and so on. This exposes the model to more variations and helps it generalize.
[Figure: Data Augmentation]
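If you want to see what the generator actually produces, here is a small optional sketch that saves four augmented versions of one captured frame (the image path is just an example; point it at one of your own files):

import numpy as np
from tensorflow.keras.preprocessing import image

# load one captured frame (example path) and add a batch axis
img = image.load_img('Train/me/img_0.jpg', target_size=(size, size))
x = image.img_to_array(img)[np.newaxis, ...]

# draw four random augmentations from the same generator used for training
for i, batch in enumerate(train_datagen.flow(x, batch_size=1)):
    image.array_to_img(batch[0]).save('augmented_{}.jpg'.format(i))
    if i == 3:
        break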
train_generator = train_datagen.flow_from_directory(os.path.join(directory, 'Train'),
                                                    target_size=(size, size),
                                                    batch_size=64,
                                                    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(os.path.join(directory, 'Validation'),
                                                              target_size=(size, size),
                                                              batch_size=64,
                                                              class_mode='binary')
Then we set up our training and validation data. One of the advantages of Keras and TensorFlow is that they can read images directly from a directory. We also set the target size of the images and the batch size. Finally, class_mode must be 'binary' because this is a binary classification problem (yes or no, present or absent in front of the laptop).
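To double-check which label Keras assigned to each class (it infers this from the subfolder names), you can print the mapping:

print(train_generator.class_indices)   # e.g. {'me': 0, 'not_me': 1} with the example layout above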
Finally fit your model!
model.fit(train_generator,
          epochs=5,
          steps_per_epoch=63,      # ~4000 training images / batch size 64
          validation_data=validation_generator,
          validation_steps=7,
          workers=4)
Save the model
You can save the model by using the following piece of code
model.save(os.path.join(directory, 'model.h5'))
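Later, for example in the script that watches your webcam, you can load the saved model back:

from tensorflow.keras.models import load_model

model = load_model(os.path.join(directory, 'model.h5'))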
Yoo! You have come to the end of the project. To see what the output looks like, please refer to my GitHub here.
OK guys, I hope you got some useful information from this post. Please share it with your friends, and you can also find my other projects here.
Thank you!
Contacts:
Phone: +91 9182530027
Email: hunnurjirao2000@gmail.com
GitHub: github.com/hunnurjirao