
A Convolutional Neural Network Application

Hello guys, welcome to this new project. In this project you will learn about Convolutional Neural Networks, and the goal is to detect you whenever you are in front of your laptop. One thing I want to confirm before starting: this project is totally different from face detection. I will discuss face detection in another session. And guys, don't worry about the code. 'Why worry, when we have Google's TensorFlow!' You can find the code here, have a look. OK guys, let's start our project.

Convolutional Neural Network (CNN)

Why CNNs?

Let us start the session with this question: why CNNs? We already have DNNs (Deep Neural Networks), which can work on images, so why CNNs? To see the reason, let us assume we have an RGB image of shape (64,64,3). It has 12288 (64*64*3) pixel values, so it can be treated as a 12288-dimensional feature vector. That is not too bad! But as technology has improved, we now find images of shape (1000,1000,3), which give a 3-million-dimensional feature vector.

Such a 3M-dimensional feature vector leads to a very large number of parameters: a first fully connected layer with just 1,000 units, for example, would already need about 3 billion weights. It is difficult to get enough data to prevent a neural network that large from overfitting. Also, the computational and memory requirements to train a network with three billion parameters are hardly feasible. So we use Convolutional Neural Networks.
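To make the parameter argument concrete, here is a small sketch that counts weights for a fully connected layer and for a convolutional layer. The 1,000-unit layer and the 32 filters of size 3x3 are illustrative assumptions, and the helper names are made up for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, channels_in, n_filters):
    """Weights plus one bias per filter of a convolutional layer."""
    return (kernel_h * kernel_w * channels_in + 1) * n_filters


small_image = dense_params(64 * 64 * 3, 1000)      # 12,289,000
large_image = dense_params(1000 * 1000 * 3, 1000)  # 3,000,001,000
conv_layer = conv_params(3, 3, 3, 32)              # 896, whatever the image size

print(small_image, large_image, conv_layer)
```

The convolutional layer needs the same 896 parameters for a (64,64,3) image and for a (1000,1000,3) image, because the filter weights are shared across all positions.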

Introduction and working

I will explain how CNNs work in the following steps.

Convolution
  1. First you need an input image of any shape (let us consider (5,5,3) for now); it may be an RGB image or a grayscale image. You also need a filter (kernel) of size (3,3), (5,5) and so on. You can use a filter of any size, but these are the most common ones. Let us use a (3,3) filter for now. The filter always spans all channels of the input, so for our RGB image it actually holds 3*3*3 weights.
  2. Now place the filter at the top-left corner of the image and take the dot product of the image pixels under it and the filter. Then move the filter by one stride and take the dot product again, and so on.
  3. Repeating this dot product over the whole image is what we call the convolution of the image and the filter ((5,5,3) * (3,3,3), where * denotes convolution). Note that here we are using only one filter.
  4. Figure: padding
  5. That's all. Our input image of shape (5,5,3) now becomes a (3,3) feature map (one filter gives one output channel; n filters would give (3,3,n)). You can notice that the image is shrinking here. We can also say that the CNN is extracting features from the image.
  6. Here the pixels at the edges of the image take part in the convolution fewer times than the pixels in the middle. This means that we are losing information at the edges of the image. We don't want that! So we add padding around the image.
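  7. To calculate the output size of the image you can use the standard formula: output size = floor((n + 2p - f) / s) + 1, where n is the input size, f the filter size, p the padding and s the stride. For our example, (5 + 2*0 - 3) / 1 + 1 = 3.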
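The convolution steps can be sketched in plain NumPy. This is only a minimal illustration, not how TensorFlow implements it: the function names are made up for this example, the filter is assumed to span all input channels, and the loop is written for readability rather than speed.

```python
import numpy as np


def conv_output_size(n, f, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def convolve(image, kernel, stride=1, padding=0):
    """Slide an (f, f, C) kernel over an (H, W, C) image with one filter."""
    if padding:
        # Zero padding on height and width only, not on the channels
        image = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + f, left:left + f, :]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return out


image = np.ones((5, 5, 3))
kernel = np.ones((3, 3, 3))

result = convolve(image, kernel)             # no padding: the image shrinks
padded = convolve(image, kernel, padding=1)  # padding of 1 keeps the size

print(result.shape, padded.shape)
```

Without padding the (5,5,3) image shrinks to (3,3); with a padding of 1 the output keeps the 5x5 size, which matches the formula in step 7.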



Pooling

Figure: Pooling
Now let's talk about pooling layers. Their goal is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting). Reducing the input image size also makes the neural network tolerate a little bit of image shift.

Just like in convolutional layers, each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a small rectangular receptive field. You must define its size, the stride, and the padding type, just like before. However, a pooling neuron has no weights; all it does is aggregate the inputs using an aggregation function such as the max or mean.

The figure above shows a max pooling layer, which is the most common type of pooling layer. In this example, we use a 2 × 2 pooling kernel, a stride of 2, and no padding. Note that only the max input value in each kernel makes it to the next layer. The other inputs are dropped.
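Here is a minimal NumPy sketch of max pooling with a 2 × 2 kernel and a stride of 2, as described above. The function name and the sample values are made up for illustration.

```python
import numpy as np


def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value of each size x size window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = feature_map[top:top + size, left:left + size]
            out[i, j] = window.max()
    return out


feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])

pooled = max_pool(feature_map)
print(pooled)
```

The 4x4 input becomes a 2x2 output, and there are no weights to learn anywhere in the function.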

Flatten

It just turns the output features into a one-dimensional vector. For example, if you have a shrunken feature map of shape (32,32,128) at some layer, flatten turns it into a (32*32*128 = 131072)-dimensional feature vector.
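In NumPy terms, flattening is just a reshape. A tiny sketch using the shape from the example above:

```python
import numpy as np

features = np.zeros((32, 32, 128))  # feature map from some layer
flat = features.reshape(-1)         # one long vector, same values

print(flat.shape)
```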

Dense

The dense layer is also known as a fully connected layer: each of its neurons is connected to every output of the previous layer.
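A dense layer boils down to a matrix multiplication, a bias and an activation. The sketch below uses made-up weights and a ReLU activation purely for illustration; in a real model the weights are learned during training.

```python
import numpy as np


def dense_forward(x, weights, bias):
    """Fully connected layer with ReLU: every input feeds every unit."""
    z = x @ weights + bias
    return np.maximum(z, 0.0)


x = np.array([1.0, -2.0, 3.0])    # 3 input features
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])  # shape (3 inputs, 2 units)
bias = np.array([-5.0, 0.0])

output = dense_forward(x, weights, bias)
print(output)
```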




Now let's start coding!!!

Import necessary modules


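import tensorflow as tf
from tensorflow import keras
# Import everything from tensorflow.keras so that all parts come from the same Keras installation
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator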

Prepare the data

  1. Follow the steps below to make the training data. Take some images (4000 images) with you in front of the webcam (2000 images) and without you in front of the webcam (2000 images) by running the capturing_images.py file, and store the two sets in separate folders inside the Train directory.

  2. Similarly, take some images (1600 images) with you (800 images) and without you (800 images) in front of the webcam by running the capturing_images.py file, and store the two sets in separate folders inside the Validation directory.

  3. Now make sure that both the Train and Validation directories are inside one directory.
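The folder layout from the steps above can be sketched as follows. The class folder names "absent" and "present" are assumptions chosen for this example (the original folder names are not given), and the sketch builds the layout in a temporary directory so that it does not touch your real data.

```python
import tempfile
from pathlib import Path


def make_dataset_dirs(root, classes=("absent", "present")):
    """Create Train/<class> and Validation/<class> folders under root."""
    created = []
    for split in ("Train", "Validation"):
        for class_name in classes:
            folder = Path(root) / split / class_name
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    folders = make_dataset_dirs(tmp)
    layout = sorted(str(f.relative_to(tmp)) for f in folders)
    all_exist = all(f.is_dir() for f in folders)

print(layout)
```

Keras reads one class per subfolder and assigns the class indices in alphabetical order of the folder names, so with these hypothetical names "absent" would be class 0 and "present" class 1.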

Build the CNN model

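size = 128  # input images are resized to size x size pixels

model = models.Sequential()
# Four Conv2D + MaxPooling2D blocks extract features and shrink the image
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(size, size, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Flatten the feature maps and classify them with fully connected layers
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))  # drops half of the inputs during training to reduce overfitting
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # single output between 0 and 1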


Notice the last layer: it has a single output unit (yes or no) with a sigmoid activation, which tells whether you are present in front of the laptop or not.
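To see how the image shrinks through the model, the sketch below traces the feature map size layer by layer without needing TensorFlow. It assumes what the model code uses: 3x3 convolutions without padding, 2x2 max pooling with a stride of 2, and 128 filters in the last convolution.

```python
def conv_out(n, f=3):
    """3x3 convolution, no padding, stride 1."""
    return n - f + 1


def pool_out(n, pool=2):
    """2x2 max pooling with stride 2 (any leftover row or column is dropped)."""
    return n // pool


size = 128
n = size
trace = [n]
for _ in range(4):       # four Conv2D + MaxPooling2D blocks
    n = conv_out(n)
    trace.append(n)
    n = pool_out(n)
    trace.append(n)

flat_features = n * n * 128                    # Flatten after the last block
dense_weights = flat_features * 512 + 512      # Dense(512) weights and biases

print(trace, flat_features, dense_weights)
```

The 128x128 input ends up as a 6x6x128 feature map, so the Flatten layer passes 4608 values to the dense layers.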

Compile the Model

Here we use the RMSprop optimizer and binary cross-entropy as the loss function because this is a binary classification problem.

# 'learning_rate' replaces the deprecated 'lr' argument
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.0003),
              loss='binary_crossentropy',
              metrics=['acc'])
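For intuition about the loss, here is a small sketch of the sigmoid and the binary cross-entropy for a single prediction, written with the standard formulas. It is not the Keras implementation; the clipping constant is an assumption added to avoid log(0).

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one sample: -(y*log(p) + (1-y)*log(1-p))."""
    p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


confident_right = binary_crossentropy(1, 0.9)  # small loss
confident_wrong = binary_crossentropy(1, 0.1)  # large loss
unsure = binary_crossentropy(1, sigmoid(0.0))  # p = 0.5, loss = log(2)

print(confident_right, confident_wrong, unsure)
```

The loss grows quickly when the model is confident and wrong, which is what pushes the sigmoid output towards the correct label during training.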

Train the model

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
validation_datagen = ImageDataGenerator(rescale=1./255)  # validation images are only rescaled, not augmented


In the above code we apply data augmentation to the training data. Make sure that you do not apply data augmentation to the validation data.

But what actually is data augmentation?

Data augmentation enriches our training data by applying random transformations to the images, such as rotating them by a certain angle, zooming in, flipping them horizontally, etc.
Figure: Data Augmentation
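Two of these transformations are easy to sketch by hand in NumPy. This is only an illustration of the idea, not what ImageDataGenerator does internally: the function names are made up, the shift fills the empty columns with zeros as a simplification, and the amounts are fixed here instead of random.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]


def shift_right(image, pixels):
    """Move the image to the right and fill the gap with zeros."""
    if pixels == 0:
        return image.copy()
    shifted = np.zeros_like(image)
    shifted[:, pixels:, :] = image[:, :-pixels, :]
    return shifted


image = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # tiny 2x3 "RGB image"

flipped = horizontal_flip(image)
shifted = shift_right(image, 1)

print(flipped.shape, shifted.shape)
```

Each transformed image still shows the same scene with the same label, so the model sees more variety without any new photos being taken.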
 

 
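# Assumption: directory is the folder that contains Train and Validation; change it to your own path
directory = '.'

# Forward slashes work on every platform, unlike the backslash in '\Train'
train_generator = train_datagen.flow_from_directory(directory + '/Train',
                                            target_size=(size, size),
                                            batch_size=64,
                                            class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(directory + '/Validation',
                                             target_size=(size, size),
                                             batch_size=64,
                                             class_mode='binary')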

With this we have set up our training and validation data. One of the advantages of Keras and TensorFlow is that they can read the images directly from a directory. We also have to set the target size of the images and the batch size. Finally, class_mode should be 'binary' because this is a binary classification problem (yes or no, present or absent in front of the laptop).

Finally fit your model!

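# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x.
# The workers argument is left out because newer Keras versions no longer accept it in fit().
model.fit(train_generator,
          epochs=5,
          steps_per_epoch=63,
          validation_data=validation_generator,
          validation_steps=7)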
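The step counts are tied to the dataset size and the batch size. The sketch below shows the usual way to work them out, assuming the 4000 training images, 1600 validation images and batch size of 64 used in this post; the helper name is made up.

```python
import math


def steps_for(num_images, batch_size):
    """Batches needed to see every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


train_steps = steps_for(4000, 64)      # 63
full_val_steps = steps_for(1600, 64)   # 25
images_seen_with_7_steps = 7 * 64      # 448 validation images per epoch

print(train_steps, full_val_steps, images_seen_with_7_steps)
```

So steps_per_epoch=63 covers the whole training set once per epoch, while validation_steps=7 evaluates on 448 of the 1600 validation images; you could raise it to 25 to use all of them.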

Save the model

You can save the model by using the following piece of code

model.save(directory + '/model.h5')  # a forward slash works on every platform

Yoo! You have come to the end of the project. To see what the output looks like, please refer to my GitHub here.

OK guys, I hope you got some useful information from this post. Please share it with your friends. You can also find my other projects here.

Thank you!

Contacts:

ph.No: +91 9182530027
gmail: hunnurjirao2000@gmail.com
github: github.com/hunnurjirao
