Updated: Apr 28
There are hundreds of machine learning algorithms out there, which can make it confusing to choose one when you're building a model of your own. The answer is that it depends on your dataset: its size, how many classes you're going to predict, and so on. I personally find this flow chart very helpful when I'm stuck at a point like this.
That said, when you have thousands (or sometimes even millions) of images, there's a high chance you'll go for convolutional neural networks. CNNs are extremely good at detecting different kinds of shapes and structures (beware of overfitting, though), and they provide state-of-the-art accuracy on complex image-recognition tasks.
The following project deals with automating the detection of pneumonia in patients from their chest X-ray images.
You can download the entire dataset here if you want to follow along.
I’ve explained the extraction and cleaning of the dataset in depth in one of my blogs. You can check that out in case you need help with it.
Moving on, in this section I’m going to focus more on the deep learning part and how actually things work in CNNs. So, let’s get started.
For this particular task, I’ve chosen to apply a convolutional neural network to our dataset. So, we first build a baseline model and see how well it fits.
Let's import all the libraries that we're going to use in this section.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
Now, we need to normalize our input arrays. At present our tensors' values range from 0 to 255; normalization brings them into the range 0 to 1. This simply helps the network learn better, and it makes a significant difference in accuracy (without normalization, accuracy was around 42 %; with normalization, above 90 %).
Normalizing our data:
X = tf.keras.utils.normalize(X, axis = 1)
Note that tf.keras.utils.normalize actually L2-normalizes along an axis rather than scaling pixel values into [0, 1]. For images, the simpler and more common approach is to rescale directly:
X = X / 255.0
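To see why the two are not interchangeable, here is a minimal numpy comparison of L2-normalizing along an axis versus dividing by 255 (the toy pixel values are made up for illustration):

```python
import numpy as np

# A toy "image" row with 8-bit pixel values.
x = np.array([[0.0, 51.0, 102.0, 255.0]])

# What tf.keras.utils.normalize does: L2-normalize along an axis,
# so each row ends up with unit Euclidean length.
l2_normalized = x / np.linalg.norm(x, axis=1, keepdims=True)

# Simple pixel scaling: every value mapped into [0, 1].
scaled = x / 255.0

print(l2_normalized)  # row has unit L2 norm
print(scaled)         # values between 0 and 1, with 255 mapped to 1.0
```

Both put the values on a small, consistent scale, which is what matters for training; just be aware they produce different numbers.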
Now, let's define our model:
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(200, 200, 3)))
Here, we added a convolutional layer with 32 filters, each of size 3 × 3. I think 32 is a good number for our first layer. The first layer generally detects only simple shapes, like edges and straight lines.
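To make the "filter sliding over the image" idea concrete, here is a minimal numpy sketch of a single 3 × 3 convolution (stride 1, no padding); the filter values here are hand-picked for illustration, whereas a CNN learns them during training:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a (kh, kw) kernel over a 2-D image, stride 1, no padding."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise multiply the patch with the kernel and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: it responds strongly where brightness
# changes from left to right -- the kind of simple structure an
# early convolutional layer picks up.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

image = np.zeros((5, 5))
image[:, :2] = 1.0  # bright left half, dark right half

print(conv2d(image, edge_kernel))  # large values mark the vertical edge
```

A Conv2D layer with 32 filters does exactly this, 32 times in parallel, on every channel of the input.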
We're using the rectified linear unit (ReLU) as our activation function. It is one of the most widely used activation functions in network design today. One of its main advantages is that it does not activate all the neurons at the same time.
According to the plot here, if the input is negative, it is mapped to 0, so that neuron doesn't get activated. At any particular time only a few neurons are active, which makes the network sparse and very efficient.
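ReLU itself is just max(0, x); a one-line sketch makes the "negatives become zero" behavior obvious:

```python
import numpy as np

def relu(x):
    # Negative inputs are clamped to zero; positive inputs pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # negatives become 0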
Next, we add a pooling layer:
    model.add(MaxPooling2D((2, 2)))
The pooling layer's main objective is to reduce the spatial dimensions of the data propagating through the network.
Max-pooling provides a degree of spatial invariance, which enables the neural network to recognize an object in an image even if it does not appear in exactly the same position or form as in the training examples.
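A minimal numpy sketch of 2 × 2 max-pooling shows both effects at once: the spatial dimensions are halved, and only the strongest activation in each window survives (the feature-map values are made up for illustration):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the max of each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    # Group the map into (h//2, 2, w//2, 2) blocks, then reduce
    # each 2x2 block to its maximum.
    return feature_map[:h - h % 2, :w - w % 2] \
        .reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 3, 4]])

print(max_pool_2x2(fmap))  # 4x4 input reduced to 2x2
```

This is what MaxPooling2D((2, 2)) does to every channel of the previous layer's output.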
We provided the input shape as (200, 200, 3), which is the dimension of our images. Before the dense layers can consume the data, we need to flatten the feature maps into a single vector. For this, we use:
    model.add(Flatten())
After this, we add 2 Dense layers:
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(1, activation='sigmoid'))
The last Dense layer has only 1 node because it defines the dimension of our output space, which is 1 (i.e., pneumonia or normal).
Now, it's time to compile our model.
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model
We use the stochastic gradient descent (SGD) optimizer here, with the learning rate set to 0.001, a common default when paired with momentum.
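Under the hood, SGD with momentum keeps a running "velocity" per weight: the velocity accumulates past gradients, and the weight moves along it. A minimal numpy-free sketch on a one-dimensional quadratic loss (the 0.001 / 0.9 values mirror the ones used above):

```python
def sgd_momentum_step(w, v, grad, learning_rate=0.001, momentum=0.9):
    # Velocity blends the previous velocity with the current gradient.
    v = momentum * v - learning_rate * grad
    return w + v, v

# Minimize loss(w) = (w - 5)^2, whose gradient is 2 * (w - 5).
w, v = 0.0, 0.0
for _ in range(5000):
    w, v = sgd_momentum_step(w, v, grad=2 * (w - 5))

print(round(w, 3))  # converges toward the minimum at w = 5
```

The momentum term lets updates build up speed along consistent gradient directions, which is why it usually trains faster than plain SGD.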
Lastly, it is time to fit our model.
model = define_model()
model.fit(X, y, batch_size=32, epochs=25, validation_data=(X_test[:522], y_test[:522]), shuffle=True)
We set the batch size to 32 (you can change it if you want) and train for 25 epochs. I've kept the test set separate from the training set and passed it in as validation_data here. On running it, we get an accuracy of over 75 %.
But this is just our baseline model with only one convolutional layer.
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
After adding 4 more convolutional layers, we get an accuracy of over 95 %, which is quite good when you have only 5,863 images. CNN accuracy generally improves as the dataset grows.
Well, I hope this helps you understand the concepts behind CNNs. If you have any doubts about the code, drop a comment below and I'll address them to the best of my knowledge. For the source code, check this link out.