Dog Breed Classification

Dog Breed Classifier Web Application using CNN and Transfer Learning

Pirsqred
15 min readNov 19, 2020

A simple web application that allows a user to select an image of a canine (or not) to have the breed of the dog estimated. In case the image selected is that of a human it will provide an estimate of what kind of dog breed the person resembles to the most.

Dog Breed Classifier Application

Overview

The Dog Breed Classification is one of the capstone for the Data Scientist Nanodegree by Udacity. The project entails building the prediction model to writing an algorithm that is then used in a web application via Flask. The deep learning model used distinguishes between 133 breeds of canines with an accuracy over 84.45%

Domain Framework

The project involves Image Processing, classification and detection using Deep Learning. A Convolutional Neural Network (CNN) and Transfer Learning are used to deliver the results. Given an image of a dog, the CNN provides the breed of the dog with high accuracy. This prediction engine is then embedded into an algorithm fed to a web application using Flask.

Project Statement

The aim of this project is to create a classifier that is able to identify a breed of a dog given an image as input. If the image contains a human face, then the application will return the breed of dog that most resembles the person. Using Deep learning for image processing and recognition is a fascinating technology. As such it was not difficult for me to choose this project as my capstone to complete the Udacity Data Scientist Nanodegree.

Dataset Exploration

Two datasets provided by Udacity are used:

  1. The Dog Dataset:
There are 133 total dog categories.
There are 8351 total dog images.

There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.

2. The Human Face Dataset:

There are 13233 total human images.

The dog data set has a total of 8351 images distributed among 133 breeds for an average total number of images per breed of aproximately 63. When I narrow this down to the training set we get the following distribution:

For the training set we get an average of about 53 images per dog breed. We can also see that the distribution displays a slight imbalance.

Additionally I have taken a look at different images of the same dog breed to look at some of their characteristics:

  • Image size
  • RGB intensity distribution
  • resolution
  • lighting condition.

Here a a couple of images of an American Water Spaniel:

It probably comes as no surprise that images are of different sizes, show different lighting conditions and resolutions and the RGB channel distribution can be very different from image to image. It certainly would be very challenging for a CNN with only dozens of images per dog breed to make an accurate prediction of what kind of dog it is looking at.

Solution Statement

The following steps were taken for my approach to this project:

  1. Importing the Datasets
  2. Write a function to detect human faces in an image.
  3. Write a function to detect dogs in an image.
  4. Write functions to preprocess the data.
  5. Create a CNN from scratch to classify dog breeds .
  6. Create a CNN using transfer learning and Xception bottleneck features.
  7. Write the algorithm.
  8. Test the Pipeline.
  9. Write a web application using flask, that takes the best model to estimate the dog breed.
  10. The user can select any image and the backend will make the prediction and display the result.

Software Environment

An anaconda installation with python 3.6 on a Mac was used to develop the software. The details can be found on my GitHub Repository.

Detecting Human Faces

OpenCV provides many pre-trained face detectors, stored as XML files on github. One of these detectors was download and stored it in the haarcascades directory.

Human Face Detector Function

In the above code cell the image is read and then converted to grayscale first, which is common practice, before the detectMultiScale function executes the classifier face_cascade. The function then returns a 0 or 1 if no face is detected or a face is detected respectively.

When tested on human face images and dog images the classifier performs excellently on human face images, but still detects a number of human faces on dog images.

Humans: 100.0%
Dogs: 11.0%

Choosing this algorithm to detect human faces requires us to let the user know that the pictures used should have a clear view of a face. This would be a reasonable expectation, however using deep learning models that are capabable of analyzing human faces at a level of detecting edges of ears and noses may provide an increase in accuracy, thus allowing users to provide images that may not be as clear.

Detecting Dogs

A pre trained ResNet50 model is used to detect dogs in images.

ResNet50 Model definition

Pre-Processing of Input Data

When using TensorFlow as backend, Keras CNNs require a 4D array as input, with shape (nb_samples,rows,columns,channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

Thepath_to_tensor function below provides for the necessary transformation. It takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN.

Path to Tensor Function to Provide Keras CNN with the Correct Size Tensor

Additional processing is required to get the 4D tensor ready for any pre trained Keras model. First, the RGB image is converted to BGR by reordering the channels. All pre-trained models have the additional normalization step that the mean pixel (expressed in RGB as [103.939,116.779,123.68] and calculated from all pixels in all images in ImageNet) must be subtracted from every pixel in each image. This is implemented in the imported function preprocess_input (the code for this function can be found here).

Making Predictions with ResNet-50

The above pre-processing steps can now be used for extracitng the predictions using our model here is the code for that:

ResNet50 Prediction function using the pre processing functions

The Dog Detector

We now have all the building blocks to code our dog prediction function.

Dog Detector Function

The dog detector function returns True when a dog is detected and False when not. You may notice something peculiar in the fact that this function checks if the prediction falls between the values of 151 and 268, inclusive. This is because when looking at the dictionary, the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, inclusive.

When testing the dog detector we get excellent results:

Code cell for testing the dog detector
Dogs in Humans_Files: 0.0%
Dogs in Dog_Files: 100.0%

Creating a CNN from scratch to Classify Dog Breeds

Now that we have the functions to detect human faces and dogs in images we need a way to predict the dog breed. For this purpose we will create a CNN.

The challenge in predicting a dog breed is that some dog breeds look very similar. Belwo a comparison between a Brittany and a Welsh Springer Spaniel to illustrate this challenge.

Brittany (Image Source: Dog Dataset)
Welsh Springer Spaniel (Image Source: Dog Dataset)

Then there is the challenge of having a dog breed that comes in different colors such as the Yellow , Black and Chocolate Labrador:

Yellow, Chocolate and Black Labrador (image Source: Dog Dataset)

Pre-Processing the Data

Before proceeding with the architecture of the CNN we rescale all the images by divinding every pixel in all of the images by 255:

Model Architecture

I have tried different models and achieved different levels of accuracy. The basic strategy is to use filters to learn the patterns in the data and then to pass those patterns into an MLP for categorization via softmax. The model consists of the following steps/layers:

  • Convoluational Portion: This portion of the model takes images as input and learns the patterns. This is done by alternating convolutional layers of varying depth with pooling layers to produce an approximation of bottleneck features that is passed to an MLP. Relu activation is used at each step.
  • 224 x 224 x 3 inputs (image dimensions plus a layer each for R, G, B)
  • 16 locally connected Conv layer (called filters going forward)
  • A max pooling layer of size 2
  • 64 locallly connected filters
  • a max pooling layer of size 2
  • 64 locallly connected filters (dropout is active here at 40% probability)
  • a max pooling layer of size 2.
  • Multilayer Perceptron Portion: Here we flatten the bottleneck features (the deep, narrow array created in the conv portion), pass the flat array to 500 nodes, activate with relu (this section has a dropout probability of 40%), and pass the result into 133 nodes (one for each breed/category of dog) and get the probability of each category using softmax activation.

I wanted to keep the model simple, effective, and train in a reasonble time frame and modelled it after some of the models used in the classroom. I tried other models fundamentally with the same architecture but different parameters. The test accuracy tends to be between 7–9% with a kernel of 2. I tried kernels of 3 and 5 as well and accuracy did increase with the latter model at 12.68%. I would conclude that accuracy with a kernel that is too small, e.g. 2, will not lead to high accuracies, while a kernel slightly bigger will. This is due to the fact that with a very small kernel more complex pattern could not be discerned and therefore a bigger kernel has ot be used.

Here is the model I chose:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D) (None, 112, 112, 16) 1216
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 56, 56, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 64) 25664
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 7, 7, 64) 102464
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 3, 3, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 3, 3, 64) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 576) 0
_________________________________________________________________
dense_1 (Dense) (None, 500) 288500
_________________________________________________________________
dropout_2 (Dropout) (None, 500) 0
_________________________________________________________________
dense_2 (Dense) (None, 133) 66633
=================================================================
Total params: 484,477
Trainable params: 484,477
Non-trainable params: 0
_________________________________________________________________

Metrics

Because we have have a multi-class classification problem at hand, I will be using a catregorical crossentropy loss function. Associate with this and because we only have a slight imbalance in the dataset I opted for accuracy as the evaluation metric. This gives me a more binary view of how the model performs. Log loss could be used as an alternative evaluation metric; it is more of a continuous metric over the binary aspect of accuracy, meaning that the further away from the truth the classifier is, the more it will get punished. It can be debated whether this would lead to a better model in this case and could be a matter for further investigation and potential improvement.

Compiling and Training and Testing the Model

To compile the model I used the rmsprop optimizer with a categorical crossentropy loss function. The measure how successful this model is i used accuracy as a metric of success, which is a sutibale metric to use in this situation.

To train the model i used 15 epochs and and a batch size of 20. I also implemented model checkpoints to later load the model with the best validation loss.

After training the model I loaded the model with the best validation loss and tested it in the dog images test dataset

The accuracy achieved did not look very promising:

Test accuracy: 11.8421%

One technique to improve accuracy and reduce traiing time is to use transfer learning. This will be discussed in the next section.

Create a CNN to Classify Dog Breeds using Transfer Learning

Model Architecture

The idea of transfer learning is to use a pre-trained model as a fixed feature extractor, where the last convolutional output is fed as input to our model. The features of the pre-trained model can be found in its bottleneck features that can be downloaded. For this exercise I will use the Xception pre-trained model.

I followed the following steps to assemble the CNN model with transfer learning:

  • obtain Xception bottleneck features
  • Pool features (Global Average Pooling)
  • Pass pooled features into a layer with 133 nodes, and activate with the softmax function, squeezing features into probability space, and determinging the most probable category for each image.

This is suitable to this problem since we have a small data set with similiar features. In this case we can just extract the bottleneck features and feed them into a new front end that has nodes corresponding to the number of categories (dog breeds) for our new data.

The above 3 steps are illustared below:

  • obtain the bottleneck features:
  • Adding a GlobalAveragePolling2D layer followed by a Dense Layer with 133 nodes (the number of dog breeds) and a softmax activation:

Below the resulting architecture that result in 272517 parameters:

Layer (type)                 Output Shape              Param #   
=================================================================
global_average_pooling2d_2 ( (None, 2048) 0
_________________________________________________________________
dense_4 (Dense) (None, 133) 272517
=================================================================
Total params: 272,517
Trainable params: 272,517
Non-trainable params: 0
_________________________________________________________________

Compiling and Training and Testing the Model

As before, to compile the model I used the rmsprop optimizer with a categorical crossentropy loss function. The measure how successful this model is i used accuracy as a metric of success, which is a sutibale metric to use in this situation.

To train the model i used 20 epochs and and a batch size of 20. I also implemented model checkpoints to later load the model with the best validation loss.

After training the model I loaded the model with the best validation loss and tested it in the dog images test dataset.

This model provided a much improved accuracy that will allow me to use it for the following step of building a web application.

Test accuracy: 84.4498%

Predicting Dog Breed with Xception Model

The next step is to write a function that takes an image path as input and returns the dog breed predicted by the model.

This function will follow the following 3 steps:

  1. Extract the bottleneck features corresponding to the chosen CNN model.
  2. Supply the bottleneck features as input to the model to return the predicted vector. Note that the argmax of this prediction vector gives the index of the predicted dog breed.
  3. Use the dog_names array to return the corresponding breed.

The Algorithm

In this step I will write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, returns the predicted breed.
  • if a human is detected in the image, returns the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.

Testing the Algorithm

The output is actually slightly better than I expected for the dogs. A bigger test sample size would be needed to see if the % accuracy from the trained and cross-validated model was generalizing well to samples. There were two very interesting results. The first is that the model did not recognize a great dane. The second was that a picture of a person did not have their face detected. Looking at the picture of the 2 ( The Person and the Great Dane) and from other runs of the face detector it seems the face and dog detection algorithms have an issue with detections when there isn’t a lot of hair or fur present in the picture. More testing of this theory would be required. Issues have also been noted with dog breed that look very similar as previously mentioned.

Points of improvement:

  • Use more training examples with bald people or people with very short hair as well as dogs with short smooth fur.
  • Use more training examples that may not conform to breed expectations, such as Great Danes without tails and different profiles.
  • augment images.
  • create an ensemble of networks for the dog detector using several of the available networks, and apply weighted voting.
  • Improve accuracy of model: changing Kernel size seems ot have a significant impact on accuracy. A kernel that is to small fails to recognize complex features, while a bigger kernel will do much better with this. Finding the correct size is key, too small or too big and we miss the details that we need for proper categorization.

An example of a successful breed estimate:

An example of an incorrect breed estimate:

The correct breed in the above example is a Bichon Frisee, which does look very similar to a Maltese.

Web Application using Flask

The code written to predict the dog breed has been incorporated into a backend application using Flask to work with the front end web application.

The Xception model created previously was saved as well as the dog_names array. Both are loaded when the backend is run.

To run the web application follow these steps:

  1. The README in my project git repository outlines the the software requirements needed to run the app
  2. Clone the project GitHub Repository
  3. Open a terminal and go to the project directory
  4. once there change to the app directory
  5. from here run the following command line:
python3 dog_breed_run.py

6. go to your browser and go to http://0.0.0.0:3001 or the address provided once the backend is fully loaded

7. Once the app is open in your browser you can select an image from your computer click the Classify Dog Breed Button and the app will return the estimated breed.

Here are some pictures of the app:

App when first loaded
App when Image is Selected
App when it return the predicted breed

Conclusion

I was a little disappointed when i first built the CNN from scratch and the accuracy was very low (less than 15%). However when learning about transfer learning and applying that to a CNN, I was pleasantly surprised on how much the accuracy improved to a level of 84.4498%. Accuracy could be further improved, by optimizing the model parameters, running for more epochs, using image augmentation and train over a larger pool of images, optimize the kernel size or use an ensemble of networks and apply weighted voting.

The App is functional but could be further improved to provide a better user experience. or allow for improved functionality points of improvement could be:

  • keep image after breed is predicted.
  • allow for more image formats to be used.
  • allow for selection of multiple images and return a prediction of all of them.
  • allow for selection of images from the web

All the project details, jupyter notebook of the code used throughout this project, the back end and app can be found on my GitHub Repository

References & Mentions

Immeasurable help has been provided by Stackoverflow

I Would like to thank Udacity for providing this great opportunity to enter the field of Data Science. It has been a great experience and the best one yet.

--

--

Pirsqred
Pirsqred

No responses yet