Introduction
Image classification is a pillar of computer vision and a great entry point into machine learning. In this article, we will build an image classifier from scratch using Python and Keras. By the end, you will have a working model that classifies images with a respectable degree of accuracy. So, let us begin!
Selecting a Dataset
The first step in any machine learning project is finding a suitable dataset. Ideally you want one that is well documented and well balanced, and that is neither too big nor too complex. Some of the most popular image classification datasets to start with are:
- MNIST: Handwritten digits (10 classes)
- CIFAR-10: Small color images (10 classes)
- Fashion MNIST: Fashion article images (10 classes)
For this guide, we will work with the CIFAR-10 dataset. It contains 60,000 32x32 color images split into 10 classes, with 6,000 images per class. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
You can load the CIFAR-10 dataset with the following code:
from tensorflow.keras.datasets import cifar10
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
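If you want to confirm that the download worked, a quick sanity check on the array shapes does the trick (optional, but a useful habit):

# Quick sanity check on the loaded arrays
print(train_images.shape)  # (50000, 32, 32, 3)
print(test_images.shape)   # (10000, 32, 32, 3)
print(train_labels.shape)  # (50000, 1) - integer class labels before one-hot encoding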
Setting Up Your Environment
Before diving into the code, make sure you have the following installed:
- Python version 3.x
- TensorFlow 2.x (which includes Keras)
- NumPy
- Matplotlib (for visualization)
They can be installed with pip:
pip install tensorflow numpy matplotlib
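You can verify the installation with a quick import; the exact version number will depend on when you install:

import tensorflow as tf
print(tf.__version__)  # should print a 2.x version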
Preparing the Data
With the dataset downloaded and the environment set up, we take the following steps to prepare the data for training:
- Normalize the pixel values to the range 0 to 1
- Convert the class labels to one-hot vectors
- Keep the split between training and test sets (cifar10.load_data() already provides it)
Here is the code that does this:
from tensorflow.keras.utils import to_categorical

train_images = train_images / 255.0
test_images = test_images / 255.0
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
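Before moving on, it can help to plot a few training images to confirm everything looks right. The class_names list below is simply the CIFAR-10 label order, and the np.argmax call undoes the one-hot encoding for display:

import numpy as np
import matplotlib.pyplot as plt

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Show the first 9 training images with their labels
plt.figure(figsize=(6, 6))
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(train_images[i])
    plt.title(class_names[np.argmax(train_labels[i])])
    plt.axis('off')
plt.show()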
Building the Model
Now we get to the exciting part: building the neural network! We will use a convolutional neural network (CNN), an architecture that is particularly well suited to image data. Our simple CNN consists of the following layers:
- A Conv2D layer with 32 filters, a 3x3 kernel, and ReLU activation
- A MaxPooling2D layer with a 2x2 pool size
- A Conv2D layer with 64 filters, a 3x3 kernel, and ReLU activation
- A MaxPooling2D layer with a 2x2 pool size
- Another Conv2D layer with 64 filters, a 3x3 kernel, and ReLU activation
- A Flatten layer to reshape the 2D feature maps into a 1D vector
- A Dense layer with 64 units and ReLU activation
- A Dense output layer with 10 units and softmax activation
Here is what that looks like in code:
from tensorflow.keras import models
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = models.Sequential([
Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
MaxPooling2D((2, 2)),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D((2, 2)),
Conv2D(64, (3, 3), activation='relu'),
Flatten(),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
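Calling model.summary() prints a layer-by-layer breakdown, which is handy for checking the output shapes and parameter counts before training:

# Inspect the architecture
model.summary()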
Training and Evaluation
With the architecture defined, it is time to train the model on our data. First we compile the model, specifying the optimizer, the loss function, and the metrics we want to track:
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
Then, we train the model using fit():
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
After training, we can evaluate the model's performance on the test set:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('Test accuracy:', test_acc)
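To classify a single image, you can call model.predict() and take the argmax of the softmax output. This is a minimal sketch, reusing the class_names list defined earlier:

import numpy as np

# Predict the class of the first test image
probs = model.predict(test_images[:1])   # shape (1, 10)
predicted_class = np.argmax(probs[0])
print('Predicted:', class_names[predicted_class])
print('Actual:   ', class_names[np.argmax(test_labels[0])])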
We can also plot the training and validation accuracy over time:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Conclusion
Congratulations, you have just built your first image classification model! With only a small amount of code, we trained a CNN that reaches roughly 70% accuracy on the test set. Of course, there is still plenty of room for improvement; techniques like data augmentation (sketched below) or transfer learning can push performance much further.
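As a taste of what data augmentation might look like, here is a minimal sketch using the Keras preprocessing layers available in recent TensorFlow 2.x releases; you would place this block at the front of the model and retrain:

import tensorflow as tf
from tensorflow.keras import layers

# A small augmentation pipeline: random horizontal flips and slight rotations
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal', input_shape=(32, 32, 3)),
    layers.RandomRotation(0.1),
])

# Example usage: model = tf.keras.Sequential([data_augmentation, Conv2D(32, (3, 3), ...), ...])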
I hope this guide has given you a taste of what machine learning and computer vision can do. Keep learning, and happy coding!
References and Resources
- CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
- Keras documentation: https://keras.io/
- TensorFlow tutorials: https://www.tensorflow.org/tutorials
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition: https://cs231n.github.io/