Introduction to CNNs: Convolutional Layers, Pooling Layers, and Fully Connected Layers
Convolutional Neural Networks (CNNs) are a class of deep learning algorithms primarily used for processing structured grid data such as images. Unlike traditional neural networks, CNNs take advantage of the hierarchical pattern in data to capture spatial and temporal dependencies efficiently. CNNs are widely applied in various fields, such as image recognition, video analysis, medical image processing, and natural language processing.
This article will break down CNN architecture, explore the key layers—Convolutional, Pooling, and Fully Connected layers—and provide a hands-on project using TensorFlow/Keras to build a CNN for image classification on the CIFAR-10 dataset.
Key Components of CNNs
Convolutional Layers: Think of a convolutional layer as a magnifying glass that looks at small parts of a picture. Instead of trying to see the whole image at once, it looks at tiny pieces one at a time, like a 3×3 square. The magnifying glass focuses on patterns like edges, shapes, or colors. It’s like finding corners, lines, or textures in a picture. Different magnifying glasses (called filters) help the network see different parts of the image more clearly.
Pooling Layers: After the magnifying glass finds important patterns, the pooling layer comes in like a “simplifier.” It reduces the size of the image by picking out the most important details. Imagine you’re looking at a group of four tiles, and you only keep the biggest or most noticeable one (this is called Max Pooling). By doing this, we make the image smaller but still keep the important information. It’s like shrinking a big, detailed picture into a smaller one, but you can still see what’s important.
Fully Connected Layers: Finally, fully connected layers work like the decision-making part of the brain. After the convolutional and pooling layers have figured out the important patterns, the fully connected layer takes all this information and tries to understand what it means. It connects every detail from the previous layers and tries to guess what the image is (like whether it’s a dog, a cat, or a tree). It’s like taking all the clues and putting them together to make a final decision.
1. Convolutional Layers
Convolutional layers are the core building blocks of a CNN. Instead of connecting each input to each neuron as in fully connected layers, CNNs use kernels (or filters) that slide over the input image. Each filter is a small matrix (often 3×3 or 5×5) that is applied across the image to capture different features such as edges, textures, or other patterns.
The process of applying a filter to an input image is called convolution. As the filter slides across the input, it computes a dot product between the filter and the local regions of the input image, creating a feature map. Multiple filters are used to capture different features from the image.
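To make the sliding dot product concrete, here is a minimal NumPy sketch of one convolution pass (the image and filter values are invented for illustration; Keras does this internally and far more efficiently):

import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and compute
    the dot product at each position, producing a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map

# A toy 5x5 "image" with a vertical edge, and a 3x3 edge-detecting filter
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

print(convolve2d(image, kernel))  # 3x3 feature map; the edge produces strong responses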
2. Pooling Layers
After convolution, pooling layers are used to downsample the feature maps. Pooling reduces the spatial dimensions (width and height) of the feature map, making the network more computationally efficient and robust to small translations in the input.
The most common type of pooling is Max Pooling, which selects the maximum value from a patch of the feature map. Other types include Average Pooling, which computes the average value over a patch.
For example, in a 2×2 Max Pooling layer, the feature map would be divided into 2×2 regions, and the maximum value from each region would be selected to form a smaller feature map.
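A tiny NumPy illustration of this (values invented): each non-overlapping 2×2 block of a 4×4 feature map is reduced to its maximum, yielding a 2×2 map.

import numpy as np

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
], dtype=float)

# 2x2 max pooling with stride 2: take the max of each 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6. 2.]
               #  [7. 9.]]
# Average Pooling would instead use .mean(axis=(1, 3))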
3. Fully Connected Layers
Fully connected layers (also known as Dense layers in Keras) are typically placed at the end of a CNN architecture. These layers are responsible for combining all the extracted features to make final predictions.
In a fully connected layer, every neuron in the previous layer is connected to every neuron in the current layer. This is the same as in traditional neural networks. These layers allow the model to make final decisions based on the features learned from convolution and pooling.
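Under the hood, a fully connected layer is just a matrix multiplication plus a bias, followed by an activation. A minimal NumPy sketch (the weight values are random placeholders, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(1024)        # flattened features from the conv/pool stages
W = rng.standard_normal((64, 1024))  # one row of weights per neuron
b = np.zeros(64)                     # one bias per neuron

# Every input is connected to every neuron: y = ReLU(Wx + b)
y = np.maximum(0, W @ x + b)
print(y.shape)  # (64,)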
4. Activation Functions
Activation functions like ReLU (Rectified Linear Unit) are applied after each convolution to introduce non-linearity into the model, making it capable of learning complex patterns. ReLU also helps avoid the vanishing gradient problem.
After we use the convolutional and pooling layers to find patterns and simplify the image, we need something to help the network understand more complex things. That’s where activation functions come in!
The most popular activation function is called ReLU (Rectified Linear Unit). It’s like a rule that the network follows after every step. The rule is simple: if a number is positive, keep it the same; if it’s negative, change it to zero.
This helps the network focus only on the important details (the positive numbers) and ignore unnecessary things (the negative numbers). By doing this, ReLU helps the network learn better and understand more complicated patterns, like telling the difference between different objects.
In short, ReLU helps the network by adding some “smarts” or flexibility, making sure it doesn’t just look at everything in a straight line but can handle more complex information.
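In code, the ReLU rule is a one-liner: negative values become zero, positive values pass through unchanged.

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.maximum(0, x)
print(relu)  # [0.  0.  0.  1.5 3. ]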
Building CNNs with TensorFlow/Keras
Let’s move on to building a CNN using the TensorFlow/Keras framework. TensorFlow/Keras provides a high-level API to build and train deep learning models efficiently.
Step-by-Step Guide to Build a CNN
1. Installing Required Libraries
First, install the required libraries if they are not already installed:
pip install tensorflow numpy matplotlib
2. Importing the Required Libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
3. Loading the CIFAR-10 Dataset
The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class. The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
Key Features of CIFAR-10:
- Images:
  - The dataset consists of 60,000 color images.
  - Each image is of size 32×32 pixels.
  - There are 3 color channels (RGB) in each image.
- Classes:
  - CIFAR-10 has 10 different classes, each representing a different object category: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
  - There are 6,000 images per class.
- Training and Testing Sets:
  - The dataset is divided into 50,000 training images and 10,000 test images, so you can train your model on the training set and evaluate it on the test set.
- Task:
  - The goal with CIFAR-10 is to build a model that can classify the images into one of the 10 classes.
# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
# Normalize the data
x_train, x_test = x_train / 255.0, x_test / 255.0
4. Visualizing the Data
It’s helpful to visualize some sample images to understand what the dataset looks like.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Display first 10 images
plt.figure(figsize=(10, 10))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[y_train[i][0]])
plt.show()
5. Building the CNN Model
Now, let’s build a CNN with multiple convolutional and pooling layers, followed by a fully connected layer:
# Create the model
model = models.Sequential()
# Add the first convolutional layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
# Add the second convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Add the third convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the output and add the fully connected layer
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10)) # Output layer with 10 neurons for 10 classes
Create the model:
The model is built using the Sequential class, which allows us to stack layers one after the other.
model = models.Sequential()
First Convolutional Layer:
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
The first layer is a convolutional layer with 32 filters, each of size 3×3, using the ReLU activation function to add non-linearity. The input_shape=(32, 32, 3) means the input is a 32×32 pixel image with 3 color channels (RGB). Because no padding is used, the 3×3 convolution trims the feature maps from 32×32 to 30×30 (output size = 32 − 3 + 1 = 30).
After the convolution, we add Max Pooling to reduce the size of the feature maps while keeping the most important information. This Max Pooling layer halves each dimension (from 30×30 to 15×15), as the model summary later confirms.
Second Convolutional Layer:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
The second convolutional layer has 64 filters, again with a 3×3 size, and uses ReLU activation; it reduces the 15×15 feature maps to 13×13. Another Max Pooling layer further downsamples them (from 13×13 to 6×6).
Third Convolutional Layer:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
This is a third convolutional layer with 64 filters and ReLU activation. No pooling is applied after this layer, and the 3×3 convolution reduces the feature maps from 6×6 to 4×4.
Flatten and Fully Connected Layers:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
The Flatten layer turns the 3D output of the last convolutional layer (4×4×64) into a 1D vector of 1,024 values. This is necessary for connecting it to the fully connected layers.
A dense layer with 64 neurons and ReLU activation is added. This layer learns complex patterns from the features extracted by the convolutional layers.
Finally, the last Dense layer has 10 neurons (one for each of the 10 classes), and no activation function is applied. The raw outputs (logits) are converted into probabilities by the softmax that the loss function applies internally during training.
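Since the output layer emits logits, you can append a Softmax layer after training if you want probabilities at inference time. A small sketch, reusing the model and data defined above:

# After training: wrap the model so it outputs probabilities instead of logits
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = probability_model.predict(x_test[:1])
print(predictions[0])  # 10 probabilities that sum to 1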
6. Compiling the Model
The next step is to compile the model by specifying the optimizer, loss function, and evaluation metric:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
Optimizer:
optimizer='adam'
The Adam optimizer is used to update the weights of the model during training. Adam is an adaptive learning rate optimization algorithm that works well in practice for many problems. It’s popular because it adjusts the learning rate during training, which helps the model converge faster and more effectively.
The Adam optimizer is like a smart helper for teaching a model how to learn better. When the model is trying to get better at a task, Adam changes how fast or slow it learns in a way that helps it improve quickly without making mistakes. It’s like adjusting the speed of your bike when going up or down a hill, so you can reach the finish line faster!
Loss Function:
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
The loss function tells the model how far its predictions are from the actual answers. It helps the model learn during training by minimizing the difference (error).
SparseCategoricalCrossentropy is used for classification tasks with multiple classes. This function is specifically for cases where the target labels are provided as integers (like 0, 1, 2, … for each class).
from_logits=True means that the model's output has not been passed through a softmax activation yet, so the raw outputs (logits) are used and the softmax is applied internally during training.
“from_logits=True” means that the model gives its answers as raw numbers, not yet turned into probabilities. Later, the computer will turn those raw numbers into probabilities during training to help the model learn better. It’s like getting a rough draft before polishing it up!
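A toy example of how this loss function consumes integer labels and raw logits (the numbers are invented for illustration):

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

y_true = [1]                # integer class label, not one-hot
logits = [[0.5, 2.0, 0.3]]  # raw model outputs (logits)
print(loss_fn(y_true, logits).numpy())  # softmax is applied internally before the cross-entropy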
Metrics:
metrics=['accuracy']
Accuracy is the metric we are using to evaluate how well the model is doing. It calculates the percentage of correct predictions over the total predictions, making it easy to track performance during training and evaluation.
7. Training the Model
Now, train the model on the CIFAR-10 dataset for 10 epochs:
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test))
8. Evaluating the Model
Finally, evaluate the model’s performance on the test set:
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc}")
9. Plotting Training History
You can visualize the training accuracy and loss over epochs to understand how well the model is learning:
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()
This graph shows the accuracy of a model over several training epochs (represented on the x-axis) and compares the accuracy on training data (blue line labeled “accuracy”) with the accuracy on validation data (orange line labeled “val_accuracy”). Here’s a breakdown:
- Epochs (x-axis): Each point on the x-axis represents one full pass of the training data through the model.
- Accuracy (y-axis): This shows how well the model is performing, with values between 0 and 1 (where 1 represents 100% accuracy).
Key Observations:
- Training accuracy (blue line): This line steadily increases as the model improves its performance on the training data, reaching roughly 0.8 by the final epoch (0.8081 in the run above).
- Validation accuracy (orange line): This represents how well the model is performing on unseen data (the validation set). It tracks the training accuracy in the early epochs but plateaus around 0.71 later on, even dipping briefly at epoch 9.
Interpretation:
- The model is learning and improving on both the training and validation data in the early epochs.
- In the later epochs, the validation accuracy plateaus (and briefly dips) while the training accuracy continues to improve, indicating potential overfitting. This means the model may be learning patterns specific to the training data but not generalizing well to new, unseen data.
In summary, the model is performing well, but after a certain point, it may need some regularization techniques or early stopping to avoid overfitting to the training data.
This second graph shows the loss (how far off the model’s predictions are) over time during training (blue line) and validation (orange line). Loss is on the y-axis, and the number of epochs (training cycles) is on the x-axis.
Key Observations:
- Training loss (blue line): The model’s loss steadily decreases over time, indicating that the model is learning and getting better at minimizing the error during training.
- Validation loss (orange line): This initially decreases along with the training loss but flattens out and eventually rises toward the end of training.
Interpretation:
- Both training and validation loss decrease initially, meaning the model is learning well in the beginning.
- In the later epochs (around epoch 9 in the run above), the validation loss stops falling and begins to rise, which suggests overfitting. The model is performing well on the training data (low training loss), but it's not generalizing as well to new data (higher validation loss).
In summary, while the model is learning during training (lower training loss), the increasing validation loss indicates that it is starting to overfit to the training data. You might want to introduce techniques like early stopping, regularization, or dropout to address this overfitting and help the model generalize better.
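As a hedged sketch of two of those remedies, the snippet below adds a Dropout layer before the classifier head and uses Keras's EarlyStopping callback, which halts training once the validation loss stops improving. The dropout rate, patience, and epoch count are illustrative, not tuned:

# Sketch: dropout + early stopping to curb overfitting (untuned values)
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dropout(0.5),  # randomly zero half the features during training
    layers.Dense(64, activation='relu'),
    layers.Dense(10),
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Stop when validation loss hasn't improved for 3 epochs; keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(x_train, y_train, epochs=30,
                    validation_data=(x_test, y_test),
                    callbacks=[early_stop])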
Hands-On Project: Image Classification using CIFAR-10
For the hands-on project, we built a CNN that classifies images from the CIFAR-10 dataset. The model consists of multiple convolutional and pooling layers, followed by a fully connected layer. After training, we evaluated the model’s performance on unseen test data and achieved a solid accuracy score.
Complete Code
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
# Normalize the data (pixel values between 0 and 1)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Define class names for CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
# Visualize the first 10 images of the training set
plt.figure(figsize=(10, 10))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[y_train[i][0]])
plt.show()
# Create the CNN model
model = models.Sequential()
# First convolutional layer
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
# Second convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Third convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten the output from the convolutional layers
model.add(layers.Flatten())
# Fully connected layer
model.add(layers.Dense(64, activation='relu'))
# Output layer (for 10 classes)
model.add(layers.Dense(10))
# Print the model summary
model.summary()
# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# Train the model
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test))
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc}")
# Plot training and validation accuracy over epochs
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0, 1])
plt.legend(loc='lower right')
plt.show()
# Plot training and validation loss over epochs
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc='upper right')
plt.show()
Output
2024-10-13 22:18:07.951493: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-13 22:18:09.320242: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
D:\internship\unsupervised_learning\tensorflow\venv\lib\site-packages\keras\src\layers\convolutional\base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
super().__init__(activity_regularizer=activity_regularizer, **kwargs)
2024-10-13 22:18:47.631835: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D) │ (None, 30, 30, 32) │ 896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D) │ (None, 15, 15, 32) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D) │ (None, 13, 13, 64) │ 18,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D) │ (None, 6, 6, 64) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_2 (Conv2D) │ (None, 4, 4, 64) │ 36,928 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten (Flatten) │ (None, 1024) │ 0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense) │ (None, 64) │ 65,600 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense) │ (None, 10) │ 650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 122,570 (478.79 KB)
Trainable params: 122,570 (478.79 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 19s 11ms/step - accuracy: 0.3574 - loss: 1.7384 - val_accuracy: 0.5347 - val_loss: 1.3179
Epoch 2/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - accuracy: 0.5910 - loss: 1.1685 - val_accuracy: 0.6264 - val_loss: 1.0814
Epoch 3/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - accuracy: 0.6529 - loss: 0.9978 - val_accuracy: 0.6744 - val_loss: 0.9342
Epoch 4/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - accuracy: 0.6894 - loss: 0.8802 - val_accuracy: 0.6828 - val_loss: 0.9199
Epoch 5/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - accuracy: 0.7155 - loss: 0.8091 - val_accuracy: 0.6946 - val_loss: 0.8982
Epoch 6/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 17s 11ms/step - accuracy: 0.7406 - loss: 0.7404 - val_accuracy: 0.6991 - val_loss: 0.8734
Epoch 7/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 10ms/step - accuracy: 0.7635 - loss: 0.6770 - val_accuracy: 0.7077 - val_loss: 0.8666
Epoch 8/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 16s 10ms/step - accuracy: 0.7776 - loss: 0.6312 - val_accuracy: 0.7160 - val_loss: 0.8427
Epoch 9/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 10ms/step - accuracy: 0.7965 - loss: 0.5824 - val_accuracy: 0.6889 - val_loss: 0.9559
Epoch 10/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 15s 10ms/step - accuracy: 0.8081 - loss: 0.5419 - val_accuracy: 0.7105 - val_loss: 0.8948
313/313 - 1s - 4ms/step - accuracy: 0.7105 - loss: 0.8948
Test accuracy: 0.7105000019073486
Explanation of the Output and Logs
1. oneDNN Custom Operations:
2024-10-13 22:18:07.951493: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
What is oneDNN?: oneDNN (Deep Neural Network Library) is an open-source performance library for deep learning applications, optimized for Intel CPUs. TensorFlow uses oneDNN to speed up certain operations on CPU, such as convolutions and matrix multiplications.
Impact: When oneDNN optimizations are enabled, the computations may follow slightly different orders due to floating-point arithmetic. As a result, there might be small differences in numerical results (round-off errors) compared to non-optimized computations. These differences are generally negligible in practical scenarios.
How to disable oneDNN: If you want to disable oneDNN optimizations and get more consistent numerical results, set the environment variable TF_ENABLE_ONEDNN_OPTS=0 before running your code.
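For example, in Python the variable must be set before TensorFlow is imported:

import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'  # must be set before importing tensorflow

import tensorflow as tf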
2. TensorFlow Optimization for CPU:
2024-10-13 22:18:47.631835: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
CPU optimizations: TensorFlow is checking for CPU features such as AVX2 and FMA (Advanced Vector Extensions and Fused Multiply-Add), which are CPU instruction sets that accelerate deep learning tasks.
Impact: If your CPU supports these instructions, TensorFlow will use them to optimize performance. If not, it may suggest recompiling TensorFlow with the appropriate compiler flags to fully utilize your CPU’s capabilities.
3. Model Summary:
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D) │ (None, 30, 30, 32) │ 896 │
│ max_pooling2d (MaxPooling2D) │ (None, 15, 15, 32) │ 0 │
│ conv2d_1 (Conv2D) │ (None, 13, 13, 64) │ 18,496 │
│ max_pooling2d_1 (MaxPooling2D) │ (None, 6, 6, 64) │ 0 │
│ conv2d_2 (Conv2D) │ (None, 4, 4, 64) │ 36,928 │
│ flatten (Flatten) │ (None, 1024) │ 0 │
│ dense (Dense) │ (None, 64) │ 65,600 │
│ dense_1 (Dense) │ (None, 10) │ 650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 122,570 (478.79 KB)
Trainable params: 122,570 (478.79 KB)
Non-trainable params: 0 (0.00 B)
Layer breakdown: This is the model summary showing each layer’s output shape and the number of parameters:
- Conv2D Layers: These layers apply convolution operations to the input. The parameters in Conv2D layers are determined by the filter size, the number of input channels, and the number of filters (the worked counts after this list show how).
- MaxPooling2D: Reduces the spatial dimensions of the data by taking the maximum value in each pool (sub-region). This decreases the amount of computation and encourages the model to focus on the most salient features. Since it only selects maxima and involves no weights or biases, it has no parameters to learn.
- Flatten: Flattens the multi-dimensional tensor into a 1D tensor to feed into fully connected (Dense) layers.
- Dense Layers: Fully connected layers where every input is connected to every output. The first Dense layer has 64 neurons, and the final output layer has 10 neurons (one for each class).
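As a sanity check, each parameter count in the summary can be derived by hand: a Conv2D layer has (kernel height × kernel width × input channels + 1 bias) × filters parameters, and a Dense layer has (inputs + 1) × neurons:
- conv2d: (3 × 3 × 3 + 1) × 32 = 896
- conv2d_1: (3 × 3 × 32 + 1) × 64 = 18,496
- conv2d_2: (3 × 3 × 64 + 1) × 64 = 36,928
- dense: (1024 + 1) × 64 = 65,600
- dense_1: (64 + 1) × 10 = 650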
4. Training Progress:
Epoch 1/10
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 19s 11ms/step - accuracy: 0.3574 - loss: 1.7384 - val_accuracy: 0.5347 - val_loss: 1.3179
- Epoch 1/10: This line shows the progress of the first epoch during training.
- 1563/1563: The number of batches processed out of the total number of batches.
- 19s: Time taken to process all the batches for this epoch.
- accuracy: 0.3574: Training accuracy after this epoch (35.74%).
- loss: 1.7384: Training loss at the end of this epoch. It measures how far the model's predictions are from the actual labels, with lower values being better.
- val_accuracy: 0.5347: Validation accuracy (53.47%) on the test set.
- val_loss: 1.3179: Validation loss at the end of the epoch, which measures how well the model is performing on unseen data (the validation set).
As training proceeds, the training accuracy improves and the training loss decreases steadily; the validation metrics improve too, though with some fluctuation in the later epochs.
5. Test Accuracy:
313/313 - 1s - 4ms/step - accuracy: 0.7105 - loss: 0.8948
Test accuracy: 0.7105000019073486
Test accuracy: After training the model for 10 epochs, its performance on the test set is evaluated. The test accuracy is 71.05%.
Test loss: The loss on the test set is 0.8948, which measures how far the model's predictions are from the true labels on unseen data.
Conclusion:
- The code works as expected: the model trains and validates successfully on the CIFAR-10 dataset.
- The oneDNN optimizations are active and accelerating the CPU operations.
- The model achieved a test accuracy of 71.05% after 10 epochs, which is decent for a basic CNN on CIFAR-10. You can further optimize the model or experiment with deeper architectures for improved performance.