Link to the problem and Jupyter notebook files

This is the final project for my Data Science class.

Overall, my accuracy is around 90%.

Details of the project

A basic CNN model

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import (AveragePooling2D, Conv2D, MaxPool2D,
                                     Dropout, Flatten, Dense)

np.random.seed(35675)
tf.random.set_seed(1352)

# img_dims (the image side length) is defined earlier in the notebook
model1 = keras.Sequential([
    AveragePooling2D(pool_size=6, strides=3, input_shape=(img_dims, img_dims, 3)),
    Conv2D(64, 3, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    MaxPool2D(pool_size=(2, 2)),    # max pooling reduces the spatial dimensions
    Dropout(0.5),                   # randomly drop units to reduce overfitting
    Flatten(),
    Dense(128, activation='relu'),  # one more dense layer before the output layer
    Dense(1, activation='sigmoid')  # single sigmoid unit for binary classification
])
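
For reference, a minimal sketch of how model1 could be compiled and evaluated; the optimizer, epoch count, and the train_ds/val_ds/test_ds dataset names are assumptions, not taken from the notebook:

model1.compile(optimizer='adam',
               loss='binary_crossentropy',   # matches the single sigmoid output
               metrics=['accuracy'])
model1.fit(train_ds, validation_data=val_ds, epochs=20)   # hypothetical datasets
model1.evaluate(test_ds)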

With this basic CNN model, the accuracy on the test set is 0.77960. We can improve the accuracy and guard against some problems that commonly arise when training a CNN.

Common problems and solutions to consider

  1. Vanishing/exploding gradients: the gradients die out or explode, and the activations saturate. Common remedies:
    • Glorot and He initialization
    • Nonsaturating activation functions, such as leaky ReLU
    • Batch Normalization
  2. Callbacks: a way to prevent overfitting and avoid wasting time and resources (a code sketch follows this list). Common callbacks are:

    • ModelCheckpoint: saves a checkpoint, by default at the end of each epoch, and can keep the best model seen on the validation set.
    • EarlyStopping: interrupts training when it measures no progress on the validation set for a number of epochs, and can optionally roll back to the best model.
  3. Overfitting, and some common solutions:
    • Simplify the model: decrease its complexity by removing layers or reducing the number of neurons.
    • Early stopping
    • Data augmentation
    • Regularization (L1, L2, max-norm)
    • Dropout (with a fixed dropout rate) and Monte Carlo dropout
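
To make items 1 and 2 concrete, here is a minimal sketch; the file name, patience value, and layer sizes are illustrative assumptions, not values from the notebooks:

from tensorflow.keras.layers import Dense, LeakyReLU
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

# Item 1: He initialization paired with a nonsaturating activation; in a model
# these would appear as consecutive layers
he_dense = Dense(128, kernel_initializer='he_normal')   # no built-in activation
leaky = LeakyReLU(alpha=0.2)                            # applied after he_dense

# Item 2: save the best model seen on the validation set, and stop training
# once validation performance stalls for 5 epochs
checkpoint_cb = ModelCheckpoint('best_model.h5',        # hypothetical file name
                                save_best_only=True)
early_stopping_cb = EarlyStopping(patience=5, restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           callbacks=[checkpoint_cb, early_stopping_cb])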

A better CNN model for the problem

# extra layer types used by the improved model
from tensorflow.keras.layers import BatchNormalization, SeparableConv2D

seed = 240
np.random.seed(seed)
tf.random.set_seed(seed)

model2 = keras.Sequential([
    Conv2D(filters=16, kernel_size=(3, 3), activation='relu', padding='same', input_shape=[img_dims, img_dims, 3]),
    BatchNormalization(), # use Batch Normalization to address the vanishing/exploding gradients problems
    MaxPool2D(pool_size=(2, 2)),
    
    Conv2D(64, 3, activation='relu'),
    Conv2D(32, 3, activation='relu'),
    
    # Note: instead of using a convolutional layer with a 5 × 5 kernel,
    # it is generally preferable to stack two layers with 3 × 3 kernels:
    # this uses fewer parameters, requires less computation, and usually
    # performs better (Hands-On ML, page 447)
    SeparableConv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'),
    SeparableConv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(), 
    MaxPool2D(pool_size=(2, 2)),
    
    # Note: the number of filters grows as we climb up the CNN towards the
    # output layer (64, 128, 256) (Hands-On ML, page 448)
    SeparableConv2D(filters=128, kernel_size=(3, 3), activation='relu', padding='same'),
    SeparableConv2D(filters=128, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),  
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.2),   
    
    SeparableConv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same'),
    SeparableConv2D(filters=256, kernel_size=(3, 3), activation='relu', padding='same'),
    BatchNormalization(),  
    MaxPool2D(pool_size=(2, 2)),
    Dropout(0.2), 
    
    # Note: we must flatten the inputs, since a dense network expects a 1D
    # array of features for each instance
    Flatten(),
    Dense(128, activation='relu'),  # one more dense layer before the output layer
    Dropout(rate=0.5),              # Note: dropout to prevent overfitting
    Dense(units=64, activation='relu'),
    Dropout(rate=0.5),
    Dense(1, activation='sigmoid')
])
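
A sketch of training model2, reusing the callbacks defined above; the optimizer, epoch budget, and dataset names are assumptions rather than the notebook's exact settings:

model2.compile(optimizer='adam',
               loss='binary_crossentropy',
               metrics=['accuracy'])

history = model2.fit(train_ds,                 # hypothetical training dataset
                     validation_data=val_ds,   # hypothetical validation dataset
                     epochs=30,
                     callbacks=[checkpoint_cb, early_stopping_cb])
model2.evaluate(test_ds)                       # hypothetical test dataset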

With this approach, the model stops training at epoch 9 and reaches a test accuracy of 0.901315.

Tuning the CNN model

The details can be found in the Jupyter notebook that applies the basic CNN models and in the CNN_tuning notebook; overall, the tuned accuracy is not much better than that of the modified CNN model.

                                                Validation accuracy   Test accuracy
  1. Simple model                                      0.8438            0.7796
  2. Modified model                                    0.8887            0.9013
  3. Tuned hyper-parameters, with callbacks            0.9232            0.8026
  4. Tuned hyper-parameters, without callbacks         0.9587            0.9086
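
For illustration only, here is a sketch of what the hyper-parameter search could look like with the keras_tuner library; this is an assumption (the CNN_tuning notebook is the source of truth, and the search space below is invented):

import keras_tuner as kt

def build_model(hp):
    # small search space over filter count, dense width, and dropout rate
    model = keras.Sequential([
        Conv2D(hp.Choice('filters', [16, 32, 64]), 3, activation='relu',
               input_shape=(img_dims, img_dims, 3)),
        MaxPool2D(pool_size=(2, 2)),
        Flatten(),
        Dense(hp.Int('units', min_value=64, max_value=256, step=64),
              activation='relu'),
        Dropout(hp.Float('dropout', min_value=0.2, max_value=0.5, step=0.1)),
        Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=10)
# tuner.search(train_ds, validation_data=val_ds, epochs=10,
#              callbacks=[early_stopping_cb])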