Arthur Danjou

Dropout Reduces Underfitting

Research Project · Completed

TensorFlow/Keras implementation and reproduction of "Dropout Reduces Underfitting" (Liu et al., 2023). A comparative study of Early and Late Dropout strategies to optimize model convergence.

December 10, 2024 · 6 min read
Python · TensorFlow · Deep Learning · Research

Study and reproduction of the paper: Liu, Z., et al. (2023). Dropout Reduces Underfitting. arXiv:2303.01500.

The paper is available at: https://arxiv.org/abs/2303.01500

This repository contains a robust, modular TensorFlow/Keras implementation of Early Dropout and Late Dropout strategies. The goal is to verify the hypothesis that dropout, traditionally used to reduce overfitting, can also combat underfitting when applied only during the initial training phase.

🎯 Scientific Objectives

The study aims to validate the operating regimes of Dropout described in the paper:

  1. Early Dropout (Targeting Underfitting): Active only during the initial phase, where it reduces the variance of mini-batch gradients and aligns their direction with the full-dataset gradient, enabling better optimization for the remainder of training.
  2. Late Dropout (Targeting Overfitting): Disabled at the start to allow rapid learning, then activated to regularize final convergence.
  3. Standard Dropout: Constant rate throughout training (baseline).
  4. No Dropout: Control experiment without dropout.
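All four regimes reduce to a single rate schedule over epochs. The sketch below illustrates the idea; the function name and signature are illustrative, not the repository's API:

```python
def dropout_rate(mode: str, epoch: int, switch_epoch: int, rate: float) -> float:
    """Dropout rate to apply at a given epoch.

    early:    dropout only before switch_epoch (targets underfitting)
    late:     dropout only from switch_epoch onward (targets overfitting)
    standard: constant rate; any other mode (e.g. "none"): always 0.0
    """
    if mode == "standard":
        return rate
    if mode == "early":
        return rate if epoch < switch_epoch else 0.0
    if mode == "late":
        return 0.0 if epoch < switch_epoch else rate
    return 0.0
```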

πŸ› οΈ Technical Architecture

Unlike naive implementations that swap layers or recompile the model between phases, this project updates the dropout rate through a shared variable in the TensorFlow graph, so the change takes effect on the GPU without model recompilation.

Key Components

  • DynamicDropout: A custom layer inheriting from keras.layers.Layer that reads its rate from a shared tf.Variable.
  • DropoutScheduler: A Keras Callback that drives the rate variable based on the current epoch and the chosen strategy (early, late, standard).
  • ExperimentPipeline: An orchestrator class that handles data loading (MNIST, CIFAR-10, Fashion MNIST), model creation (Dense or CNN), and execution of comparative benchmarks.
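A minimal sketch of how the first two components can fit together: a layer that reads its rate from a shared tf.Variable, and a callback that reassigns that variable at the start of each epoch. The class names match the components above, but the exact constructor arguments are assumptions, not the repository's API:

```python
import tensorflow as tf
from tensorflow import keras


class DynamicDropout(keras.layers.Layer):
    """Dropout whose rate is read from a shared tf.Variable at call time,
    so a callback can change it between epochs without recompiling."""

    def __init__(self, rate_var, **kwargs):
        super().__init__(**kwargs)
        self.rate_var = rate_var  # shared, non-trainable tf.Variable

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate_var)
        return inputs


class DropoutScheduler(keras.callbacks.Callback):
    """Assigns the shared rate each epoch according to the chosen strategy."""

    def __init__(self, rate_var, mode, rate, switch_epoch):
        super().__init__()
        self.rate_var, self.mode = rate_var, mode
        self.rate, self.switch_epoch = rate, switch_epoch

    def on_epoch_begin(self, epoch, logs=None):
        if self.mode == "standard":
            new_rate = self.rate
        elif self.mode == "early":
            new_rate = self.rate if epoch < self.switch_epoch else 0.0
        else:  # "late"
            new_rate = 0.0 if epoch < self.switch_epoch else self.rate
        self.rate_var.assign(new_rate)
```

Because the rate lives in a tf.Variable rather than a Python attribute baked into the compiled graph, `assign` is enough to change the behavior of every DynamicDropout layer sharing that variable.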

File Structure

.
β”œβ”€β”€ README.md                         # This documentation file
β”œβ”€β”€ Dropout reduces underfitting.pdf  # Original research paper
β”œβ”€β”€ pipeline.py                       # Main experiment pipeline
β”œβ”€β”€ pipeline.ipynb                    # Jupyter notebook for experiments
β”œβ”€β”€ pipeline_mnist.ipynb              # Jupyter notebook for MNIST experiments
β”œβ”€β”€ pipeline_cifar10.ipynb            # Jupyter notebook for CIFAR-10 experiments
β”œβ”€β”€ pipeline_cifar100.ipynb           # Jupyter notebook for CIFAR-100 experiments
β”œβ”€β”€ pipeline_fashion_mnist.ipynb      # Jupyter notebook for Fashion MNIST experiments
β”œβ”€β”€ requirements.txt                  # Python dependencies
β”œβ”€β”€ .python-version                   # Python version specification
└── uv.lock                           # Dependency lock file

πŸš€ Installation

# Clone the repository
git clone https://github.com/arthurdanjou/dropoutreducesunderfitting.git
cd dropoutreducesunderfitting

# Install the dependencies (or use: pip install -r requirements.txt)
pip install tensorflow numpy matplotlib seaborn scikit-learn

πŸ“Š Usage

The main notebook pipeline.ipynb contains all necessary code. Here is how to run a typical experiment via the pipeline API.

1. Initialization

Choose your dataset (cifar10, fashion_mnist, mnist) and architecture (cnn, dense).

from pipeline import ExperimentPipeline

# Fashion MNIST is recommended to observe underfitting/overfitting nuances
exp = ExperimentPipeline(dataset_name="fashion_mnist", model_type="cnn")

2. Learning Curves Comparison

Compare training dynamics (loss and accuracy) of the three strategies.

exp.compare_learning_curves(
    modes=["standard", "early", "late"],
    switch_epoch=10,  # The epoch where dropout state changes
    rate=0.4,         # Dropout rate
    epochs=30
)

3. Ablation Studies

Study the impact of the "Early" phase duration or Dropout intensity.

# Impact of the switch epoch on final performance
exp.compare_switch_epochs(
    switch_epochs=[5, 10, 15, 20],
    modes=["early"],
    rate=0.4,
    epochs=30
)

# Impact of the dropout rate
exp.compare_drop_rates(
    rates=[0.2, 0.4, 0.6],
    modes=["standard", "early"],
    switch_epoch=10,
    epochs=25
)

4. Data Regimes (Data Scarcity)

Verify the paper's hypothesis that Early Dropout helps most when data is plentiful relative to model capacity (the underfitting regime), while Standard Dropout remains preferable on small datasets (the overfitting regime).

# Training on 10%, 50% and 100% of the dataset
exp.run_dataset_size_comparison(
    fractions=[0.1, 0.5, 1.0],
    modes=["standard", "early"],
    rate=0.3,
    switch_epoch=10
)

πŸ“ˆ Expected Results

According to the paper, you should observe:

  • Early Dropout: Higher initial loss, followed by a sharp drop after the switch_epoch, often reaching a lower minimum than Standard Dropout (reduction of underfitting).
  • Late Dropout: Rapid rise in accuracy at the start (potential overfitting), then stabilized by the activation of dropout.

πŸ“„ Detailed Report

πŸ“ Authors

Arthur Danjou — M.Sc. in Statistical and Financial Engineering (ISF), Data Science Track, Université Paris-Dauphine PSL

Based on the work of Liu, Z., et al. (2023). Dropout Reduces Underfitting.