Dropout Reduces Underfitting
TensorFlow/Keras implementation and reproduction of "Dropout Reduces Underfitting" (Liu et al., 2023). A comparative study of Early and Late Dropout strategies to optimize model convergence.
Study and reproduction of the paper: Liu, Z., et al. (2023). Dropout Reduces Underfitting. arXiv:2303.01500.
The paper is available at: https://arxiv.org/abs/2303.01500
This repository contains a robust, modular TensorFlow/Keras implementation of Early Dropout and Late Dropout strategies. The goal is to verify the hypothesis that dropout, traditionally used to reduce overfitting, can also combat underfitting when applied only during the initial training phase.
Scientific Objectives
The study aims to validate the operating regimes of dropout described in the paper (see the schedule sketch after this list):
- Early Dropout (Targeting Underfitting): Active only during the initial phase to reduce gradient variance and align gradient directions, enabling better final optimization.
- Late Dropout (Targeting Overfitting): Disabled at the start to allow rapid learning, then activated to regularize final convergence.
- Standard Dropout: Constant rate throughout training (baseline).
- No Dropout: Control experiment without dropout.
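To make these regimes concrete, here is a minimal sketch of the corresponding dropout-rate schedules. The function below is illustrative only; it is not part of the repository's API.

```python
def scheduled_rate(epoch: int, mode: str, rate: float, switch_epoch: int) -> float:
    """Illustrative sketch of the dropout rate active at a given epoch for each regime."""
    if mode == "standard":
        return rate                                    # constant rate for the whole run
    if mode == "early":
        return rate if epoch < switch_epoch else 0.0   # dropout only in the initial phase
    if mode == "late":
        return 0.0 if epoch < switch_epoch else rate   # dropout only after the switch
    return 0.0                                         # no-dropout control

# Example: with rate=0.4 and switch_epoch=10, "early" uses 0.4 for epochs 0-9, then 0.0.
```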
Technical Architecture
Unlike naive implementations that rebuild the model whenever the dropout rate changes, this project takes a dynamic approach: the rate lives in the TensorFlow graph as a shared variable, so it can be updated on the GPU without recompiling the model.
Key Components
- DynamicDropout: A custom layer inheriting from keras.layers.Layer that reads its rate from a shared tf.Variable (sketched below).
- DropoutScheduler: A Keras Callback that drives the rate variable based on the current epoch and the chosen strategy (early, late, standard).
- ExperimentPipeline: An orchestrator class that handles data loading (MNIST, CIFAR-10, Fashion MNIST), model creation (Dense or CNN), and execution of comparative benchmarks.
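The sketch below shows one way these two pieces could fit together, assuming the rate is held in a shared tf.Variable. The class names match the components above, but the bodies are illustrative rather than the repository's exact implementation.

```python
import tensorflow as tf
from tensorflow import keras


class DynamicDropout(keras.layers.Layer):
    """Dropout layer whose rate is read from a shared tf.Variable at call time."""

    def __init__(self, rate_var: tf.Variable, **kwargs):
        super().__init__(**kwargs)
        self.rate_var = rate_var  # shared, non-trainable variable holding the current rate

    def call(self, inputs, training=None):
        if training:
            return tf.nn.dropout(inputs, rate=self.rate_var)  # rate is read from the variable
        return inputs


class DropoutScheduler(keras.callbacks.Callback):
    """Updates the shared rate variable at each epoch according to the chosen strategy."""

    def __init__(self, rate_var, mode="early", rate=0.4, switch_epoch=10):
        super().__init__()
        self.rate_var, self.mode = rate_var, mode
        self.rate, self.switch_epoch = rate, switch_epoch

    def on_epoch_begin(self, epoch, logs=None):
        if self.mode == "standard":
            new_rate = self.rate
        elif self.mode == "early":
            new_rate = self.rate if epoch < self.switch_epoch else 0.0
        else:  # "late"
            new_rate = 0.0 if epoch < self.switch_epoch else self.rate
        self.rate_var.assign(new_rate)  # updated in place; the compiled graph is untouched
```

Because the scheduler only calls rate_var.assign(...), switching strategies mid-training requires no model recompilation, which is the point of the dynamic approach described above.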
File Structure
.
├── README.md                        # This documentation file
├── Dropout reduces underfitting.pdf # Original research paper
├── pipeline.py                      # Main experiment pipeline
├── pipeline.ipynb                   # Jupyter notebook for experiments
├── pipeline_mnist.ipynb             # Jupyter notebook for MNIST experiments
├── pipeline_cifar10.ipynb           # Jupyter notebook for CIFAR-10 experiments
├── pipeline_cifar100.ipynb          # Jupyter notebook for CIFAR-100 experiments
├── pipeline_fashion_mnist.ipynb     # Jupyter notebook for Fashion MNIST experiments
├── requirements.txt                 # Python dependencies
├── .python-version                  # Python version specification
└── uv.lock                          # Dependency lock file
Installation
# Clone the repository
git clone https://github.com/arthurdanjou/dropoutreducesunderfitting.git
cd dropoutreducesunderfitting
# Install dependencies
pip install tensorflow numpy matplotlib seaborn scikit-learn
Usage
The main notebook pipeline.ipynb contains all necessary code. Here is how to run a typical experiment via the pipeline API.
1. Initialization
Choose your dataset (cifar10, fashion_mnist, mnist) and architecture (cnn, dense).
from pipeline import ExperimentPipeline
# Fashion MNIST is recommended to observe underfitting/overfitting nuances
exp = ExperimentPipeline(dataset_name="fashion_mnist", model_type="cnn")
2. Learning Curves Comparison
Compare training dynamics (loss and accuracy) of the three strategies.
exp.compare_learning_curves(
modes=["standard", "early", "late"],
switch_epoch=10, # The epoch where dropout state changes
rate=0.4, # Dropout rate
epochs=30
)
3. Ablation Studies
Study the impact of the "Early" phase duration and of the dropout rate.
# Impact of the switch epoch on final performance
exp.compare_switch_epochs(
switch_epochs=[5, 10, 15, 20],
modes=["early"],
rate=0.4,
epochs=30
)
# Impact of the dropout rate
exp.compare_drop_rates(
rates=[0.2, 0.4, 0.6],
modes=["standard", "early"],
switch_epoch=10,
epochs=25
)
4. Data Regimes (Data Scarcity)
Verify the paper's hypothesis that Early Dropout helps most in underfitting regimes (large datasets or limited model capacity), while Standard Dropout protects small datasets against overfitting.
# Training on 10%, 50% and 100% of the dataset
exp.run_dataset_size_comparison(
fractions=[0.1, 0.5, 1.0],
modes=["standard", "early"],
rate=0.3,
switch_epoch=10
)
Expected Results
According to the paper, you should observe:
- Early Dropout: Higher loss at the start, followed by a sharp drop after switch_epoch, often reaching a lower minimum than Standard Dropout (reduced underfitting).
- Late Dropout: Rapid rise in accuracy at the start (potential overfitting), then stabilization once dropout is activated.
Detailed Report
Authors
M.Sc. Statistical and Financial Engineering (ISF) - Data Science Track at Université Paris-Dauphine PSL
Based on the work of Liu, Z., et al. (2023). Dropout Reduces Underfitting.