# pabkit: Process-Aware Benchmarking Toolkit

A Python toolkit for evaluating machine learning models based on their learning trajectories rather than solely on final performance metrics.
Traditional machine learning benchmarks focus on static evaluation, judging models by their final accuracy or other correctness-based metrics. However, this approach overlooks critical aspects of the learning process such as:
- How models refine their knowledge over time
- When and how generalization emerges
- Whether models truly learn structured representations or simply memorize training data
- How robustness evolves during training
Process-Aware Benchmarking (PAB) addresses these limitations by tracking the entire learning trajectory, providing deeper insights into model behavior and generalization capabilities.
## Project Structure

```
pabkit/
├── README.md                       # Project overview and usage instructions
├── setup.py                        # Package installation configuration
├── requirements.txt                # Package dependencies
├── LICENSE                         # MIT License
├── MANIFEST.in                     # Distribution manifest
│
├── pab/                            # Main package directory
│   ├── __init__.py                 # Package initialization and imports
│   ├── core.py                     # Core PAB functionality and classes
│   ├── metrics.py                  # Metrics calculations for trajectory analysis
│   ├── visualization.py            # Plotting and visualization utilities
│   ├── utils.py                    # Helper functions and utilities
│   ├── cli.py                      # Command-line interface
│   │
│   ├── tracking/                   # Model checkpoint tracking
│   │   ├── __init__.py
│   │   └── checkpoint_manager.py
│   │
│   ├── adversarial/                # Adversarial attack utilities
│   │   └── __init__.py
│   │
│   ├── datasets/                   # Dataset utilities
│   │   ├── __init__.py
│   │   └── imagenet.py
│   │
│   └── config/                     # Configuration management
│       ├── __init__.py
│       └── default_config.py
│
├── bin/                            # Command-line scripts
│   └── pab-cli                     # CLI entry point
│
├── examples/                       # Example scripts
│   ├── __init__.py
│   ├── simple_example.py           # Basic usage with CIFAR-10
│   ├── imagenet_case_study.py      # ImageNet case study from the paper
│   ├── model_comparison.py         # Comparing multiple models
│   ├── representation_analysis.py  # Feature representation analysis
│   ├── comparative_analysis.py     # In-depth comparative analysis
│   └── pab_tutorial.ipynb          # Jupyter notebook tutorial
│
├── tests/                          # Unit tests
│   ├── __init__.py
│   ├── test_core.py
│   ├── test_metrics.py
│   └── test_tracking.py
│
└── docs/                           # Documentation
    ├── usage.md                    # Detailed usage instructions
    ├── mathematical_formalism.md   # Mathematical foundations of PAB
    └── api_reference.md            # API reference documentation
```
## Installation

```bash
pip install pabkit
```

Or install from source:

```bash
git clone https://github.com/parama/pabkit.git
cd pabkit
pip install -e .
```

## Quick Start

Here's a simple example of using PAB to track a model's learning trajectory:
```python
from pab import ProcessAwareBenchmark, track_learning_curve
import torch
import torchvision

# Load a model and dataset
model = torchvision.models.resnet18(pretrained=False)
train_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True,
    transform=torchvision.transforms.ToTensor()
)
test_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True,
    transform=torchvision.transforms.ToTensor()
)

# Track the learning trajectory
pab = track_learning_curve(
    model=model,
    dataset=(train_dataset, test_dataset),
    epochs=100,
    batch_size=128
)

# Evaluate the trajectory
results = pab.evaluate_trajectory()
print(pab.summarize())
```

## Features

### Learning Trajectory Tracking

PAB tracks how models evolve during training, capturing metrics like the following (a small worked example follows the list):
- Loss and accuracy curves
- Generalization gap over time
- Class-wise learning progression
- Feature representation shifts
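For instance, the generalization gap at each epoch falls directly out of the tracked curves. A minimal sketch, assuming `pab.metrics` stores per-epoch lists under the `'train_loss'` and `'val_loss'` keys used in the visualization example later in this README:

```python
# Generalization gap per epoch: validation loss minus training loss.
# Assumes pab.metrics holds per-epoch lists keyed 'train_loss' and
# 'val_loss', as in the visualization example below.
gen_gap = [
    val - train
    for train, val in zip(pab.metrics['train_loss'], pab.metrics['val_loss'])
]

# A gap that widens over training is a common sign of memorization
# rather than structured learning.
for epoch, gap in enumerate(gen_gap, start=1):
    print(f"epoch {epoch:3d}: generalization gap = {gap:.4f}")
```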
### Robustness Evolution

Track how model robustness changes during training (a generic probe is sketched after the list):
- Adversarial robustness over epochs
- Consistency under transformations
- Stability of decision boundaries
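The `pab.adversarial` module's API is not covered in this README, so the sketch below uses a plain one-step FGSM probe in raw PyTorch instead; `fgsm_accuracy` is a hypothetical helper, not part of pabkit. Running it on each saved checkpoint traces how adversarial robustness evolves across epochs.

```python
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon=0.03, device='cpu'):
    """Accuracy under a one-step FGSM attack (hypothetical helper)."""
    model.eval()
    correct, seen = 0, 0
    for inputs, targets in loader:
        inputs = inputs.to(device).requires_grad_(True)
        targets = targets.to(device)
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        model.zero_grad(set_to_none=True)  # discard probe gradients on weights
        # Step each input in the direction that increases the loss.
        adv = (inputs + epsilon * inputs.grad.sign()).clamp(0, 1).detach()
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return correct / seen
```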
### Visualization

Visualize learning dynamics with built-in plotting functions:
- Learning curves
- Class progression
- Robustness evolution
- Generalization gap
### Checkpoint Management

Efficiently manage model checkpoints across training (a sketch of the idea follows the list):
- Save checkpoints at regular intervals
- Load and compare checkpoints
- Prune checkpoints to save disk space
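As a rough sketch of the idea in plain PyTorch (the actual `checkpoint_manager` API may differ; `save_checkpoint`, `prune_checkpoints`, and the `epoch_NNNN.pt` naming are assumptions for illustration):

```python
import torch
from pathlib import Path

def save_checkpoint(model, epoch, ckpt_dir='./checkpoints', save_frequency=5):
    """Save the model state every `save_frequency` epochs."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    if epoch % save_frequency == 0:
        torch.save(model.state_dict(), ckpt_dir / f'epoch_{epoch:04d}.pt')

def prune_checkpoints(ckpt_dir='./checkpoints', keep_every=10):
    """Delete all but every `keep_every`-th checkpoint to save disk space."""
    for path in sorted(Path(ckpt_dir).glob('epoch_*.pt')):
        epoch_num = int(path.stem.split('_')[1])
        if epoch_num % keep_every != 0:
            path.unlink()
```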
## The `ProcessAwareBenchmark` Class

The main class for tracking and analyzing learning trajectories:
```python
from pab import ProcessAwareBenchmark

pab = ProcessAwareBenchmark(
    checkpoint_dir='./checkpoints',
    save_frequency=5,
    track_representations=True
)

# Track each epoch
for epoch in range(1, epochs + 1):
    train_loss, train_acc = train_epoch(model, train_loader)
    val_loss, val_acc = validate(model, val_loader)

    # Track metrics
    pab.track_epoch(
        model=model,
        epoch=epoch,
        train_loss=train_loss,
        val_loss=val_loss,
        train_acc=train_acc,
        val_acc=val_acc
    )

# Evaluate the learning trajectory
results = pab.evaluate_trajectory()
```
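`train_epoch` and `validate` above are placeholders for your own training and evaluation routines. A minimal sketch of what they might look like in plain PyTorch (hypothetical helpers, not part of the pabkit API):

```python
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer=None, device='cpu'):
    """One pass over the training data; returns (avg_loss, accuracy)."""
    # In practice, pass a persistent optimizer so its state survives
    # across epochs; the default here just keeps the sketch self-contained.
    optimizer = optimizer or torch.optim.SGD(model.parameters(), lr=0.01)
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = F.cross_entropy(outputs, targets)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * targets.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen

@torch.no_grad()
def validate(model, loader, device='cpu'):
    """Evaluate on held-out data; returns (avg_loss, accuracy)."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        total_loss += F.cross_entropy(outputs, targets).item() * targets.size(0)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
        seen += targets.size(0)
    return total_loss / seen, correct / seen
```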
## Visualization

Create insightful visualizations of learning trajectories:

```python
from pab.visualization import plot_learning_trajectory, plot_class_progression

# Plot learning curves
fig = plot_learning_trajectory(
    train_losses=pab.metrics['train_loss'],
    val_losses=pab.metrics['val_loss'],
    train_accs=pab.metrics['train_acc'],
    val_accs=pab.metrics['val_acc'],
    save_path='learning_curves.png'
)

# Plot class-wise learning progression
fig = plot_class_progression(
    class_accuracies=pab.metrics['class_accuracy'],
    save_path='class_progression.png'
)
```

## Command-Line Interface

PAB provides a command-line interface for common tasks:
```bash
# Analyze checkpoints from a trained model
pab-cli analyze --checkpoint_dir ./checkpoints --output_dir ./results

# Compare multiple models
pab-cli compare --model_dirs ./checkpoints/model1 ./checkpoints/model2 --model_names ResNet50 EfficientNet

# Visualize metrics
pab-cli visualize --metrics_file ./results/pab_metrics.json --type learning_curve

# Generate a comprehensive report
pab-cli report --checkpoint_dir ./checkpoints --model_name ResNet50
```

## Examples

Explore the `examples/` directory for detailed examples:
- `simple_example.py`: Basic usage with CIFAR-10
- `imagenet_case_study.py`: ImageNet case study from the paper
- `model_comparison.py`: Comparing multiple models
- `representation_analysis.py`: Feature representation analysis
- `pab_tutorial.ipynb`: Jupyter notebook tutorial
## Documentation

Refer to the `docs/` directory for detailed documentation:

- `usage.md`: Detailed usage instructions
- `mathematical_formalism.md`: Mathematical foundations of PAB
- `api_reference.md`: API reference documentation
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the MIT License - see the LICENSE file for details.