
v1.0.0 - Initial Release

Released by b-turan on 27 Feb at 09:56 (commit acd70c7).

This repository provides the codebase for Merlin-Arthur Classifiers, a multi-agent framework for interpretability in machine learning models. Inspired by the Merlin-Arthur protocol from interactive proof systems, it implements the method described in our AISTATS 2024 paper, "Interpretability Guarantees with Merlin-Arthur Classifiers," which comes with formal interpretability guarantees. A verifier (Arthur) classifies each input using only the features selected by one of two provers: Merlin, who selects features that support the correct class, and Morgana, who selects features intended to mislead Arthur into a wrong class. Training the three agents takes the form of a min-max game. The approach is evaluated on the MNIST and UCI Census datasets.
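To make the three roles concrete, here is a toy sketch in plain Python. It is not the repository's implementation, and it omits parts of the paper's setup such as Arthur's option to answer "I don't know". The linear verifier, the brute-force search over feature subsets, the helper names (`arthur_probs`, `merlin`, `morgana`, `arthur_loss`) and the weight `gamma` are all assumptions made for illustration. Each prover reveals exactly `k` features; Arthur's loss is low when Merlin's features convince him of the true label and Morgana's features fail to convince him of a wrong one.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Mask = Tuple[int, ...]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over class scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def arthur_probs(weights: Sequence[Sequence[float]], x: Sequence[float], mask: Mask) -> List[float]:
    # Hypothetical linear verifier: features outside the mask are zeroed out,
    # so Arthur only "sees" what the prover chose to reveal.
    revealed = [xi if i in mask else 0.0 for i, xi in enumerate(x)]
    scores = [sum(w * v for w, v in zip(row, revealed)) for row in weights]
    return softmax(scores)


def merlin(weights, x, y: int, k: int) -> Mask:
    # Cooperative prover: reveal the k features that maximise Arthur's
    # probability of the true label y (brute force, toy sizes only).
    return max(itertools.combinations(range(len(x)), k),
               key=lambda m: arthur_probs(weights, x, m)[y])


def morgana(weights, x, y: int, k: int) -> Mask:
    # Adversarial prover: reveal the k features that maximise Arthur's
    # probability of any wrong label.
    def fooling(m: Mask) -> float:
        p = arthur_probs(weights, x, m)
        return max(p[c] for c in range(len(p)) if c != y)
    return max(itertools.combinations(range(len(x)), k), key=fooling)


def arthur_loss(weights, x, y: int, k: int, gamma: float = 1.0) -> float:
    # Simplified verifier objective (an assumption, not the paper's exact loss):
    # cross-entropy on Merlin's features plus a penalty for being fooled by Morgana.
    p_merlin = arthur_probs(weights, x, merlin(weights, x, y, k))
    p_morgana = arthur_probs(weights, x, morgana(weights, x, y, k))
    fooled = max(p_morgana[c] for c in range(len(p_morgana)) if c != y)
    return -math.log(p_merlin[y]) - gamma * math.log(max(1.0 - fooled, 1e-12))
```

With `weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]`, `x = [1.0, 1.0, 0.0]` and true label 0, Merlin reveals feature 0 and Morgana reveals feature 1. In a full min-max setup, Arthur's parameters would be updated to minimise this loss while the provers maximise their own objectives; exhaustive subset search only works for tiny inputs, so a realistic setup would need a learned or optimisation-based feature selector instead.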

Our objective is to contribute to the development of interpretable AI systems by providing a toolkit that lets researchers and practitioners replicate our experiments, build on our methodology, and extend it to new settings. The repository documents setup, usage, and customization for the supported datasets and for both training modes (regular and Merlin-Arthur).

Getting Started covers cloning the repository, creating the Conda environment with the required dependencies, and initializing Weights & Biases (wandb) for experiment tracking.

Basic Usage walks through regular and Merlin-Arthur training on the supported datasets, with example configurations, including some that use the advanced features. The regular-training examples for MNIST and UCI Census show how to customize training parameters, while the Merlin-Arthur examples provide a template for running the min-max game between Arthur, Merlin, and Morgana on which the interpretability guarantees rest.
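As a rough illustration of how such training parameters might be exposed on the command line, the sketch below uses the standard-library `argparse` module. Every flag name here (`--dataset`, `--mode`, `--epochs`, `--lr`, `--batch-size`, `--k`) and every default is hypothetical and is not taken from the repository's actual scripts; consult the README for the real options.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags: illustrative only, not the repository's actual CLI.
    parser = argparse.ArgumentParser(description="Sketch of a training entry point")
    parser.add_argument("--dataset", choices=["mnist", "uci-census"], default="mnist")
    parser.add_argument("--mode", choices=["regular", "merlin-arthur"], default="regular")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch-size", type=int, default=64)
    # Number of features each prover may reveal; only used in Merlin-Arthur mode.
    parser.add_argument("--k", type=int, default=None)
    return parser


def parse_config(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Merlin-Arthur training needs a feature budget for the provers.
    if args.mode == "merlin-arthur" and args.k is None:
        parser.error("--k is required when --mode is merlin-arthur")
    return args
```

For example, `parse_config(["--dataset", "uci-census", "--epochs", "5"])` selects regular training on UCI Census, while `parse_config(["--mode", "merlin-arthur", "--k", "16"])` selects Merlin-Arthur training on MNIST with a budget of 16 features. Validating mode-specific options at parse time keeps an invalid configuration from starting a long run.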

Advanced Features describes options for customizing the loss functions, optimization settings, and regularization, so that the training process can be tuned to a specific dataset or experiment.
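One common way to make losses and regularizers swappable is a name-based registry. The sketch below assumes this design purely for illustration; the registry names, the `total_loss` helper, the available options, and the weight `lam` are hypothetical and do not describe the repository's API.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical registries: names and options are illustrative only.
LOSSES: Dict[str, Callable[[Sequence[float], int], float]] = {
    # Negative log-likelihood of the true class (clamped to avoid log(0)).
    "cross_entropy": lambda probs, y: -math.log(max(probs[y], 1e-12)),
    # Squared error between predicted probabilities and a one-hot target.
    "brier": lambda probs, y: sum(
        (p - (1.0 if c == y else 0.0)) ** 2 for c, p in enumerate(probs)
    ),
}

REGULARIZERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "none": lambda params: 0.0,
    "l1": lambda params: sum(abs(w) for w in params),
    "l2": lambda params: sum(w * w for w in params),
}


def total_loss(
    probs: Sequence[float],
    y: int,
    params: Sequence[float],
    loss: str = "cross_entropy",
    reg: str = "none",
    lam: float = 0.0,
) -> float:
    # Look up the chosen components by name and combine them:
    # data loss + lam * regularization penalty.
    if loss not in LOSSES:
        raise ValueError(f"unknown loss {loss!r}; options: {sorted(LOSSES)}")
    if reg not in REGULARIZERS:
        raise ValueError(f"unknown regularizer {reg!r}; options: {sorted(REGULARIZERS)}")
    return LOSSES[loss](probs, y) + lam * REGULARIZERS[reg](params)
```

For instance, `total_loss([0.5, 0.5], 0, [1.0, -2.0], loss="brier", reg="l1", lam=0.1)` evaluates to 0.5 + 0.1 × 3 = 0.8. Adding a new option then only requires registering one more entry; optimizer choices could be handled the same way.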

This repository is intended as a collaborative platform for advancing interpretability in AI, and we welcome contributions, feedback, and partnerships from the broader community.