This repository contains the implementation of the Quality-Weighted Vendi Score (qVS), a diversity metric that builds on the previously proposed Vendi Score and also accounts for the quality of individual items.
The input of the metric is a collection of samples, a pairwise similarity function, and a score function.
The output is a number, which can be interpreted as the effective quality sum of the samples in the collection.
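Specifically, given a positive semi-definite similarity matrix $K \in \mathbb{R}^{n \times n}$ with $K_{ii} = 1$ and a vector of quality scores $s \in \mathbb{R}^n$, the qVS is the average score multiplied by the Vendi Score of $K$: $\mathrm{qVS} = \left(\frac{1}{n}\sum_{i=1}^{n} s_i\right) \exp\left(-\sum_{i=1}^{n} \lambda_i \log \lambda_i\right)$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $K/n$ and $0 \log 0 = 0$ by convention.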
The Quality-Weighted Vendi Score gives rise to a search policy that successfully finds diverse sets of high-quality items, which are the target of our search.
For more information, please see our paper, Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design, published at the International Conference on Machine Learning (ICML) 2024.
The input to q_vendi.score is a list of samples, a similarity function k, and a score function s.
The similarity function k should be symmetric and satisfy k(x, x) = 1.
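The two requirements on k can be checked numerically before computing a score. The sketch below is only an illustration: check_similarity is a hypothetical helper that is not part of q_vendi, and the tolerance atol is an assumed default.

```python
import numpy as np


def check_similarity(samples, k, atol=1e-8):
    """Return True if k is symmetric with k(x, x) = 1 on `samples`.

    Hypothetical helper for illustration; it is not part of q_vendi.
    """
    # Build the full pairwise similarity matrix K[i, j] = k(x_i, x_j).
    K = np.array([[k(a, b) for b in samples] for a in samples], dtype=float)
    symmetric = np.allclose(K, K.T, atol=atol)
    unit_diagonal = np.allclose(np.diag(K), 1.0, atol=atol)
    return bool(symmetric and unit_diagonal)


samples = [0, 0, 2, 2, 4, 4]
k = lambda a, b: np.exp(-np.abs(a - b))
print(check_similarity(samples, k))  # True
```

The check only covers the samples it is given, so it can catch a malformed similarity function but cannot prove that k is valid everywhere.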
>>> import numpy as np
>>> from q_vendi import score, sequential_maximize_score
>>> samples = [0, 0, 2, 2, 4, 4]
>>> k = lambda a, b: np.exp(-np.abs(a - b))
>>> s = lambda a: np.exp(-np.abs(a - 2))
>>> score(samples[:3], k, s)
0.793705659274703
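The value above can be cross-checked with a short from-scratch computation. This is a minimal sketch, assuming the qVS is the mean quality score multiplied by the Vendi Score (the exponential of the Shannon entropy of the eigenvalues of K/n); the function name quality_weighted_vendi_score and the eigenvalue cutoff are illustrative choices, not the q_vendi implementation.

```python
import numpy as np


def quality_weighted_vendi_score(samples, k, s):
    """From-scratch qVS sketch; an illustrative stand-in, not the q_vendi code."""
    n = len(samples)
    # Pairwise similarity matrix K[i, j] = k(x_i, x_j).
    K = np.array([[k(a, b) for b in samples] for a in samples], dtype=float)
    # Eigenvalues of K / n are non-negative and sum to 1 when k(x, x) = 1.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or slightly negative, from round-off) eigenvalues: 0 * log 0 := 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    vendi_score = np.exp(entropy)
    mean_quality = np.mean([s(x) for x in samples])
    return float(mean_quality * vendi_score)


samples = [0, 0, 2, 2, 4, 4]
k = lambda a, b: np.exp(-np.abs(a - b))
s = lambda a: np.exp(-np.abs(a - 2))
print(quality_weighted_vendi_score(samples[:3], k, s))  # approximately 0.7937
```

Under this assumption, the first three samples have a Vendi Score of about 1.87 (two of them are identical) and a mean quality of about 0.42, whose product agrees with the output of score above.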
You can find the subset that maximizes the qVS:
>>> selected_samples, qVS = sequential_maximize_score(samples, k, s, 3)
>>> selected_samples
[2, 0, 2]
>>> qVS
1.2551553872451062
An example in 2D is included in a Jupyter notebook in the examples/ folder.
The qVS is used for experimental design and active learning tasks, specifically active search and Bayesian optimization that aim to make diverse discoveries.
See the respective subdirectories diverse_search and diverse_bayesopt for more details.
@inproceedings{nguyen2024quality,
title={{Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design}},
author={Nguyen, Quan and Dieng, Adji Bousso},
booktitle={Proceedings of the 41st International Conference on Machine Learning},
year={2024},
}