Some data sets may be incorrectly labeled #14

Open
LinMu7177 opened this issue Jan 17, 2025 · 1 comment

LinMu7177 commented Jan 17, 2025

First of all, thank you for your open-source work; it is very meaningful!

When I was trying to use your evaluation tool, I found that some datasets might have issues. For example, in the dmlab dataset the annotations are divided into six categories: ['nearby apple/melon', 'far apple/melon', 'very far apple/melon', 'nearby lemon', 'far lemon', 'very far lemon'].
These labels don't seem to match the content of the images.

[screenshot: dmlab example image]

Similarly, for the dspr_orientation dataset, the orientation labels are just the bin indices 0 to 39 and the template is 'an object rotated at {c}', but this combination also seems unreasonable, as the prompt expansion below shows.
[screenshot: dspr_orientation example image]
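
To make the mismatch concrete, filling the template with the raw class indices produces prompts like the following (a minimal sketch of the prompt expansion described above):

template = "an object rotated at {c}"
# The 40 orientation classes are labeled simply 0..39
prompts = [template.format(c=c) for c in range(40)]
# -> "an object rotated at 0", "an object rotated at 1", ..., "an object rotated at 39"
# The filled-in values are bin indices rather than angles, so the prompts read oddly.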

haideraltahan (Collaborator) commented Feb 25, 2025

Thank you so much @LinMu7177 for the kind words! We appreciate you taking the time to go through and use UniBench!

For dmlab, dspr_orientation, and several other benchmarks, we adopted the implementation from the OpenCLIP benchmark repository as the foundation for our approach; see, for instance, the dmlab implementation in OpenCLIP.

While implementing UniBench we also took note of this limitation! That is why we made sure to allow users to change both class names and templates when evaluating, as in the following example (shown here for FashionMNIST):

from functools import partial
from unibench import Evaluator
from unibench.benchmarks_zoo import ZeroShotBenchmarkHandler
from torchvision.datasets import FashionMNIST

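# Class names and prompt templates that override the benchmark's defaults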
class_names = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

templates = ["an image of {}"]

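# Wrap the dataset and handler constructors in partials so UniBench can instantiate them later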
benchmark = partial(
    FashionMNIST, root="/fsx-robust/haideraltahan", train=False, download=True
)
handler = partial(
    ZeroShotBenchmarkHandler,
    benchmark_name="fashion_mnist_new",
    classes=class_names,
    templates=templates,
)

eval = Evaluator()

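# Register the custom benchmark so it is evaluated alongside the built-in ones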
eval.add_benchmark(
    benchmark,
    handler,
    meta_data={
        "benchmark_type": "object recognition",
    },
)
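
Once the benchmark is registered, it can be run like any built-in one. A minimal sketch of that follow-up, assuming the evaluate() entry point shown in the UniBench README:

# Run the evaluation over the registered models and benchmarks,
# including the fashion_mnist_new benchmark added above.
# (Assumes Evaluator.evaluate() as in the UniBench README.)
eval.evaluate()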

As far as we know, the question of which prompts are best suited for these benchmarks has not yet been thoroughly explored. If you're interested in leading efforts in this area, we'd be more than happy to support and highlight your work on the repo!
