Some data sets may be incorrectly labeled #14

Open
LinMu7177 opened this issue Jan 17, 2025 · 1 comment

LinMu7177 commented Jan 17, 2025

First of all, thank you for your open-source work; it is very meaningful!

When I was trying to use your evaluation tool, I found that some datasets might have issues. For example, in the dmlab dataset the annotations are divided into six categories: ['nearby apple/melon', 'far apple/melon', 'very far apple/melon', 'nearby lemon', 'far lemon', 'very far lemon'].
These labels don't seem to match the content of the images.

[screenshot: dmlab example image]

Similarly, for the dspr_orientation dataset, the orientation labels are just the bin indices 0 to 39 and the template is 'an object rotated at {c}', but this combination also seems unreasonable, as the prompt expansion below shows.
[screenshot: dspr_orientation example image]
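
To make the mismatch concrete, filling the template with the raw class indices produces prompts like the following (a minimal sketch of the prompt expansion described above):

template = "an object rotated at {c}"
# The 40 orientation classes are labeled simply 0..39
prompts = [template.format(c=c) for c in range(40)]
# -> "an object rotated at 0", "an object rotated at 1", ..., "an object rotated at 39"
# The filled-in values are bin indices rather than angles, so the prompts read oddly.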

haideraltahan (Collaborator) commented Feb 25, 2025

Thank you so much @LinMu7177 for the kind words! We appreciate you taking the time to go through and use UniBench!

For dmlab, dspr_orientation, and several other benchmarks, we adopted the implementation from the OpenCLIP benchmark repository as the foundation for our approach; see, for instance, the dmlab implementation in OpenCLIP.

While implementing UniBench we also took note of this limitation! That is why we made sure to allow users to change both class names and templates when evaluating, as in the following example (shown here for FashionMNIST):

from functools import partial
from unibench import Evaluator
from unibench.benchmarks_zoo import ZeroShotBenchmarkHandler
from torchvision.datasets import FashionMNIST

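# Class names and prompt templates that override the benchmark's defaults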
class_names = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

templates = ["an image of {}"]

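# Wrap the dataset and handler constructors in partials so UniBench can instantiate them later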
benchmark = partial(
    FashionMNIST, root="/fsx-robust/haideraltahan", train=False, download=True
)
handler = partial(
    ZeroShotBenchmarkHandler,
    benchmark_name="fashion_mnist_new",
    classes=class_names,
    templates=templates,
)

eval = Evaluator()

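# Register the custom benchmark so it is evaluated alongside the built-in ones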
eval.add_benchmark(
    benchmark,
    handler,
    meta_data={
        "benchmark_type": "object recognition",
    },
)
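
Once the benchmark is registered, it can be run like any built-in one. A minimal sketch of that follow-up, assuming the evaluate() entry point shown in the UniBench README:

# Run the evaluation over the registered models and benchmarks,
# including the fashion_mnist_new benchmark added above.
# (Assumes Evaluator.evaluate() as in the UniBench README.)
eval.evaluate()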

As far as we know, the question of which prompts are best suited for these benchmarks has not yet been thoroughly explored. If you're interested in leading efforts in this area, we'd be more than happy to support and highlight your work on the repo!
