Add datasets for a benchmark newly introduced for "Engineering" domain #1911

mehrzadshm · 2025-01-30T21:29:56Z

The datasets are associated with this research work (under review): Benchmarking pre-trained text embedding models in aligning built asset information

The initial results are included in embeddings-benchmark/results; related PR: #110

The proposed benchmark introduces four tasks across three main types: clustering, retrieval, and reranking.
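For orientation, the four tasks can be grouped by type as in the sketch below. The task names are taken from the commit messages and file paths in this PR, and the two models are the ones listed in the checklist. The grouping dict and the `build_commands` helper are hypothetical and not part of mteb. The command string follows the form quoted in the checklist (`mteb -m {model_name} -t {task_name}`), which may differ between mteb versions.

```python
from itertools import product

# Task names as they appear in this PR's commits and file paths.
BUILT_BENCH_TASKS = {
    "clustering": ["BuiltBenchClusteringS2S", "BuiltBenchClusteringP2P"],
    "retrieval": ["BuiltBenchRetrieval"],
    "reranking": ["BuiltBenchReranking"],
}

# The two models listed in the PR checklist.
MODELS = [
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "intfloat/multilingual-e5-small",
]


def build_commands(models, tasks_by_type):
    """Return one command string per (model, task) pair.

    Hypothetical helper: it only formats strings and does not run mteb.
    """
    task_names = [name for names in tasks_by_type.values() for name in names]
    return [f"mteb -m {m} -t {t}" for m, t in product(models, task_names)]


if __name__ == "__main__":
    for command in build_commands(MODELS, BUILT_BENCH_TASKS):
        print(command)
```

The helper only prints the commands, so they can be reviewed before being run in an environment where mteb is installed.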

HuggingFace links to datasets:

Adding datasets checklist

Reason for dataset addition: This dataset points to a new domain, i.e., "Engineering" (more specifically related to architecture, construction, and built asset management)

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
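The subsampling item in the checklist refers to mteb's `self.stratified_subsampling()`. The sketch below is not that method. It is a standard-library illustration of the underlying idea: sample within each label so that class proportions are roughly preserved when a dataset exceeds a size budget such as the 2048 examples mentioned above. The function name, its parameters (`n_samples`, `seed`), and the toy data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(examples, labels, n_samples, seed=42):
    """Pick about n_samples examples while keeping label proportions.

    Illustrative only: this is NOT mteb's stratified_subsampling().
    """
    if n_samples >= len(examples):
        return list(examples), list(labels)

    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fraction = n_samples / len(examples)
    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Keep at least one example per label so no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))

    chosen.sort()  # preserve the original ordering of the kept examples
    return [examples[i] for i in chosen], [labels[i] for i in chosen]


if __name__ == "__main__":
    # Toy data (made up): 3000 "door" and 1000 "window" descriptions.
    texts = [f"text {i}" for i in range(4000)]
    labels = ["door"] * 3000 + ["window"] * 1000
    sub_texts, sub_labels = stratified_subsample(texts, labels, n_samples=2048)
    print(len(sub_texts), sub_labels.count("door"), sub_labels.count("window"))
```

Because each label is rounded independently, the total can differ from `n_samples` by a few examples when the proportions do not divide evenly.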

Looking forward to your feedback!


Samoed

It seems you want to add the missing tasks to a benchmark. You can follow the adding_a_benchmark doc to fully integrate it.

mehrzadshm · 2025-01-31T02:03:18Z

Makes perfect sense; I'll be adding other tasks and putting it all together as a new benchmark

isaac-chung · 2025-02-03T04:35:39Z

Just converted this to draft. Feel free to mark it as ready when it's ready for review :)

mehrzadshm · 2025-02-03T14:00:33Z

Awesome! will do it shortly; tnx for your time!


mehrzadshm · 2025-02-06T13:50:28Z

Just followed the advice and integrated all changes, including new datasets and an associated new benchmark class in mteb, plus adding results in embeddings-benchmark/results (related PR: embeddings-benchmark/results#110)

Just marking the PR ready for review :) Looking forward to your feedback.

Thanks in advance for your time!


…110)
* Add BuiltBench results (related mteb PR: embeddings-benchmark/mteb#1911)
* add initial results for proposed tasks
* update paths.json
* Update model_meta files modified in BuiltBench PR: #110
* rollback paths.json (see PR: #110)

isaac-chung

Thanks for adding this, good work!

isaac-chung · 2025-02-08T02:35:16Z

mteb/benchmarks/benchmarks.py

+            "BuiltBenchReranking",
+        ],
+    ),
+    description="\"Built-Bench\" is an ongoing effort aimed at evaluating text embedding models in the context of buit asset management, spanning over various dicsiplines such as architeture, engineering, constrcution, and operations management of the built environment.",


Tiny typo

Suggested change

-    description="\"Built-Bench\" is an ongoing effort aimed at evaluating text embedding models in the context of buit asset management, spanning over various dicsiplines such as architeture, engineering, constrcution, and operations management of the built environment.",
+    description="\"Built-Bench\" is an ongoing effort aimed at evaluating text embedding models in the context of built asset management, spanning over various dicsiplines such as architeture, engineering, constrcution, and operations management of the built environment.",

Sharp eyes!
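A quick way to see exactly what a one-line suggestion like the one in this thread changes is a word-level diff. The sketch below uses Python's standard `difflib` on the sentence text from the two `description` lines. The helper name `changed_words` is hypothetical, and `NEW` is built by applying the suggested replacement to `OLD`.

```python
import difflib

# Sentence text from the original description line (wrapper and quotes omitted).
OLD = (
    "is an ongoing effort aimed at evaluating text embedding models in the "
    "context of buit asset management, spanning over various dicsiplines "
    "such as architeture, engineering, constrcution, and operations "
    "management of the built environment."
)
# The suggested change replaces a single word.
NEW = OLD.replace("buit asset", "built asset")


def changed_words(old, new):
    """Return (old_fragment, new_fragment) pairs where the texts differ."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    for before, after in changed_words(OLD, NEW):
        print(f"{before!r} -> {after!r}")
```

Comparing word lists rather than characters keeps the output readable for long single-line strings such as this description.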

mehrzadshm added 4 commits January 30, 2025 13:06

adding clustering tasks (built-bench-clustering S2S & P2P)

0a70913

Merge remote-tracking branch 'upstream/main'

1486386

updated built-bench-clustering tasks

e378c71

Updated BuiltBenchClustering tasks

34f2e86

* Added "Engineering" as new domain to TaskMetadata.py
* Updated tasks table in docs
* Updated task metadata for BuiltBenchClustering S2S and P2P

Samoed reviewed Jan 30, 2025


resolved merge conflicts

f8be95f

mehrzadshm force-pushed the main branch from d83f41f to f8be95f February 1, 2025 22:50

isaac-chung marked this pull request as draft February 3, 2025 04:35

mehrzadshm added 4 commits February 3, 2025 09:03

Merge remote-tracking branch 'upstream/main'

d3ec031

updated metadata for clustering tasks

b8ff15e

Merge remote-tracking branch 'upstream/main'

68d5316

Add/update BuiltBench tasks

b299df5

- Add BuiltBenchRetrieval task
- Add BuiltBenchReranking task
- Update metadata for BuiltBenchClusterinP2P
- Update metadata for BuiltBenchClusterinS2S

mehrzadshm changed the title ~~Add new clustering tasks for a new domain (Engineering)~~ Add datasets for a benchmark newly introduced for "Engineering" domain Feb 4, 2025

mehrzadshm added a commit to mehrzadshm/results that referenced this pull request Feb 6, 2025

Add BuiltBench results (related mteb PR: embeddings-benchmark/mteb#1911)

5c8140b

* add initial results for proposed tasks
* update paths.json

mehrzadshm mentioned this pull request Feb 6, 2025

Add BuiltBench results (a benchmark proposed for engineering domain) embeddings-benchmark/results#110

Merged

2 tasks

mehrzadshm added 2 commits February 6, 2025 08:39

update BuiltBench benchmark

e8c7e10

Merge remote-tracking branch 'upstream/main'

3abb5d4

mehrzadshm requested a review from Samoed February 6, 2025 13:50

mehrzadshm marked this pull request as ready for review February 6, 2025 13:53

Samoed approved these changes Feb 6, 2025


mteb/benchmarks/benchmarks.py (resolved)

mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py (outdated, resolved)

mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py (outdated, resolved)

Samoed requested review from x-tabdeveloping, KennethEnevoldsen and isaac-chung February 6, 2025 13:58

mehrzadshm and others added 3 commits February 6, 2025 09:01

Update mteb/benchmarks/benchmarks.py

0a777b3

Co-authored-by: Roman Solomatin <[email protected]>

Update mteb/tasks/Clustering/eng/BuiltBenchClusteringS2S.py

f236c04

Co-authored-by: Roman Solomatin <[email protected]>

Update mteb/tasks/Clustering/eng/BuiltBenchClusteringP2P.py

681d6f3

Co-authored-by: Roman Solomatin <[email protected]>

isaac-chung approved these changes Feb 8, 2025

