Scientific Variable Extraction Dataset

This repository accompanies our paper "A Dataset for Scientific Variable Extraction". It provides the benchmark dataset and the evaluation results for several extraction approaches.

Benchmark Dataset

This dataset comprises paragraphs extracted from 20 research papers on pandemic studies. A domain expert annotated the scientific variables present in these papers. The list of papers and the annotation guidelines are in the benchmark folder.

Evaluation Results

The evaluation results for the approaches we tested (conventional machine learning techniques, large language model-based solutions, and combinations of the two) are in the eval folder. They are stored as JSON files so they can be loaded directly into further analysis.
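Since the results are plain JSON, a first look at them needs only the standard library. The sketch below is a minimal illustration, not part of the repository: the helper names `load_json_results` and `describe` are hypothetical, the folder name `eval` is an assumption about the local checkout, and because the internal schema of the files is not documented here, the code reports only each file's top-level type and size.

```python
import json
from pathlib import Path
from typing import Any


def load_json_results(folder: str) -> dict[str, Any]:
    """Parse every *.json file directly inside `folder`, keyed by file name."""
    results: dict[str, Any] = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


def describe(results: dict[str, Any]) -> list[str]:
    """Return one line per file: name, top-level JSON type, and entry count."""
    lines = []
    for name, payload in results.items():
        # Scalars (numbers, strings, null) count as a single entry.
        size = len(payload) if isinstance(payload, (dict, list)) else 1
        lines.append(f"{name}: {type(payload).__name__} with {size} top-level entries")
    return lines


if __name__ == "__main__":
    # "eval" is an assumed folder name; adjust it to match your checkout.
    for line in describe(load_json_results("eval")):
        print(line)
```

Inspecting the top-level structure first avoids guessing at field names; once the layout of a file is known, the parsed objects can be passed to whatever analysis tooling you prefer.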

We hope this dataset is a useful resource for researchers and practitioners working on scientific variable extraction.

Citation

If you use this dataset in your research, please cite it as follows:

Anonymized

Thank you for your interest in our work, and we look forward to seeing how it contributes to your research endeavors.

Folders and files

benchmark
eval
.DS_Store
README.md
candidates.json
candidates_3shot.json
candidates_id-pz.json
candidates_id.json
candidates_w_bertscore.json
results_bertscore.csv
results_token.csv
results_token_3shot.csv
token_analysis.ipynb

Repository: mitdbg/scivar
