mitdbg/scivar

Scientific Variable Extraction Dataset

This repository hosts the Scientific Variable Extraction Dataset, as featured in our paper "A Dataset for Scientific Variable Extraction". It provides both the benchmark dataset and the evaluation results for various extraction approaches.

Benchmark Dataset

This dataset comprises paragraphs extracted from 20 research papers on pandemic studies. A domain expert annotated the scientific variables that appear in these papers. The list of papers and the annotation guidelines are in the benchmark folder.

Evaluation Results

The evaluation folder contains the results for the approaches we tested, including conventional machine learning techniques, large language model-based solutions, and combinations of the two. The results are stored as JSON files so they can be loaded directly into further analysis.
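
As a starting point for working with these files, here is a minimal Python sketch that loads every JSON file under the evaluation folder and reports its top-level shape. The function names `load_evaluation_results` and `describe` are hypothetical helpers, not part of this repository, and the sketch makes no assumption about the schema of the result files beyond their being valid JSON.

```python
import json
from pathlib import Path


def load_evaluation_results(folder="evaluation"):
    """Parse every *.json file under `folder` into a dict keyed by relative path.

    Hypothetical helper: the folder name comes from this README, but the
    layout inside it is not assumed, so the search is recursive.
    """
    root = Path(folder)
    results = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.relative_to(root).as_posix()] = json.load(handle)
    return results


def describe(results):
    """Summarize the top-level shape of each parsed file without assuming a schema."""
    summary = {}
    for name, data in results.items():
        if isinstance(data, dict):
            # JSON object: report its keys in sorted order.
            summary[name] = ("object", sorted(data.keys()))
        elif isinstance(data, list):
            # JSON array: report how many items it holds.
            summary[name] = ("array", len(data))
        else:
            # Scalar value: report only its Python type name.
            summary[name] = (type(data).__name__, None)
    return summary


if __name__ == "__main__":
    # Run from the repository root so that "evaluation" resolves correctly.
    for name, shape in describe(load_evaluation_results()).items():
        print(name, shape)
```

Inspecting the shapes first is a cautious way to learn how each result file is organized before writing analysis code against specific fields.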

We hope this dataset is a useful resource for researchers and practitioners working on scientific variable extraction, and that it supports the application of machine learning to scientific research.

Citation

If you use this dataset in your research, please cite it as follows:

@misc{liu2024variableextractionmodelrecovery,
      title={Variable Extraction for Model Recovery in Scientific Literature}, 
      author={Chunwei Liu and Enrique Noriega-Atala and Adarsh Pyarelal and Clayton T Morrison and Mike Cafarella},
      year={2024},
      eprint={2411.14569},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2411.14569}, 
}

Thank you for your interest in our work, and we look forward to seeing how it contributes to your research endeavors.
