DOC: Include instructions on how to access the datasets used for benchmarks #52

Closed
mgrover1 opened this issue Mar 20, 2024 · 3 comments · Fixed by #54

@mgrover1

mgrover1 commented Mar 20, 2024

To ensure reproducibility, instructions should be added on how to access the datasets, which are currently referenced only through paths on LLNL resources, as in this example:

https://github.com/xCDAT/xcdat-validation/blob/main/scripts/performance-benchmarks/perf_benchmark.py

Without such instructions, the results cannot be fully reproduced.

Related to openjournals/joss-reviews#6426

@tomvothecoder
Collaborator

Hey @pochedls, are all of these datasets available on ESGF? If they are, I'll check if Globus links are available to make it easier to download the larger datasets.

Also are the XML files publicly available somewhere so that non-LLNL users can open multi-file datasets with cdms2? Otherwise, we need to figure out another way to open multi-file datasets using cdms2.

tomvothecoder self-assigned this Mar 21, 2024
tomvothecoder added the documentation (Improvements or additions to documentation) label Mar 21, 2024
tomvothecoder moved this from Todo to In Progress in xCDAT Development Mar 21, 2024
@pochedls
Collaborator

> Hey @pochedls, are all of these datasets available on ESGF? If they are, I'll check if Globus links are available to make it easier to download the larger datasets.
>
> Also are the XML files publicly available somewhere so that non-LLNL users can open multi-file datasets with cdms2? Otherwise, we need to figure out another way to open multi-file datasets using cdms2.

These datasets are all on ESGF.

XML files are usually created locally (to map to your local files on disk). Once the data is in place, you can create XML files with `cdscan -x file.xml /path/to/dataset/` (pretty sure that is the syntax).
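For anyone scripting this step, here is a minimal Python sketch of the same idea. It is not taken from the benchmark script. The helper names `build_cdscan_command` and `create_xml` are hypothetical, and it assumes that `cdscan` is on your `PATH` and accepts an explicit list of NetCDF files after `-x file.xml` (the glob form `cdscan -x file.xml /path/to/dataset/*.nc`), which may differ from the directory form mentioned above.

```python
import glob
import shutil
import subprocess
from pathlib import Path


def build_cdscan_command(dataset_dir, xml_path, pattern="*.nc"):
    """Build the argv list for a cdscan call (hypothetical helper).

    Assumes cdscan takes `-x <xmlfile>` followed by the data files.
    """
    # Expand the pattern ourselves so the result does not depend on a shell.
    files = sorted(glob.glob(str(Path(dataset_dir) / pattern)))
    if not files:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {dataset_dir}"
        )
    return ["cdscan", "-x", str(xml_path), *files]


def create_xml(dataset_dir, xml_path, pattern="*.nc"):
    """Run cdscan to write an XML file mapping to the local dataset files."""
    cmd = build_cdscan_command(dataset_dir, xml_path, pattern)
    if shutil.which("cdscan") is None:
        raise RuntimeError("cdscan was not found on PATH")
    # check=True raises CalledProcessError if cdscan exits with an error.
    subprocess.run(cmd, check=True)
    return Path(xml_path)
```

Calling `create_xml("/path/to/dataset/", "file.xml")` would then write `file.xml` pointing at the local copies of the files downloaded from ESGF.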

@tomvothecoder
Collaborator

Thanks @pochedls. I'll need to update the instructions for the script so that non-LLNL users can reproduce the XMLs.
