README for TEMNOS (Temnospondyl Evolution, Morphology, Nomenclature, and Other Stuff) database

About

This is the README documentation for the TEMNOS (Temnospondyl Evolution, Morphology, Nomenclature, and Other Stuff) database, which is maintained on GitHub and published periodically through a Zenodo integration that contains all of the files found here. The GitHub repository is used for issue tracking and feature development, while the Zenodo deposit establishes a static, permanent record that can be cited and indexed. TEMNOS is intended to be a living database that is routinely updated and expanded to become a multi-use reference for various aspects of temnospondyl study and biology, including for reuse beyond the strict confines of scholarly articles. This README blends the style of a conventional software/data README with that of a descriptor paper. A preprint that contains most of the general overview and additional details on the conceptual framework and motivations for TEMNOS is maintained and periodically updated on OSF.

DOI

  • Current version: 1.0.0 (initial release)
  • Publication date: 2024/09/22

Author information

Sharing/access restrictions:

  • License: The database is currently published under the CC0 license waiver, which is a public domain designation that imposes no conditions for access, reuse, or redistribution on the material.

Reuse recommendations

Accuracy

I have certainly tried my best to enter and format data with maximum accuracy, but like any other human output, TEMNOS is not assured to be entirely free of error (and by virtue of the volume of data, almost certainly contains some minor errors beyond areas where there is intrinsic disagreement; e.g., taxonomic frameworks). Reusers, particularly those who intend to reuse it for scholarly purposes where accuracy is critical (e.g., journal publications), are responsible for how it is incorporated and interpreted; I am not responsible for any issues that potentially alter the outcomes of analyses. Reusers should also take care to consider the various limitations and caveats of these datasets and their original sources (see associated preprint for more concrete discussion) in order to avoid issues of conceptual mismatch and misrepresentation.

Citation

Citation is an ethically normalized scholarly practice and should not be confused with “attribution” (e.g., the condition for reuse that is prescribed by a Creative Commons CC BY license), which is a legal requirement. Users should give credit if they use TEMNOS in the same way that everyone should cite their sources. My preference, if you use any part of TEMNOS, is that you cite all of the original sources for entries/occurrences/data points that you used. Citing the OSF preprint and/or the Zenodo deposit is appreciated, but in order to properly credit the primary data generators, the actual primary data sources should (also) be cited. If you benefited from the aggregation of data or its construction (e.g., standardization, providing PIDs for articles and institutions wherever possible), that would be one reason to cite TEMNOS itself. The Zenodo deposit will have a citation generated for it on the landing page that you can copy/edit for a publication. Under no circumstances should you "cite" the GitHub repository since it has no persistent identifier (PID). Blanket statements (e.g., “all of the sources listed in TEMNOS”) should be avoided.

File overview:

The filenaming convention is as follows: YYYY-MM-DD_[C]*_[filename] where [C] (for 'category') refers to the categorization scheme of the file. Right now, there are 'M' (metadata) files like the institutional abbreviations key and 'D' (data) files that contain the raw data. Additional categories may be developed later. Enumeration of the files will remain consistent throughout. Additional details on each data file are provided as separate README files.

  1. D1_temnospondyl_species_listing.csv: this file records all presently valid or potentially valid (disputed) body fossil species with their taxonomic classification and taxonomic authority, as well as all named species presently considered to be a nomen dubium, a nomen nudum, or a nomen vanum. Junior synonyms and outdated combinations are intended to be added in the future.
  2. D2_temnosopndyl_specimen_listing.csv: this file records the majority of temnospondyl body fossil specimens recorded by specific specimen number (or otherwise distinct enough to be identifiable, such as a holotype without a catalogue number) that have been published in the literature. The present iteration has surveyed all major clades except the specimen-rich Branchiosauridae, Micromelerpetidae, and non-stereospondyl Stereospondylomorpha. The North American metoposaurids are also incomplete, as is surveying of indeterminate higher-level occurrences (e.g., Temnospondyli indet.). In many instances, there are undoubtedly at least a handful of specimens recorded in the literature but not presently in this dataset due to a lack of access to that material.
  3. D3_temnospondyl_skull_measurements.csv: this file records skull length measurements, inclusive of estimates, that have been published for temnospondyls in the literature (same criteria as with file D2). It records similar information about the taxonomic identification and geographic occurrence. Entries for species where no measurement/estimate has been published are included for completeness and are indicated with the reason for a lack of data.
  4. D4_temnospondyl_histology_since_2000.csv: this file records metadata for all reported instances of histological sampling conducted on temnospondyls since the year 2000 (inclusive). It includes information about the skeletal region(s), taxonomic identification, and basic bibliometric metadata about the associated publication.
  5. M1_data_dictionary.csv: this file provides a description of each column in each tabular data file.
  6. M2_institution_key.csv: this file provides a list of utilized institutional abbreviations along with notes on previously used 'synonyms' of a single institution.
  7. M3_temnos_reference.txt: plain-text file containing the full bibliographic references for the entire database. References only cited in the preprint are not included here.
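The naming convention above can be checked programmatically. The following is a minimal Python sketch, not part of TEMNOS itself. It assumes that `[C]*` means a single capital category letter followed by the file's enumeration number (as in `D1` or `M2`) and that the date prefix is optional, since the file names in the list above are given without it. The names `TemnosFile` and `parse_temnos_filename` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TemnosFile(NamedTuple):
    """Parsed parts of a TEMNOS-style file name (hypothetical helper type)."""
    date: Optional[str]   # "YYYY-MM-DD" if a date prefix is present, else None
    category: str         # e.g. "D" (data) or "M" (metadata)
    number: int           # enumeration within the category
    stem: str             # descriptive part of the name
    extension: str        # e.g. "csv" or "txt"


# Assumption: optional date prefix, then one capital letter plus digits,
# then the descriptive name and a file extension.
_PATTERN = re.compile(
    r"^(?:(?P<date>\d{4}-\d{2}-\d{2})_)?"
    r"(?P<category>[A-Z])(?P<number>\d+)_"
    r"(?P<stem>.+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_temnos_filename(name: str) -> TemnosFile:
    """Split a file name into date, category, number, stem, and extension."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized TEMNOS-style file name: {name!r}")
    return TemnosFile(
        date=match["date"],
        category=match["category"],
        number=int(match["number"]),
        stem=match["stem"],
        extension=match["ext"],
    )


if __name__ == "__main__":
    print(parse_temnos_filename("D1_temnospondyl_species_listing.csv"))
    print(parse_temnos_filename("2024-09-22_M2_institution_key.csv"))
```

Because additional categories may be developed later, the pattern accepts any capital letter rather than only 'D' and 'M'.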

File format

The tabular data files are provided in a maximally FAIR tabular format: CSV. They are intentionally not provided in Excel format because Excel is proprietary software with less long-term stability and various quirks that can be detrimental or outright damaging to data reuse (e.g., its habit of auto-converting certain values to dates; this would mess up specimen numbers for the Bernard Price Institute, for example). For these reasons and others (e.g., not everyone has access to MS Office), I do not want the data to be redistributed in Excel format. If you need to open a CSV file in Excel, the preferred method is to use Power Query to pull the file in without these issues (disabling automatic date conversion in Excel does not seem to prevent the conversion, because that setting only applies to continuous numbers and letters, not strings with punctuation like "/"). Importing via Power Query has the added advantage of automatically adding drop-down filters and alternating row coloration for visual differentiation. TEMNOS data can be redistributed in other open tabular formats like .tsv or .txt. The standalone list of references for the entire database is not provided in Word for the same reason and is instead provided as a .txt file.
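For reusers who work outside of spreadsheet software, the date-conversion problem can be avoided by reading every cell as plain text. The sketch below uses only Python's standard `csv` module, which never converts values, and shows one way to re-export rows as tab-separated text. The column names (`specimen_number`, `taxon`) and the sample row are made up for illustration and are not taken from the TEMNOS data dictionary; the helper names `read_rows` and `to_tsv` are likewise hypothetical.

```python
import csv
import io


def read_rows(handle):
    """Read CSV rows as dictionaries; every value stays an unmodified string."""
    return list(csv.DictReader(handle))


def to_tsv(rows, fieldnames):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for a real TEMNOS file (hypothetical columns).
sample = io.StringIO("specimen_number,taxon\nBP/1/1234,Example taxon\n")
rows = read_rows(sample)

if __name__ == "__main__":
    # The slash-delimited specimen number is preserved exactly as written.
    print(rows[0]["specimen_number"])
    print(to_tsv(rows, ["specimen_number", "taxon"]))
```

To read an actual file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption and should be checked against the files themselves.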

Versioning and long-term preservation

This database is intended to be a living object that is frequently edited, potentially dozens to hundreds of times per year, ranging from small-scale changes like fixing a single typo to large-scale changes like releasing a new dataset. GitHub does not have mechanisms for ensuring long-term preservation of digital objects, such as issuing a persistent identifier, and a repository can be deleted at any time. Therefore, the GitHub repo is linked to Zenodo such that each time a new release is published, a new permanent static version of the dataset will be automatically published on Zenodo with a new DOI. If you are not familiar with how this process works, you can read more here. The descriptor preprint will be similarly updated only periodically for major changes. Each release will be accompanied by a change log that documents all of the changes made since the last release. Release naming will follow semantic versioning, although it is adapted for data since the original framework is meant for code. The normal scheme is MAJOR.MINOR.PATCH, which is adapted here as follows: 'MAJOR' includes fixing systemic issues or releasing completely new resources, 'MINOR' includes fixing minor issues (e.g., typographic errors), and 'PATCH' includes emergency fixes. In general, there should not be many patches (it would have to be me releasing a new version where something is drastically wrong).
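As an illustration of how the adapted MAJOR.MINOR.PATCH scheme plays out, here is a small Python sketch that computes the next release number from the current one. It is not a tool used by TEMNOS; the function name `bump_version` is hypothetical, and resetting the lower components to zero follows the usual semantic versioning practice, which this README does not state explicitly.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version string for a 'major', 'minor', or 'patch' change.

    Interpretation taken from the scheme described above:
      major -> systemic fixes or completely new resources
      minor -> minor fixes such as typographic errors
      patch -> emergency fixes to a faulty release
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Assumption: lower components reset to zero, as in standard semver.
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    print(bump_version("1.0.0", "minor"))  # typo fixes since the initial release
    print(bump_version("1.0.0", "major"))  # e.g., a completely new dataset
    print(bump_version("1.0.0", "patch"))  # emergency fix
```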

The current plan is to publish new releases on some kind of regular schedule, probably every 3-4 months, regardless of whether MAJOR changes have been made. Edits will be accrued to the datasets over that time, all recorded in a change log, with the preprint being updated as necessary (e.g., to add references for new literature sources). Necessary edits will be actioned as soon as possible, but these will not be immediately reflected in the versions in the GitHub repo (via commits) or the Zenodo repo (via new releases) unless they are patches to critical issues (which does not include, among other things, typographic errors). The reason for this is to avoid potentially conflicting versions; if I am frequently making little edits and updating files in the Git repo without publishing new releases, the Zenodo version will not be equivalent for potentially several months unless I keep publishing new releases, which would likely create an exorbitant number of DOIs and needless confusion. Minor changes to supporting files like this README or CONTRIBUTING.md (e.g., adding the link to the OSF preprint after it is released) will be made in real time via commits.