Built site for gh-pages
Quarto GHA Workflow Runner committed Oct 11, 2024
1 parent 57f5197 commit dd3dfb3
Showing 23 changed files with 87 additions and 90 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-0a1c56ae
+cb0e0483
46 changes: 23 additions & 23 deletions _tex/index.tex
@@ -385,7 +385,7 @@ \subsection{Earth sciences}\label{earth-sciences}
acquired), they later became international standards of the OGC, which
now encompasses more than 450 commercial, governmental, nonprofit, and
research organizations working together on the development and
-implementation of open standards \url{https://www.ogc.org}.
+implementation of open standards (\url{https://www.ogc.org}).

\subsection{Neuroscience}\label{neuroscience}

@@ -423,16 +423,16 @@ \subsection{Community science}\label{community-science}

Another interesting use case for open-source standards is
community/citizen science. An early example of this approach is
-OpenStreetMap \url{https://www.openstreetmap.org}, which allows users to
-contribute to the project development with code and data and freely use
-the maps and other related geospatial datasets. But this example is not
-unique. Overall, this approach has grown in the last 20 years and has
-been adopted in many different fields. It has many benefits for both the
-research field that harnesses the energy of non-scientist members of the
-community to engage with scientific data, as well as to the community
-members themselves who can draw both knowledge and pride in their
-participation in the scientific endeavor. It is also recognized that
-unique broader benefits are accrued from this mode of scientific
+OpenStreetMap (\url{https://www.openstreetmap.org}), which allows users
+to contribute to the project development with code and data and freely
+use the maps and other related geospatial datasets. But this example is
+not unique. Overall, this approach has grown in the last 20 years and
+has been adopted in many different fields. It has many benefits for both
+the research field that harnesses the energy of non-scientist members of
+the community to engage with scientific data, as well as to the
+community members themselves who can draw both knowledge and pride in
+their participation in the scientific endeavor. It is also recognized
+that unique broader benefits are accrued from this mode of scientific
research, through the inclusion of perspectives and data that would not
otherwise be included. To make data accessible to community scientists,
and to make the data collected by community scientists accessible to
@@ -794,9 +794,9 @@ \subsubsection{Establish standards governance based on OSS best
challenges mentioned in Section~\ref{sec-challenges}, especially for
communities beyond a certain size that need to converge toward a new
standard or rely on an existing standard. Developers and maintainers
-should review existing governance practices such as
-\href{https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html\#_project_and_community_governance}{The
-Open Source Way}.
+should review existing governance practices such as those provided by
+The Open Source
+Way(\href{https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html\#_project_and_community_governance}{https://www.theopensourceway.org/}).

\subsubsection{Foster meta-standards
development}\label{foster-meta-standards-development}
@@ -822,11 +822,11 @@ \subsubsection{Foster meta-standards
More generally, meta-standards could include formalization for
versioning of standards and interactions with specific related software.
This includes amplifying formalization/guidelines on how to create
-standards (for example, metadata schema specifications using LinkML
-(https://linkml.io)). However, aspects of communication with potential
-user audiences (e.g., researchers in particular domains) should be taken
-into account as well. For example, in the quality of onboarding
-documentation and tools for ingestion or conversion into
+standards (for example, metadata schema specifications using LinkML,
+\url{https://linkml.io}). However, aspects of communication with
+potential user audiences (e.g., researchers in particular domains)
+should be taken into account as well. For example, in the quality of
+onboarding documentation and tools for ingestion or conversion into
standards-compliant datasets.

An ontology for the standards-development process -- for example
@@ -839,10 +839,10 @@ \subsubsection{Foster meta-standards
the dreaded proliferation of standards, or complexity-impeded adoption.
Surveying and documenting the success and failures of current standards
for a specific dataset / domain can help disseminate knowledge about the
-standardization process. Resources such as
-\href{https://fairsharing.org/}{Fairsharing} or
-\href{https://www.dcc.ac.uk/guidance/standards}{Digital Curation Center}
-can help guide this process.
+standardization process. Resources such as Fairsharing (
+\url{https://fairsharing.org/}) or the Digital Curation Center
+(\url{https://www.dcc.ac.uk/guidance/standards}) can help guide this
+process.

\subsubsection{Develop standards in tandem with standards-associated
software}\label{develop-standards-in-tandem-with-standards-associated-software}
Binary file modified index.docx
Binary file not shown.
10 changes: 5 additions & 5 deletions index.html

Large diffs are not rendered by default.

Binary file modified index.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion sections/01-introduction.embed.ipynb
@@ -14,7 +14,7 @@
"\n",
"Data and metadata standards that use tools and practices of OSS (“open-source standards” henceforth) reap many of the benefits that the OSS model has provided in the development of other technologies. The present report explores how OSS processes and tools have affected the development of data and metadata standards. The report will survey common features of a variety of use cases; it will identify some of the challenges and pitfalls of this mode of standards development, with a particular focus on cross-sector interactions; and it will make recommendations for future developments and policies that can help this mode of standards development thrive and reach its full potential."
],
-"id": "e0275793-734e-4309-8897-4ea536aa2964"
+"id": "fec64ca6-5258-4a80-80c6-9b104d6cc937"
}
],
"nbformat": 4,
2 changes: 1 addition & 1 deletion sections/01-introduction.out.ipynb
@@ -18,7 +18,7 @@
"\n",
"Wilkinson, Mark D, Michel Dumontier, I Jsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” *Sci Data* 3 (March): 160018."
],
-"id": "f25987e3-95af-4ead-ac2c-76e89719159d"
+"id": "a65e5159-08fe-4d2c-88e4-933ae33bd979"
}
],
"nbformat": 4,
4 changes: 2 additions & 2 deletions sections/02-use-cases-preview.html
@@ -191,15 +191,15 @@ <h2 class="anchored" data-anchor-id="high-energy-physics-hep">High-energy physic
</section>
<section id="earth-sciences" class="level2">
<h2 class="anchored" data-anchor-id="earth-sciences">Earth sciences</h2>
-<p>The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, <a href="https://www.osgeo.org">https://www.osgeo.org</a>) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards <a href="https://www.ogc.org">https://www.ogc.org</a>.</p>
+<p>The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, <a href="https://www.osgeo.org">https://www.osgeo.org</a>) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards (<a href="https://www.ogc.org">https://www.ogc.org</a>).</p>
</section>
<section id="neuroscience" class="level2">
<h2 class="anchored" data-anchor-id="neuroscience">Neuroscience</h2>
<p>In contrast to the previously-mentioned fields, Neuroscience has traditionally been a “cottage industry”, where individual labs have generated experimental data designed to answer specific experimental questions. While this model still exists, the field has also seen the emergence of new modes of data production that focus on generating large shared datasets designed to answer many different questions, more akin to the data generated in large astronomy data collection efforts <span class="citation" data-cites="Koch2012-ve">(<a href="#ref-Koch2012-ve" role="doc-biblioref">Koch and Clay Reid 2012</a>)</span>. This change has been brought on through a combination of technical advances in data acquisition techniques, which now generate large and very high-dimensional/information-rich datasets, cultural changes, which have ushered in new norms of transparency and reproducibility, and funding initiatives that have encouraged this kind of data collection. However, because these changes are recent relative to the other cases mentioned above, standards for data and metadata in neuroscience have been prone to adopt many elements of modern OSS development. Two salient examples in neuroscience are the Neurodata Without Borders file format for neurophysiology data <span class="citation" data-cites="Rubel2022NWB">(<a href="#ref-Rubel2022NWB" role="doc-biblioref">Rübel et al. 2022</a>)</span> and the Brain Imaging Data Structure (BIDS) standard for neuroimaging data <span class="citation" data-cites="Gorgolewski2016BIDS">(<a href="#ref-Gorgolewski2016BIDS" role="doc-biblioref">Gorgolewski et al. 2016</a>)</span>. BIDS in particular owes some of its success to the adoption of OSS development mechanisms <span class="citation" data-cites="Poldrack2024BIDS">(<a href="#ref-Poldrack2024BIDS" role="doc-biblioref">Poldrack et al. 2024</a>)</span>. 
For example, small changes to the standard are managed through the GitHub pull request mechanism; larger changes are managed through a BIDS Enhancement Proposal (BEP) process that is directly inspired by the Python programming language community’s Python Enhancement Proposal procedure, which is used to introduce new ideas into the language. Though the BEP mechanism takes a slightly different technical approach, it tries to emulate the open-ended and community-driven aspects of Python development to accept contributions from a wide range of stakeholders and tap a broad base of expertise.</p>
</section>
<section id="community-science" class="level2">
<h2 class="anchored" data-anchor-id="community-science">Community science</h2>
-<p>Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap <a href="https://www.openstreetmap.org">https://www.openstreetmap.org</a>, which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.</p>
+<p>Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap (<a href="https://www.openstreetmap.org">https://www.openstreetmap.org</a>), which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
<div id="ref-Basaglia2023-dq" class="csl-entry" role="listitem">
Basaglia, T, M Bellis, J Blomer, J Boyd, C Bozzi, D Britzger, S Campana, et al. 2023. <span>“Data Preservation in High Energy Physics.”</span> <em>The European Physical Journal C</em> 83 (9): 795.