diff --git a/.nojekyll b/.nojekyll index e957ab6..dda63af 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -0a1c56ae \ No newline at end of file +cb0e0483 \ No newline at end of file diff --git a/_tex/index.tex b/_tex/index.tex index 7fd7e00..d1a6f1f 100644 --- a/_tex/index.tex +++ b/_tex/index.tex @@ -385,7 +385,7 @@ \subsection{Earth sciences}\label{earth-sciences} acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and -implementation of open standards \url{https://www.ogc.org}. +implementation of open standards (\url{https://www.ogc.org}). \subsection{Neuroscience}\label{neuroscience} @@ -423,16 +423,16 @@ \subsection{Community science}\label{community-science} Another interesting use case for open-source standards is community/citizen science. An early example of this approach is -OpenStreetMap \url{https://www.openstreetmap.org}, which allows users to -contribute to the project development with code and data and freely use -the maps and other related geospatial datasets. But this example is not -unique. Overall, this approach has grown in the last 20 years and has -been adopted in many different fields. It has many benefits for both the -research field that harnesses the energy of non-scientist members of the -community to engage with scientific data, as well as to the community -members themselves who can draw both knowledge and pride in their -participation in the scientific endeavor. It is also recognized that -unique broader benefits are accrued from this mode of scientific +OpenStreetMap (\url{https://www.openstreetmap.org}), which allows users +to contribute to the project development with code and data and freely +use the maps and other related geospatial datasets. But this example is +not unique. Overall, this approach has grown in the last 20 years and +has been adopted in many different fields. It has many benefits for both +the research field that harnesses the energy of non-scientist members of +the community to engage with scientific data, as well as to the +community members themselves who can draw both knowledge and pride in +their participation in the scientific endeavor. It is also recognized +that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to @@ -794,9 +794,9 @@ \subsubsection{Establish standards governance based on OSS best challenges mentioned in Section~\ref{sec-challenges}, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers -should review existing governance practices such as -\href{https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html\#_project_and_community_governance}{The -Open Source Way}. +should review existing governance practices such as those provided by +The Open Source +Way(\href{https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html\#_project_and_community_governance}{https://www.theopensourceway.org/}). 
\subsubsection{Foster meta-standards development}\label{foster-meta-standards-development} @@ -822,11 +822,11 @@ \subsubsection{Foster meta-standards More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create -standards (for example, metadata schema specifications using LinkML -(https://linkml.io)). However, aspects of communication with potential -user audiences (e.g., researchers in particular domains) should be taken -into account as well. For example, in the quality of onboarding -documentation and tools for ingestion or conversion into +standards (for example, metadata schema specifications using LinkML, +\url{https://linkml.io}). However, aspects of communication with +potential user audiences (e.g., researchers in particular domains) +should be taken into account as well. For example, in the quality of +onboarding documentation and tools for ingestion or conversion into standards-compliant datasets. An ontology for the standards-development process -- for example @@ -839,10 +839,10 @@ \subsubsection{Foster meta-standards the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the -standardization process. Resources such as -\href{https://fairsharing.org/}{Fairsharing} or -\href{https://www.dcc.ac.uk/guidance/standards}{Digital Curation Center} -can help guide this process. +standardization process. Resources such as Fairsharing ( +\url{https://fairsharing.org/}) or the Digital Curation Center +(\url{https://www.dcc.ac.uk/guidance/standards}) can help guide this +process. \subsubsection{Develop standards in tandem with standards-associated software}\label{develop-standards-in-tandem-with-standards-associated-software} diff --git a/index.docx b/index.docx index 850f203..af68975 100644 Binary files a/index.docx and b/index.docx differ diff --git a/index.html b/index.html index 6d0aec9..25fd846 100644 --- a/index.html +++ b/index.html @@ -304,7 +304,7 @@

2.3 Earth sciences

-

The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards https://www.ogc.org.

+

The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards (https://www.ogc.org).
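To make the cloud-native formats named above concrete, here is a minimal sketch of reading each of them with commonly used open-source Python libraries (rasterio, xarray, pandas); the file names are hypothetical placeholders, not datasets referenced in this report.

```python
# Minimal sketch: reading the cloud-native formats mentioned above.
# File names are hypothetical; rasterio, xarray, and pandas are the
# open-source libraries commonly used for these formats.
import rasterio
import xarray as xr
import pandas as pd

# Cloud Optimized GeoTIFF: internally tiled, so a small window can be
# read without touching the rest of the file (locally or over HTTP).
with rasterio.open("elevation_cog.tif") as src:
    tile = src.read(1, window=((0, 256), (0, 256)))

# Zarr: chunked, compressed arrays; xarray opens the store lazily
# (the same interface also reads NetCDF/HDF5 via open_dataset).
temperature = xr.open_zarr("surface_temperature.zarr")

# Apache Parquet: columnar storage for tabular data.
stations = pd.read_parquet("station_observations.parquet")

print(tile.shape, dict(temperature.sizes), len(stations))
```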

2.4 Neuroscience

@@ -312,7 +312,7 @@

2.5 Community science

-

Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap https://www.openstreetmap.org, which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.

+

Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap (https://www.openstreetmap.org), which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these systems that are far from easy to adopt, even for many professional scientists.

@@ -369,13 +369,13 @@

5 Recommendations

5.1 Science and technology communities:

5.1.1 Establish standards governance based on OSS best practices

-

While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in Section 3, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as The Open Source Way.

+

While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in Section 3, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as those provided by The Open Source Way (https://www.theopensourceway.org/).

5.1.2 Foster meta-standards development

One of the main conclusions that arise from our survey of the landscape of existing standards is that there is significant knowledge that exists across fields and domains and that informs the development of standards within each field, but that could be surfaced to the level where it may be adopted more widely in different domains and be more broadly useful. One approach to this is a comparative approach: in this approach, a readiness and/or maturity model can be developed that assesses the challenges and opportunities that a specific standard faces at its current phase of development. Developing such a maturity model, while it goes beyond the scope of the current report, could lead to the eventual development of a meta-standard or a standard-of-standards. This would facilitate a succinct description of cross-cutting best-practices that can be used as a basis for the analysis or assessment of an existing standard, or as guidelines to develop new standards. For instance, specific barriers to adopting a data standard that take into account the size of the community and its specific technological capabilities should be considered.

-

More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML (https://linkml.io)). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.

-

An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing or Digital Curation Center can help guide this process.

+

More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML, https://linkml.io). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.
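As an illustration of the kind of formalization mentioned here, the sketch below defines a small, hypothetical LinkML metadata schema, inlined as a Python string for brevity; in practice such schemas are maintained as standalone YAML files and instance data is checked against them with LinkML's validation tooling.

```python
# A small, hypothetical LinkML metadata schema, inlined as a string for
# illustration only; real schemas live in their own YAML files and data
# is validated against them with LinkML's tooling.
from pathlib import Path

DATASET_SCHEMA = """\
id: https://example.org/schemas/dataset-metadata
name: dataset-metadata
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  Dataset:
    description: Minimal descriptive metadata for a shared dataset.
    attributes:
      identifier:
        identifier: true
      title:
        required: true
      license:
        description: Identifier of the data license.
      keywords:
        multivalued: true
"""

# Write the schema out so it can be versioned and validated like any
# other artifact of the standard.
Path("dataset_metadata.yaml").write_text(DATASET_SCHEMA)
```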

+

An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing (https://fairsharing.org/) or the Digital Curation Center (https://www.dcc.ac.uk/guidance/standards) can help guide this process.

5.1.3 Develop standards in tandem with standards-associated software

diff --git a/index.pdf b/index.pdf index cdd2283..291fcf8 100644 Binary files a/index.pdf and b/index.pdf differ diff --git a/sections/01-introduction.embed.ipynb b/sections/01-introduction.embed.ipynb index 466d512..24e51b0 100644 --- a/sections/01-introduction.embed.ipynb +++ b/sections/01-introduction.embed.ipynb @@ -14,7 +14,7 @@ "\n", "Data and metadata standards that use tools and practices of OSS (“open-source standards” henceforth) reap many of the benefits that the OSS model has provided in the development of other technologies. The present report explores how OSS processes and tools have affected the development of data and metadata standards. The report will survey common features of a variety of use cases; it will identify some of the challenges and pitfalls of this mode of standards development, with a particular focus on cross-sector interactions; and it will make recommendations for future developments and policies that can help this mode of standards development thrive and reach its full potential." ], - "id": "e0275793-734e-4309-8897-4ea536aa2964" + "id": "fec64ca6-5258-4a80-80c6-9b104d6cc937" } ], "nbformat": 4, diff --git a/sections/01-introduction.out.ipynb b/sections/01-introduction.out.ipynb index 7f9e466..e201419 100644 --- a/sections/01-introduction.out.ipynb +++ b/sections/01-introduction.out.ipynb @@ -18,7 +18,7 @@ "\n", "Wilkinson, Mark D, Michel Dumontier, I Jsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” *Sci Data* 3 (March): 160018." ], - "id": "f25987e3-95af-4ead-ac2c-76e89719159d" + "id": "a65e5159-08fe-4d2c-88e4-933ae33bd979" } ], "nbformat": 4, diff --git a/sections/02-use-cases-preview.html b/sections/02-use-cases-preview.html index bba24c9..405d3e4 100644 --- a/sections/02-use-cases-preview.html +++ b/sections/02-use-cases-preview.html @@ -191,7 +191,7 @@

High-energy physics

Earth sciences

-

The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards https://www.ogc.org.

+

The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, https://www.osgeo.org) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards (https://www.ogc.org).

Neuroscience

@@ -199,7 +199,7 @@

Neuroscience

Community science

-

Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap https://www.openstreetmap.org, which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.

+

Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap (https://www.openstreetmap.org), which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these systems that are far from easy to adopt, even for many professional scientists.

Basaglia, T, M Bellis, J Blomer, J Boyd, C Bozzi, D Britzger, S Campana, et al. 2023. “Data Preservation in High Energy Physics.” The European Physical Journal C 83 (9): 795. diff --git a/sections/02-use-cases.embed.ipynb b/sections/02-use-cases.embed.ipynb index b877d5e..3a91727 100644 --- a/sections/02-use-cases.embed.ipynb +++ b/sections/02-use-cases.embed.ipynb @@ -20,7 +20,7 @@ "\n", "## Earth sciences\n", "\n", - "The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, ) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards .\n", + "The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, ) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. 
While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards ().\n", "\n", "## Neuroscience\n", "\n", @@ -28,9 +28,9 @@ "\n", "## Community science\n", "\n", - "Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap , which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists." + "Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap (), which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. 
Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists." ], - "id": "98fe4965-8d8a-4055-9af3-24b79cb77bd5" + "id": "f8f40499-090a-4274-a3d8-a4140fd7a6e9" } ], "nbformat": 4, diff --git a/sections/02-use-cases.out.ipynb b/sections/02-use-cases.out.ipynb index cf1fd03..6ce93e5 100644 --- a/sections/02-use-cases.out.ipynb +++ b/sections/02-use-cases.out.ipynb @@ -20,7 +20,7 @@ "\n", "## Earth sciences\n", "\n", - "The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, ) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards .\n", + "The need for geospatial data exchange between different systems began to be recognized in the 1970s and 1980s, but proprietary formats still dominated. Coordinated standardization efforts brought the Open Geospatial Consortium (OGC) establishment in the 1990s, a critical step towards open standards for geospatial data. The 1990s have also seen the development of key standards such as the Network Common Data Form (NetCDF) developed by the University Corporation for Atmospheric Research (UCAR), and the Hierarchical Data Format (HDF), a set of file formats (HDF4, HDF5) that are widely used, particularly in climate research. The GeoTIFF format, which originated at NASA in the late 1990s, is extensively used to share image data. 
The following two decades, the 2000s-2020s, brought an expansion of open standards and integration with web technologies developed by OGC, as well as other standards such as the Keyhole Markup Language (KML) for displaying geographic data in Earth browsers. Formats suitable for cloud computing also emerged, such as the Cloud Optimized GeoTIFF (COG), followed by Zarr and Apache Parquet for array and tabular data, respectively. In 2006, the Open Source Geospatial Foundation (OSGeo, ) was established, demonstrating the community’s commitment to the development of open-source geospatial technologies. While some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working together on the development and implementation of open standards ().\n", "\n", "## Neuroscience\n", "\n", @@ -28,7 +28,7 @@ "\n", "## Community science\n", "\n", - "Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap , which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.\n", + "Another interesting use case for open-source standards is community/citizen science. An early example of this approach is OpenStreetMap (), which allows users to contribute to the project development with code and data and freely use the maps and other related geospatial datasets. But this example is not unique. Overall, this approach has grown in the last 20 years and has been adopted in many different fields. 
It has many benefits for both the research field that harnesses the energy of non-scientist members of the community to engage with scientific data, as well as to the community members themselves who can draw both knowledge and pride in their participation in the scientific endeavor. It is also recognized that unique broader benefits are accrued from this mode of scientific research, through the inclusion of perspectives and data that would not otherwise be included. To make data accessible to community scientists, and to make the data collected by community scientists accessible to professional scientists, it needs to be provided in a manner that can be created and accessed without specialized instruments or specialized knowledge. Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science. This creates a particularly stringent constraint on transparency and simplicity of standards. Creating these standards in a manner that addresses these unique constraints can benefit from OSS tools, with the caveat that some of these tools require additional expertise. For example, if the standard is developed using git/GitHub for versioning, this would require learning the complex and obscure technical aspects of these system that are far from easy to adopt, even for many professional scientists.\n", "\n", "Basaglia, T, M Bellis, J Blomer, J Boyd, C Bozzi, D Britzger, S Campana, et al. 2023. “Data Preservation in High Energy Physics.” *The European Physical Journal C* 83 (9): 795.\n", "\n", @@ -46,7 +46,7 @@ "\n", "Wells, Donald Carson, and Eric W Greisen. 1979. “FITS-a Flexible Image Transport System.” In *Image Processing in Astronomy*, 445." ], - "id": "e08e001b-5ee2-4963-9b78-1b0899d0eb73" + "id": "5a81d885-60af-451c-8ae4-0793537adefd" } ], "nbformat": 4, diff --git a/sections/02-use-cases.qmd b/sections/02-use-cases.qmd index b1f5f5d..3c25693 100644 --- a/sections/02-use-cases.qmd +++ b/sections/02-use-cases.qmd @@ -86,8 +86,7 @@ some standards have been developed in the industry (e.g., Keyhole Markup Language (KML) by Keyhole Inc., which Google later acquired), they later became international standards of the OGC, which now encompasses more than 450 commercial, governmental, nonprofit, and research organizations working -together on the development and implementation of open standards -[https://www.ogc.org](https://www.ogc.org). +together on the development and implementation of open standards ([https://www.ogc.org](https://www.ogc.org)). ## Neuroscience @@ -121,30 +120,29 @@ wide range of stakeholders and tap a broad base of expertise. ## Community science Another interesting use case for open-source standards is community/citizen -science. An early example of this approach is OpenStreetMap -[https://www.openstreetmap.org](https://www.openstreetmap.org), which allows -users to contribute to the project development with code and data and freely -use the maps and other related geospatial datasets. But this example is not -unique. Overall, this approach has grown in the last 20 years and has been -adopted in many different fields. It has many benefits for both the research -field that harnesses the energy of non-scientist members of the community to -engage with scientific data, as well as to the community members themselves who -can draw both knowledge and pride in their participation in the scientific -endeavor. 
It is also recognized that unique broader benefits are accrued from -this mode of scientific research, through the inclusion of perspectives and -data that would not otherwise be included. To make data accessible to community -scientists, and to make the data collected by community scientists accessible -to professional scientists, it needs to be provided in a manner that can be -created and accessed without specialized instruments or specialized knowledge. -Here, standards are needed to facilitate interactions between an in-group of -expert researchers who generate and curate data and a broader set of out-group -enthusiasts who would like to make meaningful contributions to the science. -This creates a particularly stringent constraint on transparency and simplicity -of standards. Creating these standards in a manner that addresses these unique -constraints can benefit from OSS tools, with the caveat that some of these -tools require additional expertise. For example, if the standard is developed -using git/GitHub for versioning, this would require learning the complex and -obscure technical aspects of these system that are far from easy to adopt, even -for many professional scientists. +science. An early example of this approach is OpenStreetMap ([https://www.openstreetmap.org](https://www.openstreetmap.org)), +which allows users to contribute to the project development with code and data +and freely use the maps and other related geospatial datasets. But this example +is not unique. Overall, this approach has grown in the last 20 years and has +been adopted in many different fields. It has many benefits for both the +research field that harnesses the energy of non-scientist members of the +community to engage with scientific data, as well as to the community members +themselves who can draw both knowledge and pride in their participation in the +scientific endeavor. It is also recognized that unique broader benefits are +accrued from this mode of scientific research, through the inclusion of +perspectives and data that would not otherwise be included. To make data +accessible to community scientists, and to make the data collected by community +scientists accessible to professional scientists, it needs to be provided in a +manner that can be created and accessed without specialized instruments or +specialized knowledge. Here, standards are needed to facilitate interactions +between an in-group of expert researchers who generate and curate data and a +broader set of out-group enthusiasts who would like to make meaningful +contributions to the science. This creates a particularly stringent constraint +on transparency and simplicity of standards. Creating these standards in a +manner that addresses these unique constraints can benefit from OSS tools, with +the caveat that some of these tools require additional expertise. For example, +if the standard is developed using git/GitHub for versioning, this would +require learning the complex and obscure technical aspects of these system that +are far from easy to adopt, even for many professional scientists. diff --git a/sections/03-challenges.embed.ipynb b/sections/03-challenges.embed.ipynb index 778f099..4658c43 100644 --- a/sections/03-challenges.embed.ipynb +++ b/sections/03-challenges.embed.ipynb @@ -36,7 +36,7 @@ "\n", "The development of open-source standards faces similar sustainability challenges to those faced by open-source software that is developed for research. 
Standards typically develop organically through sustained and persistent efforts from dedicated groups of data practitioners. These include scientists and the broader ecosystem of data curators and users. However, there is no playbook on the structure and components of a data standard, or the pathway that moves the implementation of a specific data architecture (e.g., a particular file format) to become a data standard. As a result, data standardization lacks formal avenues for success and recognition, for example through dedicated research grants (and see @sec-cross-sector). This hampers the long-term trajectory that is needed to inculcate a standard into the day-to-day practice of researchers." ], - "id": "04084c1c-5945-4692-ad9d-aad2a10aa97e" + "id": "1210eaa1-9af9-49c2-8ee2-c00d4925904f" } ], "nbformat": 4, diff --git a/sections/03-challenges.out.ipynb b/sections/03-challenges.out.ipynb index aed3b39..b136072 100644 --- a/sections/03-challenges.out.ipynb +++ b/sections/03-challenges.out.ipynb @@ -50,7 +50,7 @@ "\n", "Scroggins, Michael, and Bernadette M Boscoe. 2020. “Once FITS, Always FITS? Astronomical Infrastructure in Transition.” *IEEE Ann. Hist. Comput.* 42 (2): 42–54." ], - "id": "5703d200-943c-4f99-b983-e6c7c1c96b32" + "id": "84983b8a-ef34-4fde-b448-42c82cd8276a" } ], "nbformat": 4, diff --git a/sections/04-cross-sector.embed.ipynb b/sections/04-cross-sector.embed.ipynb index 875564c..0de644b 100644 --- a/sections/04-cross-sector.embed.ipynb +++ b/sections/04-cross-sector.embed.ipynb @@ -26,7 +26,7 @@ "\n", "Interactions of data and meta-data standards with commercial interests may provide specific sources of friction. This is because proprietary/closed formats of data can create difficulty at various transition points: from one instrument vendor to another, from data producer to downstream recipient/user, etc. On the other hand, in some cases, cross-sector collaborations with commercial entities may pave the way to robust and useful standards. For example, imaging measurements in human subjects (e.g., in brain imaging experiments) significantly interact with standards for medical imaging, and chiefly the Digital Imaging and Communications in Medicine (DICOM) standard, which is widely used in a range of medical imaging applications, including in clinical settings \\[@Larobina2023-vq, @Mustra2008-xk\\]. The standard emerged from the demands of the clinical practice in the 1980s, as digital technologies were came into widespread use in medical imaging, through joint work of industry organizations: the American College of Radiology and the National Association of Electronic Manufacturers. One of the defining features of the DICOM standard is that it allows manufacturers of instruments to define “private fields” that are compliant with the standard, but which may include idiosyncratically organized data and/or metadata. This provides significant flexibility, but can also easily lead to the loss of important information. Nevertheless, the human brain imaging case is exemplary of a case in which industry standards and research standards coexist and need to communicate with each other effectively to advance research use-cases, while keeping up with the rapid development of the technologies." 
], - "id": "07077904-63b0-44ee-8ad5-54c9eaecdd2d" + "id": "bdce5ee4-07ef-4456-8fae-4ff8532d5d4f" } ], "nbformat": 4, diff --git a/sections/04-cross-sector.out.ipynb b/sections/04-cross-sector.out.ipynb index ca74e2a..a690fbb 100644 --- a/sections/04-cross-sector.out.ipynb +++ b/sections/04-cross-sector.out.ipynb @@ -36,7 +36,7 @@ "\n", "The National Science and Technology Council. 2022. “Desirable Characteristics of Data Repositories for Federally Funded Research.” *Executive Office of the President of the United States, Tech. Rep*." ], - "id": "d4585359-8b4c-4f3e-908e-74f662e67826" + "id": "06286a89-1be6-432d-9407-8f286ac6fe93" } ], "nbformat": 4, diff --git a/sections/05-recommendations-preview.html b/sections/05-recommendations-preview.html index bfeda70..74eae5b 100644 --- a/sections/05-recommendations-preview.html +++ b/sections/05-recommendations-preview.html @@ -193,13 +193,13 @@

Recommendations for open-source data and metadata standards

Science and technology communities:

Establish standards governance based on OSS best practices

-

While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in (sec-challenges?), especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as The Open Source Way.

+

While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in (sec-challenges?), especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as those provided by The Open Source Way (https://www.theopensourceway.org/).

Foster meta-standards development

One of the main conclusions that arise from our survey of the landscape of existing standards is that there is significant knowledge that exists across fields and domains and that informs the development of standards within each field, but that could be surfaced to the level where it may be adopted more widely in different domains and be more broadly useful. One approach to this is a comparative approach: in this approach, a readiness and/or maturity model can be developed that assesses the challenges and opportunities that a specific standard faces at its current phase of development. Developing such a maturity model, while it goes beyond the scope of the current report, could lead to the eventual development of a meta-standard or a standard-of-standards. This would facilitate a succinct description of cross-cutting best-practices that can be used as a basis for the analysis or assessment of an existing standard, or as guidelines to develop new standards. For instance, specific barriers to adopting a data standard that take into account the size of the community and its specific technological capabilities should be considered.

-

More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML (https://linkml.io)). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.

-

An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing or Digital Curation Center can help guide this process.

+

More generally, meta-standards could include formalization for versioning of standards and for their interactions with specific related software. This includes amplifying formalizations and guidelines on how to create standards (for example, metadata schema specifications using LinkML, https://linkml.io). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well, for example the quality of onboarding documentation and of tools for ingestion or conversion into standards-compliant datasets.
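To make the idea of a machine-readable metadata schema more concrete, the sketch below uses the widely available `jsonschema` Python package as a stand-in; the field names and the record are hypothetical. LinkML schemas are richer (authored in YAML, with classes, slots, and reusable types), but the basic workflow of validating a dataset record against a published schema is the same.

```python
# Illustrative sketch only: a minimal, hypothetical metadata schema for a
# dataset record, expressed as JSON Schema and checked with the `jsonschema`
# package (pip install jsonschema).
from jsonschema import ValidationError, validate

DATASET_METADATA_SCHEMA = {
    "type": "object",
    "required": ["identifier", "title", "license", "creator"],
    "properties": {
        "identifier": {"type": "string"},  # e.g., a DOI or accession number
        "title": {"type": "string"},
        "license": {"type": "string"},     # e.g., "CC-BY-4.0"
        "creator": {"type": "array", "items": {"type": "string"}},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
}

record = {
    "identifier": "doi:10.0000/example",  # hypothetical record
    "title": "Example survey dataset",
    "license": "CC-BY-4.0",
    "creator": ["A. Researcher"],
}

try:
    validate(instance=record, schema=DATASET_METADATA_SCHEMA)
    print("Record is compliant with the metadata schema.")
except ValidationError as err:
    print(f"Record fails validation: {err.message}")
```

Ingestion and conversion tools can reuse the same schema, so that datasets are checked for compliance at the point of entry rather than after the fact.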

+

An ontology for the standards-development process – for example, top-down vs bottom-up development, minimum number of datasets, target community size, the technical expertise typical of that community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which are laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards or complexity-impeded adoption. Surveying and documenting the successes and failures of current standards for a specific dataset/domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing (https://fairsharing.org/) or the Digital Curation Center (https://www.dcc.ac.uk/guidance/standards) can help guide this process.

Develop standards in tandem with standards-associated software

diff --git a/sections/05-recommendations.embed.ipynb b/sections/05-recommendations.embed.ipynb index f8bac17..cb1f85f 100644 --- a/sections/05-recommendations.embed.ipynb +++ b/sections/05-recommendations.embed.ipynb @@ -14,15 +14,15 @@ "\n", "### Establish standards governance based on OSS best practices\n", "\n", - "While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in @sec-challenges, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as [The Open Source Way](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance).\n", + "While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in @sec-challenges, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as those provided by The Open Source Way([https://www.theopensourceway.org/](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance)).\n", "\n", "### Foster meta-standards development\n", "\n", "One of the main conclusions that arise from our survey of the landscape of existing standards is that there is significant knowledge that exists across fields and domains and that informs the development of standards within each field, but that could be surfaced to the level where it may be adopted more widely in different domains and be more broadly useful. One approach to this is a comparative approach: in this approach, a readiness and/or maturity model can be developed that assesses the challenges and opportunities that a specific standard faces at its current phase of development. Developing such a maturity model, while it goes beyond the scope of the current report, could lead to the eventual development of a meta-standard or a standard-of-standards. This would facilitate a succinct description of cross-cutting best-practices that can be used as a basis for the analysis or assessment of an existing standard, or as guidelines to develop new standards. For instance, specific barriers to adopting a data standard that take into account the size of the community and its specific technological capabilities should be considered.\n", "\n", - "More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML (https://linkml.io)). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. 
For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.\n", + "More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML, ). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.\n", "\n", - "An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as [Fairsharing](https://fairsharing.org/) or [Digital Curation Center](https://www.dcc.ac.uk/guidance/standards) can help guide this process.\n", + "An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing ( ) or the Digital Curation Center () can help guide this process.\n", "\n", "### Develop standards in tandem with standards-associated software\n", "\n", @@ -48,7 +48,7 @@ "\n", "Encourage cross-sector and cross-domain alliances that can impact successful standards creation. Invest in robust program management of these alliances to align pace and create incentives (for instance via Open Source Program Offices at Universities or other research organizations). Similar to program officers at funding agencies, standards evolution need sustained PM efforts. Multi-party partnerships should include strategic initiatives for standard establishment such as the Pistoia Alliance ()." 
], - "id": "44caf65c-2b55-4ba4-b1e6-93877140efd3" + "id": "0be62ce2-3df8-4e30-944b-5150deff3829" } ], "nbformat": 4, diff --git a/sections/05-recommendations.out.ipynb b/sections/05-recommendations.out.ipynb index 069ce7b..e9b5499 100644 --- a/sections/05-recommendations.out.ipynb +++ b/sections/05-recommendations.out.ipynb @@ -14,15 +14,15 @@ "\n", "### Establish standards governance based on OSS best practices\n", "\n", - "While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in ([**sec-challenges?**](#ref-sec-challenges)), especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as [The Open Source Way](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance).\n", + "While best-practice governance principles are also relatively new in OSS communities, there is already a substantial set of prior art in this domain, on which the developers and maintainers of open-source data and metadata standards can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in ([**sec-challenges?**](#ref-sec-challenges)), especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. Developers and maintainers should review existing governance practices such as those provided by The Open Source Way([https://www.theopensourceway.org/](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance)).\n", "\n", "### Foster meta-standards development\n", "\n", "One of the main conclusions that arise from our survey of the landscape of existing standards is that there is significant knowledge that exists across fields and domains and that informs the development of standards within each field, but that could be surfaced to the level where it may be adopted more widely in different domains and be more broadly useful. One approach to this is a comparative approach: in this approach, a readiness and/or maturity model can be developed that assesses the challenges and opportunities that a specific standard faces at its current phase of development. Developing such a maturity model, while it goes beyond the scope of the current report, could lead to the eventual development of a meta-standard or a standard-of-standards. This would facilitate a succinct description of cross-cutting best-practices that can be used as a basis for the analysis or assessment of an existing standard, or as guidelines to develop new standards. For instance, specific barriers to adopting a data standard that take into account the size of the community and its specific technological capabilities should be considered.\n", "\n", - "More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML (https://linkml.io)). 
However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.\n", + "More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, metadata schema specifications using LinkML, ). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into standards-compliant datasets.\n", "\n", - "An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as [Fairsharing](https://fairsharing.org/) or [Digital Curation Center](https://www.dcc.ac.uk/guidance/standards) can help guide this process.\n", + "An ontology for the standards-development process – for example top-down vs bottom-up, minimum number of datasets, target community size and technical expertise typical of this community, and so forth – could help guide the standards-development process towards more effective adoption and use. A set of meta-standards and high-level descriptions of the standards-development process – some of which is laid out in this report – could help standard developers avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate knowledge about the standardization process. Resources such as Fairsharing ( ) or the Digital Curation Center () can help guide this process.\n", "\n", "### Develop standards in tandem with standards-associated software\n", "\n", @@ -56,7 +56,7 @@ "\n", "Van Tuyl, Steve, ed. 2023. “Hiring, Managing, and Retaining Data Scientists and Research Software Engineers in Academia: A Career Guidebook from ADSA and US-RSE.” https://doi.org/." ], - "id": "8d8ee194-4142-41ff-b971-f3eb81f0e3de" + "id": "49c2e5b0-9555-469b-9808-1a7ad37676c9" } ], "nbformat": 4, diff --git a/sections/05-recommendations.qmd b/sections/05-recommendations.qmd index 88fa3eb..4ac87f7 100644 --- a/sections/05-recommendations.qmd +++ b/sections/05-recommendations.qmd @@ -20,7 +20,7 @@ can rely. For example, it is now clear that governance principles and rules can mitigate some of the risks and challenges mentioned in @sec-challenges, especially for communities beyond a certain size that need to converge toward a new standard or rely on an existing standard. 
Developers and maintainers should -review existing governance practices such as [The Open Source Way](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance). +review existing governance practices such as those provided by The Open Source Way([https://www.theopensourceway.org/](https://www.theopensourceway.org/the_open_source_way-guidebook-2.0.html#_project_and_community_governance)). ### Foster meta-standards development @@ -44,7 +44,7 @@ its specific technological capabilities should be considered. More generally, meta-standards could include formalization for versioning of standards and interactions with specific related software. This includes amplifying formalization/guidelines on how to create standards (for example, -metadata schema specifications using LinkML (https://linkml.io)). However, +metadata schema specifications using LinkML, [https://linkml.io](https://linkml.io)). However, aspects of communication with potential user audiences (e.g., researchers in particular domains) should be taken into account as well. For example, in the quality of onboarding documentation and tools for ingestion or conversion into @@ -59,9 +59,8 @@ meta-standards and high-level descriptions of the standards-development process avoid known pitfalls, such as the dreaded proliferation of standards, or complexity-impeded adoption. Surveying and documenting the success and failures of current standards for a specific dataset / domain can help disseminate -knowledge about the standardization process. Resources such as -[Fairsharing](https://fairsharing.org/) or [Digital Curation Center](https://www.dcc.ac.uk/guidance/standards) -can help guide this process. +knowledge about the standardization process. Resources such as Fairsharing ( +[https://fairsharing.org/](https://fairsharing.org/)) or the Digital Curation Center ([https://www.dcc.ac.uk/guidance/standards](https://www.dcc.ac.uk/guidance/standards)) can help guide this process. ### Develop standards in tandem with standards-associated software diff --git a/sections/06-acknowledgments.embed.ipynb b/sections/06-acknowledgments.embed.ipynb index 49b88f7..cfb5eaa 100644 --- a/sections/06-acknowledgments.embed.ipynb +++ b/sections/06-acknowledgments.embed.ipynb @@ -12,7 +12,7 @@ "\n", "The workshop and this report were funded through [NSF grant #2334483](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2334483&HistoricalAwards=false) from the NSF [Pathways to Enable Open-Source Ecosystems (POSE)](https://new.nsf.gov/funding/opportunities/pathways-enable-open-source-ecosystems-pose) program. The opinions expressed in this report do not necessarily reflect those of the National Science Foundation." ], - "id": "67d2ff76-a2bf-4c5a-a424-95f99096a6d7" + "id": "8bc5e899-1ead-40cf-87a2-cbe581453325" } ], "nbformat": 4, diff --git a/sections/06-acknowledgments.out.ipynb b/sections/06-acknowledgments.out.ipynb index 5aac93f..00912eb 100644 --- a/sections/06-acknowledgments.out.ipynb +++ b/sections/06-acknowledgments.out.ipynb @@ -12,7 +12,7 @@ "\n", "The workshop and this report were funded through [NSF grant #2334483](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2334483&HistoricalAwards=false) from the NSF [Pathways to Enable Open-Source Ecosystems (POSE)](https://new.nsf.gov/funding/opportunities/pathways-enable-open-source-ecosystems-pose) program. The opinions expressed in this report do not necessarily reflect those of the National Science Foundation." 
], - "id": "87cdcc58-d3e3-483a-a35a-0d2ce8878a44" + "id": "ae7d89d3-2875-4fb8-aa43-45254d8fff8b" } ], "nbformat": 4, diff --git a/sections/07-participants.embed.ipynb b/sections/07-participants.embed.ipynb index 7480239..d348211 100644 --- a/sections/07-participants.embed.ipynb +++ b/sections/07-participants.embed.ipynb @@ -40,7 +40,7 @@ "| Yaroslav Halchenko | Dartmouth University |\n", "| Ziheng Sun | George Mason University |" ], - "id": "e7d849d7-d779-4132-97ac-7e2d9d9cb402" + "id": "58492b18-0b7b-4f20-8879-9721d5d88b51" } ], "nbformat": 4, diff --git a/sections/07-participants.out.ipynb b/sections/07-participants.out.ipynb index 81b954e..851be9a 100644 --- a/sections/07-participants.out.ipynb +++ b/sections/07-participants.out.ipynb @@ -40,7 +40,7 @@ "| Yaroslav Halchenko | Dartmouth University |\n", "| Ziheng Sun | George Mason University |" ], - "id": "80f7b6ef-e3e9-4496-9bdb-797dd2a2f5a5" + "id": "453390c0-c9bb-4602-9bff-47504b675ca9" } ], "nbformat": 4,