diff --git a/doc/sphinx-guides/source/admin/collectionquotas.rst b/doc/sphinx-guides/source/admin/collectionquotas.rst
new file mode 100644
index 00000000000..2ce3132e2ba
--- /dev/null
+++ b/doc/sphinx-guides/source/admin/collectionquotas.rst
@@ -0,0 +1,19 @@
+
+Storage Quotas for Collections
+==============================
+
+Please note that this is a new and still experimental feature (as of the Dataverse v6.1 release).
+
+Instance admins can now define storage quota limits for specific collections. These limits can be set, changed and/or deleted via the provided APIs (please see the :ref:`collection-storage-quotas` section of the :doc:`/api/native-api` guide). The read version of the API is available to individual collection admins (i.e., a collection owner can check the quota configured for their collection), but only superusers can set, change or disable storage quotas.
+
+Storage quotas are *inherited* by sub-collections. In other words, when a storage use limit is set for a specific collection, it applies to all the datasets immediately under it and in its sub-collections, unless different quotas are defined there, and so on. Each file added to any dataset in that hierarchy counts for the purposes of the quota limit defined for the top collection. A storage quota defined on a child sub-collection overrides any quota defined on the parent or inherited from an ancestor.
+
+For example, say a collection ``A`` has its storage quota set to 10GB and has 3 sub-collections, ``B``, ``C`` and ``D``. Users can keep uploading files into the datasets anywhere in this hierarchy until their combined size reaches 10GB. However, if an admin has reasons to limit one of the sub-collections, say ``B``, to 3GB only, that quota can be explicitly set there. This both limits the growth of ``B`` to 3GB and *guarantees* that allocation to it. That is, contributors to collection ``B`` will be able to keep adding data until the 3GB limit is reached, even after the parent collection ``A`` reaches the combined 10GB limit (at which point ``A`` and all its sub-collections except for ``B`` will become read-only).
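+
+A superuser could set such a 3GB quota on ``B`` via the API referenced above (a sketch, using the usual ``$SERVER_URL`` and ``$API_TOKEN`` placeholders from the API Guide and assuming the collection's alias is literally ``B``; 3GB is 3221225472 bytes):
+
+.. code-block:: bash
+
+   curl -X PUT -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/B/storage/quota/3221225472"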
+
+We do not yet know whether this use case - a child collection quota that differs from the quota it inherits from a parent - is going to be popular or needed. It is likely that for many instances it will be sufficient to define quotas for collections and have them apply to all the child objects underneath. We will examine the response to this feature and consider making adjustments to this scheme based on it. We are already considering introducing other types of quotas, such as limits by user or by specific storage volume.
+
+Please note that only the sizes of the main datafiles and of the archival tab-delimited versions produced by the ingest process are counted for the purposes of enforcing the limits. Automatically generated "auxiliary" files, such as rescaled image thumbnails and dataset metadata exports, are not.
+
+When quotas are set and enforced, users will be informed of the remaining storage allocation on the file upload page, together with other upload and processing limits.
+
+Part of the new and experimental nature of this feature is that we don't yet know for a fact how well it will function in real life on a very busy production system, despite our best efforts to test it prior to the release. One specific issue is having to update the recorded storage use for every parent collection of the given dataset whenever new files are added. This includes updating the combined size of the root, top-level collection, which will need to be updated after *every* file upload. In the unlikely case that this starts causing problems with race conditions and database update conflicts, it is possible to disable these updates (and thus disable the storage quotas feature) by setting the :ref:`dataverse.storageuse.disable-storageuse-increments` JVM setting to true.
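+
+For example, one common way to set that JVM option on a typical Payara-based installation (a sketch; adjust the path to ``asadmin`` for your environment) is:
+
+.. code-block:: bash
+
+   ./asadmin create-jvm-options "-Ddataverse.storageuse.disable-storageuse-increments=true"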
diff --git a/doc/sphinx-guides/source/admin/dataverses-datasets.rst b/doc/sphinx-guides/source/admin/dataverses-datasets.rst
index 170807d3d67..37494c57fa1 100644
--- a/doc/sphinx-guides/source/admin/dataverses-datasets.rst
+++ b/doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -53,11 +53,15 @@ Configure a Dataverse Collection to Store All New Files in a Specific File Store
To direct new files (uploaded when datasets are created or edited) for all datasets in a given Dataverse collection, the store can be specified via the API as shown below, or by editing the 'General Information' for a Dataverse collection on the Dataverse collection page. Only accessible to superusers. ::
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d $storageDriverLabel http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
+
+(Note that for ``dataverse.files.store1.label=MyLabel``, you should pass ``MyLabel``.)
The current driver can be seen using::
curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
+(Note that for ``dataverse.files.store1.label=MyLabel``, ``store1`` will be returned.)
+
and can be reset to the default store with::
curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
diff --git a/doc/sphinx-guides/source/admin/external-tools.rst b/doc/sphinx-guides/source/admin/external-tools.rst
index 67075e986bb..346ca0b15ee 100644
--- a/doc/sphinx-guides/source/admin/external-tools.rst
+++ b/doc/sphinx-guides/source/admin/external-tools.rst
@@ -115,7 +115,7 @@ Dataset level explore tools allow the user to explore all the files in a dataset
Dataset Level Configure Tools
+++++++++++++++++++++++++++++
-Configure tools at the dataset level are not currently supported.
+Dataset level configure tools can be launched by users who have edit access to the dataset. These tools are found under the "Edit Dataset" menu.
Writing Your Own External Tool
------------------------------
diff --git a/doc/sphinx-guides/source/admin/index.rst b/doc/sphinx-guides/source/admin/index.rst
index ac81aa737a7..633842044b4 100755
--- a/doc/sphinx-guides/source/admin/index.rst
+++ b/doc/sphinx-guides/source/admin/index.rst
@@ -27,6 +27,7 @@ This guide documents the functionality only available to superusers (such as "da
solr-search-index
ip-groups
mail-groups
+ collectionquotas
monitoring
reporting-tools-and-queries
maintenance
diff --git a/doc/sphinx-guides/source/admin/integrations.rst b/doc/sphinx-guides/source/admin/integrations.rst
index 21adf8338d9..2b6bdb8eeb5 100644
--- a/doc/sphinx-guides/source/admin/integrations.rst
+++ b/doc/sphinx-guides/source/admin/integrations.rst
@@ -121,6 +121,18 @@ Its goal is to make the dashboard adjustable for a Dataverse installation's need
The integrations dashboard is currently in development. A preview and more information can be found at: `rdm-integration GitHub repository `_
+Globus
+++++++
+
+Globus transfer uses an efficient transfer mechanism and has additional features that make it suitable for large files and large numbers of files:
+
+* robust file transfer capable of restarting after network or endpoint failures
+* third-party transfer, which enables a user accessing a Dataverse installation in their desktop browser to initiate a transfer of their files from a remote endpoint (e.g., on a local high-performance computing cluster) directly to an S3 store managed by the Dataverse installation
+
+Users can transfer files via `Globus `_ into and out of datasets, or reference files on a remote Globus endpoint, when their Dataverse installation is configured to use one or more Globus-accessible stores
+and a community-developed `dataverse-globus `_ app has been properly installed and configured.
+
+
Embedding Data on Websites
--------------------------
@@ -185,6 +197,16 @@ Avgidea Data Search
Researchers can use a Google Sheets add-on to search for Dataverse installation's CSV data and then import that data into a sheet. See `Avgidea Data Search `_ for details.
+JupyterHub
+++++++++++
+
+The `Dataverse-to-JupyterHub Data Transfer Connector `_ streamlines data transfer between Dataverse repositories and the cloud-based platform JupyterHub, enhancing collaborative research.
+This connector facilitates seamless two-way transfer of datasets and files, emphasizing the potential of an integrated research environment.
+It is a lightweight client-side web application built using React and relying on the Dataverse External Tool feature, allowing for easy deployment on modern integration systems. Currently, it supports small to medium-sized files, with plans to enable support for large files and signed Dataverse endpoints in the future.
+
+The connector is intended for researchers, scientists and data analysts who work with Dataverse instances and JupyterHub and want to streamline the data transfer process. See the `presentation `_ for details.
+
.. _integrations-discovery:
Discoverability
@@ -202,6 +224,11 @@ Geodisy
`Geodisy `_ will take your Dataverse installation’s data, search for geospatial metadata and files, and copy them to a new system that allows for visual searching. Your original data and search methods are untouched; you have the benefit of both. For more information, please refer to `Geodisy's GitHub Repository. `_
+DataONE
++++++++
+
+`DataONE `_ is a community-driven program providing access to data across multiple `member repositories `_, supporting enhanced search and discovery of Earth and environmental data. Membership is free and is most easily achieved by providing schema.org data via `science-on-schema.org `_ metadata markup on dataset landing pages, support for which is native in Dataverse. Dataverse installations are welcome to `join the network `_ to have their datasets included.
+
Research Data Preservation
--------------------------
@@ -217,7 +244,14 @@ Sponsored by the `Ontario Council of University Libraries (OCUL) `_ zipped `BagIt `_ bags to the `Chronopolis `_ via `DuraCloud `_, to a local file system, or to `Google Cloud Storage `_.
+A Dataverse installation can be configured to submit a copy of published Dataset versions, packaged as `Research Data Alliance conformant `_ zipped `BagIt `_ bags, to `Chronopolis `_ via `DuraCloud `_, to a local file system, to any S3 store, or to `Google Cloud Storage `_.
+Submission can be automated to occur upon publication, or can be done periodically (via external scripting).
+The archival status of each Dataset version can be seen in the Dataset page version table and queried via API.
+
+The archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or potentially in another RDA-conformant repository.
+Specifically, the archival Bags include an OAI-ORE Map serialized as JSON-LD that describes the dataset and its files, as well as information about the version of Dataverse used to export the archival Bag.
+
+The `DVUploader `_ includes functionality to recreate a Dataset from an archival Bag produced by Dataverse (using the Dataverse API to do so).
For details on how to configure this integration, see :ref:`BagIt Export` in the :doc:`/installation/config` section of the Installation Guide.
@@ -226,7 +260,7 @@ Future Integrations
The `Dataverse Project Roadmap `_ is a good place to see integrations that the core Dataverse Project team is working on.
-The `Community Dev `_ column of our project board is a good way to track integrations that are being worked on by the Dataverse Community but many are not listed and if you have an idea for an integration, please ask on the `dataverse-community `_ mailing list if someone is already working on it.
+If you have an idea for an integration, please ask on the `dataverse-community `_ mailing list if someone is already working on it.
Many integrations take the form of "external tools". See the :doc:`external-tools` section for details. External tool makers should check out the :doc:`/api/external-tools` section of the API Guide.
diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst
index 4f737bd730b..66911aa0ad1 100644
--- a/doc/sphinx-guides/source/admin/metadatacustomization.rst
+++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -37,8 +37,8 @@ tab-separated value (TSV). [1]_\ :sup:`,`\ [2]_ While it is technically
possible to define more than one metadata block in a TSV file, it is
good organizational practice to define only one in each file.
-The metadata block TSVs shipped with the Dataverse Software are in `/tree/develop/scripts/api/data/metadatablocks
-`__ and the corresponding ResourceBundle property files `/tree/develop/src/main/java `__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
+The metadata block TSVs shipped with the Dataverse Software are in `/scripts/api/data/metadatablocks
+`__ with the corresponding ResourceBundle property files in `/src/main/java/propertyFiles `__ of the Dataverse Software GitHub repo. Human-readable copies are available in `this Google Sheets
document `__ but they tend to get out of sync with the TSV files, which should be considered authoritative. The Dataverse Software installation process operates on the TSVs, not the Google spreadsheet.
About the metadata block TSV
@@ -413,7 +413,7 @@ Setting Up a Dev Environment for Testing
You have several options for setting up a dev environment for testing metadata block changes:
-- Docker: See :doc:`/container/index`.
+- Docker: See :doc:`/container/running/metadata-blocks` in the Container Guide.
- AWS deployment: See the :doc:`/developers/deployment` section of the Developer Guide.
- Full dev environment: See the :doc:`/developers/dev-environment` section of the Developer Guide.
@@ -648,6 +648,28 @@ Alternatively, you are welcome to request "edit" access to this "Tips for Datave
The thinking is that the tips can become issues and the issues can eventually be worked on as features to improve the Dataverse Software metadata system.
+Development Tasks Specific to Changing Fields in Core Metadata Blocks
+---------------------------------------------------------------------
+
+When it comes to the fields from the core blocks that are distributed with Dataverse (such as the Citation, Social Science and Geospatial blocks), code dependencies may exist in Dataverse, primarily in the Import and Export subsystems, on these fields being configured a certain way. So, if it becomes necessary to modify one of these core fields, code changes may be needed to accompany the change in the block TSV, and some sample and test files maintained in the Dataverse source tree will need to be adjusted accordingly.
+
+Making a Field Multi-Valued
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a recent real-life example, a few fields from the Citation and Social Science blocks were changed to support multiple values, in order to accommodate specific needs of some community member institutions. A PR for one of these fields, ``alternativeTitle`` from the Citation block, is linked below. Each time, a number of code changes, plus some changes to the sample metadata files in the Dataverse code tree, had to be made. The checklist below is to help another developer in the event that a similar change becomes necessary in the future. Note that some of the steps below may not apply 1:1 to a different metadata field, depending on how it is exported and imported in various formats by Dataverse. It may help to consult the PR `#9440 `_ as a specific example of the changes that had to be made for the ``alternativeTitle`` field.
+
+- Change the value from ``FALSE`` to ``TRUE`` in the ``allowmultiples`` column of the .tsv file for the block.
+- Change the value of the ``multiValued`` attribute for the search field in the Solr schema (``conf/solr/x.x.x/schema.xml``).
+- Modify the DDI import code (``ImportDDIServiceBean.java``) to support multiple values. (You may be able to use the change in the PR above as a model.)
+- Modify the DDI export utility (``DdiExportUtil.java``).
+- Modify the OpenAire export utility (``OpenAireExportUtil.java``).
+- Modify the following JSON source files in the Dataverse code tree to actually include multiple values for the field (two should be quite enough!): ``scripts/api/data/dataset-create-new-all-default-fields.json``, ``src/test/java/edu/harvard/iq/dataverse/export/dataset-all-defaults.txt``, ``src/test/java/edu/harvard/iq/dataverse/export/ddi/dataset-finch1.json`` and ``src/test/java/edu/harvard/iq/dataverse/export/ddi/dataset-create-new-all-ddi-fields.json``. (These are used as examples for populating datasets via the import API and by the automated import and export code tests).
+- Similarly modify the following XML files that are used by the DDI export code tests: ``src/test/java/edu/harvard/iq/dataverse/export/ddi/dataset-finch1.xml`` and ``src/test/java/edu/harvard/iq/dataverse/export/ddi/exportfull.xml``.
+- Make sure all the automated unit and integration tests are passing. See :doc:`/developers/testing` in the Developer Guide.
+- Write a short release note to announce the change in the upcoming release. See :ref:`writing-release-note-snippets` in the Developer Guide.
+- Make a pull request.
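+
+As a rough sketch of testing such a TSV change on a local dev instance (assuming a standard ``localhost:8080`` deployment and using the Citation block as the example), the updated block can be reloaded with the TSV-loading API described earlier in this guide:
+
+.. code-block:: bash
+
+   # Reload the modified block TSV from the source tree into the running instance
+   curl http://localhost:8080/api/admin/datasetfield/load -X POST \
+     --data-binary @scripts/api/data/metadatablocks/citation.tsv \
+     -H "Content-type: text/tab-separated-values"
+
+Remember that the corresponding Solr schema change and a reindex are still needed, as noted in the checklist above.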
+
+
Footnotes
---------
diff --git a/doc/sphinx-guides/source/admin/monitoring.rst b/doc/sphinx-guides/source/admin/monitoring.rst
index a4affda1302..04fba23a3e8 100644
--- a/doc/sphinx-guides/source/admin/monitoring.rst
+++ b/doc/sphinx-guides/source/admin/monitoring.rst
@@ -1,7 +1,7 @@
Monitoring
===========
-Once you're in production, you'll want to set up some monitoring. This page may serve as a starting point for you but you are encouraged to share your ideas with the Dataverse community!
+Once you're in production, you'll want to set up some monitoring. This page may serve as a starting point for you but you are encouraged to share your ideas with the Dataverse community! You may also be interested in the :doc:`/developers/performance` section of the Developer Guide.
.. contents:: Contents:
:local:
@@ -14,7 +14,7 @@ In production you'll want to monitor the usual suspects such as CPU, memory, fre
Munin
+++++
-http://munin-monitoring.org says, "A default installation provides a lot of graphs with almost no work." From RHEL or CentOS 7, you can try the following steps.
+https://munin-monitoring.org says, "A default installation provides a lot of graphs with almost no work." From RHEL or CentOS 7, you can try the following steps.
Enable the EPEL yum repo (if you haven't already):
diff --git a/doc/sphinx-guides/source/admin/solr-search-index.rst b/doc/sphinx-guides/source/admin/solr-search-index.rst
index e6f7b588ede..3f7b9d5b547 100644
--- a/doc/sphinx-guides/source/admin/solr-search-index.rst
+++ b/doc/sphinx-guides/source/admin/solr-search-index.rst
@@ -26,8 +26,8 @@ Remove all Solr documents that are orphaned (i.e. not associated with objects in
``curl http://localhost:8080/api/admin/index/clear-orphans``
-Clearing Data from Solr
-~~~~~~~~~~~~~~~~~~~~~~~
+Clearing ALL Data from Solr
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please note that the moment you issue this command, it will appear to end users looking at the root Dataverse installation page that all data is gone! This is because the root Dataverse installation page is powered by the search index.
@@ -86,6 +86,16 @@ To re-index a dataset by its database ID:
``curl http://localhost:8080/api/admin/index/datasets/7504557``
+Clearing a Dataset from Solr
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This API will clear the Solr entry for the specified dataset. It can be useful if you want to hide a published dataset from search results and/or collection pages but don't want to destroy and purge it from the database just yet.
+
+``curl -X DELETE http://localhost:8080/api/admin/index/datasets/``
+
+This can, of course, be reversed by re-indexing the dataset with the API above.
+
+
Manually Querying Solr
----------------------
diff --git a/doc/sphinx-guides/source/api/apps.rst b/doc/sphinx-guides/source/api/apps.rst
index a498c62d3d4..44db666736c 100755
--- a/doc/sphinx-guides/source/api/apps.rst
+++ b/doc/sphinx-guides/source/api/apps.rst
@@ -94,6 +94,13 @@ This series of Python scripts offers a starting point for migrating datasets fro
https://github.com/scholarsportal/dataverse-migration-scripts
+idsc.dataverse
+~~~~~~~~~~~~~~
+
+This module can, among other things, help you migrate one Dataverse installation to another (see `migrate.md `_).
+
+https://github.com/iza-institute-of-labor-economics/idsc.dataverse
+
Java
----
diff --git a/doc/sphinx-guides/source/api/auth.rst b/doc/sphinx-guides/source/api/auth.rst
index bbc81b595e3..eae3bd3c969 100644
--- a/doc/sphinx-guides/source/api/auth.rst
+++ b/doc/sphinx-guides/source/api/auth.rst
@@ -77,6 +77,11 @@ To test if bearer tokens are working, you can try something like the following (
.. code-block:: bash
- export TOKEN=`curl -s -X POST --location "http://keycloak.mydomain.com:8090/realms/oidc-realm/protocol/openid-connect/token" -H "Content-Type: application/x-www-form-urlencoded" -d "username=kcuser&password=kcpassword&grant_type=password&client_id=oidc-client&client_secret=ss6gE8mODCDfqesQaSG3gwUwZqZt547E" | jq '.access_token' -r | tr -d "\n"`
+ export TOKEN=`curl -s -X POST --location "http://keycloak.mydomain.com:8090/realms/test/protocol/openid-connect/token" -H "Content-Type: application/x-www-form-urlencoded" -d "username=user&password=user&grant_type=password&client_id=test&client_secret=94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8" | jq '.access_token' -r | tr -d "\n"`
curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/api/users/:me
+
+Signed URLs
+-----------
+
+See :ref:`signed-urls`.
diff --git a/doc/sphinx-guides/source/api/changelog.rst b/doc/sphinx-guides/source/api/changelog.rst
new file mode 100644
index 00000000000..025a2069d6e
--- /dev/null
+++ b/doc/sphinx-guides/source/api/changelog.rst
@@ -0,0 +1,27 @@
+API Changelog (Breaking Changes)
+================================
+
+This API changelog is experimental and we would love feedback on its usefulness. Its primary purpose is to inform API developers of any breaking changes. (We try not to ship any backward-incompatible changes, but it happens.) To see a list of new APIs and backward-compatible changes to existing APIs, please see each version's release notes at https://github.com/IQSS/dataverse/releases
+
+.. contents:: |toctitle|
+ :local:
+ :depth: 1
+
+v6.2
+----
+
+- The fields "northLongitude" and "southLongitude" have been deprecated in favor of "northLatitude" and "southLatitude" in the Geolocation metadata block. After upgrading to 6.2 or later, you will need to use the new fields when creating or updating a dataset.
+
+- **/api/datasets/{id}/versions/{versionId}**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded. A bug that caused the API to only return a deaccessioned dataset if the user had edit privileges has been fixed.
+- **/api/datasets/{id}/versions**: The includeFiles parameter has been renamed to excludeFiles. The default behavior remains the same, which is to include files. However, when excludeFiles is set to true, the files will be excluded.
+- **/api/files/$ID/uningest**: Can now be used by users with the ability to publish the dataset to undo a failed ingest. (Removing a successful ingest still requires being a superuser.)
+
+v6.1
+----
+
+- The metadata field "Alternative Title" now supports multiple values so you must pass an array rather than a string when populating that field via API. See https://github.com/IQSS/dataverse/pull/9440
+
+v6.0
+----
+
+- **/api/access/datafile**: When a null or invalid API token is provided to download a public (non-restricted) file with this API call, it will result in a ``401`` error response. Previously, the download was allowed (``200`` response). Please note that we noticed this change sometime between 5.9 and 6.0. If you can help us pinpoint the exact version (or commit!), please get in touch. See :doc:`dataaccess`.
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/api/client-libraries.rst b/doc/sphinx-guides/source/api/client-libraries.rst
index 62069f62c23..bd0aa55ba99 100755
--- a/doc/sphinx-guides/source/api/client-libraries.rst
+++ b/doc/sphinx-guides/source/api/client-libraries.rst
@@ -24,7 +24,7 @@ Java
https://github.com/IQSS/dataverse-client-java is the official Java library for Dataverse APIs.
-`Richard Adams `_ from `ResearchSpace `_ created and maintains this library.
+`Richard Adams `_ from `ResearchSpace `_ created and maintains this library.
Javascript
----------
@@ -52,20 +52,25 @@ There are multiple Python modules for interacting with Dataverse APIs.
`EasyDataverse `_ is a Python library designed to simplify the management of Dataverse datasets in an object-oriented way, giving users the ability to upload, download, and update datasets with ease. By utilizing metadata block configurations, EasyDataverse automatically generates Python objects that contain all the necessary details required to create the native Dataverse JSON format used to create or edit datasets. Adding files and directories is also possible with EasyDataverse and requires no additional API calls. This library is particularly well-suited for client applications such as workflows and scripts as it minimizes technical complexities and facilitates swift development.
-`pyDataverse `_ primarily allows developers to manage Dataverse collections, datasets and datafiles. Its intention is to help with data migrations and DevOps activities such as testing and configuration management. The module is developed by `Stefan Kasberger `_ from `AUSSDA - The Austrian Social Science Data Archive `_.
+`python-dvuploader `_ implements Jim Myers' excellent `dv-uploader `_ as a Python module. It offers parallel direct uploads to Dataverse backend storage, streams files directly instead of buffering them in memory, and supports multi-part uploads, chunking data accordingly.
+
+`pyDataverse `_ primarily allows developers to manage Dataverse collections, datasets and datafiles. Its intention is to help with data migrations and DevOps activities such as testing and configuration management. The module is developed by `Stefan Kasberger `_ from `AUSSDA - The Austrian Social Science Data Archive `_.
+
+`UBC's Dataverse Utilities `_ are a set of Python console utilities which allow one to upload datasets from a tab-separated-value spreadsheet, bulk release multiple datasets, bulk delete unpublished datasets, quickly duplicate records, replace licenses, and more. For additional information see their `PyPi page `_.
`dataverse-client-python `_ had its initial release in 2015. `Robert Liebowitz `_ created this library while at the `Center for Open Science (COS) `_ and the COS uses it to integrate the `Open Science Framework (OSF) `_ with Dataverse installations via an add-on which itself is open source and listed on the :doc:`/api/apps` page.
`Pooch `_ is a Python library that allows library and application developers to download data. Among other features, it takes care of various protocols, caching in OS-specific locations, checksum verification and adds optional features like progress bars or log messages. Among other popular repositories, Pooch supports Dataverse in the sense that you can reference Dataverse-hosted datasets by just a DOI and Pooch will determine the data repository type, query the Dataverse API for contained files and checksums, giving you an easy interface to download them.
+`idsc.dataverse `_ reads the metadata and files of datasets from a Dataverse installation (e.g. dataverse.example1.com) and writes them into ``~/.idsc/dataverse/api/dataverse.example1.com``, organized in directories ``PID_type/prefix/suffix``, where ``PID_type`` is one of: hdl, doi or ark. It can then "export" the local copy from ``~/.idsc/dataverse/api/dataverse.example1.com`` to ``~/.idsc/.cache/dataverse.example2.com`` so that one can upload the datasets to dataverse.example2.com.
+
R
-
https://github.com/IQSS/dataverse-client-r is the official R package for Dataverse APIs. The latest release can be installed from `CRAN `_.
The R client can search and download datasets. It is useful when automatically (instead of manually) downloading data files as part of a script. For bulk edit and upload operations, we currently recommend pyDataverse.
-The package is currently maintained by `Shiro Kuriwaki `_. It was originally created by `Thomas Leeper `_ and then formerly maintained by `Will Beasley `_.
-
+The package is currently maintained by `Shiro Kuriwaki `_. It was originally created by `Thomas Leeper `_ and then formerly maintained by `Will Beasley `_.
Ruby
----
diff --git a/doc/sphinx-guides/source/api/dataaccess.rst b/doc/sphinx-guides/source/api/dataaccess.rst
index e76ea167587..f7aaa8f4ee4 100755
--- a/doc/sphinx-guides/source/api/dataaccess.rst
+++ b/doc/sphinx-guides/source/api/dataaccess.rst
@@ -83,7 +83,7 @@ Basic access URI:
``/api/access/datafile/$id``
-.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
+.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. However, this file access method only works when file PIDs are enabled, which an admin can do via the :ref:`:FilePIDsEnabled` setting.
Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::
@@ -403,3 +403,32 @@ This method returns a list of Authenticated Users who have requested access to t
A curl example using an ``id``::
curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests
+
+User Has Requested Access to a File:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``/api/access/datafile/{id}/userFileAccessRequested``
+
+This method returns true or false depending on whether or not the calling user has requested access to a particular file.
+
+A curl example using an ``id``::
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userFileAccessRequested"
+
+
+Get User Permissions on a File:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``/api/access/datafile/{id}/userPermissions``
+
+This method returns the permissions that the calling user has on a particular file.
+
+In particular, the user permissions that this method checks, returned as booleans, are the following:
+
+* Can download the file
+* Can manage the file permissions
+* Can edit the file owner dataset
+
+A curl example using an ``id``::
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userPermissions"
diff --git a/doc/sphinx-guides/source/api/external-tools.rst b/doc/sphinx-guides/source/api/external-tools.rst
index 05affaf975e..ae0e44b36aa 100644
--- a/doc/sphinx-guides/source/api/external-tools.rst
+++ b/doc/sphinx-guides/source/api/external-tools.rst
@@ -11,7 +11,7 @@ Introduction
External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term "external" is used to indicate that the tool is not part of the main Dataverse Software.
-Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=http://demo.dataverse.org
+Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org
In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.
@@ -40,7 +40,7 @@ How External Tools Are Presented to Users
An external tool can appear in your Dataverse installation in a variety of ways:
- as an explore, preview, query or configure option for a file
-- as an explore option for a dataset
+- as an explore or configure option for a dataset
- as an embedded preview on the file landing page
See also the :ref:`testing-external-tools` section of the Admin Guide for some perspective on how Dataverse installations will expect to test your tool before announcing it to their users.
@@ -88,11 +88,11 @@ Terminology
displayName The **name** of the tool in the Dataverse installation web interface. For example, "Data Explorer".
- description The **description** of the tool, which appears in a popup (for configure tools only) so the user who clicked the tool can learn about the tool before being redirected the tool in a new tab in their browser. HTML is supported.
+ description The **description** of the tool, which appears in a popup (for configure tools only) so the user who clicked the tool can learn about the tool before being redirected to the tool in a new tab in their browser. HTML is supported.
scope Whether the external tool appears and operates at the **file** level or the **dataset** level. Note that a file level tool must also specify the type of file it operates on (see "contentType" below).
- types Whether the external tool is an **explore** tool, a **preview** tool, a **query** tool, a **configure** tool or any combination of these (multiple types are supported for a single tool). Configure tools require an API token because they make changes to data files (files within datasets). Configure tools are currently not supported at the dataset level. The older "type" keyword that allows you to pass a single type as a string is deprecated but still supported.
+ types Whether the external tool is an **explore** tool, a **preview** tool, a **query** tool, a **configure** tool or any combination of these (multiple types are supported for a single tool). Configure tools require an API token because they make changes to data files (files within datasets). The older "type" keyword that allows you to pass a single type as a string is deprecated but still supported.
toolUrl The **base URL** of the tool before query parameters are added.
@@ -102,7 +102,7 @@ Terminology
httpMethod Either ``GET`` or ``POST``.
- queryParameters **Key/value combinations** that can be appended to the toolUrl. For example, once substitution takes place (described below) the user may be redirected to ``https://fabulousfiletool.com?fileId=42&siteUrl=http://demo.dataverse.org``.
+ queryParameters **Key/value combinations** that can be appended to the toolUrl. For example, once substitution takes place (described below) the user may be redirected to ``https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org``.
query parameter keys An **arbitrary string** to associate with a value that is populated with a reserved word (described below). As the author of the tool, you have control over what "key" you would like to be passed to your tool. For example, if you want to have your tool receive and operate on the query parameter "dataverseFileId=42" instead of just "fileId=42", that's fine.
@@ -160,17 +160,25 @@ Authorization Options
When called for datasets or data files that are not public (i.e. in a draft dataset or for a restricted file), external tools are allowed access via the user's credentials. This is accomplished by one of two mechanisms:
-* Signed URLs (more secure, recommended)
+.. _signed-urls:
- - Configured via the ``allowedApiCalls`` section of the manifest. The tool will be provided with signed URLs allowing the specified access to the given dataset or datafile for the specified amount of time. The tool will not be able to access any other datasets or files the user may have access to and will not be able to make calls other than those specified.
- - For tools invoked via a GET call, Dataverse will include a callback query parameter with a Base64 encoded value. The decoded value is a signed URL that can be called to retrieve a JSON response containing all of the queryParameters and allowedApiCalls specified in the manfiest.
- - For tools invoked via POST, Dataverse will send a JSON body including the requested queryParameters and allowedApiCalls. Dataverse expects the response to the POST to indicate a redirect which Dataverse will use to open the tool.
+Signed URLs
+^^^^^^^^^^^
-* API Token (deprecated, less secure, not recommended)
+The signed URL mechanism is more secure than exposing API tokens and is therefore recommended.
- - Configured via the ``queryParameters`` by including an ``{apiToken}`` value. When this is present Dataverse will send the user's apiToken to the tool. With the user's API token, the tool can perform any action via the Dataverse API that the user could. External tools configured via this method should be assessed for their trustworthiness.
- - For tools invoked via GET, this will be done via a query parameter in the request URL which could be cached in the browser's history. Dataverse expects the response to the POST to indicate a redirect which Dataverse will use to open the tool.
- - For tools invoked via POST, Dataverse will send a JSON body including the apiToken.
+- Configured via the ``allowedApiCalls`` section of the manifest. The tool will be provided with signed URLs allowing the specified access to the given dataset or datafile for the specified amount of time. The tool will not be able to access any other datasets or files the user may have access to and will not be able to make calls other than those specified.
+- For tools invoked via a GET call, Dataverse will include a callback query parameter with a Base64 encoded value. The decoded value is a signed URL that can be called to retrieve a JSON response containing all of the queryParameters and allowedApiCalls specified in the manifest (see the sketch after this list).
+- For tools invoked via POST, Dataverse will send a JSON body including the requested queryParameters and allowedApiCalls. Dataverse expects the response to the POST to indicate a redirect which Dataverse will use to open the tool.
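+
+A minimal sketch of what a GET-invoked tool might do with that callback parameter (the variable names below are illustrative, not part of any API):
+
+.. code-block:: bash
+
+   # Dataverse opens the tool with something like ...?callback=aHR0cHM6Ly9kZW1vLi4u
+   # Decoding the Base64 value yields a signed URL; calling it returns the JSON
+   # containing the queryParameters and allowedApiCalls described above.
+   SIGNED_URL=$(echo "$CALLBACK" | base64 --decode)
+   curl "$SIGNED_URL"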
+
+API Token
+^^^^^^^^^
+
+The API token mechanism is deprecated. Because it is less secure than signed URLs, it is not recommended for new external tools.
+
+- Configured via the ``queryParameters`` by including an ``{apiToken}`` value. When this is present Dataverse will send the user's apiToken to the tool. With the user's API token, the tool can perform any action via the Dataverse API that the user could. External tools configured via this method should be assessed for their trustworthiness.
+- For tools invoked via GET, this will be done via a query parameter in the request URL which could be cached in the browser's history. Dataverse expects the response to the POST to indicate a redirect which Dataverse will use to open the tool.
+- For tools invoked via POST, Dataverse will send a JSON body including the apiToken.
Internationalization of Your External Tool
++++++++++++++++++++++++++++++++++++++++++
@@ -187,6 +195,7 @@ Using Example Manifests to Get Started
++++++++++++++++++++++++++++++++++++++
Again, you can use :download:`fabulousFileTool.json <../_static/installation/files/root/external-tools/fabulousFileTool.json>` or :download:`dynamicDatasetTool.json <../_static/installation/files/root/external-tools/dynamicDatasetTool.json>` as a starting point for your own manifest file.
+Additional working examples, including ones using :ref:`signed-urls`, are available at https://github.com/gdcc/dataverse-previewers .
Testing Your External Tool
--------------------------
diff --git a/doc/sphinx-guides/source/api/getting-started.rst b/doc/sphinx-guides/source/api/getting-started.rst
index a6f6c259a25..c12fb01a269 100644
--- a/doc/sphinx-guides/source/api/getting-started.rst
+++ b/doc/sphinx-guides/source/api/getting-started.rst
@@ -9,7 +9,7 @@ If you are a researcher or curator who wants to automate parts of your workflow,
Servers You Can Test With
-------------------------
-Rather than using a production Dataverse installation, API users are welcome to use http://demo.dataverse.org for testing. You can email support@dataverse.org if you have any trouble with this server.
+Rather than using a production Dataverse installation, API users are welcome to use https://demo.dataverse.org for testing. You can email support@dataverse.org if you have any trouble with this server.
If you would rather have full control over your own test server, deployments to AWS, Docker, and more are covered in the :doc:`/developers/index` and the :doc:`/installation/index`.
@@ -86,7 +86,7 @@ See :ref:`create-dataset-command`.
Uploading Files
~~~~~~~~~~~~~~~
-See :ref:`add-file-api`.
+See :ref:`add-file-api`. In addition, when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a file directly to S3. This facilitates an efficient method to upload big files, but is more complex. The procedure is described in the :doc:`/developers/s3-direct-upload-api` section of the Developer Guide.
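+
+As a minimal sketch of the basic upload path described in :ref:`add-file-api` (the server URL, API token, persistent identifier, and file name below are placeholders):
+
+.. code-block:: bash
+
+   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+   export SERVER_URL=https://demo.dataverse.org
+   export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB
+
+   curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@data.csv" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID"
+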
Publishing a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/sphinx-guides/source/api/index.rst b/doc/sphinx-guides/source/api/index.rst
index c9e79098546..dd195aa9d62 100755
--- a/doc/sphinx-guides/source/api/index.rst
+++ b/doc/sphinx-guides/source/api/index.rst
@@ -24,3 +24,4 @@ API Guide
linkeddatanotification
apps
faq
+ changelog
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/api/intro.rst b/doc/sphinx-guides/source/api/intro.rst
index 933932cd7b9..8eb11798dd7 100755
--- a/doc/sphinx-guides/source/api/intro.rst
+++ b/doc/sphinx-guides/source/api/intro.rst
@@ -187,6 +187,10 @@ Lists of Dataverse APIs
- Files
- etc.
+- :doc:`/developers/dataset-semantic-metadata-api`: For creating, reading, editing, and deleting dataset metadata using JSON-LD.
+- :doc:`/developers/dataset-migration-api`: For migrating datasets from other repositories while retaining the original persistent identifiers and publication date.
+- :doc:`/developers/s3-direct-upload-api`: For the transfer of larger files/larger numbers of files directly to an S3 bucket managed by Dataverse.
+- :doc:`/developers/globus-api`: For the Globus transfer of larger files/larger numbers of files directly via Globus endpoints managed by Dataverse or referencing files in remote endpoints.
- :doc:`metrics`: For query statistics about usage of a Dataverse installation.
- :doc:`sword`: For depositing data using a standards-based approach rather than the :doc:`native-api`.
@@ -237,7 +241,7 @@ Dataverse Software API questions are on topic in all the usual places:
- The dataverse-community Google Group: https://groups.google.com/forum/#!forum/dataverse-community
- The Dataverse Project community calls: https://dataverse.org/community-calls
-- The Dataverse Project chat room: http://chat.dataverse.org
+- The Dataverse Project chat room: https://chat.dataverse.org
- The Dataverse Project ticketing system: support@dataverse.org
After your question has been answered, you are welcome to help improve the :doc:`faq` section of this guide.
diff --git a/doc/sphinx-guides/source/api/metrics.rst b/doc/sphinx-guides/source/api/metrics.rst
index 28ac33ea228..14402096650 100755
--- a/doc/sphinx-guides/source/api/metrics.rst
+++ b/doc/sphinx-guides/source/api/metrics.rst
@@ -1,7 +1,7 @@
Metrics API
===========
-The Metrics API provides counts of downloads, datasets created, files uploaded, and more, as described below. The Dataverse Software also includes aggregate counts of Make Data Count metrics (described in the :doc:`/admin/make-data-count` section of the Admin Guide and available per-Dataset through the :doc:`/api/native-api`). A table of all the endpoints is listed below.
+The Metrics API provides counts of downloads, datasets created, files uploaded, user accounts created, and more, as described below. The Dataverse Software also includes aggregate counts of Make Data Count metrics (described in the :doc:`/admin/make-data-count` section of the Admin Guide and available per-Dataset through the :doc:`/api/native-api`). A table of all the endpoints is listed below.
.. contents:: |toctitle|
:local:
@@ -21,7 +21,7 @@ The Metrics API includes several categories of endpoints that provide different
* Form: GET https://$SERVER/api/info/metrics/$type
- * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files`` or ``downloads``.
+ * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files``, ``downloads`` or ``accounts``.
* Example: ``curl https://demo.dataverse.org/api/info/metrics/downloads``
@@ -31,7 +31,7 @@ The Metrics API includes several categories of endpoints that provide different
* Form: GET https://$SERVER/api/info/metrics/$type/toMonth/$YYYY-DD
- * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files`` or ``downloads``.
+ * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files``, ``downloads`` or ``accounts``.
* Example: ``curl https://demo.dataverse.org/api/info/metrics/dataverses/toMonth/2018-01``
@@ -41,7 +41,7 @@ The Metrics API includes several categories of endpoints that provide different
* Form: GET https://$SERVER/api/info/metrics/$type/pastDays/$days
- * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files`` or ``downloads``.
+ * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files``, ``downloads`` or ``accounts``.
* Example: ``curl https://demo.dataverse.org/api/info/metrics/datasets/pastDays/30``
@@ -51,7 +51,7 @@ The Metrics API includes several categories of endpoints that provide different
* Form: GET https://$SERVER/api/info/metrics/$type/monthly
- * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files`` or ``downloads``.
+ * where ``$type`` can be set, for example, to ``dataverses`` (Dataverse collections), ``datasets``, ``files``, ``downloads`` or ``accounts``.
* Example: ``curl https://demo.dataverse.org/api/info/metrics/downloads/monthly``
@@ -163,3 +163,14 @@ The following table lists the available metrics endpoints (not including the Mak
/api/info/metrics/uniquefiledownloads/toMonth/{yyyy-MM},"count by id, pid","json, csv",collection subtree,published,y,cumulative up to month specified,unique download counts per file id to the specified month. PIDs are also included in output if they exist
/api/info/metrics/tree,"id, ownerId, alias, depth, name, children",json,collection subtree,published,y,"tree of dataverses starting at the root or a specified parentAlias with their id, owner id, alias, name, a computed depth, and array of children dataverses","underlying code can also include draft dataverses, this is not currently accessible via api, depth starts at 0"
/api/info/metrics/tree/toMonth/{yyyy-MM},"id, ownerId, alias, depth, name, children",json,collection subtree,published,y,"tree of dataverses in existence as of specified date starting at the root or a specified parentAlias with their id, owner id, alias, name, a computed depth, and array of children dataverses","underlying code can also include draft dataverses, this is not currently accessible via api, depth starts at 0"
+ /api/info/metrics/accounts,count,json,Dataverse installation,all,y,as of now/totals,
+ /api/info/metrics/accounts/toMonth/{yyyy-MM},count,json,Dataverse installation,all,y,cumulative up to month specified,
+ /api/info/metrics/accounts/pastDays/{n},count,json,Dataverse installation,all,y,aggregate count for past n days,
+ /api/info/metrics/accounts/monthly,"date, count","json, csv",Dataverse installation,all,y,monthly cumulative timeseries from first date of first entry to now,
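+
+For example, the new cumulative count of user accounts follows the same pattern as the other endpoints above: ``curl https://demo.dataverse.org/api/info/metrics/accounts``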
+
+Related API Endpoints
+---------------------
+
+The following endpoints are not under the metrics namespace but also return counts:
+
+- :ref:`file-download-count`
diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst
index 4d9466703e4..70d73ae3c98 100644
--- a/doc/sphinx-guides/source/api/native-api.rst
+++ b/doc/sphinx-guides/source/api/native-api.rst
@@ -9,7 +9,7 @@ The Dataverse Software exposes most of its GUI functionality via a REST-based AP
.. _CORS: https://www.w3.org/TR/cors/
-.. warning:: The Dataverse Software's API is versioned at the URI - all API calls may include the version number like so: ``http://server-address/api/v1/...``. Omitting the ``v1`` part would default to the latest API version (currently 1). When writing scripts/applications that will be used for a long time, make sure to specify the API version, so they don't break when the API is upgraded.
+.. warning:: The Dataverse Software's API is versioned at the URI - all API calls may include the version number like so: ``https://server-address/api/v1/...``. Omitting the ``v1`` part would default to the latest API version (currently 1). When writing scripts/applications that will be used for a long time, make sure to specify the API version, so they don't break when the API is upgraded.
.. contents:: |toctitle|
:local:
@@ -88,6 +88,14 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/dataverses/root"
+If you want to include the Dataverse collections that this collection is part of, you must set the ``returnOwners`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/dataverses/root?returnOwners=true"
+
To view an unpublished Dataverse collection:
.. code-block:: bash
@@ -503,8 +511,58 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/dataverses/root/metadatablocks/isRoot"
-.. note:: Previous endpoints ``$SERVER/api/dataverses/$id/metadatablocks/:isRoot`` and ``POST http://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported.
+.. note:: Previous endpoints ``$SERVER/api/dataverses/$id/metadatablocks/:isRoot`` and ``POST https://$SERVER/api/dataverses/$id/metadatablocks/:isRoot?key=$apiKey`` are deprecated, but supported.
+
+.. _get-dataset-json-schema:
+
+Retrieve a Dataset JSON Schema for a Collection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset. This
+first version of the schema only includes required elements and fields. In the future we plan to improve the schema by adding controlled
+vocabulary and more robust dataset field format testing:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=root
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/datasetSchema"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/datasetSchema"
+
+Note: you must have "Add Dataset" permission in the given collection to invoke this endpoint.
+
+While it is recommended to download a copy of the JSON Schema from the collection (as above) to account for any fields that have been marked as required, you can also download a minimal :download:`dataset-schema.json <../_static/api/dataset-schema.json>` to get a sense of the schema when no customizations have been made.
+
+.. _validate-dataset-json:
+
+Validate Dataset JSON File for a Collection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Validates a dataset JSON file customized for a given collection prior to creating the dataset. The validation only tests for JSON formatting
+and the presence of required elements:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=root
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$ID/validateDatasetJson" -H 'Content-type:application/json' --upload-file dataset.json
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/dataverses/root/validateDatasetJson" -H 'Content-type:application/json' --upload-file dataset.json
+
+Note: you must have "Add Dataset" permission in the given collection to invoke this endpoint.
.. _create-dataset-command:
@@ -525,10 +583,16 @@ Submit Incomplete Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^
**Note:** This feature requires :ref:`dataverse.api.allow-incomplete-metadata` to be enabled and your Solr
-Schema to be up-to-date with the ``datasetValid`` field.
+Schema to be up-to-date with the ``datasetValid`` field. If you have not already done so as part of a version upgrade, you will
+also need to reindex all datasets after enabling the :ref:`dataverse.api.allow-incomplete-metadata` feature.
Providing a ``.../datasets?doNotValidate=true`` query parameter turns off the validation of metadata.
-In this case, only the "Author Name" is required. For example, a minimal JSON file would look like this:
+In this situation, only the "Author Name" is required, except when the :ref:`:MetadataLanguages` setting
+is configured and the collection's "Dataset Metadata Language" setting is left at the default
+"Chosen at Dataset Creation" value. In that case, a language that is part of the :ref:`:MetadataLanguages` list must be
+declared in the incomplete dataset.
+
+For example, a minimal JSON file, without the language specification, would look like this:
.. code-block:: json
:name: dataset-incomplete.json
@@ -748,13 +812,53 @@ The following attributes are supported:
* ``affiliation`` Affiliation
* ``filePIDsEnabled`` ("true" or "false") Restricted to use by superusers and only when the :ref:`:AllowEnablingFilePIDsPerCollection <:AllowEnablingFilePIDsPerCollection>` setting is true. Enables or disables registration of file-level PIDs in datasets within the collection (overriding the instance-wide setting).
+.. _collection-storage-quotas:
+
+Collection Storage Quotas
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To check the current storage quota configured for a collection:
+
+.. code-block::
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/storage/quota"
+
+This will output the storage quota allocated (in bytes), or a message indicating that no quota is defined for the specific collection. The user identified by the API token must have the ``Manage`` permission on the collection.
+
+
+To set or change the storage allocation quota for a collection:
+
+.. code-block::
+
+ curl -X PUT -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/storage/quota/$SIZE_IN_BYTES"
+
+This API is superuser-only.
+
+
+To delete a storage quota configured for a collection:
+
+.. code-block::
+
+ curl -X DELETE -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/storage/quota"
+
+This API is superuser-only.
+
+Use the ``/settings`` API to enable or disable, across the instance, the enforcement of any storage quotas that have been defined, via the following setting. For example:
+
+.. code-block::
+
+ curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:UseStorageQuotas
+
Datasets
--------
**Note** Creation of new datasets is done with a ``POST`` onto a Dataverse collection. See the Dataverse Collections section above.
-**Note** In all commands below, dataset versions can be referred to as:
+.. _dataset-version-specifiers:
+
+Dataset Version Specifiers
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In all commands below, dataset versions can be referred to as:
* ``:draft`` the draft version, if any
* ``:latest`` either a draft (if exists) or the latest published version.
@@ -789,7 +893,7 @@ Getting its draft version:
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
- curl -H "X-Dataverse-key:$API_TOKEN" "http://$SERVER/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_IDENTIFIER"
+ curl -H "X-Dataverse-key:$API_TOKEN" "https://$SERVER/api/datasets/:persistentId/versions/:draft?persistentId=$PERSISTENT_IDENTIFIER"
The fully expanded example above (without environment variables) looks like this:
@@ -814,6 +918,14 @@ The fully expanded example above (without environment variables) looks like this
The dataset id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).
+If you want to include the Dataverse collections that this dataset is part of, you must set the ``returnOwners`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24?returnOwners=true"
+
List Versions of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -883,6 +995,10 @@ It returns a list of versions with their metadata, and file list:
]
}
+The optional ``includeFiles`` parameter specifies whether the files should be listed in the output. It defaults to ``true``, preserving backward compatibility. (Note that for a dataset with a large number of versions and/or files, having the files included can dramatically increase the volume of the output.) A separate ``/files`` API can be used for listing the files, or a subset thereof, in a given version.
+
+The optional ``offset`` and ``limit`` parameters can be used to specify the range of the versions list to be shown. This can be used to paginate through the list in a dataset with a large number of versions.
+
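+Usage example, combining these parameters (with ``24`` standing in for the dataset database id, as in the other examples in this guide):
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions?includeFiles=false&limit=10&offset=0"
+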
Get Version of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -895,13 +1011,34 @@ Get Version of a Dataset
export ID=24
export VERSION=1.0
- curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION"
+ curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION?includeFiles=false"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl "https://demo.dataverse.org/api/datasets/24/versions/1.0"
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?includeFiles=false"
+
+The optional ``includeFiles`` parameter specifies whether the files should be listed in the output (defaults to ``true``). Note that a separate ``/files`` API can be used for listing the files, or a subset thereof in a given version.
+
+
+By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.
+
+If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?includeDeaccessioned=true"
+
+If you want to include the Dataverse collections that this dataset version is part of, you must set the ``returnOwners`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?returnOwners=true"
.. _export-dataset-metadata-api:
@@ -958,6 +1095,180 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files"
+This endpoint supports optional pagination, through the ``limit`` and ``offset`` query parameters.
+
+To aid in pagination, the JSON response also includes the total number of rows (``totalCount``) available.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?limit=10&offset=20"
+
+Category name filtering is also optionally supported, to return only the files to which the requested category has been added.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?categoryName=Data"
+
+Tabular tag name filtering is also optionally supported, to return only the files to which the requested tabular tag has been added.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?tabularTagName=Survey"
+
+Content type filtering is also optionally supported, to return only the files matching the requested content type.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?contentType=image/png"
+
+Filtering by search text is also optionally supported. The search is applied to the labels and descriptions of the dataset files, returning the files that contain the searched text in one of those fields.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?searchText=word"
+
+File access filtering is also optionally supported, using one of the following values:
+
+* ``Public``
+* ``Restricted``
+* ``EmbargoedThenRestricted``
+* ``EmbargoedThenPublic``
+
+If no filter is specified, the files will match all of the above categories.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?accessStatus=Public"
+
+Ordering criteria for sorting the results are also optionally supported, with the following possible values:
+
+* ``NameAZ`` (Default)
+* ``NameZA``
+* ``Newest``
+* ``Oldest``
+* ``Size``
+* ``Type``
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?orderCriteria=Newest"
+
+Please note that both filtering and ordering criteria values are case sensitive and must be correctly typed for the endpoint to recognize them.
+
+By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.
+
+If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?includeDeaccessioned=true"
+
+.. note:: Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.
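+
+For instance, a hypothetical query combining several of these parameters (public PNG images only, sorted by newest) might look like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?contentType=image/png&accessStatus=Public&orderCriteria=Newest"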
+
+Get File Counts in a Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get file counts for the given dataset and version.
+
+The returned file counts are based on different criteria:
+
+- Total (The total file count)
+- Per content type
+- Per category name
+- Per tabular tag name
+- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+ export VERSION=1.0
+
+ curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files/counts"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts"
+
+Category name filtering is optionally supported, to return counts only for files to which the requested category has been added.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?categoryName=Data"
+
+Tabular tag name filtering is also optionally supported, to return counts only for files to which the requested tabular tag has been added.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?tabularTagName=Survey"
+
+Content type filtering is also optionally supported, to return counts only for files matching the requested content type.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?contentType=image/png"
+
+Filtering by search text is also optionally supported. The search is applied to the labels and descriptions of the dataset files, returning counts only for files that contain the searched text in one of those fields.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?searchText=word"
+
+File access filtering is also optionally supported, using one of the following values:
+
+* ``Public``
+* ``Restricted``
+* ``EmbargoedThenRestricted``
+* ``EmbargoedThenPublic``
+
+If no filter is specified, the files will match all of the above categories.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?accessStatus=Public"
+
+By default, deaccessioned dataset versions are not supported by this endpoint and will be ignored in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the option described below.
+
+If you want to include deaccessioned dataset versions, you must specify this through the ``includeDeaccessioned`` query parameter.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?includeDeaccessioned=true"
+
+Please note that filtering values are case sensitive and must be correctly typed for the endpoint to recognize them.
+
+Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.
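+
+For instance, a hypothetical query combining several of these parameters might look like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts?categoryName=Data&accessStatus=Public"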
+
View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1254,11 +1565,44 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/datasets/24/versions/:draft"
+Deaccession Dataset
+~~~~~~~~~~~~~~~~~~~
+
+Given a version of a dataset, updates its status to deaccessioned.
+
+The JSON body required to deaccession a dataset (``deaccession.json``) looks like this::
+
+ {
+ "deaccessionReason": "Description of the deaccession reason.",
+ "deaccessionForwardURL": "https://demo.dataverse.org"
+ }
+
+
+Note that the field ``deaccessionForwardURL`` is optional.
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+ export VERSIONID=1.0
+ export FILE_PATH=deaccession.json
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/datasets/$ID/versions/$VERSIONID/deaccession" -H "Content-type:application/json" --upload-file $FILE_PATH
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/24/versions/1.0/deaccession" -H "Content-type:application/json" --upload-file deaccession.json
+
+.. note:: You cannot deaccession a dataset more than once. If you call this endpoint twice for the same dataset version, you will get a "not found" error on the second call, because the version you are looking for is no longer published once it has been deaccessioned.
+
Set Citation Date Field Type for a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Sets the dataset citation date field type for a given dataset. ``:publicationDate`` is the default.
-Note that the dataset citation date field type must be a date field.
+Sets the dataset citation date field type for a given dataset. ``:publicationDate`` is the default.
+Note that the dataset citation date field type must be a date field. This change applies to all versions of the dataset that have an entry for the new date field. It also applies to all file citations in the dataset.
.. code-block:: bash
@@ -1667,6 +2011,73 @@ The fully expanded example above (without environment variables) looks like this
The size of all files available for download will be returned.
If :draft is passed as versionId the token supplied must have permission to view unpublished drafts. A token is not required for published datasets. Also restricted files will be included in this total regardless of whether the user has access to download the restricted file(s).
+There is an optional query parameter ``mode`` which applies a filter criterion to the operation. This parameter supports the following values:
+
+* ``All`` (Default): Includes both archival and original sizes for tabular files
+* ``Archival``: Includes only the archival size for tabular files
+* ``Original``: Includes only the original size for tabular files
+
+Usage example:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?mode=Archival"
+
+Category name filtering is also optionally supported, to return the size of all files available for download that match the requested category name.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?categoryName=Data"
+
+Tabular tag name filtering is also optionally supported, to return the size of all files available for download to which the requested tabular tag has been added.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?tabularTagName=Survey"
+
+Content type filtering is also optionally supported, to return the size of all files available for download that match the requested content type.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?contentType=image/png"
+
+Filtering by search text is also optionally supported. The search is applied to the labels and descriptions of the dataset files, returning the size of all files available for download that contain the searched text in one of those fields.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?searchText=word"
+
+File access filtering is also optionally supported, using one of the following values:
+
+* ``Public``
+* ``Restricted``
+* ``EmbargoedThenRestricted``
+* ``EmbargoedThenPublic``
+
+If no filter is specified, the files will match all of the above categories.
+
+Please note that filtering query parameters are case sensitive and must be correctly typed for the endpoint to recognize them.
+
+By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.
+
+If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?includeDeaccessioned=true"
+
+.. note:: Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.
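+
+For instance, a hypothetical query combining several of these parameters might look like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?mode=Archival&categoryName=Data&includeDeaccessioned=true"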
+
Submit a Dataset for Review
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1715,7 +2126,8 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/returnToAuthor?persistentId=doi:10.5072/FK2/J8SJZB" -H "Content-type: application/json" -d @reason-for-return.json
-The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is persisted into the database, stored at the dataset version level.
+The review process can sometimes resemble a tennis match, with the authors submitting and resubmitting the dataset over and over until the curators are satisfied. Each time the curators send a "reason for return" via API, that reason is sent by email and is persisted into the database, stored at the dataset version level.
+The reason is required. Note that you can still provide a meaningful comment that suits your situation, such as "The author would like to modify his dataset", "Files are missing", "Nothing to report", or "A curation report with comments and suggestions/instructions will follow in another email".
The :ref:`send-feedback` API call may be useful as a way to move the conversation to email. However, note that these emails go to contacts (versus authors) and there is no database record of the email contents. (:ref:`dataverse.mail.cc-support-on-contact-email` will send a copy of these emails to the support email address which would provide a record.)
@@ -2088,10 +2500,12 @@ The API call requires a Json body that includes the list of the fileIds that the
curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-embargo?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"
+.. _Archival Status API:
+
Get the Archival Status of a Dataset By Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Archiving is an optional feature that may be configured for a Dataverse installation. When that is enabled, this API call be used to retrieve the status. Note that this requires "superuser" credentials.
+Archival :ref:`BagIt Export` is an optional feature that may be configured for a Dataverse installation. When it is enabled, this API call can be used to retrieve the status. Note that this requires "superuser" credentials.
``GET /api/datasets/$dataset-id/$version/archivalStatus`` returns the archival status of the specified dataset version.
@@ -2171,11 +2585,11 @@ Signposting involves the addition of a `Link ;rel="cite-as", ;rel="describedby";type="application/vnd.citationstyles.csl+json",;rel="describedby";type="application/json+ld", ;rel="type",;rel="type", https://demo.dataverse.org/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.5072/FK2/YD5QDG;rel="license", ; rel="linkset";type="application/linkset+json"``
+``Link: ;rel="cite-as", ;rel="describedby";type="application/vnd.citationstyles.csl+json",;rel="describedby";type="application/ld+json", ;rel="type",;rel="type", ;rel="license", ; rel="linkset";type="application/linkset+json"``
The URL for linkset information is discoverable under the ``rel="linkset";type="application/linkset+json`` entry in the "Link" header, such as in the example above.
-The reponse includes a JSON object conforming to the `Signposting `__ specification.
+The response includes a JSON object conforming to the `Signposting `__ specification. As part of this conformance, unlike most Dataverse API responses, the output is not wrapped in a ``{"status":"OK","data":{`` object.
Signposting is not supported for draft dataset versions.
.. code-block:: bash
@@ -2196,6 +2610,17 @@ Get Dataset By Private URL Token
curl "$SERVER_URL/api/datasets/privateUrlDatasetVersion/$PRIVATE_URL_TOKEN"
+If you want to include the Dataverse collections that this dataset is part of, you must set the ``returnOwners`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/datasets/privateUrlDatasetVersion/a56444bc-7697-4711-8964-e0577f055fd2?returnOwners=true"
+
+
+.. _get-citation:
+
Get Citation
~~~~~~~~~~~~
@@ -2207,10 +2632,20 @@ Get Citation
curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/{version}/citation?persistentId=$PERSISTENT_IDENTIFIER"
-Get Citation by Private URL Token
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.
-.. code-block:: bash
+If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ curl -H "Accept:application/json" "$SERVER_URL/api/datasets/:persistentId/versions/$VERSION/{version}/citation?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
+
+Get Citation by Private URL Token
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
export PRIVATE_URL_TOKEN=a56444bc-7697-4711-8964-e0577f055fd2
@@ -2230,13 +2665,150 @@ See :ref:`:CustomDatasetSummaryFields` in the Installation Guide for how the lis
curl "$SERVER_URL/api/datasets/summaryFieldNames"
+.. _guestbook-at-request-api:
+
+Configure When a Dataset Guestbook Appears (If Enabled)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, users are asked to fill out a configured Guestbook when they download files from a dataset. If enabled for a given Dataverse instance (see XYZ), users may instead be asked to fill out a Guestbook only when they request access to restricted files.
+This is configured by a global default, collection-level settings, or directly at the dataset level via these API calls (superuser access is required to make changes).
+
+To see the current choice for this dataset:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+
+ curl "$SERVER_URL/api/datasets/:persistentId/guestbookEntryAtRequest?persistentId=$PERSISTENT_IDENTIFIER"
+
+
+The response will be true (guestbook displays when making a request), false (guestbook displays at download), or will indicate that the dataset inherits one of these settings.
+
+To set the behavior for this dataset:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+
+ curl -X PUT -H "X-Dataverse-key:$API_TOKEN" -H Content-type:application/json -d true "$SERVER_URL/api/datasets/:persistentId/guestbookEntryAtRequest?persistentId=$PERSISTENT_IDENTIFIER"
+
+
+This example uses ``true`` to set the behavior to guestbook at request. Note that this call will return a 403/Forbidden response if guestbook at request functionality is not enabled for this Dataverse instance.
+
+The API can also be used to reset the dataset to use the default/inherited value:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+
+ curl -X DELETE -H "X-Dataverse-key:$API_TOKEN" -H Content-type:application/json "$SERVER_URL/api/datasets/:persistentId/guestbookEntryAtRequest?persistentId=$PERSISTENT_IDENTIFIER"
+
+Get User Permissions on a Dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This API call returns the permissions that the calling user has on a particular dataset.
+
+In particular, the user permissions that this API call checks, returned as booleans, are the following:
+
+* Can view the unpublished dataset
+* Can edit the dataset
+* Can publish the dataset
+* Can manage the dataset permissions
+* Can delete the dataset draft
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/$ID/userPermissions"
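+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/24/userPermissions"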
+
+Know If a User Can Download at Least One File from a Dataset Version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This API endpoint indicates if the calling user can download at least one file from a dataset version. Note that permissions based on :ref:`shib-groups` are not considered.
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+ export VERSION=1.0
+
+ curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/$ID/versions/$VERSION/canDownloadAtLeastOneFile"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/datasets/24/versions/1.0/canDownloadAtLeastOneFile"
+
+.. _dataset-pid-generator:
+
+Configure The PID Generator a Dataset Uses (If Enabled)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dataverse can be configured to use multiple PID Providers (see the :ref:`pids-configuration` section for more information).
+When there are multiple PID Providers and File PIDs are enabled, it is possible to set which provider will be used to generate (mint) those PIDs.
+While it usually makes sense to use the same PID Provider that manages the dataset PID, another Provider is needed in some cases, specifically when the PID Provider for the dataset PID cannot generate
+other PIDs with the same authority/shoulder, etc. as the dataset PID. Dataverse has a set of API calls to see which PID Provider will be
+used to generate datafile PIDs and, as a superuser, to change it (to a new one or back to the default).
+
+To see the current choice for this dataset:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+
+ curl "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"
+
+The response will be the id of the PID Provider that will be used. Details of that provider's configuration can be obtained via the :ref:`pids-providers-api`.
+
+To set the behavior for this dataset:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+ export GENERATOR_ID=perma1
+
+ curl -X PUT -H "X-Dataverse-key:$API_TOKEN" -H Content-type:application/json -d $GENERATOR_ID "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"
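+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -X PUT -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -H Content-type:application/json -d perma1 "https://demo.dataverse.org/api/datasets/:persistentId/pidGenerator?persistentId=doi:10.5072/FK2/YD5QDG"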
+
+
+The PID Provider id used must be one of those configured - see :ref:`pids-providers-api`.
+The return status code may be 200/OK, 401/403 if an API key is not sent or the user is not a superuser, or 404 if the dataset or PID Provider is not found.
+Note that using a PID Provider that generates DEPENDENT datafile PIDs but does not share the dataset PID's protocol/authority/separator/shoulder is not supported. (INDEPENDENT should be used in this case; see the :ref:`pids-configuration` section for more information.)
+
+The API can also be used to reset the dataset to use the default/inherited value:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
+
+ curl -X DELETE -H "X-Dataverse-key:$API_TOKEN" -H Content-type:application/json "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"
+
+The default will always be the same provider as for the dataset PID if that provider can generate new PIDs, and will be the PID Provider set for the collection or the global default otherwise.
+
Files
-----
+.. _get-json-rep-of-file:
+
Get JSON Representation of a File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
+.. note:: When a file has been assigned a persistent identifier, it can be used in the API. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
+
+This endpoint returns the file metadata present in the latest dataset version.
Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*:
@@ -2304,6 +2876,127 @@ The fully expanded example above (without environment variables) looks like this
The file id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).
+By default, files from deaccessioned dataset versions are not included in the search. If no accessible dataset draft version exists, the search for the latest published file will ignore deaccessioned dataset versions unless the ``includeDeaccessioned`` query parameter is set to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"
+
+If you want to include the dataset version of the file in the response, there is an optional parameter for this called ``returnDatasetVersion`` whose default value is ``false``.
+
+Usage example:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"
+
+Get JSON Representation of a File given a Dataset Version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. note:: When a file has been assigned a persistent identifier, it can be used in the API. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
+
+This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use ``:latest-published``, or ``:latest``, or ``:draft`` or ``1.0`` or any other style listed under :ref:`dataset-version-specifiers`.
+
+Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* present in the published dataset version ``1.0``:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export DATASET_VERSION=1.0
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/1.0?persistentId=doi:10.5072/FK2/J8SJZB"
+
+You will get a "not found" error if the specified version does not exist or if you do not have permission to view it.
+
+By default, files from deaccessioned dataset versions are not included in the search unless the ``includeDeaccessioned`` query parameter is set to ``true``.
+
+Usage example:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export DATASET_VERSION=:latest-published
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:latest-published?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"
+
+If you want to include the dataset version of the file in the response, there is an optional parameter for this called ``returnDatasetVersion`` whose default value is ``false``.
+
+Usage example:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export DATASET_VERSION=:draft
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"
+
+If you want to include the dataset and collections that the file is part of in the response, there is an optional parameter for this called ``returnOwners`` whose default value is ``false``.
+
+Usage example:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
+ export DATASET_VERSION=:draft
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&returnOwners=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB&returnOwners=true"
+
+
+
Adding Files
~~~~~~~~~~~~
@@ -2375,10 +3068,15 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT -d true "https://demo.dataverse.org/api/files/:persistentId/restrict?persistentId=doi:10.5072/FK2/AAA000"
+.. _file-uningest:
+
Uningest a File
~~~~~~~~~~~~~~~
-Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process. Note that this requires "superuser" credentials.
+Reverse the tabular data ingest process performed on a file where ``ID`` is the database id or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file to process.
+
+Note that undoing a successful ingest (removing the variable-level metadata and the .tab version of the file) requires "superuser" credentials.
+The API can also be used by a user who can publish the dataset to clear the error from an unsuccessful ingest.
A curl example using an ``ID``:
@@ -2412,12 +3110,229 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/uningest?persistentId=doi:10.5072/FK2/AAA000"
+.. _file-reingest:
+
Reingest a File
~~~~~~~~~~~~~~~
-Attempt to ingest an existing datafile as tabular data. This API can be used on a file that was not ingested as tabular back when it was uploaded. For example, a Stata v.14 file that was uploaded before ingest support for Stata 14 was added (in Dataverse Software v.4.9). It can also be used on a file that failed to ingest due to a bug in the ingest plugin that has since been fixed (hence the name "reingest").
+Attempt to ingest an existing datafile as tabular data. This API can be used on a file that was not ingested as tabular back when it was uploaded. For example, a Stata v.14 file that was uploaded before ingest support for Stata 14 was added (in Dataverse Software v.4.9). It can also be used on a file that failed to ingest due to a bug in the ingest plugin that has since been fixed (hence the name "reingest").
+
+Note that this requires "superuser" credentials.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/reingest"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/reingest"
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/reingest?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/reingest?persistentId=doi:10.5072/FK2/AAA000"
+
+Note: at present, the API cannot be used on a file that's already successfully ingested as tabular.
+
+.. _redetect-file-type:
+
+Redetect File Type
+~~~~~~~~~~~~~~~~~~
+
+The Dataverse Software uses a variety of methods for determining file types (MIME types or content types) and these methods (listed below) are updated periodically. If you have files that have an unknown file type, you can have the Dataverse Software attempt to redetect the file type.
+
+When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change.
+
+A curl example using an ``id``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/redetect?dryRun=true"
+
+A curl example using a ``pid``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/redetect?persistentId=$PERSISTENT_ID&dryRun=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/redetect?persistentId=doi:10.5072/FK2/AAA000&dryRun=true"
+
+Currently the following methods are used to detect file types:
+
+- The file type detected by the browser (or sent via API).
+- JHOVE: https://jhove.openpreservation.org
+- The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``.
+- The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``.
+
+.. _extractNcml:
+
+Extract NcML
+~~~~~~~~~~~~
+
+As explained in the :ref:`netcdf-and-hdf5` section of the User Guide, when those file types are uploaded, an attempt is made to extract an NcML file from them and store it as an auxiliary file.
+
+This happens automatically but superusers can also manually trigger this NcML extraction process with the API endpoint below.
+
+Note that "true" will be returned if an NcML file was created. "false" will be returned if there was an error or if the NcML file already exists (check server.log for details).
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/extractNcml"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/extractNcml"
+
+A curl example using a PID:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/extractNcml?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/extractNcml?persistentId=doi:10.5072/FK2/AAA000"
+
+Replacing Files
+~~~~~~~~~~~~~~~
+
+Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced.
+
+Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a replacement file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' "$SERVER_URL/api/files/$ID/replace"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
+ -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
+ "https://demo.dataverse.org/api/files/24/replace"
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' \
+ "$SERVER_URL/api/files/:persistentId/replace?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
+ -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
+ "https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000"
+
+Deleting Files
+~~~~~~~~~~~~~~
+
+Delete an existing file where ``ID`` is the database id of the file to delete or ``PERSISTENT_ID`` is the persistent id (DOI or Handle, if it exists) of the file.
+
+Note that the behavior of deleting files depends on whether the dataset has ever been published.
+
+- If the dataset has never been published, the file will be deleted forever.
+- If the dataset has been published, the file is deleted from the draft (and future published versions).
+- If the dataset has been published, the deleted file can still be downloaded because it was part of a published version.
+
+A curl example using an ``ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export ID=24
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/$ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/24"
+
+A curl example using a ``PERSISTENT_ID``
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId?persistentId=$PERSISTENT_ID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId?persistentId=doi:10.5072/FK2/AAA000"
-Note that this requires "superuser" credentials.
+Getting File Metadata
+~~~~~~~~~~~~~~~~~~~~~
+
+Provides a JSON representation of the file metadata for an existing file where ``ID`` is the database id of the file to get metadata from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.
A curl example using an ``ID``
@@ -2427,13 +3342,13 @@ A curl example using an ``ID``
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/reingest"
+ curl "$SERVER_URL/api/files/$ID/metadata"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/reingest"
+ curl "https://demo.dataverse.org/api/files/24/metadata"
A curl example using a ``PERSISTENT_ID``
@@ -2443,26 +3358,17 @@ A curl example using a ``PERSISTENT_ID``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/reingest?persistentId=$PERSISTENT_ID"
+ curl "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/reingest?persistentId=doi:10.5072/FK2/AAA000"
-
-Note: at present, the API cannot be used on a file that's already successfully ingested as tabular.
-
-.. _redetect-file-type:
-
-Redetect File Type
-~~~~~~~~~~~~~~~~~~
-
-The Dataverse Software uses a variety of methods for determining file types (MIME types or content types) and these methods (listed below) are updated periodically. If you have files that have an unknown file type, you can have the Dataverse Software attempt to redetect the file type.
+ curl "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
-When using the curl command below, you can pass ``dryRun=true`` if you don't want any changes to be saved to the database. Change this to ``dryRun=false`` (or omit it) to save the change.
+The current draft can also be viewed if you have permissions and pass your API token.
-A curl example using an ``id``
+A curl example using an ``ID``
.. code-block:: bash
@@ -2470,15 +3376,15 @@ A curl example using an ``id``
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=true"
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/$ID/metadata/draft"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/redetect?dryRun=true"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/24/metadata/draft"
-A curl example using a ``pid``
+A curl example using a ``PERSISTENT_ID``
.. code-block:: bash
@@ -2486,31 +3392,22 @@ A curl example using a ``pid``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/redetect?persistentId=$PERSISTENT_ID&dryRun=true"
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/metadata/draft?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/redetect?persistentId=doi:10.5072/FK2/AAA000&dryRun=true"
-
-Currently the following methods are used to detect file types:
-
-- The file type detected by the browser (or sent via API).
-- JHOVE: http://jhove.openpreservation.org
-- The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``.
-- The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``.
-
-.. _extractNcml:
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/metadata/draft?persistentId=doi:10.5072/FK2/AAA000"
-Extract NcML
-~~~~~~~~~~~~
+Note: The ``id`` returned in the JSON response is the id of the file metadata version.
-As explained in the :ref:`netcdf-and-hdf5` section of the User Guide, when those file types are uploaded, an attempt is made to extract an NcML file from them and store it as an auxiliary file.
+Getting File Data Tables
+~~~~~~~~~~~~~~~~~~~~~~~~
-This happens automatically but superusers can also manually trigger this NcML extraction process with the API endpoint below.
+This endpoint is oriented toward tabular files and provides a JSON representation of the file data tables for an existing tabular file. ``ID`` is the database id of the file to get the data tables from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.
-Note that "true" will be returned if an NcML file was created. "false" will be returned if there was an error or if the NcML file already exists (check server.log for details).
+A curl example using an ``ID``
.. code-block:: bash
@@ -2518,15 +3415,15 @@ Note that "true" will be returned if an NcML file was created. "false" will be r
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/extractNcml"
+ curl "$SERVER_URL/api/files/$ID/dataTables"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/extractNcml"
+ curl "https://demo.dataverse.org/api/files/24/dataTables"
-A curl example using a PID:
+A curl example using a ``PERSISTENT_ID``
.. code-block:: bash
@@ -2534,20 +3431,22 @@ A curl example using a PID:
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/extractNcml?persistentId=$PERSISTENT_ID"
+ curl "$SERVER_URL/api/files/:persistentId/dataTables?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/extractNcml?persistentId=doi:10.5072/FK2/AAA000"
+ curl "https://demo.dataverse.org/api/files/:persistentId/dataTables?persistentId=doi:10.5072/FK2/AAA000"
-Replacing Files
-~~~~~~~~~~~~~~~
+Note that if the requested file is not tabular, the endpoint will return an error.
-Replace an existing file where ``ID`` is the database id of the file to replace or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires the ``file`` to be passed as well as a ``jsonString`` expressing the new metadata. Note that metadata such as description, directoryLabel (File Path) and tags are not carried over from the file being replaced.
+.. _file-download-count:
-Note that when a Dataverse installation is configured to use S3 storage with direct upload enabled, there is API support to send a replacement file directly to S3. This is more complex and is described in the :doc:`/developers/s3-direct-upload-api` guide.
+Getting File Download Count
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Provides the download count for a particular file, where ``ID`` is the database id of the file to get the download count from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.
A curl example using an ``ID``
@@ -2557,15 +3456,13 @@ A curl example using an ``ID``
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' "$SERVER_URL/api/files/$ID/replace"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/downloadCount"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
- -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
- "https://demo.dataverse.org/api/files/24/replace"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/downloadCount"
A curl example using a ``PERSISTENT_ID``
@@ -2575,27 +3472,20 @@ A curl example using a ``PERSISTENT_ID``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F 'file=@file.extension' -F 'jsonData={json}' \
- "$SERVER_URL/api/files/:persistentId/replace?persistentId=$PERSISTENT_ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -F 'file=@data.tsv' \
- -F 'jsonData={"description":"My description.","categories":["Data"],"forceReplace":false}' \
- "https://demo.dataverse.org/api/files/:persistentId/replace?persistentId=doi:10.5072/FK2/AAA000"
-
-Deleting Files
-~~~~~~~~~~~~~~
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"
-Delete an existing file where ``ID`` is the database id of the file to delete or ``PERSISTENT_ID`` is the persistent id (DOI or Handle, if it exists) of the file.
+If you are interested in download counts for multiple files, see :doc:`/api/metrics`.
-Note that the behavior of deleting files depends on if the dataset has ever been published or not.
+File Has Been Deleted
+~~~~~~~~~~~~~~~~~~~~~
-- If the dataset has never been published, the file will be deleted forever.
-- If the dataset has published, the file is deleted from the draft (and future published versions).
-- If the dataset has published, the deleted file can still be downloaded because it was part of a published version.
+Know if a particular file that existed in a previous version of the dataset no longer exists in the latest version.
A curl example using an ``ID``
@@ -2605,13 +3495,13 @@ A curl example using an ``ID``
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/$ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/hasBeenDeleted"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/24"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/hasBeenDeleted"
A curl example using a ``PERSISTENT_ID``
@@ -2621,18 +3511,18 @@ A curl example using a ``PERSISTENT_ID``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/files/:persistentId?persistentId=$PERSISTENT_ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/hasBeenDeleted?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/files/:persistentId?persistentId=doi:10.5072/FK2/AAA000"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/hasBeenDeleted?persistentId=doi:10.5072/FK2/AAA000"
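+
+Assuming the usual Dataverse JSON envelope (the exact payload may differ between versions), a response for a file that is gone from the latest version might look like this:
+
+.. code-block:: bash
+
+  {"status":"OK","data":true}
+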
-Getting File Metadata
-~~~~~~~~~~~~~~~~~~~~~
+Updating File Metadata
+~~~~~~~~~~~~~~~~~~~~~~
-Provides a json representation of the file metadata for an existing file where ``ID`` is the database id of the file to get metadata from or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file.
+Updates the file metadata for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you only want to update a specific field, first get the current metadata JSON (see the command above), alter the fields you want, and submit the complete JSON.
A curl example using an ``ID``
@@ -2642,13 +3532,17 @@ A curl example using an ``ID``
export SERVER_URL=https://demo.dataverse.org
export ID=24
- curl "$SERVER_URL/api/files/$ID/metadata"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+ -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
+ "$SERVER_URL/api/files/$ID/metadata"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl "https://demo.dataverse.org/api/files/24/metadata"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+ -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
+ "https://demo.dataverse.org/api/files/24/metadata"
A curl example using a ``PERSISTENT_ID``
@@ -2658,15 +3552,39 @@ A curl example using a ``PERSISTENT_ID``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
- curl "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+ -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
+ "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+ -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"dataFileTags":["Survey"],"restrict":false}' \
+ "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
-The current draft can also be viewed if you have permissions and pass your API token
+Note: To update the ``tabularTags`` property of the file metadata, use the ``dataFileTags`` key in the ``jsonData`` you pass when making API requests.
+
+Also note that dataFileTags are not versioned and changes to these will update the published version of the file.
+
+.. _EditingVariableMetadata:
+
+Updating File Metadata Categories
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Updates the categories for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the category names.
+
+Although categories can also be updated with the previous endpoint, this endpoint is more practical when only the categories need to be updated and no other metadata fields.
+
+The JSON representation of file categories (``categories.json``) looks like this::
+
+ {
+ "categories": [
+ "Data",
+ "Custom"
+ ]
+ }
A curl example using an ``ID``
@@ -2675,14 +3593,19 @@ A curl example using an ``ID``
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24
+ export FILE_PATH=categories.json
- curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/$ID/metadata/draft"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+ "$SERVER_URL/api/files/$ID/metadata/categories" \
+ -H "Content-type:application/json" --upload-file $FILE_PATH
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/24/metadata/draft"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+    "https://demo.dataverse.org/api/files/24/metadata/categories" \
+ -H "Content-type:application/json" --upload-file categories.json
A curl example using a ``PERSISTENT_ID``
@@ -2691,22 +3614,35 @@ A curl example using a ``PERSISTENT_ID``
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+ export FILE_PATH=categories.json
- curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/metadata/draft?persistentId=$PERSISTENT_ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
+ "$SERVER_URL/api/files/:persistentId/metadata/categories?persistentId=$PERSISTENT_ID" \
+ -H "Content-type:application/json" --upload-file $FILE_PATH
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
- curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/metadata/draft?persistentId=doi:10.5072/FK2/AAA000"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
+ "https://demo.dataverse.org/api/files/:persistentId/metadata/categories?persistentId=doi:10.5072/FK2/AAA000" \
+ -H "Content-type:application/json" --upload-file categories.json
-Note: The ``id`` returned in the json response is the id of the file metadata version.
+Note that if the specified categories do not exist, they will be created.
+Updating File Tabular Tags
+~~~~~~~~~~~~~~~~~~~~~~~~~~
-Updating File Metadata
-~~~~~~~~~~~~~~~~~~~~~~
+Updates the tabular tags for an existing tabular file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the tabular tag names.
-Updates the file metadata for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the new metadata. No metadata from the previous version of this file will be persisted, so if you want to update a specific field first get the json with the above command and alter the fields you want.
+The JSON representation of tabular tags (``tags.json``) looks like this::
+
+ {
+ "tabularTags": [
+ "Survey",
+ "Genomics"
+ ]
+ }
A curl example using an ``ID``
@@ -2715,18 +3651,19 @@ A curl example using an ``ID``
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24
+ export FILE_PATH=tags.json
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
- -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
- "$SERVER_URL/api/files/$ID/metadata"
+ "$SERVER_URL/api/files/$ID/metadata/tabularTags" \
+ -H "Content-type:application/json" --upload-file $FILE_PATH
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
- -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
- "http://demo.dataverse.org/api/files/24/metadata"
+    "https://demo.dataverse.org/api/files/24/metadata/tabularTags" \
+ -H "Content-type:application/json" --upload-file tags.json
A curl example using a ``PERSISTENT_ID``
@@ -2735,22 +3672,29 @@ A curl example using a ``PERSISTENT_ID``
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000
+ export FILE_PATH=tags.json
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
- -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
- "$SERVER_URL/api/files/:persistentId/metadata?persistentId=$PERSISTENT_ID"
+ "$SERVER_URL/api/files/:persistentId/metadata/tabularTags?persistentId=$PERSISTENT_ID" \
+ -H "Content-type:application/json" --upload-file $FILE_PATH
The fully expanded example above (without environment variables) looks like this:
.. code-block:: bash
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
- -F 'jsonData={"description":"My description bbb.","provFreeform":"Test prov freeform","categories":["Data"],"restrict":false}' \
- "https://demo.dataverse.org/api/files/:persistentId/metadata?persistentId=doi:10.5072/FK2/AAA000"
+ "https://demo.dataverse.org/api/files/:persistentId/metadata/tabularTags?persistentId=doi:10.5072/FK2/AAA000" \
+ -H "Content-type:application/json" --upload-file tags.json
-Also note that dataFileTags are not versioned and changes to these will update the published version of the file.
+Note that the specified tabular tags must be valid. The supported tags are:
-.. _EditingVariableMetadata:
+* ``Survey``
+* ``Time Series``
+* ``Panel``
+* ``Event``
+* ``Genomics``
+* ``Network``
+* ``Geospatial``
Editing Variable Level Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -2776,6 +3720,55 @@ The fully expanded example above (without environment variables) looks like this
You can download :download:`dct.xml <../../../../src/test/resources/xml/dct.xml>` from the example above to see what the XML looks like.
+Get File Citation as JSON
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This API is for getting the file citation as it appears on the file landing page. It is formatted in HTML and encoded in JSON.
+
+To specify the version, you can use ``:latest-published`` or ``:draft`` or ``1.0`` or any other style listed under :ref:`dataset-version-specifiers`.
+
+When the dataset version is published, authentication is not required:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export FILE_ID=42
+ export DATASET_VERSION=:latest-published
+
+ curl "$SERVER_URL/api/files/$FILE_ID/versions/$DATASET_VERSION/citation"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/files/42/versions/:latest-published/citation"
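+
+Assuming the usual Dataverse JSON envelope (please verify against your installation), the HTML-formatted citation is returned inside the JSON, roughly like this:
+
+.. code-block:: bash
+
+  {"status":"OK","data":{"message":"<the HTML-formatted file citation>"}}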
+
+When the dataset version is a draft or deaccessioned, authentication is required.
+
+By default, deaccessioned dataset versions are not included in the search when applying the ``:latest`` or ``:latest-published`` identifiers. Additionally, when filtering by a specific version tag, you will get an "unauthorized" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.
+
+If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export FILE_ID=42
+ export DATASET_VERSION=:draft
+ export INCLUDE_DEACCESSIONED=true
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/$FILE_ID/versions/$DATASET_VERSION/citation?includeDeaccessioned=$INCLUDE_DEACCESSIONED"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/42/versions/:draft/citation?includeDeaccessioned=true"
+
+If your file has a persistent identifier (PID, such as a DOI), you can pass it using the technique described under :ref:`get-json-rep-of-file`.
+
+This API is not for downloading various citation formats such as EndNote XML, RIS, or BibTeX. This functionality has been requested in https://github.com/IQSS/dataverse/issues/3140 and https://github.com/IQSS/dataverse/issues/9994.
+
Provenance
~~~~~~~~~~
@@ -3318,6 +4311,8 @@ Show Support Of Incomplete Metadata Deposition
Learn if an instance has been configured to allow deposition of incomplete datasets via the API.
See also :ref:`create-dataset-command` and :ref:`dataverse.api.allow-incomplete-metadata`
+.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
+
.. code-block:: bash
export SERVER_URL=https://demo.dataverse.org
@@ -3330,6 +4325,45 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/info/settings/incompleteMetadataViaApi"
+Get Zip File Download Limit
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get the configured zip file download limit. The response contains the long value of the limit in bytes.
+
+This limit comes from the database setting :ref:`:ZipDownloadLimit` if set, or from the default value of 104857600 bytes (100 MB) if that setting is not set.
+
+.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+
+ curl "$SERVER_URL/api/info/zipDownloadLimit"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/info/zipDownloadLimit"
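+
+Assuming the usual Dataverse JSON envelope and the default limit, the response might look like this:
+
+.. code-block:: bash
+
+  {"status":"OK","data":104857600}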
+
+Get Maximum Embargo Duration In Months
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get the maximum embargo duration in months, if available, configured through the database setting :ref:`:MaxEmbargoDurationInMonths` from the Configuration section of the Installation Guide.
+
+.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+
+ curl "$SERVER_URL/api/info/settings/:MaxEmbargoDurationInMonths"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl "https://demo.dataverse.org/api/info/settings/:MaxEmbargoDurationInMonths"
.. _metadata-blocks-api:
@@ -3826,6 +4860,56 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/pids/:persistentId/delete?persistentId=doi:10.70122/FK2/9BXT5O"
+.. _pids-providers-api:
+
+Get Information about Configured PID Providers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dataverse can be configured with one or more PID Providers that it uses to create new PIDs and manage existing ones.
+This API call returns a JSONObject listing the configured providers and details about the protocol/authority/separator/shoulder they manage,
+along with information about how new dataset and datafile PIDs are generated. See the :ref:`pids-configuration` section for more information.
+
+.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/pids/providers"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/pids/providers"
+
+Get the id of the PID Provider Managing a Given PID
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Dataverse can be configured with one or more PID Providers that it uses to create new PIDs and manage existing ones.
+This API call returns the string id of the PID Provider that manages a given PID. See the :ref:`pids-configuration` section for more information.
+
+.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PID=doi:10.70122/FK2/9BXT5O
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/pids/providers/$PID"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/pids/providers/doi:10.70122/FK2/9BXT5O"
+
+If the PID is not managed by Dataverse, this call will report whether the PID is recognized as a valid PID for a given protocol (doi, hdl, or perma),
+or will return a 400/Bad Request response if it is not.
+
.. _admin:
@@ -4677,7 +5761,6 @@ A curl example using allowing access to a dataset's metadata
Please see :ref:`dataverse.api.signature-secret` for the configuration option to add a shared secret, enabling extra
security.
-
.. _send-feedback:
Send Feedback To Contact(s)
@@ -4704,6 +5787,33 @@ A curl example using an ``ID``
Note that this call could be useful in coordinating with dataset authors (assuming they are also contacts) as an alternative/addition to the functionality provided by :ref:`return-a-dataset`.
+.. _thumbnail_reset:
+
+Reset Thumbnail Failure Flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If Dataverse attempts to create a thumbnail image for an image or PDF file and the attempt fails, Dataverse will set a flag for the file to avoid repeated attempts to generate the thumbnail.
+For cases where the problem may have been temporary (or fixed in a later Dataverse release), the API calls below can be used to reset this flag for all files or for a given file.
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ export FILE_ID=1234
+
+ curl -X DELETE $SERVER_URL/api/admin/clearThumbnailFailureFlag
+
+ curl -X DELETE $SERVER_URL/api/admin/clearThumbnailFailureFlag/$FILE_ID
+
+.. _download-file-from-tmp:
+
+Download File from /tmp
+~~~~~~~~~~~~~~~~~~~~~~~
+
+As a superuser::
+
+ GET /api/admin/downloadTmpFile?fullyQualifiedPathToFile=/tmp/foo.txt
+
+Note that this API is probably only useful for testing.
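+
+A curl sketch of the call above (assuming a superuser API token in ``API_TOKEN``):
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/admin/downloadTmpFile?fullyQualifiedPathToFile=/tmp/foo.txt"
+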
MyData
------
diff --git a/doc/sphinx-guides/source/api/search.rst b/doc/sphinx-guides/source/api/search.rst
index b941064f173..e8d0a0b3ea7 100755
--- a/doc/sphinx-guides/source/api/search.rst
+++ b/doc/sphinx-guides/source/api/search.rst
@@ -25,7 +25,7 @@ Parameters
Name Type Description
=============== ======= ===========
q string The search term or terms. Using "title:data" will search only the "title" field. "*" can be used as a wildcard either alone or adjacent to a term (i.e. "bird*"). For example, https://demo.dataverse.org/api/search?q=title:data . For a list of fields to search, please see https://github.com/IQSS/dataverse/issues/2558 (for now).
-type string Can be either "Dataverse", "dataset", or "file". Multiple "type" parameters can be used to include multiple types (i.e. ``type=dataset&type=file``). If omitted, all types will be returned. For example, https://demo.dataverse.org/api/search?q=*&type=dataset
+type string Can be either "dataverse", "dataset", or "file". Multiple "type" parameters can be used to include multiple types (i.e. ``type=dataset&type=file``). If omitted, all types will be returned. For example, https://demo.dataverse.org/api/search?q=*&type=dataset
subtree string The identifier of the Dataverse collection to which the search should be narrowed. The subtree of this Dataverse collection and all its children will be searched. Multiple "subtree" parameters can be used to include multiple Dataverse collections. For example, https://demo.dataverse.org/api/search?q=data&subtree=birds&subtree=cats .
sort string The sort field. Supported values include "name" and "date". See example under "order".
order string The order in which to sort. Can either be "asc" or "desc". For example, https://demo.dataverse.org/api/search?q=data&sort=name&order=asc
diff --git a/doc/sphinx-guides/source/api/sword.rst b/doc/sphinx-guides/source/api/sword.rst
index 11b43e98774..51391784bde 100755
--- a/doc/sphinx-guides/source/api/sword.rst
+++ b/doc/sphinx-guides/source/api/sword.rst
@@ -9,19 +9,19 @@ SWORD_ stands for "Simple Web-service Offering Repository Deposit" and is a "pro
About
-----
-Introduced in Dataverse Network (DVN) `3.6 `_, the SWORD API was formerly known as the "Data Deposit API" and ``data-deposit/v1`` appeared in the URLs. For backwards compatibility these URLs continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse Software 4.0, a few backward incompatible changes were necessarily introduced and for this reason the version has been increased to ``v1.1``. For details, see :ref:`incompatible`.
+Introduced in Dataverse Network (DVN) `3.6 `_, the SWORD API was formerly known as the "Data Deposit API" and ``data-deposit/v1`` appeared in the URLs. For backwards compatibility these URLs continue to work (with deprecation warnings). Due to architectural changes and security improvements (especially the introduction of API tokens) in Dataverse Software 4.0, a few backward incompatible changes were necessarily introduced and for this reason the version has been increased to ``v1.1``. For details, see :ref:`incompatible`.
-The Dataverse Software implements most of SWORDv2_, which is specified at http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html . Please reference the `SWORDv2 specification`_ for expected HTTP status codes (i.e. 201, 204, 404, etc.), headers (i.e. "Location"), etc.
+The Dataverse Software implements most of SWORDv2_, which is specified at https://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html . Please reference the `SWORDv2 specification`_ for expected HTTP status codes (i.e. 201, 204, 404, etc.), headers (i.e. "Location"), etc.
As a profile of AtomPub, XML is used throughout SWORD. As of Dataverse Software 4.0 datasets can also be created via JSON using the "native" API. SWORD is limited to the dozen or so fields listed below in the crosswalk, but the native API allows you to populate all metadata fields available in a Dataverse installation.
-.. _SWORD: http://en.wikipedia.org/wiki/SWORD_%28protocol%29
+.. _SWORD: https://en.wikipedia.org/wiki/SWORD_%28protocol%29
.. _SWORDv2: http://swordapp.org/sword-v2/sword-v2-specifications/
.. _RFC 5023: https://tools.ietf.org/html/rfc5023
-.. _SWORDv2 specification: http://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
+.. _SWORDv2 specification: https://swordapp.github.io/SWORDv2-Profile/SWORDProfile.html
.. _sword-auth:
@@ -86,7 +86,7 @@ New features as of v1.1
- "Contact E-mail" is automatically populated from dataset owner's email.
-- "Subject" uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > `Metadata References `_. Otherwise, if a term does not match our controlled vocabulary list, it will put any subject terms in "Keyword". If Subject is empty it is automatically populated with "N/A".
+- "Subject" uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > `Metadata References `_. Otherwise, if a term does not match our controlled vocabulary list, it will put any subject terms in "Keyword". If Subject is empty it is automatically populated with "N/A".
- Zero-length files are now allowed (but not necessarily encouraged).
@@ -127,7 +127,7 @@ Dublin Core Terms (DC Terms) Qualified Mapping - Dataverse Project DB Element Cr
+-----------------------------+----------------------------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|dcterms:creator | authorName (LastName, FirstName) | Y | Author(s) for the Dataset. |
+-----------------------------+----------------------------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
-|dcterms:subject | subject (Controlled Vocabulary) OR keyword | Y | Controlled Vocabulary list is in our User Guide > `Metadata References `_. |
+|dcterms:subject | subject (Controlled Vocabulary) OR keyword | Y | Controlled Vocabulary list is in our User Guide > `Metadata References `_. |
+-----------------------------+----------------------------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|dcterms:description | dsDescriptionValue | Y | Describing the purpose, scope or nature of the Dataset. Can also use dcterms:abstract. |
+-----------------------------+----------------------------------------------+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
diff --git a/doc/sphinx-guides/source/conf.py b/doc/sphinx-guides/source/conf.py
index 7ff17eb45ed..98d10526517 100755
--- a/doc/sphinx-guides/source/conf.py
+++ b/doc/sphinx-guides/source/conf.py
@@ -38,11 +38,12 @@
# ones.
extensions = [
'sphinx.ext.autodoc',
- 'sphinx.ext.intersphinx',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode',
'sphinx.ext.graphviz',
'sphinxcontrib.icon',
+ 'myst_parser',
+ 'sphinx_tabs.tabs',
]
# Add any paths that contain templates here, relative to this directory.
@@ -66,9 +67,9 @@
# built documents.
#
# The short X.Y version.
-version = '6.0'
+version = '6.1'
# The full version, including alpha/beta/rc tags.
-release = '6.0'
+release = '6.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -430,9 +431,6 @@
# If false, no index is generated.
#epub_use_index = True
-
-# Example configuration for intersphinx: refer to the Python standard library.
-intersphinx_mapping = {'http://docs.python.org/': None}
# Suppress "WARNING: unknown mimetype for ..." https://github.com/IQSS/dataverse/issues/3391
suppress_warnings = ['epub.unknown_project_files']
rst_prolog = """
diff --git a/doc/sphinx-guides/source/container/app-image.rst b/doc/sphinx-guides/source/container/app-image.rst
index 29f6d6ac1d4..caf4aadbf7e 100644
--- a/doc/sphinx-guides/source/container/app-image.rst
+++ b/doc/sphinx-guides/source/container/app-image.rst
@@ -22,20 +22,20 @@ IQSS will not offer you support how to deploy or run it, please reach out to the
You might be interested in taking a look at :doc:`../developers/containers`, linking you to some (community-based)
efforts.
-
+.. _supported-image-tags-app:
Supported Image Tags
++++++++++++++++++++
This image is sourced from the main upstream code `repository of the Dataverse software `_.
-Development and maintenance of the `image's code `_ happens there
-(again, by the community).
-
-.. note::
- Please note that this image is not (yet) available from Docker Hub. You need to build local to use
- (see below). Follow https://github.com/IQSS/dataverse/issues/9444 for new developments.
-
-
+Development and maintenance of the `image's code `_
+happens there (again, by the community). Community-supported image tags are based on the two most important
+upstream branches:
+
+- The ``unstable`` tag corresponds to the ``develop`` branch, where pull requests are merged.
+ (`Dockerfile `__)
+- The ``alpha`` tag corresponds to the ``master`` branch, where releases are cut from.
+ (`Dockerfile `__)
Image Contents
++++++++++++++
diff --git a/doc/sphinx-guides/source/container/base-image.rst b/doc/sphinx-guides/source/container/base-image.rst
index 1a47a8fc413..c41250d48c5 100644
--- a/doc/sphinx-guides/source/container/base-image.rst
+++ b/doc/sphinx-guides/source/container/base-image.rst
@@ -217,7 +217,14 @@ provides. These are mostly based on environment variables (very common with cont
- ``0``
- Bool, ``0|1``
- Enable the dynamic "hot" reloads of files when changed in a deployment. Useful for development,
- when new artifacts are copied into the running domain.
+ when new artifacts are copied into the running domain. Also, export Dataverse specific environment variables
+ ``DATAVERSE_JSF_PROJECT_STAGE=Development`` and ``DATAVERSE_JSF_REFRESH_PERIOD=0`` to enable dynamic JSF page
+ reloads.
+ * - ``SKIP_DEPLOY``
+ - ``0``
+ - Bool, ``0|1`` or ``false|true``
+ - When active, do not deploy applications from ``DEPLOY_DIR`` (see below), just start the application server.
+ Will still execute any provided init scripts and only skip deployments within the default init scripts.
* - ``DATAVERSE_HTTP_TIMEOUT``
- ``900``
- Seconds
@@ -272,7 +279,8 @@ building upon it. You can also use these for references in scripts, etc.
(Might be reused for Dataverse one day)
* - ``DEPLOY_DIR``
- ``${HOME_DIR}/deployments``
- - Any EAR or WAR file, exploded WAR directory etc are autodeployed on start
+ - Any EAR or WAR file, exploded WAR directory etc are autodeployed on start.
+ See also ``SKIP_DEPLOY`` above.
* - ``DOMAIN_DIR``
- ``${PAYARA_DIR}/glassfish`` ``/domains/${DOMAIN_NAME}``
- Path to root of the Payara domain applications will be deployed into. Usually ``${DOMAIN_NAME}`` will be ``domain1``.
@@ -299,9 +307,9 @@ named Docker volume in these places to avoid data loss, gain performance and/or
- Description
* - ``STORAGE_DIR``
- ``/dv``
- - This place is writeable by the Payara user, making it usable as a place to store research data, customizations
- or other. Images inheriting the base image should create distinct folders here, backed by different
- mounted volumes.
+ - This place is writeable by the Payara user, making it usable as a place to store research data, customizations or other.
+ Images inheriting the base image should create distinct folders here, backed by different mounted volumes.
+ Enforce correct filesystem permissions on the mounted volume using ``fix-fs-perms.sh`` from :doc:`configbaker-image` or similar scripts.
* - ``SECRETS_DIR``
- ``/secrets``
- Mount secrets or other here, being picked up automatically by
@@ -353,6 +361,8 @@ Other Hints
By default, ``domain1`` is enabled to use the ``G1GC`` garbage collector.
+To access the Payara Admin Console or use the ``asadmin`` command, use username ``admin`` and password ``admin``.
+
For running a Java application within a Linux based container, the support for CGroups is essential. It has been
included and activated by default since Java 8u192, Java 11 LTS and later. If you are interested in more details,
you can read about those in a few places like https://developers.redhat.com/articles/2022/04/19/java-17-whats-new-openjdks-container-awareness,
diff --git a/doc/sphinx-guides/source/container/configbaker-image.rst b/doc/sphinx-guides/source/container/configbaker-image.rst
index 7218e2d8d14..d098bd46436 100644
--- a/doc/sphinx-guides/source/container/configbaker-image.rst
+++ b/doc/sphinx-guides/source/container/configbaker-image.rst
@@ -86,7 +86,7 @@ Maven modules packaging target with activated "container" profile from the proje
If you specifically want to build a config baker image *only*, try
-``mvn -Pct package -Ddocker.filter=dev_bootstrap``
+``mvn -Pct docker:build -Ddocker.filter=dev_bootstrap``
The build of config baker involves copying Solr configset files. The Solr version used is inherited from Maven,
acting as the single source of truth. Also, the tag of the image should correspond the application image, as
diff --git a/doc/sphinx-guides/source/container/dev-usage.rst b/doc/sphinx-guides/source/container/dev-usage.rst
index 04c7eba7913..be4eda5da44 100644
--- a/doc/sphinx-guides/source/container/dev-usage.rst
+++ b/doc/sphinx-guides/source/container/dev-usage.rst
@@ -141,26 +141,205 @@ Alternatives:
Options are the same.
-Re-Deploying
-------------
+Redeploying
+-----------
+
+The safest and most reliable way to redeploy code is to stop the running containers (with Ctrl-c if you started them in the foreground) and then build and run them again with ``mvn -Pct clean package docker:run``.
+This is safe, but it also slows down the development cycle a lot.
+
+Triggering redeployment of changes using an IDE can greatly improve your feedback loop when changing code.
+You have at least two options:
+
+#. Use builtin features of IDEs or `IDE plugins from Payara `_.
+#. Use a paid product like `JRebel `_.
+
+The main differences between the first and the second options are support for hot deploys of non-class files and limitations in what the JVM HotswapAgent can do for you.
+Find more details in a `blog article by JRebel `_.
+
+.. _ide-trigger-code-deploy:
+
+IDE Triggered Code Re-Deployments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To make use of the built-in features or the Payara IDE Tools (option 1), please follow the steps below.
+Note that with this method you may redeploy a complete WAR or hot swap single methods.
+Redeploying a WAR supports swapping and adding classes as well as non-code materials, but is slower (still faster than rebuilding containers).
+Hot swapping methods requires using JDWP (debug mode), but does not allow changing non-code material or adding classes.
+
+#. | Download the version of Payara shown in :ref:`install-payara-dev` and unzip it to a reasonable location such as ``/usr/local/payara6``.
+ | - Note that Payara can also be downloaded from `Maven Central `_.
+ | - Note that another way to check the expected version of Payara is to run this command:
+ | ``mvn help:evaluate -Dexpression=payara.version -q -DforceStdout``
+
+#. Install Payara Tools plugin in your IDE:
+
+ .. tabs::
+ .. group-tab:: Netbeans
+
+ This step is not necessary for Netbeans. The feature is builtin.
+
+ .. group-tab:: IntelliJ
+
+ **Requires IntelliJ Ultimate!**
+ (Note that `free educational licenses `_ are available)
+
+ .. image:: img/intellij-payara-plugin-install.png
+
+#. Configure a connection to Payara:
+
+ .. tabs::
+ .. group-tab:: Netbeans
+
+ Launch Netbeans and click "Tools" and then "Servers". Click "Add Server" and select "Payara Server" and set the installation location to ``/usr/local/payara6`` (or wherever you unzipped Payara). Choose "Remote Domain". Use the settings in the screenshot below. Most of the defaults are fine.
+
+ Under "Common", the username and password should be "admin". Make sure "Enable Hot Deploy" is checked.
+
+ .. image:: img/netbeans-servers-common.png
+
+ Under "Java", change the debug port to 9009.
+
+ .. image:: img/netbeans-servers-java.png
+
+ Open the project properties (under "File"), navigate to "Compile" and make sure "Compile on Save" is checked.
+
+ .. image:: img/netbeans-compile.png
+
+ Under "Run", under "Server", select "Payara Server". Make sure "Deploy on Save" is checked.
+
+ .. image:: img/netbeans-run.png
+
+ .. group-tab:: IntelliJ
+ Create a new running configuration with a "Remote Payara".
+ (Open dialog by clicking "Run", then "Edit Configurations")
+
+ .. image:: img/intellij-payara-add-new-config.png
+
+ Click on "Configure" next to "Application Server".
+ Add an application server and select unzipped local directory.
+
+ .. image:: img/intellij-payara-config-add-server.png
+
+ Add admin password "admin" and add "building artifact" before launch.
+ Make sure to select the WAR, *not* exploded!
+
+ .. image:: img/intellij-payara-config-server.png
+
+ Go to "Deployment" tab and add the Dataverse WAR, *not* exploded!.
+
+ .. image:: img/intellij-payara-config-deployment.png
+
+ Go to "Startup/Connection" tab, select "Debug" and change port to ``9009``.
+
+ .. image:: img/intellij-payara-config-startup.png
+
+ You might want to tweak the hot deploy behavior in the "Server" tab now.
+ "Update action" can be found in the run window (see below).
+ "Frame deactivation" means switching from IntelliJ window to something else, e.g. your browser.
+ *Note: static resources like properties, XHTML etc will only update when redeploying!*
+
+ .. image:: img/intellij-payara-config-server-behaviour.png
+
+#. Start all the containers, but take care to skip application deployment.
+
+ .. tabs::
+ .. group-tab:: Maven
+ ``mvn -Pct docker:run -Dapp.skipDeploy``
+
+ Run above command in your terminal to start containers in foreground and skip deployment.
+ See cheat sheet above for more options.
+ Note that this command either assumes you built the :doc:`app-image` first or will download it from Docker Hub.
+ .. group-tab:: Compose
+ ``SKIP_DEPLOY=1 docker compose -f docker-compose-dev.yml up``
+
+ Run above command in your terminal to start containers in foreground and skip deployment.
+ See cheat sheet above for more options.
+ Note that this command either assumes you built the :doc:`app-image` first or will download it from Docker Hub.
+ .. group-tab:: IntelliJ
+ You can create a service configuration to automatically start services for you.
+
+ **IMPORTANT**: This requires installation of the `Docker plugin `_.
+
+ **NOTE**: You might need to change the Docker Compose executable in your IDE settings to ``docker`` if you have no ``docker-compose`` bin (*File > Settings > Build > Docker > Tools*).
+
+ .. image:: img/intellij-compose-add-new-config.png
+
+ Give your configuration a meaningful name, select the compose file to use (in this case the default one), add the environment variable ``SKIP_DEPLOY=1``, and optionally select the services to start.
+ You might also want to change other options like attaching to containers to view the logs within the "Services" tab.
+
+ .. image:: img/intellij-compose-setup.png
+
+ Now run the configuration to prepare for deployment and watch it unfold in the "Services" tab.
+
+ .. image:: img/intellij-compose-run.png
+ .. image:: img/intellij-compose-services.png
+
+ Note: the Admin Console can be reached at http://localhost:4848 or https://localhost:4949
+
+#. To deploy the application to the running server, use the configured tools to deploy.
+ Using the "Run" configuration only deploys and enables redeploys, while running "Debug" enables hot swapping of classes via JDWP.
+
+ .. tabs::
+ .. group-tab:: Netbeans
+
+ Click "Debug" then "Debug Project". After some time, Dataverse will be deployed.
+
+ Try making a code change, perhaps to ``Info.java``.
+
+ Click "Debug" and then "Apply Code Changes". If the change was correctly applied, you should see output similar to this:
+
+ .. code-block::
+
+ Classes to reload:
+ edu.harvard.iq.dataverse.api.Info
+
+ Code updated
+
+ Check to make sure the change is live by visiting, for example, http://localhost:8080/api/info/version
+
+ See below for a `video `_ demonstrating the steps above but please note that the ports used have changed and now that we have the concept of "skip deploy" the undeployment step shown is no longer necessary.
+
+ .. raw:: html
+
+
+
+ .. group-tab:: IntelliJ
+ Choose "Run" or "Debug" in the toolbar.
+
+ .. image:: img/intellij-payara-run-toolbar.png
+
+ Watch the WAR build and the deployment unfold.
+ Note the "Update" action button (see config to change its behavior).
+
+ .. image:: img/intellij-payara-run-output.png
+
+ Manually hotswap classes in "Debug" mode via "Run" > "Debugging Actions" > "Reload Changed Classes".
+
+ .. image:: img/intellij-payara-run-menu-reload.png
+
+Note: in the background, the bootstrap job will wait for Dataverse to be deployed and responsive.
+When your IDE automatically opens the URL of a newly deployed, not yet bootstrapped Dataverse application, it might take some more time and a few page refreshes until the job finishes.
+
+IDE Triggered Non-Code Re-Deployments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Currently, the only safe and tested way to re-deploy the Dataverse application after you applied code changes is
-by recreating the container(s). In the future, more options may be added here.
+Either redeploy the WAR (see above), use JRebel or look into copying files into the exploded WAR within the running container.
+The steps below describe options to enable the latter in different IDEs.
-If you started your containers in foreground, just stop them and follow the steps for building and running again.
-The same goes for using Maven to start the containers in the background.
+.. tabs::
+ .. group-tab:: IntelliJ
-In case of using Docker Compose and starting the containers in the background, you can use a workaround to only
-restart the application container:
+ This imitates the Netbeans builtin function to copy changes to files under ``src/main/webapp`` into a destination folder.
+ It is different in the way that it will copy the files into the running container deployment without using a bind mount.
-.. code-block::
+ 1. Install the `File Watchers plugin `_
+ 2. Import the :download:`watchers.xml <../../../../docker/util/intellij/watchers.xml>` file at *File > Settings > Tools > File Watchers*
+ 3. Once you have the deployment running (see above), editing files under ``src/main/webapp`` will be copied into the container after saving the edited file.
+ Note: by default, IDE auto-saves will not trigger the copy.
+ 4. Changes are visible once you reload the browser window.
- # First rebuild the container (will complain about an image still in use, this is fine.)
- mvn -Pct package
- # Then re-create the container (will automatically restart the container for you)
- docker compose -f docker-compose-dev.yml create dev_dataverse
+ **IMPORTANT**: This tool assumes you are using the :ref:`ide-trigger-code-deploy` method to run Dataverse.
-Using ``docker container inspect dev_dataverse | grep Image`` you can verify the changed checksums.
+ **IMPORTANT**: This tool uses a Bash shell script and is thus limited to Mac and Linux OS.
Using a Debugger
----------------
diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png b/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png
new file mode 100644
index 00000000000..cec9bb357fe
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-add-new-config.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-run.png b/doc/sphinx-guides/source/container/img/intellij-compose-run.png
new file mode 100644
index 00000000000..e01744134f9
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-run.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-services.png b/doc/sphinx-guides/source/container/img/intellij-compose-services.png
new file mode 100644
index 00000000000..1c500c54201
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-services.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-compose-setup.png b/doc/sphinx-guides/source/container/img/intellij-compose-setup.png
new file mode 100644
index 00000000000..42c2accf2b4
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-compose-setup.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png b/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png
new file mode 100644
index 00000000000..d1c7a8f2777
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-add-new-config.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png
new file mode 100644
index 00000000000..54ffbd1b713
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-add-server.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png
new file mode 100644
index 00000000000..52adee056b5
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-deployment.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png
new file mode 100644
index 00000000000..5d23672e614
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-server-behaviour.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png
new file mode 100644
index 00000000000..614bda6f6d7
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-server.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png b/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png
new file mode 100644
index 00000000000..35b87148859
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-config-startup.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png b/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png
new file mode 100644
index 00000000000..7c6896574de
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-plugin-install.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png b/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png
new file mode 100644
index 00000000000..b1fd8bea260
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-menu-reload.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png b/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png
new file mode 100644
index 00000000000..aa139485a9d
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-output.png differ
diff --git a/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png b/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png
new file mode 100644
index 00000000000..2aecb27c5f3
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/intellij-payara-run-toolbar.png differ
diff --git a/doc/sphinx-guides/source/container/img/netbeans-compile.png b/doc/sphinx-guides/source/container/img/netbeans-compile.png
new file mode 100644
index 00000000000..e429695ccb0
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-compile.png differ
diff --git a/doc/sphinx-guides/source/container/img/netbeans-run.png b/doc/sphinx-guides/source/container/img/netbeans-run.png
new file mode 100644
index 00000000000..00f8af23cc5
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-run.png differ
diff --git a/doc/sphinx-guides/source/container/img/netbeans-servers-common.png b/doc/sphinx-guides/source/container/img/netbeans-servers-common.png
new file mode 100644
index 00000000000..a9ded5dbec3
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-servers-common.png differ
diff --git a/doc/sphinx-guides/source/container/img/netbeans-servers-java.png b/doc/sphinx-guides/source/container/img/netbeans-servers-java.png
new file mode 100644
index 00000000000..2593cacc5ae
Binary files /dev/null and b/doc/sphinx-guides/source/container/img/netbeans-servers-java.png differ
diff --git a/doc/sphinx-guides/source/container/index.rst b/doc/sphinx-guides/source/container/index.rst
index 4bbc87a4845..abf871dd340 100644
--- a/doc/sphinx-guides/source/container/index.rst
+++ b/doc/sphinx-guides/source/container/index.rst
@@ -1,28 +1,12 @@
Container Guide
===============
-Running the Dataverse software in containers is quite different than in a :doc:`standard installation <../installation/prep>`.
-
-Both approaches have pros and cons. These days, containers are very often used for development and testing,
-but there is an ever rising move toward running applications in the cloud using container technology.
-
-**NOTE:**
-**As the Institute for Quantitative Social Sciences (IQSS) at Harvard is running a standard, non-containerized installation,
-container support described in this guide is mostly created and maintained by the Dataverse community on a best-effort
-basis.**
-
-This guide is *not* about installation on technology like Docker Swarm, Kubernetes, Rancher or other
-solutions to run containers in production. There is the `Dataverse on K8s project `_ for this
-purpose, as mentioned in the :doc:`/developers/containers` section of the Developer Guide.
-
-This guide focuses on describing the container images managed from the main Dataverse repository (again: by the
-community, not IQSS), their features and limitations. Instructions on how to build the images yourself and how to
-develop and extend them further are provided.
-
**Contents:**
.. toctree::
+ intro
+ running/index
dev-usage
base-image
app-image
diff --git a/doc/sphinx-guides/source/container/intro.rst b/doc/sphinx-guides/source/container/intro.rst
new file mode 100644
index 00000000000..5099531dcc9
--- /dev/null
+++ b/doc/sphinx-guides/source/container/intro.rst
@@ -0,0 +1,28 @@
+Introduction
+============
+
+Dataverse in containers!
+
+.. contents:: |toctitle|
+ :local:
+
+Intended Audience
+-----------------
+
+This guide is intended for anyone who wants to run Dataverse in containers. This is potentially a wide audience, from sysadmins interested in running Dataverse in production in containers (not recommended yet) to contributors working on a bug fix (encouraged!). See :doc:`running/index` for various scenarios and please let us know if your use case is not covered.
+
+.. _getting-help-containers:
+
+Getting Help
+------------
+
+Please ask in #containers at https://chat.dataverse.org
+
+Alternatively, you can try one or more of the channels under :ref:`support`.
+
+.. _helping-containers:
+
+Helping with the Containerization Effort
+----------------------------------------
+
+In 2023 the Containerization Working Group started meeting regularly. All are welcome to join! We talk in #containers at https://chat.dataverse.org and have a regular video call. For details, please visit https://ct.gdcc.io
diff --git a/doc/sphinx-guides/source/container/running/backend-dev.rst b/doc/sphinx-guides/source/container/running/backend-dev.rst
new file mode 100644
index 00000000000..8b2dab956ad
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/backend-dev.rst
@@ -0,0 +1,10 @@
+Backend Development
+===================
+
+.. contents:: |toctitle|
+ :local:
+
+Intro
+-----
+
+See :doc:`../dev-usage`.
diff --git a/doc/sphinx-guides/source/container/running/demo.rst b/doc/sphinx-guides/source/container/running/demo.rst
new file mode 100644
index 00000000000..24027e677a1
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/demo.rst
@@ -0,0 +1,217 @@
+Demo or Evaluation
+==================
+
+In the following tutorial we'll walk through spinning up Dataverse in containers for demo or evaluation purposes.
+
+.. contents:: |toctitle|
+ :local:
+
+Quickstart
+----------
+
+First, let's confirm that we can get Dataverse running on your system.
+
+- Download :download:`compose.yml <../../../../../docker/compose/demo/compose.yml>`
+- Run ``docker compose up`` in the directory where you put ``compose.yml``
+- Visit http://localhost:8080 and try logging in:
+
+ - username: dataverseAdmin
+ - password: admin1
+
+If you can log in, great! Please continue through the tutorial. If you have any trouble, please consult the sections below on troubleshooting and getting help.
+
+Stopping and Starting the Containers
+------------------------------------
+
+Let's practice stopping the containers and starting them up again. Your data, stored in a directory called ``data``, will remain intact.
+
+To stop the containers hit ``Ctrl-c`` (hold down the ``Ctrl`` key and then hit the ``c`` key).
+
+To start the containers, run ``docker compose up``.
+
+Deleting Data and Starting Over
+-------------------------------
+
+Again, data related to your Dataverse installation such as the database is stored in a directory called ``data`` that gets created in the directory where you ran ``docker compose`` commands.
+
+You may reach a point during your demo or evaluation that you'd like to start over with a fresh database. Simply make sure the containers are not running and then remove the ``data`` directory. Now, as before, you can run ``docker compose up`` to spin up the containers.
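+
+For example, assuming you ran ``docker compose up`` in the current directory (a sketch; removing ``data`` may require elevated permissions depending on file ownership):
+
+.. code-block:: bash
+
+  # stop and remove the containers
+  docker compose down
+  # delete the data directory created next to compose.yml
+  rm -rf data
+  # start fresh
+  docker compose up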
+
+Setting Up for a Demo
+---------------------
+
+Now that you are familiar with the basics of running Dataverse in containers, let's move on to a better setup for a demo or evaluation.
+
+Starting Fresh
+++++++++++++++
+
+For this exercise, please start fresh by stopping all containers and removing the ``data`` directory.
+
+Creating and Running a Demo Persona
++++++++++++++++++++++++++++++++++++
+
+Previously we used the "dev" persona to bootstrap Dataverse, but for security reasons, we should create a persona more suited to demos and evaluations.
+
+Edit the ``compose.yml`` file and look for the following section.
+
+.. code-block:: bash
+
+ bootstrap:
+ container_name: "bootstrap"
+ image: gdcc/configbaker:alpha
+ restart: "no"
+ command:
+ - bootstrap.sh
+ - dev
+ #- demo
+ #volumes:
+ # - ./demo:/scripts/bootstrap/demo
+ networks:
+ - dataverse
+
+Comment out "dev" and uncomment "demo".
+
+Uncomment the "volumes" section.
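+
+After these edits, that part of the ``compose.yml`` file should look roughly like this (only the ``command`` and ``volumes`` entries change):
+
+.. code-block:: bash
+
+   bootstrap:
+     container_name: "bootstrap"
+     image: gdcc/configbaker:alpha
+     restart: "no"
+     command:
+       - bootstrap.sh
+       #- dev
+       - demo
+     volumes:
+       - ./demo:/scripts/bootstrap/demo
+     networks:
+       - dataverse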
+
+Create a directory called "demo" and copy :download:`init.sh <../../../../../modules/container-configbaker/scripts/bootstrap/demo/init.sh>` into it. You are welcome to edit this demo init script, customizing the final message, for example.
+
+Note that the init script contains a key for using the admin API once it is blocked. You should change it in the script from "unblockme" to something only you know.
+
+Now run ``docker compose up``. The "bootstrap" container should exit with the message from the init script and Dataverse should be running on http://localhost:8080 as before during the quickstart exercise.
+
+One of the main differences between the "dev" persona and our new "demo" persona is that we are now running the setup-all script without the ``--insecure`` flag. This makes our installation more secure, though it does block "admin" APIs that are useful for configuration.
+
+Smoke Testing
+-------------
+
+At this point, please try the following basic operations within your installation:
+
+- logging in as dataverseAdmin (password "admin1")
+- publishing the "root" collection (dataverse)
+- creating a collection
+- creating a dataset
+- uploading a data file
+- publishing the dataset
+
+If anything isn't working, please see the sections below on troubleshooting, giving feedback, and getting help.
+
+Further Configuration
+---------------------
+
+Now that we've verified through a smoke test that basic operations are working, let's configure our installation of Dataverse.
+
+Please refer to the :doc:`/installation/config` section of the Installation Guide for various configuration options.
+
+Below we'll explain some specifics for configuration in containers.
+
+JVM Options/MicroProfile Config
++++++++++++++++++++++++++++++++
+
+:ref:`jvm-options` can be configured under ``JVM_ARGS`` in the ``compose.yml`` file. Here's an example:
+
+.. code-block:: bash
+
+  environment:
+    JVM_ARGS: -Ddataverse.files.storage-driver-id=file1
+
+Some JVM options can be configured as environment variables. For example, you can configure the database host like this:
+
+.. code-block:: bash
+
+  environment:
+    DATAVERSE_DB_HOST: postgres
+
+We are in the process of making more JVM options configurable as environment variables. Look for the term "MicroProfile Config" under :doc:`/installation/config` in the Installation Guide to know if you can use them this way.
+
+Please note that for a few environment variables (the ones that start with ``%ct`` in :download:`microprofile-config.properties <../../../../../src/main/resources/META-INF/microprofile-config.properties>`), you have to prepend ``_CT_`` to make, for example, ``_CT_DATAVERSE_SITEURL``. We are working on a fix for this in https://github.com/IQSS/dataverse/issues/10285.
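+
+For example, here is a sketch of setting the site URL this way in ``compose.yml`` (the value shown is just an illustration for a local demo):
+
+.. code-block:: bash
+
+  environment:
+    _CT_DATAVERSE_SITEURL: "http://localhost:8080"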
+
+There is a final way to configure JVM options that we plan to deprecate once all JVM options have been converted to MicroProfile Config. Look for "magic trick" under "tunables" at :doc:`../app-image` for more information.
+
+Database Settings
++++++++++++++++++
+
+Generally, you should be able to look at the list of :ref:`database-settings` and configure them but the "demo" persona above secured your installation to the point that you'll need an "unblock key" to access the "admin" API and change database settings.
+
+In the example below of configuring :ref:`:FooterCopyright` we use the default unblock key of "unblockme" but you should use the key you set above.
+
+``curl -X PUT -d ", My Org" "http://localhost:8080/api/admin/settings/:FooterCopyright?unblock-key=unblockme"``
+
+Once you make this change, it should be visible in the copyright notice at the bottom left of every page.
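+
+To double-check the stored value, you can also list the database settings via the API (again using the unblock key; replace "unblockme" with your own):
+
+``curl "http://localhost:8080/api/admin/settings?unblock-key=unblockme"``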
+
+Next Steps
+----------
+
+From here, you are encouraged to continue poking around, configuring, and testing. You will probably spend a lot of time reading the :doc:`/installation/config` section of the Installation Guide.
+
+Please consider giving feedback using the methods described below. Good luck with your demo!
+
+About the Containers
+--------------------
+
+Now that you've gone through the tutorial, you might be interested in the various containers you've spun up and what they do.
+
+Container List
+++++++++++++++
+
+If you run ``docker ps``, you'll see that multiple containers are spun up in a demo or evaluation. Here are the most important ones:
+
+- dataverse
+- postgres
+- solr
+- smtp
+- bootstrap
+
+Most are self-explanatory, and correspond to components listed under :doc:`/installation/prerequisites` in the (traditional) Installation Guide, but "bootstrap" refers to :doc:`../configbaker-image`.
+
+Additional containers are used in development (see :doc:`../dev-usage`), but for the purposes of a demo or evaluation, fewer moving (sometimes pointy) parts are included.
+
+Tags and Versions
++++++++++++++++++
+
+The compose file references a tag called "alpha", which corresponds to the latest released version of Dataverse. This means that if a release of Dataverse comes out while you are demo'ing or evaluating, the version of Dataverse you are using could change if you do a ``docker pull``. We are aware that there is a desire for tags that correspond to versions to ensure consistency. You are welcome to join `the discussion `_ and otherwise get in touch (see :ref:`helping-containers`). For more on tags, see :ref:`supported-image-tags-app`.
+
+Once Dataverse is running, you can check which version you have through the normal methods:
+
+- Check the bottom right in a web browser.
+- Check http://localhost:8080/api/info/version via API.
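+
+For example, the API check can be done from the command line:
+
+.. code-block:: bash
+
+  curl http://localhost:8080/api/info/version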
+
+Troubleshooting
+---------------
+
+Hardware and Software Requirements
+++++++++++++++++++++++++++++++++++
+
+- 8 GB RAM (if not much else is running)
+- Mac, Linux, or Windows (experimental)
+- Docker
+
+Windows support is experimental but we are very interested in supporting Windows better. Please report bugs (see :ref:`helping-containers`).
+
+Bootstrapping Did Not Complete
+++++++++++++++++++++++++++++++
+
+In the compose file, try increasing the timeout in the bootstrap container by adding something like this:
+
+.. code-block:: bash
+
+  environment:
+    - TIMEOUT=10m
+
+Wrapping Up
+-----------
+
+Deleting the Containers and Data
+++++++++++++++++++++++++++++++++
+
+If you no longer need the containers because your demo or evaluation is finished and you want to reclaim disk space, run ``docker compose down`` in the directory where you put ``compose.yml``.
+
+You might also want to delete the ``data`` directory, as described above.
+
+Giving Feedback
+---------------
+
+Your feedback is extremely valuable to us! To let us know what you think, please see :ref:`helping-containers`.
+
+Getting Help
+------------
+
+Please do not be shy about reaching out for help. We very much want you to have a pleasant demo or evaluation experience. For ways to contact us, please see :ref:`getting-help-containers`.
diff --git a/doc/sphinx-guides/source/container/running/frontend-dev.rst b/doc/sphinx-guides/source/container/running/frontend-dev.rst
new file mode 100644
index 00000000000..88d40c12053
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/frontend-dev.rst
@@ -0,0 +1,10 @@
+Frontend Development
+====================
+
+.. contents:: |toctitle|
+ :local:
+
+Intro
+-----
+
+The frontend (web interface) of Dataverse is being decoupled from the backend. This evolving codebase has its own repo at https://github.com/IQSS/dataverse-frontend which includes docs and scripts for running the backend of Dataverse in Docker.
diff --git a/doc/sphinx-guides/source/container/running/github-action.rst b/doc/sphinx-guides/source/container/running/github-action.rst
new file mode 100644
index 00000000000..ae42dd494d1
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/github-action.rst
@@ -0,0 +1,18 @@
+GitHub Action
+=============
+
+.. contents:: |toctitle|
+ :local:
+
+Intro
+-----
+
+A GitHub Action is under development that will spin up a Dataverse instance within the context of GitHub CI workflows: https://github.com/gdcc/dataverse-action
+
+Use Cases
+---------
+
+Use cases for the GitHub Action include:
+
+- Testing :doc:`/api/client-libraries` that interact with Dataverse APIs
+- Testing :doc:`/admin/integrations` of third party software with Dataverse
diff --git a/doc/sphinx-guides/source/container/running/index.rst b/doc/sphinx-guides/source/container/running/index.rst
new file mode 100755
index 00000000000..a02266f7cba
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/index.rst
@@ -0,0 +1,13 @@
+Running Dataverse in Docker
+===========================
+
+Contents:
+
+.. toctree::
+
+ production
+ demo
+ metadata-blocks
+ github-action
+ frontend-dev
+ backend-dev
diff --git a/doc/sphinx-guides/source/container/running/metadata-blocks.rst b/doc/sphinx-guides/source/container/running/metadata-blocks.rst
new file mode 100644
index 00000000000..fcc80ce1909
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/metadata-blocks.rst
@@ -0,0 +1,15 @@
+Editing Metadata Blocks
+=======================
+
+.. contents:: |toctitle|
+ :local:
+
+Intro
+-----
+
+The Admin Guide has a section on :doc:`/admin/metadatacustomization` and suggests running Dataverse in containers (Docker) for this purpose.
+
+Status
+------
+
+For now, please see :doc:`demo`, which should also provide a suitable Dockerized Dataverse environment.
diff --git a/doc/sphinx-guides/source/container/running/production.rst b/doc/sphinx-guides/source/container/running/production.rst
new file mode 100644
index 00000000000..0a628dc57b9
--- /dev/null
+++ b/doc/sphinx-guides/source/container/running/production.rst
@@ -0,0 +1,20 @@
+Production (Future)
+===================
+
+.. contents:: |toctitle|
+ :local:
+
+Status
+------
+
+The images described in this guide are not yet recommended for production usage.
+
+How to Help
+-----------
+
+You can help the effort to support these images in production by trying them out (see :doc:`demo`) and giving feedback (see :ref:`helping-containers`).
+
+Alternatives
+------------
+
+Until the images are ready for production, please use the traditional installation method described in the :doc:`/installation/index`.
diff --git a/doc/sphinx-guides/source/developers/api-design.rst b/doc/sphinx-guides/source/developers/api-design.rst
new file mode 100755
index 00000000000..e7a7a6408bb
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/api-design.rst
@@ -0,0 +1,63 @@
+==========
+API Design
+==========
+
+API design is a large topic. We expect this page to grow over time.
+
+.. contents:: |toctitle|
+ :local:
+
+Paths
+-----
+
+A reminder `from Wikipedia `_ of what a path is:
+
+.. code-block:: bash
+
+            userinfo       host      port
+            ┌──┴───┐ ┌──────┴──────┐ ┌┴┐
+    https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top
+    └─┬─┘   └─────────────┬────────────┘└───────┬───────┘ └────────────┬────────────┘ └┬┘
+    scheme          authority                  path                  query           fragment
+
+Exposing Settings
+~~~~~~~~~~~~~~~~~
+
+Since Dataverse 4, database settings have been exposed via API at http://localhost:8080/api/admin/settings
+
+(JVM options are probably available via the Payara REST API, but this is out of scope.)
+
+Settings need to be exposed to API clients outside of ``/api/admin`` (which is typically restricted to localhost). Here are some guidelines to follow when exposing settings.
+
+- When you are exposing a database setting as-is:
+
+ - Use ``/api/info/settings`` as the root path.
+
+ - Append the name of the setting including the colon (e.g. ``:DatasetPublishPopupCustomText``)
+
+ - Final path example: ``/api/info/settings/:DatasetPublishPopupCustomText``
+
+- If the absence of the database setting is filled in by a default value (e.g. ``:ZipDownloadLimit`` or ``:ApiTermsOfUse``):
+
+ - Use ``/api/info`` as the root path.
+
+ - Append the setting but remove the colon and downcase the first character (e.g. ``zipDownloadLimit``)
+
+ - Final path example: ``/api/info/zipDownloadLimit``
+
+- If the database setting you're exposing makes more sense outside of ``/api/info`` because there's more context (e.g. ``:CustomDatasetSummaryFields``):
+
+ - Feel free to use a path outside of ``/api/info`` as the root path.
+
+ - Given additional context, append a shortened name (e.g. ``/api/datasets/summaryFieldNames``).
+
+ - Final path example: ``/api/datasets/summaryFieldNames``
+
+- If you need to expose a JVM option (MicroProfile setting) such as ``dataverse.api.allow-incomplete-metadata``:
+
+ - Use ``/api/info`` as the root path.
+
+ - Append a meaningful name for the setting (e.g. ``incompleteMetadataViaApi``).
+
+ - Final path example: ``/api/info/incompleteMetadataViaApi``
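+
+As a sketch of what a client request looks like once a setting has been exposed this way (using the ``zipDownloadLimit`` example above; ``/api/info`` endpoints do not require an API token):
+
+.. code-block:: bash
+
+  curl "http://localhost:8080/api/info/zipDownloadLimit"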
+
diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 04885571a01..8d891e63317 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -149,26 +149,39 @@ Globus File Transfer
Note: Globus file transfer is still experimental but feedback is welcome! See :ref:`support`.
-Users can transfer files via `Globus `_ into and out of datasets when their Dataverse installation is configured to use a Globus accessible S3 store and a community-developed `dataverse-globus `_ "transfer" app has been properly installed and configured.
+Users can transfer files via `Globus `_ into and out of datasets, or reference files on a remote Globus endpoint, when their Dataverse installation is configured to use one or more Globus-accessible stores
+and a community-developed `dataverse-globus `_ app has been properly installed and configured.
-Due to differences in the access control models of a Dataverse installation and Globus, enabling the Globus capability on a store will disable the ability to restrict and embargo files in that store.
+Globus endpoints can be in a variety of places, from data centers to personal computers.
+This means that from within the Dataverse software, a Globus transfer can feel like an upload or a download (with Globus Personal Connect running on your laptop, for example) or it can feel like a true transfer from one server to another (from a cluster in a data center into a Dataverse dataset or vice versa).
-As Globus aficionados know, Globus endpoints can be in a variety of places, from data centers to personal computers. This means that from within the Dataverse software, a Globus transfer can feel like an upload or a download (with Globus Personal Connect running on your laptop, for example) or it can feel like a true transfer from one server to another (from a cluster in a data center into a Dataverse dataset or vice versa).
-
-Globus transfer uses a very efficient transfer mechanism and has additional features that make it suitable for large files and large numbers of files:
+Globus transfer uses an efficient transfer mechanism and has additional features that make it suitable for large files and large numbers of files:
* robust file transfer capable of restarting after network or endpoint failures
* third-party transfer, which enables a user accessing a Dataverse installation in their desktop browser to initiate transfer of their files from a remote endpoint (i.e. on a local high-performance computing cluster), directly to an S3 store managed by the Dataverse installation
-Globus transfer requires use of the Globus S3 connector which requires a paid Globus subscription at the host institution. Users will need a Globus account which could be obtained via their institution or directly from Globus (at no cost).
+Note: Due to differences between the access control models of a Dataverse installation and Globus, and the current Globus store model, Dataverse cannot enforce per-file access restrictions.
+It is therefore recommended that any store allowing Globus access be configured as public, which disables the ability to restrict and embargo files in that store.
+
+Dataverse supports three options for using Globus, two involving transfer to Dataverse-managed endpoints and one allowing Dataverse to reference files on remote endpoints.
+Dataverse-managed endpoints must be Globus 'guest collections' hosted on either a file-system-based endpoint or an S3-based endpoint (the latter requires use of the Globus
+S3 connector which requires a paid Globus subscription at the host institution). In either case, Dataverse is configured with the Globus credentials of a user account that can manage the endpoint.
+Users will need a Globus account, which can be obtained via their institution or directly from Globus (at no cost).
+
+With the file-system endpoint, Dataverse does not currently have access to the file contents. Thus, functionality related to ingest, previews, fixity hash validation, etc. is not available. (With the S3-based endpoint, Dataverse has access via S3 and all functionality normally associated with direct uploads to S3 is available.)
+
+For the reference use case, Dataverse must be configured with a list of allowed endpoint/base paths from which files may be referenced. In this case, since Dataverse is not accessing the remote endpoint itself, it does not need Globus credentials.
+Users will need a Globus account in this case, and the remote endpoint must be configured to allow them access (i.e. be publicly readable, or potentially involve some out-of-band mechanism for requesting access that could be described in the dataset's Terms of Use and Access).
+
+All of Dataverse's Globus capabilities are now store-based (see the store documentation), so different collections/datasets can be configured to use different Globus-capable stores (or normal file stores, S3 stores, etc.).
-The setup required to enable Globus is described in the `Community Dataverse-Globus Setup and Configuration document `_ and the references therein.
+More details of the setup required to enable Globus are described in the `Community Dataverse-Globus Setup and Configuration document `_ and the references therein.
As described in that document, Globus transfers can be initiated by choosing the Globus option in the dataset upload panel. (Globus, which does asynchronous transfers, is not available during dataset creation.) Analogously, "Globus Transfer" is one of the download options in the "Access Dataset" menu and optionally the file landing page download menu (if/when supported in the dataverse-globus app).
An overview of the control and data transfer interactions between components was presented at the 2022 Dataverse Community Meeting and can be viewed in the `Integrations and Tools Session Video `_ around the 1 hr 28 min mark.
-See also :ref:`Globus settings <:GlobusBasicToken>`.
+See also :ref:`Globus settings <:GlobusSettings>`.
Data Capture Module (DCM)
-------------------------
diff --git a/doc/sphinx-guides/source/developers/classic-dev-env.rst b/doc/sphinx-guides/source/developers/classic-dev-env.rst
index 062a1bb36f3..6978f389e01 100755
--- a/doc/sphinx-guides/source/developers/classic-dev-env.rst
+++ b/doc/sphinx-guides/source/developers/classic-dev-env.rst
@@ -46,7 +46,7 @@ On Linux, you are welcome to use the OpenJDK available from package managers.
Install Netbeans or Maven
~~~~~~~~~~~~~~~~~~~~~~~~~
-NetBeans IDE is recommended, and can be downloaded from http://netbeans.org . Developers may use any editor or IDE. We recommend NetBeans because it is free, works cross platform, has good support for Jakarta EE projects, and includes a required build tool, Maven.
+NetBeans IDE is recommended, and can be downloaded from https://netbeans.org . Developers may use any editor or IDE. We recommend NetBeans because it is free, works cross platform, has good support for Jakarta EE projects, and includes a required build tool, Maven.
Below we describe how to build the Dataverse Software war file with Netbeans but if you prefer to use only Maven, you can find installation instructions in the :doc:`tools` section.
@@ -86,7 +86,9 @@ On Mac, run this command:
``brew install jq``
-On Linux, install ``jq`` from your package manager or download a binary from http://stedolan.github.io/jq/
+On Linux, install ``jq`` from your package manager or download a binary from https://stedolan.github.io/jq/
+
+.. _install-payara-dev:
Install Payara
~~~~~~~~~~~~~~
@@ -134,7 +136,7 @@ On Linux, you should just install PostgreSQL using your favorite package manager
Install Solr
^^^^^^^^^^^^
-`Solr `_ 9.3.0 is required.
+`Solr `_ 9.3.0 is required.
To install Solr, execute the following commands:
@@ -144,7 +146,7 @@ To install Solr, execute the following commands:
``cd /usr/local/solr``
-``curl -O http://archive.apache.org/dist/solr/solr/9.3.0/solr-9.3.0.tgz``
+``curl -O https://archive.apache.org/dist/solr/solr/9.3.0/solr-9.3.0.tgz``
``tar xvfz solr-9.3.0.tgz``
@@ -260,7 +262,3 @@ Next Steps
If you can log in to the Dataverse installation, great! If not, please see the :doc:`troubleshooting` section. For further assistance, please see "Getting Help" in the :doc:`intro` section.
You're almost ready to start hacking on code. Now that the installer script has you up and running, you need to continue on to the :doc:`tips` section to get set up to deploy code from your IDE or the command line.
-
-----
-
-Previous: :doc:`intro` | Next: :doc:`tips`
diff --git a/doc/sphinx-guides/source/developers/coding-style.rst b/doc/sphinx-guides/source/developers/coding-style.rst
index 0c00f611a7f..9da7836bbf4 100755
--- a/doc/sphinx-guides/source/developers/coding-style.rst
+++ b/doc/sphinx-guides/source/developers/coding-style.rst
@@ -57,7 +57,7 @@ Place curly braces according to the style below, which is an example you can see
Format Code You Changed with Netbeans
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-As you probably gathered from the :doc:`dev-environment` section, IQSS has standardized on Netbeans. It is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. Keeping the "diff" in your pull requests small makes them easier to code review.
+IQSS has standardized on Netbeans. It is much appreciated when you format your code (but only the code you touched!) using the out-of-the-box Netbeans configuration. If you have created an entirely new Java class, you can just click Source -> Format. If you are adjusting code in an existing class, highlight the code you changed and then click Source -> Format. Keeping the "diff" in your pull requests small makes them easier to code review.
Checking Your Formatting With Checkstyle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -131,7 +131,3 @@ Bike Shedding
What color should the `bike shed `_ be? :)
Come debate with us about coding style in this Google doc that has public comments enabled: https://docs.google.com/document/d/1KTd3FpM1BI3HlBofaZjMmBiQEJtFf11jiiGpQeJzy7A/edit?usp=sharing
-
-----
-
-Previous: :doc:`debugging` | Next: :doc:`deployment`
diff --git a/doc/sphinx-guides/source/developers/containers.rst b/doc/sphinx-guides/source/developers/containers.rst
index 175b178b455..ed477ccefea 100755
--- a/doc/sphinx-guides/source/developers/containers.rst
+++ b/doc/sphinx-guides/source/developers/containers.rst
@@ -29,7 +29,3 @@ Using Containers for Reproducible Research
------------------------------------------
Please see :ref:`research-code` in the User Guide for this related topic.
-
-----
-
-Previous: :doc:`deployment` | Next: :doc:`making-releases`
diff --git a/doc/sphinx-guides/source/developers/debugging.rst b/doc/sphinx-guides/source/developers/debugging.rst
index 50e8901b1ff..ffee6764b7f 100644
--- a/doc/sphinx-guides/source/developers/debugging.rst
+++ b/doc/sphinx-guides/source/developers/debugging.rst
@@ -63,7 +63,3 @@ to maintain your settings more easily for different environments.
.. _Jakarta Server Faces 3.0 Spec: https://jakarta.ee/specifications/faces/3.0/jakarta-faces-3.0.html#a6088
.. _PrimeFaces Configuration Docs: https://primefaces.github.io/primefaces/11_0_0/#/gettingstarted/configuration
-
-----
-
-Previous: :doc:`documentation` | Next: :doc:`coding-style`
diff --git a/doc/sphinx-guides/source/developers/dependencies.rst b/doc/sphinx-guides/source/developers/dependencies.rst
index 0208c49f90a..26880374f23 100644
--- a/doc/sphinx-guides/source/developers/dependencies.rst
+++ b/doc/sphinx-guides/source/developers/dependencies.rst
@@ -444,7 +444,3 @@ The codebase is structured like this:
.. [#f1] Modern IDEs import your Maven POM and offer import autocompletion for classes based on direct dependencies in the model. You might end up using legacy or repackaged classes because of a wrong scope.
.. [#f2] This is going to bite back in modern IDEs when importing classes from transitive dependencies by "autocompletion accident".
-
-----
-
-Previous: :doc:`documentation` | Next: :doc:`debugging`
diff --git a/doc/sphinx-guides/source/developers/deployment.rst b/doc/sphinx-guides/source/developers/deployment.rst
index 045b0d0abbc..678e29f4079 100755
--- a/doc/sphinx-guides/source/developers/deployment.rst
+++ b/doc/sphinx-guides/source/developers/deployment.rst
@@ -114,7 +114,7 @@ Please note that while the script should work well on new-ish branches, older br
Migrating Datafiles from Local Storage to S3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A number of pilot Dataverse installations start on local storage, then administrators are tasked with migrating datafiles into S3 or similar object stores. The files may be copied with a command-line utility such as `s3cmd`. You will want to retain the local file hierarchy, keeping the authority (for example: 10.5072) at the bucket "root."
+A number of pilot Dataverse installations start on local storage, then administrators are tasked with migrating datafiles into S3 or similar object stores. The files may be copied with a command-line utility such as `s3cmd `_. You will want to retain the local file hierarchy, keeping the authority (for example: 10.5072) at the bucket "root."
The below example queries may assist with updating dataset and datafile locations in the Dataverse installation's PostgresQL database. Depending on the initial version of the Dataverse Software and subsequent upgrade path, Datafile storage identifiers may or may not include a ``file://`` prefix, so you'll want to catch both cases.
@@ -146,8 +146,3 @@ To Update Datafile Location to your-s3-bucket, Assuming no ``file://`` Prefix
WHERE id IN (SELECT o.id FROM dvobject o, dataset s WHERE o.dtype = 'DataFile'
AND s.id = o.owner_id AND s.harvestingclient_id IS null
AND o.storageidentifier NOT LIKE '%://%');
-
-
-----
-
-Previous: :doc:`coding-style` | Next: :doc:`containers`
diff --git a/doc/sphinx-guides/source/developers/dev-environment.rst b/doc/sphinx-guides/source/developers/dev-environment.rst
index 1301994cc82..2837f901d5e 100755
--- a/doc/sphinx-guides/source/developers/dev-environment.rst
+++ b/doc/sphinx-guides/source/developers/dev-environment.rst
@@ -71,10 +71,10 @@ After some time you should be able to log in:
- username: dataverseAdmin
- password: admin1
-More Information
-----------------
+Next Steps
+----------
-See also the :doc:`/container/dev-usage` section of the Container Guide.
+See the :doc:`/container/dev-usage` section of the Container Guide for tips on fast redeployment, viewing logs, and more.
Getting Help
------------
diff --git a/doc/sphinx-guides/source/developers/documentation.rst b/doc/sphinx-guides/source/developers/documentation.rst
index f0729c59dcf..a4b8c027445 100755
--- a/doc/sphinx-guides/source/developers/documentation.rst
+++ b/doc/sphinx-guides/source/developers/documentation.rst
@@ -8,7 +8,7 @@ Writing Documentation
Quick Fix
-----------
-If you find a typo or a small error in the documentation you can fix it using GitHub's online web editor. Generally speaking, we will be following https://help.github.com/en/articles/editing-files-in-another-users-repository
+If you find a typo or a small error in the documentation you can fix it using GitHub's online web editor. Generally speaking, we will be following https://docs.github.com/en/repositories/working-with-files/managing-files/editing-files#editing-files-in-another-users-repository
- Navigate to https://github.com/IQSS/dataverse/tree/develop/doc/sphinx-guides/source where you will see folders for each of the guides: `admin`_, `api`_, `developers`_, `installation`_, `style`_, `user`_.
- Find the file you want to edit under one of the folders above.
@@ -18,7 +18,7 @@ If you find a typo or a small error in the documentation you can fix it using Gi
- Under the **Write** tab, delete the long welcome message and write a few words about what you fixed.
- Click **Create Pull Request**.
-That's it! Thank you for your contribution! Your pull request will be added manually to the main Dataverse Project board at https://github.com/orgs/IQSS/projects/2 and will go through code review and QA before it is merged into the "develop" branch. Along the way, developers might suggest changes or make them on your behalf. Once your pull request has been merged you will be listed as a contributor at https://github.com/IQSS/dataverse/graphs/contributors
+That's it! Thank you for your contribution! Your pull request will be added manually to the main Dataverse Project board at https://github.com/orgs/IQSS/projects/34 and will go through code review and QA before it is merged into the "develop" branch. Along the way, developers might suggest changes or make them on your behalf. Once your pull request has been merged you will be listed as a contributor at https://github.com/IQSS/dataverse/graphs/contributors
Please see https://github.com/IQSS/dataverse/pull/5857 for an example of a quick fix that was merged (the "Files changed" tab shows how a typo was fixed).
@@ -36,7 +36,7 @@ If you would like to read more about the Dataverse Project's use of GitHub, plea
Building the Guides with Sphinx
-------------------------------
-The Dataverse guides are written using Sphinx (http://sphinx-doc.org). We recommend installing Sphinx on your localhost or using a Sphinx Docker container to build the guides locally so you can get an accurate preview of your changes.
+The Dataverse guides are written using Sphinx (https://sphinx-doc.org). We recommend installing Sphinx on your localhost or using a Sphinx Docker container to build the guides locally so you can get an accurate preview of your changes.
In case you decide to use a Sphinx Docker container to build the guides, you can skip the next two installation sections, but you will need to have Docker installed.
@@ -62,7 +62,7 @@ In some parts of the documentation, graphs are rendered as images using the Sphi
Building the guides requires the ``dot`` executable from GraphViz.
-This requires having `GraphViz `_ installed and either having ``dot`` on the path or
+This requires having `GraphViz `_ installed and either having ``dot`` on the path or
`adding options to the make call `_.
Editing and Building the Guides
@@ -71,7 +71,7 @@ Editing and Building the Guides
To edit the existing documentation:
- Create a branch (see :ref:`how-to-make-a-pull-request`).
-- In ``doc/sphinx-guides/source`` you will find the .rst files that correspond to http://guides.dataverse.org.
+- In ``doc/sphinx-guides/source`` you will find the .rst files that correspond to https://guides.dataverse.org.
- Using your preferred text editor, open and edit the necessary files, or create new ones.
Once you are done, you can preview the changes by building the guides locally. As explained, you can build the guides with Sphinx locally installed, or with a Docker container.
@@ -159,7 +159,3 @@ A few notes about the command above:
Also, as of this writing we have enabled PDF builds from the "develop" branch. You download the PDF from http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/
If you would like to help improve the PDF version of the guides, please get in touch! Please see :ref:`getting-help-developers` for ways to contact the developer community.
-
-----
-
-Previous: :doc:`testing` | Next: :doc:`dependencies`
diff --git a/doc/sphinx-guides/source/developers/fontcustom.rst b/doc/sphinx-guides/source/developers/fontcustom.rst
index 2a94b0ffc0b..edcda1e69ab 100755
--- a/doc/sphinx-guides/source/developers/fontcustom.rst
+++ b/doc/sphinx-guides/source/developers/fontcustom.rst
@@ -35,7 +35,7 @@ RVM is a good way to install a specific version of Ruby: https://rvm.io
Install Dependencies and Font Custom Gem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The brew commands below assume you are on a Mac. See :doc:`dev-environment` for more on ``brew``.
+The brew commands below assume you are on a Mac.
.. code-block:: bash
diff --git a/doc/sphinx-guides/source/developers/geospatial.rst b/doc/sphinx-guides/source/developers/geospatial.rst
index 9744438bf5d..48d300524c2 100644
--- a/doc/sphinx-guides/source/developers/geospatial.rst
+++ b/doc/sphinx-guides/source/developers/geospatial.rst
@@ -34,7 +34,7 @@ For example:
Upon recognition of the four required files, the Dataverse installation will group them as well as any other relevant files into a shapefile set. Files with these extensions will be included in the shapefile set:
- Required: ``.shp``, ``.shx``, ``.dbf``, ``.prj``
-- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``shp.xml``
+- Optional: ``.sbn``, ``.sbx``, ``.fbn``, ``.fbx``, ``.ain``, ``.aih``, ``.ixs``, ``.mxs``, ``.atx``, ``.cpg``, ``.qpj``, ``.qmd``, ``shp.xml``
Then the Dataverse installation creates a new ``.zip`` with mimetype as a shapefile. The shapefile set will persist as this new ``.zip``.
@@ -81,7 +81,3 @@ For two "final" shapefile sets, ``bicycles.zip`` and ``subway_line.zip``, a new
- Mimetype: ``application/zipped-shapefile``
- Mimetype Label: "Shapefile as ZIP Archive"
-
-----
-
-Previous: :doc:`unf/index` | Next: :doc:`remote-users`
diff --git a/doc/sphinx-guides/source/developers/globus-api.rst b/doc/sphinx-guides/source/developers/globus-api.rst
new file mode 100644
index 00000000000..834db8161f0
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/globus-api.rst
@@ -0,0 +1,239 @@
+Globus Transfer API
+===================
+
+.. contents:: |toctitle|
+ :local:
+
+The Globus API addresses three use cases:
+
+* Transfer to a Dataverse-managed Globus endpoint (File-based or using the Globus S3 Connector)
+* Reference of files that will remain in a remote Globus endpoint
+* Transfer from a Dataverse-managed Globus endpoint
+
+The ability for Dataverse to interact with Globus endpoints is configured via a Globus store - see :ref:`globus-storage`.
+
+Globus transfers (or references to files on a remote endpoint) for upload and download involve a series of steps. These can be accomplished using the Dataverse and Globus APIs. (These are used internally by the `dataverse-globus app `_ when transfers are done via the Dataverse UI.)
+
+Requesting Upload or Download Parameters
+----------------------------------------
+
+The first step in preparing for a Globus transfer/reference operation is to request the parameters relevant for a given dataset:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusUploadParameters?locale=$LOCALE"
+
+The response will be of the form:
+
+.. code-block:: bash
+
+ {
+ "status": "OK",
+ "data": {
+ "queryParameters": {
+ "datasetId": 29,
+ "siteUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com",
+ "datasetVersion": ":draft",
+ "dvLocale": "en",
+ "datasetPid": "doi:10.5072/FK2/ILLPXE",
+ "managed": "true",
+ "endpoint": "d8c42580-6528-4605-9ad8-116a61982644"
+ },
+ "signedUrls": [
+ {
+ "name": "requestGlobusTransferPaths",
+ "httpMethod": "POST",
+ "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/requestGlobusUploadPaths?until=2023-11-22T01:52:03.648&user=dataverseAdmin&method=POST&token=63ac4bb748d12078dded1074916508e19e6f6b61f64294d38e0b528010b07d48783cf2e975d7a1cb6d4a3c535f209b981c7c6858bc63afdfc0f8ecc8a139b44a",
+ "timeOut": 300
+ },
+ {
+ "name": "addGlobusFiles",
+ "httpMethod": "POST",
+ "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/addGlobusFiles?until=2023-11-22T01:52:03.648&user=dataverseAdmin&method=POST&token=2aaa03f6b9f851a72e112acf584ffc0758ed0cc8d749c5a6f8c20494bb7bc13197ab123e1933f3dde2711f13b347c05e6cec1809a8f0b5484982570198564025",
+ "timeOut": 300
+ },
+ {
+ "name": "getDatasetMetadata",
+ "httpMethod": "GET",
+ "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/versions/:draft?until=2023-11-22T01:52:03.649&user=dataverseAdmin&method=GET&token=1878d6a829cd5540e89c07bdaf647f1bea5314cc7a55433b0b506350dd330cad61ade3714a8ee199a7b464fb3b8cddaea0f32a89ac3bfc4a86cd2ea3004ecbb8",
+ "timeOut": 300
+ },
+ {
+ "name": "getFileListing",
+ "httpMethod": "GET",
+ "signedUrl": "http://ec2-34-204-169-194.compute-1.amazonaws.com/api/v1/datasets/29/versions/:draft/files?until=2023-11-22T01:52:03.650&user=dataverseAdmin&method=GET&token=78e8ca8321624f42602af659227998374ef3788d0feb43d696a0e19086e0f2b3b66b96981903a1565e836416c504b6248cd3c6f7c2644566979bd16e23a99622",
+ "timeOut": 300
+ }
+ ]
+ }
+ }
+
+The response includes the id for the Globus endpoint to use along with several signed URLs.
+
+The getDatasetMetadata and getFileListing URLs are just signed versions of the standard Dataset metadata and file listing API calls. The other two are Globus specific.
+
+If called for a dataset using a store that is configured with one or more remote Globus endpoints, the response is similar, except that
+the "managed" parameter will be false, the "endpoint" parameter is replaced with a JSON array of "referenceEndpointsWithPaths", and the
+requestGlobusTransferPaths and addGlobusFiles URLs are replaced with ones for requestGlobusReferencePaths and addFiles. All of these calls are
+described further below.
+
+The call to set up for a transfer out (download) is similar:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/globusDownloadParameters?locale=$LOCALE"
+
+Note that this API call supports an additional downloadId query parameter. This is only used when the dataverse-globus app is called from the Dataverse user interface. There is no need to use it when calling the API directly.
+
+The returned response includes the same getDatasetMetadata and getFileListing URLs as in the upload case and includes "monitorGlobusDownload" and "requestGlobusDownload" URLs. The response will also indicate whether the store is "managed" and will provide the "endpoint" from which downloads can be made.
+
+
+Performing an Upload/Transfer In
+--------------------------------
+
+The information from the API call above can be used to provide a user with information about the dataset and to prepare to transfer (managed=true) or to reference files (managed=false).
+
+Once the user identifies which files are to be added, the requestGlobusTransferPaths or requestGlobusReferencePaths URLs can be called. These both reference the same API call but must be used with different entries in the JSON body sent:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+ export LOCALE=en-US
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusUploadPaths?persistentId=$PERSISTENT_IDENTIFIER"
+
+Note that when using the dataverse-globus app or the signed URLs returned by the previous call, this call will already be signed and no API_TOKEN is needed.
+
+In the managed case, the JSON body sent must include the id of the Globus user that will perform the transfer and the number of files that will be transferred:
+
+.. code-block:: bash
+
+ {
+ "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75",
+ "numberOfFiles":2
+ }
+
+In the remote reference case, the JSON body sent must include the Globus endpoint/paths that will be referenced:
+
+.. code-block:: bash
+
+ {
+ "referencedFiles":[
+ "d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt"
+ ]
+ }
+
+The response will include a JSON object. In the managed case, the map is from newly assigned file storage identifiers to specific paths on the managed Globus endpoint:
+
+.. code-block:: bash
+
+ {
+ "status":"OK",
+ "data":{
+ "globusm://18b49d3688c-62137dcb06e4":"/hdc1/10.5072/FK2/ILLPXE/18b49d3688c-62137dcb06e4",
+ "globusm://18b49d3688c-5c17d575e820":"/hdc1/10.5072/FK2/ILLPXE/18b49d3688c-5c17d575e820"
+ }
+ }
+
+In the managed case, the specified Globus principal is granted write permission to the specified endpoint/path,
+which will allow initiation of a transfer from the external endpoint to the managed endpoint using the Globus API.
+The permission will be revoked if the transfer is not started and the next call to Dataverse to finish the transfer is not made within a short time (configurable, default of 5 minutes).
+
+In the remote/reference case, the map is from the initially supplied endpoint/paths to the newly assigned file storage identifiers:
+
+.. code-block:: bash
+
+ {
+ "status":"OK",
+ "data":{
+ "d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt":"globus://18bf8c933f4-ed2661e7d19b//d8c42580-6528-4605-9ad8-116a61982644/hdc1/test1.txt"
+ }
+ }
+
+
+
+Adding Files to the Dataset
+---------------------------
+
+In the managed case, you must initiate a Globus transfer and take note of its task identifier. As in the JSON example below, you will pass it as ``taskIdentifier`` along with details about the files you are transferring:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+ export JSON_DATA='{"taskIdentifier":"3f530302-6c48-11ee-8428-378be0d9c521", \
+ "files": [{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b3972213f-f6b5c2221423", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "1234"}}, \
+ {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"globusm://18b39722140-50eb7d3c5ece", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "MD5", "@value": "2345"}}]}'
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:multipart/form-data" -X POST "$SERVER_URL/api/datasets/:persistentId/addGlobusFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
+
+Note that the mimetype is multipart/form-data, matching the /addFiles API call. Also note that the API_TOKEN is not needed when using a signed URL.
+
+With this information, Dataverse will begin to monitor the transfer and when it completes, will add all files for which the transfer succeeded.
+As the transfer can take significant time and the API call is asynchronous, the only way to determine if the transfer succeeded via API is to use the standard calls to check the dataset lock state and contents.
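+
+For example, here is a sketch of such checks (using the numeric dataset id, e.g. the "datasetId" value returned by the globusUploadParameters call above):
+
+.. code-block:: bash
+
+  # check whether the dataset is still locked
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/$DATASET_ID/locks"
+
+  # list the files in the draft version to confirm that the new files have been added
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/versions/:draft/files?persistentId=$PERSISTENT_IDENTIFIER"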
+
+Once the transfer completes, Dataverse will remove the write permission for the principal.
+
+Note that when using a managed endpoint that uses the Globus S3 Connector, the checksum should be correct as Dataverse can validate it. For file-based endpoints, the checksum should be included if available but Dataverse cannot verify it.
+
+In the remote/reference case, where there is no transfer to monitor, the standard /addFiles API call (see :ref:`direct-add-to-dataset-api`) is used instead. There are no changes for the Globus case.
+
+Downloading/Transfer Out Via Globus
+-----------------------------------
+
+To begin downloading files, the requestGlobusDownload URL is used:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/requestGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
+
+The JSON body sent should include a list of file ids to download and, for a managed endpoint, the Globus principal that will make the transfer:
+
+.. code-block:: bash
+
+ {
+ "principal":"d15d4244-fc10-47f3-a790-85bdb6db9a75",
+ "fileIds":[60, 61]
+ }
+
+Note that this API call takes an optional downloadId parameter that is used with the dataverse-globus app. When downloadId is included, the list of fileIds is not needed.
+
+The response is a JSON object mapping the requested file Ids to Globus endpoint/paths. In the managed case, the principal will have been given read permissions for the specified paths:
+
+.. code-block:: bash
+
+ {
+ "status":"OK",
+ "data":{
+ "60": "d8c42580-6528-4605-9ad8-116a61982644/hdc1/10.5072/FK2/ILLPXE/18bf3af9c78-92b8e168090e",
+ "61": "d8c42580-6528-4605-9ad8-116a61982644/hdc1/10.5072/FK2/ILLPXE/18bf3af9c78-c8d81569305c"
+ }
+ }
+
+For the remote case, the user can perform the transfer without further contact with Dataverse. In the managed case, the user must initiate the transfer via the Globus API and then inform Dataverse.
+Dataverse will then monitor the transfer and revoke the read permission when the transfer is complete. (Not making this last call could result in failure of the transfer.)
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ export SERVER_URL=https://demo.dataverse.org
+ export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
+
+ curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -X POST "$SERVER_URL/api/datasets/:persistentId/monitorGlobusDownload?persistentId=$PERSISTENT_IDENTIFIER"
+
+The JSON body sent just contains the task identifier for the transfer:
+
+.. code-block:: bash
+
+ {
+ "taskIdentifier":"b5fd01aa-8963-11ee-83ae-d5484943e99a"
+ }
+
+
diff --git a/doc/sphinx-guides/source/developers/index.rst b/doc/sphinx-guides/source/developers/index.rst
index 3ac9e955ea2..25fea138736 100755
--- a/doc/sphinx-guides/source/developers/index.rst
+++ b/doc/sphinx-guides/source/developers/index.rst
@@ -19,7 +19,9 @@ Developer Guide
sql-upgrade-scripts
testing
documentation
+ api-design
security
+ performance
dependencies
debugging
coding-style
@@ -38,6 +40,7 @@ Developer Guide
big-data-support
aux-file-support
s3-direct-upload-api
+ globus-api
dataset-semantic-metadata-api
dataset-migration-api
workflows
diff --git a/doc/sphinx-guides/source/developers/intro.rst b/doc/sphinx-guides/source/developers/intro.rst
index 4a64c407fc1..350968012d8 100755
--- a/doc/sphinx-guides/source/developers/intro.rst
+++ b/doc/sphinx-guides/source/developers/intro.rst
@@ -2,7 +2,7 @@
Introduction
============
-Welcome! `The Dataverse Project `_ is an `open source `_ project that loves `contributors `_!
+Welcome! `The Dataverse Project `_ is an `open source `_ project that loves `contributors `_!
.. contents:: |toctitle|
:local:
@@ -19,7 +19,7 @@ To get started, you'll want to set up your :doc:`dev-environment` and make sure
Getting Help
------------
-If you have any questions at all, please reach out to other developers via the channels listed in https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md such as http://chat.dataverse.org, the `dataverse-dev `_ mailing list, `community calls `_, or support@dataverse.org.
+If you have any questions at all, please reach out to other developers via the channels listed in https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md such as https://chat.dataverse.org, the `dataverse-dev `_ mailing list, `community calls `_, or support@dataverse.org.
.. _core-technologies:
@@ -37,10 +37,12 @@ Roadmap
For the Dataverse Software development roadmap, please see https://www.iq.harvard.edu/roadmap-dataverse-project
+.. _kanban-board:
+
Kanban Board
------------
-You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://github.com/orgs/IQSS/projects/2
+You can get a sense of what's currently in flight (in dev, in QA, etc.) by looking at https://github.com/orgs/IQSS/projects/34
Issue Tracker
-------------
@@ -73,7 +75,3 @@ As a developer, you also may be interested in these projects related to Datavers
- Third party apps - make use of Dataverse installation APIs: :doc:`/api/apps`
- chat.dataverse.org - chat interface for Dataverse Project users and developers: https://github.com/IQSS/chat.dataverse.org
- [Your project here] :)
-
-----
-
-Next: :doc:`dev-environment`
diff --git a/doc/sphinx-guides/source/developers/making-releases.rst b/doc/sphinx-guides/source/developers/making-releases.rst
index 23c4773a06e..e7a59910e56 100755
--- a/doc/sphinx-guides/source/developers/making-releases.rst
+++ b/doc/sphinx-guides/source/developers/making-releases.rst
@@ -14,16 +14,19 @@ See :doc:`version-control` for background on our branching strategy.
The steps below describe making both regular releases and hotfix releases.
+.. _write-release-notes:
+
Write Release Notes
-------------------
-Developers express the need for an addition to release notes by creating a file in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``.
+Developers express the need for an addition to release notes by creating a "release note snippet" in ``/doc/release-notes`` containing the name of the issue they're working on. The name of the branch could be used for the filename with ".md" appended (release notes are written in Markdown) such as ``5053-apis-custom-homepage.md``. See :ref:`writing-release-note-snippets` for how this is described for contributors.
-The task at or near release time is to collect these notes into a single doc.
+The task at or near release time is to collect these snippets into a single file.
- Create an issue in GitHub to track the work of creating release notes for the upcoming release.
-- Create a branch, add a .md file for the release (ex. 5.10.1 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the issue-specific release notes mentioned above.
-- Delete the previously-created, issue-specific release notes as the content is added to the main release notes file.
+- Create a branch, add a .md file for the release (ex. 5.10.1 Release Notes) in ``/doc/release-notes`` and write the release notes, making sure to pull content from the release note snippets mentioned above.
+- Delete the release note snippets as the content is added to the main release notes file.
+- Include instructions to describe the steps required to upgrade the application from the previous version. These must be customized for release numbers and special circumstances such as changes to metadata blocks and infrastructure.
- Take the release notes .md through the regular Code Review and QA process.
Create a GitHub Issue and Branch for the Release
@@ -67,6 +70,21 @@ Once important tests have passed (compile, unit tests, etc.), merge the pull req
If this is a hotfix release, skip this whole "merge develop to master" step (the "develop" branch is not involved until later).
+(Optional) Test Docker Images
+-----------------------------
+
+After the "master" branch has been updated and the GitHub Action to build and push Docker images has run (see `PR #9776 `_), go to https://hub.docker.com/u/gdcc and make sure the "alpha" tag for the following images has been updated:
+
+- https://hub.docker.com/r/gdcc/base
+- https://hub.docker.com/r/gdcc/dataverse
+- https://hub.docker.com/r/gdcc/configbaker
+
+To test these images against our API test suite, go to the "alpha" workflow at https://github.com/gdcc/api-test-runner/actions/workflows/alpha.yml and run it.
+
+If there are failures, additional dependencies or settings may have been added to the "develop" workflow. Copy them over and try again.
+
+.. _build-guides:
+
Build the Guides for the Release
--------------------------------
@@ -112,9 +130,11 @@ Go to https://jenkins.dataverse.org/job/IQSS_Dataverse_Internal/ and make the fo
Click "Save" then "Build Now".
-The build number will appear in ``/api/info/version`` (along with the commit mentioned above) from a running installation (e.g. ``{"version":"5.10.1","build":"907-b844672``).
+This will build the war file, and then automatically deploy it on dataverse-internal. Verify that the application has deployed successfully.
-Note that the build number comes from script in an early build step...
+The build number will appear in ``/api/info/version`` (along with the commit mentioned above) from a running installation (e.g. ``{"version":"5.10.1","build":"907-b844672``).
+
+Note that the build number comes from the following script in an early Jenkins build step...
.. code-block:: bash
@@ -129,12 +149,16 @@ Build Installer (dvinstall.zip)
ssh into the dataverse-internal server and do the following:
- In a git checkout of the dataverse source switch to the master branch and pull the latest.
-- Copy the war file from the previous step to the ``target`` directory in the root of the repo (create it, if necessary).
+- Copy the war file from the previous step to the ``target`` directory in the root of the repo (create it, if necessary):
+- ``mkdir target``
+- ``cp /tmp/dataverse-5.10.1.war target``
- ``cd scripts/installer``
- ``make``
A zip file called ``dvinstall.zip`` should be produced.
+Alternatively, you can build the installer on your own dev instance. But make sure you use the war file produced in the step above, not a war file built from master on your own system! That's because we want the released application war file to contain the build number described above. Download the war file directly from Jenkins, or from dataverse-internal.
+
Make Artifacts Available for Download
-------------------------------------
@@ -148,6 +172,11 @@ Upload the following artifacts to the draft release you created:
- metadata block tsv files
- config files
+Deploy on Demo
+--------------
+
+Now that you have the release ready to go, give it one final test by deploying it on https://demo.dataverse.org . Note that this is also an opportunity to re-test the upgrade checklist as described in the release note.
+
Publish the Release
-------------------
@@ -158,7 +187,14 @@ Update Guides Link
"latest" at https://guides.dataverse.org/en/latest/ is a symlink to the directory with the latest release. That directory (e.g. ``5.10.1``) was put into place by the Jenkins "guides" job described above.
-ssh into the guides server and update the symlink to point to the latest release.
+ssh into the guides server and update the symlink to point to the latest release, as in the example below.
+
+.. code-block:: bash
+
+ cd /var/www/html/en
+ ln -s 5.10.1 latest
+
+
Close Milestone on GitHub and Create a New One
----------------------------------------------
@@ -194,7 +230,3 @@ We've merged the hotfix into the "master" branch but now we need the fixes (and
Because of the hotfix version, any SQL scripts in "develop" should be renamed (from "5.11.0" to "5.11.1" for example). To read more about our naming conventions for SQL scripts, see :doc:`sql-upgrade-scripts`.
Please note that version bumps and SQL script renaming both require all open pull requests to be updated with the latest from the "develop" branch so you might want to add any SQL script renaming to the hotfix branch before you put it through QA to be merged with develop. This way, open pull requests only need to be updated once.
-
-----
-
-Previous: :doc:`containers` | Next: :doc:`tools`
diff --git a/doc/sphinx-guides/source/developers/performance.rst b/doc/sphinx-guides/source/developers/performance.rst
new file mode 100644
index 00000000000..46c152f322e
--- /dev/null
+++ b/doc/sphinx-guides/source/developers/performance.rst
@@ -0,0 +1,196 @@
+Performance
+===========
+
+`Performance is a feature `_ was a mantra when Stack Overflow was being developed. We endeavor to do the same with Dataverse!
+
+In this section we collect ideas and share practices for improving performance.
+
+.. contents:: |toctitle|
+ :local:
+
+Problem Statement
+-----------------
+
+Performance has always been important to the Dataverse Project, but results have been uneven. We've seen enough success in the marketplace that performance must be adequate, but internally we sometimes refer to Dataverse as a pig. 🐷
+
+Current Practices
+-----------------
+
+We've adopted a number of practices to help us maintain our current level of performance and most should absolutely continue in some form, but challenges mentioned throughout should be addressed to further improve performance.
+
+Cache When You Can
+~~~~~~~~~~~~~~~~~~
+
+The Metrics API, for example, caches values for 7 days by default. We took a look at JSR 107 (JCache - Java Temporary Caching API) in `#2100 `_. We're aware of the benefits of caching.
+
+Use Async
+~~~~~~~~~
+
+We index datasets (and all objects) asynchronously. That is, we let changes persist in the database and afterward copy the data into Solr.
+
+Use a Queue
+~~~~~~~~~~~
+
+We use a JMS queue when ingesting tabular files. We've talked about adding a queue (even `an external queue `_) for indexing, DOI registration, and other services.
+
+Offload Expensive Operations Outside the App Server
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When operations are computationally expensive, we have realized performance gains by offloading them to systems outside of the core code. For example, rather than having files pass through our application server when they are downloaded, we use direct download so that client machines download files directly from S3. (We use the same trick with upload.) When a client downloads multiple files, rather than zipping them within the application server as before, we now have a separate "zipper" process that does this work out of band.
+
+Drop to Raw SQL as Necessary
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We aren't shy about writing raw SQL queries when necessary. We've written `querycount `_ scripts to help identify problematic queries and mention the slow query log at :doc:`/admin/monitoring`.
+
+Add Indexes to Database Tables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There was a concerted effort in `#1880 `_ to add indexes to a large number of columns, but it's something we're mindful of, generally. Perhaps we could use some better detection of when indexes would be valuable.
+
+Find Bottlenecks with a Profiler
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+VisualVM is popular and bundled with Netbeans. Many options are available including `JProfiler `_.
+
+Warn Developers in Code Comments
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For code that has been optimized for performance, warnings are sometimes inserted in the form of comments for future developers to prevent backsliding.
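+
+A hypothetical example of such a warning:
+
+.. code:: java
+
+    // PERFORMANCE: this method is called for every row on the dataset page.
+    // Keep it to a single database query and benchmark before adding more lookups here.
+    public boolean isRestricted(long fileId) {
+        // ... optimized lookup ...
+        return false;
+    }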
+
+Write Docs for Devs about Perf
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Like this doc. :)
+
+Sometimes perf is written about in other places, such as :ref:`avoid-efficiency-issues-with-render-logic-expressions`.
+
+Horizontal Scaling of App Server
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We've made it possible to run more than one application server, though it requires some special configuration. This way load can be spread out across multiple servers. For details, see :ref:`multiple-app-servers` in the Installation Guide.
+
+Code Review and QA
+~~~~~~~~~~~~~~~~~~
+
+Before code is merged, while it is in review or QA, if a performance problem is detected (usually on an ad hoc basis), the code is returned to the developer for improvement. Developers and reviewers typically do not have many tools at their disposal to test code changes against anything close to production data. QA maintains a machine with a copy of production data but tests against smaller data unless a performance problem is suspected.
+
+A new QA guide is coming in https://github.com/IQSS/dataverse/pull/10103
+
+Locust Testing at Release Time
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As one of the final steps in preparing for a release, QA runs performance tests using a tool called Locust as explained in the Developer Guide (see :ref:`locust`). The tests are not comprehensive, testing only a handful of pages with anonymous users, but they increase confidence that the upcoming release is not drastically slower than previous releases.
+
+Issue Tracking and Prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Performance issues are tracked in our issue tracker under the `Feature: Performance & Stability `_ label (e.g. `#7788 `_). That way, we can track performance problems throughout the application. Unfortunately, the pain is often felt by users in production before we realize there is a problem. As needed, performance issues are prioritized to be included in a sprint, to `speed up the collection page `_, for example.
+
+Document Performance Tools
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the :doc:`/admin/monitoring` section of the Admin Guide we describe how to set up Munin for monitoring the performance of an operating system. We also explain how to set up Performance Insights to monitor AWS RDS (PostgreSQL as a service, in our case). In the :doc:`/developers/tools` section of the Developer Guide, we have documented how to use the Eclipse Memory Analyzer Tool (MAT), SonarQube, jmap, and jstat.
+
+Google Analytics
+~~~~~~~~~~~~~~~~
+
+Emails go to a subset of the team monthly with subjects like "Your September Search performance for https://dataverse.harvard.edu" and a link to a report, but it's mostly about the number of clicks, not how fast the site is. It's unclear if it provides any value with regard to performance.
+
+Abandoned Tools and Practices
+-----------------------------
+
+New Relic
+~~~~~~~~~
+
+For many years Harvard Dataverse was hooked up to New Relic, a tool that promises all-in-one observability, according to their `website `_. In practice, we didn't do much with `the data `_.
+
+Areas of Particular Concern
+---------------------------
+
+Command Engine Execution Rate Metering
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We'd like to rate limit commands (CreateDataset, etc.) so that we can keep them at a reasonable level (`#9356 `_). This is similar to how many APIs are rate limited, such as the GitHub API.
+
+Solr
+~~~~
+
+While in the past Solr performance hasn't been much of a concern, in recent years we've noticed performance problems when Harvard Dataverse is under load. Improvements were made in `PR #10050 `_, for example.
+
+Datasets with Large Numbers of Files or Versions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We'd like to scale Dataverse to better handle large numbers of files or versions. Progress was made in `PR #9883 `_.
+
+Withstanding Bots
+~~~~~~~~~~~~~~~~~
+
+We should be able to withstand heavy traffic from Googlebot and other crawlers.
+
+Suggested Practices
+-------------------
+
+Many of our current practices should remain in place unaltered. Others could use some refinement. Some new practices should be adopted as well. Here are some suggestions.
+
+Implement the Frontend Plan for Performance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `Dataverse - SPA MVP Definition doc `_ has some ideas around how to achieve good performance for the new front end in the areas of rendering, monitoring, file upload/download, pagination, and caching. We should create as many issues as necessary in the frontend repo and work on them in time. The doc recommends the use of `React Profiler `_ and other tools. Not mentioned is https://pagespeed.web.dev, but we can investigate it as well. See also `#183 `_, a parent issue about performance. In `#184 `_ we plan to compare the performance of the old JSF UI vs. the new React UI. Cypress plugins for load testing could be investigated.
+
+Set up Query Counter in Jenkins
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+See the querycount script mentioned above. See also https://jenkins.dataverse.org/job/IQSS-dataverse-develop/ws/target/query_count.out
+
+We should plot the query count over time and make spikes easily apparent. (It stands at 320,035 queries as of this writing.)
+
+Count Database Queries per API Test
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Is it possible? Just a thought.
+
+Teach Developers How to Do Performance Testing Locally
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Do developers know how to use a profiler? Should they use `JMeter `_? `statsd-jvm-profiler `_? How do you run our :ref:`locust` tests? Should we continue using that tool? Give developers time and space to try out tools and document any tips along the way. For this stage, small data is fine.
+
+Automate Performance Testing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We are already using two excellent continuous integration (CI) tools, Jenkins and GitHub Actions, to test our code. We should add performance testing into the mix (`#4201 `_ is an old issue for this but we can open a fresh one). Currently we test every commit on every PR, and we should consider whether this model makes sense for performance testing, since such tests will likely take longer to run than regular tests. Once developers are comfortable with their favorite tools, we can pick which ones to automate.
+
+Make Production Data or Equivalent Available to Developers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If developers are only testing small amounts of data on their laptops, it's hard to detect performance problems. Not every bug fix requires access to data similar to production, but it should be made available. This is not a trivial task! If we are to use actual production data, we need to be very careful to de-identify it. If we start with our `sample-data `_ repo instead, we'll need to figure out how to make sure we cover cases like many files, many versions, etc.
+
+Automate Performance Testing with Production Data or Equivalent
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hopefully the environment developers use with production data or equivalent can be made available to our CI tools. Perhaps these tests don't need to be run on every commit to every pull request, but they should be run regularly.
+
+Use Monitoring as Performance Testing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Monitoring can be seen as a form of testing. How long is a round trip ping to production? What is the Time to First Byte? First Contentful Paint? Largest Contentful Paint? Time to Interactive? We now have a beta server that we could monitor continuously to know if our app is getting faster or slower over time. Should our monitoring of production servers be improved?
+
+Learn from Training and Conferences
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Most likely there is training available that is oriented toward performance. The subject of performance often comes up at conferences as well.
+
+Learn from the Community How They Monitor Performance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some members of the Dataverse community are likely users of newish tools like the ELK stack (Elasticsearch, Logstash, and Kibana), the TICK stack (Telegraf, InfluxDB, Chronograf, and Kapacitor), GoAccess, Prometheus, Graphite, and more that we haven't even heard of. In the :doc:`/admin/monitoring` section of the Admin Guide, we already encourage the community to share findings, but we could dedicate time to this topic at our annual meeting or community calls.
+
+Teach the Community to Do Performance Testing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We have a worldwide community of developers. We should do what we can in the form of documentation and other resources to help them develop performant code.
+
+Conclusion
+----------
+
+Given its long history, Dataverse has encountered many performance problems over the years. The core team is conversant in how to make the app more performant, but investment in learning additional tools and best practices would likely yield dividends. We should automate our performance testing, catching more problems before code is merged.
diff --git a/doc/sphinx-guides/source/developers/remote-users.rst b/doc/sphinx-guides/source/developers/remote-users.rst
index d8f90e9257f..38b3edab772 100755
--- a/doc/sphinx-guides/source/developers/remote-users.rst
+++ b/doc/sphinx-guides/source/developers/remote-users.rst
@@ -39,7 +39,7 @@ STOP! ``oidc-keycloak-auth-provider.json`` was changed from http://localhost:809
If you are working on the OpenID Connect (OIDC) user authentication flow, you do not need to connect to a remote provider (as explained in :doc:`/installation/oidc`) to test this feature. Instead, you can use the available configuration that allows you to run a test Keycloak OIDC identity management service locally through a Docker container.
-(Please note! The client secret (``ss6gE8mODCDfqesQaSG3gwUwZqZt547E``) is hard-coded in ``oidc-realm.json`` and ``oidc-keycloak-auth-provider.json``. Do not use this config in production! This is only for developers.)
+(Please note! The client secret (``94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8``) is hard-coded in ``test-realm.json`` and ``oidc-keycloak-auth-provider.json``. Do not use this config in production! This is only for developers.)
You can find this configuration in ``conf/keycloak``. There are two options available in this directory to run a Keycloak container: bash script or docker-compose.
@@ -55,15 +55,23 @@ Now load the configuration defined in ``oidc-keycloak-auth-provider.json`` into
You should see the new provider, called "OIDC-Keycloak", under "Other options" on the Log In page.
-You should be able to log into Keycloak with the following credentials:
+You should be able to log into Keycloak with one of the following credentials:
-- username: kcuser
-- password: kcpassword
+.. list-table::
+   :header-rows: 1
+
+ * - Username
+ - Password
+ * - admin
+ - admin
+ * - curator
+ - curator
+ * - user
+ - user
+ * - affiliate
+ - affiliate
In case you want to stop and remove the Keycloak container, just run the other available bash script:
``./rm-keycloak.sh``
-----
-
-Previous: :doc:`unf/index` | Next: :doc:`geospatial`
+Note: to log in to the Keycloak admin console, use ``kcadmin:kcpassword``
diff --git a/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
index 4d323455d28..1cb9ae9e6db 100644
--- a/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
+++ b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -3,6 +3,12 @@ Direct DataFile Upload/Replace API
The direct Datafile Upload API is used internally to support direct upload of files to S3 storage and by tools such as the DVUploader.
+.. contents:: |toctitle|
+ :local:
+
+Overview
+--------
+
Direct upload involves a series of three activities, each involving interacting with the server for a Dataverse installation:
* Requesting initiation of a transfer from the server
@@ -69,8 +75,9 @@ In the single part case, only one call to the supplied URL is required:
.. code-block:: bash
- curl -H 'x-amz-tagging:dv-state=temp' -X PUT -T ""
+ curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T ""
+Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.
In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a slice of the total file, with the last part containing the remaining bytes.
The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
@@ -90,7 +97,7 @@ If the client is unable to complete the multipart upload, it should call the abo
.. _direct-add-to-dataset-api:
-Adding the Uploaded file to the Dataset
+Adding the Uploaded File to the Dataset
---------------------------------------
Once the file exists in the s3 bucket, a final API call is needed to add it to the Dataset. This call is the same call used to upload a file to a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
@@ -115,10 +122,10 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
-With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
-To add multiple Uploaded Files to the Dataset
+To Add Multiple Uploaded Files to the Dataset
---------------------------------------------
Once the files exists in the s3 bucket, a final API call is needed to add all the files to the Dataset. In this API call, additional metadata is added using the "jsonData" parameter.
@@ -146,11 +153,10 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method.
-With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
-
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
-Replacing an existing file in the Dataset
+Replacing an Existing File in the Dataset
-----------------------------------------
Once the file exists in the s3 bucket, a final API call is needed to register it as a replacement of an existing file. This call is the same call used to replace a file to a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
@@ -176,10 +182,10 @@ Note that the API call does not validate that the file matches the hash value su
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
-With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
-Replacing multiple existing files in the Dataset
+Replacing Multiple Existing Files in the Dataset
------------------------------------------------
Once the replacement files exist in the s3 bucket, a final API call is needed to register them as replacements for existing files. In this API call, additional metadata is added using the "jsonData" parameter.
@@ -274,5 +280,5 @@ The JSON object returned as a response from this API call includes a "data" that
}
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exists in S3/has been uploaded via some out-of-band method.
-With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifer must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
diff --git a/doc/sphinx-guides/source/developers/selinux.rst b/doc/sphinx-guides/source/developers/selinux.rst
index dcbf3ee594f..ca41ab82d25 100644
--- a/doc/sphinx-guides/source/developers/selinux.rst
+++ b/doc/sphinx-guides/source/developers/selinux.rst
@@ -109,7 +109,3 @@ Once your updated SELinux rules are in place, try logging in with Shibboleth aga
Keep iterating until it works and then create a pull request based on your updated file. Good luck!
Many thanks to Bill Horka from IQSS for his assistance in explaining how to construct a SELinux Type Enforcement (TE) file!
-
-----
-
-Previous: :doc:`geospatial`
diff --git a/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst b/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
index bace682b1b8..32c465524b0 100644
--- a/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
+++ b/doc/sphinx-guides/source/developers/sql-upgrade-scripts.rst
@@ -21,12 +21,14 @@ If you are creating a new database table (which maps to an ``@Entity`` in JPA),
If you are doing anything other than creating a new database table such as adding a column to an existing table, you must create or update a SQL upgrade script.
+.. _create-sql-script:
+
How to Create a SQL Upgrade Script
----------------------------------
We assume you have already read the :doc:`version-control` section and have been keeping your feature branch up to date with the "develop" branch.
-Create a new file called something like ``V4.11.0.1__5565-sanitize-directory-labels.sql`` in the ``src/main/resources/db/migration`` directory. Use a version like "4.11.0.1" in the example above where the previously released version was 4.11, ensuring that the version number is unique. Note that this is not the version that you expect the code changes to be included in (4.12 in this example). When the previously released version is a patch version (e.g. 5.10.1), use "5.10.1.1" for the first SQL script version (rather than "5.10.1.0.1"). For the "description" you should the name of your branch, which should include the GitHub issue you are working on, as in the example above. To read more about Flyway file naming conventions, see https://flywaydb.org/documentation/migrations#naming
+Create a new file called something like ``V4.11.0.1__5565-sanitize-directory-labels.sql`` in the ``src/main/resources/db/migration`` directory. Use a version like "4.11.0.1" in the example above where the previously released version was 4.11, ensuring that the version number is unique. Note that this is not the version that you expect the code changes to be included in (4.12 in this example). When the previously released version is a patch version (e.g. 5.10.1), use "5.10.1.1" for the first SQL script version (rather than "5.10.1.0.1"). For the "description" you should use the name of your branch, which should include the GitHub issue you are working on, as in the example above. To read more about Flyway file naming conventions, see https://documentation.red-gate.com/fd/migrations-184127470.html
The SQL migration script you wrote will be part of the war file and executed when the war file is deployed. To see a history of Flyway database migrations that have been applied, look at the ``flyway_schema_history`` table.
@@ -41,7 +43,3 @@ Renaming SQL Upgrade Scripts
Please note that if you need to rename your script (because a new version of the Dataverse Software was released, for example), you will see the error "FlywayException: Validate failed: Detected applied migration not resolved locally" when you attempt to deploy and deployment will fail.
To resolve this problem, delete the old migration from the ``flyway_schema_history`` table and attempt to redeploy.
-
-----
-
-Previous: :doc:`version-control` | Next: :doc:`testing`
diff --git a/doc/sphinx-guides/source/developers/testing.rst b/doc/sphinx-guides/source/developers/testing.rst
index acaeccf4f23..2ea85913d42 100755
--- a/doc/sphinx-guides/source/developers/testing.rst
+++ b/doc/sphinx-guides/source/developers/testing.rst
@@ -5,7 +5,7 @@ Testing
In order to keep our codebase healthy, the Dataverse Project encourages developers to write automated tests in the form of unit tests and integration tests. We also welcome ideas for how to improve our automated testing.
.. contents:: |toctitle|
- :local:
+ :local:
The Health of a Codebase
------------------------
@@ -46,7 +46,7 @@ The main takeaway should be that we care about unit testing enough to measure th
Writing Unit Tests with JUnit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We are aware that there are newer testing tools such as TestNG, but we use `JUnit `_ because it's tried and true.
+We are aware that there are newer testing tools such as TestNG, but we use `JUnit `_ because it's tried and true.
We support JUnit 5 based testing and require new tests written with it.
(Since Dataverse 6.0, we migrated all of our tests formerly based on JUnit 4.)
@@ -89,22 +89,35 @@ JUnit 5 Test Helper Extensions
Our codebase provides little helpers to ease dealing with state during tests.
Some tests might need to change something which should be restored after the test ran.
-For unit tests, the most interesting part is to set a JVM setting just for the current test.
-Please use the ``@JvmSetting(key = JvmSettings.XXX, value = "")`` annotation on a test method or
-a test class to set and clear the property automatically.
+For unit tests, the most interesting part is to set a JVM setting just for the current test or a whole test class
+(which might be an inner class, too). Please make use of the ``@JvmSetting(key = JvmSettings.XXX, value = "")``
+annotation and also make sure to annotate the test class with ``@LocalJvmSettings``.
-To set arbitrary system properties for the current test, a similar extension
-``@SystemProperty(key = "", value = "")`` has been added.
+Inspired by JUnit's ``@MethodSource`` annotation, you may use ``@JvmSetting(key = JvmSettings.XXX, method = "zzz")``
+to reference a static method located in the same test class by name (i.e., ``private static String zzz() {}``), allowing you to
+retrieve dynamic data instead of String constants only. (Note the requirement for a *static* method!)
+
+If you want to delete a setting, simply provide a ``null`` value. This can be used to override a class-wide setting
+or some other default that is present for some reason.
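+
+Putting these pieces together, a test class might look like the following minimal sketch. The setting key ``JvmSettings.SOME_SETTING`` and the values are hypothetical, and we assume the helper annotations live in the ``edu.harvard.iq.dataverse.util.testing`` package:
+
+.. code:: java
+
+    import edu.harvard.iq.dataverse.settings.JvmSettings;
+    import edu.harvard.iq.dataverse.util.testing.JvmSetting;      // package assumed; adjust to where the helpers live
+    import edu.harvard.iq.dataverse.util.testing.LocalJvmSettings;
+
+    import org.junit.jupiter.api.Test;
+
+    @LocalJvmSettings
+    @JvmSetting(key = JvmSettings.SOME_SETTING, value = "class-wide-value") // hypothetical key; applies to every test
+    class JvmSettingSketchTest {
+
+        @Test
+        void usesClassWideValue() {
+            // the JVM setting is "class-wide-value" while this test runs
+        }
+
+        @Test
+        @JvmSetting(key = JvmSettings.SOME_SETTING, method = "dynamicValue") // overrides the class-wide value
+        void usesDynamicValue() {
+            // the JVM setting is whatever dynamicValue() returned
+        }
+
+        // Must be static so the extension can call it before a test instance exists.
+        private static String dynamicValue() {
+            return "computed-" + System.currentTimeMillis();
+        }
+    }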
+
+To set arbitrary system properties for the current test, a similar extension ``@SystemProperty(key = "", value = "")``
+has been added. (Note: it does not support method references.)
Both extensions will ensure the global state of system properties is non-interfering for
test executions. Tests using these extensions will be executed in serial.
+This settings helper may be extended at a later time to manipulate settings in a remote instance during integration
+or end-to-end testing. Stay tuned!
+
Observing Changes to Code Coverage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once you've written some tests, you're probably wondering how much you've helped to increase the code coverage. In Netbeans, do a "clean and build." Then, under the "Projects" tab, right-click "dataverse" and click "Code Coverage" -> "Show Report". For each Java file you have open, you should be able to see the percentage of code that is covered by tests and every line in the file should be either green or red. Green indicates that the line is being exercised by a unit test and red indicates that it is not.
-In addition to seeing code coverage in Netbeans, you can also see code coverage reports by opening ``target/site/jacoco/index.html`` in your browser.
+In addition to seeing code coverage in Netbeans, you can also see code coverage reports by opening ``target/site/jacoco-X-test-coverage-report/index.html`` in your browser.
+Depending on the report type you want to look at, let ``X`` be one of ``unit``, ``integration`` or ``merged``.
+"Merged" will display combined coverage of both unit and integration test, but does currently not cover API tests.
+
Testing Commands
^^^^^^^^^^^^^^^^
@@ -177,42 +190,38 @@ Finally, run the script:
$ ./ec2-create-instance.sh -g jenkins.yml -l log_dir
-Running the full API test suite using Docker
+Running the Full API Test Suite Using Docker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-To run the full suite of integration tests on your laptop, running Dataverse and its dependencies in Docker, as explained in the :doc:`/container/dev-usage` section of the Container Guide.
-
-Alternatively, you can run tests against the app server running on your laptop by following the "getting set up" steps below.
+To run the full suite of integration tests on your laptop, we recommend running Dataverse and its dependencies in Docker, as explained in the :doc:`/container/dev-usage` section of the Container Guide. This environment provides additional services (such as S3) that are used in testing.
-Getting Set Up to Run REST Assured Tests
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Running the APIs Without Docker (Classic Dev Env)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Unit tests are run automatically on every build, but dev environments and servers require special setup to run REST Assured tests. In short, the Dataverse Software needs to be placed into an insecure mode that allows arbitrary users and datasets to be created and destroyed. This differs greatly from the out-of-the-box behavior of the Dataverse Software, which we strive to keep secure for sysadmins installing the software for their institutions in a production environment.
+While it is possible to run a good number of API tests without using Docker in our :doc:`classic-dev-env`, we are transitioning toward including additional services (such as S3) in our Dockerized development environment (:doc:`/container/dev-usage`), so you will probably find it more convenient to use it instead.
-The :doc:`dev-environment` section currently refers developers here for advice on getting set up to run REST Assured tests, but we'd like to add some sort of "dev" flag to the installer to put the Dataverse Software in "insecure" mode, with lots of scary warnings that this dev mode should not be used in production.
-
-The instructions below assume a relatively static dev environment on a Mac. There is a newer "all in one" Docker-based approach documented in the :doc:`/developers/containers` section under "Docker" that you may like to play with as well.
+Unit tests are run automatically on every build, but dev environments and servers require special setup to run API (REST Assured) tests. In short, the Dataverse software needs to be placed into an insecure mode that allows arbitrary users and datasets to be created and destroyed (this is done automatically in the Dockerized environment; in the classic environment, it is done via the steps described below). This differs greatly from the out-of-the-box behavior of the Dataverse software, which we strive to keep secure for sysadmins installing the software for their institutions in a production environment.
The Burrito Key
^^^^^^^^^^^^^^^
-For reasons that have been lost to the mists of time, the Dataverse Software really wants you to to have a burrito. Specifically, if you're trying to run REST Assured tests and see the error "Dataverse config issue: No API key defined for built in user management", you must run the following curl command (or make an equivalent change to your database):
+For reasons that have been lost to the mists of time, the Dataverse software really wants you to have a burrito. Specifically, if you're trying to run REST Assured tests and see the error "Dataverse config issue: No API key defined for built in user management", you must run the following curl command (or make an equivalent change to your database):
``curl -X PUT -d 'burrito' http://localhost:8080/api/admin/settings/BuiltinUsers.KEY``
-Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as Dataverse collections, datasets, and files.
+Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as collections, datasets, and files.
-Root Dataverse Collection Permissions
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Root Collection Permissions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In your browser, log in as dataverseAdmin (password: admin) and click the "Edit" button for your root Dataverse collection. Navigate to Permissions, then the Edit Access button. Under "Who can add to this Dataverse collection?" choose "Anyone with a Dataverse installation account can add sub Dataverse collections and datasets" if it isn't set to this already.
+In your browser, log in as dataverseAdmin (password: admin) and click the "Edit" button for your root collection. Navigate to Permissions, then the Edit Access button. Under "Who can add to this collection?" choose "Anyone with a Dataverse installation account can add sub collections and datasets" if it isn't set to this already.
Alternatively, this same step can be done with this script: ``scripts/search/tests/grant-authusers-add-on-root``
-Publish Root Dataverse Collection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Publish Root Collection
+^^^^^^^^^^^^^^^^^^^^^^^
-The root Dataverse collection must be published for some of the REST Assured tests to run.
+The root collection must be published for some of the REST Assured tests to run.
dataverse.siteUrl
^^^^^^^^^^^^^^^^^
@@ -225,6 +234,20 @@ If ``dataverse.siteUrl`` is absent, you can add it with:
``./asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8080"``
+dataverse.oai.server.maxidentifiers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The OAI Harvesting tests require that the paging limit for ListIdentifiers be set to 2, so that this paging behavior can be triggered without having to create and export too many datasets:
+
+``./asadmin create-jvm-options "-Ddataverse.oai.server.maxidentifiers=2"``
+
+dataverse.oai.server.maxrecords
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The OAI Harvesting tests require that the paging limit for ListRecords be set to 2, so that this paging behavior can be triggered without having to create and export too many datasets:
+
+``./asadmin create-jvm-options "-Ddataverse.oai.server.maxrecords=2"``
+
Identifier Generation
^^^^^^^^^^^^^^^^^^^^^
@@ -245,17 +268,22 @@ Remember, it’s only a test (and it's not graded)! Some guidelines to bear in m
- Map out which logical functions you want to test
- Understand what’s being tested and ensure it’s repeatable
- Assert the conditions of success / return values for each operation
- * A useful resource would be `HTTP status codes `_
+ * A useful resource would be `HTTP status codes `_
- Let the code do the labor; automate everything that happens when you run your test file.
+- If you need to test an optional service (S3, etc.), add it to our docker compose file. See :doc:`/container/dev-usage`.
- Just as with any development, if you’re stuck: ask for help!
-To execute existing integration tests on your local Dataverse installation, a helpful command line tool to use is `Maven `_. You should have Maven installed as per the `Development Environment `_ guide, but if not it’s easily done via Homebrew: ``brew install maven``.
+To execute existing integration tests on your local Dataverse installation from the command line, use Maven. You should have Maven installed as per :doc:`dev-environment`, but if not it's easily done via Homebrew: ``brew install maven``.
Once installed, you may run commands with ``mvn [options] [] []``.
-+ If you want to run just one particular API test, it’s as easy as you think:
++ If you want to run just one particular API test class:
+
+ ``mvn test -Dtest=UsersIT``
- ``mvn test -Dtest=FileRecordJobIT``
++ If you want to run just one particular API test method:
+
+ ``mvn test -Dtest=UsersIT#testMergeAccounts``
+ To run more than one test at a time, separate by commas:
@@ -284,33 +312,37 @@ To run a test with Testcontainers, you will need to write a JUnit 5 test.
Please make sure to:
1. End your test class with ``IT``
-2. Provide a ``@Tag("testcontainers")`` to be picked up during testing.
+2. Annotate the test class with two tags:
-.. code:: java
+ .. code:: java
- /** A very minimal example for a Testcontainers integration test class. */
- @Testcontainers
- @Tag("testcontainers")
- class MyExampleIT { /* ... */ }
+ /** A very minimal example for a Testcontainers integration test class. */
+ @Testcontainers(disabledWithoutDocker = true)
+ @Tag(edu.harvard.iq.dataverse.util.testing.Tags.INTEGRATION_TEST)
+ @Tag(edu.harvard.iq.dataverse.util.testing.Tags.USES_TESTCONTAINERS)
+ class MyExampleIT { /* ... */ }
-If using upstream Modules, e.g. for PostgreSQL or similar, you will need to add
+If using upstream modules, e.g. for PostgreSQL or similar, you will need to add
a dependency to ``pom.xml`` if not present. `See the PostgreSQL module example. `_
To run these tests, simply call out to Maven:
.. code::
- mvn -P tc verify
+ mvn verify
+
+Notes:
-.. note::
+1. Remember to have Docker ready to serve or tests will fail.
+2. You can skip running unit tests by adding ``-DskipUnitTests``
+3. You can choose to skip tests that use Testcontainers by adding ``-Dit.groups='integration & !testcontainers'``.
+ Learn more about `filter expressions in the JUnit 5 guide `_.
- 1. Remember to have Docker ready to serve or tests will fail.
- 2. This will not run any unit tests or API tests.
-Measuring Coverage of Integration Tests
----------------------------------------
+Measuring Coverage of API Tests
+-------------------------------
-Measuring the code coverage of integration tests with Jacoco requires several steps. In order to make these steps clear we'll use "/usr/local/payara6" as the Payara directory and "dataverse" as the Payara Unix user.
+Measuring the code coverage of API tests with Jacoco requires several steps. In order to make these steps clear we'll use "/usr/local/payara6" as the Payara directory and "dataverse" as the Payara Unix user.
Please note that this was tested under Glassfish 4 but it is hoped that the same steps will work with Payara.
@@ -360,8 +392,8 @@ Run this as the "dataverse" user.
Note that after deployment the file "/usr/local/payara6/glassfish/domains/domain1/config/jacoco.exec" exists and is empty.
-Run Integration Tests
-~~~~~~~~~~~~~~~~~~~~~
+Run API Tests
+~~~~~~~~~~~~~
Note that even though you see "docker-aio" in the command below, we assume you are not necessarily running the test suite within Docker. (Some day we'll probably move this script to another directory.) For this reason, we pass the URL with the normal port (8080) that app servers run on to the ``run-test-suite.sh`` script.
@@ -395,6 +427,10 @@ target/coverage-it/index.html is the place to start reading the code coverage re
Load/Performance Testing
------------------------
+See also :doc:`/qa/performance-tests` in the QA Guide.
+
+.. _locust:
+
Locust
~~~~~~
@@ -494,7 +530,7 @@ Future Work on Integration Tests
- Automate testing of dataverse-client-python: https://github.com/IQSS/dataverse-client-python/issues/10
- Work with @leeper on testing the R client: https://github.com/IQSS/dataverse-client-r
- Review and attempt to implement "API Test Checklist" from @kcondon at https://docs.google.com/document/d/199Oq1YwQ4pYCguaeW48bIN28QAitSk63NbPYxJHCCAE/edit?usp=sharing
-- Generate code coverage reports for **integration** tests: https://github.com/pkainulainen/maven-examples/issues/3 and http://www.petrikainulainen.net/programming/maven/creating-code-coverage-reports-for-unit-and-integration-tests-with-the-jacoco-maven-plugin/
+- Generate code coverage reports for **integration** tests: https://github.com/pkainulainen/maven-examples/issues/3 and https://www.petrikainulainen.net/programming/maven/creating-code-coverage-reports-for-unit-and-integration-tests-with-the-jacoco-maven-plugin/
- Consistent logging of API Tests. Show test name at the beginning and end and status codes returned.
- expected passing and known/expected failing integration tests: https://github.com/IQSS/dataverse/issues/4438
@@ -519,7 +555,3 @@ Future Work on Accessibility Testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Using https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible and hooks available from accessibility testing tools, automate the running of accessibility tools on PRs so that developers will receive quicker feedback on proposed code changes that reduce the accessibility of the application.
-
-----
-
-Previous: :doc:`sql-upgrade-scripts` | Next: :doc:`documentation`
diff --git a/doc/sphinx-guides/source/developers/tips.rst b/doc/sphinx-guides/source/developers/tips.rst
index e1ee40cafa5..839ae3aa19d 100755
--- a/doc/sphinx-guides/source/developers/tips.rst
+++ b/doc/sphinx-guides/source/developers/tips.rst
@@ -2,7 +2,7 @@
Tips
====
-If you just followed the steps in :doc:`dev-environment` for the first time, you will need to get set up to deploy code to your app server. Below you'll find other tips as well.
+If you just followed the steps in :doc:`classic-dev-env` for the first time, you will need to get set up to deploy code to your app server. Below you'll find other tips as well.
.. contents:: |toctitle|
:local:
@@ -10,7 +10,7 @@ If you just followed the steps in :doc:`dev-environment` for the first time, you
Iterating on Code and Redeploying
---------------------------------
-When you followed the steps in the :doc:`dev-environment` section, the war file was deployed to Payara by the Dataverse Software installation script. That's fine but once you're ready to make a change to the code you will need to get comfortable with undeploying and redeploying code (a war file) to Payara.
+When you followed the steps in the :doc:`classic-dev-env` section, the war file was deployed to Payara by the Dataverse Software installation script. That's fine but once you're ready to make a change to the code you will need to get comfortable with undeploying and redeploying code (a war file) to Payara.
It's certainly possible to manage deployment and undeployment of the war file via the command line using the ``asadmin`` command that ships with Payara (that's what the Dataverse Software installation script uses and the steps are documented below), but we recommend getting set up with an IDE such as Netbeans to manage deployment for you.
@@ -99,7 +99,7 @@ With over 100 tables, the Dataverse Software PostgreSQL database ("dvndb") can b
pgAdmin
~~~~~~~~
-Back in the :doc:`dev-environment` section, we had you install pgAdmin, which can help you explore the tables and execute SQL commands. It's also listed in the :doc:`tools` section.
+Back in the :doc:`classic-dev-env` section, we had you install pgAdmin, which can help you explore the tables and execute SQL commands. It's also listed in the :doc:`tools` section.
SchemaSpy
~~~~~~~~~
@@ -238,6 +238,4 @@ with the following code in ``SettingsWrapper.java``:
A more serious example would be direct calls to PermissionServiceBean methods used in render logic expressions. This is something that has happened and caused some problems in real life. A simple permission service lookup (for example, whether a user is authorized to create a dataset in the current dataverse) can easily take 15 database queries. Repeated multiple times, this can quickly become a measurable delay in rendering the page. PermissionsWrapper must be used exclusively for any such lookups from JSF pages.
-----
-
-Previous: :doc:`dev-environment` | Next: :doc:`troubleshooting`
+See also :doc:`performance`.
diff --git a/doc/sphinx-guides/source/developers/tools.rst b/doc/sphinx-guides/source/developers/tools.rst
index a21becd14cf..9b3e38232e8 100755
--- a/doc/sphinx-guides/source/developers/tools.rst
+++ b/doc/sphinx-guides/source/developers/tools.rst
@@ -2,11 +2,16 @@
Tools
=====
-These are handy tools for your :doc:`/developers/dev-environment/`.
+These are handy tools for your :doc:`dev-environment`.
.. contents:: |toctitle|
:local:
+Tools for Faster Deployment
++++++++++++++++++++++++++++
+
+See :ref:`ide-trigger-code-deploy` in the Container Guide.
+
Netbeans Connector Chrome Extension
+++++++++++++++++++++++++++++++++++
@@ -18,7 +23,7 @@ Unfortunately, while the Netbeans Connector Chrome Extension used to "just work"
pgAdmin
+++++++
-You probably installed pgAdmin when following the steps in the :doc:`dev-environment` section but if not, you can download it from https://www.pgadmin.org
+You may have installed pgAdmin when following the steps in the :doc:`classic-dev-env` section but if not, you can download it from https://www.pgadmin.org
Maven
+++++
@@ -28,20 +33,20 @@ With Maven installed you can run ``mvn package`` and ``mvn test`` from the comma
PlantUML
++++++++
-PlantUML is used to create diagrams in the guides and other places. Download it from http://plantuml.com and check out an example script at https://github.com/IQSS/dataverse/blob/v4.6.1/doc/Architecture/components.sh . Note that for this script to work, you'll need the ``dot`` program, which can be installed on Mac with ``brew install graphviz``.
+PlantUML is used to create diagrams in the guides and other places. Download it from https://plantuml.com and check out an example script at https://github.com/IQSS/dataverse/blob/v4.6.1/doc/Architecture/components.sh . Note that for this script to work, you'll need the ``dot`` program, which can be installed on Mac with ``brew install graphviz``.
Eclipse Memory Analyzer Tool (MAT)
++++++++++++++++++++++++++++++++++
The Memory Analyzer Tool (MAT) from Eclipse can help you analyze heap dumps, showing you "leak suspects" such as seen at https://github.com/payara/Payara/issues/350#issuecomment-115262625
-It can be downloaded from http://www.eclipse.org/mat
+It can be downloaded from https://www.eclipse.org/mat
If the heap dump provided to you was created with ``gcore`` (such as with ``gcore -o /tmp/app.core $app_pid``) rather than ``jmap``, you will need to convert the file before you can open it in MAT. Using ``app.core.13849`` as example of the original 33 GB file, here is how you could convert it into a 26 GB ``app.core.13849.hprof`` file. Please note that this operation took almost 90 minutes:
``/usr/java7/bin/jmap -dump:format=b,file=app.core.13849.hprof /usr/java7/bin/java app.core.13849``
-A file of this size may not "just work" in MAT. When you attempt to open it you may see something like "An internal error occurred during: "Parsing heap dump from '/tmp/heapdumps/app.core.13849.hprof'". Java heap space". If so, you will need to increase the memory allocated to MAT. On Mac OS X, this can be done by editing ``MemoryAnalyzer.app/Contents/MacOS/MemoryAnalyzer.ini`` and increasing the value "-Xmx1024m" until it's high enough to open the file. See also http://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ#Out_of_Memory_Error_while_Running_the_Memory_Analyzer
+A file of this size may not "just work" in MAT. When you attempt to open it you may see something like "An internal error occurred during: "Parsing heap dump from '/tmp/heapdumps/app.core.13849.hprof'". Java heap space". If so, you will need to increase the memory allocated to MAT. On Mac OS X, this can be done by editing ``MemoryAnalyzer.app/Contents/MacOS/MemoryAnalyzer.ini`` and increasing the value "-Xmx1024m" until it's high enough to open the file. See also https://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ#Out_of_Memory_Error_while_Running_the_Memory_Analyzer
PageKite
++++++++
@@ -58,7 +63,7 @@ The first time you run ``./pagekite.py`` a file at ``~/.pagekite.rc`` will be
created. You can edit this file to configure PageKite to serve up port 8080
(the default app server HTTP port) or the port of your choosing.
-According to https://pagekite.net/support/free-for-foss/ PageKite (very generously!) offers free accounts to developers writing software the meets http://opensource.org/docs/definition.php such as the Dataverse Project.
+According to https://pagekite.net/support/free-for-foss/ PageKite (very generously!) offers free accounts to developers writing software that meets https://opensource.org/docs/definition.php such as the Dataverse Project.
MSV
+++
@@ -96,7 +101,7 @@ Download SonarQube from https://www.sonarqube.org and start look in the `bin` di
-Dsonar.test.exclusions='src/test/**,src/main/webapp/resources/**' \
-Dsonar.issuesReport.html.enable=true \
-Dsonar.issuesReport.html.location='sonar-issues-report.html' \
- -Dsonar.jacoco.reportPath=target/jacoco.exec
+ -Dsonar.jacoco.reportPath=target/coverage-reports/jacoco-unit.exec
Once the analysis is complete, you should be able to access http://localhost:9000/dashboard?id=edu.harvard.iq%3Adataverse to see the report. To learn about resource leaks, for example, click on "Bugs", the "Tag", then "leak" or "Rule", then "Resources should be closed".
@@ -261,10 +266,3 @@ We can see that the first ``FGC`` resulted in reducing the ``"O"`` by almost 7GB
etc. ...
It is clearly growing - so now we can conclude that indeed something there is using memory in a way that's not recoverable, and this is a clear problem.
-
-
-
-
-----
-
-Previous: :doc:`making-releases` | Next: :doc:`unf/index`
diff --git a/doc/sphinx-guides/source/developers/troubleshooting.rst b/doc/sphinx-guides/source/developers/troubleshooting.rst
index 832785f9860..2c437ca8b2e 100755
--- a/doc/sphinx-guides/source/developers/troubleshooting.rst
+++ b/doc/sphinx-guides/source/developers/troubleshooting.rst
@@ -2,7 +2,7 @@
Troubleshooting
===============
-Over in the :doc:`dev-environment` section we described the "happy path" of when everything goes right as you set up your Dataverse Software development environment. Here are some common problems and solutions for when things go wrong.
+Over in the :doc:`classic-dev-env` section we described the "happy path" of when everything goes right as you set up your Dataverse Software development environment. Here are some common problems and solutions for when things go wrong.
.. contents:: |toctitle|
:local:
@@ -110,7 +110,3 @@ If you are seeing ``Response code: 400, [url] domain of URL is not allowed`` it'
``./asadmin delete-jvm-options '-Ddataverse.siteUrl=http\://localhost\:8080'``
``./asadmin create-jvm-options '-Ddataverse.siteUrl=http\://demo.dataverse.org'``
-
-----
-
-Previous: :doc:`tips` | Next: :doc:`version-control`
diff --git a/doc/sphinx-guides/source/developers/unf/index.rst b/doc/sphinx-guides/source/developers/unf/index.rst
index 2423877348f..596bb0cf3bf 100644
--- a/doc/sphinx-guides/source/developers/unf/index.rst
+++ b/doc/sphinx-guides/source/developers/unf/index.rst
@@ -27,7 +27,7 @@ with Dataverse Software 2.0 and throughout the 3.* lifecycle, UNF v.5
UNF v.6. Two parallel implementation, in R and Java, will be
available, for cross-validation.
-Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine, 13. Publisher’s Version Copy at http://j.mp/2ovSzoT
+Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine, 13. Publisher’s Version Copy at https://j.mp/2ovSzoT
**Contents:**
@@ -37,7 +37,3 @@ Learn more: Micah Altman and Gary King. 2007. “A Proposed Standard for the Sch
unf-v3
unf-v5
unf-v6
-
-----
-
-Previous: :doc:`/developers/tools` | Next: :doc:`/developers/remote-users`
diff --git a/doc/sphinx-guides/source/developers/unf/unf-v3.rst b/doc/sphinx-guides/source/developers/unf/unf-v3.rst
index 3f0018d7fa5..98c07b398e0 100644
--- a/doc/sphinx-guides/source/developers/unf/unf-v3.rst
+++ b/doc/sphinx-guides/source/developers/unf/unf-v3.rst
@@ -34,11 +34,11 @@ For example, the number pi at five digits is represented as -3.1415e+, and the n
1. Terminate character strings representing nonmissing values with a POSIX end-of-line character.
-2. Encode each character string with `Unicode bit encoding `_. Versions 3 through 4 use UTF-32BE; Version 4.1 uses UTF-8.
+2. Encode each character string with `Unicode bit encoding `_. Versions 3 through 4 use UTF-32BE; Version 4.1 uses UTF-8.
3. Combine the vector of character strings into a single sequence, with each character string separated by a POSIX end-of-line character and a null byte.
-4. Compute a hash on the resulting sequence using the standard MD5 hashing algorithm for Version 3 and using `SHA256 `_ for Version 4. The resulting hash is `base64 `_ encoded to support readability.
+4. Compute a hash on the resulting sequence using the standard MD5 hashing algorithm for Version 3 and using `SHA256 `_ for Version 4. The resulting hash is `base64 `_ encoded to support readability.
5. Calculate the UNF for each lower-level data object, using a consistent UNF version and level of precision across the individual UNFs being combined.
@@ -49,4 +49,4 @@ For example, the number pi at five digits is represented as -3.1415e+, and the n
8. Combine UNFs from multiple variables to form a single UNF for an entire data frame, and then combine UNFs for a set of data frames to form a single UNF that represents an entire research study.
Learn more:
-Software for computing UNFs is available in an R Module, which includes a Windows standalone tool and code for Stata and SAS languages. Also see the following for more details: Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March). (Abstract: `HTML `_ | Article: `PDF `_)
+Software for computing UNFs is available in an R Module, which includes a Windows standalone tool and code for Stata and SAS languages. Also see the following for more details: Micah Altman and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib Magazine, Vol. 13, No. 3/4 (March). (Abstract: `HTML `_ | Article: `PDF `_)
diff --git a/doc/sphinx-guides/source/developers/unf/unf-v6.rst b/doc/sphinx-guides/source/developers/unf/unf-v6.rst
index 9648bae47c8..b2495ff3dd9 100644
--- a/doc/sphinx-guides/source/developers/unf/unf-v6.rst
+++ b/doc/sphinx-guides/source/developers/unf/unf-v6.rst
@@ -156,7 +156,7 @@ For example, to specify a non-default precision the parameter it is specified us
| Allowed values are {``128`` , ``192`` , ``196`` , ``256``} with ``128`` being the default.
| ``R1`` - **truncate** numeric values to ``N`` digits, **instead of rounding**, as previously described.
-`Dr. Micah Altman's classic UNF v5 paper `_ mentions another optional parameter ``T###``, for specifying rounding of date and time values (implemented as stripping the values of entire components - fractional seconds, seconds, minutes, hours... etc., progressively) - but it doesn't specify its syntax. It is left as an exercise for a curious reader to contact the author and work out the details, if so desired. (Not implemented in UNF Version 6 by the Dataverse Project).
+`Dr. Micah Altman's classic UNF v5 paper `_ mentions another optional parameter ``T###``, for specifying rounding of date and time values (implemented as stripping the values of entire components - fractional seconds, seconds, minutes, hours... etc., progressively) - but it doesn't specify its syntax. It is left as an exercise for a curious reader to contact the author and work out the details, if so desired. (Not implemented in UNF Version 6 by the Dataverse Project).
Note: we do not recommend truncating character strings at fewer bytes than the default ``128`` (the ``X`` parameter). At the very least this number **must** be high enough so that the printable UNFs of individual variables or files are not truncated, when calculating combined UNFs of files or datasets, respectively.
diff --git a/doc/sphinx-guides/source/developers/version-control.rst b/doc/sphinx-guides/source/developers/version-control.rst
index aacc245af5a..c5669d02e77 100644
--- a/doc/sphinx-guides/source/developers/version-control.rst
+++ b/doc/sphinx-guides/source/developers/version-control.rst
@@ -24,7 +24,7 @@ The goals of the Dataverse Software branching strategy are:
- allow for concurrent development
- only ship stable code
-We follow a simplified "git flow" model described at http://nvie.com/posts/a-successful-git-branching-model/ involving a "master" branch, a "develop" branch, and feature branches such as "1234-bug-fix".
+We follow a simplified "git flow" model described at https://nvie.com/posts/a-successful-git-branching-model/ involving a "master" branch, a "develop" branch, and feature branches such as "1234-bug-fix".
Branches
~~~~~~~~
@@ -34,6 +34,8 @@ The "master" Branch
The "`master `_" branch represents released versions of the Dataverse Software. As mentioned in the :doc:`making-releases` section, at release time we update the master branch to include all the code for that release. Commits are never made directly to master. Rather, master is updated only when we merge code into it from the "develop" branch.
+.. _develop-branch:
+
The "develop" Branch
********************
@@ -65,21 +67,65 @@ The example of creating a pull request below has to do with fixing an important
Find or Create a GitHub Issue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-For guidance on which issue to work on, please ask! Also, see https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md
+An issue represents a bug (unexpected behavior) or a new feature in Dataverse. We'll use the issue number in the branch we create for our pull request.
+
+Finding GitHub Issues to Work On
+********************************
+
+Assuming this is your first contribution to Dataverse, you should start with something small. The following issue labels might be helpful in your search:
+
+- `good first issue `_ (these appear at https://github.com/IQSS/dataverse/contribute )
+- `hacktoberfest `_
+- `Help Wanted: Code `_
+- `Help Wanted: Documentation `_
+
+For guidance on which issue to work on, please ask! :ref:`getting-help-developers` explains how to get in touch.
+
+Creating GitHub Issues to Work On
+*********************************
+
+You are very welcome to create a GitHub issue to work on. However, for significant changes, please reach out (see :ref:`getting-help-developers`) to make sure the team and community agree with the proposed change.
+
+For small changes and especially typo fixes, please don't worry about reaching out first.
+
+Communicate Which Issue You Are Working On
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Let's say you want to tackle https://github.com/IQSS/dataverse/issues/3728 which points out a typo in a page of the Dataverse Software's documentation.
+In the issue you can simply leave a comment to say you're working on it.
If you tell us your GitHub username we are happy to add you to the "read only" team at https://github.com/orgs/IQSS/teams/dataverse-readonly/members so that we can assign the issue to you while you're working on it. You can also tell us if you'd like to be added to the `Dataverse Community Contributors spreadsheet `_.
-Create a New Branch off the develop Branch
+Create a New Branch Off the develop Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells.
+Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the number of the issue you are addressing (e.g. `#3728 `_) and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells.
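+
+For example, a sketch of the git commands for the documentation fix above (assuming your local "develop" branch tracks the main repository):
+
+.. code-block:: shell
+
+  # make sure you are branching off the latest develop
+  git checkout develop
+  git pull
+  # create the feature branch named after the issue
+  git checkout -b 3728-doc-apipolicy-fix
+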
Commit Your Change to Your New Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Making a commit (or several commits) to that branch. Ideally the first line of your commit message includes the number of the issue you are addressing, such as ``Fixed BlockedApiPolicy #3728``.
+For each commit to that branch, try to include the issue number along with a summary in the first line of the commit message, such as ``Fixed BlockedApiPolicy #3728``. You are welcome to write longer descriptions in the body as well!
+
+.. _writing-release-note-snippets:
+
+Writing a Release Note Snippet
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We highly value your insight as a contributor when it comes to describing your work in our release notes. Not every pull request is mentioned in the release notes, but most are.
+
+As described at :ref:`write-release-notes`, at release time we compile together release note "snippets" into the final release notes.
+
+Here's how to add a release note snippet to your pull request:
+
+- Create a Markdown file under ``doc/release-notes``. You can reuse the name of your branch and append ".md" to it, e.g. ``3728-doc-apipolicy-fix.md``
+- Edit the snippet to include anything you think should be mentioned in the release notes, such as:
+
+ - Descriptions of new features
+ - Explanations of bugs fixed
+ - New configuration settings
+ - Upgrade instructions
+ - Etc.
+
+Release note snippets do not need to be long. For a new feature, a single line description might be enough. Please note that your release note will likely be edited (expanded or shortened) when the final release notes are being created.
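+
+For example, a snippet for the documentation fix used as an example above might be a file ``doc/release-notes/3728-doc-apipolicy-fix.md`` containing a single line such as:
+
+.. code-block:: none
+
+  Fixed a typo in the BlockedApiPolicy documentation.
+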
Push Your Branch to GitHub
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,14 +135,16 @@ Push your feature branch to your fork of the Dataverse Software. Your git comman
Make a Pull Request
~~~~~~~~~~~~~~~~~~~
-Make a pull request to get approval to merge your changes into the develop branch. Note that once a pull request is created, we'll remove the corresponding issue from our kanban board so that we're only tracking one card.
+Make a pull request to get approval to merge your changes into the develop branch.
+If the pull request indicates that release notes are necessary, a workflow can verify that a corresponding snippet file exists and respond with a "thank you!" message. If no release note snippet is detected, the contributor is gently reminded to add one. Please see :doc:`making-releases` for guidance on writing release notes.
+Note that once a pull request is created, we'll remove the corresponding issue from our kanban board so that we're only tracking one card.
Feedback on the pull request template we use is welcome! Here's an example of a pull request for issue #3827: https://github.com/IQSS/dataverse/pull/3827
Make Sure Your Pull Request Has Been Advanced to Code Review
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column at https://github.com/orgs/IQSS/projects/2.
+Now that you've made your pull request, your goal is to make sure it appears in the "Code Review" column at https://github.com/orgs/IQSS/projects/34.
Look at https://github.com/IQSS/dataverse/blob/master/CONTRIBUTING.md for various ways to reach out to developers who have enough access to the GitHub repo to move your issue and pull request to the "Code Review" column.
@@ -238,7 +286,3 @@ GitHub documents how to make changes to a fork at https://help.github.com/articl
vim path/to/file.txt
git commit
git push OdumInstitute 4709-postgresql_96
-
-----
-
-Previous: :doc:`troubleshooting` | Next: :doc:`sql-upgrade-scripts`
diff --git a/doc/sphinx-guides/source/index.rst b/doc/sphinx-guides/source/index.rst
index f6eda53d718..3184160b387 100755
--- a/doc/sphinx-guides/source/index.rst
+++ b/doc/sphinx-guides/source/index.rst
@@ -20,6 +20,7 @@ These documentation guides are for the |version| version of Dataverse. To find g
developers/index
container/index
style/index
+ qa/index.md
How the Guides Are Organized
----------------------------
@@ -45,7 +46,7 @@ Other Resources
Additional information about the Dataverse Project itself
including presentations, information about upcoming releases, data
management and citation, and announcements can be found at
-`http://dataverse.org/ `__
+`https://dataverse.org/ `__
**User Group**
@@ -68,7 +69,7 @@ The support email address is `support@dataverse.org `__
-or use `GitHub pull requests `__,
+or use `GitHub pull requests `__,
if you have some code, scripts or documentation that you'd like to share.
If you have a **security issue** to report, please email `security@dataverse.org `__. See also :ref:`reporting-security-issues`.
diff --git a/doc/sphinx-guides/source/installation/advanced.rst b/doc/sphinx-guides/source/installation/advanced.rst
index 87f2a4fd0ab..3de5d0ea07c 100644
--- a/doc/sphinx-guides/source/installation/advanced.rst
+++ b/doc/sphinx-guides/source/installation/advanced.rst
@@ -7,6 +7,8 @@ Advanced installations are not officially supported but here we are at least doc
.. contents:: |toctitle|
:local:
+.. _multiple-app-servers:
+
Multiple App Servers
--------------------
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index f9fe74afc7c..2baa2827250 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -1,4 +1,3 @@
-=============
Configuration
=============
@@ -143,7 +142,7 @@ The need to redirect port HTTP (port 80) to HTTPS (port 443) for security has al
Your decision to proxy or not should primarily be driven by which features of the Dataverse Software you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Payara with Apache is required. The details are covered in the :doc:`shibboleth` section.
-Even if you have no interest in Shibboleth, you may want to front your Dataverse installation with Apache or nginx to simply the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including a some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web.
+Even if you have no interest in Shibboleth, you may want to front your Dataverse installation with Apache or nginx to simplify the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web.
Still not convinced you should put Payara behind another web server? Even if you manage to get your SSL certificate into Payara, how are you going to run Payara on low ports such as 80 and 443? Are you going to run Payara as root? Bad idea. This is a security risk. Under "Additional Recommendations" under "Securing Your Installation" above you are advised to configure Payara to run as a user other than root.
@@ -155,7 +154,7 @@ If you really don't want to front Payara with any proxy (not recommended), you c
``./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-2.port=443``
-What about port 80? Even if you don't front your Dataverse installation with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Payara uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Payara can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader. Answers here may be helpful: http://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to
+What about port 80? Even if you don't front your Dataverse installation with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Payara uses from 8080 to 80 (substitute ``http-listener-1.port=80``). Payara can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader. Answers here may be helpful: https://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to
If you are running an installation with Apache and Payara on the same server, and would like to restrict Payara from responding to any requests to port 8080 from external hosts (in other words, not through Apache), you can restrict the AJP listener to localhost only with:
@@ -179,56 +178,383 @@ Persistent Identifiers and Publishing Datasets
Persistent identifiers (PIDs) are a required and integral part of the Dataverse Software. They provide a URL that is
guaranteed to resolve to the datasets or files they represent. The Dataverse Software currently supports creating
-identifiers using one of several PID providers. The most appropriate PIDs for public data are DOIs (provided by
+identifiers using any of several PID types. The most appropriate PIDs for public data are DOIs (e.g., provided by
DataCite or EZID) and Handles. Dataverse also supports PermaLinks which could be useful for intranet or catalog use
cases. A DOI provider called "FAKE" is recommended only for testing and development purposes.
+Dataverse can be configured with one or more PID providers. Each provider mints and manages PIDs for a given protocol
+(e.g., doi, handle, permalink) using a specific service provider/account (e.g. with DataCite, EZId, or HandleNet).
+Each provider manages an authority/shoulder combination, a.k.a. a "prefix" (PermaLinks also support custom separator characters as part of the prefix),
+along with an optional list of individual PIDs (with different authorities/shoulders) that can be managed with that account.
+
Testing PID Providers
+++++++++++++++++++++
-By default, the installer configures the DataCite test service as the registration provider. DataCite requires that you
-register for a test account, configured with your own prefix (please contact support@datacite.org).
+By default, the installer configures the Fake DOI provider as the registration provider. Unlike other DOI providers, the Fake DOI provider does not involve any
+external resolution service and is not appropriate for use beyond development and testing. You may wish instead to test with
+PermaLinks or with a DataCite test account, which uses DataCite's test infrastructure and will help assure that your Dataverse instance can make network connections to DataCite.
+DataCite requires that you register for a test account, which will have a username, password, and your own prefix (please contact support@datacite.org for a test account,
+or `contact the GDCC `_ instead - GDCC is able to provide DataCite accounts with a group discount and can also provide test accounts).
Once you receive the login name, password, and prefix for the account,
-configure the credentials via :ref:`dataverse.pid.datacite.username` and
-:ref:`dataverse.pid.datacite.password`, then restart Payara.
-
-Configure the prefix via the API (where it is referred to as :ref:`:Authority`):
+configure the credentials as described below.
-``curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority``
+Alternatively, you may wish to configure other providers for testing:
-.. TIP::
- This testing section is oriented around DataCite but other PID Providers can be tested as well.
-
- EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite.
- - The PermaLink and FAKE DOI providers do not involve an external account. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.
+ - The PermaLink provider, like the FAKE DOI provider, does not involve an external account.
+   Unlike the FAKE DOI provider, the PermaLink provider creates PIDs that begin with "perma:" (making it clear that they are not DOIs)
+   and that do resolve to the local dataset/file page in Dataverse, making them useful for some production use cases. See :ref:`permalinks` and (for the FAKE DOI provider) the :doc:`/developers/dev-environment` section of the Developer Guide.
-Once all is configured, you will be able to publish datasets and files, but **the persistent identifiers will not be citable**,
-and they will only resolve from the DataCite test environment (and then only if the Dataverse installation from which
-you published them is accessible - DOIs minted from your laptop will not resolve). Note that any datasets or files
-created using the test configuration cannot be directly migrated and would need to be created again once a valid DOI
-namespace is configured.
+Provider-specific configuration is described below.
-One you are done testing, to properly configure persistent identifiers for a production installation, an account and associated namespace must be
-acquired for a fee from a DOI or HDL provider. **DataCite** (https://www.datacite.org) is the recommended DOI provider
+Once all is configured, you will be able to publish datasets and files, but **the persistent identifiers will not be citable**
+as they, with the exception of PermaLinks, will not redirect to your dataset page in Dataverse.
+
+Note that any datasets or files created using a test configuration cannot be directly migrated to a production PID provider
+and would need to be created again once a valid PID provider is configured.
+
+Once you are done testing, to properly configure persistent identifiers for a production installation, an account and associated namespace (e.g. authority/shoulder) must be
+acquired for a fee from a DOI or HDL provider. (As noted above, PermaLinks may be appropriate for intranet and catalog use cases.)
+**DataCite** (https://www.datacite.org) is the recommended DOI provider
(see https://dataversecommunity.global for more on joining DataCite through the Global Dataverse Community Consortium) but **EZID**
(http://ezid.cdlib.org) is an option for the University of California according to
https://www.cdlib.org/cdlinfo/2017/08/04/ezid-doi-service-is-evolving/ .
**Handle.Net** (https://www.handle.net) is the HDL provider.
-Once you have your DOI or Handle account credentials and a namespace, configure your Dataverse installation
-using the JVM options and database settings below.
+Once you have your DOI or Handle account credentials and a prefix, configure your Dataverse installation
+using the settings below.
+
+
+Configuring PID Providers
++++++++++++++++++++++++++
+
+There are two required global settings for configuring PID providers: the list of provider IDs and which one of those should be the default.
+Per-provider settings are also required - some common to all provider types and some type-specific. All of these settings are defined
+to be compatible with the MicroProfile Config specification, which means that:
+
+1. Any of these settings can be set via system properties (see :ref:`jvm-options` for how to do this), environment variables, or other
+ MicroProfile Config mechanisms supported by the app server.
+ `See Payara docs for supported sources `_.
+2. Remember to protect your secrets. For passwords, use an environment variable (bare minimum), a password alias named the same
+ as the key (OK) or use the `"dir config source" of Payara `_ (best).
+
+ Alias creation example:
+
+ .. code-block:: shell
+
+ echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt
+ asadmin create-password-alias --passwordfile /tmp/p.txt dataverse.pid.datacite1.datacite.password
+ rm /tmp/p.txt
+
+3. Environment variables follow the key, replacing any dot, colon, dash, etc. with an underscore "_" and using all uppercase
+   letters. Example: ``dataverse.pid.default-provider`` -> ``DATAVERSE_PID_DEFAULT_PROVIDER``
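+
+   For example, the default provider could be supplied as an environment variable (the value ``datacite1`` is just an illustrative provider ID):
+
+   .. code-block:: shell
+
+      export DATAVERSE_PID_DEFAULT_PROVIDER=datacite1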
+
+Global Settings
+^^^^^^^^^^^^^^^
+
+The following three global settings are used to configure PID Providers in the Dataverse software:
+
+.. _dataverse.pid.providers:
+
+dataverse.pid.providers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A comma-separated list of the IDs of the PID providers to use. IDs should be simple unique text strings, e.g. ``datacite1``, ``perma1``, etc.
+IDs are used to scope the provider-specific settings but are not directly visible to users.
+
+.. _dataverse.pid.default-provider:
+
+dataverse.pid.default-provider
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ID of the default PID provider to use.
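+
+For example, a minimal sketch that registers two providers and makes the first one the default (the IDs ``datacite1`` and ``perma1`` are arbitrary labels of your choosing, configured further below):
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.pid.providers=datacite1,perma1"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.default-provider=datacite1"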
+
+.. _dataverse.spi.pidproviders.directory:
+
+dataverse.spi.pidproviders.directory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The path to the directory where JAR files containing additional types of PID Providers can be added.
+Dataverse includes providers that support DOIs (DataCite, EZId, or FAKE), Handles, and PermaLinks.
+PID provider jar files added to this directory can replace any of these or add new PID Providers.
+
+Per-Provider Settings
+^^^^^^^^^^^^^^^^^^^^^
+
+Each provider listed by ID in the ``dataverse.pid.providers`` setting must be configured with the following common settings and any settings that are specific to the provider type.
+
+.. _dataverse.pid.*.type:
+
+dataverse.pid.*.type
+^^^^^^^^^^^^^^^^^^^^
+
+The Provider type, currently one of ``datacite``, ``ezid``, ``FAKE``, ``hdl``, or ``perma``. The type defines which protocol a service supports (DOI, Handle, or PermaLink) and, for DOI Providers, which
+DOI service is used.
+
+.. _dataverse.pid.*.label:
+
+dataverse.pid.*.label
+^^^^^^^^^^^^^^^^^^^^^
+
+A human-readable label for the provider.
+
+.. _dataverse.pid.*.authority:
+
+dataverse.pid.*.authority
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _dataverse.pid.*.shoulder:
+
+dataverse.pid.*.shoulder
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+In general, PIDs are of the form ``:/*`` where ``*`` is the portion unique to an individual PID. PID Providers must define
+the authority and shoulder (with the protocol defined by the ``dataverse.pid.*.type`` setting) that define the set of existing PIDs they can manage and the prefix they can use when minting new PIDs.
+(Often an account with a PID service provider will be limited to using a single authority/shoulder. If your PID service provider account allows more than one combination that you wish to use in Dataverse, configure multiple PID Providers, one for each combination.)
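+
+As an illustration, the common settings for a hypothetical DataCite-backed provider with the ID ``datacite1`` might look as follows (``10.5072`` and ``FK2`` are commonly used test values; substitute the authority and shoulder assigned to your account):
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.type=datacite"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.label=DataCite Test"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.authority=10.5072"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.shoulder=FK2"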
+
+.. _dataverse.pid.*.identifier-generation-style:
+
+dataverse.pid.*.identifier-generation-style
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, PID Providers in Dataverse generate a random 6 character string,
+pre-pended by the Shoulder if set, to use as the identifier for a Dataset.
+Set this to ``storedProcGenerated`` to generate instead a custom *unique*
+identifier (again pre-pended by the Shoulder if set) through a database
+stored procedure or function (the assumed default setting is ``randomString``).
+When using the ``storedProcGenerated`` setting, a stored procedure or function must be created in
+the database.
+
+As a first example, the script below (downloadable
+:download:`here `) produces
+sequential numerical values. You may need to make some changes to suit your
+system setup, see the comments for more information:
+
+.. literalinclude:: ../_static/util/createsequence.sql
+ :language: plpgsql
+
+As a second example, the script below (downloadable
+:download:`here `) produces
+sequential 8 character identifiers from a base36 representation of current
+timestamp.
+
+.. literalinclude:: ../_static/util/identifier_from_timestamp.sql
+ :language: plpgsql
+
+Note that the SQL in these example scripts is Postgres-specific.
+If necessary, it can be reimplemented in any other SQL flavor - the standard
+JPA code in the application simply expects the database to have a saved
+function ("stored procedure") named ``generateIdentifierFromStoredProcedure()``
+returning a single ``varchar`` argument.
+
+Please note that this setting interacts with the ``dataverse.pid.*.datafile-pid-format``
+setting below to determine how datafile identifiers are generated.
+
+
+.. _dataverse.pid.*.datafile-pid-format:
+
+dataverse.pid.*.datafile-pid-format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This setting controls the way that the "identifier" component of a file's
+persistent identifier (PID) relates to the PID of its "parent" dataset - for a given PID Provider.
+
+By default the identifier for a file is dependent on its parent dataset.
+For example, if the identifier of a dataset is "TJCLKP", the identifier for
+a file within that dataset will consist of the parent dataset's identifier
+followed by a slash ("/"), followed by a random 6 character string,
+yielding "TJCLKP/MLGWJO". Identifiers in this format are what you should
+expect if you leave ``dataverse.pid.*.datafile-pid-format`` undefined or set it to
+``DEPENDENT`` and have not changed the ``dataverse.pid.*.identifier-generation-style``
+setting from its default.
+
+Alternatively, the identifier for File PIDs can be configured to be
+independent of Dataset PIDs using the setting ``INDEPENDENT``.
+In this case, file PIDs will not contain the PIDs of their parent datasets,
+and their PIDs will be generated the exact same way that datasets' PIDs are,
+based on the ``dataverse.pid.*.identifier-generation-style`` setting described above
+(random 6 character strings or custom unique identifiers through a stored
+procedure, pre-pended by any shoulder).
+
+The chart below shows examples from each possible combination of parameters
+from the two settings. ``dataverse.pid.*.identifier-generation-style`` can be either
+``randomString`` (the default) or ``storedProcGenerated`` and
+``dataverse.pid.*.datafile-pid-format`` can be either ``DEPENDENT`` (the default) or
+``INDEPENDENT``. In the examples below the "identifier" for the dataset is
+"TJCLKP" for ``randomString`` and "100001" for ``storedProcGenerated`` (when
+using sequential numerical values, as described in
+:ref:`dataverse.pid.*.identifier-generation-style` above), or "krby26qt" for
+``storedProcGenerated`` (when using base36 timestamps, as described in
+:ref:`dataverse.pid.*.identifier-generation-style` above).
+
++-----------------+---------------+----------------------+---------------------+
+| | randomString | storedProcGenerated | storedProcGenerated |
+| | | | |
+| | | (sequential numbers) | (base36 timestamps) |
++=================+===============+======================+=====================+
+| **DEPENDENT** | TJCLKP/MLGWJO | 100001/1 | krby26qt/1 |
++-----------------+---------------+----------------------+---------------------+
+| **INDEPENDENT** | MLGWJO | 100002 | krby27pz |
++-----------------+---------------+----------------------+---------------------+
+
+As seen above, in cases where ``dataverse.pid.*.identifier-generation-style`` is set to
+``storedProcGenerated`` and ``dataverse.pid.*.datafile-pid-format`` is set to ``DEPENDENT``,
+each file within a dataset will be assigned a number *within* that dataset
+starting with "1".
+
+Otherwise, if ``dataverse.pid.*.datafile-pid-format`` is set to ``INDEPENDENT``, each file
+within the dataset is assigned with a new PID which is the next available
+identifier provided from the database stored procedure. In our example:
+"100002" when using sequential numbers or "krby27pz" when using base36
+timestamps.
+
+.. _dataverse.pid.*.managed-list:
+
+dataverse.pid.*.managed-list
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _dataverse.pid.*.excluded-list:
+
+dataverse.pid.*.excluded-list
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+With at least some PID services, it is possible for the authority (i.e., the permission) to manage specific individual PIDs
+to be transferred between accounts. To handle these cases, the individual PIDs, written in the
+standard format, e.g. ``doi:10.5072/FK2ABCDEF``, can be added to the comma-separated ``managed-list`` or ``excluded-list``
+for a given provider. For entries on the ``managed-list``, Dataverse will assume this PID
+Provider/account can update the metadata and landing URL for the PID at the service provider
+(even though it does not match the provider's authority/shoulder settings). Conversely,
+Dataverse will assume that PIDs on the ``excluded-list`` cannot be managed/updated by this provider
+(even though they match the provider's authority/shoulder settings). These settings are optional
+with the default assumption that these lists are empty.
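+
+For example, to declare that the hypothetical ``datacite1`` provider can manage two specific DOIs transferred from another account (shown here as an environment variable to avoid escaping the colons for ``asadmin``; both DOIs are illustrative):
+
+.. code-block:: shell
+
+  export DATAVERSE_PID_DATACITE1_MANAGED_LIST="doi:10.5072/FK2ABCDEF,doi:10.5072/FK2GHIJKL"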
+
+.. _dataverse.pid.*.datacite:
+
+DataCite-specific Settings
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.datacite.mds-api-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.rest-api-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.username
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.datacite.password
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+PID Providers of type ``datacite`` require four additional parameters that define how the provider connects to DataCite.
+DataCite has two APIs that are used in Dataverse:
+
+``dataverse.pid.*.datacite.mds-api-url`` is the base URL of the `DataCite MDS API `_,
+used to mint and manage DOIs. Current valid values are "https://mds.datacite.org" (production) and "https://mds.test.datacite.org" (testing, the default).
+
+The `DataCite REST API `_ is also used, for :ref:`PIDs API ` information retrieval and for :doc:`/admin/make-data-count`.
+Current valid values for ``dataverse.pid.*.datacite.rest-api-url`` are "https://api.datacite.org" (production) and "https://api.test.datacite.org" (testing, the default).
+
+DataCite uses `HTTP Basic authentication `_
+for `Fabrica `_ and their APIs. You need to provide
+the same credentials (``username``, ``password``) to Dataverse software to mint and manage DOIs for you.
+As noted above, you should use one of the more secure options for setting the password.
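+
+Putting this together, a sketch for the hypothetical ``datacite1`` provider pointing at DataCite's test infrastructure (the username is a placeholder; the password is picked up from the password alias created in the example above, since the alias name matches the setting key):
+
+.. code-block:: shell
+
+  # colons in values are escaped as \: because asadmin treats them as option separators
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.datacite.mds-api-url=https\://mds.test.datacite.org"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.datacite.rest-api-url=https\://api.test.datacite.org"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.datacite1.datacite.username=YOUR_DATACITE_USERNAME"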
+
+
+.. _dataverse.pid.*.ezid:
+
+EZId-specific Settings
+^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.ezid.api-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.ezid.username
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+dataverse.pid.*.ezid.password
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note that use of `EZId `_ is limited primarily to University of California institutions. If you have an EZId account,
+you will need to configure the ``api-url`` and your account ``username`` and ``password``. As above, you should use one of the more secure
+options for setting the password.
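+
+Configuration is analogous to DataCite. A sketch for a hypothetical ``ezid1`` provider (the URL and username are placeholders; supply the password via a password alias or another secure source):
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.pid.ezid1.type=ezid"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.ezid1.ezid.api-url=https\://ezid.cdlib.org"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.ezid1.ezid.username=apitest"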
+
+.. _dataverse.pid.*.permalink:
+
+PermaLink-specific Settings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.permalink.base-url
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.permalink.separator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+PermaLinks are a simple PID option intended for intranet and catalog use cases. They can be used without an external service or
+be configured with the ``base-url`` of a resolution service. PermaLinks also allow a custom ``separator`` to be used. (Note: when using multiple
+PermaLink providers, you should avoid ambiguous authority/separator/shoulder combinations that would result in the same overall prefix.)
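+
+A sketch for a hypothetical ``perma1`` provider (the authority and base URL are placeholders; a custom ``separator`` could be added the same way):
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.pid.perma1.type=perma"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.perma1.label=PermaLinks"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.perma1.authority=DVN1"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.perma1.permalink.base-url=https\://repo.example.edu"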
+
+.. _dataverse.pid.*.handlenet:
+
+Handle-specific Settings
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.handlenet.index
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.handlenet.independent-service
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.handlenet.auth-handle
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.handlenet.key.path
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+dataverse.pid.*.handlenet.key.passphrase
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_.
+
+Configure your Handle.net ``index`` to be used when registering new persistent
+identifiers. Defaults to ``300``.
+
+Indices are used to separate concerns within the Handle system. To add data to
+an index, authentication is mandatory. See also chapter 1.4 "Authentication" of
+the `Handle.Net Technical Documentation `__
+
+Handle.Net servers use a public key authentication method where the public key
+is stored in a handle itself and the matching private key is provided from a key
+file. Typically, the absolute path to this file ends like ``handle/svr_1/admpriv.bin``.
+The key file may (and should) be encrypted with a passphrase (used for
+encryption with AES-128). See
+also chapter 1.4 "Authentication" of the `Handle.Net Technical Documentation
+`__
+
+Provide an absolute ``key.path`` to a private key file authenticating requests to your
+Handle.Net server.
+
+Provide a ``key.passphrase`` to decrypt the private key file at ``dataverse.pid.*.handlenet.key.path``.
+
+Set ``independent-service`` to true if you want to use a Handle service which is set up to work 'independently' (no communication with the Global Handle Registry).
+By default this setting is false.
+
+Set ``auth-handle`` to / to be used on a global handle service when the public key is NOT stored in the default handle.
+This setting is optional. If the public key is, for instance, stored in handle: ``21.T12996/USER01``, ``auth-handle`` should be set to this value.
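+
+A sketch for a hypothetical ``hdl1`` provider (the prefix and key path are placeholders; supply the key passphrase via a password alias named ``dataverse.pid.hdl1.handlenet.key.passphrase`` or another secure source):
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.pid.hdl1.type=hdl"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.hdl1.label=Handle"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.hdl1.authority=20.500.12345"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.hdl1.handlenet.index=300"
+  ./asadmin create-jvm-options "\-Ddataverse.pid.hdl1.handlenet.key.path=/usr/local/handle/svr_1/admpriv.bin"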
+
.. _pids-doi-configuration:
-Configuring Your Dataverse Installation for DOIs
-++++++++++++++++++++++++++++++++++++++++++++++++
+Backward-compatibility for Single PID Provider Installations
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-As explained above, by default your Dataverse installation attempts to register DOIs for each
-dataset and file under a test authority. You must apply for your own credentials.
+While using the PID Provider configuration settings described above is recommended, Dataverse installations
+only using a single PID Provider can use the settings below instead. In general, these legacy settings mirror
+those above except for not including a PID Provider id.
-Here are the configuration options for DOIs:
+Configuring Your Dataverse Installation for a Single DOI Provider
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Here are the configuration options for DOIs:
**JVM Options for DataCite:**
@@ -258,8 +584,8 @@ this provider.
.. _pids-handle-configuration:
-Configuring Your Dataverse Installation for Handles
-+++++++++++++++++++++++++++++++++++++++++++++++++++
+Configuring Your Dataverse Installation for a Single Handle Provider
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here are the configuration options for handles. Most notably, you need to
change the ``:Protocol`` setting, as it defaults to DOI usage.
@@ -279,16 +605,12 @@ change the ``:Protocol`` setting, as it defaults to DOI usage.
- :ref:`:IndependentHandleService <:IndependentHandleService>` (optional)
- :ref:`:HandleAuthHandle <:HandleAuthHandle>` (optional)
-Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_.
+Note: If you are **minting your own handles** and plan to set up your own handle service, please refer to `Handle.Net documentation `_.
.. _permalinks:
-Configuring Your Dataverse Installation for PermaLinks
-++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
-PermaLinks are a simple mechanism to provide persistent URLs for datasets and datafiles (if configured) that does not involve an external service providing metadata-based search services.
-They are potentially appropriate for Intranet use cases as well as in cases where Dataverse is being used as a catalog or holding duplicate copies of datasets where the authoritative copy already has a DOI or Handle.
-PermaLinks use the protocol "perma" (versus "doi" or "handle") and do not use a "/" character as a separator between the authority and shoulder. It is recommended to choose an alphanumeric value for authority that does not resemble that of DOIs (which are primarily numeric and start with "10." as in "10.5072") to avoid PermaLinks being mistaken for DOIs.
+Configuring Your Dataverse Installation for a Single PermaLink Provider
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here are the configuration options for PermaLinks:
@@ -305,6 +627,8 @@ Here are the configuration options for PermaLinks:
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)
+You must restart Payara after making changes to these settings.
+
.. _auth-modes:
Auth Modes: Local vs. Remote vs. Both
@@ -498,14 +822,18 @@ Logging & Slow Performance
.. _file-storage:
-File Storage: Using a Local Filesystem and/or Swift and/or Object Stores and/or Trusted Remote Stores
------------------------------------------------------------------------------------------------------
+File Storage
+------------
By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/payara6/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.\.directory`` JVM option described below.
-A Dataverse installation can alternately store files in a Swift or S3-compatible object store, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-Dataverse collection basis.
+A Dataverse installation can alternately store files in a Swift or S3-compatible object store, or on a Globus endpoint, and can now be configured to support multiple stores at once. With a multi-store configuration, the location for new files can be controlled on a per-Dataverse collection basis.
+
+A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web or Globus accessible trusted remote store.
-A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web-accessible trusted remote store.
+
+By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
+
+A Dataverse installation can also be configured to allow out-of-band upload by setting the ``dataverse.files.\.upload-out-of-band`` JVM option to ``true``.
+With this option enabled, file upload can be managed manually or via third-party tools, with the :ref:`Adding the Uploaded file to the Dataset ` API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.
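+
+For example, to enable out-of-band upload for a hypothetical S3 store with the ID ``s3``:
+
+.. code-block:: shell
+
+  ./asadmin create-jvm-options "\-Ddataverse.files.s3.upload-out-of-band=true"
+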
The following sections describe how to set up various types of stores and how to configure for multiple stores.
@@ -534,6 +862,27 @@ If you wish to change which store is used by default, you'll need to delete the
It is also possible to set maximum file upload size limits per store. See the :ref:`:MaxFileUploadSizeInBytes` setting below.
+.. _labels-file-stores:
+
+Labels for File Stores
+++++++++++++++++++++++
+
+If you find yourself adding many file stores with various configurations such as per-file limits and direct upload, you might find it helpful to make the label descriptive.
+
+For example, instead of simply labeling an S3 store as "S3"...
+
+.. code-block:: none
+
+ ./asadmin create-jvm-options "\-Ddataverse.files.s3xl.label=S3"
+
+... you might want to include some extra information such as the example below.
+
+.. code-block:: none
+
+ ./asadmin create-jvm-options "\-Ddataverse.files.s3xl.label=S3XL, Filesize limit: 100GB, direct-upload"
+
+Please keep in mind that the UI will only show so many characters, so labels are best kept short.
+
.. _storage-files-dir:
File Storage
@@ -550,7 +899,7 @@ Multiple file stores should specify different directories (which would nominally
Swift Storage
+++++++++++++
-Rather than storing data files on the filesystem, you can opt for an experimental setup with a `Swift Object Storage `_ backend. Each dataset that users create gets a corresponding "container" on the Swift side, and each data file is saved as a file within that container.
+Rather than storing data files on the filesystem, you can opt for an experimental setup with a `Swift Object Storage `_ backend. Each dataset that users create gets a corresponding "container" on the Swift side, and each data file is saved as a file within that container.
**In order to configure a Swift installation,** you need to complete these steps to properly modify the JVM options:
@@ -566,7 +915,7 @@ First, run all the following create commands with your Swift endpoint informatio
./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files..username.endpoint1=your-username"
./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files..endpoint.endpoint1=your-swift-endpoint"
-``auth_type`` can either be ``keystone``, ``keystone_v3``, or it will assumed to be ``basic``. ``auth_url`` should be your keystone authentication URL which includes the tokens (e.g. for keystone, ``https://openstack.example.edu:35357/v2.0/tokens`` and for keystone_v3, ``https://openstack.example.edu:35357/v3/auth/tokens``). ``swift_endpoint`` is a URL that looks something like ``http://rdgw.swift.example.org/swift/v1``.
+``auth_type`` can either be ``keystone``, ``keystone_v3``, or it will be assumed to be ``basic``. ``auth_url`` should be your keystone authentication URL which includes the tokens (e.g. for keystone, ``https://openstack.example.edu:35357/v2.0/tokens`` and for keystone_v3, ``https://openstack.example.edu:35357/v3/auth/tokens``). ``swift_endpoint`` is a URL that looks something like ``https://rdgw.swift.example.org/swift/v1``.
Then create a password alias by running (without changes):
@@ -662,7 +1011,7 @@ You'll need an AWS account with an associated S3 bucket for your installation to
**Make note** of the **bucket's name** and the **region** its data is hosted in.
To **create a user** with full S3 access and nothing more for security reasons, we recommend using IAM
-(Identity and Access Management). See `IAM User Guide `_
+(Identity and Access Management). See `IAM User Guide `_
for more info on this process.
To use programmatic access, **Generate the user keys** needed for a Dataverse installation afterwards by clicking on the created user.
@@ -733,7 +1082,7 @@ Additional profiles can be added to these files by appending the relevant inform
aws_access_key_id =
aws_secret_access_key =
-Place these two files in a folder named ``.aws`` under the home directory for the user running your Dataverse Installation on Payara. (From the `AWS Command Line Interface Documentation `_:
+Place these two files in a folder named ``.aws`` under the home directory for the user running your Dataverse Installation on Payara. (From the `AWS Command Line Interface Documentation `_:
"In order to separate credentials from less sensitive options, region and output format are stored in a separate file
named config in the same folder")
@@ -799,27 +1148,28 @@ List of S3 Storage Options
.. table::
:align: left
- =========================================== ================== ========================================================================== =============
- JVM Option Value Description Default value
- =========================================== ================== ========================================================================== =============
- dataverse.files.storage-driver-id Enable as the default storage driver. ``file``
- dataverse.files..type ``s3`` **Required** to mark this storage as S3 based. (none)
- dataverse.files..label > **Required** label to be shown in the UI for this storage (none)
- dataverse.files..bucket-name > The bucket name. See above. (none)
- dataverse.files..download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
- dataverse.files..upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
- dataverse.files..ingestsizelimit Maximum size of directupload files that should be ingested (none)
- dataverse.files..url-expiration-minutes > If direct uploads/downloads: time until links expire. Optional. 60
- dataverse.files..min-part-size > Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
- dataverse.files..custom-endpoint-url > Use custom S3 endpoint. Needs URL either with or without protocol. (none)
- dataverse.files..custom-endpoint-region > Only used when using custom endpoint. Optional. ``dataverse``
- dataverse.files..profile > Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
- dataverse.files..proxy-url > URL of a proxy protecting the S3 store. Optional. (none)
- dataverse.files..path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
- dataverse.files..payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
- dataverse.files..chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
- dataverse.files..connection-pool-size > The maximum number of open connections to the S3 server ``256``
- =========================================== ================== ========================================================================== =============
+ =========================================== ================== =================================================================================== =============
+ JVM Option Value Description Default value
+ =========================================== ================== =================================================================================== =============
+ dataverse.files.storage-driver-id Enable as the default storage driver. ``file``
+ dataverse.files..type ``s3`` **Required** to mark this storage as S3 based. (none)
+ dataverse.files..label > **Required** label to be shown in the UI for this storage (none)
+ dataverse.files..bucket-name > The bucket name. See above. (none)
+ dataverse.files..download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
+ dataverse.files..upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset in the S3 store. ``false``
+ dataverse.files..upload-out-of-band ``true``/``false`` Allow upload of files by out-of-band methods (using some tool other than Dataverse) ``false``
+ dataverse.files..ingestsizelimit Maximum size of directupload files that should be ingested (none)
+ dataverse.files..url-expiration-minutes > If direct uploads/downloads: time until links expire. Optional. 60
+ dataverse.files..min-part-size > Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
+ dataverse.files..custom-endpoint-url > Use custom S3 endpoint. Needs URL either with or without protocol. (none)
+ dataverse.files..custom-endpoint-region > Only used when using custom endpoint. Optional. ``dataverse``
+ dataverse.files..profile > Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
+ dataverse.files..proxy-url > URL of a proxy protecting the S3 store. Optional. (none)
+ dataverse.files..path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
+ dataverse.files..payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
+ dataverse.files..chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
+ dataverse.files..connection-pool-size > The maximum number of open connections to the S3 server ``256``
+ =========================================== ================== =================================================================================== =============
.. table::
:align: left
@@ -859,7 +1209,7 @@ You may provide the values for these via any `supported MicroProfile Config API
Reported Working S3-Compatible Storage
######################################
-`Minio v2018-09-12 `_
+`Minio v2018-09-12 `_
Set ``dataverse.files..path-style-access=true``, as Minio works path-based. Works pretty smooth, easy to setup.
**Can be used for quick testing, too:** just use the example values above. Uses the public (read: unsecure and
possibly slow) https://play.minio.io:9000 service.
@@ -952,7 +1302,7 @@ Once you have configured a trusted remote store, you can point your users to the
dataverse.files..type ``remote`` **Required** to mark this storage as remote. (none)
dataverse.files..label > **Required** label to be shown in the UI for this storage. (none)
dataverse.files..base-url > **Required** All files must have URLs of the form /* . (none)
- dataverse.files..base-store > **Optional** The id of a base store (of type file, s3, or swift). (the default store)
+ dataverse.files..base-store > **Required** The id of a base store (of type file, s3, or swift). (the default store)
dataverse.files..download-redirect ``true``/``false`` Enable direct download (should usually be true). ``false``
dataverse.files..secret-key > A key used to sign download requests sent to the remote store. Optional. (none)
dataverse.files..url-expiration-minutes > If direct downloads and using signing: time until links expire. Optional. 60
@@ -961,6 +1311,47 @@ Once you have configured a trusted remote store, you can point your users to the
=========================================== ================== ========================================================================== ===================
+.. _globus-storage:
+
+Globus Storage
+++++++++++++++
+
+Globus stores allow Dataverse to manage files stored in Globus endpoints or to reference files in remote Globus endpoints, with users leveraging Globus to transfer files to/from Dataverse (rather than using HTTP/HTTPS).
+See :doc:`/developers/big-data-support` for additional information on how to use a Globus store. Consult the `Globus documentation