Merge branch 'develop' into GlobusDownload
landreev committed Apr 19, 2024
2 parents e546989 + 447d576 commit af4f918
Showing 67 changed files with 1,325 additions and 293 deletions.
101 changes: 101 additions & 0 deletions .github/workflows/maven_cache_management.yml
@@ -0,0 +1,101 @@
name: Maven Cache Management

on:
  # Every push to develop should trigger cache rejuvenation (dependencies might have changed)
  push:
    branches:
      - develop
  # According to https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
  # all caches are deleted after 7 days of no access. Make sure we rejuvenate every 7 days to keep it available.
  schedule:
    - cron: '23 2 * * 0' # Run for 'develop' every Sunday at 02:23 UTC (3:23 CET, 21:23 ET)
  # Enable manual cache management
  workflow_dispatch:
  # Delete branch caches once a PR is merged
  pull_request:
    types:
      - closed

env:
  COMMON_CACHE_KEY: "dataverse-maven-cache"
  COMMON_CACHE_PATH: "~/.m2/repository"

jobs:
  seed:
    name: Drop and Re-Seed Local Repository
    runs-on: ubuntu-latest
    if: ${{ github.event_name != 'pull_request' }}
    permissions:
      # Write permission needed to delete caches
      # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id
      actions: write
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Determine Java version from Parent POM
        run: echo "JAVA_VERSION=$(grep '<target.java.version>' modules/dataverse-parent/pom.xml | cut -f2 -d'>' | cut -f1 -d'<')" >> ${GITHUB_ENV}
      - name: Set up JDK ${{ env.JAVA_VERSION }}
        uses: actions/setup-java@v4
        with:
          java-version: ${{ env.JAVA_VERSION }}
          distribution: temurin
      - name: Seed common cache
        run: |
          mvn -B -f modules/dataverse-parent dependency:go-offline dependency:resolve-plugins
      # This non-obvious ordering is deliberate: the Maven download above takes a long time (7-8 min),
      # and jobs should not be left without a cache. Deleting and re-saving in one go leaves only a small window for a cache miss.
      - name: Drop common cache
        run: |
          gh extension install actions/gh-actions-cache
          echo "🛒 Fetching list of cache keys"
          cacheKeys=$(gh actions-cache list -R ${{ github.repository }} -B develop | cut -f 1 )
          ## Don't fail the workflow if deleting a cache key fails.
          set +e
          echo "🗑️ Deleting caches..."
          for cacheKey in $cacheKeys
          do
              gh actions-cache delete $cacheKey -R ${{ github.repository }} -B develop --confirm
          done
          echo "✅ Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Save the common cache
        uses: actions/cache@v4
        with:
          path: ${{ env.COMMON_CACHE_PATH }}
          key: ${{ env.COMMON_CACHE_KEY }}
          enableCrossOsArchive: true

  # Let's delete feature branch caches once their PR is merged - we only have 10 GB of space before eviction kicks in
  deplete:
    name: Deplete feature branch caches
    runs-on: ubuntu-latest
    if: ${{ github.event_name == 'pull_request' }}
    permissions:
      # `actions:write` permission is required to delete caches
      # See also: https://docs.github.com/en/rest/actions/cache?apiVersion=2022-11-28#delete-a-github-actions-cache-for-a-repository-using-a-cache-id
      actions: write
      contents: read
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Cleanup caches
        run: |
          gh extension install actions/gh-actions-cache
          BRANCH=refs/pull/${{ github.event.pull_request.number }}/merge
          echo "🛒 Fetching list of cache keys"
          cacheKeysForPR=$(gh actions-cache list -R ${{ github.repository }} -B $BRANCH | cut -f 1 )
          ## Don't fail the workflow if deleting a cache key fails.
          set +e
          echo "🗑️ Deleting caches..."
          for cacheKey in $cacheKeysForPR
          do
              gh actions-cache delete $cacheKey -R ${{ github.repository }} -B $BRANCH --confirm
          done
          echo "✅ Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
5 changes: 5 additions & 0 deletions doc/release-notes/10022_upload_redirect_without_tagging.md
@@ -0,0 +1,5 @@
If your S3 store does not support tagging and returns an error when you configure direct uploads, you can disable tagging with the ``dataverse.files.<id>.disable-tagging`` JVM option (a sketch follows below the list). For more details, see https://dataverse-guide--10029.org.readthedocs.build/en/10029/developers/big-data-support.html#s3-tags as well as #10022 and #10029.

## New config options

- dataverse.files.<id>.disable-tagging
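For illustration, setting the option on Payara with ``asadmin`` might look like the sketch below; the store id ``s3`` is a hypothetical example, not part of this change:

```shell
# Hypothetical sketch: disable S3 tagging for a store whose id is "s3"
./asadmin create-jvm-options "-Ddataverse.files.s3.disable-tagging=true"
```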
1 change: 1 addition & 0 deletions doc/release-notes/10242-add-feature-dv-api
@@ -0,0 +1 @@
New API endpoints have been added that allow you to add or remove featured collections from a Dataverse collection.
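As a sketch only (the paths and payload below are assumptions; consult the API guide for the authoritative syntax), the new endpoints might be exercised like this:

```shell
# Hypothetical sketch: set, then clear, the featured collections of the "root" collection.
# Endpoint paths and payload are assumptions; see the API guide.
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type:application/json" \
     -X POST "http://localhost:8080/api/dataverses/root/featured" -d '["subcollectionAlias"]'
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "http://localhost:8080/api/dataverses/root/featured"
```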
5 changes: 5 additions & 0 deletions doc/release-notes/10316_cvoc_http_headers.md
@@ -0,0 +1,5 @@
You can now add HTTP request headers required by the external vocabulary services you are implementing.

Combined documentation can be found in pull request [#10404](https://github.com/IQSS/dataverse/pull/10404).

For more information, see issue [#10316](https://github.com/IQSS/dataverse/issues/10316) and pull request [gdcc/dataverse-external-vocab-support#19](https://github.com/gdcc/dataverse-external-vocab-support/pull/19).
@@ -0,0 +1 @@
The API endpoint for getting a dataset version has been extended to include ``latestVersionPublishingStatus``.
3 changes: 3 additions & 0 deletions doc/release-notes/10339-workflow.md
@@ -0,0 +1,3 @@
The computational workflow metadata block has been updated to present a clickable link for the External Code Repository URL field.

Release notes should include the usual instructions for updating the computational_workflow block, addressed to those who have installed this optional block. (PR #10441)
6 changes: 6 additions & 0 deletions doc/release-notes/10389-metadatablocks-api-extension.md
@@ -0,0 +1,6 @@
New optional query parameters added to ``api/metadatablocks`` and ``api/dataverses/{id}/metadatablocks`` endpoints:

- ``returnDatasetFieldTypes``: Whether or not to return the dataset field types present in each metadata block. If not set, the default value is false.
- ``onlyDisplayedOnCreate``: Whether or not to return only the metadata blocks that are displayed on dataset creation. If ``returnDatasetFieldTypes`` is true, only the dataset field types shown on dataset creation will be returned within each metadata block. If not set, the default value is false.

Added new ``displayOnCreate`` field to the MetadataBlock and DatasetFieldType payloads.
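For example, the two parameters can be combined to list only what a user sees when creating a dataset; a sketch against a local installation:

```shell
# List metadata blocks with their dataset field types,
# restricted to the blocks and fields displayed on dataset creation
curl "http://localhost:8080/api/metadatablocks?returnDatasetFieldTypes=true&onlyDisplayedOnCreate=true"
```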
3 changes: 3 additions & 0 deletions doc/release-notes/10464-add-name-harvesting-client-facet.md
@@ -0,0 +1,3 @@
The Metadata Source facet has been updated to show the name of the harvesting client rather than grouping all such datasets under 'harvested'.

TODO: for the v6.3 release note: please add a full re-index using http://localhost:8080/api/admin/index to the upgrade instructions.
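For reference, that full re-index is a single call to the admin API:

```shell
# Trigger a full reindex of the installation
curl http://localhost:8080/api/admin/index
```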
1 change: 1 addition & 0 deletions doc/release-notes/10468-doc-datalad-integration.md
@@ -0,0 +1 @@
DataLad has been integrated with Dataverse. For more information, see https://dataverse-guide--10470.org.readthedocs.build/en/10470/admin/integrations.html#datalad
1 change: 1 addition & 0 deletions doc/release-notes/9887-new-superuser-status-endpoint.md
@@ -0,0 +1 @@
The existing API endpoint for toggling the superuser status of a user has been deprecated in favor of a new API endpoint that allows you to explicitly and idempotently set the status to true or false. For details, see [the guides](https://dataverse-guide--10440.org.readthedocs.build/en/10440/api/native-api.html), #9887 and #10440.
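A sketch of the new call, assuming the PUT endpoint takes a plain ``true``/``false`` request body (the username is a placeholder; see the linked guides for the authoritative form):

```shell
# Hypothetical sketch: explicitly set (rather than toggle) superuser status
curl -X PUT "http://localhost:8080/api/admin/superuser/jdoe" -d true
```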
26 changes: 26 additions & 0 deletions doc/sphinx-guides/source/admin/integrations.rst
@@ -132,6 +132,32 @@ Globus transfer uses an efficient transfer mechanism and has additional features
Users can transfer files via `Globus <https://www.globus.org>`_ into and out of datasets, or reference files on a remote Globus endpoint, when their Dataverse installation is configured to use one or more Globus-accessible stores
and a community-developed `dataverse-globus <https://github.com/scholarsportal/dataverse-globus>`_ app has been properly installed and configured.

DataLad
+++++++

`DataLad`_ is a free and open source decentralized data management system that is built on `git`_
and `git-annex`_ and provides a unified interface for version control, deposition, content retrieval,
provenance tracking, reproducible execution, and further collaborative management of distributed and
arbitrarily large datasets.

If your dataset is structured as a `DataLad dataset`_ and you have a local DataLad installation,
the `datalad-dataverse`_ extension package provides interoperability with Dataverse for the purpose
of depositing DataLad datasets to and retrieving DataLad datasets from Dataverse instances, together
with full version history.

For further information, visit the ``datalad-dataverse`` extension's `documentation page`_, see the
`quickstart`_ for installation details, or follow the step-by-step `tutorial`_ to get hands-on
experience.
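
As a quick, hypothetical sketch (the command names follow the extension's documentation, but the server URL and DOI below are placeholders, and a draft dataset is assumed to already exist on the Dataverse side):

.. code-block:: bash

   # Hypothetical sketch: publish a DataLad dataset to an existing draft
   # Dataverse dataset; the URL and DOI are placeholders.
   pip install datalad-dataverse
   datalad create mydataset && cd mydataset
   datalad add-sibling-dataverse https://demo.dataverse.org doi:10.5072/FK2/XXXXXX
   datalad push --to dataverse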

.. _DataLad: https://www.datalad.org
.. _git: https://git-scm.com
.. _git-annex: https://git-annex.branchable.com
.. _DataLad dataset: https://handbook.datalad.org/en/latest/basics/basics-datasets.html
.. _datalad-dataverse: https://github.com/datalad/datalad-dataverse
.. _documentation page: https://docs.datalad.org/projects/dataverse/en/latest/index.html
.. _quickstart: https://docs.datalad.org/projects/dataverse/en/latest/settingup.html
.. _tutorial: https://docs.datalad.org/projects/dataverse/en/latest/tutorial.html


Embedding Data on Websites
--------------------------
6 changes: 4 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -552,6 +552,8 @@ Great care must be taken when reloading a metadata block. Matching is done on fi

The ability to reload metadata blocks means that SQL update scripts don't need to be written for these changes. See also the :doc:`/developers/sql-upgrade-scripts` section of the Developer Guide.

.. _using-external-vocabulary-services:

Using External Vocabulary Services
----------------------------------

@@ -577,9 +579,9 @@ In general, the external vocabulary support mechanism may be a better choice for
The specifics of the user interface for entering/selecting a vocabulary term and how that term is then displayed are managed by third-party Javascripts. The initial Javascripts that have been created provide auto-completion, displaying a list of choices that match what the user has typed so far, but other interfaces, such as displaying a tree of options for a hierarchical vocabulary, are possible.
Similarly, existing scripts do relatively simple things for displaying a term - showing the term's name in the appropriate language and providing a link to an external URL with more information, but more sophisticated displays are possible.

Scripts supporting use of vocabularies from services supporting the SKOMOS protocol (see https://skosmos.org) and retrieving ORCIDs (from https://orcid.org) are available https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)
Scripts supporting vocabularies from services that implement the Skosmos protocol (see https://skosmos.org), retrieving ORCIDs (from https://orcid.org), and using ROR (https://ror.org/) are available at https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used, and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)

Configuration involves specifying which fields are to be mapped, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters.
Configuration involves specifying which fields are to be mapped, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters, including the ability to send HTTP headers on calls to the service.
These are all defined in the :ref:`:CVocConf <:CVocConf>` setting as a JSON array. Details about the required elements as well as example JSON arrays are available at https://github.com/gdcc/dataverse-external-vocab-support, along with an example metadata block that can be used for testing.
The scripts required can be hosted locally or retrieved dynamically from https://gdcc.github.io/ (similar to how dataverse-previewers work).
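
For illustration, such a configuration is loaded like any other database setting; in this sketch, ``cvoc-conf.json`` is a placeholder for a JSON array following the format documented in the repository above (including any HTTP header elements):

.. code-block:: bash

   # Load an external vocabulary configuration, including any HTTP headers
   # it declares, as the :CVocConf setting; cvoc-conf.json is a placeholder
   curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf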

12 changes: 11 additions & 1 deletion doc/sphinx-guides/source/api/changelog.rst
@@ -7,6 +7,11 @@ This API changelog is experimental and we would love feedback on its usefulness.
:local:
:depth: 1

v6.3
----

- **/api/admin/superuser/{identifier}**: The POST endpoint that toggles superuser status has been deprecated in favor of a new PUT endpoint that allows you to specify true or false. See :ref:`set-superuser-status`.

v6.2
----

@@ -24,4 +29,9 @@ v6.1
v6.0
----

- **/api/access/datafile**: When a null or invalid API token is provided to download a public (non-restricted) file with this API call, it will result in a ``401`` error response. Previously, the download was allowed (``200`` response). Please note that we noticed this change sometime between 5.9 and 6.0. If you can help us pinpoint the exact version (or commit!), please get in touch. See :doc:`dataaccess`.

v5.6
----

- **/api/dataverses/$PARENT/datasets**: The "create dataset" API endpoint now requires the header ``Content-type:application/json`` to be passed. The error can be confusing, saying something about validation, such as ``'{"status":"ERROR","message":"Validation Failed: Title is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])...``. See :ref:`create-dataset-command`.
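
For example, a create-dataset call that includes the now-required header (the server URL and JSON file name are placeholders):

.. code-block:: bash

   # The Content-type header is now required; omitting it produces the
   # confusing validation error quoted above.
   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" \
        -X POST "http://localhost:8080/api/dataverses/root/datasets" --upload-file dataset.json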
