This module is a component in the DANS Data Station Architecture.
This page contains information for developers about how to contribute to this project.

The project uses poetry as its build system. If you don't have it installed yet, install poetry for your user with:

```
python3 -m pip install --user poetry
```
After `poetry` is installed, change directory to the project root and execute:

```
poetry install
```

This will install the project and its dependencies in the poetry virtual environment for the project.
After `poetry install`, the commands provided by the module can be tested by prepending `poetry run` to the command line, e.g.:

```
poetry run dv-banner list
```
The mapping from command to the function that implements it is defined in `pyproject.toml`, in the `tool.poetry.scripts` section.

For more information about how to use poetry, see the poetry documentation.
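As an illustration, such a mapping might look like the fragment below. The entries are examples based on commands mentioned in this documentation; consult the project's actual `pyproject.toml` for the authoritative list.

```toml
[tool.poetry.scripts]
# command name = "package.module:function" (illustrative entries)
dv-banner = "datastation.dv_banner:main"
dv-dataset-publish = "datastation.dv_dataset_publish:main"
```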
Poetry provides no support for debugging. To debug a command in PyCharm, create a new run configuration by right-clicking on the corresponding entry-point script and selecting "Modify Run Configuration". In the dialog that appears, you can specify the command line parameters. After saving the configuration, you can start it by clicking the green arrow or bug icon in the toolbar. (This should all be familiar to you if you have used an IDE before, but it is different from the way we work in Java projects, where the program is started with the scripts in `dans-dev-tools` and you then attach a debugger.)
### Working directory

By default, PyCharm will use the directory of the entry-point script as the working directory. This means that the configuration file (`.dans-datastation-tools.yml`) in the root of the project will not be found. Instead, a new configuration file will be created in the directory of the entry-point script. This may be confusing if you are not aware of it, because `poetry` will still use the configuration file in the root of the project. To avoid this, change the working directory in the run configuration to the root of the project.
On occasion, you may want to add to this documentation site. It is important to test that your changes look good before you commit them. To do this, you can use the `start-mkdocs.sh` script in the `dans-dev-tools` project. (See dans-dev-tools: `start-mkdocs.sh`.)

```
start-mkdocs.sh
```

Edit the files in `docs` and browse to or refresh http://127.0.0.1:8080 to view your changes.

Note that here we are using a separate virtual environment. This way we don't get the dependencies for `dans-datastation-tools` and the doc site confused.
The user interface of dans-datastation-tools consists of a set of commands that can be executed from the command line. To keep the user interface consistent, the following guidelines should be followed.

Commands target a specific object type (e.g. a dataset, a file) and perform an action on that object (e.g. create, read). The general pattern for a command name is:

```
<object-type>-<action>
```

For example:

```
dv-dataset-publish   # object-type: dv-dataset, action: publish
```
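Since object types themselves contain dashes (e.g. `dv-dataset`), the action is the part after the last dash. The naming rule can be sketched as follows; this helper is illustrative, not part of the code base:

```python
def split_command(command_name):
    """Split a command name into (object_type, action), e.g. 'dv-dataset-publish'."""
    object_type, _, action = command_name.rpartition("-")
    return object_type, action
```

For example, `split_command("dv-dataset-publish")` yields `("dv-dataset", "publish")`.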
### Subcommands

In some cases the action is a subcommand. This is an inconsistency at the moment. We may want to change this in the future, either by making all actions subcommands or by making all actions regular commands.
The following object types are currently supported: `dans-bag`, `dv-dataset`, `dv-user`, `dv-banner`, `ingest-flow`. This list may be extended in the future.
There are two types of parameters: positional and named.

The input that identifies the object to perform the action on is always a positional parameter. This can for example be a dataset identifier, a file path, etc., or a file containing a list of such identifiers to be processed in batch mode.

Named parameters are used to modify the action or provide additional input. They can be optional or required.

* A parameter `--output-file` should be included, with the default value `-` (meaning the standard output).
* Batch commands take a `--report-file` parameter, with the default value `-` (meaning the standard output).

Commands each have their own entry-point script in the root package `datastation`.
They must all have a `main` function and a `__main__` section that calls that function. The latter is needed so that you can debug the command in PyCharm.

The `main` function is mapped to the command name in `pyproject.toml`, in the `tool.poetry.scripts` section. The name of the entry-point script is the same as the command name, with `-` replaced by `_` and a `.py` extension added. For example, the `dv-dataset-publish` command is implemented (by the `main` function) in the `dv_dataset_publish.py` script.
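The naming rule amounts to a one-liner; this helper is illustrative, not part of the code base:

```python
def entry_point_script(command_name):
    """Derive the entry-point script file name from a command name."""
    return command_name.replace("-", "_") + ".py"
```

For example, `entry_point_script("dv-dataset-publish")` returns `"dv_dataset_publish.py"`.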
The entry-point scripts are not meant to be imported by other modules. Their only purpose is to provide a command-line interface; they should do as little else as possible.
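Putting these conventions together, a minimal entry-point script might look like this. The argument name and the printed message are illustrative, not a real command's interface; only the `main` function plus `__main__` section structure is prescribed:

```python
import argparse


def main(argv=None):
    # Parse arguments and delegate the real work to library code;
    # keep this script as thin as possible.
    parser = argparse.ArgumentParser(description="Illustrative entry-point script.")
    parser.add_argument("pid_or_pids_file", help="a PID or a file with one PID per line")
    args = parser.parse_args(argv)
    print(f"processing {args.pid_or_pids_file}")
    return 0


if __name__ == "__main__":
    main()
```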
Most commands talk to a remote service, e.g. a Dataverse server. The code that talks to the remote service is in a dedicated subpackage, e.g. `dataverse` for the Dataverse server. There is also a `common` subpackage that contains common functionality for all commands and utilities that are not specific to a remote service.
There is one configuration file, which is in YAML format and contains a section for each targeted service. The objects that need a specific section take a dictionary with only that section as a parameter, e.g.:

```python
from datastation.dataverse.dataverse_client import DataverseClient
from datastation.common.config import init

config = init()
dataverse_client = DataverseClient(config['dataverse'])
```

Note that `DataverseClient` does not know about the other sections in the configuration file. On the other hand, it does not receive each individual parameter as a separate argument either, which would mean transferring all the parameters to the constructor of the client. This style is intended to strike a balance between those two extremes.
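The pattern can be sketched with a dummy client; the class and the configuration keys below are invented for the example and are not the actual `DataverseClient` interface:

```python
class DummyClient:
    """Illustrative client: it sees only its own section of the configuration."""

    def __init__(self, config):
        # Only keys from the client's own section are visible here.
        self.server_url = config["server_url"]


# Parsed YAML with one section per service (keys invented for the example):
config = {
    "dataverse": {"server_url": "http://localhost:8080"},
    "ingest_flow": {"service_baseurl": "http://localhost:20300"},
}
client = DummyClient(config["dataverse"])  # the other sections stay invisible
```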
Format the code with PyCharm's code formatter.
Unit tests should go under `src/tests`. The test files should be named `test_<module>.py` and the test classes should be named `Test<Module/Class/Function>`. There can be multiple test classes in a test file.
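Following these conventions, a test file for a hypothetical module might look like this; the function under test is invented for the example and defined inline to keep the sketch self-contained:

```python
# src/tests/test_dv_command_name.py (illustrative)


def to_module_name(command_name):
    """Stand-in for a function under test: convert a command name to a module name."""
    return command_name.replace("-", "_")


class TestToModuleName:
    def test_replaces_dashes(self):
        assert to_module_name("dv-dataset-publish") == "dv_dataset_publish"

    def test_leaves_plain_names_unchanged(self):
        assert to_module_name("version") == "version"
```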
Use the following syntax for string interpolation:

```python
name = "John"
f"Hello {name}"
```

Do not use the old `%` syntax, the `.format()` method, or string concatenation with `+` or `+=`. Use string interpolation instead.
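For instance, each of the discouraged forms below has an f-string equivalent:

```python
name = "John"
count = 3

# Discouraged styles:
percent_style = "Hello %s, you have %d messages" % (name, count)
format_style = "Hello {}, you have {} messages".format(name, count)
concatenated = "Hello " + name + ", you have " + str(count) + " messages"

# Preferred style:
interpolated = f"Hello {name}, you have {count} messages"

# All four produce the same string; only the f-string is acceptable here.
assert percent_style == format_style == concatenated == interpolated
```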
# dans-datastation-tools

Command line utilities for Data Station application management

## SYNOPSIS

```
pip3 install dans-datastation-tools

# To find out what a command does, use <command> --help, e.g.:
# dans-bag-validate --help

# Below is a partial list of commands. Commands are grouped by the application they target.
# Commands in a group have the same prefix, so you can use command line completion to find
# all commands in a group. For example, to find all commands targeting Dataverse,
# type `dv-` and press the tab key twice. This will list all commands starting with `dv-`.
# Some commands have subcommands; you can find the subcommands with the --help option.

# DANS bag validation
dans-bag-validate

# Commands targeting Dataverse start with dv-, e.g.:

# Dataverse banner management
dv-banner

# Commands targeting Dataverse datasets start with dv-dataset-, e.g.:
dv-dataset-delete-draft
dv-dataset-destroy
dv-dataset-destroy-migration-placeholder
dv-dataset-find-by-role-assignment

# Dataverse user management
dv-user-import

# Ingest flow management
ingest-flow
```
This module contains a variety of commands to facilitate DANS Data Station management. To find out what a command does, use `<command> --help`. See the comments in the SYNOPSIS section above to get an idea of what commands are available.
Some of the commands targeting Dataverse datasets can be used to process a large number of datasets in a batch. These commands take a trailing argument `pid_or_pids_file`. As the name suggests, this argument can be either a single PID or a file containing a list of PIDs, one PID per line. These commands usually have the following options:

* `--wait-between-items`: the number of seconds to wait between processing each dataset. This is useful to avoid overloading the server.
* `--fail-fast`: fail on the first error. If this option is not given, the command will continue processing the remaining datasets after an error has occurred.
* `--report-file`: the name of a CSV file in which a summary of the results will be written. The file will be created if it does not exist; otherwise it will be overwritten.
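The semantics of these options can be sketched as a processing loop. This is an illustration of the documented behaviour, not the project's actual implementation:

```python
import time


def process_batch(pids, action, wait_between_items=0.0, fail_fast=False):
    """Apply `action` to each PID, honouring the batch options described above."""
    report = []
    for i, pid in enumerate(pids):
        if i > 0 and wait_between_items:
            time.sleep(wait_between_items)  # --wait-between-items: spare the server
        try:
            action(pid)
            report.append((pid, "OK"))
        except Exception as e:
            report.append((pid, f"FAILED: {e}"))
            if fail_fast:
                break  # --fail-fast: stop at the first error
    return report  # summarized as CSV in --report-file (or stdout for '-')
```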
with jq¶The JSON output of this command can be queried with jq
. This tool has a very good
+manual. However, to get you started, here are some example
+queries:
dans-bag-validate <target> > ~/results.json
+
+# Print only the bag location and the violations
+cat results.json
+
+# Count the number of bags that are non-compliant
+cat results.json | jq '[.[] | select(."Is compliant" == false)] | length'
+
+# Get the paths to the *deposits* containing valid bags. Note that "Bag location" is one level too deep, that's why we need to remove the
+# last path element. The detour through to_entries seems necessary to get rid of the array structure around the results.
+cat results.json | select(."Is compliant" == true)] | map(."Bag location") | map(split("/") | .[:-1] | join("/")) | to_entries[] | "\(.value)"
+
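If you prefer Python over jq, the same counting and path queries can be expressed with only the standard library. The field names are taken from the jq examples, and the inline sample data stands in for a real `results.json`:

```python
import json

# Stand-in for ~/results.json, which is produced by dans-bag-validate.
results = json.loads("""
[
  {"Is compliant": true,  "Bag location": "/data/deposit1/bag"},
  {"Is compliant": false, "Bag location": "/data/deposit2/bag"}
]
""")

# Count the number of bags that are non-compliant
non_compliant = len([r for r in results if r["Is compliant"] is False])

# Get the paths to the deposits containing valid bags (drop the last path element)
deposits = [
    "/".join(r["Bag location"].split("/")[:-1])
    for r in results
    if r["Is compliant"] is True
]
```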
This is the recommended way when installing on your own machine.

```
pip3 install --user dans-datastation-tools
```

You may have to add the directory where `pip3` installs the command to the `PATH` manually.
This is useful when installing on a server where the commands need to be shared by multiple users.

```
sudo pip3 install dans-datastation-tools
```
The configuration file is called `.dans-datastation-tools.yml`. Each command starts by looking for this file in the current working directory and then in the user's home directory. If it is not found in either location, it is instantiated with some default and placeholder values in the current directory. It is recommended that you move this file to your home directory. Using the configuration file from the current working directory is mainly useful for development.

For the available configuration options and their meaning, see the explanatory comments in the configuration file itself.
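The lookup order described above amounts to a small search function. This sketch only illustrates the documented behaviour; the real logic lives in `datastation.common.config`, whose internals may differ:

```python
import os


def find_config(filename=".dans-datastation-tools.yml", exists=os.path.exists):
    """Return the path of the config file: current working directory first, then home."""
    for directory in (os.getcwd(), os.path.expanduser("~")):
        path = os.path.join(directory, filename)
        if exists(path):
            return path
    return None  # caller then creates a file with placeholder values in the cwd
```

The `exists` parameter is injectable purely so the behaviour is easy to test.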