
Using the Pusher

jamclar edited this page May 27, 2022 · 3 revisions

The program is the same for every user, but runs with different settings depending on who is using it.

The user first installs the program by following the installation instructions in the README file, which can be found here: Readme

Once installed, the program is run entirely from the command line: the user starts it and supplies the command line options at the same time.

The program takes input from the user in two forms: the command line options and the configuration file (detailed here: About the configuration file). The command line options are the settings that can be expected to change in the short term, whereas the configuration file holds settings that are not expected to change often.

Command line options

Deployment ID

The deployment ID identifies the deployment whose glider files are being sent to the BODC Archive API. It is used in two places within the program:

  • Before any files are sent, the program calls the holdings endpoint, which returns all the filenames already archived for that deployment. This is used to show the user how many files are already there and to check whether any of the new files would be duplicates.
  • When a file is sent, the deployment ID routes it to the correct location within the archive.
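The duplicate check can be pictured as a comparison between the local filenames and the filenames returned by the holdings endpoint. The Python sketch below is a minimal illustration, assuming duplicates are matched by filename alone; `partition_files` and the sample filenames are hypothetical and not part of the pusher.

```python
from typing import Iterable, List, Tuple


def partition_files(
    local_files: Iterable[str], archived_files: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split local filenames into (to_send, duplicates).

    `archived_files` stands in for the filenames returned by the holdings
    endpoint for one deployment ID (hypothetical input shape).
    """
    archived = set(archived_files)
    to_send: List[str] = []
    duplicates: List[str] = []
    for name in local_files:
        # A file counts as a duplicate if the archive already holds that filename.
        (duplicates if name in archived else to_send).append(name)
    return to_send, duplicates


holdings = ["glider_001.dat", "glider_002.dat"]  # example holdings response
local = ["glider_001.dat", "glider_003.dat"]     # example files in the data directory
to_send, duplicates = partition_files(local, holdings)
print(f"{len(holdings)} files already archived for this deployment")
print("to send:", to_send)
print("duplicates:", duplicates)
```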

Data Directory

The data directory is the folder, on the machine running the pusher, that contains the glider files. The program searches this directory either recursively or non-recursively (see Recursive below). The data directory must be a valid path, for example '/users/documents/glider-folder/folder1/'; the program will not work correctly otherwise.
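An invalid data directory is easiest to deal with if it is caught before anything else happens. A minimal Python sketch of such a check; `validate_data_directory` is a hypothetical helper, not the pusher's own code.

```python
from pathlib import Path


def validate_data_directory(raw: str) -> Path:
    """Return the data directory as an absolute Path, or raise if it is unusable."""
    # Expand "~" and make the path absolute so later messages show the full location.
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(
            f"Data directory does not exist or is not a directory: {path}"
        )
    return path
```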

Config File

The path to the configuration file described above must be passed on the command line. The program reads the file at that location and uses it to set up the options for the rest of the run, so the path must be valid, for example '/users/documents/config/config_file.json'.
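Loading the config file could look roughly like the Python sketch below. The keys in `REQUIRED_KEYS` are placeholders, since the real keys are described on the About the configuration file page; `load_config` is likewise hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical required keys: the real keys are defined on the
# "About the configuration file" page and may differ.
REQUIRED_KEYS = ("client_id", "api_url")


def load_config(config_path: str) -> Dict[str, Any]:
    """Read a JSON config file and check that the expected keys are present."""
    path = Path(config_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    if not isinstance(config, dict):
        raise ValueError("Config file must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Config file is missing keys: {', '.join(missing)}")
    return config
```

Failing early with a clear message is the point of the design: a bad path or a missing key stops the run before authentication starts.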

Production

Users can send files either to the production archive or to the testing archive, as determined by the production flag. The flag can be used in two ways:

  • --production (send files to the production archive)
  • --non-production (send files to the testing archive)

The default is --non-production, so if the user omits this flag, files are sent to the testing archive.

Dry Run

A dry run simulates a full file send, giving the user a clear picture of which files would go to the archive in a real run of the program. Running in dry run mode will:

  • Inform the user that the program is running in dry run mode
  • Call the API to show how many files are currently in the archive for the given deployment ID
  • Search the data directory for glider files and indicate whether each file already exists in the archive
  • Indicate that any file not already in the archive would be sent
  • Once all files are processed, show a summary of how many files would have been sent, along with a summary of any duplicates

The flag can be used in two ways:

  • --dry-run (perform a dry run)
  • --no-dry-run (perform a real send of files)

The default is --no-dry-run, so if the user omits this flag, a full send of files takes place.
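The two modes differ only in what happens to a file that is not yet in the archive. This Python sketch assumes duplicates are skipped in both modes and that the real upload is supplied as a `send` callable; `process_files` and `send` are hypothetical names, not the pusher's API.

```python
from typing import Callable, Dict, Iterable, Set


def process_files(
    files: Iterable[str],
    archived: Set[str],
    send: Callable[[str], None],
    dry_run: bool = False,
) -> Dict[str, int]:
    """Send new files, or in dry run mode only report what would be sent."""
    summary = {"sent": 0, "would_send": 0, "duplicates": 0}
    if dry_run:
        print("DRY RUN: no files will be sent")
    for name in files:
        if name in archived:
            # Assumption: duplicates are skipped in both modes.
            print(f"{name}: already in the archive, skipping")
            summary["duplicates"] += 1
        elif dry_run:
            print(f"{name}: would be sent")
            summary["would_send"] += 1
        else:
            print(f"Sending {name} ...")
            send(name)  # hypothetical upload call
            print(f"{name}: sent")
            summary["sent"] += 1
    print(f"Summary: {summary}")
    return summary
```

Passing the upload in as a callable keeps the dry run path identical to the real one up to the point where a file would actually be sent.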

Recursive

The recursive option controls which folders are searched for glider files. Some users have a folder containing subfolders that themselves contain glider files; others have subfolders that they do not want searched. The option supports two modes:

  • --recursive (search the data directory and all of its subfolders)
  • --non-recursive (search only the data directory itself, ignoring any subfolders)

The default is --recursive, so the data directory and all of its subdirectories are searched.
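The difference between the two modes comes down to whether subfolders are walked. A minimal Python sketch using `pathlib`, assuming every regular file in the searched folders counts as a glider file (the page does not say how glider files are recognised); `find_glider_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def find_glider_files(data_directory: str, recursive: bool = True) -> List[Path]:
    """List regular files in data_directory, optionally descending into subfolders."""
    root = Path(data_directory)
    # "**/*" matches files at every depth; "*" only matches the top level.
    pattern = "**/*" if recursive else "*"
    # Subfolders themselves are matched by the pattern, so keep files only.
    return sorted(p for p in root.glob(pattern) if p.is_file())
```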

Once the command line options are established, the program can be run as in the example below:

bodc-archive-pusher start --deployment-id "123" --data-directory "/data/dep-123" --config-file "/data/config.json" --production --no-dry-run

The example above does the following:

  • 'start' indicates that a send of files is being started
  • '123' is used as the deployment ID
  • the glider files are in "/data/dep-123"
  • the config file is located at "/data/config.json"
  • files are sent to the production archive API
  • a real send of files is performed, not a dry run
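To make the documented defaults concrete, here is a hypothetical re-creation of the `start` options using Python's `argparse`. It is not the pusher's actual parser; it only shows how each on/off pair can write to a single setting with the default described on this page. Because the negative flags are spelled `--non-production` and `--non-recursive` rather than with a `--no-` prefix, each pair is declared as two explicit flags instead of using `argparse.BooleanOptionalAction`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the `start` options described on this page."""
    parser = argparse.ArgumentParser(prog="bodc-archive-pusher")
    commands = parser.add_subparsers(dest="command", required=True)

    start = commands.add_parser("start", help="start a send of files")
    start.add_argument("--deployment-id", required=True)
    start.add_argument("--data-directory", required=True)
    start.add_argument("--config-file", required=True)

    # Each on/off pair writes to a single destination; the last flag given wins.
    start.add_argument("--production", dest="production", action="store_true")
    start.add_argument("--non-production", dest="production", action="store_false")
    start.add_argument("--dry-run", dest="dry_run", action="store_true")
    start.add_argument("--no-dry-run", dest="dry_run", action="store_false")
    start.add_argument("--recursive", dest="recursive", action="store_true")
    start.add_argument("--non-recursive", dest="recursive", action="store_false")

    # Documented defaults: testing archive, real send, recursive search.
    start.set_defaults(production=False, dry_run=False, recursive=True)
    return parser


args = build_parser().parse_args(
    ["start", "--deployment-id", "123", "--data-directory", "/data/dep-123",
     "--config-file", "/data/config.json", "--production", "--no-dry-run"]
)
print(args.deployment_id, args.production, args.dry_run, args.recursive)
# prints: 123 True False True
```

Parsing the example command gives a production, non-dry-run, recursive send; recursive is True only because it is the default, since the example does not mention it.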

The first output on screen is the authentication step, which must be completed before any files are sent; once it is complete, the program proceeds to the next step. The user is shown two pieces of information, along with a warning that the request times out after 900 seconds:

  • A URL to use for authentication (this is powered by Auth0; the link opens a webpage that asks for a verification code)
  • A code to use for verification (this can be copied and pasted into the webpage, after which the user clicks 'confirm')
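The URL-plus-code prompt matches the OAuth 2.0 Device Authorization Grant (RFC 8628), which Auth0 supports. Assuming the pusher uses that flow, the waiting step could be sketched in Python as below. The response field names come from RFC 8628; `wait_for_authorisation` and `request_token` (a stand-in for the HTTP call to the token endpoint) are hypothetical, and the 900-second fallback only mirrors the timeout mentioned above.

```python
import time
from typing import Any, Callable, Dict


def wait_for_authorisation(
    device_response: Dict[str, Any],
    request_token: Callable[[str], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Show the verification URL and code, then poll until the user confirms.

    `device_response` uses RFC 8628 field names. `request_token` is a
    hypothetical callable that posts the device code to the token endpoint
    and returns the decoded JSON body.
    """
    print(f"Visit {device_response['verification_uri']}")
    print(f"and enter the code {device_response['user_code']}")
    expires_in = float(device_response.get("expires_in", 900))
    print(f"This request times out after {int(expires_in)} seconds.")

    interval = float(device_response.get("interval", 5))  # RFC 8628 default is 5 s
    deadline = clock() + expires_in
    while clock() < deadline:
        sleep(interval)
        result = request_token(str(device_response["device_code"]))
        error = result.get("error")
        if error is None:
            return result  # token response from the authorisation server
        if error == "authorization_pending":
            continue  # the user has not confirmed yet
        if error == "slow_down":
            interval += 5  # RFC 8628: back off by 5 seconds
            continue
        raise RuntimeError(f"Authentication failed: {error}")
    raise TimeoutError("Authentication was not completed in time")
```

Injecting `sleep` and `clock` keeps the polling loop testable without waiting in real time.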

The user then sees information confirming the running mode of the program: whether a dry run is taking place, and whether the program is running against production. Before any files are processed, the holdings API endpoint is called to show how many files are currently in the archive for the deployment.

What happens next depends on the dry run setting:

  • Not a dry run: the program attempts to send each file, showing a message before each send and after each successful send. Any errors encountered are also shown.
  • Dry run: the program goes through each file in the data directory, checks whether it already exists in the archive, and tells the user that the file would have been sent if not in dry run mode.

Two types of log are maintained throughout the process:

  • Save file logs (populated with a date/time and filename only when a glider file is successfully sent to the archive)
  • System logs (used to keep track of all system events, such as errors, running modes and dry runs)
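One way to keep the two logs separate is to use two named loggers from Python's standard `logging` module, each writing to its own file. The logger names and file names below are hypothetical; the page does not say where the pusher writes its logs.

```python
import logging
from typing import Tuple


def setup_logging(
    save_log_path: str = "save_file.log",   # hypothetical file name
    system_log_path: str = "system.log",    # hypothetical file name
) -> Tuple[logging.Logger, logging.Logger]:
    """Create a system logger and a separate save-file logger."""
    # System log: every event, with a level so errors stand out.
    system_log = logging.getLogger("pusher.system")
    system_log.setLevel(logging.DEBUG)
    system_handler = logging.FileHandler(system_log_path, encoding="utf-8")
    system_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    system_log.addHandler(system_handler)

    # Save file log: only a timestamp and the filename of each successful send.
    save_log = logging.getLogger("pusher.saved_files")
    save_log.setLevel(logging.INFO)
    save_log.propagate = False  # keep these records out of any root handlers
    save_handler = logging.FileHandler(save_log_path, encoding="utf-8")
    save_handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    save_log.addHandler(save_handler)
    return system_log, save_log
```

A successful send would then be recorded with `save_log.info(filename)`, while errors and running-mode messages go to `system_log`. Calling `setup_logging` more than once attaches extra handlers, so it is meant to run once at start-up.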

Stopping a deployment

It is also possible to stop a deployment that is in progress. To do this, the user needs the deployment ID and config file of the deployment that is currently running.

If this command is used to start a deployment:

bodc-archive-pusher start --deployment-id "123" --data-directory "/data/dep-123" --config-file "/data/config.json" --production --no-dry-run

Then the following command can be run in a separate terminal window to stop it:

bodc-archive-pusher stop --deployment-id "123" --config-file "/data/config.json"
