Data DJ is a value-adding service for collections and archives, initially conceived at ETH Library Lab and currently in development at ETH Library. It provides more convenient and efficient access to batches of digitised records and files. The service works in conjunction with collections' existing websites and search portals. The collection's website forwards the user's request for a list of files to the Data DJ, which gathers and compresses the files and then notifies the user by email with a convenient download link.
The sample DataDJ application accepts requests at https://dj-api-ucooq6lz5a-oa.a.run.app/. The requests presented throughout this README are written for the Visual Studio Code REST Client extension, but they can easily be adapted for other API clients or curl.
If you are planning to work on this project, contact us to ask for the detailed internal documentation.
Edit the request below to include your email address and the list of files that you want to download (note the included file paths). Additionally, meta information can be included using the meta field. The endpoint can also be called using curl. Once the files have been gathered and compressed, you should receive an email with the download link. This endpoint should be called by a data collection, forwarding the files requested by a user and specifying the user's email address. Please note that the archiveID remains empty in the current iteration of the service.
Example:
POST https://dj-api-ucooq6lz5a-oa.a.run.app/archive
Content-Type: application/json
Authorization: Bearer service_key
{
"email": "[email protected]",
"archiveID": "",
"content": [
{
"sourceID": "0ff529e3",
"files": ["/test/dir/file1", "/test/dir/file2"]
},
{
"sourceID": "eba48cdb",
"files": ["/test/dir/file3", "/test/dir/file4"]
}],
"meta": "{meta: information}"
}
To check that the API is reachable, call the ping endpoint:
GET https://dj-api-ucooq6lz5a-oa.a.run.app/ping
An admin can task the DJ to generate a new service token/key and to send an email with a redeem link to the specified email address. The service key is required by collections to interact with the DJ for anything related to creating and altering archives.
POST https://dj-api-ucooq6lz5a-oa.a.run.app/admin/createKeyLink
Content-Type: application/json
Authorization: Bearer admin_key
{
"email": "[email protected]"
}
A task handler is the part of the DataDJ responsible for gathering and compressing the requested files and for sending an email containing a download link to the user who requested them. To interact with the API part of the DataDJ, the task handler requires a handler token/key, similar to a service key. This key can be generated by an admin via the following request and, for now, has to be handed to the operator of the task handler manually.
POST https://dj-api-ucooq6lz5a-oa.a.run.app/admin/registerHandler
Content-Type: application/json
Authorization: Bearer admin_key
A source represents a collection holding files to be downloaded. It serves to identify which files have to be gathered from where, and to keep track of the origin of every file, so that each source's contribution to the final archive can be reviewed. The registration request returns a source ID, which must subsequently be used to uniquely identify the source when interacting with the DataDJ.
POST https://dj-api-ucooq6lz5a-oa.a.run.app/source
Content-Type: application/json
Authorization: Bearer service_key
{
"name": "Test-Source-One",
"Organisation": "ETHZ"
}
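The bookkeeping described above, knowing which source contributed which files, can be pictured with a small sketch. Assuming the request shape used by the /archive examples (a list of sourceID/files pairs), the hypothetical function `contributions` derives the distinct contributing sources, in the order they first appear, and the number of files each one contributes. This illustrates the idea only; it is not the DataDJ's internal code.

```go
package main

import "fmt"

// SourceFiles pairs a source ID with the paths of the files it holds.
type SourceFiles struct {
	SourceID string   `json:"sourceID"`
	Files    []string `json:"files"`
}

// contributions returns the distinct source IDs in first-seen order and the
// number of files each source contributes. Hypothetical helper.
func contributions(content []SourceFiles) ([]string, map[string]int) {
	order := []string{}
	counts := make(map[string]int)
	for _, c := range content {
		if _, seen := counts[c.SourceID]; !seen {
			order = append(order, c.SourceID)
		}
		// Adding zero still records the key, so empty entries are tracked too.
		counts[c.SourceID] += len(c.Files)
	}
	return order, counts
}

func main() {
	order, counts := contributions([]SourceFiles{
		{SourceID: "0ff529e3", Files: []string{"/test/dir/file1", "/test/dir/file2"}},
		{SourceID: "eba48cdb", Files: []string{"/test/dir/file3", "/test/dir/file4"}},
	})
	for _, id := range order {
		fmt.Printf("%s contributes %d file(s)\n", id, counts[id])
	}
}
```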
https://dj-api-ucooq6lz5a-oa.a.run.app/archive
This endpoint expects a request that contains four fields:
{
"email":"",
"archiveID":"",
"content":[],
"meta": ""
}
email, archiveID and meta are strings, whereas content is a list of objects, each consisting of a sourceID and files, a list of strings containing the paths of the files to be gathered from that source.
Depending on which fields are left empty, the API triggers different operations. For now only option 4 is being used in tests, whereas the other options are kept for future use.
option 1: both email and archiveID are left empty, whereas content contains the files the archive should be initialised with.
Example:
POST https://dj-api-ucooq6lz5a-oa.a.run.app/archive
Content-Type: application/json
Authorization: Bearer service_key
{
"email": "",
"archiveID": "",
"content": [
{
"sourceID": "0ff529e3",
"files": ["/test/dir/file1", "/test/dir/file2"]
},
{
"sourceID": "eba48cdb",
"files": ["/test/dir/file3", "/test/dir/file4"]
}],
"meta": "{meta: information}"
}
option 2: email is left empty, archiveID contains the identifier of a previously created archive, and content contains the files you want to add to the archive.
Example:
POST https://dj-api-ucooq6lz5a-oa.a.run.app/archive
Content-Type: application/json
Authorization: Bearer service_key
{
"email": "",
"archiveID": "e01fd941",
"content": [
{
"sourceID": "0ff529e3",
"files": ["/test/dir/file1", "/test/dir/file2"]
},
{
"sourceID": "eba48cdb",
"files": ["/test/dir/file3", "/test/dir/file4"]
}],
"meta": "{meta: information}"
}
option 3: email contains the email address the download link is sent to, archiveID specifies the archive you want to download, and content is left empty. The DataDJ will send you a link that allows you to download the archive as a .zip file.
Example:
POST https://dj-api-ucooq6lz5a-oa.a.run.app/archive
Content-Type: application/json
Authorization: Bearer service_key
{
"email": "[email protected]",
"archiveID": "e01fd941",
"content": [],
"meta": ""
}
option 4: email contains the email address the download link is sent to, archiveID is left empty, and content contains the files you want to download.
The DJ creates an archive of the files in the request and also returns its identifier in the response, in case the archive needs to be accessed or modified later on. There is no need to trigger the notification containing the download link separately, as this happens automatically.
Example:
POST https://dj-api-ucooq6lz5a-oa.a.run.app/archive
Content-Type: application/json
Authorization: Bearer service_key
{
"email": "[email protected]",
"archiveID": "",
"content": [
{
"sourceID": "0ff529e3",
"files": ["/test/dir/file1", "/test/dir/file2"]
},
{
"sourceID": "eba48cdb",
"files": ["/test/dir/file3", "/test/dir/file4"]
}],
"meta": "{meta: information}"
}
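Since the four cases above differ only in which fields are empty, the dispatch can be summarised as a small decision function. This is a hedged sketch of the rules as described here, not the DataDJ's actual handler: the `Operation` names and `classify` are hypothetical, and treating any other combination of fields as invalid is an assumption.

```go
package main

import "fmt"

// SourceFiles and ArchiveRequest mirror the /archive request body.
type SourceFiles struct {
	SourceID string
	Files    []string
}

type ArchiveRequest struct {
	Email     string
	ArchiveID string
	Content   []SourceFiles
	Meta      string
}

// Operation values are numbered to match the options described above (hypothetical names).
type Operation int

const (
	Invalid           Operation = iota // no matching option (assumed to be rejected)
	CreateArchive                      // option 1: no email, no archiveID, files
	AddFiles                           // option 2: archiveID and files, no email
	DownloadArchive                    // option 3: email and archiveID, no files
	CreateAndDownload                  // option 4: email and files, no archiveID
)

// hasFiles reports whether any source in the request lists at least one file.
func hasFiles(content []SourceFiles) bool {
	for _, c := range content {
		if len(c.Files) > 0 {
			return true
		}
	}
	return false
}

// classify maps a request to an operation based on which fields are empty.
func classify(r ArchiveRequest) Operation {
	email, id, files := r.Email != "", r.ArchiveID != "", hasFiles(r.Content)
	switch {
	case !email && !id && files:
		return CreateArchive
	case !email && id && files:
		return AddFiles
	case email && id && !files:
		return DownloadArchive
	case email && !id && files:
		return CreateAndDownload
	default:
		return Invalid
	}
}

func main() {
	req := ArchiveRequest{
		Email:   "user@example.com", // placeholder address
		Content: []SourceFiles{{SourceID: "0ff529e3", Files: []string{"/test/dir/file1"}}},
	}
	fmt.Println("option", int(classify(req))) // prints "option 4"
}
```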
Currently, the /archive endpoint returns an object describing the order that was created for the archive in question. Orders are objects telling the task handlers which archives should be downloaded.
{
"orderID": "a5777ffb",
"archiveID": "4afc3f67",
"email": "[email protected]",
"date": "2022-12-14 16:27:28.967665178 +0000 UTC m=+67114.216955617",
"status": "opened",
"sources": [
"0ff529e3"
]
}
https://dj-api-ucooq6lz5a-oa.a.run.app/archive/id
This endpoint lets you inspect the contents of the archive with the given id, either in the browser or via an API client. The response is a JSON object representing the archive.
Example:
GET https://dj-api-ucooq6lz5a-oa.a.run.app/archive/a2e11165
Content-Type: application/json
Authorization: Bearer service_key
Example Response:
{
"id": "a2e11165",
"content": [
{
"sourceID": "0ff529e3",
"files": [
"/test/dir/file1",
"/test/dir/file2"
]
},
{
"sourceID": "eba48cdb",
"files": [
"/test/dir/file3",
"/test/dir/file4"
]
}
],
"meta": "{meta: information}",
"timeCreated": "2022-12-09 13:31:43.320372 +0100 CET m=+305.508934168",
"timeUpdated": "",
"status": "opened",
"sources": [
"0ff529e3",
"eba48cdb"
]
}
- make a copy of .env.example and save it as .env.local
- replace the example directory paths, bucket names and other settings as needed.
option a: run with go
download and run the redis image with docker
docker pull redis
docker run --name dj-redis -p 6379:6379 -d redis
start the task handler
open a terminal in project root.
export all of the variables in the .env.local file
run the task handler
source .env.local
export $(cut -d= -f1 .env.local)
go run ./taskHandler/*.go
open a separate terminal in project root.
export all of the variables in the .env.local file
run the api
source .env.local && export $(cut -d= -f1 .env.local)
go run ./api/*.go
note that for any changes in the environment file to take effect, you must export the variables again and restart that part of the application.
option b: (to be completed)
to run the publisher and subscriber applications using docker, include the path to the .env.local file in the docker run command:
docker run --env-file=./.env.local -p 8080:8080 data-dj-image
docker build --platform=linux/amd64 -f Dockerfile.api -t dj-api-amd64:0.0.1 .
docker tag dj-api-amd64:0.0.1 europe-west6-docker.pkg.dev/data-dj-2021/dj-docker-repo/dj-api:0.0.1
docker push europe-west6-docker.pkg.dev/data-dj-2021/dj-docker-repo/dj-api:0.0.1
- Follow instructions: https://zahadum.notion.site/Google-Cloud-4c32dcbe1cfb4b479e8680e852ef0d84
curl -X POST "0.0.0.0:8765/admin/createKeyLink" \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"email":"[email protected]"}'
- generates a token
- saves the hashed token in mongo
- a middleware function validates the token during requests
set the mongo collection to delete a document a given number of seconds after the date stored in the indexed field (a TTL index; the field must hold a date value).
This does not apply if the indexed field is missing from the document: e.g. a document that has no expiryRequestedDate will not be deleted.
db.apiKeys.createIndex( { "expiryRequestedDate": 1 }, { expireAfterSeconds: 3600 } )
- Learning Go by Jon Bodner: general reference for programming in Go (types, syntax, imports etc.); see Ch. 13 for writing tests
- http://www.inanzzz.com/index.php/post/g7e8/running-mongodb-migration-script-at-the-docker-startup