BitStore is a DataHub microservice for storing blobs, i.e. files. It is a lightweight auth wrapper around an S3-compatible object store that integrates with the rest of the DataHub stack, especially the auth service.
make install
make test
python server.py
AUTH_SERVER
- the fully qualified (FQ) URL of the auth server. Used for looking up the public key needed to communicate with the auth server.
Object store
- connection info for the underlying S3-style object store service: STORAGE_ACCESS_KEY_ID, STORAGE_SECRET_ACCESS_KEY, STORAGE_BUCKET_NAME
STORAGE_PATH_PATTERN
- pattern for generating the storage path in the object store for a given file. That is, object_store_path = make_path(STORAGE_PATH_PATTERN.format(**fileinfo)). May contain any format string available for a file in the authorize API, including:
  - {path} (relative path to the file in the package)
  - {basename}, which is the filename, extracted from {path}
  - {dirname}, which is the directory name, extracted from {path}
  - {extension}, which is the extension of the filename
  - {md5} (and {md5_hex}, which is the md5 in hex form)
Note: in addition to the file info, the owner and dataset (name) are available as {owner} and {dataset}. Examples:
- custom/path/{owner}/{dataset}/{path}: given {owner: datahq, name: datax, path: data/file.csv}, this ends up as custom/path/datahq/datax/data/file.csv
- {md5}: the storage path is the md5 hash of the file (assuming the md5 hash is provided)
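To make the placeholder list concrete, here is a minimal Python sketch of how the format fields could be derived for one file and substituted into the pattern. It is not BitStore's code: storage_path_fields and storage_path are hypothetical names, storage_path leaves out whatever make_path does, and it assumes that {md5} is the base64 digest sent in the authorize request and that {extension} keeps its leading dot.

```python
import base64
import os.path


def storage_path_fields(owner, dataset, path, md5=None):
    """Collect the format fields for one file (hypothetical helper).

    Assumptions: `md5` is the base64-encoded digest from the authorize
    request, and `extension` keeps its leading dot (as splitext returns it).
    """
    fields = {
        "owner": owner,
        "dataset": dataset,
        "path": path,
        "basename": os.path.basename(path),
        "dirname": os.path.dirname(path),
        "extension": os.path.splitext(path)[1],
    }
    if md5 is not None:
        fields["md5"] = md5
        # Same digest, re-encoded as lowercase hex.
        fields["md5_hex"] = base64.b64decode(md5).hex()
    return fields


def storage_path(pattern, **fileinfo):
    """Fill STORAGE_PATH_PATTERN-style placeholders (make_path step omitted)."""
    return pattern.format(**storage_path_fields(**fileinfo))
```

With the first example pattern, storage_path("custom/path/{owner}/{dataset}/{path}", owner="datahq", dataset="datax", path="data/file.csv") returns custom/path/datahq/datax/data/file.csv.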
Note: the permissions requested from the auth server will look like:
permissions: datapackage-upload
service: SERVICE_NAME (a configuration value, e.g. 'rawstore')
/authorize
Method: POST
Query Parameters:
jwt
- permission token (received from /user/authorize)
Headers:
Auth-Token
- permission token (can be used instead of the jwt query parameter)
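Because the token can be passed either way, a client can build all of its requests with one helper. The sketch below uses only Python's standard library; BITSTORE_URL and make_request are hypothetical, and the JSON Content-Type header on POST bodies is an assumption rather than a documented requirement.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL of a BitStore deployment.
BITSTORE_URL = "http://localhost:8000"


def make_request(endpoint, token, params=None, body=None, use_header=True):
    """Build (but do not send) a request carrying the permission token.

    With use_header=True the token goes in the Auth-Token header;
    otherwise it is sent as the `jwt` query parameter. A `body` dict is
    JSON-encoded, which makes urllib issue a POST instead of a GET.
    """
    query = dict(params or {})
    headers = {}
    if use_header:
        headers["Auth-Token"] = token
    else:
        query["jwt"] = token
    url = BITSTORE_URL + endpoint
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumption
    return urllib.request.Request(url, data=data, headers=headers)
```

The request is then sent with urllib.request.urlopen(make_request(...)).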
Body:
JSON content with the following structure:
{
  "metadata": {
    "owner": "<user-id-of-uploader>",
    "name": "<data-set-unique-id>"
  },
  "filedata": {
    "<relative-path-to-file-in-package-1>": {
      "length": 1234, # length of the data in bytes
      "md5": "<md5-hash-of-the-data>",
      "type": "<content-type-of-the-data>",
      "name": "<file-name>"
    },
    "<relative-path-to-file-in-package-2>": {
      "length": 4321,
      "md5": "<md5-hash-of-the-data>",
      "type": "<content-type-of-the-data>",
      "name": "<file-name>"
    },
    ...
  }
}
The owner field must match the userid in the authentication token.
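A client has to compute the per-file metadata before calling /authorize. This hypothetical sketch builds the body from in-memory file contents; the owner passed in must be the userid carried in the token. It assumes the md5 is sent base64-encoded (the encoding S3 expects for Content-MD5) and falls back to application/octet-stream when no content type can be guessed from the file extension.

```python
import base64
import hashlib
import mimetypes
import os.path


def file_entry(path, content):
    """Describe one file for the `filedata` map (hypothetical helper).

    Assumption: md5 is base64-encoded, as in an S3 Content-MD5 header.
    """
    content_type, _ = mimetypes.guess_type(path)
    return {
        "length": len(content),  # size in bytes
        "md5": base64.b64encode(hashlib.md5(content).digest()).decode("ascii"),
        "type": content_type or "application/octet-stream",
        "name": os.path.basename(path),
    }


def authorize_body(owner, dataset, files):
    """Build the /authorize body; `files` maps relative paths to raw bytes."""
    return {
        "metadata": {"owner": owner, "name": dataset},
        "filedata": {path: file_entry(path, data) for path, data in files.items()},
    }
```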
Returns:
Signed URLs for uploading the files into S3:
{
  "fileData": {
    "<file-name-1>": {
      "md5-hash": "...",
      "name": "<file-name>",
      "type": "<file-type>",
      "upload_query": {
        "Content-MD5": "...",
        "Content-Type": "...",
        "acl": "public-read",
        "key": "<path>",
        "policy": "...",
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": "...",
        "x-amz-date": "<date-time-in-ISO-8601>",
        "x-amz-signature": "..."
      },
      "upload_url": "<s3-url>",
      "exists": true/false
    },
    "<file-name-2>": ...,
    ...
  }
}
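Once the signed URLs come back, each file is uploaded with a form-style POST to upload_url carrying the upload_query fields. The helper below is hypothetical and assumes that "exists": true means the object is already stored and can be skipped; it only decides what still needs uploading and does not perform the upload itself.

```python
def pending_uploads(response):
    """List (name, upload_url, form_fields) for files not yet stored.

    Assumption: a true "exists" flag means the object is already in the
    store. S3's POST upload ignores form fields placed after the file, so
    the file part must be appended after these fields when sending.
    """
    plan = []
    for name, info in response["fileData"].items():
        if info.get("exists"):
            continue
        plan.append((name, info["upload_url"], dict(info["upload_query"])))
    return plan
```

With the third-party requests library, a call such as requests.post(url, data=fields, files={"file": fh}) writes the form fields before the file part, which matches that ordering requirement.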
/info
Method: GET
Query Parameters:
jwt
- permission token (received from /user/authorize)
Headers:
Auth-Token
- permission token (can be used instead of the jwt query parameter)
Returns:
JSON content with the following structure:
{
  "prefixes": [
    "https://datastore.openspending.org/123456789"
  ]
}
prefixes is the list of possible URL prefixes for files uploaded by this user.
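Since prefixes tells a client where its uploaded files can live, a client might check a URL against that list. A minimal sketch with hypothetical function names; requiring a "/" right after the prefix is a design choice so that .../123456789 does not also match .../1234567890.

```python
import json


def parse_prefixes(payload):
    """Extract the prefix list from a raw /info response body."""
    return json.loads(payload)["prefixes"]


def is_user_file(url, prefixes):
    """True if `url` equals a prefix or lies under it (path boundary enforced)."""
    return any(url == p or url.startswith(p.rstrip("/") + "/") for p in prefixes)
```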
/presign
Method: GET
Query Parameters:
jwt
- permission token (received from /user/authorize)
url
- original URL of the S3 object
ownerid
- ID of the authenticated user
Headers:
Auth-Token
- permission token (can be used instead of the jwt query parameter)
Returns:
The original or a pre-signed S3 URL:
{
  "url": "https://s3.amazonaws.com/rawstore/ownername/dataset/mydata.csv?x=y"
}
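A hypothetical client-side sketch for /presign: it passes the token as the jwt query parameter alongside url and ownerid, then reads the url field from the response body. BITSTORE_URL is an assumed base URL, and both function names are illustrative only.

```python
import json
import urllib.parse

# Hypothetical base URL of a BitStore deployment.
BITSTORE_URL = "http://localhost:8000"


def presign_url(object_url, owner_id, token):
    """Build the /presign query URL for one S3 object."""
    query = urllib.parse.urlencode({"jwt": token, "url": object_url, "ownerid": owner_id})
    return f"{BITSTORE_URL}/presign?{query}"


def signed_url_from(payload):
    """Pull the (original or pre-signed) URL out of a /presign response body."""
    return json.loads(payload)["url"]
```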