aworkman edited this page Jul 16, 2014 · 13 revisions

Content Hosting Service Documentation

For the purposes of this project, this service contains the following endpoints:

  • Search
  • Advanced Search
  • GetKeys
  • GetMetadata
  • SetMetadata
  • GetParadata
  • SetUserReviewParadata
  • SetUserCommentParadata
  • UploadFile
  • DownloadFile
  • DeleteFile
  • DeleteAll
  • GetAllVersions

Installation

Install Riak

Riak 2.0.0pre20 Download page

Download the Riak package relevant to your operating system and architecture. Install according to the instructions.

Install the Prototype

$ git clone https://github.com/adlnet/Content-Hosting-Service.git
$ cd content-repo
$ pip install virtualenv
$ virtualenv env
$ source env/bin/activate
(env)$ pip install -r requirements.txt

to leave the virtualenv:

(env)$ deactivate

Running

to start riak:

$ sudo riak start

to start the webserver:

$ cd content-repo
$ source env/bin/activate
(env)$ python main.py

Stopping

to stop riak:

$ sudo riak stop

to stop the webserver:

(env)$^C

Re-Starting

to restart riak:

$ sudo riak restart

Riak Search

To search the objects stored as part of the Riak Key-Object pairs, search must be enabled. Enable it in the app.config file located in the /etc/riak/ directory, on every node in your cluster:

{riak_search, [
               {enabled, true}
              ]},

Riak must then be restarted for search to take effect. Next, from the command line, install the key-value search hook on each bucket you want indexed for search:

$ search-cmd install yourbucketname
 :: Installing Riak Search <--> KV hook on bucket 'yourbucketname'.

To run a test search against the objects in a bucket, use the following command:

$ search-cmd search yourbucketname "some query"

How results are formatted:

 :: Searching for 'some query' / '' in yourbucketname...

------------------------------

index/id: yourbucketname/key_1
p -> [0]
score -> 0.35355339059327373

------------------------------

index/id: yourbucketname/key_2
p -> [0]
score -> 0.35355339059327373

------------------------------


 :: Found 2 results.

Each result is represented as a tuple, as follows:

index/id translates to: result[0] / result[1]

{p: [1], score: [1]} translates to: result[2]

Search queries:

Search queries are formed according to Apache Lucene Query Parser Syntax. Since this project uses JSON documents as objects, the field names of the JSON object are used as index field names. Nested objects will use underscore ('_') as a field name separator. (The underscore was chosen because the character is not currently reserved by Lucene syntax.)

For example, storing the following JSON object in a Search-enabled bucket:

{
 "name":"Joe Coder",
 "bio":"I'm an engineer, making awesome things.",
 "favorites":{
              "website":"Reddit",
              "language":"Python"
             }
}

This would cause four fields to be indexed: "name", "bio", "favorites_website", and "favorites_language". You could later query this data with a query such as "bio:engineer AND favorites_language:Python".
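The flattening rule described above can be sketched in a few lines of Python. This only illustrates the naming scheme and is not Riak's own extractor; the helper name flatten_fields is hypothetical, and the handling of lists or other value types is not covered.

```python
def flatten_fields(doc, prefix=""):
    """Flatten nested dicts into underscore-joined field names."""
    fields = {}
    for name, value in doc.items():
        # Nested names are joined to their parent with an underscore.
        field = prefix + "_" + name if prefix else name
        if isinstance(value, dict):
            fields.update(flatten_fields(value, field))
        else:
            fields[field] = value
    return fields


profile = {
    "name": "Joe Coder",
    "bio": "I'm an engineer, making awesome things.",
    "favorites": {"website": "Reddit", "language": "Python"},
}

print(sorted(flatten_fields(profile)))
# ['bio', 'favorites_language', 'favorites_website', 'name']
```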

Using Search in the Code:

Enabling search on a bucket (Python):

import riak

client = riak.RiakClient(pb_port=8087, protocol='pbc')
my_db = client.bucket('my_riak_bucket')
my_db.enable_search()

Executing a search (Python/JSON):

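# Reuses the client created in the previous example.
keys_found = {'results': []}
search_query = 'field:term'
search_results = client.search('my_riak_bucket', search_query)

# result[0] is the index (bucket) and result[1] is the id (key).
for result in search_results.run():
    result_key = result[1]
    keys_found['results'].append({'key': result_key})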

Usage

This is a very simple API. It exposes several endpoints:

  • /CHS/keys/ for checking the keys stored in the database,
  • /CHS/search/ that sends and receives JSON documents to conduct a search of the database using one term,
  • /CHS/advanced_search/ that sends and receives JSON documents to conduct an advanced, more specific search of the database using three terms,
  • /CHS/get_metadata/{id} that sends and receives JSON documents to retrieve a file's metadata, where {id} is the filename,
  • /CHS/set_metadata/{id} that sends and receives JSON documents to set a file's metadata, where {id} is the filename,
  • /CHS/get_paradata/{id} that sends and receives JSON documents to retrieve a file's paradata, where {id} is the filename,
  • /CHS/set_user_review_paradata/{id} that sends and receives JSON documents to set a file's user review oriented paradata, where {id} is the filename,
  • /CHS/set_user_comment_paradata/{fid}/{rid} that sends and receives JSON documents to set a file's user comment oriented paradata, where {fid} is the filename and {rid} is the review_id of the related user review for the file {fid},
  • /CHS/upload_file for uploading one or more files to the database,
  • /CHS/download_file/{id} for downloading a file from the database, where {id} is the filename,
  • /CHS/{id} for deleting a file from the database, where {id} is the filename,
  • /CHS/deleteAll/ for deleting all files from the database,
  • /CHS/get_all_versions/{id} that sends and receives JSON documents for obtaining a list of versions of a file in the database, where {id} is the filename.
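Because every endpoint shares the /CHS/ prefix, client code can build its URLs with one small helper. The sketch below assumes a base URL of http://localhost:5000, which this page does not specify, and the helper name chs_url is hypothetical. Path parts are percent-encoded so that filenames containing spaces remain valid; the helper does not append trailing slashes.

```python
from urllib.parse import quote

# Assumption: the host and port are not given in this document.
BASE_URL = "http://localhost:5000"


def chs_url(endpoint, *parts):
    """Build a CHS URL from an endpoint name and optional path parts."""
    url = BASE_URL + "/CHS/" + endpoint
    # Percent-encode each part so it is safe inside a URL path.
    path = "/".join(quote(str(part), safe="") for part in parts)
    return url + "/" + path if path else url


print(chs_url("get_metadata", "11_1linear.pdf"))
print(chs_url("set_user_comment_paradata", "my file.pdf", "review-1"))
```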

Riak Database Object Structure

dataProfile = {
	'file_location': {
		'local_path': None
	},
	'metadata': {
		'author': None,
		'title': None,
		'description': None,
		'upload_date': None,
		'last_modified_date': None,
		'mime_type': None,
		'resource_type': None,
		'keywords': None,
		'version': None
	},
	'paradata': {
		'user_reviews': [{
		        'review_id': None,
		        'user_rating': None,
		        'user_name': None,
		        'user_review_title': None,
		        'user_review': None,
		        'timestamp': None
		    }
		],
		'user_comments': [{
		        'review_id': None,
		        'comment_id': None,
		        'helpful': None,
		        'user_name': None,
		        'user_comment': None,
		        'timestamp': None
		    }
		]
	}
}

Endpoint /CHS/keys/

GET /CHS/keys/

Returns an HTML template with a list of all the keys in the database.

Arguments

None

Returns

200 OK (HTML)

Returns HTML with a list of the keys in the database.

404 Not Found (HTML)

There are no keys. The database is empty.

Endpoint /CHS/search

POST /CHS/search

Use this method to perform a simple search of the database object's metadata. It takes a JSON document with a single group of search terms (group, field, term).

Arguments

None

Example JSON request:

{
    "search": {
        "group": "metadata",
        "field": "keywords",
        "term": "test"
    }
}
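A request body like the one above maps onto a single field query of the form group_field:term. The following is a minimal sketch of that mapping, not the service's actual code; the function name build_query is hypothetical.

```python
def build_query(search):
    """Combine group, field and term into one field query string."""
    return "{group}_{field}:{term}".format(**search)


request_body = {
    "search": {"group": "metadata", "field": "keywords", "term": "test"}
}

print(build_query(request_body["search"]))  # metadata_keywords:test
```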

The query would look like:

"metadata_keywords:test"

Returns

200 OK (JSON)

Returns JSON with a list of the keys of files whose metadata match the search term.

400 Bad Request (no body)

The request body is not a valid JSON body, or does not contain the required term field.

404 Not Found (no body)

There were no results returned.

Example JSON return:

{
    "results": [
        {
            "key": "11_1linear.pdf"
        },
        {
            "key": "14_2linear.pdf"
        }
    ]
}
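A client will usually want the matching keys as a plain list. The sketch below parses a response shaped like the example above; the function name result_keys is hypothetical, and the response text would come from whatever HTTP client you use.

```python
import json


def result_keys(response_text):
    """Return the list of keys from a search response body."""
    payload = json.loads(response_text)
    return [item["key"] for item in payload.get("results", [])]


sample_response = '{"results": [{"key": "11_1linear.pdf"}, {"key": "14_2linear.pdf"}]}'
print(result_keys(sample_response))  # ['11_1linear.pdf', '14_2linear.pdf']
```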

Endpoint /CHS/advanced_search

POST /CHS/advanced_search

Use this method to perform an advanced search of the database object's metadata/paradata. It takes a JSON document with three groups of search terms (group1-3, field1-3, term1-3). All three groups must be present in the document, but only two need to contain valid arguments; an unused group is left as empty strings. This method can be expanded to include more groups and conditional words (OR/NOT).

Arguments

None

Example JSON request:

{
    "advanced_search": {
        "group1": "metadata",
        "field1": "author",
        "term1": "author",
        "group2": "metadata",
        "field2": "keywords",
        "term2": "test",
        "group3": "",
        "field3": "",
        "term3": ""
    }
}
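The filled-in groups of a request like the one above are combined with AND, and empty groups are skipped. The sketch below shows one way to do that under the rule that at least two groups must be filled in; the function name build_advanced_query is hypothetical, and this is not the service's own implementation.

```python
def build_advanced_query(advanced_search):
    """Join every complete (group, field, term) triple with AND."""
    clauses = []
    for n in (1, 2, 3):
        group = advanced_search.get("group%d" % n, "")
        field = advanced_search.get("field%d" % n, "")
        term = advanced_search.get("term%d" % n, "")
        # Skip groups that were left as empty strings.
        if group and field and term:
            clauses.append("%s_%s:%s" % (group, field, term))
    if len(clauses) < 2:
        raise ValueError("at least two complete groups are required")
    return " AND ".join(clauses)


example = {
    "group1": "metadata", "field1": "author", "term1": "author",
    "group2": "metadata", "field2": "keywords", "term2": "test",
    "group3": "", "field3": "", "term3": "",
}

print(build_advanced_query(example))
# metadata_author:author AND metadata_keywords:test
```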

The query would look like:

"metadata_author:author AND metadata_keywords:test"

Returns

200 OK (JSON)

Returns JSON with a list of the keys of files matching the search criteria.

400 Bad Request (no body)

The request body is not a valid JSON body, or does not contain the required term fields.

404 Not Found (no body)

There were no results returned.

Example JSON return:

{
    "results": [
        {
            "key": "11_1linear.pdf"
        },
        {
            "key": "14_2linear.pdf"
        },
        {
            "key": "3037.pdf"
        }
    ]
}

Endpoint /CHS/get_metadata/{id}

GET /CHS/get_metadata/{id}

Use this method to retrieve the metadata JSON document stored with the given {id}.

Arguments

None

Returns

200 OK (JSON)

The metadata was successfully retrieved.

404 Not Found (no body)

There were no files associated with the {id}.

Example JSON return:

{
    "metadata": {
        "upload_date": "2014-06-24 13:08:32",
        "description": "some description",
        "author": "some author",
        "last_modified_date": "2014-06-24 13:08:32",
        "version": "some version",
        "mime_type": "some mime_type",
        "keywords": "some keyword, some keyword",
        "title": "some title",
        "resource_type": "pdf"
    }
}
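The metadata values arrive as strings, so a client may want to convert some of them. The sketch below assumes the timestamp layout and the comma-separated keywords seen in the example above; neither format is formally specified on this page, and the function name parse_metadata is hypothetical.

```python
import json
from datetime import datetime

# Assumption: inferred from the example timestamps on this page.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_metadata(response_text):
    """Convert date strings to datetimes and keywords to a list."""
    meta = dict(json.loads(response_text)["metadata"])
    for name in ("upload_date", "last_modified_date"):
        meta[name] = datetime.strptime(meta[name], TIMESTAMP_FORMAT)
    meta["keywords"] = [word.strip() for word in meta["keywords"].split(",")]
    return meta


sample = json.dumps({"metadata": {
    "upload_date": "2014-06-24 13:08:32",
    "last_modified_date": "2014-06-24 13:08:32",
    "keywords": "some keyword, some keyword",
    "title": "some title",
}})
parsed = parse_metadata(sample)
print(parsed["upload_date"].year, parsed["keywords"])
```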

Endpoint /CHS/set_metadata/{id}

POST /CHS/set_metadata/{id}

This method is for setting or updating the metadata for the file. Currently, the required fields are:

  • author
  • title
  • description
  • keywords
  • mime_type
  • version

Other fields set by/updated by the system:

  • upload_date
  • last_modified_date
  • resource_type

Arguments

None

Example JSON request:

{
    "metadata": {
        "author": "some author",
        "title": "some title",
        "description": "some description",
        "mime_type": "some mime_type",
        "keywords": "some keyword, some keyword",
        "version": "some version"
    }
}
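Since a missing required field results in a 400 response, it can be worth checking the request body before sending it. The sketch below uses the six required fields listed above; the function name missing_metadata_fields is hypothetical, and treating an empty string as missing is an assumption.

```python
REQUIRED_METADATA_FIELDS = (
    "author", "title", "description", "keywords", "mime_type", "version",
)


def missing_metadata_fields(body):
    """Return the required metadata fields that are absent or empty."""
    metadata = body.get("metadata", {})
    return [name for name in REQUIRED_METADATA_FIELDS if not metadata.get(name)]


complete = {"metadata": {
    "author": "some author",
    "title": "some title",
    "description": "some description",
    "mime_type": "some mime_type",
    "keywords": "some keyword, some keyword",
    "version": "some version",
}}

print(missing_metadata_fields(complete))                         # []
print(missing_metadata_fields({"metadata": {"author": "some author"}}))
```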

Returns

200 OK

The metadata was successfully set or updated.

400 Bad Request (no body)

The request body is not a valid JSON body, or does not contain the required field(s).

Example JSON return:

{
    "metadata": {
        "upload_date": "2014-06-24 13:08:32",
        "description": "some description",
        "author": "some author",
        "last_modified_date": "2014-06-24 13:08:32",
        "version": "some version",
        "mime_type": "some mime_type",
        "keywords": "some keyword, some keyword",
        "title": "some title",
        "resource_type": "pdf"
    }
}

Endpoint /CHS/get_paradata/{id}

GET /CHS/get_paradata/{id}

Use this method to retrieve the paradata JSON document stored with the given {id}.

Arguments

None

Returns

200 OK (JSON)

The paradata was successfully retrieved.

404 Not Found (no body)

There were no files associated with the {id}.

Example JSON return:

{
    "paradata": {
        "user_reviews": [
            {
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "timestamp": "2014-06-24 13:25:36",
                "user_rating": "5.0",
                "user_review_title": "review of file content",
                "user_review": "this file is awesome",
                "user_name": "Alan"
            }
        ],
        "user_comments": [
            {
                "timestamp": "2014-06-24 14:01:31",
                "comment_id": "4c84a785-cb48-4d87-a0ad-625fba3244a9",
                "helpful": "yes",
                "user_name": "Bill",
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "user_comment": "this review really helped me"
            }
        ]
    }
}
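Comments refer to their review through review_id, so a client that wants to display comments under each review has to join the two lists. The following sketch does that for a paradata document shaped like the example above; the function name comments_by_review is hypothetical.

```python
def comments_by_review(paradata):
    """Map each review_id to the list of comments made on that review."""
    grouped = {review["review_id"]: [] for review in paradata.get("user_reviews", [])}
    for comment in paradata.get("user_comments", []):
        # setdefault also keeps comments whose review is not in the list.
        grouped.setdefault(comment["review_id"], []).append(comment)
    return grouped


sample_paradata = {
    "user_reviews": [
        {"review_id": "5303de2e-307d-4dac-b86e-498671834d00", "user_name": "Alan"},
    ],
    "user_comments": [
        {"review_id": "5303de2e-307d-4dac-b86e-498671834d00",
         "comment_id": "4c84a785-cb48-4d87-a0ad-625fba3244a9",
         "user_name": "Bill"},
    ],
}

grouped = comments_by_review(sample_paradata)
for review_id, comments in grouped.items():
    print(review_id, [c["user_name"] for c in comments])
```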

Endpoint /CHS/set_user_review_paradata/{id}

POST /CHS/set_user_review_paradata/{id}

This method takes a POST request with a key (filename) {id} and a JSON document containing the user review paradata. Use it to set or update the user review paradata for the file. Currently, the required fields are:

  • user_rating
  • user_name
  • user_review_title
  • user_review

Other fields set by/updated by the system:

  • review_id
  • timestamp

Arguments

None

Example JSON request:

{
        "user_reviews": {
                "user_rating": "5.0",
                "user_review_title": "review of file content",
                "user_review": "this file is awesome",
                "user_name": "Alan"
        }
}
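A request like the one above can be prepared with the Python standard library. The sketch below only builds the request object and does not send it. The base URL and the Content-Type header are assumptions, since this page does not say what the server expects, and the function name review_request is hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: the host and port are not given in this document.
BASE_URL = "http://localhost:5000"


def review_request(file_id, rating, title, review, user_name):
    """Prepare (but do not send) a POST that sets a user review."""
    body = {"user_reviews": {
        "user_rating": str(rating),  # ratings are strings in the examples
        "user_review_title": title,
        "user_review": review,
        "user_name": user_name,
    }}
    return Request(
        BASE_URL + "/CHS/set_user_review_paradata/" + quote(file_id, safe=""),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


req = review_request("11_1linear.pdf", 5.0, "review of file content",
                     "this file is awesome", "Alan")
print(req.get_method(), req.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it to a running server.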

Returns

200 OK

The user review paradata was successfully set or updated.

400 Bad Request (no body)

The request body is not a valid JSON body, or does not contain the required field(s).

Example JSON return:

{
    "paradata": {
        "user_reviews": [
            {
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "timestamp": "2014-06-24 13:25:36",
                "user_rating": "5.0",
                "user_review_title": "review of file content",
                "user_review": "this file is awesome",
                "user_name": "Alan"
            }
        ],
        "user_comments": [
            {
                "timestamp": "2014-06-24 14:01:31",
                "comment_id": "4c84a785-cb48-4d87-a0ad-625fba3244a9",
                "helpful": "yes",
                "user_name": "Bill",
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "user_comment": "this review really helped me"
            }
        ]
    }
}

Endpoint /CHS/set_user_comment_paradata/{fid}/{rid}

POST /CHS/set_user_comment_paradata/{fid}/{rid}

This method takes a POST request with a key (filename) {fid}, a review id {rid}, and a JSON document containing the user comment paradata. Use it to set or update the user comment paradata for the file. Currently, the required fields are:

  • user_name
  • user_comment
  • helpful

Other fields set by/updated by the system:

  • comment_id
  • timestamp

Arguments

None

Example JSON request:

{
        "user_comments": {
                "helpful": "yes",
                "user_name": "Bill",
                "user_comment": "this review really helped me"
        }
}

Returns

200 OK

The user comment paradata was successfully set or updated.

400 Bad Request (no body)

The request body is not a valid JSON body, or does not contain the required field(s).

Example JSON return:

{
    "paradata": {
        "user_reviews": [
            {
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "timestamp": "2014-06-24 13:25:36",
                "user_rating": "5.0",
                "user_review_title": "review of file content",
                "user_review": "this file is awesome",
                "user_name": "Alan"
            }
        ],
        "user_comments": [
            {
                "timestamp": "2014-06-24 14:01:31",
                "comment_id": "4c84a785-cb48-4d87-a0ad-625fba3244a9",
                "helpful": "yes",
                "user_name": "Bill",
                "review_id": "5303de2e-307d-4dac-b86e-498671834d00",
                "user_comment": "this review really helped me"
            }
        ]
    }
}

Endpoint /CHS/upload_file

POST /CHS/upload_file

This method allows one or more files to be selected from a web form for upload. Currently, the accepted file formats are zip, pdf, and xml, but it can be set up to accept others. In the current design, uploaded files are stored in an uploads folder on the server, and the path to each file is stored as part of the file's object metadata.
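A client can avoid a rejected upload by checking the file format first. The sketch below assumes the format is judged by the filename extension and that the comparison ignores case; this page does not say how the server actually decides, and the function name is_accepted is hypothetical.

```python
import os

# The three formats currently accepted, as listed above.
ACCEPTED_EXTENSIONS = {"zip", "pdf", "xml"}


def is_accepted(filename):
    """Return True if the filename has one of the accepted extensions."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return extension in ACCEPTED_EXTENSIONS


for name in ("3037.pdf", "archive.ZIP", "notes.txt"):
    print(name, is_accepted(name))
```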

Arguments

None

Returns

200 OK

Successful file upload.

400 Bad Request (no body)

The file being uploaded is not an accepted file format.

409 Conflict (no body)

The file already exists in the database.

Endpoint /CHS/download_file/{id}

GET /CHS/download_file/{id}

This method takes a filename {id} and downloads the file from the database.

Arguments

None

Returns

200 OK

The file {id} is successfully downloaded.

404 Not Found (no body)

There were no files associated with the {id}.

Endpoint /CHS/{id}

DELETE /CHS/{id}

This method takes a filename {id} and deletes it from the database.

Arguments

None

Returns

200 OK (JSON)

The file with the given {id} was successfully deleted.

404 Not Found (no body)

There were no files associated with the {id}.

Endpoint /CHS/DeleteAll/

GET /CHS/DeleteAll/

This method deletes all entries from the database.

Arguments

None

Returns

200 OK (JSON)

The operation was successfully completed.

Endpoint /CHS/get_all_versions/{id}

GET /CHS/get_all_versions/{id}

Use this method to get a list of all the versions of a file with the given {id} stored in the database. (This method is currently stubbed out for testing responses.)

Arguments

None

Returns

200 OK (JSON)

Returns a document with a list of file versions for the file with the given {id}.

404 Not Found (no body)

There were no files associated with the {id}.