feat: add "GET /files" controller #21

Open
wants to merge 4 commits into
base: main
17 changes: 16 additions & 1 deletion cloud_storage_handler/api/elixircloud/csh/controllers.py
@@ -3,7 +3,8 @@
import logging
from http import HTTPStatus

from flask import jsonify
from flask import current_app, jsonify
from minio.error import S3Error

logger = logging.getLogger(__name__)

@@ -13,3 +14,17 @@ def home():
    return jsonify(
        {"message": "Welcome to the Cloud Storage Handler server!"}
    ), HTTPStatus.OK


def list_files():
    """Endpoint to list all files in the MinIO bucket."""
    try:
        minio_config = current_app.config.foca.custom.minio
        bucket_name = minio_config.bucket_name
        minio_client = current_app.config.foca.custom.minio.client.client
        objects = minio_client.list_objects(bucket_name)
Member

This is actually the only line that could raise an S3Error if I'm not mistaken. All the other lines could also raise errors, but not S3Errors.

Please have a look at other FOCA implementations and understand when and when not to define errors. You only want to treat errors that you reasonably expect (and that you want to actually handle in some way). But make sure that you make use of FOCA's built-in error handling. If your error handling doesn't add anything to the existing error handling (which maps all known/defined errors to the error message and code defined in the exceptions dictionary in cloud_storage_handler.exceptions and raises a default 500 error for any error it doesn't know about), then leave it out.

In this case, you can probably assume that the config is valid, because it was actually validated at some point. If it were not valid (and attributes were missing), that would really be an unexpected error pointing to some deeper problem that the user shouldn't know about. In such cases, no specific error handling is needed; it is truly an InternalServerError. Admins would then need to investigate, and they would look at the logs and see the actual error trace. That's exactly what we want to happen, so there is no need for, and no additional value in, handling such errors specifically.

You need to think hard in each case whether there is something that we want to let the user/client know. Most importantly, will there be a point in retrying or not? An S3 error could be because the S3 service has a temporary glitch, so it might actually be worth trying again. But a BadRequest would always be expected to fail, so clients should not retry.

Here, I do think that you actually handle errors quite right: S3Error is also the only error I'd catch, but you should only put the try / except around that one line that can actually raise such an error. And you want to define a custom error and use FOCA to map that error to a 500 or more appropriate error response with an informative error message.

Everything else you can rightly ignore, at least for now.

Make sure to learn about FOCA, read on the different HTTP status codes, REST philosophy and error handling, especially in web services.
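
To make the point about scoping concrete, here is a minimal sketch, not the project's actual code. `StorageUnavailable` and `list_object_names` are hypothetical names, and `FakeS3Error` is only a stand-in so the snippet runs without MinIO installed; in the real controller you would catch `minio.error.S3Error`. One caveat that is an assumption on my side: if the client returns its listing lazily, the error may only surface while iterating, so the sketch consumes the result inside the `try` block.

```python
class FakeS3Error(Exception):
    """Stand-in for minio.error.S3Error so the sketch runs without MinIO."""


class StorageUnavailable(Exception):
    """Hypothetical application-specific error for failed storage calls."""


def list_object_names(client, bucket_name, s3_error=FakeS3Error):
    """Return the object names in a bucket, guarding only the storage call."""
    try:
        # Only this call talks to the storage service. Materializing the
        # result here means a lazily raised error is still caught.
        objects = list(client.list_objects(bucket_name))
    except s3_error as err:
        # Re-raise as an application error; the framework's error handling
        # can then map it to a response.
        raise StorageUnavailable(
            f"Could not list bucket '{bucket_name}'"
        ) from err
    # Everything below is plain Python and is deliberately left unguarded.
    return [obj.object_name for obj in objects]
```

Config lookups stay outside the `try` block, so a broken config still ends up as a generic 500 with a full trace in the logs.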

        files = [obj.object_name for obj in objects]
        return jsonify({"files": files}), 200
Comment on lines +22 to +27
Member

Consider implementing a specific class for this controller, in a dedicated module. There will be more code added at a later point, and we don't want these functions to get too complex.
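
As a rough illustration of that split (hypothetical names throughout: `FileLister` and the module it would live in are not part of this PR), the logic could sit in a small class that receives the client and bucket name, leaving the controller function as a thin wrapper:

```python
class FileLister:
    """Hypothetical helper holding the logic behind the listing endpoint."""

    def __init__(self, client, bucket_name):
        # The client is injected so the class can be tested without a server.
        self.client = client
        self.bucket_name = bucket_name

    def list_files(self):
        """Return the names of all objects in the configured bucket."""
        objects = self.client.list_objects(self.bucket_name)
        return [obj.object_name for obj in objects]
```

The controller would then only build a `FileLister` from the app config and wrap its result with `jsonify`.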


    except S3Error as err:
        return jsonify({"error": str(err)}), 500
Comment on lines +29 to +30
Member

Best to raise an application-specific, custom error. Define it in cloud_storage_handler.exceptions and make sure to also include it in the exceptions dictionary of that module. Have a look at the HTTP status codes to see if you find a more appropriate status code than 500. If so, use that; otherwise keep 500. In any case, use the dictionary to map the error to the status code you choose.
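
A minimal sketch of what this could look like, under assumptions: `StorageUnavailable` is a hypothetical name, the `message` and `code` keys are assumed rather than taken from the existing module, and 503 is only one candidate status. The `lookup` helper merely imitates the mapping step so the snippet can run on its own; it is not FOCA's API.

```python
from http import HTTPStatus


class StorageUnavailable(Exception):
    """Hypothetical error raised when the storage backend call fails."""


# Assumed shape: exception class -> response message and status code.
exceptions = {
    Exception: {
        "message": "An unexpected error occurred.",
        "code": HTTPStatus.INTERNAL_SERVER_ERROR.value,
    },
    StorageUnavailable: {
        "message": "The storage service is currently unavailable.",
        "code": HTTPStatus.SERVICE_UNAVAILABLE.value,
    },
}


def lookup(error):
    """Imitate the mapping step: unknown errors fall back to the default."""
    return exceptions.get(type(error), exceptions[Exception])
```

With this in place the controller only raises `StorageUnavailable` and never builds an error response by hand.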

18 changes: 18 additions & 0 deletions cloud_storage_handler/api/specs/specs.yaml
@@ -35,4 +35,22 @@ paths:
          description: The request is malformed.
        '500':
          description: An unexpected error occurred.
  /list_files:
    get:
      description: |
        Returns a list of all files in the minio bucket
      operationId: list_files
      responses:
        '200':
          description: The list of files has been retrieved successfully.
          content:
            application/json:
              schema:
                type: array
                items:
                  type: string
Member

You want to return more than just a string, rather a list of objects following a model that you define. At the very least, when uploading, you should create a resource ID, maybe a UUID4 or something similar. Please look at other services and see how they do that.

Actually, it would have made more sense to start with implementing POST /files, as that would have clarified that point better. It is expected by REST semantics that a POST request creates a resource, that this resource has a certain ID, and that this ID is guaranteed to be unique (which a filename is not).

You could also consider including other basic info, like file size, some sort of hash like MD5 (could be configurable; a hash could then be used to avoid registering duplicate files, if we want that; we can discuss that), maybe the MIME type of the file if it's available (or whatever info you get from the file magic string, if present), possibly a description if that is provided during upload (if there is a field for that), or whatever else DRS needs to create a POST /objects.

But given that you started with GET /files, let's keep the model simple for now and only return a unique identifier and a filename. We can then discuss additional properties when you implement POST /files.
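
For the simple model, a minimal sketch might look as follows. `FileRecord` and `new_file_record` are hypothetical names, and creating the ID on the fly here is only for illustration; the ID should really be created once, on upload, and stored with the file.

```python
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical model: a unique identifier plus the filename."""

    id: str
    filename: str


def new_file_record(filename):
    """Create a record with a random UUID4 as the resource ID."""
    return FileRecord(id=str(uuid.uuid4()), filename=filename)


record = new_file_record("file1.txt")
payload = asdict(record)  # dict with the keys "id" and "filename"
```

Two uploads with the same filename would then still be distinguishable by their IDs.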

        '400':
          description: The request is malformed.
        '500':
          description: An unexpected error occurred.
...
24 changes: 24 additions & 0 deletions tests/test_integration/test_operations.py
@@ -27,3 +27,27 @@ def test_get_root():
        mock_get.assert_called_once_with(server_url)

    print("Finished test_get_root")


def test_get_files():
    """Test the list_files endpoint of the service with a mocked response."""
    print("Starting test_get_files...")

    server_url = "http://localhost:8080/elixircloud/csh/v1/list_files"

    with mock.patch("requests.get") as mock_get:
        mock_response = mock.Mock()
        mock_response.status_code = HTTPStatus.OK
        mock_response.json.return_value = {"files": ["file1.txt", "file2.txt"]}
        mock_get.return_value = mock_response

        response = requests.get(server_url)
        print(f"Response status code: {response.status_code}")

        assert response.status_code == HTTPStatus.OK
        assert "files" in response.json()
        assert isinstance(response.json()["files"], list)

        mock_get.assert_called_once_with(server_url)

    print("Finished test_get_files")