Feature/mtchbx 45 #24

nimonika · 2024-12-10T13:53:17Z

Context

We need to begin implementing endpoints for our API. This PR implements endpoints for Source objects.

Changes proposed in this pull request

Three endpoints for sources

Guidance to review

The POST endpoint assumes the client can send an entire file in one request. This may not be realistic: for large files, the client will need to send chunks as multipart objects, which in turn will need to be forwarded to S3 as a multipart upload. Unit tests have been written only for the two GET endpoints, as testing the POST is more of an integration test.
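
To make the chunking concern concrete, here is a minimal sketch of forwarding a large file to S3 in parts. It is not code from this PR: `iter_chunks`, `upload_in_parts`, and the 8 MiB part size are hypothetical, and `client` is assumed to behave like `boto3.client("s3")`. The four client methods used are boto3's multipart upload calls. S3 requires every part except the last to be at least 5 MiB.

```python
from typing import Any, BinaryIO, Iterator

# Assumed part size; S3 rejects parts under 5 MiB unless they are the last one.
PART_SIZE = 8 * 1024 * 1024


def iter_chunks(file: BinaryIO, size: int = PART_SIZE) -> Iterator[bytes]:
    """Yield chunks of at most `size` bytes until the file is exhausted."""
    while chunk := file.read(size):
        yield chunk


def upload_in_parts(
    client: Any, file: BinaryIO, bucket: str, key: str, part_size: int = PART_SIZE
) -> int:
    """Send `file` to S3 as a multipart upload and return the number of parts."""
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        for number, chunk in enumerate(iter_chunks(file, part_size), start=1):
            response = client.upload_part(
                Bucket=bucket,
                Key=key,
                PartNumber=number,
                UploadId=upload_id,
                Body=chunk,
            )
            # S3 needs each part's ETag and number to assemble the object.
            parts.append({"ETag": response["ETag"], "PartNumber": number})
        client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so that already-uploaded parts are not left behind in the bucket.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return len(parts)
```

Passing the client in, rather than creating it inside the function, would also let the POST path be unit tested with a stub instead of a real bucket.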

Checklist:

My code follows the style guidelines of this project
New and existing unit tests pass locally with my changes

leo-mazzone · 2024-12-12T15:57:41Z

src/matchbox/server/utils/s3.py

+    Args:
+        file (BinaryIO): File to upload.
+        bucket_name (str): Target S3 bucket.
+        object_name (str): S3 object name. If not specified, file_name is used.


What's file_name?

leo-mazzone · 2024-12-12T15:58:58Z

src/matchbox/server/api.py

+class SourceItem(BaseModel):
+    """Response model for source"""
+
+    schema: str
+    table: str
+    id: str
+    resolution: str | None = None
+
+
+class Sources(BaseModel):
+    """Response model for sources"""
+
+    sources: list[SourceItem]


should we start moving this stuff to common?

See below. I think my favoured approach is separate Response* classes for now?

leo-mazzone · 2024-12-12T16:00:01Z

src/matchbox/server/api.py

+class Sources(BaseModel):
+    """Response model for sources"""
+
+    sources: list[SourceItem]


do we need a separate pydantic object? Can't we just return list[Sources]?

Sources requires a warehouse, and therefore a valid engine, and it doesn't expose the resolution hash (though it could). Two solutions:

SourceBase with the common fields, then Source and SourceResponse as subclasses with connection-enabled fields in one, and the resolution hash in the other

Keep response models separate to common -- they need structures that aren't relevant to the common objects

leo-mazzone · 2024-12-12T16:01:00Z

src/matchbox/server/api.py

+) -> dict[str, SourceItem] | str:
+    datasets = backend.datasets.list()
+    for dataset in datasets:
+        resolution = hexlify(dataset.resolution).decode("ascii")


can we use base64?
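
A quick sketch of the difference, assuming the resolution hash is raw bytes as in the diff above. The helper names are made up for illustration. The URL-safe base64 alphabet is my assumption here, since the hash ends up in a request path.

```python
import base64
from binascii import hexlify


def hash_to_hex(value: bytes) -> str:
    """Current approach: two ASCII characters per byte."""
    return hexlify(value).decode("ascii")


def hash_to_base64(value: bytes) -> str:
    """Proposed approach: roughly four characters per three bytes.

    The URL-safe alphabet replaces '+' and '/' with '-' and '_'.
    """
    return base64.urlsafe_b64encode(value).decode("ascii")


def base64_to_hash(text: str) -> bytes:
    """Inverse of hash_to_base64, for decoding the value from a request."""
    return base64.urlsafe_b64decode(text)
```

For a 32-byte digest this gives 44 characters instead of 64.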

leo-mazzone · 2024-12-12T16:02:07Z

src/matchbox/server/api.py

+            return {"source": result_obj}
+    return "Source not found"


If the source is not found we need to return a 404 and use a response with a proper schema, probably a codified error message. And if it is found, let's just return the object; no need to wrap it in a dict.

leo-mazzone · 2024-12-12T16:03:44Z

src/matchbox/server/api.py

+    return "Source not found"
+
+
+@app.post("/sources/uploadFile")


camelCase? Are we sure? We should also follow the endpoint convention we'd set

leo-mazzone · 2024-12-12T16:04:06Z

src/matchbox/server/api.py

+
+@app.post("/sources/uploadFile")
+async def add_source_to_s3(
+    file: UploadFile, bucket_name: str = Form(...), object_name: str = Form(...)


we shouldn't allow specifying a bucket name, nor an object name (directly)

leo-mazzone · 2024-12-12T16:05:10Z

src/matchbox/server/api.py

+    if is_file_uploaded:
+        return "File was successfully uplaoded"
+    return "File could not be uplaoded"


HTTP status codes please, and Pydantic schemas for the response

leo-mazzone · 2024-12-12T16:10:35Z

src/matchbox/server/api.py

+
+@app.post("/sources/uploadFile")
+async def add_source_to_s3(
+    file: UploadFile, bucket_name: str = Form(...), object_name: str = Form(...)


From a quick look at the docs, they're using annotations for form parameters
https://fastapi.tiangolo.com/tutorial/request-forms/

leo-mazzone · 2024-12-12T16:11:58Z

test/server/test_api.py

-    # def test_list_sources():
-    #     response = client.get("/sources")
-    #     assert response.status_code == 200
+    @patch("matchbox.server.base.BackendManager.get_backend")


The tests will need to be updated to reflect requested changes in the API

leo-mazzone

As per my comments

nimonika added 8 commits December 3, 2024 09:48

Implement the GET sources call in the API

e7024fb

Implement the GET source call, given a hash as a hexdigest in the API

ab5ecbc

Fix test for listing the sources

3b1ced5

Remove redundant imports

de47420

Remove redundant imports

9c5fde0

Merge branch 'main' into feature/mtchbx-45

1199d5e

Enable parquet file uploads to S3 through the POST endpoint

b4ab949

Move the AWS S3 client outside the postgresql folder

0cf1d7b

nimonika requested a review from leo-mazzone December 10, 2024 13:53

wpfl-dbt added 6 commits December 12, 2024 14:39

Ran ruff format

f054b2f

Tidied up s3 function and boto3 dependency

2d00496

Fixed unit tests

725048e

Updated uv lock to deal with urllib3 conflict

e73215a

Merged main

6f6332f

Unit tests passing locally

88deb28

leo-mazzone reviewed Dec 12, 2024

leo-mazzone requested changes Dec 12, 2024

nimonika commented Dec 10, 2024 • edited by wpfl-dbt
