Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/mtchbx 45 #24

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Feature/mtchbx 45 #24

wants to merge 14 commits into from

Conversation

nimonika
Copy link
Collaborator

@nimonika nimonika commented Dec 10, 2024

Context

We need to begin implementing endpoints for our API. This PR implements endpoints for Source objects.

Changes proposed in this pull request

  • Three endpoints for sources

Guidance to review

The POST endpoint is based on the assumption that the client can send an entire file. This may not be a realistic assumption. For large files, the client will need to send chunks as multipart objects, which in turn will need to be sent as multiparts to s3. Unit tests have been written only for the two GET endpoints as the POST is more of an integration test

Checklist:

  • My code follows the style guidelines of this project
  • New and existing unit tests pass locally with my changes

@nimonika nimonika requested a review from lmazz1-dbt December 10, 2024 13:53
Args:
file (BinaryIO): File to upload.
bucket_name (str): Target S3 bucket.
object_name (str): S3 object name. If not specified, file_name is used.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's file_name?

Comment on lines +49 to +61
class SourceItem(BaseModel):
"""Response model for source"""

schema: str
table: str
id: str
resolution: str | None = None


class Sources(BaseModel):
"""Response model for sources"""

sources: list[SourceItem]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we start moving this stuff to common?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below. I think my favoured approach is separate Response* classes for now?

Comment on lines +58 to +61
class Sources(BaseModel):
"""Response model for sources"""

sources: list[SourceItem]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a separate pydantic object? Can't we just return list[Sources]?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sources requires a warehouse, and therefore a valid engine, and it doesn't expose the resolution hash (though it could). Two solutions:

  • SourceBase with the common fields, then Source and SourceResponse as subclasses with connection-enabled fields in one, and the resolution hash in the other
  • Keep response models separate to common -- they need structures that aren't relevant to the common objects

) -> dict[str, SourceItem] | str:
datasets = backend.datasets.list()
for dataset in datasets:
resolution = hexlify(dataset.resolution).decode("ascii")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use base64?

Comment on lines +126 to +127
return {"source": result_obj}
return "Source not found"
Copy link
Collaborator

@lmazz1-dbt lmazz1-dbt Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the source is not found we need to return a 404, and use a response with a proper schema, probably a codifier error message. And it is found, let's just return the object, no need to wrap it in a dict.

return "Source not found"


@app.post("/sources/uploadFile")
Copy link
Collaborator

@lmazz1-dbt lmazz1-dbt Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camelCase? Are we sure? We should also follow the endpoint convention we'd set


@app.post("/sources/uploadFile")
async def add_source_to_s3(
file: UploadFile, bucket_name: str = Form(...), object_name: str = Form(...)
Copy link
Collaborator

@lmazz1-dbt lmazz1-dbt Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't allow specifying a bucket name, nor an object name (directly)

Comment on lines +135 to +137
if is_file_uploaded:
return "File was successfully uplaoded"
return "File could not be uplaoded"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP status codes please, and Pydantic schemas for the response


@app.post("/sources/uploadFile")
async def add_source_to_s3(
file: UploadFile, bucket_name: str = Form(...), object_name: str = Form(...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look at the docs, they're using annotations for form parameters
https://fastapi.tiangolo.com/tutorial/request-forms/

# def test_list_sources():
# response = client.get("/sources")
# assert response.status_code == 200
@patch("matchbox.server.base.BackendManager.get_backend")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests will need to be updated to reflect requested changes in the API

Copy link
Collaborator

@lmazz1-dbt lmazz1-dbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants