feat: upload drs object endpoint #27

Status: Open (9 commits, base: main)

Conversation

psankhe28
Collaborator

@psankhe28 psankhe28 commented Oct 28, 2024

Description

Checklist

  • My code follows the contributing guidelines
    of this project, including, in particular, with regard to any style guidelines
  • The title of my PR complies with the
    Conventional Commits specification; in particular, it clearly
    indicates that a change is a breaking change
  • I acknowledge that all my commits will be squashed into a single commit,
    using the PR title as the commit message
  • I have performed a self-review of my own code
  • [] I have commented my code in hard-to-understand areas
  • I have updated the user-facing documentation to describe any new or
    changed behavior
  • I have added type annotations for all function/class/method interfaces
    or updated existing ones (only for Python, TypeScript, etc.)
  • I have provided appropriate documentation
    (Google-style Python docstrings) for all
    packages/modules/functions/classes/methods or updated existing ones
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature
    works
  • New and existing unit tests pass locally with my changes
  • I have not reduced the existing code coverage

Comments

Summary by Sourcery

Add new endpoints for handling file uploads using the TUS protocol, including initiating uploads, uploading chunks, and completing uploads by storing files in MinIO. Enable CORS in the deployment configuration to support cross-origin requests.

New Features:

  • Introduce an endpoint to initiate a TUS upload session, allowing clients to start uploading files in chunks.
  • Add functionality to upload file chunks using a unique object ID, supporting resumable uploads.
  • Implement a feature to complete the upload process by transferring the file to MinIO storage, with duplicate detection based on file hash.

Enhancements:

  • Enable CORS support in the deployment configuration to allow cross-origin requests.

Documentation:

  • Update API specifications to include new endpoints for initiating uploads, uploading chunks, and completing uploads.

Signed-off-by: Pratiksha Sankhe <[email protected]>
Contributor

sourcery-ai bot commented Oct 28, 2024

Reviewer's Guide by Sourcery

This PR implements a new endpoint for uploading DRS objects using the TUS protocol (tus, an open protocol for resumable file uploads over HTTP). The implementation is a three-step process: upload initiation, chunk-based file upload, and upload completion with MinIO storage integration. Duplicate files are detected by comparing MD5 hashes.

Sequence diagram for DRS object upload process

sequenceDiagram
    actor User
    participant FlaskApp as Flask Application
    participant MinIO as MinIO Storage

    User->>FlaskApp: POST /upload/initiate
    FlaskApp-->>User: 201 Created (object_id)

    loop Upload Chunks
        User->>FlaskApp: PATCH /upload/{object_id}/chunk
        FlaskApp-->>User: 204 No Content
    end

    User->>FlaskApp: POST /upload/complete/{object_id}
    alt Duplicate Detected
        FlaskApp-->>User: 409 Conflict (Duplicate object)
    else
        FlaskApp->>MinIO: Store object
        MinIO-->>FlaskApp: Acknowledgement
        FlaskApp-->>User: 200 OK (Upload complete)
    end
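
The exchange in the diagram can be walked through without a running server. The following is a minimal sketch, not the PR's code: `UploadService`, its method names, the 404 for unknown IDs, and the in-memory dict standing in for MinIO are all hypothetical. Only the three routes and the 201/204/200/409 status codes are taken from the diagram.

```python
import hashlib
import uuid
from http import HTTPStatus


class UploadService:
    """In-memory stand-in for the three upload endpoints (hypothetical)."""

    def __init__(self) -> None:
        self.pending: dict[str, bytearray] = {}  # object_id -> bytes received so far
        self.stored: dict[str, bytes] = {}  # MD5 hex digest -> content (stand-in for MinIO)

    def initiate(self) -> tuple[HTTPStatus, str]:
        """POST /upload/initiate: open a session and return its object ID."""
        object_id = str(uuid.uuid4())
        self.pending[object_id] = bytearray()
        return HTTPStatus.CREATED, object_id

    def upload_chunk(self, object_id: str, chunk: bytes) -> HTTPStatus:
        """PATCH /upload/{object_id}/chunk: append one chunk."""
        if object_id not in self.pending:
            return HTTPStatus.NOT_FOUND  # assumption: unknown IDs are rejected
        self.pending[object_id].extend(chunk)
        return HTTPStatus.NO_CONTENT

    def complete(self, object_id: str) -> HTTPStatus:
        """POST /upload/complete/{object_id}: store unless a duplicate exists."""
        if object_id not in self.pending:
            return HTTPStatus.NOT_FOUND  # assumption: unknown IDs are rejected
        data = bytes(self.pending.pop(object_id))
        digest = hashlib.md5(data).hexdigest()
        if digest in self.stored:
            return HTTPStatus.CONFLICT
        self.stored[digest] = data
        return HTTPStatus.OK


def upload(service: UploadService, payload: bytes, chunk_size: int = 4) -> HTTPStatus:
    """Client side: initiate, send the payload chunk by chunk, then complete."""
    status, object_id = service.initiate()
    assert status == HTTPStatus.CREATED
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        assert service.upload_chunk(object_id, chunk) == HTTPStatus.NO_CONTENT
    return service.complete(object_id)


service = UploadService()
first = upload(service, b"hello world")
second = upload(service, b"hello world")  # same bytes, so the MD5 digest matches
print(first.value, second.value)  # prints "200 409"
```

The second upload takes the "Duplicate Detected" branch because its digest is already present in the stand-in store.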

Class diagram for updated Flask application

classDiagram
    class FlaskApp {
        +initiate_upload()
        +upload_chunk(object_id)
        +complete_upload(object_id)
        +get_chunks(object, chunk_size)
        +compute_file_hash(file_path)
    }
    class MinIOClient {
        +list_objects(bucket_name)
        +stat_object(bucket_name, object_name)
        +fput_object(bucket_name, object_name, file_path, content_type)
    }
    FlaskApp --> MinIOClient : uses

File-Level Changes

Change: Implementation of TUS upload protocol endpoints
Details:
  • Added endpoint for initiating uploads that generates a unique object ID
  • Created chunk upload endpoint that handles partial file uploads
  • Implemented upload completion endpoint that transfers files to MinIO
  • Added CORS support for upload operations
Files:
  • cloud_storage_handler/api/elixircloud/csh/controllers.py
  • cloud_storage_handler/api/specs/specs.yaml

Change: File handling and storage features
Details:
  • Implemented MD5 hash computation for duplicate detection
  • Added temporary file management for upload chunks
  • Created chunk reading generator for efficient file processing
Files:
  • cloud_storage_handler/api/elixircloud/csh/controllers.py

Change: Configuration updates
Details:
  • Added TUS-specific configuration settings
  • Enabled CORS in deployment configuration
Files:
  • cloud_storage_handler/api/elixircloud/csh/controllers.py
  • deployment/config.yaml
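
The chunk reading generator and the MD5 computation listed above might look roughly like the sketch below. The names `get_chunks` and `compute_file_hash` appear in the class diagram, but the bodies, the parameter types, and the 64 KiB default chunk size are assumptions and may differ from the PR's code.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import BinaryIO, Iterator


def get_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        yield chunk


def compute_file_hash(file_path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(file_path, "rb") as stream:
        for chunk in get_chunks(stream, chunk_size):
            digest.update(chunk)  # feed each chunk; the file is never fully in memory
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"abc" * 100_000)
    streamed = compute_file_hash(sample)
    whole = hashlib.md5(b"abc" * 100_000).hexdigest()
    print(streamed == whole)  # prints "True"
```

Hashing through the generator keeps memory use bounded by the chunk size rather than the file size. MD5 is adequate for spotting accidental duplicates, but it is not collision resistant, so it should not be relied on where uploads may be adversarial.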

Signed-off-by: Pratiksha Sankhe <[email protected]>
@psankhe28 psankhe28 requested a review from uniqueg October 28, 2024 18:33

codecov bot commented Oct 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (0b08ba9) to head (c067f0a).

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #27   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines           30        30           
=========================================
  Hits            30        30           
Flag Coverage Δ
test_integration 100.00% <ø> (ø)
test_unit 100.00% <ø> (ø)

Member

@uniqueg uniqueg left a comment

It's pretty cool :)

My major worry is that all chunks are stored on the host running the Flask application. This potentially requires huge amounts of storage on the microservice (especially if multiple users upload at the same time). We need to find a way around this, maybe by copying the chunks to S3 and assembling them there? That would be much better. Or even better: if you could upload the chunks directly to S3.

Here's what GenAI says about this: https://g.co/gemini/share/c57f76565274

But I do recognize that this would be quite a bit of extra work, so if you agree, for now, maybe let's move ahead with what we have right now. But if you think you can pull off one of the two approaches, please go ahead.
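
To illustrate the idea of forwarding chunks instead of buffering them, here is a minimal sketch under stated assumptions. `PartStore`, `InMemoryPartStore`, and `ChunkForwarder` are hypothetical names, and no real S3 or MinIO client is called; a real backend would most likely map `put_part` and `assemble` onto the object store's multipart upload operations.

```python
from typing import Protocol


class PartStore(Protocol):
    """Hypothetical interface to an object store that assembles parts itself."""

    def put_part(self, upload_id: str, part_number: int, data: bytes) -> None: ...

    def assemble(self, upload_id: str, part_numbers: list[int]) -> bytes: ...


class InMemoryPartStore:
    """Test double; a real backend would hold the parts in S3 or MinIO."""

    def __init__(self) -> None:
        self._parts: dict[tuple[str, int], bytes] = {}

    def put_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        self._parts[(upload_id, part_number)] = data

    def assemble(self, upload_id: str, part_numbers: list[int]) -> bytes:
        # Returning the bytes is for the demo only; a real store keeps the object.
        return b"".join(self._parts[(upload_id, n)] for n in part_numbers)


class ChunkForwarder:
    """Forwards each chunk on arrival, keeping only part numbers in the service."""

    def __init__(self, store: PartStore) -> None:
        self._store = store
        self._received: dict[str, list[int]] = {}

    def upload_chunk(self, upload_id: str, data: bytes) -> int:
        parts = self._received.setdefault(upload_id, [])
        part_number = len(parts) + 1  # parts are numbered from 1
        self._store.put_part(upload_id, part_number, data)
        parts.append(part_number)
        return part_number

    def complete_upload(self, upload_id: str) -> bytes:
        # Hand the ordered part list to the store, which does the assembly.
        return self._store.assemble(upload_id, self._received.pop(upload_id))


forwarder = ChunkForwarder(InMemoryPartStore())
for piece in (b"abc", b"def", b"g"):
    forwarder.upload_chunk("upload-1", piece)
assembled = forwarder.complete_upload("upload-1")
print(assembled)  # prints "b'abcdefg'"
```

With this shape the service holds at most one chunk per request plus a list of part numbers, so local storage no longer grows with file size or with the number of concurrent uploads.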

Either way, please add all the usual parts:

  • Write proper Google-style docstrings with "Args" and "Returns" sections (and "Raises" where applicable)
  • Add type hints for all function args and return values
  • Provide unit tests
  • Adopt a class-based approach; set state where it makes sense, e.g., TUS_UPLOAD_DIR should be an attribute
  • Split up longer functions into smaller ones (particularly the complete_upload() one, but also upload_chunk() is quite longish)
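
As a rough illustration of the points above (class-based state, type hints, Google-style docstrings, small methods), here is a sketch. `UploadManager` and `append_chunk` are hypothetical names, not the PR's code; the offset check and the choice of `ValueError` are assumptions, and `object_id` is assumed to be server-generated so that it is safe to use as a file name.

```python
import tempfile
from pathlib import Path


class UploadManager:
    """Manages partially uploaded files on local disk.

    Attributes:
        upload_dir: Directory holding one partial file per object ID.
    """

    def __init__(self, upload_dir: Path) -> None:
        """Initializes the manager.

        Args:
            upload_dir: Directory for partial uploads; created if missing.
        """
        self.upload_dir = upload_dir
        self.upload_dir.mkdir(parents=True, exist_ok=True)

    def append_chunk(self, object_id: str, offset: int, chunk: bytes) -> int:
        """Appends a chunk to the partial file of an upload.

        Args:
            object_id: Identifier of the upload session (assumed server-generated).
            offset: Byte position at which the client expects to write.
            chunk: Raw bytes of the chunk.

        Returns:
            The new size of the partial file in bytes.

        Raises:
            ValueError: If offset differs from the current file size.
        """
        path = self.upload_dir / object_id
        current = path.stat().st_size if path.exists() else 0
        if offset != current:
            raise ValueError(f"expected offset {current}, got {offset}")
        with open(path, "ab") as partial:
            partial.write(chunk)
        return current + len(chunk)


with tempfile.TemporaryDirectory() as tmp_dir:
    manager = UploadManager(Path(tmp_dir) / "uploads")
    size_after_first = manager.append_chunk("demo", 0, b"abcd")
    size_after_second = manager.append_chunk("demo", 4, b"ef")
    print(size_after_first, size_after_second)  # prints "4 6"
```

Holding the directory as an attribute, rather than a module-level `TUS_UPLOAD_DIR` constant, lets unit tests point each instance at a temporary directory.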

Signed-off-by: Pratiksha Sankhe <[email protected]>
@psankhe28 psankhe28 requested a review from uniqueg October 30, 2024 19:59
@psankhe28
Collaborator Author

Okay, I will make the necessary changes.
I missed this.
Sorry

Signed-off-by: Pratiksha Sankhe <[email protected]>