
Upload recording feature #787

Open · wants to merge 13 commits from feature/upload-recording into main
Conversation

@KIRA009 (Contributor) commented Jun 21, 2024

What kind of change does this PR introduce?
This PR addresses #724

Summary
This PR adds a script that deploys an app to AWS Lambda, which is then used by the OpenAdapt app to upload zip files of recordings from users.

Checklist

  • My code follows the style guidelines of OpenAdapt
  • I have performed a self-review of my code
  • If applicable, I have added tests to prove my fix is functional/effective
  • I have linted my code locally prior to submission
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
  • New and existing unit tests pass locally with my changes

How can your code be run and tested?
From the project root, run `python -m scripts.recording_uploader.deploy` (ensure that you have the necessary AWS creds configured). Once the command completes, note the API URL in the output and paste it into the `RECORDING_UPLOAD_URL` variable in `config.py`. Then start the app, navigate to a recording detail page, and click the "Upload recording" button. Check the S3 bucket to confirm that the recording has been uploaded (in the form of a zip file).

Other information

@abrichr (Member) left a comment:

Thank you for putting this together @KIRA009 ! Just left a few small comments, happy to chat about any of it if you like! 🙏 😄

with open(file_path, "rb") as file:
    files = {"file": (filename, file)}
    resp = requests.put(upload_url, files=files)
    resp.raise_for_status()
Member:

What do you think about returning the response here?
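For illustration, a minimal sketch of this suggestion: the helper returns the response so the caller can inspect status or headers. The HTTP `put` callable is injected so the sketch runs without network access (in the app it would be `requests.put`); the function name is illustrative, not the PR's exact code.

```python
import os


def upload_file(upload_url: str, file_path: str, put) -> object:
    """Upload a file via a presigned URL and return the HTTP response.

    `put` is the HTTP PUT callable (e.g. requests.put), injected so this
    sketch stays dependency-free for demonstration purposes.
    """
    filename = os.path.basename(file_path)
    with open(file_path, "rb") as file:
        files = {"file": (filename, file)}
        resp = put(upload_url, files=files)
    resp.raise_for_status()
    return resp  # returned instead of discarded, per the review suggestion
```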

@@ -0,0 +1,244 @@

# Created by https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode
Member:

Interesting, can you please clarify why this is necessary / preferable to a minimal .gitignore? What did you need to ignore here? Why not keep it in the root .gitignore?

Contributor Author:

This is autogenerated from the template. You're right though, this isn't needed.


## Deploy the application

There is a `deploy` script that creates the s3 bucket and deploys the application using the SAM CLI (included as part of the dev dependencies of this project). The bucket name is hardcoded in the script. The SAM CLI is set up to run in `guided` mode, which will prompt the user every time befor deploying, in case the user wants to change the default values.
Member:

Typo: befor -> before

Member:

Can we make the bucket name configurable, with the aws region etc?
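One way to make these configurable is CLI flags on the deploy script, sketched here with argparse (flag names and defaults are assumptions, not from the PR):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI flags so the bucket name and region are no longer hardcoded."""
    parser = argparse.ArgumentParser(description="Deploy the recording uploader.")
    parser.add_argument("--bucket", default="openadapt", help="S3 bucket name")
    parser.add_argument("--region-name", default="us-east-1", help="AWS region")
    parser.add_argument("--guided", action="store_true", help="Run SAM in guided mode")
    return parser
```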

@@ -0,0 +1 @@
"""Init file for the recording_uploader package."""
Member:

If this is a package, should we move it outside of scripts?

Contributor Author:

It should be named as a module instead 😅

def get_presigned_url() -> dict:
"""Generate a presigned URL for uploading a recording to S3."""
bucket = "openadapt"
region_name = "us-east-1"
Member:

What do you think about putting these in NAMED_CONSTANTS at the top of this file, and passing them in as default keyword arguments to get_presigned_url?
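A sketch of that suggestion, with the boto3 call elided so it runs without AWS credentials (constant names are illustrative):

```python
# Module-level named constants, passed in as default keyword arguments.
DEFAULT_BUCKET = "openadapt"
DEFAULT_REGION_NAME = "us-east-1"


def get_presigned_url(
    bucket: str = DEFAULT_BUCKET,
    region_name: str = DEFAULT_REGION_NAME,
) -> dict:
    """Generate a presigned URL for uploading a recording to S3."""
    endpoint_url = f"https://s3.{region_name}.amazonaws.com"
    # The real function would create a boto3 client here and call
    # client.generate_presigned_url(...); elided in this sketch.
    return {"bucket": bucket, "endpoint_url": endpoint_url}
```

Callers keep the no-argument call site, while tests or other environments can override either value.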

if guided:
    commands.append("--guided")
subprocess.run(commands, cwd=CURRENT_DIR, check=True)
print("Lambda function deployed successfully.")
Member:

What do you think about using logger here?
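For example, with stdlib logging (the project itself uses loguru, whose `logger` offers an equivalent interface via `from loguru import logger`; the function name is illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def report_deployed(api_url: str) -> None:
    """Log instead of print, so output respects the app's log configuration."""
    logger.info("Lambda function deployed successfully. API URL: %s", api_url)
```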

Bucket=bucket,
)
except (s3.exceptions.BucketAlreadyExists, s3.exceptions.BucketAlreadyOwnedByYou):
proceed = input(f"Bucket '{bucket}' already exists. Proceed? [y/N] ")
Member:

Is this necessary? What happens if the user proceeds if the bucket has already been created?

@KIRA009 (Contributor Author) commented Jun 24, 2024:

I thought this would be good to have in case the user doesn't want to overwrite existing buckets, but I realise that is quite a niche case

Member:

Will this remove existing data?

Contributor Author:

It's unlikely that it will, given we are generating random filenames, so yes, maybe we can remove this part.

Contributor Author:

I have removed this

"Bucket": bucket,
"Key": key,
},
ExpiresIn=3600,
@abrichr (Member) commented Jun 22, 2024:

Can you please make this a named constant, e.g. ONE_HOUR_IN_SECONDS = 60 * 60?
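The suggested constant, for concreteness:

```python
# Presigned URL expiry as a named constant instead of a magic number.
ONE_HOUR_IN_SECONDS = 60 * 60

# ...then pass ExpiresIn=ONE_HOUR_IN_SECONDS to generate_presigned_url().
```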

region_name=region_name,
endpoint_url=f"https://s3.{region_name}.amazonaws.com",
)
key = f"recordings/{uuid4()}.zip"
Member:

What do you think about adding the user's unique id to the path, e.g. recordings/{user_id}/{upload_id}.zip?
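The suggested key scheme could look like this (the helper name is illustrative, not from the PR):

```python
from uuid import uuid4


def build_recording_key(user_id: str) -> str:
    """Namespace uploads per user: recordings/{user_id}/{upload_id}.zip."""
    upload_id = uuid4()
    return f"recordings/{user_id}/{upload_id}.zip"
```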

Contributor Author:

Makes sense

@abrichr (Member) commented Jun 22, 2024

Once we start scaling we should consider supporting B2, e.g. app.py:

"""Lambda-like function for generating a presigned URL for uploading a recording to B2."""

from typing import Any
from uuid import uuid4
import json
from b2sdk.v2 import B2Api, InMemoryAccountInfo

def get_b2_client() -> B2Api:
    """Create and return a B2 client."""
    info = InMemoryAccountInfo()
    b2_api = B2Api(info)
    b2_api.authorize_account("production", "applicationKeyId", "applicationKey")
    return b2_api

def lambda_handler(*args: Any, **kwargs: Any) -> dict:
    """Main entry point for the function."""
    return {
        "statusCode": 200,
        "body": json.dumps(get_presigned_url()),
    }

def get_presigned_url() -> dict:
    """Generate a presigned URL for uploading a recording to B2."""
    bucket_name = "openadapt"
    b2_api = get_b2_client()
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file_name = f"recordings/{uuid4()}.zip"
    file_info = {'how': 'good-file'}

    presigned_url = bucket.get_upload_url(file_name, file_info=file_info)
    
    return {"url": presigned_url['upload_url'], "upload_auth_token": presigned_url['authorization_token']}

For now let's stick with s3.

@KIRA009 (Contributor Author) commented Jul 1, 2024

Before this is merged, we need to set up the upload URL and update it in `config.py` - `RECORDING_UPLOAD_URL`

region_name (str): The AWS region to deploy the Lambda function to.
guided (bool): Whether to use the guided SAM deployment.
"""
s3 = boto3.client(
Member:

@KIRA009 can you please modify this to use the credentials specified in openadapt.config? Specifically we should add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY or similar.

Also, please add some documentation regarding the permissions required for this IAM user.

Contributor Author:

Do we want to add these keys to the config? They won't be used anywhere else in the project, and I think when you run the deploy script, if boto3 doesn't find appropriate keys in the place where it's looking, the user is notified of that.

For the access keys, I followed this without any explicit permissions - https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey. These keys are the ones that you need to run the deploy script, which then creates appropriate IAM users on its own. Once it's deployed, you can delete the original access keys if you want. Should I add this to the readme?

@@ -0,0 +1 @@
boto3==1.34.84
Member:

What do you think about moving all of this outside of /scripts, and into e.g. /admin or similar?

Contributor Author:

I think the uploader module has to remain inside the recording_uploader module. If you mean moving the recording_uploader module itself, we could, but I think scripts is also a good enough place for it to be in. Let me know.

Member:

Yes I meant to move recording_uploader into a new directory, /admin or similar.

If you want to self-host the app, you should run the following scripts

**recording_uploader**
- Ensure that you have valid AWS credentials added in your environment
Member:

What do you think about loading the AWS credentials from config.py?

@KIRA009 force-pushed the feature/upload-recording branch from bfffb05 to 3554bb4 on November 9, 2024, 15:46
    except Exception as exc:
        logger.exception(exc)

Thread(target=_inner).start()
Member:

Should we add some error handling / retrying / reporting around this?

Contributor Author:

Added retries
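A minimal retry wrapper of this kind might look like the following (a sketch; the name, signature, and fixed-delay policy are assumptions, not the PR's exact code):

```python
import time


def with_retries(func, attempts: int = 3, delay_seconds: float = 1.0):
    """Call func(), retrying on any exception and re-raising the last one."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: propagate the final error
            time.sleep(delay_seconds)
```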

file_path (str): The path to the file to upload.
body (dict): The body of the request.
"""
filename = os.path.basename(file_path)
Member:

Suggested change
filename = os.path.basename(file_path)
file_name = os.path.basename(file_path)

Member:

What do you think about logging this path?

Contributor Author:

Done

@abrichr (Member) left a comment:

Thank you for putting this together @KIRA009 !

What do you think about something like this:

class Recording(db.Base):
    __tablename__ = "recording"
...
    s3_url = sa.Column(sa.String, nullable=True)

And perhaps display this value in the Dashboard.

@abrichr (Member) commented Nov 10, 2024

What do you think about renaming the directory to simply uploader instead of recording_uploader?

@abrichr (Member) commented Nov 10, 2024

@KIRA009 what do you think about https://www.cloudflare.com/en-ca/developer-platform/products/r2/:

Forever Free
10 GB / month

# check if aws credentials are set
if os.getenv("AWS_ACCESS_KEY_ID") is None:
    raise ValueError("AWS_ACCESS_KEY_ID is not set")
if os.getenv("AWS_SECRET_ACCESS_KEY") is None:
Member:

Why not read this from config?

Contributor Author:

This script is not supposed to be part of the OpenAdapt app; it's an admin script that needs to be run by the owner (you) on a machine that has the relevant AWS creds in its environment. From the PR description:

From the project root, run python -m scripts.recording_uploader.deploy (ensure that you have the necessary aws creds configured). Once the command completes, note the api url in the output, and paste that onto the config.py's RECORDING_UPLOAD_URL variable

Because config.py is more closely related to settings of the app, I didn't think it'd be useful to add these there.

Member:

I think we want to read from config.py.

Contributor Author:

How would you suggest a user override these settings? The default values will be empty in config.py and config.defaults.json, so the only ways are to manually edit the config.json file in the data folder, or to expose it in the dashboard settings page. A regular user won't need to edit this, and might get confused if it's in the dashboard settings.

region_name=region_name,
endpoint_url=f"https://s3.{region_name}.amazonaws.com",
)
bucket = "openadapt"
Member:

What do you think about defining this is config.py?

Contributor Author:

The same reason as above; plus, this is hardcoded because this script will only be run once in a while (only when the lambda function is changed), and ideally we won't be changing bucket names between runs.
