feat: support s3 urls for input and output #32

Open
wants to merge 1 commit into
base: master
@grusell grusell commented Apr 4, 2025

Description

This commit adds support for S3 URLs of the form s3://<BUCKET>/<KEY> for both input and output.
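To make the s3://<BUCKET>/<KEY> form concrete, here is a minimal Python sketch of splitting such a URL into its bucket and key. It is an illustration only, not the code in this PR; `S3Location` and `parse_s3_url` are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class S3Location(NamedTuple):
    """Bucket and key extracted from an s3:// URL (hypothetical type)."""
    bucket: str
    key: str


def parse_s3_url(url: str) -> S3Location:
    """Split s3://<BUCKET>/<KEY> into its bucket and key parts."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3 URL: {url!r}")
    bucket = parts.netloc
    # The path starts with "/", which is not part of the object key.
    # Note: urlsplit treats "?" and "#" as query/fragment separators, so
    # keys containing those characters would need extra handling.
    key = parts.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"s3 URL must contain both bucket and key: {url!r}")
    return S3Location(bucket, key)
```

For example, `parse_s3_url("s3://my-bucket/input/test.mp4")` yields bucket `my-bucket` and key `input/test.mp4`.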

If an S3 URL is used as input, a presigned URL is created and passed to ffmpeg as input. The lifetime of the presigned URLs can be controlled with the 'remote-files.s3.presignDurationSeconds' config property.

If an s3 URL is used for 'outputFolder', output will first be stored locally and then uploaded to s3 once transcoding is finished.
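The store-locally-then-upload flow implies a mapping from local output files to S3 object keys. The Python sketch below shows one plausible mapping, assuming the key is the 'outputFolder' path followed by the file's path relative to the local output directory. That layout, and the names `upload_target` and `plan_uploads`, are assumptions for illustration, not taken from this PR.

```python
from pathlib import Path
from urllib.parse import urlsplit


def upload_target(local_dir: Path, file_path: Path, output_folder: str) -> tuple[str, str]:
    """Return the (bucket, key) a local output file would be uploaded to."""
    parts = urlsplit(output_folder)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3 URL: {output_folder!r}")
    prefix = parts.path.strip("/")
    # Keep the file's position relative to the local output directory.
    relative = file_path.relative_to(local_dir).as_posix()
    key = f"{prefix}/{relative}" if prefix else relative
    return parts.netloc, key


def plan_uploads(local_dir: Path, output_folder: str) -> list[tuple[Path, str, str]]:
    """List (local path, bucket, key) for every file under local_dir."""
    return [
        (path, *upload_target(local_dir, path, output_folder))
        for path in sorted(local_dir.rglob("*"))
        if path.is_file()
    ]
```

With these assumptions, a file `hls/seg1.ts` in the local output directory and an 'outputFolder' of `s3://out-bucket/encodes/job1/` map to bucket `out-bucket` and key `encodes/job1/hls/seg1.ts`.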

AWS credentials are read with DefaultCredentialsProvider, so they can be provided in a number of ways; see https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html

Note that when S3 URLs are used as input, the presigned URLs appear in the logs. To avoid this, set logging.config (or the env variable LOGGING_CONFIG) to 'classpath:logback-json-mask-s3-presign.xml', which selects a log config that masks the presign query parameters.
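As an illustration of what masking presign query parameters can look like, here is a small Python sketch that redacts the sensitive SigV4 query parameters in a log message. It does not reproduce the bundled logback config, whose exact behavior is not described here; which parameters are masked, the mask string, and the name `mask_presign_params` are all assumptions.

```python
import re

# SigV4 presigned-URL query parameters that carry secret or identifying
# material. Which parameters the bundled logback config actually masks is
# an assumption here.
_SENSITIVE_PARAM = re.compile(
    r"(?P<name>X-Amz-(?:Signature|Credential|Security-Token))=[^&\s\"']+",
    re.IGNORECASE,
)


def mask_presign_params(message: str, mask: str = "*****") -> str:
    """Replace the values of sensitive presign parameters in a log message."""
    return _SENSITIVE_PARAM.sub(lambda m: f"{m.group('name')}={mask}", message)
```

Non-sensitive parameters such as `X-Amz-Algorithm` and `X-Amz-Expires` are left untouched, so the log line still shows that a presigned URL was used.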

Setting the env variable REMOTEFILES_S3_ANONYMOUSACCESS to true makes S3 URLs be accessed in anonymous mode, which corresponds to using the '--no-sign-request' flag with the AWS CLI. Any configured S3 access key or secret key is ignored. Multipart upload is disabled in this mode, since the S3 SDK does not support multipart upload with anonymous access.
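The effect of the anonymous-access switch can be sketched as a small decision function. The Python below is a hedged illustration, not the logic in this PR: `S3ClientSettings` and `resolve_s3_settings` are hypothetical, credentials are read only from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables (the real provider chain consults more sources), and multipart upload is assumed to be enabled whenever access is not anonymous.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3ClientSettings:
    """Effective S3 client settings (hypothetical type for illustration)."""
    anonymous: bool
    access_key: Optional[str]
    secret_key: Optional[str]
    multipart_upload: bool


def resolve_s3_settings(env: Mapping[str, str]) -> S3ClientSettings:
    """Derive client settings from environment variables."""
    flag = env.get("REMOTEFILES_S3_ANONYMOUSACCESS", "false")
    if flag.strip().lower() == "true":
        # Anonymous mode: configured keys are ignored, multipart is off.
        return S3ClientSettings(True, None, None, False)
    # Assumption: multipart upload is on whenever access is not anonymous.
    return S3ClientSettings(
        anonymous=False,
        access_key=env.get("AWS_ACCESS_KEY_ID"),
        secret_key=env.get("AWS_SECRET_ACCESS_KEY"),
        multipart_upload=True,
    )
```

Passing a plain mapping rather than reading `os.environ` directly keeps the sketch easy to exercise; `resolve_s3_settings(os.environ)` would be the real-world call.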

Note that chunked encoding is not yet supported with S3 input or output.

Type of change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How Has This Been Tested?

  • An integration test that uses S3 has been added
  • S3 input/output has also been tested extensively in real-world use

Checklist:

  • [x] I confirm that I wrote and/or have the right to submit the contents of my PR, by agreeing to the Developer Certificate of Origin (see https://github.com/svt/open-source-project-template/blob/master/docs/CONTRIBUTING.adoc[docs/CONTRIBUTING]).
  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] New and existing unit tests pass locally with my changes
  • [x] Any dependent changes have been merged and published in downstream modules
  • [x] PR has an informative and human-readable title
  • [x] Changes are limited to a single goal (no scope creep)
  • [x] Code can be automatically merged (no conflicts)

This commit adds support for S3 URLs of the form
s3://<BUCKET>/<KEY> for both input and output.

If an S3 URL is used as input, a presigned URL is created
and passed to ffmpeg as input. The lifetime of the presigned URLs can
be controlled with the 'remote-files.s3.presignDurationSeconds' config
property.

If an S3 URL is used for 'outputFolder', output will first be stored
locally and then uploaded to S3 once transcoding is finished.

AWS credentials are read with DefaultCredentialsProvider, so they
can be provided in a number of ways; see
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html

Note that when S3 URLs are used as input, the presigned URLs appear
in the logs. To avoid this, set logging.config (or the env variable
LOGGING_CONFIG) to 'classpath:logback-json-mask-s3-presign.xml',
which selects a log config that masks the presign query parameters.

Setting the env variable REMOTEFILES_S3_ANONYMOUSACCESS to true makes
S3 URLs be accessed in anonymous mode, which corresponds to using the
'--no-sign-request' flag with the AWS CLI. Any configured S3 access
key or secret key is ignored. Multipart upload is disabled in this
mode, since the S3 SDK does not support multipart upload with
anonymous access.

Signed-off-by: Gustav Grusell <[email protected]>