feat: filemanager presigned route #475

Merged: 8 commits merged on Aug 15, 2024
33 changes: 33 additions & 0 deletions lib/workload/stateless/stacks/filemanager/docs/API_GUIDE.md
@@ -10,6 +10,15 @@ make start

This serves Swagger OpenAPI docs at `http://localhost:8000/swagger-ui` when using default settings.

## API configuration

The API supports the following environment variables, which configure the presigned URL routes:

| Option | Description | Type | Default |
|----------------------------------|----------------------------------------------------------------------------|---------------------|--------------|
| `FILEMANAGER_API_PRESIGN_LIMIT` | The maximum file size, in bytes, for which presigned URLs are generated. | Integer | `"20971520"` (20 MiB) |
| `FILEMANAGER_API_PRESIGN_EXPIRY` | The expiry time of presigned URLs. | Duration in seconds | `"300"` (5 minutes) |

The deployed instance of the filemanager API can be reached at `https://file.<stage>.umccr.org` for the desired stage,
authenticating with the orcabus API token. To retrieve the token, run:

@@ -150,6 +159,30 @@ For example, count the total records:
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3/count" | jq
```

## Presigned URLs

The filemanager API can also generate presigned URLs. Presigned URLs can only be generated for objects that currently
exist in S3.

For example, generate a presigned URL for a single record:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3/presign/0190465f-68fa-76e4-9c36-12bdf1a1571d" | jq
```

Or, generate presigned URLs for multiple records. This route supports the same query parameters as the list operations, except `currentState`, which is implied:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3/presign?page=10&rowsPerPage=50" | jq
```

Specify `responseContentDisposition` on either of the above routes to set the `response-content-disposition` of the
presigned `GetObject` request. It can be either `inline` or `attachment`, and defaults to `inline`:

```sh
curl -H "Authorization: Bearer $TOKEN" "https://file.dev.umccr.org/api/v1/s3/presign?responseContentDisposition=attachment" | jq
```

## Some missing features

There are some missing features in the query API which are planned, namely:
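The configuration table and the `responseContentDisposition` parameter could be applied roughly as follows. This is a minimal sketch, assuming that an unset variable falls back to the documented default (20971520 bytes, i.e. 20 MiB, and 300 seconds) and that the size limit is inclusive. `PresignSettings`, `should_presign` and `ContentDisposition` are hypothetical names rather than filemanager's code, and how the real route reports objects above the limit is not shown in this diff.

```rust
use std::time::Duration;

/// Documented default for `FILEMANAGER_API_PRESIGN_LIMIT`: 20 MiB in bytes.
const DEFAULT_PRESIGN_LIMIT: u64 = 20 * 1024 * 1024;
/// Documented default for `FILEMANAGER_API_PRESIGN_EXPIRY`: 300 seconds.
const DEFAULT_PRESIGN_EXPIRY: Duration = Duration::from_secs(300);

/// Hypothetical settings type: resolves optional env values to the documented defaults.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PresignSettings {
    limit: u64,
    expiry: Duration,
}

impl PresignSettings {
    fn resolve(limit: Option<u64>, expiry: Option<Duration>) -> Self {
        Self {
            limit: limit.unwrap_or(DEFAULT_PRESIGN_LIMIT),
            expiry: expiry.unwrap_or(DEFAULT_PRESIGN_EXPIRY),
        }
    }

    /// Assumption: objects at or below the limit are eligible for presigning.
    fn should_presign(&self, object_size: u64) -> bool {
        object_size <= self.limit
    }
}

/// Hypothetical model of the `responseContentDisposition` query parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ContentDisposition {
    #[default]
    Inline,
    Attachment,
}

impl ContentDisposition {
    /// Parse the query value; a missing parameter yields the default (`inline`).
    fn parse(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Self::default()),
            Some("inline") => Ok(Self::Inline),
            Some("attachment") => Ok(Self::Attachment),
            Some(other) => Err(format!("invalid responseContentDisposition: {other}")),
        }
    }

    /// The value that would be sent as `response-content-disposition`.
    fn as_str(&self) -> &'static str {
        match self {
            Self::Inline => "inline",
            Self::Attachment => "attachment",
        }
    }
}

fn main() {
    let settings = PresignSettings::resolve(None, None);
    println!("limit={} expiry={}s", settings.limit, settings.expiry.as_secs());
    let disposition = ContentDisposition::parse(Some("attachment")).unwrap();
    println!("{}", disposition.as_str());
}
```

Keeping the disposition as an enum rather than a raw string lets an invalid value be rejected before any S3 call is made.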
@@ -5,6 +5,7 @@ use filemanager::env::Config;
use lambda_http::Error;
use std::sync::Arc;

use filemanager::clients::aws::s3;
use filemanager::database::Client;
use filemanager::handlers::aws::{create_database_pool, update_credentials};
use filemanager::handlers::init_tracing;
@@ -21,7 +22,11 @@ async fn main() -> Result<(), Error> {
debug!(?config, "running with config");

let client = Client::new(create_database_pool(&config).await?);
-let state = AppState::new(client, Arc::new(config));
+let state = AppState::new(
+client,
+Arc::new(config),
+Arc::new(s3::Client::with_defaults().await),
+);

let app =
router(state.clone()).route_layer(from_fn_with_state(state, update_credentials_middleware));
@@ -35,7 +40,7 @@ async fn update_credentials_middleware(
request: Request,
next: Next,
) -> Response {
-let result = update_credentials(state.client().connection_ref(), state.config()).await;
+let result = update_credentials(state.database_client().connection_ref(), state.config()).await;

if let Err(err) = result {
return ErrorStatusCode::InternalServerError(ErrorResponse::new(format!(
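Both binaries now pass an `Arc`-wrapped S3 client as a third argument to `AppState::new`, and the database accessor is renamed from `client()` to `database_client()`. The sketch below illustrates that shape with stand-in types; `DatabaseClient`, `S3Client` and `Config` are hypothetical placeholders, and the real `AppState` may store or expose its fields differently. Holding the config and S3 client behind `Arc` keeps cloning the state cheap, which matters because the state is cloned for the router and again for the `update_credentials_middleware` layer.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for filemanager's real types.
#[derive(Debug, Clone)]
struct DatabaseClient {
    url: String,
}

#[derive(Debug, Default)]
struct Config {
    api_presign_limit: Option<u64>,
}

#[derive(Debug, Default)]
struct S3Client;

/// Shared application state. Cloning it shares the config and S3 client through
/// `Arc` (a reference-count increment) instead of copying them.
#[derive(Debug, Clone)]
struct AppState {
    database_client: DatabaseClient,
    config: Arc<Config>,
    s3_client: Arc<S3Client>,
}

impl AppState {
    fn new(database_client: DatabaseClient, config: Arc<Config>, s3_client: Arc<S3Client>) -> Self {
        Self { database_client, config, s3_client }
    }

    fn database_client(&self) -> &DatabaseClient {
        &self.database_client
    }

    fn config(&self) -> &Config {
        &self.config
    }

    fn s3_client(&self) -> &S3Client {
        &self.s3_client
    }
}

fn main() {
    let state = AppState::new(
        DatabaseClient { url: "postgres://localhost/filemanager".to_string() },
        Arc::new(Config::default()),
        Arc::new(S3Client),
    );
    // The router and the middleware layer would each hold their own clone.
    let for_middleware = state.clone();
    println!("{}", for_middleware.database_client().url);
    println!("{:?} {:?}", state.config().api_presign_limit, state.s3_client());
}
```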
@@ -1,5 +1,6 @@
use axum::serve;
use clap::{Parser, Subcommand};
use filemanager::clients::aws::s3;
use filemanager::database::aws::migration::Migration;
use filemanager::database::{Client, Migrate};
use filemanager::env::Config;
@@ -73,7 +74,11 @@ async fn main() -> Result<()> {
debug!(?config, "running with config");

let client = Client::from_config(&config).await?;
-let state = AppState::new(client.clone(), config.clone());
+let state = AppState::new(
+client.clone(),
+config.clone(),
+Arc::new(s3::Client::with_defaults().await),
+);

if let Some(load) = args.load_sql_file {
info!(
@@ -85,7 +90,7 @@ async fn main() -> Result<()> {
File::open(load).await?.read_to_string(&mut script).await?;

state
-.client()
+.database_client()
.connection_ref()
.execute_unprepared(&script)
.await?;
@@ -105,7 +110,7 @@ async fn main() -> Result<()> {
.with_bucket_divisor(bucket_divisor)
.with_key_divisor(key_divisor)
.with_shuffle(shuffle)
-.build(state.client())
+.build(state.database_client())
.await;
}

@@ -15,7 +15,7 @@ migrate = ["sqlx/migrate"]
# Serde
serde = { version = "1", features = ["derive"] }
serde_json = "1"
-serde_with = "3"
+serde_with = { version = "3", features = ["chrono"] }

# Async
async-trait = "0.1"
@@ -1,17 +1,17 @@
//! A mockable wrapper around the S3 client.
//!

use std::result;

use crate::clients::aws::config::Config;
use aws_sdk_s3 as s3;
use aws_sdk_s3::error::SdkError;
use aws_sdk_s3::operation::get_object::{GetObjectError, GetObjectOutput};
use aws_sdk_s3::operation::head_object::{HeadObjectError, HeadObjectOutput};
use aws_sdk_s3::operation::list_buckets::{ListBucketsError, ListBucketsOutput};
use aws_sdk_s3::presigning::{PresignedRequest, PresigningConfig};
use aws_sdk_s3::types::ChecksumMode::Enabled;
use chrono::Duration;
use mockall::automock;

use crate::clients::aws::config::Config;
use std::result;

pub type Result<T, E> = result::Result<T, SdkError<E>>;

@@ -67,4 +67,29 @@ impl Client {
.send()
.await
}

/// Generate a presigned URL for a `GetObject` request on the object.
pub async fn presign_url(
&self,
key: &str,
bucket: &str,
response_content_disposition: &str,
expires_in: Duration,
) -> Result<PresignedRequest, GetObjectError> {
self.inner
.get_object()
.response_content_disposition(response_content_disposition)
.checksum_mode(Enabled)
.key(key)
.bucket(bucket)
.presigned(
PresigningConfig::expires_in(
expires_in
.to_std()
.map_err(SdkError::construction_failure)?,
)
.map_err(SdkError::construction_failure)?,
)
.await
}
}
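In `presign_url`, the configured `chrono::Duration` passes through two fallible conversions, each mapped to `SdkError::construction_failure`: `to_std()` fails for a negative duration, and, to our understanding, the SDK's `PresigningConfig::expires_in` rejects expiries longer than one week, the SigV4 maximum. The std-only sketch below models those two checks with a hypothetical `validate_expiry` function; it does not call the AWS SDK, and the SDK's exact boundary handling may differ.

```rust
use std::time::Duration;

/// Assumed ceiling: SigV4 presigned URLs can be valid for at most seven days.
const MAX_PRESIGN_EXPIRY: Duration = Duration::from_secs(7 * 24 * 60 * 60);

#[derive(Debug, PartialEq, Eq)]
enum ExpiryError {
    /// Models `chrono::Duration::to_std` failing on a negative duration.
    Negative(i64),
    /// Models `PresigningConfig::expires_in` rejecting expiries over one week.
    TooLong(Duration),
}

/// Hypothetical helper: validate a signed number of seconds in the same two steps.
fn validate_expiry(seconds: i64) -> Result<Duration, ExpiryError> {
    // Step 1: a negative value cannot become a std `Duration`.
    let secs = u64::try_from(seconds).map_err(|_| ExpiryError::Negative(seconds))?;
    let expiry = Duration::from_secs(secs);

    // Step 2: enforce the one-week ceiling.
    if expiry > MAX_PRESIGN_EXPIRY {
        return Err(ExpiryError::TooLong(expiry));
    }
    Ok(expiry)
}

fn main() {
    for seconds in [300, -1, 8 * 24 * 60 * 60] {
        println!("{seconds}: {:?}", validate_expiry(seconds));
    }
}
```

Validating the expiry once, when the configuration is loaded, would surface a bad `FILEMANAGER_API_PRESIGN_EXPIRY` at startup rather than on the first presign request; whether filemanager does this is not shown here.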
@@ -204,13 +204,10 @@ mod tests {
#[tokio::test]
async fn generate_iam_token_env() {
let env_config = EnvConfig {
-database_url: None,
-pgpassword: None,
pghost: Some("127.0.0.1".to_string()),
pgport: Some(5432),
pguser: Some("filemanager".to_string()),
-sqs_url: None,
-paired_ingest_mode: false,
+..Default::default()
};

test_generate_iam_token(|config| async {
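The IAM token test now spells out only the connection fields and fills the rest with struct update syntax. Because `Config` derives `Default`, `..Default::default()` supplies `None` for the optional fields and `false` for `paired_ingest_mode`, so adding fields such as `api_presign_limit` no longer requires editing this test. A minimal illustration with a hypothetical cut-down struct (the `u16` type for `pgport` is an assumption):

```rust
/// Hypothetical cut-down copy of the env `Config`. Deriving `Default` gives every
/// `Option` field `None` and every `bool` field `false`.
#[derive(Debug, Default, PartialEq, Eq)]
struct EnvConfig {
    database_url: Option<String>,
    pghost: Option<String>,
    pgport: Option<u16>,
    paired_ingest_mode: bool,
    api_presign_limit: Option<u64>,
}

fn main() {
    // Only the fields the test cares about are written out; the rest come from `Default`.
    let config = EnvConfig {
        pghost: Some("127.0.0.1".to_string()),
        pgport: Some(5432),
        ..Default::default()
    };
    println!("{config:?}");
}
```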
24 changes: 23 additions & 1 deletion lib/workload/stateless/stacks/filemanager/filemanager/src/env.rs
@@ -3,10 +3,14 @@

use crate::error::Error::ConfigError;
use crate::error::Result;
use chrono::Duration;
use envy::from_env;
use serde::Deserialize;
use serde_with::serde_as;
use serde_with::DurationSeconds;

/// Configuration environment variables for filemanager.
#[serde_as]
#[derive(Debug, Clone, Deserialize, Default, Eq, PartialEq)]
pub struct Config {
pub(crate) database_url: Option<String>,
@@ -18,6 +22,11 @@ pub struct Config {
pub(crate) sqs_url: Option<String>,
#[serde(default, rename = "filemanager_paired_ingest_mode")]
pub(crate) paired_ingest_mode: bool,
#[serde(rename = "filemanager_api_presign_limit")]
pub(crate) api_presign_limit: Option<u64>,
#[serde_as(as = "Option<DurationSeconds<i64>>")]
#[serde(rename = "filemanager_api_presign_expiry")]
pub(crate) api_presign_expiry: Option<Duration>,
}

impl Config {
@@ -72,6 +81,16 @@ impl Config {
self.paired_ingest_mode
}

/// Get the presigned size limit.
pub fn api_presign_limit(&self) -> Option<u64> {
self.api_presign_limit
}

/// Get the presigned expiry time.
pub fn api_presign_expiry(&self) -> Option<Duration> {
self.api_presign_expiry
}

/// Get the value from an optional, or else try and get a different value, unwrapping into a Result.
pub fn value_or_else<T>(value: Option<T>, or_else: Option<T>) -> Result<T> {
value
@@ -101,7 +120,8 @@ mod tests {
("PGUSER", "user"),
("FILEMANAGER_SQS_URL", "url"),
("FILEMANAGER_PAIRED_INGEST_MODE", "true"),
-("FILEMANAGER_API_SERVER_ADDR", "127.0.0.1:8080"),
+("FILEMANAGER_API_PRESIGN_LIMIT", "123"),
+("FILEMANAGER_API_PRESIGN_EXPIRY", "60"),
]
.into_iter()
.map(|(key, value)| (key.to_string(), value.to_string()));
@@ -118,6 +138,8 @@ mod tests {
pguser: Some("user".to_string()),
sqs_url: Some("url".to_string()),
paired_ingest_mode: true,
api_presign_limit: Some(123),
api_presign_expiry: Some(Duration::seconds(60))
}
)
}
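The env test expects `FILEMANAGER_API_PRESIGN_LIMIT="123"` to become `Some(123)` and `FILEMANAGER_API_PRESIGN_EXPIRY="60"` to become a 60-second duration. filemanager does this with `envy` and `serde_with::DurationSeconds<i64>`; the std-only sketch below is a hypothetical stand-in (`PresignEnv` and `parse_presign_env` are illustrative names) that lowercases variable names, much as `envy` matches them case-insensitively, and parses seconds into a `std::time::Duration`. Unlike the real `i64` field, it parses seconds as `u64`, so negative values are rejected at parse time.

```rust
use std::collections::HashMap;
use std::time::Duration;

#[derive(Debug, Default, PartialEq, Eq)]
struct PresignEnv {
    api_presign_limit: Option<u64>,
    api_presign_expiry: Option<Duration>,
}

/// Hypothetical std-only stand-in for the `envy` + `DurationSeconds` parsing.
fn parse_presign_env<I>(vars: I) -> Result<PresignEnv, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    // Match variable names case-insensitively by lowercasing them.
    let vars: HashMap<String, String> = vars
        .into_iter()
        .map(|(key, value)| (key.to_lowercase(), value))
        .collect();

    // A missing variable yields `None`; a malformed one is an error.
    let api_presign_limit = vars
        .get("filemanager_api_presign_limit")
        .map(|v| v.parse::<u64>().map_err(|e| format!("presign limit: {e}")))
        .transpose()?;

    let api_presign_expiry = vars
        .get("filemanager_api_presign_expiry")
        .map(|v| {
            v.parse::<u64>()
                .map(Duration::from_secs)
                .map_err(|e| format!("presign expiry: {e}"))
        })
        .transpose()?;

    Ok(PresignEnv { api_presign_limit, api_presign_expiry })
}

fn main() {
    let vars = [
        ("FILEMANAGER_API_PRESIGN_LIMIT", "123"),
        ("FILEMANAGER_API_PRESIGN_EXPIRY", "60"),
    ]
    .into_iter()
    .map(|(key, value)| (key.to_string(), value.to_string()));
    println!("{:?}", parse_presign_env(vars));
}
```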
@@ -44,6 +44,8 @@ pub enum Error {
ParseError(String),
#[error("missing host header")]
MissingHostHeader,
#[error("creating presigned url: `{0}`")]
PresignedUrlError(String),
}

impl From<sqlx::Error> for Error {
@@ -1,22 +1,24 @@
//! Query builder involving get operations on the database.
//!

-use sea_orm::{EntityTrait, Select};
+use sea_orm::{ConnectionTrait, EntityTrait, Select};
use uuid::Uuid;

use crate::database::entities::s3_object;
-use crate::database::Client;
use crate::error::Result;

/// A query builder for get operations.
-pub struct GetQueryBuilder<'a> {
-client: &'a Client,
+pub struct GetQueryBuilder<'a, C> {
+connection: &'a C,
}

-impl<'a> GetQueryBuilder<'a> {
+impl<'a, C> GetQueryBuilder<'a, C>
+where
+C: ConnectionTrait,
+{
/// Create a new query builder.
-pub fn new(client: &'a Client) -> Self {
-Self { client }
+pub fn new(connection: &'a C) -> Self {
+Self { connection }
}

/// Build a select query for finding an s3 object by id.
@@ -26,9 +28,7 @@ impl<'a> GetQueryBuilder<'a> {

/// Get a specific s3 object by id.
pub async fn get_s3_by_id(&self, id: Uuid) -> Result<Option<s3_object::Model>> {
-Ok(Self::build_s3_by_id(id)
-.one(self.client.connection_ref())
-.await?)
+Ok(Self::build_s3_by_id(id).one(self.connection).await?)
}
}

@@ -47,7 +47,7 @@ mod tests {
let entries = EntriesBuilder::default().build(&client).await.s3_objects;

let first = entries.first().unwrap();
-let builder = GetQueryBuilder::new(&client);
+let builder = GetQueryBuilder::new(client.connection_ref());
let result = builder.get_s3_by_id(first.s3_object_id).await.unwrap();

assert_eq!(result.as_ref(), Some(first));
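Making `GetQueryBuilder` generic over `C: ConnectionTrait` instead of holding a `&Client` lets the same builder run against any sea-orm connection type, such as the pooled connection returned by `connection_ref()` or, presumably, a `DatabaseTransaction`. The sketch below shows the pattern without sea-orm; the `Connection` trait, `Pool`, `Transaction` and `lookup` are hypothetical stand-ins, and the code is synchronous for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for sea-orm's `ConnectionTrait`.
trait Connection {
    fn lookup(&self, id: u32) -> Option<String>;
}

/// A pooled connection backed by an in-memory table.
struct Pool {
    rows: HashMap<u32, String>,
}

impl Connection for Pool {
    fn lookup(&self, id: u32) -> Option<String> {
        self.rows.get(&id).cloned()
    }
}

/// A "transaction" that sees its own pending rows first, then the pool's rows.
struct Transaction<'a> {
    pool: &'a Pool,
    pending: HashMap<u32, String>,
}

impl Connection for Transaction<'_> {
    fn lookup(&self, id: u32) -> Option<String> {
        self.pending.get(&id).cloned().or_else(|| self.pool.lookup(id))
    }
}

/// The builder borrows any connection type instead of a concrete client.
struct GetQueryBuilder<'a, C> {
    connection: &'a C,
}

impl<'a, C> GetQueryBuilder<'a, C>
where
    C: Connection,
{
    fn new(connection: &'a C) -> Self {
        Self { connection }
    }

    fn get_by_id(&self, id: u32) -> Option<String> {
        self.connection.lookup(id)
    }
}

fn main() {
    let pool = Pool { rows: HashMap::from([(1, "a.bam".to_string())]) };
    let tx = Transaction { pool: &pool, pending: HashMap::from([(2, "b.bam".to_string())]) };
    println!("{:?}", GetQueryBuilder::new(&pool).get_by_id(1));
    println!("{:?}", GetQueryBuilder::new(&tx).get_by_id(2));
}
```

This is why the test now passes `client.connection_ref()` rather than `&client`: the builder depends only on the connection trait, not on filemanager's `Client` wrapper.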
@@ -145,7 +145,7 @@ where
/// (<uuid>, <current_attributes>),
/// ...
/// ) AS values)
-/// update <object|s3_object> set attributes = (
+/// update <s3_object> set attributes = (
/// select attributes from update_with where object_id = id
/// ) where object_id in (select id from update_with)
/// returning <updated_objects>
@@ -82,7 +82,7 @@ impl From<Error> for ErrorStatusCode {
Error::InvalidQuery(_) | Error::ParseError(_) | Error::MissingHostHeader => {
Self::BadRequest(err.to_string().into())
}
-Error::QueryError(_) | Error::SerdeError(_) => {
+Error::QueryError(_) | Error::SerdeError(_) | Error::PresignedUrlError(_) => {
Self::InternalServerError(err.to_string().into())
}
Error::ExpectedSomeValue(_) => Self::NotFound(err.to_string().into()),
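The new `PresignedUrlError` variant carries the underlying failure as a string and maps to a 500 response alongside `QueryError` and `SerdeError`, presumably because a signing failure is a server-side problem rather than a bad request. The std-only sketch below mirrors that mapping for three variants. It replaces `thiserror`'s `#[error(...)]` with a hand-written `Display` and returns bare status numbers instead of `ErrorStatusCode`. Only the `PresignedUrlError` message format comes from the diff; the other two messages are placeholders.

```rust
use std::fmt;

/// Hypothetical subset of filemanager's `Error` enum.
#[derive(Debug)]
enum Error {
    ParseError(String),
    ExpectedSomeValue(String),
    PresignedUrlError(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Placeholder messages for these two variants.
            Error::ParseError(msg) => write!(f, "parse error: {msg}"),
            Error::ExpectedSomeValue(msg) => write!(f, "expected some value: {msg}"),
            // Same format string as the diff's `#[error(...)]` attribute.
            Error::PresignedUrlError(msg) => write!(f, "creating presigned url: `{msg}`"),
        }
    }
}

impl std::error::Error for Error {}

/// Map an error to an HTTP status code and a response message.
fn to_status(err: &Error) -> (u16, String) {
    let status = match err {
        Error::ParseError(_) => 400,
        Error::ExpectedSomeValue(_) => 404,
        // Presigning failures are reported as internal server errors.
        Error::PresignedUrlError(_) => 500,
    };
    (status, err.to_string())
}

fn main() {
    let err = Error::PresignedUrlError("expiry exceeds one week".to_string());
    println!("{:?}", to_status(&err));
}
```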
@@ -7,12 +7,12 @@ use crate::database::entities::sea_orm_active_enums::{EventType, StorageClass};
use crate::routes::filter::wildcard::{Wildcard, WildcardEither};
use sea_orm::prelude::{DateTimeWithTimeZone, Json};
use serde::{Deserialize, Serialize};
-use utoipa::{IntoParams, ToSchema};
+use utoipa::IntoParams;

/// The available fields to filter `s3_object` queries by. Each query parameter represents
/// an `and` clause in the SQL statement. Nested query string style syntax is supported on
/// JSON attributes. Wildcards are supported on some of the fields.
-#[derive(Serialize, Deserialize, Debug, Default, IntoParams, ToSchema)]
+#[derive(Serialize, Deserialize, Debug, Default, IntoParams)]
#[serde(default, rename_all = "camelCase")]
#[into_params(parameter_in = Query)]
pub struct S3ObjectsFilter {