Extract user photos from submissions and make available via S3 #1701

Open
spwoodcock opened this issue Jul 24, 2024 · 5 comments
Labels
backend Related to backend code effort:low Likely a few hours priority:high Should be addressed as a priority

Comments

@spwoodcock
Member

spwoodcock commented Jul 24, 2024

Is your feature request related to a problem? Please describe.

  • Currently we store the submissions zip on S3.
  • It's located under fmtm-data/{org_id}/{project_id}/submission.zip.
  • This is a 'cached' zip that can be updated if there are new submissions in ODK.
  • The images for a submission will also be stored here.
  • We want to use these images during the validation or conflation steps, but they are buried in the zip.

Describe the solution you'd like

  • When we re-generate submission.zip, we should extract out the images.
  • They should also be stored in S3 under: fmtm-data/{org_id}/{project_id}/images/{entity_id}.jpeg
  • If there are multiple images for a submission we also need to handle this, perhaps with numbering:
    • fmtm-data/{org_id}/{project_id}/images/1-{entity_id}.jpeg
    • fmtm-data/{org_id}/{project_id}/images/2-{entity_id}.jpeg
    • fmtm-data/{org_id}/{project_id}/images/3-{entity_id}.jpeg
  • This way we can load the image easily, directly in the frontend.
  • For example in a popup or modal to display the image during validation or conflation.
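
A minimal sketch of the proposed key layout, assuming a single image keeps the plain `{entity_id}.jpeg` name and multiple images get a numeric prefix starting at 1. The helper name `image_keys` and its parameters are hypothetical and not part of the existing codebase:

```python
def image_keys(org_id: int, project_id: int, entity_id: str, count: int) -> list[str]:
    """Build the S3 paths for the photos attached to one entity.

    Hypothetical helper: follows the layout proposed in this issue.
    """
    prefix = f"fmtm-data/{org_id}/{project_id}/images"
    if count <= 0:
        return []
    if count == 1:
        # Single photo: no numeric prefix.
        return [f"{prefix}/{entity_id}.jpeg"]
    # Multiple photos: 1-{entity_id}.jpeg, 2-{entity_id}.jpeg, ...
    return [f"{prefix}/{n}-{entity_id}.jpeg" for n in range(1, count + 1)]
```

Because the paths are deterministic, the frontend could compute the image URL from the entity ID alone, without an extra lookup.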

Additional considerations

  • When we get the submissions for a project we should use attachments=false for a fast response.
  • If we identify any submissions where photos were taken, but they are not yet present in the S3 bucket, we should call a function to handle this.
  • The function should run in a background task and get the submission photo via https://docs.getodk.org/central-api-submission-management/#downloading-an-attachment
  • The downloaded photo should then be uploaded to S3 and a database record created.
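
As a rough sketch of the download step, the attachment endpoint linked above has the form `/v1/projects/{projectId}/forms/{xmlFormId}/submissions/{instanceId}/attachments/{filename}`. The function names, the bearer-token authentication, and the 30 second timeout below are assumptions for illustration only:

```python
import urllib.request
from urllib.parse import quote


def attachment_url(
    base_url: str, project_id: int, xml_form_id: str, instance_id: str, filename: str
) -> str:
    """Build the ODK Central URL for a single submission attachment."""
    return (
        f"{base_url.rstrip('/')}/v1/projects/{project_id}"
        f"/forms/{quote(xml_form_id, safe='')}"
        f"/submissions/{quote(instance_id, safe='')}"
        f"/attachments/{quote(filename, safe='')}"
    )


def download_attachment(url: str, token: str) -> bytes:
    """Fetch the attachment bytes (assumes a valid ODK Central session token)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

The returned bytes could then be handed straight to the S3 upload without touching disk.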
@Sujanadh
Collaborator

Sujanadh commented Aug 1, 2024

I don't think we save submissions in S3 anymore. We used to store them, but after introducing entities, we are not saving them.

@spwoodcock
Member Author

Oh of course 🤦‍♂️

We can think of something!

@spwoodcock
Member Author

When we get the submission zip during the endpoint call, could we run a BackgroundTask at the end to extract the images and place them on S3, if they don't exist?

@spwoodcock
Member Author

The MinIO client has stat_object, which checks whether an object exists via a HEAD request.

https://min.io/docs/minio/linux/developers/python/API.html#stat_object

We could add that to s3.py as a function to check whether an object exists, raising an exception if it doesn't. We would then catch the exception and do a put with the existing upload object from memory.

So for each available image, we would check whether it exists and upload it if not.
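
Something like the following could work as a starting point. It is only a sketch: `object_exists` and `upload_if_missing` are hypothetical names, `client` is assumed to be a `minio.Minio` instance, and the missing-object check relies on the error raised by `stat_object` carrying the S3 code `NoSuchKey`. With `minio` imported, catching `minio.error.S3Error` explicitly would be tidier than the generic `except` used here to keep the example dependency-free.

```python
import io


def object_exists(client, bucket: str, key: str) -> bool:
    """Return True if the object exists, using a HEAD request via stat_object."""
    try:
        client.stat_object(bucket, key)
        return True
    except Exception as exc:
        # Assumption: a missing object surfaces as an error with code "NoSuchKey".
        if getattr(exc, "code", None) == "NoSuchKey":
            return False
        raise  # Anything else (auth, network) should not be swallowed.


def upload_if_missing(
    client, bucket: str, key: str, data: bytes, content_type: str = "image/jpeg"
) -> bool:
    """Upload the in-memory bytes only when the object is absent.

    Returns True if an upload happened, False if the object already existed.
    """
    if object_exists(client, bucket, key):
        return False
    client.put_object(
        bucket, key, io.BytesIO(data), length=len(data), content_type=content_type
    )
    return True
```

Note this costs one HEAD request per image on every call, which is the scaling concern with this approach.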

@spwoodcock
Member Author

spwoodcock commented Aug 1, 2024

This solution isn't scalable though. What we should probably do is add a new db table called submission_images to store a record of the upload and link to S3.

Ideally we should get the submissions first with attachments=false for a fast response.

Then we can quickly check the submission JSON for any new submissions that include a reference to an attachment image we don't have in our DB. If a new image is present, in a background task we get the submissions again with attachments=true to process the images, upload to S3, and insert a db record.
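
To illustrate the check, here is a rough sketch using an in-memory style SQLite schema purely so it runs standalone. The column names, the `(instance_id, filename)` key, and the shape of `referenced` (submission instance ID mapped to the attachment filenames found in its JSON) are all assumptions, and the real table would live in the project's own database:

```python
import sqlite3

# Hypothetical schema for the proposed submission_images table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_images (
    instance_id TEXT NOT NULL,
    filename    TEXT NOT NULL,
    s3_path     TEXT NOT NULL,
    PRIMARY KEY (instance_id, filename)
)
"""


def new_attachments(
    conn: sqlite3.Connection, referenced: dict[str, list[str]]
) -> list[tuple[str, str]]:
    """Return (instance_id, filename) pairs that have no DB record yet."""
    known = set(conn.execute("SELECT instance_id, filename FROM submission_images"))
    return [
        (instance_id, filename)
        for instance_id, filenames in referenced.items()
        for filename in filenames
        if (instance_id, filename) not in known
    ]


def record_upload(
    conn: sqlite3.Connection, instance_id: str, filename: str, s3_path: str
) -> None:
    """Insert a record once the image has been uploaded to S3."""
    with conn:  # Commits on success, rolls back on error.
        conn.execute(
            "INSERT OR IGNORE INTO submission_images VALUES (?, ?, ?)",
            (instance_id, filename, s3_path),
        )
```

Only the pairs returned by `new_attachments` would need to be passed to the background task, so a request with no new photos does no S3 or ODK work at all.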

Edit: even more efficient would be if we could download only the new attachments / images for a particular submission from ODK Central API! I think we could use this endpoint https://docs.getodk.org/central-api-submission-management/#downloading-an-attachment

Do you think this is possible?


4 participants