Determine file storage strategy in API #2

Open
rekibnikufesin opened this issue Jun 25, 2019 · 5 comments
@rekibnikufesin
Contributor

rekibnikufesin commented Jun 25, 2019

We need to determine how uploaded files are stored by the API.
Some initial thoughts, input welcome:

  • API receives the file and submits its hash to the protocol
  • pulls any metadata/MIME type from the file
    • creates a thumbnail image --> sends it to a public S3 bucket for discoverability
    • in DynamoDB: store MIME type, S3 URL, hash, thumbnail path, file size (bytes), and metadata provided by the supply user
  • should we store the file itself in S3, or on a file system local to the API?
    • S3 Pros:
      • S3 is cheap and can easily be used as a CDN source
      • delivery payload can be a signed URL with expiration
    • S3 Cons:
      • S3 is slow when used like a file system (every read and write is a network request)
    • File system pros:
      • fast 🐇
    • File system cons:
      • syncing with CDN can be a pain
      • need multiple file systems or shared file system for supporting highly available API instances
      • this rules out ECS Fargate; we'd build on either EC2 or EC2-based ECS
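
For discussion, here is a minimal sketch of the hash-and-metadata step outlined above. Everything named here is an assumption rather than an agreed design: `FileRecord`, `build_record`, the field names, and the choice of SHA-256 (the protocol may require a different hash function). Thumbnail generation and the actual S3/DynamoDB writes are left out.

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class FileRecord:
    """Hypothetical shape of the DynamoDB item listed in the issue."""
    file_hash: str
    mime_type: str
    s3_url: str
    thumbnail_path: Optional[str]
    size_bytes: int
    user_metadata: dict


def build_record(stream: BinaryIO, filename: str, s3_url: str,
                 user_metadata: dict, thumbnail_path: Optional[str] = None,
                 chunk_size: int = 1024 * 1024) -> FileRecord:
    # SHA-256 is an assumption; swap in whatever hash the protocol expects.
    digest = hashlib.sha256()
    size = 0
    # Read in chunks so a large upload is never held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    # Guess the MIME type from the filename; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(filename)
    return FileRecord(
        file_hash=digest.hexdigest(),
        mime_type=mime_type or "application/octet-stream",
        s3_url=s3_url,
        thumbnail_path=thumbnail_path,
        size_bytes=size,
        user_metadata=user_metadata,
    )
```

Because the hash is updated chunk by chunk, the same loop could run while the upload is still arriving, so the hash would be ready as soon as the last byte lands, whichever storage backend is chosen.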
@rekibnikufesin rekibnikufesin added enhancement New feature or request help wanted Extra attention is needed labels Jun 25, 2019
@rekibnikufesin rekibnikufesin added this to the FFA:Zero milestone Jun 25, 2019
@ReidWilliams

Hey @rekibnikufesin do you have a sense of when S3 being slower might have an impact on a user? Which step would take longer with S3 compared to a local filesystem?

Does it impact the upload speed itself? (I'd guess that's dominated by the user's internet speed.)
Is it the lag between API receipt of the full file and a hash being available to send to protocol?
Something else?

Something else that's probably relevant to this decision: we should think carefully about how we'd use a CDN, S3, or anything else that puts a file behind a permanent URL. If we do that, one user can buy access, see the file's permanent URL, then share that URL on the internet and give everyone else free access to the file.

I think the final delivery URL, the one that leads to the raw file download via browser, would need to be one-time use or expire after a short amount of time (minutes or hours).
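
To make the expiring-URL idea concrete, here is a rough, standard-library-only sketch of how a time-limited signed link works in general. It is not S3's actual signing scheme; with S3 the equivalent would be a presigned URL (for example via boto3's `generate_presigned_url` with an `ExpiresIn` value). `SECRET_KEY`, `sign_url`, `verify_url`, and the query parameter names are all made up for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-me"  # hypothetical server-side secret, never sent to clients


def _signature(path: str, expires: int) -> str:
    # Sign the path together with the expiry so neither can be altered.
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    # `now` is injectable only to make the sketch easy to test.
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

A link signed this way can still be shared while it is valid, so the expiry window only limits the damage; true one-time use would also need server-side state recording which links have already been redeemed.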

@ReidWilliams

I read your comment more carefully, and expiring S3 URLs do seem like a good thing to have.

@rekibnikufesin
Contributor Author

The lag would be between API receipt of the full file and a hash being available to send to protocol. Given that we're looking at ~15 seconds on Mainnet for a transaction anyway, I'm starting to think this is less of an issue than I originally thought.

@rekibnikufesin
Contributor Author

Re: expiring URLs - we can set the expiration to as little as a few minutes. I'm thinking of something like

@ReidWilliams

Re: lag, good point. The voting itself also creates a delay before a listing is available for use, so yeah, filesystem lag doesn't seem to be a big issue to me either.

@rekibnikufesin rekibnikufesin self-assigned this Jul 2, 2019