Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLIP: TUS reliable large uploads with simpler file upload UI in volto and better feedback on slow connections #5423

Open
14 tasks
djay opened this issue Nov 22, 2023 · 12 comments · May be fixed by #6351
Open
14 tasks

Comments

@djay
Copy link
Member

djay commented Nov 22, 2023

PLIP (Plone Improvement Proposal)

Abstract

This PLIP has three aims (and could be separated if desired)

  • Implement TUS in volto for "contents > upload" with a UI similar to quanta design (inc progress bars)
    • pick a js tus client to use in core
    • brings parity with classic
    • a UI that gives feedback on progress when files are large (even without TUS)
    • bring UI closer to Quanta roadmap
  • Implement TUS uploads when editing files, ie TUS support for any dexterrity content type with file fields and for images/videos fields in blocks
    • otherwise its hard to explain that large uploads can only be done via content > uploads
  • Simplify new file object creation UI by replacing "Add > File" with "Add > Upload files" (as per quanta design)
    • because File objects are not an obvious way to add files
  • Add a mode for TUS via restapi that doesn't require a shared temp folder so we can have single implementation and UI that requires less maintenance. (@davisagli suggestion)

Context

  • Classic had only support for TUS in the contents file upload. This seems to be currently broken?

Motivation

  • As an installer I can enable optional TUS support to enable large uploads and choose chunk size and temp folder to suit deployment but UX for the editors mostly the same
  • As an editor it should be obvious and quick to bulk upload files
  • As an editor, if the connection is slow or intermittent I should have clear indications upload is working, and options to resume if upload was not successful and this should apply to all places in the UI I can upload a file including editing a file object or in blocks.
  • As Ops, I can enable POST size limits and still allow larger uploads.

Assumptions

  • TUS requires more complex installation so will be turned off unless installer knows what they are doing
  • Folder contents > upload is a currently “hidden feature”
  • Add > File is too many clicks for most people and it’s not obvious the title is optional

Proposal & Implementation

TUS for contents upload

  • enabled by env variables
    • frontend TUS support will be enabled with TUS_CHUNK_SIZE and TUS_TMP_FILE_DIR environment variable
      • otherwise file will be sent as a whole and TUS "create with upload" api call since there is no guarantee the tmp dir will be the same between requests.
  • propose to use tus-js-client and semantic UI.
  • code will be borrowed for two existing attempts. Both branches replace the only contents upload page

TUS for file fields/blocks

  • no current branch but rough plan
  • would likely do upload when save button clicked
    • preserve transactional integrity and less likely to leave partial files around
    • OR - upload as soon as file is picked but don’t do the final step to move into a blob until save
      • need to check if this is possible. - pass in the uid of the upload and have the backend do the final step.
      • cancel will need to remove temp files
      • pro:
        • big upload can finish while user fills out rest of form.
      • con:
        • more likely to be abandoned temp files so more clean up?
        • validation won't happen until save so large files files or "bad" files could be uploaded to tmp.

Replace Add > File

  • Replace "Add > File” with “Add > Upload files” and the get bulk upload page
  • Any special file based content type e.g. Video, would still appear in Add menu
    • but could have control panel option to hide them?
  • Upload files would create content types based on file extensions and contenttype, so won’t always be “File” content type.
  • image

UI for "upload files"

  • propose to make similar to quanta (but with semantic)

    • image
    • page not a modal
    • page has no theme footer header etc
    • Put uploaded elements above dropzone. makes more logical sense that adding adds to the end of the list
    • Put title of: Upload to "test" to give more context so users more sure of where they are uploading
    • Include image preview
  • Additional things not specified in quanta design

    • Visually indicate on the dropzone when hovered over
    • "Back" instead of "Save" and "Cancel"
      • quanta design has save and cancel buttons which imply uploads happens only when you click save
      • instead we propose upload as soon as you select file and replace save and cancel with "back" button
      • Once single file uploaded there is a "delete" button to delete that single file (next to last modified date)
    • progress bar and overall status
      • during upload progress bar and cancel button replaces last modified date in quanta design. once complete "last modified date" and delete button.
      • current pretagov UI
        • image
      • designed to show upload is not hung even if its slow
        • progress bar is too small to show movement for slow connection, so bytes below helps as well as estimate that keeps changing.
    • retry button
      • if upload fails, retry button appears so upload is resumed/restarted without having to reselect file
      • replaces last modified date
    • return to "uploaded" page.
      • Nice to have: user can click on uploaded file, edit, then use back button to get to upload page again and still be shown the same list of uploaded files.
      • allows bulk editing/renaming after bulk upload
    • Any validation error for a file (from DX) or other error will display next to the row with an indication this file hasn't been uploaded.
    • Preview - done client side or after file created?

TUS without temp folder

  • use ZODB blobs to store partial uploads and then join them at the end
    • This will have the tradeoff of using double the space until packing occurs but setting up a temp file will avoid this
    • we could also set the default chunk size to be very large so as to avoid additional storage being needed in most cases?

Deliverables

  • create PR "upload view always using TUS and showing large file progress"
    • Take pretagov branch and replace uploady with tus-js-client
    • Change restapi to only accept "creation with upload" api call when no tus temp dir is defined and change frontend to check which way it should upload. means TUS client is always used even for simple plone deployments
    • remove old upload code not using TUS
    • Add env var to set max CHUNK size
    • tests?
    • Document deployment including TUS temp folder to enable resuming and chunked uploads for slow connections.

Additional work

  • rearrange upload page layout to be closer to quanta layout. move out of overlay.
  • create PR for replacing "add file" with "upload files"
  • submit PR for "large/TUS file support in all volto"
    • develop replacement volto file component that works with TUS and won't finish transaction until save
    • replace all file uploads in the volto UI
    • replace inline uploads like video and image blocks

Risks

OK/Cancel for "upload files"? ie finish upload/transactions when click save, not after picking.

  • preview would have to be done client side
  • Pro:
    • lets you change your mind about upload before a db transaction happens.
    • could allow you to edit title or other metadata before you upload?
  • con:
    • UI would work differently when TUS is turned off. files either have to be uploaded on save button or objects created directly after upload without TUS since its not a two step process.
    • user might not realise they need to click save otherwise upload is lost
    • last step can take a long time depending on how the serverside is arranged (if tmp and blobs on seperate disks or blobs stored in the DB). Would there need to be an additional progress for this?

Size of bundle with extra lib

  • it's pretty small and its easier to maintain if all the code uses the same method client code to do uploads.

File Widget

  • upload on save vs upload on pick. which works best?

Participants

Context

Related optimisation on the backend to speed up the last part of a large upload

@davisagli
Copy link
Sponsor Member

@djay The linked images give 404s for me. Is it possible to share them as part of this issue?

At first glance, I like this direction. Thanks for all your notes.

I'd like to see the TUS backend in plone.restapi gain support for storing the temp files in the ZODB instead of raw files. That way it can also work when the ZODB is not using a shared filesystem blobstorage.

@djay
Copy link
Member Author

djay commented Jan 8, 2024

@davisagli reuploaded the images.

I'd like to see the TUS backend in plone.restapi gain support for storing the temp files in the ZODB instead of raw files. That way it can also work when the ZODB is not using a shared filesystem blobstorage.

It's a good idea but I think it can be handled by a seperate PR or PLIP? It would double the storage needed until the DB is packed which is why it wasn't done this way originally but I think thats fine for a default unoptimised system.

@davisagli
Copy link
Sponsor Member

@djay I would say optimized for horizontal scalability rather than optimized for performance and minimum disk use, but fair point that there's a tradeoff there. And sure, it can be done separately from this PLIP. But it's related in that if the frontend can count on the backend to always support the TUS protocol, then it might simplify the frontend by not needing multiple file upload widgets.

@djay
Copy link
Member Author

djay commented Jan 8, 2024

@davisagli good point. I'll put it in there

@plone plone locked and limited conversation to collaborators Feb 14, 2024
@djay
Copy link
Member Author

djay commented May 1, 2024

@davisagli I'd have to check if the protocol supports it but if TUS allows a single chunk in a single api request then we could possibly not have to handle combining them on the backend for the default setup. So then we use the same UI and same client code and still sending via TUS endpoint but the space and transactions used on the backend are the same as without TUS.
EDIT: yes it seems to support this - https://tus.io/protocols/resumable-upload#creation-with-upload

@runyaga
Copy link
Member

runyaga commented May 16, 2024

Our usage:

  • We are using relstorage without shared blobs on the filesystem
  • Our particular concern is not larger(ish) files but auto-retry on upload.
  • We are on a particular poor network link in our case; we do not want the user to click retry unless there is some threshold of failed resumed attempts.
  • We are not currently using Volto on this project. If that is requirement we could likely switch.

Concerns:

  • Do we want to the TUS implementation to be in Plone proper? I see the benefit, but would/could a first step be integrating the TUS reference implementation? This is a different set of trade offs but the upside is we dont have to maintain the server portion (at least initially).
  • This PLIP talks about uploading files but I believe users are expecting to be able to upload directories and/or even nested folders/files.
  • Any concurrency of multiple uploads could potentially reek havoc with transactions
  • Big wins are parallelize chunks of files. If we have to send one file at a time its going to be nightmarishly slow.

Other:

  • Reaping unused resources isnt a big deal. I mean this is going to be a pretty sophisticated usage.
  • Prior conversations with @davisagli was that the Folder Contents changes to integrate something like this is fairly invasive. I would suggest this be worked out first. Drag n Drop into Folder Contents is the primary mode people expect to use the system. People don't expect to "Add File" or even "Click to perform Bulk Upload".

@djay
Copy link
Member Author

djay commented May 17, 2024

@runyaga TUS was in classic in plone 4. It seems to have got lost along the way - plone/plone.app.content#64 but it would not be hard to bring back.

That implimentation required a temp folder that all instances had access to, ie a shared file system.

Do we want to the TUS implementation to be in Plone proper? I see the benefit, but would/could a first step be integrating the TUS reference implementation? This is a different set of trade offs but the upside is we dont have to maintain the server portion (at least initially).

There is no real trade off if TUS can be set single api call in cases where a shared file system is not available. Then it will not be any different from a normal upload other than being able to be retried and having progress in the UI.
I agree with @davisagli that its better to have one implemenation that gets maintained rather than two. As you can see from history, niche code that is only enabled by a few gets forgotten and left out.

This PLIP talks about uploading files but I believe users are expecting to be able to upload directories and/or even nested folders/files.

yes it should possibly deal with that. I think from memory you need to use additional browser apis to get information about folders but I might be wrong. Certainly our implimentation didn't deal with recreating folder structures.
Another issue that I think a folder has no mime type so I'm not sure how if there is a way in plone to say what the default content type is for a folder in the case of folderless plone like volto is.

Any concurrency of multiple uploads could potentially reek havoc with transactions

The old and new TUS implimentations upload chunks to a temp folder in a shared filesystem and don't do any transactions until it is finally combined. @davisagli suggested that if no shared filesystem was available then maybe it the chunks could be stored in the ZODB which would then have problems with lots of transactions. I'd argue its safer in such a case to go back to not using chunks and use a single api call upload. You have to reupload the whole thing again on disconnection but I think the transactions problem is worse.
TUS can be configured to do parallel uploads but we didn't use it that way adn the old classic way didn't do this either. It wasn't slow.

Big wins are parallelize chunks of files. If we have to send one file at a time its going to be nightmarishly slow.

None of the implementations so far did this. If it was enabled then you'd have to ensure that the deployment config could set a maximum so you could ensure you don't overload your own site. But in theory you could enable this.

Prior conversations with @davisagli was that the Folder Contents changes to integrate something like this is fairly invasive. I would suggest this be worked out first. Drag n Drop into Folder Contents is the primary mode people expect to use the system.

The PLIP is to change the upload view and file add. It could be extended to so a drop on file contents activted the upload view.

People don't expect to "Add File" or even "Click to perform Bulk Upload".

I agree the current upload button is too well hidden. The current "Add file" is much more visible and I'd argue where begginers a led when they want to upload files and hense the reason to remove it and replace it with "Upload files" so that misguided user journey is fixed. But no harm in having DND in folder contents also.

We have some code for volto but it uses another design system. So if we get time we can try to put in a PR.
We have to first

  • redo it using the simpler tus client js
  • redo it using default volto theming
  • get it to work in the single api upload mode in cases where no TUS shared folder exists.
    What it does do
  • display nice progress for files uploading so you can tell its still working even if the connection is slow and the files are big

if anyone is motivated to work on that before we get to it we will happily share the branch we have.

@runyaga
Copy link
Member

runyaga commented May 23, 2024

Do we want to the TUS implementation to be in Plone proper? I see the benefit, but would/could a first step be integrating the TUS reference implementation? This is a different set of trade offs but the upside is we dont have to maintain the server portion (at least initially).

There is no real trade off if TUS can be set single api call in cases where a shared file system is not available. Then it will not be any different from a normal upload other than being able to be retried and having progress in the UI.
I agree with @davisagli that it's better to have one implementation that gets maintained rather than two. As you can see from history, niche code that is only enabled by a few gets forgotten and left out.

I do not see a direct answer to the proposition "Could a first step be integrating with the TUS reference impl?" I was thinking about requiring the reference TUS go container to be running alongside Plone. Then the integration really becomes "Use the TUS reference implementation" and we bundle a set of webhooks on the Plone side that know how to work with tusd which, to me, seems like an easier problem to solve.

Some huge positives:

  • tusd is not being maintained by us and is actively maintained
  • tusd documentation is not being maintained by us and is actively worked on
  • the integration package focuses on the UI/UX layer and the webhook plone<->tusd lifecycle documentation and implementation

My understanding:

  • tus-js-client is way forward as its least opinionated
  • You guys have a working "in-Plone" TUS implementation that is in Production today at Preta that supports: progress, cancel and resume.
  • "in-Plone" uploads participate in ZODB transactions and block at least 1 thread per concurrent file
  • There is not clarity on the UI/UX implementation; basically any place where a File widget can exist there is a desire to make it work with TUS toggled on/off (seems like a large ask?)

@djay mind sharing me to your TUSD branch?

@djay
Copy link
Member Author

djay commented May 23, 2024

@runyaga plone restapi supports TUS and has done for some time.
https://plonerestapi.readthedocs.io/en/querysources/tusupload.html

The problem is not on the server side. The problem is that both classic and volto don't yet have client side clients.

"in-Plone" uploads participate in ZODB transactions and block at least 1 thread per concurrent file

This is not true. The rest-api current implimentation doesn't do any transactions until the chunks are all uploaded. Instead each request is put into a shared temp folder and only the last TUS upload request results in one big transaction.

There is not clarity on the UI/UX implementation; basically any place where a File widget can exist there is a desire to make it work with TUS toggled on/off (seems like a large ask?)

The first step is to have it working with the contents upload. Working out how to have it working with other file widgets is a harder subsequent task.

The desire is to not have TUS turned on or off but always use it but perhaps change how the transactions work depending on how its deployed since having a shared temp folder is not always possible.

We will share our branch but it's mixed in with a bunch of other stuff right now so we just have to find time to seperate out the volto UI code that just deals with TUS and progress.

@djay
Copy link
Member Author

djay commented Jun 1, 2024

@davisagli @sneridagh if this was done with uploady library instead of js-tus-client do you think it would be accepted? We have a implementation using that which just needs some polishing and looking at what uploady provides, using js-tus-client might involve adding what uploady already has. Size wise it doesn't seem so much different. But all depends what you want in core.

@davisagli
Copy link
Sponsor Member

@djay It looks worth considering; I can see the benefit of using an existing integration into React rather than a pure-JS library.

@djay
Copy link
Member Author

djay commented Sep 20, 2024

@victorchrollo14 is having a go at implementing this

@djay djay changed the title PLIP: TUS reliable large uploads with simpler file upload UI in volto PLIP: TUS reliable large uploads with simpler file upload UI in volto and better feedback on slow connections Oct 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Backlog
Development

Successfully merging a pull request may close this issue.

4 participants