The RethinkDB data dump is quite large (>200 MB). This is problematic because it lengthens the duration of a data dump or restore (on the order of minutes now), despite there being fewer than 400 public Documents.

I did a bit of experimenting with a recent dump by removing various items and looking at the resulting dump size (.tar.gz), summarized in Table 1. It appears that pruning the `_ops` field (which stashes every action performed) and the `relatedPapers` field can drop the size by almost 88%. Given this, some possible solutions to keep the size reasonable:

- `_ops`: Periodically prune it manually on a local dump, then restore (see the first sketch below)
- `relatedPapers`: Store PMIDs (rather than full paper details) and let the browser retrieve the details on demand (see the second sketch below)
  - We do this with app-ui and it is pretty reasonable
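For the `_ops` pruning, something like the following could be run against a restored local copy before re-dumping. This is a minimal sketch assuming the official `rethinkdb` JS driver, and the database/table names (`factoid`, `document`, `element`) are guesses; the actual names in the Biofactoid schema may differ.

```ts
// Sketch: strip the accumulated _ops field from every row.
// Assumes the official `rethinkdb` JS driver; db/table names are assumptions.
import * as r from 'rethinkdb';

async function pruneOps(): Promise<void> {
  const conn = await r.connect({ host: 'localhost', port: 28015, db: 'factoid' });
  try {
    for (const table of ['document', 'element']) {
      // without('_ops') yields each row minus the _ops field;
      // replace() writes the trimmed row back in place.
      const result = await r.table(table)
        .replace((row: any) => row.without('_ops'))
        .run(conn);
      console.log(`${table}: replaced ${result.replaced} rows`);
    }
  } finally {
    await conn.close();
  }
}

pruneOps().catch(err => { console.error(err); process.exit(1); });
```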
Related: #937 (comment)
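For `relatedPapers`, the on-demand lookup could be a browser-side call to NCBI E-utilities `esummary`. A rough sketch follows; the endpoint and the `title` / `source` / `pubdate` keys are the standard esummary JSON fields, while the `PaperSummary` shape is only an assumption about what the UI would need.

```ts
// Sketch: resolve stored PMIDs to paper summaries on demand via NCBI esummary.
interface PaperSummary {
  pmid: string;
  title: string;
  source: string;  // journal abbreviation
  pubdate: string;
}

async function fetchPaperSummaries(pmids: string[]): Promise<PaperSummary[]> {
  const url = new URL('https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi');
  url.searchParams.set('db', 'pubmed');
  url.searchParams.set('id', pmids.join(','));
  url.searchParams.set('retmode', 'json');

  const res = await fetch(url.toString());
  if (!res.ok) throw new Error(`esummary request failed: ${res.status}`);
  const body = await res.json();

  // esummary returns a `result` object keyed by PMID.
  return pmids.map(pmid => {
    const doc = body.result[pmid];
    return { pmid, title: doc.title, source: doc.source, pubdate: doc.pubdate };
  });
}

// Usage (example PMIDs):
// fetchPaperSummaries(['31511694', '28102262']).then(console.log);
```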
Table 1. Biofactoid RethinkDB dump file sizes
*Dump archive: factoid_dump_2024-06-19_14-28-33-767.tar.gz
**Actions applied to the Document and Element tables