Size of the RethinkDB database is expanding at a rapid rate #1276

jvwong · 2024-06-19T18:05:23Z

The RethinkDB data dump is quite large (>200 MB). This is a problem because it lengthens a data dump or restore, which now takes on the order of minutes. This is despite there being fewer than 400 public Documents.

I experimented with a recent dump by removing various items and comparing the resulting dump size (.tar.gz); the results are summarized in Table 1. It appears that pruning the _ops field (which stashes every action performed) and the relatedPapers field can reduce the size by almost 88%. Given this, some possible solutions to keep the size reasonable:

- _ops: Periodically prune by hand on a local dump, then restore
- relatedPapers: Store PMIDs (rather than full paper details) and let the browser retrieve the details on demand
  - We do this in app-ui and it works reasonably well
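To make the two proposals concrete, here is a minimal sketch of what slimming a single record could look like. It is only an illustration: the record shape, the `pmid` key on each related paper, and the helper name `slim_document` are assumptions, not the actual Biofactoid schema or code.

```python
def slim_document(doc):
    """Return a copy of `doc` without `_ops`, keeping only PMIDs in `relatedPapers`.

    Assumptions (hypothetical, not confirmed by the real schema):
    - `doc` is a plain dict as it might appear in a JSON export
    - each entry of `relatedPapers` is a dict with a `pmid` key
    """
    # Drop the action history entirely
    slim = {key: value for key, value in doc.items() if key != "_ops"}

    # Replace full paper details with identifiers only
    papers = slim.get("relatedPapers")
    if papers is not None:
        slim["relatedPapers"] = [
            paper["pmid"]
            for paper in papers
            if isinstance(paper, dict) and paper.get("pmid")
        ]
    return slim


# Made-up record for illustration only
example = {
    "id": "doc-1",
    "_ops": [{"op": "update"}, {"op": "update"}],
    "relatedPapers": [
        {"pmid": "12345", "title": "A paper", "abstract": "..."},
        {"title": "Entry without an identifier"},
    ],
}
slimmed = slim_document(example)
```

Entries without a PMID are dropped in this sketch; whether that is acceptable would depend on how many related papers actually lack one.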

Related: #937 (comment)

Table 1. Biofactoid RethinkDB dump file sizes

| Description | Size (MB) | % Change | Comments |
| --- | --- | --- | --- |
| *Full DB (June 19, 2024) | 202 | 0.0% | Counts: 4649 Documents and 6813 Elements |
| **Remove _ops | 96 | -52.5% | |
| **Remove relatedPapers | 131 | -35.1% | |
| Remove trashed and initiated Documents | 173 | -14.4% | Removed 4226 Documents and 2192 Elements |

*Dump archive: factoid_dump_2024-06-19_14-28-33-767.tar.gz
**Actions applied to the Document and Element tables
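For reference, the percentages in Table 1 and the "almost 88%" figure can be reproduced with the arithmetic below. The combined number is an estimate that assumes the savings from _ops and relatedPapers are independent and simply add up; it was not measured on a dump with both removed.

```python
FULL_MB = 202  # full dump size from Table 1

# Dump sizes (MB) after each removal, taken from Table 1
sizes_mb = {
    "Remove _ops": 96,
    "Remove relatedPapers": 131,
    "Remove trashed and initiated Documents": 173,
}


def pct_change(size_mb, full_mb=FULL_MB):
    """Percent change relative to the full dump, rounded to one decimal place."""
    return round((size_mb - full_mb) / full_mb * 100, 1)


changes = {name: pct_change(size) for name, size in sizes_mb.items()}

# Assumption: the two savings do not overlap, so they can be summed
combined_saving_mb = (FULL_MB - sizes_mb["Remove _ops"]) + (
    FULL_MB - sizes_mb["Remove relatedPapers"]
)
combined_mb = FULL_MB - combined_saving_mb  # 25 MB remaining
combined_pct = pct_change(combined_mb)      # -87.6, i.e. almost 88%
```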

jvwong mentioned this issue Sep 23, 2024

Add search list Biofactoid note PathwayCommons/app-ui#1468

Closed
