Size of the RethinkDB database is expanding at a rapid rate #1276

Open
jvwong opened this issue Jun 19, 2024 · 0 comments


The RethinkDB data dump is quite large (>200 MB). This is a problem because a dump or restore now takes on the order of minutes, even though there are fewer than 400 public Documents.

I experimented with a recent dump by removing various items and comparing the resulting dump sizes (.tar.gz), summarized in Table 1. Pruning the _ops field (which stashes every action performed) and relatedPapers appears to reduce the size by almost 88% (52.5% + 35.1%, assuming the two savings simply add up). Given this, some possible solutions to keep the size reasonable:

  • _ops: Periodically prune by hand on a local dump, then restore the pruned dump
  • relatedPapers: Store PMIDs (rather than full paper details) and let the browser retrieve these on-demand
    • We do this with app-ui and it is pretty reasonable
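
Both ideas can be sketched on a single dumped record. This is only an illustration: the helper names `strip_ops` and `compact_related_papers` are made up here, and the shape of a `relatedPapers` entry (a dict with a `pmid` key) is an assumption, not something confirmed by the actual schema.

```python
def strip_ops(doc):
    """Return a copy of a record without its `_ops` action history."""
    return {key: value for key, value in doc.items() if key != "_ops"}


def compact_related_papers(doc, id_key="pmid"):
    """Return a copy where relatedPapers holds only PMIDs.

    Assumes each entry is a dict carrying its PMID under `id_key`
    (hypothetical field name); entries without it are dropped.
    """
    slim = dict(doc)
    papers = doc.get("relatedPapers")
    if papers is not None:
        slim["relatedPapers"] = [p[id_key] for p in papers if id_key in p]
    return slim


# Made-up record, standing in for one row of the Document or Element table.
record = {
    "id": "doc-1",
    "_ops": [{"op": "rename"}, {"op": "move"}],
    "relatedPapers": [
        {"pmid": "111", "title": "Paper A", "abstract": "..."},
        {"pmid": "222", "title": "Paper B", "abstract": "..."},
    ],
}

pruned = compact_related_papers(strip_ops(record))
print(pruned)
```

Neither helper mutates its input, so the original record stays available if the pruning needs to be checked before a restore.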

Related: #937 (comment)

Table 1. Biofactoid RethinkDB dump file sizes

| Description | Size (MB) | % Change | Comments |
|---|---|---|---|
| *Full DB (June 19, 2024) | 202 | 0.0% | Counts: 4649 Documents and 6813 Elements |
| **Remove _ops | 96 | -52.5% | |
| **Remove relatedPapers | 131 | -35.1% | |
| Remove trashed and initiated Documents | 173 | -14.4% | Removed 4226 Documents and 2192 Elements |

*Dump archive: factoid_dump_2024-06-19_14-28-33-767.tar.gz
**Actions applied to Document and Element table
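
For reference, the percentages in Table 1 and the "almost 88%" figure can be reproduced from the sizes alone. The sketch below assumes the _ops and relatedPapers savings are additive; the two removals were measured separately, so the combined size is an estimate rather than a measurement.

```python
FULL_MB = 202  # full dump size from Table 1

sizes_mb = {
    "Remove _ops": 96,
    "Remove relatedPapers": 131,
    "Remove trashed and initiated Documents": 173,
}


def pct_change(size_mb, baseline_mb=FULL_MB):
    """Percent change of a dump size relative to the full dump."""
    return round(100 * (size_mb - baseline_mb) / baseline_mb, 1)


changes = {name: pct_change(size) for name, size in sizes_mb.items()}

# Estimate for removing both fields, assuming the savings do not overlap.
saved_mb = (FULL_MB - sizes_mb["Remove _ops"]) + (FULL_MB - sizes_mb["Remove relatedPapers"])
estimated_mb = FULL_MB - saved_mb
combined_pct = round(100 * saved_mb / FULL_MB, 1)

print(changes)
print(estimated_mb, combined_pct)
```

Under that assumption the dump would shrink to roughly 25 MB, a reduction of about 87.6%.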
