Optimize database dumping and loading procedures #29

Open
logstar opened this issue Sep 15, 2021 · 0 comments

logstar commented Sep 15, 2021

Database dumping and loading procedures are interdependent.

In the current framework:

  • The database is dumped to a .sql.gz file. Dumping and compressing the 90 GB .sql file takes about 1 hour. The .sql.gz format was chosen so that dump files are completely reproducible, which is one of the key requirements of the PedOT project. The custom format, dumped with the -Fc option and suggested by Shiping Zhang, can speed up database loading, but custom-format dumps of the same database have different sha256 checksums from one dump to the next.
  • The database is loaded using psql, which is the only compatible tool for loading a .sql.gz file. Decompressing the .sql.gz file and loading the 90 GB .sql file takes about 1.5 hours. After loading, one index is created, which takes about two hours.
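
To illustrate the reproducibility requirement, here is a minimal Python sketch of how identical SQL text can be compressed to byte-identical gzip output and checked with sha256. It is not the project's actual dump pipeline, which the issue does not describe. It relies on the fact that the gzip header carries a timestamp and an optional file name, so pinning both is one way to keep checksums stable. The names `gzip_deterministic`, `sha256_hex`, and the sample SQL text are hypothetical.

```python
import gzip
import hashlib
import io


def gzip_deterministic(data: bytes, level: int = 6) -> bytes:
    """Compress data so that identical input always gives identical bytes."""
    buf = io.BytesIO()
    # mtime=0 pins the timestamp field in the gzip header; filename="" keeps
    # any file name out of the header. Both would otherwise vary between runs.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb",
                       compresslevel=level, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


def sha256_hex(data: bytes) -> str:
    """Return the sha256 checksum of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the plain SQL text of a dump (hypothetical content).
sql_text = b"CREATE TABLE demo (id integer);\nINSERT INTO demo VALUES (1);\n"

first = gzip_deterministic(sql_text)
second = gzip_deterministic(sql_text)

# Compare checksums of the two compressed outputs, then confirm round-trip.
print(sha256_hex(first) == sha256_hex(second))
print(gzip.decompress(first) == sql_text)
```

This only covers the compression step. Whether two dumps of the same database yield identical SQL text in the first place depends on the dump tool and its options, which this sketch does not address. For a 90 GB file, the checksum would need to be computed in chunks rather than on an in-memory byte string.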

cc @blackdenc
