
Speed up processing: DB insertion and multiprocessing #600

Open
nesnoj opened this issue Jan 22, 2025 · 3 comments
Labels
🚀 feature New feature or request

@nesnoj
Collaborator

nesnoj commented Jan 22, 2025

Successor issue to the recently closed #546.
I merged both topics into one issue since I think they interact - e.g. a specific DB insertion method might not work (efficiently) with multiprocessing. If you disagree, feel free to separate them.

  1. Explore faster methods of writing to the database for
  • sqlite
  • postgres
  2. Add multiprocessing for
  • XML parsing
  • Writing to the DB (if applicable given concurrency -> table locks)

Notes on DB insertion: #546 (comment)
Notes on parallelization: #546 (comment)

Feel free to amend :)
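On point 1 (faster writes), one common baseline for sqlite is batching many rows into `executemany()` calls inside a single transaction instead of committing row by row. A minimal sketch - table and column names are made up for illustration, not taken from the actual schema:

```python
import sqlite3

def bulk_insert(conn, rows, batch_size=10_000):
    """Insert rows in large batches inside a single transaction.

    Compared to one INSERT (and one implicit commit) per row,
    executemany() over batches avoids most per-statement overhead.
    """
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cur.executemany(
            "INSERT INTO units (id, name) VALUES (?, ?)",
            rows[start:start + batch_size],
        )
    conn.commit()  # one commit for the whole load

# Demo on an in-memory database (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, name TEXT)")
bulk_insert(conn, [(i, f"unit-{i}") for i in range(25_000)])
count = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]
print(count)  # 25000
```

Pragmas like `journal_mode=WAL` or `synchronous=NORMAL` could be benchmarked on top of this, but whether they help here would need measuring against the real data.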

@nesnoj nesnoj added the 🚀 feature New feature or request label Jan 22, 2025
@FlorianK13
Member

we reached #600 🥇

@AlexandraImbrisca
Contributor

AlexandraImbrisca commented Jan 22, 2025

Hi @nesnoj! Thanks for creating this issue. I'm testing my approach a bit more (different numbers of cores & different operating systems) and then I'll create the PR. I've been developing and testing on macOS & Linux and will continue with Windows. The approach is quite simple: it uses the standard concurrent.futures library with a few options to optimize access to the database.

About writing the data to the Postgres database: would you mind tackling this separately? I haven't had the chance to look into how to optimize writing yet, and the parallelization is database-agnostic right now.
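Without seeing the PR, here is one shape a concurrent.futures setup like the one described could take: workers parse XML chunks in parallel, while all writes stay with a single connection to sidestep sqlite table locks. Element and table names are invented for the demo; a ThreadPoolExecutor keeps the snippet portable, whereas the real CPU-bound parsing would likely use ProcessPoolExecutor under an `if __name__ == "__main__"` guard.

```python
import concurrent.futures
import sqlite3
import xml.etree.ElementTree as ET

def parse_chunk(xml_text):
    """Parse one XML chunk into row tuples (runs in a worker)."""
    root = ET.fromstring(xml_text)
    return [
        (int(e.findtext("Id")), e.findtext("Name"))
        for e in root.iter("Einheit")
    ]

# Illustrative stand-ins for the real export files.
chunks = [
    f"<Einheiten><Einheit><Id>{i}</Id><Name>unit-{i}</Name></Einheit></Einheiten>"
    for i in range(8)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, name TEXT)")

# Workers only parse; all DB writes happen in the main thread, so a
# single connection owns the (lock-prone) sqlite database.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for rows in pool.map(parse_chunk, chunks):
        conn.executemany("INSERT INTO units VALUES (?, ?)", rows)
conn.commit()

n_units = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]
print(n_units)  # 8
```

The single-writer pattern is what makes this database-agnostic: the executor never touches the connection, so the same structure works whether the sink is sqlite or Postgres.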

@nesnoj
Collaborator Author

nesnoj commented Jan 23, 2025

About writing the data to the Postgres database: would you mind tackling this separately? I haven't had the chance to look into how to optimize writing yet, and the parallelization is database-agnostic right now.

Sure, feel free to create a separate issue if that makes sense to you.
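For that separate Postgres issue, the usual fast path is `COPY ... FROM STDIN` rather than row-wise INSERTs. A minimal sketch of the serialization step only - the `copy_expert` call needs a live connection and is shown as a comment, and the table/column names are hypothetical:

```python
import csv
import io

def rows_to_copy_buffer(rows):
    """Serialize row tuples into an in-memory CSV buffer for COPY FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

buf = rows_to_copy_buffer([(1, "a"), (2, "b")])
first_line = buf.readline().strip()
print(first_line)  # 1,a

# With psycopg2 and a live connection, the buffer would be streamed as:
#   with conn.cursor() as cur:
#       cur.copy_expert("COPY units (id, name) FROM STDIN WITH CSV", buf)
#   conn.commit()
```

COPY pushes parsing into the server and avoids per-row round trips, which is why it typically beats even batched INSERTs for bulk loads; actual numbers would need benchmarking against the real data.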
