Use SQLite for parts database #99
Don't commit:
- cached parts database
- IntelliJ IDE files
The intent here is to use this with https://github.com/phiresky/sql.js-httpvfs. There are major missing pieces:
- The Python sqlite on my system doesn't have the fts5 extension available, so while sql.js-httpvfs theoretically supports full-text search, it is not available here.
- The sql.js-httpvfs VFS needs to be modified to download the index file, map between the index and the gzipped data, and use the Compression Streams API to decompress each chunk.
- No work was done on the frontend portion of jlcparts.
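To make the second missing piece more concrete, here is a minimal Python sketch of the idea: compress the database in independent gzip chunks, keep an index of where each compressed chunk starts, and decompress only the chunks that overlap a requested byte range. The function names, the 64 KiB chunk size, and the index layout are assumptions for illustration; they are not what this PR or sql.js-httpvfs actually implements, and a browser version would use HTTP range requests and the Compression Streams API instead.

```python
import gzip
import io


def compress_chunks(data: bytes, chunk_size: int = 64 * 1024):
    """Compress data as independent gzip members and record each member's position."""
    blob = io.BytesIO()
    index = []  # one (compressed_offset, compressed_length) entry per chunk
    for start in range(0, len(data), chunk_size):
        member = gzip.compress(data[start:start + chunk_size])
        index.append((blob.tell(), len(member)))
        blob.write(member)
    return blob.getvalue(), index


def read_range(blob: bytes, index, offset: int, length: int,
               chunk_size: int = 64 * 1024) -> bytes:
    """Return data[offset:offset + length], touching only the overlapping chunks."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(index) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        pos, size = index[i]
        # In a browser this slice would be one HTTP range request.
        out += gzip.decompress(blob[pos:pos + size])
    skip = offset - first * chunk_size
    return bytes(out[skip:skip + length])
```

The chunk size passed to `read_range` has to match the one used by `compress_chunks`, which is why a real index file would probably need to store it alongside the offsets.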
I've updated the above with some more data.
generated with this test script
data:
NOTE: this time is compression time, not decompression time. It seems to me that zstd with a 64KiB dictionary and 8K pages is likely the best tradeoff here. zstd has good decompression speed across a range of compression levels: https://github.com/facebook/zstd#benchmarks
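Assuming "8K pages" refers to the SQLite page size (that is my reading, not something stated explicitly above), a sketch of how an existing database file could be rebuilt with 8 KiB pages from Python follows. The helper name `set_page_size` is hypothetical. This relies on the documented behaviour that `PRAGMA page_size` followed by `VACUUM` changes the page size of a database that is not in WAL mode.

```python
import sqlite3


def set_page_size(db_path: str, page_size: int = 8192) -> int:
    """Rebuild the database at db_path with the given page size and return the result."""
    # isolation_level=None puts the connection in autocommit mode,
    # since VACUUM cannot run inside a transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
        conn.execute("VACUUM")  # rewrites the file using the new page size
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
```

Larger pages mean fewer, bigger reads per query, so the best value likely depends on how the file is chunked and fetched.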
Thank you for your input. I really appreciate it. I thought about using SQLite for a while, but I never found time (and motivation) to do it. Let me share my findings:
I find this part of the site particularly frustrating. I don't like having to constantly refresh all the data and have to think about that, I'd rather that just get handled seamlessly in the background.
Interesting. The full-text feature definitely needs to be investigated further. I don't know about SQLite, but PostgreSQL has things like covering indexes and index-only scans. I wonder if it'd be possible to get that in SQLite FTS. I'm not familiar with the existing code, but there are also plenty of knobs around tokenizing the query, stop words, throttling delay, limiting result counts, etc. If you have your old notes, I'd be interested in reading them.
Not opposed to this, but I don't see why it couldn't be both: fetch chunks only as they are needed, and cache them locally. In fact, I wonder if web browsers already handle caching internally these days.
I stuck with the existing IndexedDB schema design in my testing. But yes, there is plenty to look at in terms of structuring the data so it can be queried easily.
I've often felt this same temptation to combine two major changes into one 😃. I've found it ends up more motivating and more efficient for me to do changes in smaller chunks, even if at times it seems like I'm doing work that I will soon replace.
There are many config options for full-text search with SQLite. I don't remember if I wrote my research about this down somewhere, but you can reduce the FTS size by >90% by setting detail=none and making the table contentless (https://www.sqlite.org/fts5.html#the_detail_option). It does reduce the power of the queries you can do a lot, though. Also note that SQLite is very easy to compile, and there are also drop-in Python packages to get a newer and more complete SQLite into Python. There's one powerful statically hosted full-text search engine that scales to a terabyte of data that I know of, called summa, but for an index with a compressed size of 50MB it's probably not worth it. There are a few other JS libraries that let you create a keyword / full-text search index that you serialize to JSON and download in full (at a smaller size), which you could then use in combination with SQLite or something else to fetch the full data dynamically. One example (I think) is https://github.com/nextapps-de/flexsearch
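For reference, a minimal sketch of what a contentless FTS5 table with detail=none could look like from Python. The table name `parts_fts`, the `description` column, and the sample rows are made up for illustration. The helper returns `False` when the table cannot be created, which most likely means the local SQLite build lacks fts5, as described earlier in this thread.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, rows) -> bool:
    """Create a contentless, detail=none FTS5 index over (rowid, description) pairs."""
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE parts_fts "
            "USING fts5(description, content='', detail=none)"
        )
    except sqlite3.OperationalError:
        return False  # most likely "no such module: fts5"
    conn.executemany(
        "INSERT INTO parts_fts(rowid, description) VALUES (?, ?)", rows
    )
    return True


def search(conn: sqlite3.Connection, term: str):
    """Return matching rowids; a contentless table cannot return the original text."""
    cur = conn.execute(
        "SELECT rowid FROM parts_fts WHERE parts_fts MATCH ? ORDER BY rowid",
        (term,),
    )
    return [row[0] for row in cur]
```

With detail=none, phrase and NEAR queries are not supported, so this only suits simple keyword lookups; the rowids would then be used to fetch the full rows from the main table.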
Also just as a note if you have too much free time: If you download the whole DB then you can alternatively also create a minimal db without indexes and without FTS, download that, and create the indexes and FTS search locally. Trading bandwidth for local compute.
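A rough sketch of that idea, assuming a hypothetical `parts(lcsc, category, description)` table (the real jlcparts schema may differ): the downloaded file ships with no indexes, and the client creates them after the download finishes.

```python
import sqlite3


def add_indexes_locally(conn: sqlite3.Connection) -> list:
    """Create indexes on an index-free database and return the index names on parts."""
    # Hypothetical index; which columns to index depends on the actual queries.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_parts_category ON parts(category)"
    )
    conn.commit()
    cur = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'parts' ORDER BY name"
    )
    return [row[0] for row in cur]
```

An FTS table could be populated locally in the same way, at the cost of some CPU time on first load.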
Quick benchmarks:
note: not super helpful in the CPU & time area because I have not tested read performance.
See #37