-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparison to other approaches #94
Comments
hey @davidmezzetti thanks for the kind words! Happy to help review any Here are my thoughts on other tools: Vectorlite: I haven't spent much time playing with this, but it seems great! Do note that the hnswlib and other C++ dependencies make it harder to compile and run on some platforms (WASM, smaller devices, etc.), but that probably doesn't matter much in the Python world. They also store vector indexes outside the SQLite database (in separate files), which is good for performance but has other not-so-great side-effects (may not be transaction-safe, doesn't work with SQLite backup and replication tools, deviates from SQLite's "it's a single file" ethos, etc). Also their benchmarks for sqlite-vec can be improved a bit, I'll probably send a PR to that repo sometime in the near future. Specifically insert/query performance can be improved greatly with a few libSQL's vector support: Also great, models after pgvector, which uses custom indexes that are not available in vanilla SQLite. But libsql folks can change whatever they want in their fork. It does require using a the libsql driver instead of the default sqlite driver, but that's usually a SQLite: I don't think SQLite will ever add vector support, but I'd be happy to see it. they'll never add new custom column types besides the builtin text/blob/int/float types (so no libsql-like support), but they could possibly add an extension like their JSON/FTS5/R-Tree extensions for vector search. That being said, none of their code use SIMD operations, so I doubt it would be as fast as For ANN indexes: they 100% will be stored inside the SQLite database. That way backups work, litestream replication works, and transactions continue as-is. I could maybe see a future where indexes are stored in separate files for performance reasons (an opt-in feature), but that would be far down the road. Let me know if there's anything else I can answer, or email me directly if you wanna chat on a call |
I appreciate the quick response. I'm glad to hear that the ANN indexes will be stored inside the SQLite database. One other thing that came to mind regarding SELECT d.* FROM documents d
LEFT JOIN vectors v ON v.indexid = d.indexid
WHERE d.flag=1 AND v.embedding MATCH ?
LIMIT 100 |
I added sqlite-vec as an ANN to txtai here - neuml/txtai#780 A future task will integrate this even further into a single SQLite data file with content and vectors. |
Hi @asg017 - Thank you for putting this library together. Storing and searching vectors directly in SQLite is a really important use case.
I believe you've come across txtai. I'm looking to add SQLite vector indexing in the next release. I've added a component for sqlite-vec and it works quite well. This enables a unified vector search with a single SQL query (i.e. join the data with the vectors) vs the current approach of having to join external indexes like Faiss with data stored in a database.
sqlite-vec has received good traction, congratulations on that. I also see there are other options out there and I was curious on your thoughts.
I see you're working on having indexing in #25, which is great. Do you envision that the indexes will somehow be directly stored in the SQLite database or in an external file? Or is that still TBD?
I appreciate any insights you can provide.
The text was updated successfully, but these errors were encountered: