-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Duckdbvfs: Forward to DuckDB file system via vfs mechanism #66
base: main
Are you sure you want to change the base?
Conversation
Minor modifications on top of demovfs, available as part of sqlite3 source code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Some comments below. I wonder if it is possible to only use the DuckDB file system for e.g. https or s3 files (or in case of WASM) and fallback to the regular SQLite file system otherwise?
We could switch to the DuckDB FS entirely but that would at least require all the methods there to be implemented fully.
@@ -17,6 +17,14 @@ | |||
|
|||
namespace duckdb { | |||
|
|||
DatabaseInstance *_db; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to cause concurrency issues if we have multiple DuckDB's in the same process, no?
The solution that I discussed with @Maxxen before for spatial is to make this a thread-local variable that is set prior to calling into any SQLite functions that might trigger the VFS. Can that be done here as well?
*/ | ||
static int duckdbTruncate(sqlite3_file *pFile, sqlite_int64 size){ | ||
#if 0 | ||
if( ftruncate(((DemoFile *)pFile)->fd, size) ) return SQLITE_IOERR_TRUNCATE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably shouldn't be a no-op for our file system no?
int *pResOut | ||
){ | ||
//FIXME: This might need to be properly implemented | ||
*pResOut = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with removing this is that we break file locking of SQLite - which means the SQLite scanner is no longer safe to use with other SQLite instances
** file has been synced to disk before returning. | ||
*/ | ||
static int duckdbDelete(sqlite3_vfs *pVfs, const char *zPath, int dirSync){ | ||
//FIXME: This might need to be properly implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably also needs to be implemented
Thanks for the review, it should be possible to have multiple file systems at the same time, I took a shortcut there mostly to increase testing surface, will give it another proper look at it later (and ping Max on the lock). |
Update: I have a branch with most problem solved, last one is the concurrency issue, that's also the most serious. |
Sqlite supports virtual file systems, this is a simple implementation, on top of demovfs, of the duckdb file system.
This allows to read sqlite database, for example over http(s):// or s3://. Fixes #39 and it's a big step towards solving also duckdb/duckdb-wasm#1213.
Implementation is mostly copy-pasted from demovfs (a demo virtual file system provided with sqlite) to use DuckDB FileHandles and their methods.
Implementation is in src/sqlite/duckdbvfs.cpp, places I modified are the one with mention of
duckdb::FileHandle
orFIXME
.Performance
One concern is that currently performance for sqlite_attach over the wire is not amazing, like 20 seconds for
CALL sqlite_attach('https://raw.githubusercontent.com/duckdblabs/sqlite_scanner/main/data/db/sakila.db');
and I believe this is due to 2 factors:
Behaviour seems to be the same in native and over the wire, as in the same segments are read in the same order, so I am not sure if that's something that should be solved at at the virtual file system layer or in general.
Local performances seems unchanged, and scan_sqlite performances seems good also over the wire.