-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add basic query implementation #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accept this PR, except for the need to test queries with empty blobs returned. Otherwise, it's OK for merge
v = sqlite3_errcode(sqlite3_db_handle(s)) | ||
|
||
if not (v in [SQLITE_OK, SQLITE_ROW, SQLITE_DONE]): | ||
raise (ref Defect)(msg: $sqlite3_errstr(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This became repetitive pattern, may be replaced sometime in future
SQLITE_TRANSIENT_GCSAFE) | ||
|
||
if not (v == SQLITE_OK): | ||
raise (ref Defect)(msg: $sqlite3_errstr(v)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, we should discuss when raise Defect and when return failure, but at this moment let's accept the already implemented approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I wasn't sure of the best approach because we can't return
from an inline iterator. We could yield
a failure and break
the internal loop, or leave it to the user to break
in their loop over the iterator when an iteration-step yields a failure. Either way the cleanup in the defer
block will be triggered.
Since I wasn't sure, for now I chose to raise Defect on any error, either directly or via nim-result's expect
.
dataLen = sqlite3_column_bytes(s, queryStmtDataCol) | ||
dataBytes = cast[ptr UncheckedArray[byte]](blob) | ||
data = @(toOpenArray(dataBytes, 0, dataLen - 1)) | ||
fut = newFuture[QueryResponse]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it work with blob==nil ? your testing code for queries doesn't include empty blobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some tests for that. Should already work since the approach is roughly the same for method get
, which supports empty blobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've accounted for empty blobs in 32596eb.
979ceac
to
6ef19c4
Compare
I've accounted for empty blobs in 32596eb. I also replaced the original PR description with notes that I thought to be relevant now and in the future, even if at present we move forward with the PR as-is. Let me know what you think, and if we're good to go, I'll squash my |
your PR comments should be documents of its own. As far as you tested empty blobs, this PR is OK for me. as of chosen API, it should be debated before implementation. Now the API you choosen is already implemented, so let's Merge it and go to work on integration |
32596eb
to
f6d2c08
Compare
This PR adds a basic
query
implementation for baseDatastore
,NullDatastore
, and most importantlySQLiteDatastore
.Closes #8.
An implementation for
FileSystemDatastore
will be skipped at this time, though it can be revisited in the future.The immediate use case for this facility is codex-storage/nim-codex#138, it's also relevant to codex-storage/nim-codex#131.
For codex-storage/nim-codex#137, if we want to use
TieredDatastore
, then some decisions need to made re: the desired behavior, as explained below.General Notes
The first choice was how to implement the facility: should it be
method query
oriterator query
? The second choice was whether it should be sync or async.A
method query
approach would naturally involvereturn
ingseq[(key, data)]
(entire results for the query), whileiterator query
would involveyield
ing(key, data)
one at a time.I chose to proceed with
iterator query
and made it async:Note that this is presently "faux async" just like
method get
, etc.Types
Query
andQueryKey
The original vision of Python Datastore (embraced by various implementations in JavaScript and Go) is to have a
Query
class/object that provides a high-level almost declarative API for specifying what to query, how to filter, sort, and so on.The basic implementation in this PR does not attempt to fulfill that vision. Rather, a
Query
object is constructed only with aQueryKey
, and at the momentQueryKey
is simply an alias forKey
.In the case of
SQLiteDatastore
, the idea is that aQueryKey
will use theGLOB
syntax supported by SQLite to match multiple rows.Imagine that a Codex manifest with CID
abcd1234
is stored with keymanifest:abcd1234
. To retrieve all the blocks stored for that manifest, one could do something like:Limitations of SQLite GLOB
SQLite's GLOB syntax is not a form of regular expressions. The wildcard
*
is greedy such that the following won't work re: querying for all the manifests but not their blocks (per hypothetical example above):That would match all manifests and all their blocks.
If desirable, there are a number of ways to work around that at the application level (i.e. in a program using
SQLiteDatastore
), or we could consider enabling PCRE support in SQLite and usingREGEXP
instead ofGLOB
for theQueryKey
parameter.Async + Iterator
Consider the support built into the JavaScript language for asynchronous iteration:
Symbol.asyncIterator
for-await...of
async function*
Those language constructs work together so that async iteration is well-defined and distinct from sync iteration. For example,
for-await...of
needs to be used in the general case of async iteration because it's not generally possible to know in advance how many times an async iterable will yield a value. In other words, you can't usePromise.all()
, you need to step through each promised value, awaiting each in turn. Or you can use some sugar lib like IxJS to avoid manually writingfor-await...of
loops.We don't currently have libs or lang features for Nim that are dedicated to async iteration. So, it's important to understand that it will be up to the caller to
await
on each step in a loop over asynciterator query
, at least in the general case, i.e. instead of trying to collect all the responses into chronos'allFinished()
.Because
iterator query
forSQLiteDatastore
is "faux async", it technically could work to exhaust the iterator and run the collectedFuture
s throughallFinished()
. But when we move to a case likeTieredDatastore
(depending on how that works), or when we have a multi-threaded solution where the actual querying of the database takes place on a worker thread, then it's not feasible to do that, at least not without additional machinery.TieredDatastore
In the original Python Datastore, there is this doc comment for
class TieredDatastore
:And below that:
Indeed, it's not clear how a
query
implementation (in Nim, Python, or other lang) should operate internally if a query op were to span multiple stores.So
iterator query
for nim-datastore'sTieredDatastore
should probably have the last datastore directly handle queries.However, that does not mean that application-specific derivatives of
TieredDatastore
could not have specialized logic for e.g.method get
vs.iterator query
.Codex, for example, might have this arrangement re: nim-datastore layers in a derivative of
TieredDatastore
:While
method get
may attempt to resolve a key fromNetwork
if neitherLRU cache
norSQLite
has it,iterator query
could bypassLRU cache
and go directly toSQlite
while also not attempting to fall through toNetwork
.TODO in future PRs
We could prevent GLOB characters (
*
,?
,[
,]
) from being used in strings passed toKey.init()
while allowing them to be used inQueryKey.init()
.In #11 we look toward a refactor that will be generic with respect to parameters and return types. That will provide additional flexibility, allowing concrete implementations to diverge in terms of their choice of base
Key
and other base types, how e.g.Key
relates toQuery
andQueryKey
(and even data layout in the backend), and making it simpler for applications to specialize on types and behaviors forget
,query
, etc. for a specific backend on which they choose to build.