Start piker.storage subsys: cross-(ts)db middlewares #486
base: master
Conversation
We can convert this to a draft if necessary, if/when #483 lands.
Previously we were passing the `fqme: str`, which isn't as extensive, nor were we able to pass a `MktPair` directly to backend history manager-loading routines (which should be able to rely on always receiving it since currently `stream_quotes()` is always called first for setup). This also starts a slight bit of configuration-oriented tsdb info loading (via a new `conf.toml`) such that a user can decide to host their (marketstore) db on a remote host and our container spawning and client code will do the right startup automatically based on the config.

-> Related to this I've added some comments about doing storage backend module loading which should actually get written out as part of patches coming in #486 (or something related).

Don't allow overruns again in the history context since it seems it was never a problem?
I'm in favor of doing our own solution and I would rather stop maintaining any marketstore related code; in the end we were almost gonna spend as much work maintaining marketstore as just doing our own thing, right.
yup totally agree! ok then i'll be putting up some finishing functionality touches, hopefully tests, and then dropping all that junk 🏄🏼
The plan is to offer multiple tsdb and other storage backends (for a variety of use cases) and expose them similarly to how we do for broker and data providers B)
To kick off our (tsdb) storage backends this adds our first, implementing a new `Storage(Protocol)` client interface. Going forward, the top level `.storage` pkg-module will now expose backend agnostic APIs and helpers whilst specific backend implementations will adhere to that middleware layer.

Deats:
- add `.storage.marketstore.Storage` as the first client implementation, moving all needed (import) dependencies out from `.service.marketstore` as well as `.ohlc_key_map` and `get_client()`.
- move root `conf.toml` loading from `.data.history` into `.storage.__init__.open_storage_client()` which now takes in a `name: str` and does all the work of loading the correct backend module, its config, and determining if a service-instance can be contacted and a client loaded; in the case where this fails we raise a new `StorageConnectionError`.
- add a new `.storage.get_storagemod()` just like we have for brokers.
- make `open_storage_client()` also return the backend module such that the history-data layer can make backend specific calls as needed (eg. `ohlc_key_map`).
- fall back to a basic non-tsdb backfill when `open_storage_client()` raises the new connection error.
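For reference, a minimal sketch of what the `open_storage_client()` flow described above could look like; the module layout, `get_client()` signature and config handling here are assumptions for illustration, not the exact piker implementation:

```python
# hypothetical sketch only: names/signatures are illustrative
from contextlib import asynccontextmanager
from importlib import import_module


class StorageConnectionError(ConnectionError):
    'No connection to the configured (ts)db service could be made.'


@asynccontextmanager
async def open_storage_client(name: str = 'marketstore'):
    # load the backend module, just like we do for brokers
    mod = import_module(f'piker.storage.{name}')

    # stand-in for root ``conf.toml`` loading
    conf: dict = {}

    try:
        # assume each backend module exposes a ``get_client()`` acm
        async with mod.get_client(**conf.get(name, {})) as client:
            # deliver the module as well so the history-data layer
            # can make backend specific calls (eg. ``ohlc_key_map``)
            yield mod, client

    except OSError as err:
        raise StorageConnectionError(
            f'No `{name}` storage daemon could be contacted?'
        ) from err
```

The key points are the paired `(mod, client)` delivery and the connection-error path which the history layer can catch to fall back to a non-tsdb backfill.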
Turns out you can mix and match `click` with `typer` so this moves what was the `.data.cli` stuff into `storage.cli` and uses the integration api to make it all work B)

New subcmd: `piker store`
- add `piker store ls` which lists all fqme keyed time-series from backend.
- add `store delete` to remove any such key->time-series.
  - now uses a nursery for multi-timeframe concurrency B)

Mask out all the old `marketstore` specific subcmds for now (streaming, ingest, storesh, etc..) in anticipation of moving them into a subpkg-module and make sure to import the sub-cmd module in our top level cli package.

Other `.storage` api tweaks:
- drop the reraising with custom error (for now).
- rename `Storage` -> `StorageClient` (or should it be API?).
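A tiny sketch of the `click`/`typer` mixing trick mentioned above; the sub-commands shown are simplified stand-ins for the real `piker store` cmds:

```python
import click
import typer

store = typer.Typer(help='time series storage management')


@store.command()
def ls():
    'List all fqme-keyed time-series in the backend.'
    ...


@store.command()
def delete(fqme: str):
    'Remove a fqme -> time-series entry.'
    ...


@click.group()
def cli():
    'Top level (click based) ``piker`` cmd group.'


# convert the typer app into a click command and mount it so that
# ``piker store ls`` / ``piker store delete <fqme>`` just work.
cli.add_command(typer.main.get_command(store), 'store')
```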
Since we want to be able to support user-configurable vnc socketaddrs, this preps for passing the piker client directly into the vnc hacker routine so that we can (eventually) load and read the ib brokers config settings into the client and then read those in the `asyncvnc` task spawner.
Not sure how this lasted so long without complaint (literally since we added history 1m OHLC it seems; guess it means most backends are pretty tolerant XD) but we've been sending 2 cancels per order (dialog) due to the mirrored lines on each chart: 1s and 1m. This fixes that by reworking the `OrderMode` methods to be a bit more sane and less conflated with the graphics (lines) layer.

Deatz:
- add new methods:
  - `.oids_from_lines()` for line -> oid extraction,
  - `.cancel_orders()` which makes the order client issue cancel requests from an `oids: list[str]`.
- re-impl `.cancel_all_orders()` and `.cancel_orders_under_cursor()` to use the above methods thus fixing the original bug B)
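A rough sketch of the method split being described; the `line.dialog`, `self.client` and `self.lines.all_lines()` names are assumptions used purely for illustration:

```python
class OrderMode:
    # ... (existing mode state elided) ...

    def oids_from_lines(self, lines: list) -> list[str]:
        # de-dupe to ONE oid per dialog even though every order is
        # mirrored as a line on both the 1s and 1m charts.
        return list({
            line.dialog.uuid
            for line in lines
            if getattr(line, 'dialog', None)
        })

    def cancel_orders(self, oids: list[str]) -> None:
        # issue exactly one cancel request per order dialog
        for oid in oids:
            self.client.cancel(oid)

    def cancel_all_orders(self) -> None:
        self.cancel_orders(
            self.oids_from_lines(self.lines.all_lines())
        )
```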
This was actually incorrect prior: we were rounding triggered limit orders with the `.size_tick` value's digits when we should have been using the `.price_tick` (facepalm). So fix that and compute the rounding number of digits (as passed to the `round(<value>, ndigits=<here>)` builtin) and store it in the `DarkBook.triggers` tuples so that at trigger/match time the round call is done *just prior* to msg send to `brokerd` given the last known live L1 queue price.
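For example, the `ndigits` can be derived from a tick size with stdlib `Decimal` (the tick value below is just an example):

```python
from decimal import Decimal

price_tick = Decimal('0.25')  # eg. an index future's price increment

# number of decimal digits implied by the tick size
ndigits = -price_tick.as_tuple().exponent  # -> 2

# at trigger/match time: round the last known live L1 price *just
# prior* to sending the msg to `brokerd`
last_l1_price = 18001.137
submit_price = round(last_l1_price, ndigits=ndigits)  # -> 18001.14
```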
Since crypto backends now also may expand an FQME like `xbteur.kraken` -> `xbteur.spot.kraken` (by filling in the venue token), we need to use this identifier when looking up per-market order dialogs or submitting new requests. The simple fix is to look up that expanded form from the `Feed.flumes` table which is always keyed by the `MktPair.fqme: str` - the expanded form.
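A sketch of that lookup, assuming a `feed.flumes: dict` keyed by the expanded fqme as described (the helper itself is illustrative, not piker API):

```python
def expand_fqme(feed, fqme: str) -> str:
    # exact hit: input was already the fully expanded form
    if fqme in feed.flumes:
        return fqme

    # eg. 'xbteur.kraken' -> 'xbteur.spot.kraken': same symbol head
    # and broker tail, only the venue token was filled in.
    sym, *_, broker = fqme.split('.')
    for expanded in feed.flumes:
        parts = expanded.split('.')
        if parts[0] == sym and parts[-1] == broker:
            return expanded

    raise KeyError(f'No flume found matching {fqme!r} ?')
```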
Since we only ever want to do incremental y-range calcs based on the price, always skip any tick types emitted by the data daemon which aren't defined in the fundamental set. Further, toss in a new `debug_n_trade: bool` toggle which by default turns off all logging and profiler calls; if you want to do profiling this now has to be adjusted manually!
…field casting
Means commenting out the `data.cli.ingest()` as it will be deleted in the upcoming #486 anyway.
A common usage error is to run `piker anal mnq.cme.ib` where the CLI-passed fqme is not actually fully-qualified (in this case missing an expiry token) and we get an underlying `FileNotFoundError` from the `StorageClient.read_ohlcv()` call. In such key misses, scan the existing `StorageClient._index` for possible matches and report them via a `raise from` with the new error. CHERRY into #486
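Something along these lines, using stdlib `difflib` purely to illustrate the "suggest close matches on a key miss" behaviour (the real scan/matching logic and client call-style may differ):

```python
from difflib import get_close_matches


def read_ohlcv_or_suggest(client, fqme: str):
    try:
        return client.read_ohlcv(fqme)

    except FileNotFoundError as err:
        # scan the client's key index for near matches, eg.
        # 'mnq.cme.ib' -> ['mnq.cme.20230616.ib', ...]
        matches = get_close_matches(fqme, list(client._index), n=3)
        raise FileNotFoundError(
            f'No time-series found for {fqme!r}!\n'
            f'Maybe you meant one of: {matches}?'
        ) from err
```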
I guess since I started supporting the whole "allow a gap between the latest tsdb sample and the latest retrieved history frame" the overlap slicing has been completely borked XD, where we've been sticking in duplicate history samples and this has caused all sorts of downstream time-series processing issues.. So fix that by ensuring whenever there IS an overlap between history in the latest frame and the tsdb that we always prefer the latest frame's data and slice OUT the tsdb's duplicate indices.. CHERRY TO #486
Since the `diff: int` serves as a predicate anyway (when `0`, nothing duplicate was detected) we might as well just return it directly since it's likely also useful for the caller when doing deeper anal. Also, handle the zero-diff case by just returning early with a copy of the input frame and a `diff=0`. CHERRY INTO #486
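In `numpy` terms the de-duplication (and `diff` return) described over the last two commits boils down to something like the below; the `'time'` field name and structured-array inputs are assumptions:

```python
import numpy as np


def dedupe_overlap(
    tsdb_history: np.ndarray,   # older, persisted samples
    latest_frame: np.ndarray,   # most recently retrieved frame
) -> tuple[np.ndarray, int]:

    # rows in the tsdb history whose timestamps also appear in the
    # latest frame are the duplicates we want to slice OUT.
    dupes = np.isin(tsdb_history['time'], latest_frame['time'])
    diff = int(dupes.sum())

    if diff == 0:
        # zero-diff case: return early with a copy of the input
        return tsdb_history.copy(), 0

    # prefer the latest frame's data: drop the tsdb's dupe rows
    return tsdb_history[~dupes], diff
```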
Using a bunch of fancy `numpy` vec ops (and ideally eventually extending the same to `polars`) this is a first draft of `get_null_segs()`, a `col: str` field-value-is-zero detector which filters to all zero-valued input frame segments and returns the corresponding useful slice-indexes:
- gap absolute (in shm buffer terms) index-endpoints as `absi_zsegs` for slicing to each null-segment in the src frame.
- ALL abs indices of rows with zeroed `col` values as `absi_zeros`.
- the full set of the input frame's row-entries (view) which are null valued for the chosen `col` as `zero_t`.

Use this new null-segment-detector in the `.data.history.start_backfill()` task to attempt to fill null gaps that might be extant from some prior backfill attempt. Since `get_null_segs()` should now deliver a sequence of slices for each gap we don't really need to have the `while gap_indices:` loop any more, so just move that to the end-of-func and warn log (for now) if all gaps aren't eventually filled.

TODO:
- [ ] do the null-seg detection and filling concurrently from most-recent-frame backfilling.
- [ ] offer the same detection in `.storage.cli` cmds for manual tsp anal.
- [ ] make the graphics layer actually update correctly when null-segs are filled (currently still broken somehow in the `Viz` caching layer?)

CHERRY INTO #486
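A condensed sketch of the null-segment detection idea (contiguous zero runs -> slice endpoints); the real `get_null_segs()` deals in shm-buffer absolute indices and more edge cases, so this is illustrative only:

```python
import numpy as np


def get_null_segs(
    frame: np.ndarray,
    col: str = 'close',
) -> tuple[list[tuple[int, int]], np.ndarray, np.ndarray]:

    zeroed = frame[col] == 0
    absi_zeros = np.where(zeroed)[0]   # indices of all zeroed rows
    zero_t = frame[zeroed]             # the zeroed row-entries

    if not absi_zeros.size:
        return [], absi_zeros, zero_t

    # split wherever consecutive zero-indices are NOT adjacent,
    # giving one run per null-segment; record (start, stop) endpoints
    # usable for slicing to each null-segment in the src frame.
    runs = np.split(
        absi_zeros,
        np.where(np.diff(absi_zeros) > 1)[0] + 1,
    )
    absi_zsegs = [(int(run[0]), int(run[-1]) + 1) for run in runs]
    return absi_zsegs, absi_zeros, zero_t
```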
Thinking about just moving all of that module (after a content breakup) to a new `.piker.tsp` which will mostly depend on the `.data` and `.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and orchestration with real-time (shm) buffers and the graphics layer into a common spot for both manual analysis/research work and better separation of low level data structure primitives from their higher level usage. Add a better `data.history` mod doc string in prep for this move as well as clean out a bunch of legacy commented cruft from the `trimeter` and `marketstore` days. TO CHERRY #486 (if we can)
Can't ref `dt_eps` and `tsdb_entry` if they don't exist.. like for 1s sampling from `binance` (which dne). So make sure to add a better logic guard and only open the final backload nursery if we actually need to fill the gap between the latest history and where the tsdb history ends. TO CHERRY #486
Launch pad for work towards the task list in #485 🏄🏼
As a start this introduces a new `piker.storage` subsystem to provide for database related middleware(s) as well as a new storage backend using `polars` and apache parquet files to implement a built-in, local-filesystem managed "time series database": `nativedb`.

After some extensive tinkering and brief performance measures I'm tempted to go all in on this home grown solution for a variety of reasons (see details in 27932e4) but re-summarizing some of them here (a tiny sketch of the core idea follows the list):

- settings yet!
- ecosystem.
- choose to swap out stuff like the replication system or networking access.
- `polars` already has a multi-db compat layer with multi-engine support we can leverage and completely sidestep integration work with multiple standard tsdbs?
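The core of `nativedb` really is "parquet files on the local filesystem, read/written with `polars`"; a toy sketch, where the path, schema and file name are assumptions for illustration:

```python
from pathlib import Path
import polars as pl

datadir = Path('~/.config/piker/nativedb').expanduser()
datadir.mkdir(parents=True, exist_ok=True)
path = datadir / 'mnq.cme.ib.ohlcv1s.parquet'

# "write": flush an in-mem OHLCV frame to the local-fs "db"
df = pl.DataFrame({
    'time':   [1687392000, 1687392001],
    'open':   [14900.25, 14900.50],
    'high':   [14901.00, 14900.75],
    'low':    [14899.75, 14900.25],
    'close':  [14900.50, 14900.50],
    'volume': [12.0, 7.0],
})
df.write_parquet(path)

# "read": lazily scan it back, which also enables the
# larger-than-mem processing mentioned in the ToDos below
ohlcv: pl.DataFrame = pl.scan_parquet(path).collect()
```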
Core dev discussion

we've put some work into `marketstore` support machinery including:

- super useful going forward FWIW!).
- `anyio-marketstore`, an async client written and maintained by our devs.
- features.

AND the question is whether we're ok abandoning some of this and/or reimplementing it on our own around the new apache data file/model ecosystems?

we can definitely accomplish ingest, pub-sub and replication on our own (without really much effort) with the following existing subsystems and frameworks:

- a `tractor` actor which writes to apache arrow (IPC) files and flushes to parquet on size constraints (a rough sketch follows this section).
- a `tractor` actor and `trio-websocket`
- something like `borg` (with its unofficial API client) to accomplish file syncing across many user-hosts.
  - `borg` has a community API: https://github.com/spslater/borgapi

should we drop all the existing `marketstore` code?

- the new implementation changes in the `.data.history` layer.
- or less deal breakers, like `marketstore`: gRPC snappy compression crash? #443
- `arcticdb` is a better solution longer run than mkts was anyway given its large insti usage..?
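A very rough sketch of the "buffer record batches, flush to parquet on size constraints" half of that idea using `pyarrow`; the schema, flush threshold and output path are all made up and the arrow IPC file writing part is left out:

```python
import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([
    ('time', pa.int64()),
    ('price', pa.float64()),
    ('size', pa.float64()),
])

batches: list[pa.RecordBatch] = []


def ingest(quotes: list[dict]) -> None:
    # append incoming quotes as a columnar record batch
    batches.append(pa.RecordBatch.from_pylist(quotes, schema=schema))

    # "flush on size constraints": persist + reset once we're
    # holding too many rows in memory
    if sum(b.num_rows for b in batches) > 100_000:
        pq.write_table(pa.Table.from_batches(batches), 'ticks.parquet')
        batches.clear()
```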
ToDo:
- CHERRY from `binance` live (futures) trading support #519:
- CHERRY from Account tests #528
- outstanding obvious regression due to this patch set 😂
  - off:
    - => pretty sure this is fixed now after reworking the gap filling logic inside `.data.history.start_backfill()`
- drop marketstore code in general depending on outcome of the above discussion:
  - `.storage.marketstore` and `anyio-marketstore` dep?
  - `.service._ahab` layer?
  - `.data.history`!
- from tsdb multi-backend support and testing! #485:
  - `.storage` with subpkgs for backends and an API / mgmt layer
- outstanding tsdb bugs:
- docs on new filesystem layout and config options:
  - the `nativedb/` dir
  - `[storage]` section to `conf.toml`:
- from tsdb editing and mgmt toolz #312 we need chart-UI integration for a buncha stuff:
  - `reload history` for a highlighted section or gap B)
- `.storage.cli` refinement:
  - (`--tsdb` in readme #313): documenting `--tsdb` is no longer needed since we don't need to offer optional docker activation, since we don't need it using the `nativedb` backend!
  - `piker store` cmds
    - make the `anal` subcmd do gap detection and discrepancy reporting (at the least) against market-venue known operating hours.
- new `nativedb` backend implemented with `polars` + apache parquet files B)
- since we're already moving to use `typer` in Rekt pps? problem? => `piker.accounting` #489, let's also add confirmation support for the new `pikerd storage -d` flag:
  - `.storage.cli`!
- gap backfilling (as detailed in f45b76e) still requires some work for full functionality including:
  - update loop to ensure we do a forced graphics data formatter update when gap-backfilling is complete.
- rt ingest and fast parquet update deferred to Quote ingest using apache stack: arrow / parquet #536
  - currently we aren't storing rt data (received during the data session but not previously written to storage) on teardown..
  - dfs and then parquet at some frequency / teardown?
  - related to the above, what about FSP ingest and storage?
  - `marketstore` tick ingest #314 probably should be re-created but for `nativedb` and with a new writeup around arrow IPC and feather formats?
- (likely as follow up) use the lazy `polars` API to do larger-than-mem processing both for charting and remote (host) processing:
- use `polars` to do price series anomaly repairs, such as is caused by stock splits or for handling bugs in data providers where a ticker name was repurposed for a new asset and the price history has a mega gap:
- deciding on file organization, naming schema, subdirs for piker subsystems, etc.
  - period and then simply use the multiple files reader support: https://pola-rs.github.io/polars-book/user-guide/io/multiple/ (see the sketch after this list)
  - `mnq.cme.20230616.ib.ohlcv1s.parquet` but we can probably change the meta-data token part `ohlcv1s` to be more parse-able and readable?
    - in: `ohlcv.1s.<otherinfo>`?
  - a `.config/piker/nativedb/fsp/` subdir?
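Sketch of the "multiple files reader" idea referenced in the list above: with a parse-able per-period naming schema, `polars` can lazily glob-scan all the pieces back into one frame (the paths and file names below are illustrative):

```python
from pathlib import Path
import polars as pl

datadir = Path('~/.config/piker/nativedb').expanduser()

# eg. one parquet file per (mkt, period) under the nativedb dir:
#   mnq.cme.20230616.ib.ohlcv1s.parquet
#   mnq.cme.<next expiry>.ib.ohlcv1s.parquet
history: pl.DataFrame = (
    pl.scan_parquet(str(datadir / 'mnq.cme.*.ib.ohlcv1s.parquet'))
    .sort('time')
    .collect()
)
```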