solana: use bloom filters to query instruction accounts #13
base: master
This PR has no updated test fixtures, so it's a draft for now.
if let Some(val) = opt {
    let bit_array: Vec<_> = val.bool()?.iter().map(|opt| opt.unwrap()).collect();
    let bloom = sqd_bloom_filter::BloomFilter::from_bit_array(bit_array, 7);
@eldargab would it make sense to import the number of hashes (7) as `sqd_data::solana::tables::instruction::NUM_HASHES`? It would require adding `sqd_data` as a dependency of this crate.
},
"instructions": [
{
"account": [
@eldargab should this query return only instructions that include BOTH accounts, or should it behave as OR?
pub struct BloomFilter {
    bit_array: Vec<bool>,
`bool`s in Rust use 1 byte of memory each, which makes this an inefficient representation. Let's store them as a `u8` array (8 bits per byte).
Also note that the reading and writing parts don't really have to belong to the same struct if that's not convenient for the implementation; sometimes a little repetition is okay.
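To illustrate the suggested representation, here is a minimal sketch of bit storage packed into a `u8` array. The `PackedBits` type and its methods are hypothetical names chosen for this example, not part of `sqd_bloom_filter`; the bit order within a byte (least significant bit first) is likewise an assumption.

```rust
/// Hypothetical packed bit storage: 8 filter bits per byte
/// instead of one `bool` (1 byte) per bit.
pub struct PackedBits {
    bytes: Vec<u8>,
    num_bits: usize,
}

impl PackedBits {
    /// Allocates enough zeroed bytes to hold `num_bits` bits.
    pub fn new(num_bits: usize) -> Self {
        Self {
            bytes: vec![0u8; (num_bits + 7) / 8],
            num_bits,
        }
    }

    /// Sets bit `i` (assumed order: least significant bit first).
    pub fn set(&mut self, i: usize) {
        assert!(i < self.num_bits);
        self.bytes[i / 8] |= 1u8 << (i % 8);
    }

    /// Returns whether bit `i` is set.
    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.num_bits);
        (self.bytes[i / 8] & (1u8 << (i % 8))) != 0
    }

    /// Raw bytes, e.g. for writing the filter into a binary column.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

With this layout a 64-bit filter occupies 8 bytes rather than the 64 bytes a `Vec<bool>` of the same length would need.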
pub type Base58Builder = StringBuilder;
pub type BytesBuilder = StringBuilder;
pub type JsonBuilder = StringBuilder;
pub type AccountListBuilder = ListBuilder<Base58Builder>;
pub type AccountIndexList = ListBuilder<UInt8Builder>;
pub type BloomFilterBuilder = ListBuilder<BooleanBuilder>;
`BooleanArray`s are used to store N independent bool values, each of which may be null. So for the sake of efficiency (both memory and reading) we should use the `BinaryBuilder` here: the bloom filter is either present with a constant number of bytes or absent.
let series = sqd_polars::arrow::array_series("values", arr)?;
for value in series.list()? {
    result_mask.push(self.bloom_contains(value)?);
Note that the main use case is building a filter from a fixed set of accounts (from the query) and applying it to a large number of instructions, so it's important to run the evaluation as fast as possible. Here you're unnecessarily building a new bloom filter instance for every value; there should be no hash calculation during the evaluation.
Instead, given an account address, you can precalculate its hashes and build a bitmask S with the same length as the bloom filter and with `NUM_HASHES` ones set in it (fewer if some hashes collide). Then, to check whether the filter F of some instruction contains that account, you only have to check that S is a subset (in terms of bitsets) of F, which is easily done with a bitwise AND: `S & F == S`.
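A minimal sketch of the mask-based check described above. The hash function here (the standard library's `DefaultHasher` seeded with the hash index), the filter size `FILTER_BYTES`, and the helper names `insert`, `account_mask` and `may_contain` are all assumptions made for illustration; the real filter would have to use whatever hashing and size `sqd_bloom_filter` uses when the data is written.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_HASHES: u64 = 7; // value used in the PR
const FILTER_BYTES: usize = 8; // hypothetical filter size (64 bits)

/// Sets the bits for `account` in `bits`. Used both when writing an
/// instruction's filter and when building a query mask.
/// The hashing scheme is an assumption, not the crate's actual one.
fn insert(bits: &mut [u8; FILTER_BYTES], account: &str) {
    for seed in 0..NUM_HASHES {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        account.hash(&mut h);
        let bit = (h.finish() % (FILTER_BYTES as u64 * 8)) as usize;
        bits[bit / 8] |= 1u8 << (bit % 8);
    }
}

/// Precomputes the mask S once per queried account.
fn account_mask(account: &str) -> [u8; FILTER_BYTES] {
    let mut mask = [0u8; FILTER_BYTES];
    insert(&mut mask, account);
    mask
}

/// Checks `S & F == S` byte by byte; no hashing happens here.
/// A `true` result means "possibly present" (false positives can occur).
fn may_contain(filter: &[u8], mask: &[u8]) -> bool {
    filter.len() == mask.len()
        && filter.iter().zip(mask).all(|(f, s)| (f & s) == *s)
}
```

In the evaluation loop, `account_mask` would be called once per account in the query, and `may_contain` once per instruction row against the stored filter bytes.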
No description provided.