solana: use bloom filters to query instruction accounts #13

Draft
wants to merge 1 commit into
base: master
from


tmcgroul
Collaborator

No description provided.

@tmcgroul tmcgroul marked this pull request as draft February 11, 2025 22:02
@tmcgroul
Collaborator Author

This PR does not have updated test fixtures yet, so it is a draft for now.

@tmcgroul tmcgroul requested a review from eldargab February 11, 2025 22:04
if let Some(val) = opt {
let bit_array: Vec<_> = val.bool()?.iter().map(|opt| opt.unwrap()).collect();
let bloom = sqd_bloom_filter::BloomFilter::from_bit_array(bit_array, 7);
Collaborator Author

@eldargab would it make sense to import the number of hashes (7) as sqd_data::solana::tables::instruction::NUM_HASHES? It would require adding sqd_data as a dependency of this crate.

},
"instructions": [
{
"account": [
Collaborator Author

Should this query return only instructions that include BOTH accounts, or should it behave as OR?
@eldargab
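
To make the two readings of the question concrete, here is a minimal sketch using exact account lists rather than bloom filters. The function names are hypothetical and are not part of this PR; the sketch does not settle which behaviour is intended.

```rust
/// AND reading: every queried account must appear in the instruction.
pub fn matches_all(instruction_accounts: &[&str], queried: &[&str]) -> bool {
    queried.iter().all(|q| instruction_accounts.contains(q))
}

/// OR reading: at least one queried account must appear in the instruction.
pub fn matches_any(instruction_accounts: &[&str], queried: &[&str]) -> bool {
    queried.iter().any(|q| instruction_accounts.contains(q))
}
```

With instruction accounts A, B, C and a query for A and D, the AND reading rejects the instruction and the OR reading accepts it. With bloom filters either reading can additionally produce false positives.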



pub struct BloomFilter {
bit_array: Vec<bool>,
Collaborator

A bool in Rust occupies 1 byte of memory, which makes Vec<bool> an inefficient representation. Let's pack the bits into a u8 array instead.

Also note that the reading and writing parts don't have to belong to the same struct if that is inconvenient for the implementation; sometimes a little repetition is okay.
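
A rough sketch of what a byte-packed filter could look like, eight bits per byte. Everything here is an assumption for illustration: the struct name, the filter length chosen by the caller, and especially the seeded FNV-1a hash, which is only a placeholder and not the hashing that sqd_bloom_filter actually uses.

```rust
/// Sketch of a bloom filter whose bits are packed eight per byte.
pub struct PackedBloomFilter {
    bytes: Vec<u8>,
    num_hashes: usize,
}

impl PackedBloomFilter {
    pub fn new(byte_len: usize, num_hashes: usize) -> Self {
        assert!(byte_len > 0 && num_hashes > 0);
        Self { bytes: vec![0u8; byte_len], num_hashes }
    }

    fn num_bits(&self) -> u64 {
        (self.bytes.len() * 8) as u64
    }

    /// Sets one bit per hash function.
    pub fn insert(&mut self, item: &[u8]) {
        for i in 0..self.num_hashes {
            let bit = bit_index(item, i as u64, self.num_bits());
            self.bytes[bit / 8] |= 1u8 << (bit % 8);
        }
    }

    /// Returns true if every bit selected by the hash functions is set.
    pub fn contains(&self, item: &[u8]) -> bool {
        (0..self.num_hashes).all(|i| {
            let bit = bit_index(item, i as u64, self.num_bits());
            (self.bytes[bit / 8] & (1u8 << (bit % 8))) != 0
        })
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}

/// Placeholder hash (seeded FNV-1a); an assumption, not the real scheme.
fn bit_index(item: &[u8], seed: u64, num_bits: u64) -> usize {
    let mut h: u64 = 0xcbf29ce484222325 ^ seed.wrapping_mul(0x9e3779b97f4a7c15);
    for &b in item {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    (h % num_bits) as usize
}
```

A 64-bit filter then takes 8 bytes instead of the 64 bytes a Vec<bool> of the same length would need.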



pub type Base58Builder = StringBuilder;
pub type BytesBuilder = StringBuilder;
pub type JsonBuilder = StringBuilder;
pub type AccountListBuilder = ListBuilder<Base58Builder>;
pub type AccountIndexList = ListBuilder<UInt8Builder>;
pub type BloomFilterBuilder = ListBuilder<BooleanBuilder>;
Collaborator

A BooleanArray stores N independent bool values, each of which may be null. For the sake of efficiency (both memory and read speed) we should use the BinaryBuilder here: the bloom filter is either present, with a constant number of bytes, or absent.
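
As an illustration of the "present with a constant number of bytes, or absent" shape, the sketch below converts the bool list into one packed byte string per cell, using only the standard library. The helper names and the LSB-first bit order are assumptions; the Arrow builder calls themselves are deliberately left out.

```rust
/// Packs bits into bytes, least significant bit first within each byte.
pub fn pack_bits(bits: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (bits.len() + 7) / 8];
    for (i, &bit) in bits.iter().enumerate() {
        if bit {
            bytes[i / 8] |= 1u8 << (i % 8);
        }
    }
    bytes
}

/// Inverse of `pack_bits` for the first `num_bits` bits.
pub fn unpack_bits(bytes: &[u8], num_bits: usize) -> Vec<bool> {
    (0..num_bits)
        .map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1)
        .collect()
}

/// One column cell: the whole filter is either present or absent,
/// so a single null flag per cell is enough.
pub type BloomCell = Option<Vec<u8>>;

pub fn to_cell(bits: Option<&[bool]>) -> BloomCell {
    bits.map(pack_bits)
}
```

Each cell would then map onto one value or one null in a binary column.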

let series = sqd_polars::arrow::array_series("values", arr)?;
for value in series.list()? {
result_mask.push(self.bloom_contains(value)?);
Collaborator

Note that the main use case is building a filter from a fixed set of accounts (from the query) and applying it to a large number of instructions, so it's important to run the evaluation as fast as possible. Here you're building a new bloom filter instance unnecessarily; there should be no hash calculation during the evaluation.

Instead, given an account address, you can precalculate its hashes and build a bitmask S with the same length as the bloom filter with NUM_HASHES ones set in it. Then to check whether a filter F for some instruction contains that account, you only have to check that S is a subset (in terms of bitsets) of F, which is done easily with the bitwise AND: S & F == S.
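
The subset check described above could be sketched roughly as follows. FILTER_BYTES, the function names, and the seeded FNV-1a hash are assumptions made for the example; in practice the mask has to be built with exactly the same hashing and filter length that were used when the filters were written.

```rust
pub const FILTER_BYTES: usize = 8; // hypothetical filter length in bytes
pub const NUM_HASHES: usize = 7;

pub type Bits = [u8; FILTER_BYTES];

/// Placeholder hash (seeded FNV-1a); an assumption, not the real scheme.
fn bit_index(item: &[u8], seed: u64) -> usize {
    let mut h: u64 = 0xcbf29ce484222325 ^ seed.wrapping_mul(0x9e3779b97f4a7c15);
    for &b in item {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    (h % (FILTER_BYTES as u64 * 8)) as usize
}

/// Builds the mask S once per queried account; all hashing happens here.
pub fn account_mask(account: &[u8]) -> Bits {
    let mut mask = [0u8; FILTER_BYTES];
    for i in 0..NUM_HASHES {
        let bit = bit_index(account, i as u64);
        mask[bit / 8] |= 1u8 << (bit % 8);
    }
    mask
}

/// Checks S & F == S byte by byte, with no hashing involved.
pub fn is_subset(mask: &[u8], filter: &[u8]) -> bool {
    mask.len() == filter.len() && mask.iter().zip(filter).all(|(s, f)| s & f == *s)
}

/// Applies one precomputed mask to many per-instruction filters.
/// An absent filter is treated as "no match" in this sketch.
pub fn eval(mask: &Bits, filters: &[Option<Bits>]) -> Vec<bool> {
    filters
        .iter()
        .map(|f| f.as_ref().map_or(false, |f| is_subset(mask, f)))
        .collect()
}
```

The per-instruction work is then a handful of byte-wise AND and compare operations, and the mask is reused across the whole column.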

2 participants