Home
Shanfang edited this page May 2, 2018 · 7 revisions
This repo contains an implementation of sketching algorithms for join size estimation. The update performance of a sketch can be improved significantly by sketching only a sample of the data, without significant degradation in accuracy. This repo uses Bernoulli sampling. For details of the sampling algorithms and sketching techniques, please check out the references page.
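The effect of Bernoulli sampling on a join size estimate can be illustrated with a minimal Python sketch. It is not the repo's code: the function names are hypothetical, and for brevity it joins the two samples exactly instead of sketching them. Each tuple is kept independently with probability p, so when both relations are sampled with the same p, the join size of the samples is scaled by 1/p^2 to get an unbiased estimate.

```python
import random
from collections import Counter


def bernoulli_sample(tuples, p, rng):
    """Keep each tuple independently with probability p."""
    return [t for t in tuples if rng.random() < p]


def join_size(r, s):
    """Exact equi-join size of two lists of join-key values."""
    freq_r, freq_s = Counter(r), Counter(s)
    return sum(count * freq_s[key] for key, count in freq_r.items())


def estimate_join_size(r, s, p, rng):
    """Join the two samples, then scale by 1/p^2 to undo the sampling."""
    sample_r = bernoulli_sample(r, p, rng)
    sample_s = bernoulli_sample(s, p, rng)
    return join_size(sample_r, sample_s) / (p * p)


# Illustrative data only: uniform keys over a domain of 100 values.
rng = random.Random(0)
r = [rng.randrange(100) for _ in range(20000)]
s = [rng.randrange(100) for _ in range(20000)]
exact = join_size(r, s)
estimate = estimate_join_size(r, s, 0.1, rng)
print(exact, round(estimate))
```

With p = 0.1 only about a tenth of the tuples are processed, which is where the update-time saving comes from; the price is extra variance in the estimate.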
If you are using a Mac, follow these steps to install GSL:
- launch the terminal
- run
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
- run
brew install gsl
For other systems, please check out the GSL documentation.
- run
make
- run
./sketch_bernoulli_sampling.out
followed by these parameters:
dom_size
tuples_no
buckets_no
rows_no
DIST_PARAM
DIST_SHUFF
SAMP_PROB
num_runs
- run
make clean
to remove all intermediate files.
- dom_size defines the size of the domain for I
- tuples_no defines the number of tuples in a relation
- buckets_no defines the number of buckets in the generated sketch
- rows_no defines the number of rows in the generated sketch
- DIST_PARAM determines the shape of the Zipf distribution
- DIST_SHUFF is the decor_param. decor_param = 0 corresponds to complete randomness; decor_param = 1 corresponds to complete positive correlation, or identical relations
- SAMP_PROB (bernoulli_p) sets the probability p for Bernoulli sampling
- num_runs (runs_no) defines the number of test runs
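To show how these parameters could fit together, here is a small Python sketch. It rests on several assumptions that this page does not confirm: it uses a Fast-AGMS style sketch (rows_no rows of buckets_no signed counters, combined by a median), simple linear hashes where a real implementation would use stronger hash families, and a plain Zipf-weighted generator. The class and function names are hypothetical, and the DIST_SHUFF correlation parameter is not modelled.

```python
import random
from collections import Counter
from statistics import median

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1 for the illustrative hash family


class FastAGMS:
    """rows_no independent rows, each holding buckets_no signed counters."""

    def __init__(self, buckets_no, rows_no, seed):
        rng = random.Random(seed)
        self.buckets_no = buckets_no
        self.counters = [[0] * buckets_no for _ in range(rows_no)]
        # Four random coefficients per row: (a, b) pick the bucket, (c, d) the sign.
        self.coeffs = [tuple(rng.randrange(1, PRIME) for _ in range(4))
                       for _ in range(rows_no)]

    def update(self, key, weight=1):
        for row, (a, b, c, d) in zip(self.counters, self.coeffs):
            bucket = ((a * key + b) % PRIME) % self.buckets_no
            sign = 1 if ((c * key + d) % PRIME) & 1 else -1
            row[bucket] += sign * weight

    def join_estimate(self, other):
        """Median over rows of the dot product of matching rows."""
        return median(sum(x * y for x, y in zip(r1, r2))
                      for r1, r2 in zip(self.counters, other.counters))


def zipf_relation(dom_size, tuples_no, dist_param, rng):
    """Draw tuples_no join keys from [0, dom_size) with Zipf-like weights."""
    weights = [1.0 / (rank ** dist_param) for rank in range(1, dom_size + 1)]
    return rng.choices(range(dom_size), weights=weights, k=tuples_no)


def sampled_sketch(keys, buckets_no, rows_no, samp_prob, rng, seed=42):
    """Sketch only the keys that survive Bernoulli sampling.

    Both relations must use the same seed so their hash functions match.
    """
    sketch = FastAGMS(buckets_no, rows_no, seed)
    for key in keys:
        if rng.random() < samp_prob:
            sketch.update(key)
    return sketch


# Illustrative values only, not defaults taken from the repo.
rng = random.Random(7)
r = zipf_relation(1000, 20000, 1.0, rng)
s = zipf_relation(1000, 20000, 1.0, rng)
samp_prob = 0.5
sk_r = sampled_sketch(r, 500, 7, samp_prob, rng)
sk_s = sampled_sketch(s, 500, 7, samp_prob, rng)
estimate = sk_r.join_estimate(sk_s) / samp_prob ** 2  # undo the sampling
freq_s = Counter(s)
exact = sum(n * freq_s[k] for k, n in Counter(r).items())
print(exact, round(estimate))
```

In a sketch of this shape, more buckets reduce the error caused by hash collisions within a row, and more rows make the median more robust, so buckets_no and rows_no trade memory and update time against accuracy.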