[MISC] Rename maximum_false_positive_rate->maximum_fpr.
smehringer authored and eseiler committed Oct 23, 2023
1 parent 58067e0 commit 226b1f1
Showing 14 changed files with 74 additions and 82 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -118,7 +118,7 @@ int main()
seqan::hibf::config config{.input_fn = get_user_bin_data, // required
.number_of_user_bins = 3u, // required
.number_of_hash_functions = 2u,
.maximum_false_positive_rate = 0.05,
.maximum_fpr = 0.05,
.threads = 1u};

// The HIBF constructor will determine a hierarchical layout for the user bins and build the filter.
58 changes: 29 additions & 29 deletions include/hibf/config.hpp
@@ -30,20 +30,20 @@ namespace seqan::hibf
*
* Here is the list of all config options:
*
* | Type | Option Name | Default | Note |
* |:--------|:------------------------------------------------------------|:-------:|:-----------------------|
* | General | seqan::hibf::config::input_fn | - | [REQUIRED] |
* | General | seqan::hibf::config::number_of_user_bins | - | [REQUIRED] |
* | General | seqan::hibf::config::number_of_hash_functions | 2 | |
* | General | seqan::hibf::config::maximum_false_positive_rate | 0.05 | [RECOMMENDED_TO_ADAPT] |
* | General | seqan::hibf::config::relaxed_fpr | 0.3 | |
* | General | seqan::hibf::config::threads | 1 | [RECOMMENDED_TO_ADAPT] |
* | Layout | seqan::hibf::config::sketch_bits | 12 | |
* | Layout | seqan::hibf::config::tmax | 0 | 0 indicates unset |
* | Layout | seqan::hibf::config::max_rearrangement_ratio | 0.5 | |
* | Layout | seqan::hibf::config::alpha | 1.2 | |
* | Layout | seqan::hibf::config::disable_estimate_union | false | |
* | Layout | seqan::hibf::config::disable_rearrangement | false | |
* | Type | Option Name | Default | Note |
* |:--------|:----------------------------------------------|:-------:|:-----------------------|
* | General | seqan::hibf::config::input_fn | - | [REQUIRED] |
* | General | seqan::hibf::config::number_of_user_bins | - | [REQUIRED] |
* | General | seqan::hibf::config::number_of_hash_functions | 2 | |
* | General | seqan::hibf::config::maximum_fpr | 0.05 | [RECOMMENDED_TO_ADAPT] |
* | General | seqan::hibf::config::relaxed_fpr | 0.3 | |
* | General | seqan::hibf::config::threads | 1 | [RECOMMENDED_TO_ADAPT] |
* | Layout | seqan::hibf::config::sketch_bits | 12 | |
* | Layout | seqan::hibf::config::tmax | 0 | 0 indicates unset |
* | Layout | seqan::hibf::config::max_rearrangement_ratio | 0.5 | |
* | Layout | seqan::hibf::config::alpha | 1.2 | |
* | Layout | seqan::hibf::config::disable_estimate_union | false | |
* | Layout | seqan::hibf::config::disable_rearrangement | false | |
*
* As a copy and paste source, here are all config options with their defaults:
*
@@ -62,7 +62,7 @@ namespace seqan::hibf
* Check the documentation of the following options that influence the memory consumption:
* * seqan::hibf::config::threads
* * seqan::hibf::config::number_of_hash_functions
* * seqan::hibf::config::maximum_false_positive_rate
* * seqan::hibf::config::maximum_fpr
*
* ## Validation
*
@@ -134,7 +134,7 @@ struct config
/*!\brief The desired maximum false positive rate of the underlying Bloom Filters. [RECOMMENDED_TO_ADAPT]
*
* We ensure that when querying a single hash value in the (H)IBF, the probability of getting a false positive answer
* will not exceed the value set for seqan::hibf::config::maximum_false_positive_rate.
* will not exceed the value set for seqan::hibf::config::maximum_fpr.
* The internal Bloom Filters will be configured accordingly. Individual Bloom Filters might have a different
* but always lower false positive rate (FPR).
*
@@ -147,30 +147,30 @@ struct config
*
* \sa [Bloom Filter Calculator](https://hur.st/bloomfilter/).
*/
double maximum_false_positive_rate{0.05};
double maximum_fpr{0.05};

/*!\brief Allow a higher FPR in non-accuracy-critical parts of the HIBF structure.
*
* Some parts in the hierarchical structure are not critical to ensure the seqan::hibf::config::maximum_false_positive_rate.
* These can be allowed to have a higher FPR to reduce the overall space consumption taking into account a small
* decrease in runtime performance.
* Some parts in the hierarchical structure are not critical to ensure the seqan::hibf::config::maximum_fpr.
* These can be allowed to have a higher FPR to reduce the overall space consumption, while only minimally
* affecting the runtime performance.
*
* Value must be in range [0,1].
* Value must be equal to or larger than seqan::hibf::config::maximum_false_positive_rate.
* Value must be in range (0.0,1.0).
* Value must be equal to or larger than seqan::hibf::config::maximum_fpr.
* Recommendation: default value (0.3)
*
* ### Technical details
*
*
* Merged bins in an HIBF layout will always be followed by one or more lower-level IBFs that will have split bins
* or single bins (split = 1) to recover the original user bins. Thus, the FPR of merged bins does not determine the
* seqan::hibf::config::maximum_false_positive_rate, but is independent. Choosing a higher FPR for merged bins can
* seqan::hibf::config::maximum_fpr, but is independent. Choosing a higher FPR for merged bins can
* lower the memory requirement but increases the runtime. Experiments show that the decrease in memory is
* significant, while the runtime suffers only slightly. The accuracy of the results is not affected by this
* parameter.
*
* Note: For each IBF there is a limit to how high the FPR of merged bins can be. Specifically, the FPR for merged
* bins can never decrease the IBF size more than what is needed to ensure the
* seqan::hibf::config::maximum_false_positive_rate for split bins. This means that, at some point, choosing even
* seqan::hibf::config::maximum_fpr for split bins. This means that, at some point, choosing even
* higher values for this parameter will have no effect anymore.
*
* \sa [Bloom Filter Calculator](https://hur.st/bloomfilter/).
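
The note above on `relaxed_fpr` can be made concrete with a minimal sketch. It uses the textbook Bloom filter sizing formula, m = ceil(-k * n / ln(1 - p^(1/k))), for n elements, k hash functions and false positive rate p. The helpers `bloom_bits` and `technical_bin_bits` are hypothetical names for illustration only; that `build::bin_size_in_bits` uses this exact formula, and that the final size is the larger of the two candidates, are assumptions based on the documentation text rather than on the library source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Textbook Bloom filter size in bits for `elements` items, `hash_count` hash
// functions and a target false positive rate `fpr` in (0.0,1.0).
// Hypothetical helper; NOT the library's build::bin_size_in_bits.
inline std::size_t bloom_bits(double const fpr, std::size_t const hash_count, std::size_t const elements)
{
    double const k = static_cast<double>(hash_count);
    double const numerator = -k * static_cast<double>(elements);
    double const denominator = std::log(1.0 - std::pow(fpr, 1.0 / k));
    return static_cast<std::size_t>(std::ceil(numerator / denominator));
}

// Assumed sizing rule for an IBF whose largest bin is a merged bin:
// the split bins need at least `minimum_bits` to keep `maximum_fpr`,
// so relaxing the merged-bin FPR cannot shrink the bins below that floor.
inline std::size_t technical_bin_bits(double const maximum_fpr,
                                      double const relaxed_fpr,
                                      std::size_t const hash_count,
                                      std::size_t const max_split_size,
                                      std::size_t const max_merged_size)
{
    std::size_t const minimum_bits = bloom_bits(maximum_fpr, hash_count, max_split_size);
    std::size_t const merged_bits = bloom_bits(relaxed_fpr, hash_count, max_merged_size);
    return std::max(minimum_bits, merged_bits);
}
```

With two hash functions and 1000 elements, `bloom_bits` at an FPR of 0.3 needs roughly a third of the bits required at 0.05. Once `merged_bits` falls below `minimum_bits`, raising `relaxed_fpr` further leaves the result of `technical_bin_bits` unchanged, which is the saturation effect the note describes.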
@@ -293,10 +293,10 @@ struct config
*
* Constraints:
* * seqan::hibf::config::number_of_hash_functions must be in `[1,5]`.
* * seqan::hibf::config::maximum_false_positive_rate must be in `(0.0,1.0)`.
* * seqan::hibf::config::relaxed_fpr must be in `[0.0,1.0]`.
* * seqan::hibf::config::maximum_fpr must be in `(0.0,1.0)`.
* * seqan::hibf::config::relaxed_fpr must be in `(0.0,1.0)`.
* * seqan::hibf::config::relaxed_fpr must be equal to or larger than
* seqan::hibf::config::maximum_false_positive_rate.
* seqan::hibf::config::maximum_fpr.
* * seqan::hibf::config::threads must be greater than `0`.
* * seqan::hibf::config::sketch_bits must be in `[5,32]`.
* * seqan::hibf::config::tmax must be at most `18446744073709551552`.
@@ -323,7 +323,7 @@ struct config

archive(CEREAL_NVP(number_of_user_bins));
archive(CEREAL_NVP(number_of_hash_functions));
archive(CEREAL_NVP(maximum_false_positive_rate));
archive(CEREAL_NVP(maximum_fpr));
archive(CEREAL_NVP(relaxed_fpr));
archive(CEREAL_NVP(threads));

2 changes: 1 addition & 1 deletion include/hibf/hierarchical_interleaved_bloom_filter.hpp
@@ -161,7 +161,7 @@ class hierarchical_interleaved_bloom_filter
*
* Options recommended to adapt to your setup:
* * `threads` - Choose number of threads depending on your hardware settings to speed up construction
* * `maximum_false_positive_rate` - How many false positive answers can you tolerate? A low FPR (e.g. 0.001) is
* * `maximum_fpr` - How many false positive answers can you tolerate? A low FPR (e.g. 0.001) is
* needed if you can tolerate a high RAM peak when using the HIBF but post-processing steps are heavy and FPs
* should be avoided. A high FPR (e.g. `0.3`) can be chosen if you want a very small HIBF and false positives
* can be easily filtered in the down-stream analysis
9 changes: 5 additions & 4 deletions include/hibf/layout/hierarchical_binning.hpp
@@ -78,13 +78,14 @@ class hierarchical_binning
if (max_id == max_split_id) // Overall max bin is a split bin.
return max_id;

// the minimum size of the TBs of this IBF to ensure the maximum_false_positive_rate for split bins
size_t const minimum_bits{build::bin_size_in_bits({.fpr = config.maximum_false_positive_rate,
// Split cardinality `max_split_size` already accounts for fpr correction.
// The minimum size of the TBs of this IBF to ensure the maximum_false_positive_rate for split bins.
size_t const minimum_bits{build::bin_size_in_bits({.fpr = config.maximum_fpr,
.hash_count = config.number_of_hash_functions,
.elements = max_split_size})};

// the potential size of the TBs of this IBF given the allowed merged bin FPR
size_t const merged_bits{build::bin_size_in_bits({.fpr = config.relaxed_fpr,
// The potential size of the TBs of this IBF given the allowed merged bin FPR.
size_t const merged_bits{build::bin_size_in_bits({.fpr = config.relaxed_fpr, //
.hash_count = config.number_of_hash_functions,
.elements = max_size})};

3 changes: 1 addition & 2 deletions src/build/construct_ibf.cpp
@@ -32,8 +32,7 @@ seqan::hibf::interleaved_bloom_filter construct_ibf(robin_hood::unordered_flat_s
assert(!max_bin_is_merged || number_of_bins == 1u); // merged max bin implies (=>) number of bins == 1

size_t const kmers_per_bin{(kmers.size() + number_of_bins - 1u) / number_of_bins}; // Integer ceil
double const fpr = max_bin_is_merged ? data.config.relaxed_fpr
: data.config.maximum_false_positive_rate;
double const fpr = max_bin_is_merged ? data.config.relaxed_fpr : data.config.maximum_fpr;

size_t const bin_bits{bin_size_in_bits({.fpr = fpr, //
.hash_count = data.config.number_of_hash_functions,
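
The two computations in this hunk, the integer ceiling division and the choice between the two FPR settings, can be sketched in isolation. `ceil_div` and `select_fpr` are hypothetical helper names introduced only for this illustration; the library writes both expressions inline.

```cpp
#include <cassert>
#include <cstddef>

// Rounds dividend / divisor up to the next integer without floating point.
// Requires divisor > 0 and dividend + divisor - 1 not to overflow.
constexpr std::size_t ceil_div(std::size_t const dividend, std::size_t const divisor)
{
    return (dividend + divisor - 1u) / divisor;
}

// A merged maximum bin may use the relaxed FPR; any other bin must keep the
// strict maximum FPR. Hypothetical helper mirroring the ternary in the diff.
constexpr double select_fpr(bool const max_bin_is_merged, double const maximum_fpr, double const relaxed_fpr)
{
    return max_bin_is_merged ? relaxed_fpr : maximum_fpr;
}
```

For example, 10 k-mers spread over 3 technical bins give `ceil_div(10, 3) == 4` k-mers per bin, so the bin size is derived from the fullest bin rather than from a truncated average.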
11 changes: 5 additions & 6 deletions src/config.cpp
@@ -72,16 +72,15 @@ void config::validate_and_set_defaults()
if (number_of_hash_functions == 0u || number_of_hash_functions > 5u)
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::number_of_hash_functions must be in [1,5]."};

if (maximum_false_positive_rate <= 0.0 || maximum_false_positive_rate >= 1.0)
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::maximum_false_positive_rate must be in (0.0,1.0)."};
if (maximum_fpr <= 0.0 || maximum_fpr >= 1.0)
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::maximum_fpr must be in (0.0,1.0)."};

if (relaxed_fpr <= 0.0 || relaxed_fpr >= 1.0)
throw std::invalid_argument{
"[HIBF CONFIG ERROR] config::relaxed_fpr must be in (0.0,1.0)."};
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::relaxed_fpr must be in (0.0,1.0)."};

if (relaxed_fpr < maximum_false_positive_rate)
if (relaxed_fpr < maximum_fpr)
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::relaxed_fpr must be "
"greater than or equal to config::maximum_false_positive_rate."};
"greater than or equal to config::maximum_fpr."};

if (threads == 0u)
throw std::invalid_argument{"[HIBF CONFIG ERROR] config::threads must be greater than 0."};
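
The checks in this hunk can be exercised without the library by using a stand-in struct. The sketch below is an assumption-laden illustration: `fpr_options`, `validate` and `is_valid` are hypothetical names, and only the members touched by these checks are modelled. The conditions and messages are copied from the renamed code above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the relevant members of seqan::hibf::config.
struct fpr_options
{
    std::size_t number_of_hash_functions{2u};
    double maximum_fpr{0.05};
    double relaxed_fpr{0.3};
    std::size_t threads{1u};
};

// Mirrors the order of the checks shown in the diff; throws on the first violation.
inline void validate(fpr_options const & options)
{
    if (options.number_of_hash_functions == 0u || options.number_of_hash_functions > 5u)
        throw std::invalid_argument{"[HIBF CONFIG ERROR] config::number_of_hash_functions must be in [1,5]."};

    if (options.maximum_fpr <= 0.0 || options.maximum_fpr >= 1.0)
        throw std::invalid_argument{"[HIBF CONFIG ERROR] config::maximum_fpr must be in (0.0,1.0)."};

    if (options.relaxed_fpr <= 0.0 || options.relaxed_fpr >= 1.0)
        throw std::invalid_argument{"[HIBF CONFIG ERROR] config::relaxed_fpr must be in (0.0,1.0)."};

    if (options.relaxed_fpr < options.maximum_fpr)
        throw std::invalid_argument{"[HIBF CONFIG ERROR] config::relaxed_fpr must be "
                                    "greater than or equal to config::maximum_fpr."};

    if (options.threads == 0u)
        throw std::invalid_argument{"[HIBF CONFIG ERROR] config::threads must be greater than 0."};
}

// Convenience wrapper that reports validity as a boolean.
inline bool is_valid(fpr_options const & options)
{
    try
    {
        validate(options);
        return true;
    }
    catch (std::invalid_argument const &)
    {
        return false;
    }
}
```

Both FPR bounds are exclusive, so `0.0` and `1.0` are rejected, while `relaxed_fpr == maximum_fpr` is accepted because only a strictly smaller relaxed value is an error.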
5 changes: 3 additions & 2 deletions src/hierarchical_interleaved_bloom_filter.cpp
@@ -182,8 +182,9 @@ void build_index(hierarchical_interleaved_bloom_filter & hibf,
layout::graph::node const & root_node = data.ibf_graph.root;

size_t const t_max{root_node.number_of_technical_bins};
data.fpr_correction = layout::compute_fpr_correction(
{.fpr = config.maximum_false_positive_rate, .hash_count = config.number_of_hash_functions, .t_max = t_max});
data.fpr_correction = layout::compute_fpr_correction({.fpr = config.maximum_fpr, //
.hash_count = config.number_of_hash_functions,
.t_max = t_max});

hierarchical_build(hibf, root_node, data);

2 changes: 1 addition & 1 deletion src/interleaved_bloom_filter.cpp
@@ -59,7 +59,7 @@ size_t max_bin_size(config & configuration)
max_size = std::max(max_size, kmers.size());
}

return build::bin_size_in_bits({.fpr = configuration.maximum_false_positive_rate,
return build::bin_size_in_bits({.fpr = configuration.maximum_fpr, //
.hash_count = configuration.number_of_hash_functions,
.elements = max_size});
}
4 changes: 2 additions & 2 deletions src/layout/compute_layout.cpp
@@ -31,11 +31,11 @@ layout compute_layout(config const & config,
std::stringstream output_buffer;
std::stringstream header_buffer;

data_store store{.false_positive_rate = config.maximum_false_positive_rate,
data_store store{.false_positive_rate = config.maximum_fpr,
.hibf_layout = &resulting_layout,
.kmer_counts = std::addressof(kmer_counts),
.sketches = std::addressof(sketches)};
store.fpr_correction = compute_fpr_correction({.fpr = config.maximum_false_positive_rate,
store.fpr_correction = compute_fpr_correction({.fpr = config.maximum_fpr, //
.hash_count = config.number_of_hash_functions,
.t_max = config.tmax});

@@ -59,7 +59,7 @@ auto set_up(::benchmark::State const & state)
seqan::hibf::config config{.input_fn = distribute_hashes_across_ub,
.number_of_user_bins = num_ub,
.number_of_hash_functions = hash_num,
.maximum_false_positive_rate = fpr,
.maximum_fpr = fpr,
.threads = 4u, // Only applies to layout and build
.disable_estimate_union = true};

4 changes: 2 additions & 2 deletions test/snippet/hibf/hibf_construction.cpp
@@ -24,8 +24,8 @@ int main()
seqan::hibf::config config{.input_fn = my_input, // required
.number_of_user_bins = 2, // required
.number_of_hash_functions = 2,
.maximum_false_positive_rate = 0.05, // recommended to adapt
.threads = 1, // recommended to adapt
.maximum_fpr = 0.05, // recommended to adapt
.threads = 1, // recommended to adapt
.sketch_bits = 12,
.tmax = 0, // triggers default computation
.alpha = 1.2,
2 changes: 1 addition & 1 deletion test/snippet/readme.cpp
@@ -52,7 +52,7 @@ int main()
seqan::hibf::config config{.input_fn = get_user_bin_data, // required
.number_of_user_bins = 3u, // required
.number_of_hash_functions = 2u,
.maximum_false_positive_rate = 0.05,
.maximum_fpr = 0.05,
.threads = 1u};

// The HIBF constructor will determine a hierarchical layout for the user bins and build the filter.