Infrastructure for Kolmogorov-Smirnov tests, (tested!) methods for sampling with replacement, and changes to include guards #101

Merged: 9 commits, Jul 16, 2024

rileyjmurray commented Jul 3, 2024

As usual, the scope of this PR exceeded my original plans.

Incidental, but notable changes

I removed the include-guard pattern

#ifndef <file identifier>_hh
#define <file identifier>_hh
// ... code ...
#endif

and replaced it with #pragma once everywhere.

Main changes

I added three functions to RandBLAS/util.hh:

  • sample_indices_iid: sample with replacement from an index set according to a (nonuniform) probability distribution, as specified by its cumulative distribution function.
  • weights_to_cdf: converts a buffer of nonnegative numbers into a cumulative distribution function.
  • sample_indices_iid_uniform: a more efficient version of sample_indices_iid, specialized to sampling from the uniform distribution.
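For readers who want to see the idea behind these three functions, here is a minimal sketch. It is not the code in RandBLAS/util.hh: the `_sketch` names, the signatures, and the use of a standard-library generator (passed in as `gen`) are all assumptions made for illustration. The nonuniform sampler uses inverse-CDF lookup with a binary search; the uniform sampler skips the CDF entirely, which is why a specialized version can be cheaper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the RandBLAS signature): overwrite a buffer of
// nonnegative weights with its normalized running sums, i.e. a CDF.
inline void weights_to_cdf_sketch(std::vector<double> &w) {
    double total = 0.0;
    for (double x : w) {
        if (x < 0.0) throw std::invalid_argument("weights must be nonnegative");
        total += x;
    }
    if (total <= 0.0) throw std::invalid_argument("weights must have a positive sum");
    double running = 0.0;
    for (double &x : w) {
        running += x;
        x = running / total;
    }
    w.back() = 1.0;  // guard against rounding error in the final entry
}

// Hypothetical helper: draw k indices with replacement by inverse-CDF lookup.
// Assumes cdf is nonempty, nondecreasing, and ends at 1.0.
template <typename RNG>
std::vector<int64_t> sample_indices_iid_sketch(const std::vector<double> &cdf,
                                               int64_t k, RNG &gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<int64_t> out(k);
    for (auto &idx : out) {
        double u = unif(gen);
        // First position whose CDF value is strictly greater than u, so an
        // index with zero weight is never selected.
        auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
        idx = std::min<int64_t>(it - cdf.begin(), cdf.size() - 1);
    }
    return out;
}

// Hypothetical helper: the uniform case needs no CDF and no search.
template <typename RNG>
std::vector<int64_t> sample_indices_iid_uniform_sketch(int64_t n, int64_t k, RNG &gen) {
    std::uniform_int_distribution<int64_t> pick(0, n - 1);
    std::vector<int64_t> out(k);
    for (auto &idx : out) idx = pick(gen);
    return out;
}
```

In this sketch, weights `{1, 0, 3}` become the CDF `{0.25, 0.25, 1.0}`, and sampling from that CDF can only return index 0 or index 2.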

I added a file

test/test_basic_rng/rng_common.hh

This file is where we'll put code that's used to construct our statistical tests. Conceptually distinct parts of the file:

  • a hard-coded statistical table for running two-sided Kolmogorov-Smirnov tests, plus functions for performing lookups in this table.
  • a section for purely combinatorial helper functions. Right now there's only one such function: log_binomial_coefficient.
  • a section for computing some quantities of interest for the hypergeometric distribution. The function for constructing the PMF is important since we can use it in a Kolmogorov-Smirnov test for correctness of repeated_fisher_yates (our function for sampling uniformly from an index set without replacement). Note: I haven't implemented this test yet!
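To make the combinatorial and hypergeometric pieces concrete, here is one way they could be written. This is a sketch under my own assumptions, not the contents of rng_common.hh: the `_sketch` names, the argument order, and the choice of `std::lgamma` are illustrative. Working with logarithms of binomial coefficients keeps the PMF from overflowing when the population is large.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// log of (n choose k), computed through log-gamma to avoid overflow.
// The name mirrors the PR; the signature is an assumption.
inline double log_binomial_coefficient_sketch(int64_t n, int64_t k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// PMF of the hypergeometric distribution: the number of "marked" items seen
// in D draws without replacement from a population of N items, K of which
// are marked. Entry x of the result is P(X = x) for x = 0, ..., D.
inline std::vector<double> hypergeometric_pmf_sketch(int64_t N, int64_t K, int64_t D) {
    std::vector<double> pmf(D + 1, 0.0);
    int64_t lo = std::max<int64_t>(0, D - (N - K));  // smallest feasible count
    int64_t hi = std::min<int64_t>(K, D);            // largest feasible count
    double log_denom = log_binomial_coefficient_sketch(N, D);
    for (int64_t x = lo; x <= hi; ++x) {
        double log_p = log_binomial_coefficient_sketch(K, x)
                     + log_binomial_coefficient_sketch(N - K, D - x)
                     - log_denom;
        pmf[x] = std::exp(log_p);
    }
    return pmf;
}
```

For example, with N = 10, K = 4, and D = 3 the exact probabilities are 1/6, 1/2, 3/10, and 1/30, and the sketch should reproduce them up to rounding.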

I also added

test/test_basic_rng/test_sample_indices.cc

Right now it only contains tests for sampling with replacement. It should also contain tests for sampling without replacement, where an empirical CDF for the hypergeometric distribution can be constructed from repeated_fisher_yates and the true CDF can be computed from test_basic_rng/rng_common.hh.
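The comparison of an empirical CDF against a true CDF can be sketched as below. This is an illustration only, with hypothetical names, and it is not the test code in this PR. In particular, the critical value here comes from the standard asymptotic approximation sqrt(-ln(alpha/2) / (2m)) rather than from the hard-coded table in rng_common.hh, and that approximation tends to be conservative when the underlying distribution is discrete.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest absolute gap between the empirical CDF of integer samples in
// {0, ..., n-1} and a reference CDF given on the same support.
// Assumes every sample is a valid index into true_cdf.
inline double ks_statistic_sketch(const std::vector<int64_t> &samples,
                                  const std::vector<double> &true_cdf) {
    std::vector<double> counts(true_cdf.size(), 0.0);
    for (int64_t s : samples) counts[s] += 1.0;
    double running = 0.0;
    double worst = 0.0;
    for (std::size_t i = 0; i < true_cdf.size(); ++i) {
        running += counts[i];
        double ecdf = running / static_cast<double>(samples.size());
        worst = std::max(worst, std::abs(ecdf - true_cdf[i]));
    }
    return worst;
}

// Asymptotic two-sided critical value for m samples at significance alpha.
// This is an approximation, not the lookup table described in the PR.
inline double ks_critical_value_asymptotic(double alpha, int64_t m) {
    return std::sqrt(-std::log(alpha / 2.0) / (2.0 * static_cast<double>(m)));
}
```

A test built on this sketch would fail to reject the sampler when `ks_statistic_sketch(samples, true_cdf)` is below `ks_critical_value_asymptotic(alpha, samples.size())`.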

rileyjmurray changed the title from "WIP: Statistical tests" to "Infrastructure for Kolmogorov-Smirnov tests, (tested!) methods for sampling with replacement, and changes to include guards" on Jul 16, 2024
rileyjmurray marked this pull request as ready for review July 16, 2024 21:31
rileyjmurray merged commit dd7d158 into main Jul 16, 2024
5 checks passed
rileyjmurray deleted the statistical-tests branch July 16, 2024 21:43