Add statistical sampling tests #73
Comments
To start with, let's take a simple generative model over the three-symbol alphabet `"0"`, `"1"`, and `eos`, with fixed token probabilities.

Then we want to do guided generation given the regex `"11[01]+|0[01]*"` and its corresponding DFA. We want to construct some probabilistic tests for sampling from this DFA with these transition probabilities. The idea is to compute some statistical quantity of a generation and compare simulations to the exact value.

Let's look at the length of the generation. Once we are in the penultimate state of the DFA, the number of transitions needed to move to the terminal state follows a negative-binomial(1, 0.1) distribution, which has mean 9. Hence, the expected length of a generated string (including the eos token) can be computed exactly. (The 2+1 term was missing at first; it reflects the `[01]+`, so the path starting with a 1 is always at least 3 long.)

This example can be expanded in various ways. For instance, we can look at more complex functionals (e.g. the number of 1s generated). We can also look at a more complex regex. Any thoughts about how complex we should get?
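To make the length check concrete, here is a minimal sketch in plain Python. The token probabilities in `PROBS` are an assumption (the values from the comment above were not preserved); only the 0.1 probability for `eos` is implied by the negative-binomial(1, 0.1) remark. The DFA encoding, the state names, and the helper functions are all hypothetical and do not reflect the project's actual FSM API. Disallowed tokens are masked and the remaining probabilities renormalized at each state.

```python
import random
import re

# ASSUMPTION: hypothetical unconditional token probabilities (not from the thread).
PROBS = {"0": 0.45, "1": 0.45, "eos": 0.1}

# Hand-written DFA for the regex 11[01]+|0[01]* (state names are illustrative).
# "accept" is the state from which eos is allowed; "done" is terminal.
TRANSITIONS = {
    "start": {"0": "accept", "1": "one"},
    "one": {"1": "two"},
    "two": {"0": "accept", "1": "accept"},
    "accept": {"0": "accept", "1": "accept", "eos": "done"},
}


def sample_sequence(rng):
    """Sample one guided generation, masking tokens the DFA disallows."""
    state, tokens = "start", []
    while state != "done":
        allowed = list(TRANSITIONS[state])
        # random.choices renormalizes the weights of the allowed tokens.
        weights = [PROBS[t] for t in allowed]
        token = rng.choices(allowed, weights=weights)[0]
        tokens.append(token)
        state = TRANSITIONS[state][token]
    return tokens


def exact_expected_length():
    """Expected number of tokens, including eos, under the masked model."""
    p0, p1, p_eos = PROBS["0"], PROBS["1"], PROBS["eos"]
    # Tokens emitted from the accepting state up to and including eos.
    tail = 1.0 / p_eos
    # eos is masked at the start, so the first token is renormalized over 0/1.
    start_with_0 = p0 / (p0 + p1)
    # The "0" branch emits 1 token before the tail; the "1" branch emits 2 + 1.
    return start_with_0 * (1 + tail) + (1 - start_with_0) * (3 + tail)


def empirical_mean_length(num_samples, seed=0):
    """Average generation length over num_samples draws with a frozen seed."""
    rng = random.Random(seed)
    return sum(len(sample_sequence(rng)) for _ in range(num_samples)) / num_samples


if __name__ == "__main__":
    print("exact:", exact_expected_length())
    print("empirical:", empirical_mean_length(20_000))
```

With a frozen seed the comparison is deterministic, so a test can assert that the empirical mean lies within a fixed tolerance of the exact value without being flaky.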
That sounds great!
For now, we need the simplest tests possible for addressing statistical consistency. Our CI runs are already getting pretty long.
Do you think these should be in a separate set of tests, given that they are stochastic and, therefore, somewhat flaky?
These tests can start in their own module, if that's what you're asking; otherwise, we'll fix the seeds to avoid the flakiness for now.
Leaving this here just in case we need it in the future. Let … which has mean …. A quick calculation says that the standard deviation of the mean of 100 samples is about 0.3. But I'm freezing the seed.
We need simple tests that confirm some expectations regarding sampling via an FSM.
For example, such a test might use a simple vocabulary over `"0"`, `"1"`, and `eos`, enumerate all the distinct token sequences of a given length (fixing the probabilities/logits, of course) and, with that, draw samples and assert something about the empirical probabilities.
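One possible shape for such a test, as a rough sketch: enumerate every token sequence of length `n` over the vocabulary, compute the exact probability that a guided generation starts with it, and compare against empirical frequencies from seeded samples. Everything here is an assumption made for illustration: the probabilities in `PROBS`, the hand-written DFA in `TRANSITIONS` (for the regex `11[01]+|0[01]*` discussed in the comments), and all function names. None of it is the project's actual API.

```python
import itertools
import random
from collections import Counter

# ASSUMPTION: hypothetical fixed token probabilities.
PROBS = {"0": 0.45, "1": 0.45, "eos": 0.1}

# ASSUMPTION: illustrative DFA for 11[01]+|0[01]*; "done" is terminal.
TRANSITIONS = {
    "start": {"0": "accept", "1": "one"},
    "one": {"1": "two"},
    "two": {"0": "accept", "1": "accept"},
    "accept": {"0": "accept", "1": "accept", "eos": "done"},
}


def prefix_probability(seq):
    """Exact probability that a guided generation starts with seq."""
    state, prob = "start", 1.0
    for token in seq:
        allowed = TRANSITIONS.get(state, {})
        if token not in allowed:
            return 0.0  # masked by the DFA, or generation already ended
        # Renormalize over the tokens the DFA allows in this state.
        prob *= PROBS[token] / sum(PROBS[t] for t in allowed)
        state = allowed[token]
    return prob


def exact_prefix_probabilities(n):
    """Map every length-n token sequence to its exact prefix probability."""
    return {seq: prefix_probability(seq) for seq in itertools.product(PROBS, repeat=n)}


def sample_prefix(rng, n):
    """Sample at most n tokens of a guided generation (shorter if eos comes first)."""
    state, tokens = "start", []
    while state != "done" and len(tokens) < n:
        allowed = list(TRANSITIONS[state])
        weights = [PROBS[t] for t in allowed]
        token = rng.choices(allowed, weights=weights)[0]
        tokens.append(token)
        state = TRANSITIONS[state][token]
    return tuple(tokens)


def max_abs_error(n=3, num_samples=50_000, seed=0):
    """Largest gap between empirical and exact prefix probabilities."""
    rng = random.Random(seed)
    counts = Counter(sample_prefix(rng, n) for _ in range(num_samples))
    exact = exact_prefix_probabilities(n)
    return max(abs(counts[seq] / num_samples - p) for seq, p in exact.items())


if __name__ == "__main__":
    print("max abs error:", max_abs_error())
```

Generations that end before `n` tokens match none of the enumerated sequences, so the exact probabilities sum to the probability of reaching length `n` rather than to 1. Keeping `n` small keeps the enumeration (3^n sequences) and the CI cost low.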
N.B. This could be done after #67 (and possibly #69, #71) to avoid more refactoring.