Non-stochastic sampling and literature #104
Not really -- at one point, I had started on a write-up to analyze the Poisson scheduler, but it got nasty quite quickly. (It might be easier with a negative binomial distribution instead of a Poisson; left as future work.)
Not currently, but it is being implemented as part of #96. |
Okay, that PR is closed. Is it superseded by #103? |
Yep, indeed.
|
@cjacoby thanks for the effort. Can you explain how the above-mentioned case is considered in your PR? Is this covered? |
Okay, lessee. So, the behavior you describe would be (I think) best approximated by the following. Use a generator function per file, wrap each one in a `Streamer`, and mux them:

```python
def file_slicer(x):
    """
    Parameters
    ----------
    x : str
        Filename

    Yields
    ------
    sample
    """
    # TODO

streams = [pescador.Streamer(file_slicer, x) for x in your_files]
file_mux = pescador.mux.PoissonMux(streams, k=100, rate=None, mode="exhaustive")

# Now, you would do this:
while True:
    for sample in file_mux.iterate():
        pass  # do your thing here
    # the above will break out when all streams are exhausted,
    # but when you loop back, all streams will be reset.
```

But another way you could run it is like this:

```python
for sample in file_mux.cycle():
    pass  # do your thing
```

Does that help / make sense? BTW, you can do the same thing right now in the 1.1 release (before #103 is complete) with the following:

```python
pescador.Mux(streams, k=100, rate=None, with_replacement=False, revive=False)
```

Note: If you don't use […] |
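(Editorial sketch: one way the `# TODO` body of `file_slicer` above might be filled in. The `.npz` layout, the `X`/`Y` keys, and the patch length are assumptions for illustration, not part of the original comment.)

```python
import numpy as np


def file_slicer(x):
    """Yield fixed-length feature patches from one file.

    Parameters
    ----------
    x : str
        Filename of an .npz archive with arrays 'X' (features) and 'Y' (labels).

    Yields
    ------
    sample : dict
        One patch of features with its corresponding label.
    """
    data = np.load(x)
    features, labels = data['X'], data['Y']
    patch_len = 10  # assumed patch length
    for start in range(0, len(features) - patch_len + 1, patch_len):
        yield {'X': features[start:start + patch_len], 'Y': labels[start]}
```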
Hey, thanks for the explanations. I think doing `pescador.Mux(streams, k=100, rate=None, with_replacement=False, revive=False)` is perfect for the validation error. For training, I think `pescador.Mux(streams, k=100, rate=None, with_replacement=False, revive=True)` could be a thing. The only "missing" feature would then be that someone has to do the bookkeeping to check which streams are left in the current epoch (an epoch now being defined as one cycle through the whole dataset). What happens now is that a stream might become empty, is put back in line, and by chance can get activated as the next stream to open (though this is very unlikely).

Why am I so pedantic about these things? I think the behaviour of the muxer should be as transparent as possible. Considering DNN research, publications are already using this package, and I fear that these different sampling schemas may introduce new effects. This in turn means that we may end up discussing results which only arise because of the sampling schema. It is another kind of "hyperparameter" we somehow have to address, or at least be aware of. However, I like the package and I'm using it because it gives me control over my data sampling, e.g. if I use […]. Thanks a lot.

P.S.: The `rate=None` could be documented more clearly. ATM it says: "If None, sample infinitely from each stream." For me, "infinitely" means that it will be reset after it's empty, but that should be controlled via `revive=True`. Maybe one could be clearer about that. Can do a PR for that if wanted. |
💯 agreed! Team pescadores definitely supports such pedantry, e.g. RNG seeding or having good mechanisms for keeping an audit trail of sample presentation order (see #85).

I'd also point out (lament?) that, despite every meaningful neural network result in the last 20+ years leveraging *stochastic* gradient descent, only a small fraction of that research acknowledges the role that ordering will have on the resulting models (and a smaller fraction still does something about it, i.e. curriculum learning). I think more emphasis should be placed on how data are sampled, regardless of what tooling one uses for training. This probably gets a little muddy when doing things like asynchronous SGD / pooling gradients / other distributed madness ... but one problem at a time 😄.

Bonus thought: personally, I'm not necessarily convinced that an "epoch" is a meaningful measurement of progress during training, but that's maybe a different discussion. |
> P.S.: The `rate=None` could be documented more clearly. ATM it says: "If None, sample infinitely from each stream." For me, "infinitely" means that it will be reset after it's empty, but that should be controlled via `revive=True`. Maybe one could be clearer about that. Can do a PR for that if wanted.

This is definitely unclear, and perhaps even wrong. Documentation updates are pending on completion of #103, since it has a bunch of API changes for 2.0. I'll open an issue with this example so it is documented when that happens.
`rate=None` causes the underlying `pescador.Streamer()` to get launched with `.iterate(max_iter=None)`, which means that the streamer will run until it is empty. If that streamer is infinite, it will continue streaming infinitely, but that is a property of the underlying streamer you passed in, not of the Mux. Whether that streamer gets restarted after it is empty is controlled by `revive` (in pescador<=1.1).
Minor follow-up on what Eric said: Mux auditing is on the roadmap / in discussion, but is out of scope for #103.
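(Editorial sketch: the behaviour described above, shown with the 1.x API. The toy generators, stream counts, and expected sample counts are assumptions for illustration only.)

```python
import pescador


def finite_stream(n, length=5):
    """A finite generator: yields `length` samples tagged with stream id `n`."""
    for i in range(length):
        yield {'stream': n, 'step': i}


streams = [pescador.Streamer(finite_stream, n) for n in range(3)]

# rate=None: each active streamer is run with .iterate(max_iter=None),
# i.e. until its underlying generator is exhausted.
# revive=False: an exhausted streamer is not put back into the pool,
# so this mux should stop after 3 * 5 = 15 samples.
mux = pescador.Mux(streams, k=3, rate=None,
                   with_replacement=False, revive=False)
print(sum(1 for _ in mux.iterate(max_iter=100)))  # expected: 15

# revive=True: exhausted streamers re-enter the pool, so the mux keeps
# producing samples indefinitely; here we cap it at 100 iterations.
streams = [pescador.Streamer(finite_stream, n) for n in range(3)]
mux = pescador.Mux(streams, k=3, rate=None,
                   with_replacement=False, revive=True)
print(sum(1 for _ in mux.iterate(max_iter=100)))  # expected: 100
```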
|
Circling back on this after merging #103 -- I think what you actually need for deterministic (fixed sample) validation is either a roundrobin or chain mux in cycle mode. As long as the individual streamers are deterministic, this will ensure that you get the same data each time through the cycle. You will need to be careful to enforce two things:
If you time this correctly, then the mux will be back at the beginning position when the next validation call happens, and you'll get the same sequence out. We should add these examples to the documentation gallery. |
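(Editorial sketch: that recipe, assuming the post-#103 API where `ChainMux` accepts `mode="cycle"`. The deterministic `val_slicer`, the file list, and the per-pass sample count are assumptions for illustration.)

```python
import pescador
from pescador.mux import ChainMux


def val_slicer(filename):
    """A deterministic generator over one validation file (hypothetical)."""
    # ... in practice: load `filename` and yield its samples in a fixed order
    for i in range(10):
        yield {'file': filename, 'index': i}


val_files = ['val_a.npz', 'val_b.npz']  # hypothetical file list
streamers = [pescador.Streamer(val_slicer, f) for f in val_files]

# ChainMux in cycle mode walks the streamers in order and restarts from the
# first one when the last is exhausted, so the sample order is fixed.
val_mux = ChainMux(streamers, mode='cycle')
val_stream = val_mux.iterate()

n_val = len(val_files) * 10  # samples per full cycle (hypothetical)


def validation_pass():
    """Draw exactly one full cycle so the mux is back at its starting
    position when the next validation call happens."""
    return [next(val_stream) for _ in range(n_val)]
```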
Thanks for the explanations. Will look into this soon. Maybe I can contribute first drafts for some docs, since I have to read up on a few things first anyway, I guess :) |
Maybe we can also take a simple classification example such as MNIST in a Jupyter notebook and check the influence on the accuracy when using different sampling schemas. |
Sure, although I think it's simpler and cleaner to look directly at the statistics of the samples, like we're doing in a few of the unit tests now. Imagine a list of streamers […] If it does, that's enough to imply that any function (e.g., […]) BUT, we should definitely include a concrete example of how to do it properly, and MNIST is as good a demo as any. |
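(Editorial sketch: one way to look at those sample statistics, with assumed names. The `tagged_stream` generator and the particular mux settings are illustrative; they are not taken from pescador's actual unit tests.)

```python
from collections import Counter

import pescador


def tagged_stream(n, length=50):
    """Hypothetical finite streamer that tags every sample with its stream index."""
    for _ in range(length):
        yield {'stream': n}


streamers = [pescador.Streamer(tagged_stream, n) for n in range(10)]

# 1.x-style mux; swap in the post-#103 mux class of your choice.
mux = pescador.Mux(streamers, k=5, rate=None,
                   with_replacement=False, revive=True)

counts = Counter(sample['stream'] for sample in mux.iterate(max_iter=5000))
print(counts)  # empirical distribution of stream indices under this sampling scheme
```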
I've added a couple of advanced examples in #114 (rtd) that should help with this issue. More generally, returning to the two questions that sparked this thread:
Once 2.0 is out, I'd like to write a paper formalizing what we did in #32 and providing some empirical measurements of the PoissonMux's output distribution under different regimes (active vs. rate vs. n_streams). I think this is out of scope for the core pescador project / documentation, but we'll certainly link back to it once it's out.
I think this example covers this for validation using […]. After I write up the training epoch example, will this issue be sufficiently addressed? |
Side note: I hacked up a quick-and-dirty experiment measuring the PoissonMux's properties as a function of […] (x-axis is iterations, y-axis is entropy of the sample stream indices over time). Take-aways: […]
|
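(Editorial sketch: a rough reconstruction of that kind of experiment, with assumed names. The window size, stream count, and 1.x-style `Mux` settings are illustrative only.)

```python
import numpy as np
import pescador


def tagged_stream(n):
    """Hypothetical infinite streamer that tags samples with its index."""
    while True:
        yield {'stream': n}


def window_entropy(indices, n_streams):
    """Shannon entropy (in bits) of the stream indices within one window."""
    counts = np.bincount(indices, minlength=n_streams)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


n_streams, window = 16, 256
streamers = [pescador.Streamer(tagged_stream, n) for n in range(n_streams)]
mux = pescador.Mux(streamers, k=4, rate=16)  # 1.x-style Poisson-scheduled mux

indices = [s['stream'] for s in mux.iterate(max_iter=10000)]
entropies = [window_entropy(np.array(indices[i:i + window]), n_streams)
             for i in range(0, len(indices) - window, window)]
print(entropies[:5])  # entropy of the sample stream indices over time
```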
Super cool, thanks for these experiments. Makes it much easier to get an intuition! |
@stefan-balke do you think this issue is now sufficiently resolved? |
Yes, thanks, closing this out! |
Hi pescadores,
first of all: thanks for this nice package!
I have used it now for some DNNs and it seems to work. However, two questions remain: […]
Maybe I haven't fully understood the muxing behaviour...but I want to! :)
Thanks
Stefan