v1.1 -> v2.0 Semantics refactor #80
Changes mostly look good. There are a couple of things I'd like to see added, and we need to properly handle deprecation and parameter renames.
pescador/buffered.py
@@ -56,31 +58,31 @@ def activate(self):
     """Activates the stream."""
     self.stream_ = self.streamer

-def generate(self, max_batches=None):
+def iterate(self, max_iter=None):
persist the old generate function here. we can wrap the call signature for API compatibility, but there should be a warning.
roger that
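A minimal sketch of the requested deprecation, using a hypothetical stand-in class (`LegacyStreamer` is not pescador's `Streamer`): the old `generate(max_batches=...)` survives as a thin wrapper that emits a `DeprecationWarning` and forwards to `iterate(max_iter=...)`.

```python
import warnings


class LegacyStreamer(object):
    """Hypothetical stand-in for a Streamer; wraps a finite list of items."""

    def __init__(self, items):
        self.items = list(items)

    def iterate(self, max_iter=None):
        """New API: yield at most ``max_iter`` items (all of them if None)."""
        for n, item in enumerate(self.items):
            if max_iter is not None and n >= max_iter:
                break
            yield item

    def generate(self, max_batches=None):
        """Deprecated alias for ``iterate``, kept for API compatibility."""
        # A plain function (not a generator), so the warning fires at call
        # time rather than on the first ``next()``.
        warnings.warn(
            "LegacyStreamer.generate is deprecated; use LegacyStreamer.iterate",
            DeprecationWarning,
            stacklevel=2,
        )
        # Map the old parameter name onto the new one and delegate.
        return self.iterate(max_iter=max_batches)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = list(LegacyStreamer([1, 2, 3]).generate(max_batches=2))

print(old_style)  # [1, 2]
```

Keeping `generate` as an ordinary function that returns the generator from `iterate` means callers see the warning immediately, even if they never consume the stream.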
pescador/buffered.py
-def buffer_batch(generator, buffer_size):
-    '''Buffer an iterable of batches into larger (or smaller) batches
+def buffer_batch(streamer, buffer_size):
+    '''Buffer data samples from an streamer into one data object.
this doesn't actually require a streamer, right?
no, it needs an iterable; not sure what a sane variable name would be here. I also aspire to change this in a subsequent PR, so... any strong opinions?
I've opted for stream
pescador/core.py
@@ -8,6 +8,11 @@

class StreamActivator(object):
    def __init__(self, streamer):
        for mname in ['activate', 'deactivate']:
why not just check `isinstance(streamer, Streamer)` instead?
tl;dr: ya, that's better.
commentary: I think I had a lapse of Java PTSD and wanted to check that the object implements the `activate`/`deactivate` interface, since it doesn't strictly matter that it's a `Streamer` (although when would we run into that? /shrug). If we really cared, a more Pythonic way of doing this would be to define an `ActiveStream` mixin, but... this is total overkill for where we are right now, and I mention it only for completeness.
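For comparison, here is a sketch of the options discussed: a strict `isinstance` check, a duck-typed `activate`/`deactivate` check, and, in place of a mixin, an abstract base class whose `__subclasshook__` makes `isinstance` itself duck-typed. All class and function names (`ToyStreamer`, `DuckStream`, `ActiveStream`, `has_activation_interface`) are hypothetical.

```python
import abc


class ToyStreamer(object):
    """Hypothetical stand-in for pescador's Streamer."""

    def activate(self):
        pass

    def deactivate(self):
        pass


class DuckStream(object):
    """Not a ToyStreamer, but implements the same activate/deactivate pair."""

    def activate(self):
        pass

    def deactivate(self):
        pass


def has_activation_interface(obj):
    """Duck-typed check: does ``obj`` expose callable activate/deactivate?"""
    return all(callable(getattr(obj, name, None))
               for name in ("activate", "deactivate"))


class ActiveStream(abc.ABC):
    """ABC that treats any class with activate/deactivate as a subclass."""

    @abc.abstractmethod
    def activate(self):
        ...

    @abc.abstractmethod
    def deactivate(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        # Only answer for ActiveStream itself; defer otherwise.
        if cls is ActiveStream and has_activation_interface(C):
            return True
        return NotImplemented


duck = DuckStream()
print(isinstance(duck, ToyStreamer))   # False: strict type check rejects it
print(has_activation_interface(duck))  # True
print(isinstance(duck, ActiveStream))  # True, via __subclasshook__
```

As the thread concludes, the plain `isinstance` check is the simplest choice; the ABC route only pays off if non-`Streamer` objects ever need to be activated.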
pescador/core.py
-raise PescadorError('streamer must be a generator, iterable, or '
-                    'Streamer')
+raise PescadorError('`streamer` must be a callable function that '
+                    'returns an iterable object.')
could we actually generalize this and support iterables as well as functions? Call it a stretch goal.
Actually it does as of this change 😄 ... I've updated the error message and added a `Streamer(iterable)` test in da42dbf.
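A sketch of how a constructor might accept either a callable (re-invoked on every pass) or a plain iterable, and reject anything else. `FlexStreamer` and `PescadorError` here are simplified, hypothetical stand-ins, not pescador's actual implementation.

```python
from collections.abc import Iterable


class PescadorError(Exception):
    """Stand-in for pescador's exception type."""


class FlexStreamer(object):
    """Hypothetical streamer accepting a function or an iterable."""

    def __init__(self, streamer, *args, **kwargs):
        # Accept a function returning an iterable, or an iterable itself.
        if not (callable(streamer) or isinstance(streamer, Iterable)):
            raise PescadorError('`streamer` must be a callable function that '
                                'returns an iterable object, or an iterable.')
        self.streamer = streamer
        self.args = args
        self.kwargs = kwargs

    def iterate(self, max_iter=None):
        # A callable is re-invoked on each pass; an iterable is used as-is.
        if callable(self.streamer):
            stream = self.streamer(*self.args, **self.kwargs)
        else:
            stream = self.streamer
        for n, item in enumerate(stream):
            if max_iter is not None and n >= max_iter:
                break
            yield item


def count_up(n):
    for i in range(n):
        yield i


from_function = FlexStreamer(count_up, 4)
from_list = FlexStreamer(['a', 'b', 'c'])
print(list(from_function.iterate(max_iter=2)))  # [0, 1]
print(list(from_list.iterate()))                # ['a', 'b', 'c']
```

One caveat with the iterable path: a list can be iterated repeatedly, but a generator *object* is exhausted after a single pass, which is an argument for still preferring the callable form.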
pescador/core.py
 def deactivate(self):
     self.stream_ = None

-def generate(self, max_batches=None):
-    '''Instantiate the generator
+def iterate(self, max_iter=None):
again, deprecation rename / persist old signature with warning
roger roger
pescador/core.py

# TODO: ejhumphrey would like to deprecate this class method and turn it
# into a "transform". @bmcfee please advise? example to follow?
def tuples(self, *items, **kwargs):
    '''Generate data in tuple-form instead of dicts.

I'm still not 100% sure what a `transform` is defined to be.
I find the `.tuples()` interface quite useful for interfacing with Keras, though, and I'd prefer not to have to replace it with something more cumbersome.
I've added a long-form response here, including why I think this is an anti-pattern.
tl;dr: `Streamer.tuples` only works for a subset of the data streams that might be fed through it; I'd propose explicitly separating `Streamer` functionality from processing data streams.
The thing I'd like to replace it with is a separate `tuples` function (as sketched out in that linked issue), which I'd argue isn't any more cumbersome than what we currently have, and which keeps `Streamer` agnostic to the contents of the data stream.
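The linked issue's sketch isn't reproduced here, but a standalone helper along these lines illustrates the idea: it works on any iterable of dicts, so the streamer never needs to know what the samples look like. The function name and signature are assumptions for illustration.

```python
def tuples(stream, *keys):
    """Hypothetical helper: turn a stream of dicts into a stream of tuples.

    Each item ``d`` becomes ``tuple(d[k] for k in keys)``.
    """
    if not keys:
        # Validate eagerly, before any iteration happens.
        raise ValueError("tuples() requires at least one key")

    def _gen():
        for item in stream:
            yield tuple(item[k] for k in keys)

    return _gen()


samples = [{'X': [0.1, 0.2], 'y': 0}, {'X': [0.3, 0.4], 'y': 1}]
for X, y in tuples(samples, 'X', 'y'):
    print(X, y)
```

Since this PR makes `Streamer` objects directly iterable, the same call could in principle wrap a streamer (e.g. `tuples(streamer, 'X', 'y')`) and hand the result to Keras.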
pescador/mux.py
-                 lam=256.0, pool_weights=None, with_replacement=True,
-                 prune_empty_seeds=True, revive=False,
+    def __init__(self, streamers, k,
+                 lam=256.0, weights=None, with_replacement=True,
deprecation renames EVERYWHERE
while we're at it, though, I'd like to replace `lam` => `rate`
haha roger
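A sketch of one way to handle keyword renames such as `lam` => `rate` and `pool_weights` => `weights`: a decorator that accepts the old name, warns, and forwards the value under the new name. `renamed_kwargs` and `ToyMux` are hypothetical; this is neither librosa's nor futurepast's helper.

```python
import functools
import warnings


def renamed_kwargs(**renames):
    """Decorator factory: accept deprecated keyword names, with a warning.

    ``renames`` maps each old keyword name to its new name.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in renames.items():
                if old not in kwargs:
                    continue
                if new in kwargs:
                    raise TypeError('got both {!r} and its replacement {!r}'
                                    .format(old, new))
                warnings.warn('{!r} is deprecated; use {!r} instead'
                              .format(old, new),
                              DeprecationWarning, stacklevel=2)
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


class ToyMux(object):
    """Hypothetical mux-like class using the new parameter names."""

    @renamed_kwargs(lam='rate', pool_weights='weights')
    def __init__(self, streamers, k, rate=256.0, weights=None):
        self.streamers = streamers
        self.k = k
        self.rate = rate
        self.weights = weights


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mux = ToyMux([], 2, lam=16.0)

print(mux.rate)  # 16.0
```

This only covers keyword usage; positional calls are unaffected as long as the parameter order stays the same.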
pescador/zmq_stream.py
@@ -81,15 +81,16 @@ def zmq_recv_batch(socket, flags=0, copy=True, track=False):
     return results


-def zmq_worker(port, streamer, terminate, copy=False, max_batches=None):
+def zmq_worker(port, streamer, terminate, copy=False, max_iter=None):
renames
yayaya
@ejhumphrey Re: handling variable rename deprecations, here's how librosa does it. Note that the current stable version has no lingering renames, but if you roll back to 0.4.3 you can find some examples.
The futurepast package implements some of these things and more, but it hasn't had an initial release yet.
thanks @bmcfee, made it pretty far... still have some work to do, but any advice / pointers will be helpful.
@ejhumphrey I don't have an example of
alrighty, updated the examples, and added some slightly more meaningful benchmarking to the zmq_example, which burns cycles to prove that things are happening in the background. LET'S DO THIS
examples/frameworks/keras_example.py
 X = np.atleast_2d(X)
-# y's are binary vectors, and should be of shape (1, 10) after this.
+# y's are binary vectors, and should be of shape (10,) after this.
`atleast_2d` will make them shape (1, 10)
Agreed; get rid of the `atleast_2d` on the Y, if that's what you intend (which I think it is).
done
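A quick NumPy check of the shapes involved (the label vector is made up). It also shows why the extra axis matters, assuming samples are later stacked along a new leading axis to form a batch.

```python
import numpy as np

# A single one-hot label vector for a 10-class problem.
y = np.zeros(10)
y[3] = 1

print(y.shape)                 # (10,)
print(np.atleast_2d(y).shape)  # (1, 10)

# Stacking samples into a batch adds a leading axis:
batch_plain = np.stack([y, y])
batch_2d = np.stack([np.atleast_2d(y), np.atleast_2d(y)])
print(batch_plain.shape)  # (2, 10)
print(batch_2d.shape)     # (2, 1, 10)
```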
examples/mux_files_example.py
@@ -61,14 +61,14 @@ def npz_generator(npz_path):
     """Generate data from an npz file."""
     npz_data = np.load(npz_path)
     X = npz_data['X']
-    # y's are binary vectors, and should be of shape (1, 10) after this.
+    # y's are binary vectors, and should be of shape (10,) after this.
same comment here, I think?
The comment about the shape is almost certainly not right here.
done
pescador/maps.py
    inputs = [inputs]
if outputs and isinstance(outputs, six.string_types):
    outputs = [outputs]
Upon closer inspection, I think this still isn't correct.
I don't think `fit(inputs=x)` and `fit(inputs=[x])` do the same thing (I might be wrong on that, though). If that's the case, then upcast-downcast will not behave as expected.
LGTM, with some minor nits, specifically:
- version increment in a separate PR?
- Don't forget about documentation updates (`/doc/...`)! I could help with that, though, if you like. I'd also be for that being in another PR.
pescador/buffered.py
    number of *buffered* batches that are generated,
    not the number of individual samples.

partial : bool, default=True
    Return buffers smaller than the requested size.
I prefer the phrasing used in the other place this came up, below; "Return buffers smaller than the requested size." makes it sound like it always does that, which is not quite what it means. (But that's a nit, really.)
done
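To make the `partial` semantics concrete, here is a sketch over any iterable. The function name is hypothetical, and it collects items into plain lists for simplicity. With `partial=True` (the default here, mirroring the docstring under review), only the *final* buffer may come up short, and only if the stream ends mid-buffer.

```python
def buffer_items(stream, buffer_size, partial=True):
    """Group items from any iterable into lists of ``buffer_size``.

    partial : bool
        If True and the stream is exhausted mid-buffer, yield the leftover
        (smaller) buffer; if False, drop it.
    """
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) == buffer_size:
            yield buf
            buf = []
    if partial and buf:
        yield buf


print(list(buffer_items(range(5), 2)))                 # [[0, 1], [2, 3], [4]]
print(list(buffer_items(range(5), 2, partial=False)))  # [[0, 1], [2, 3]]
print(list(buffer_items(range(4), 2)))                 # [[0, 1], [2, 3]]
```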
    Use the `streamers` parameter instead.
    The `seed_pool` parameter will be removed in pescador 2.0.

lam : float > 0.0
💯 on converting `lam` to `rate`; I think that's sooo much more intuitive.
However, you missed including `rate` in the docs here. I suspect the `lam` above is supposed to be `rate`.
yup, caught it
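As a rough illustration of why `rate` reads more naturally than `lam`: if each newly activated streamer is given a sample budget of `1 + Poisson(rate)` (this budget rule, including the `+1` offset that guarantees at least one sample, is an assumption of the sketch), then `rate` is essentially the expected number of samples drawn from a streamer before it is replaced. The function name is hypothetical.

```python
import numpy as np


def draw_sample_budgets(rate, n_streams, rng):
    """Hypothetical: number of samples each activated stream may emit.

    Assumes budget = 1 + Poisson(rate), so every stream yields at least one
    sample and the mean budget is rate + 1.
    """
    if rate <= 0:
        raise ValueError("rate must be > 0")
    return 1 + rng.poisson(lam=rate, size=n_streams)


rng = np.random.RandomState(0)
budgets = draw_sample_budgets(256.0, 10000, rng)
print(budgets.min() >= 1)  # True
print(budgets.mean())      # close to 257
```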
pescador/version.py
@@ -3,4 +3,4 @@
 """Version info"""

 short_version = '1.0'
-version = '1.0.0'
+version = '1.1.0'
@bmcfee - Correct me if I misunderstand the case, but I believe the last time I tried to do this, you requested that all version increments be in a separate PR. I would expect that to apply here.
Also... shouldn't we increment the `short_version` too?
can I trouble y'all for consensus? I'd love to get this over the goal line today 😄
Plz see: https://github.com/pescadores/pescador/blob/master/CONTRIBUTING.md, "Additional Comments".
roger roger
    T.__eq_lists(reference, estimate)

    estimate = pescador.BufferedStreamer(gen_stream, buf_size)
curious; is this copypasta, or have you kept the old interface here on purpose? Same below.
(If these tests are here to make sure the old interface still works, that's fine, just mention that in a comment someplace, and that they should be removed when the deprecations are removed.)
ya, the whole `buffered` submodule is going away in 2.0; I've added a comment.
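A sketch of how such a backward-compatibility test might be written with only the standard library (`pytest.warns` would be the more idiomatic choice under pytest). `old_api` and `new_api` are hypothetical placeholders for a deprecated entry point and its replacement; the comment flags the test for removal along with the deprecation.

```python
import warnings


def new_api(max_iter=None):
    """Placeholder for the new-style interface."""
    n = 3 if max_iter is None else max_iter
    return list(range(n))


def old_api(max_batches=None):
    """Placeholder for a deprecated interface kept for compatibility."""
    warnings.warn("old_api is deprecated; use new_api",
                  DeprecationWarning, stacklevel=2)
    return new_api(max_iter=max_batches)


# NOTE: exercises the deprecated interface on purpose.
# Remove this test when the deprecated interface is removed.
def test_old_api_matches_new_api_and_warns():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = old_api(max_batches=2)
    assert result == new_api(max_iter=2)
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)


test_old_api_matches_new_api_and_warns()
```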
A couple of comments here, because this thread is getting insane.
Re: Versioning. I'm posting this here, too, 'cause embedded is getting too crazy. The Re: Docs - let's start an Issue with a checklist?
(Once we have those two things, I think this PR is 👍 )
hey team, two things:
    If ``True``, Streamers that previously produced no data are never
prune_empty_streams : bool
    Disable streamers that produce no data.
    If ``True``, streamers that previously produced no data are never
    revisited.
    Note that this may be undesireable for streams where past emptiness
    may not imply future emptiness.

revive: bool
    If ``with_replacement`` is ``False``, setting ``revive=True``
Nothing.
@bmcfee, can I trouble you for a ✅ ?
pescador/mux.py
        break

    # TODO: Stream uniqueness can't be guaranteed here.
    # Other machinery necessary to impose permutation, this is
    # implicitly `with_replacement`
really? I thought we zero out the probability of active streams when `with_replacement=False`.
You're correct, and I've already removed the comment locally; this was my read at midnight last night, and fixed it this am with some coffee-inspired clarity.
btw I was totally wrong and just thought I did (uhderp)
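For reference, the behavior discussed above (zeroing out the probability of already-active streams when sampling without replacement) can be sketched like this. It assumes the mux tracks the indices of active streamers in a set; `pick_new_stream` is a hypothetical name, not pescador's code.

```python
import numpy as np


def pick_new_stream(weights, active, rng):
    """Pick the index of the next streamer to activate.

    Without replacement: already-active streamers get probability zero and
    the remaining weights are renormalized.
    """
    p = np.asarray(weights, dtype=float).copy()
    for idx in active:
        p[idx] = 0.0
    total = p.sum()
    if total <= 0:
        raise ValueError("no inactive streamers with nonzero weight")
    return int(rng.choice(len(p), p=p / total))


rng = np.random.RandomState(0)
active = {0, 2}
picks = [pick_new_stream([1, 1, 1, 1], active, rng) for _ in range(100)]
print(sorted(set(picks)))  # only inactive indices, i.e. a subset of [1, 3]
```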
LGTM. One dangling rejoinder to a todo/comment you left in mux, but otherwise fine.
replaced by #88
- Renames tests for clearer mappings between submodules
- Addresses much of the conversation from #75
- `Streamer.generate` is now `Streamer.iterate`, since it returns an iterator
- `Streamer` types are now directly iterable
- `Mux` semantics are simplified to remove the (now superfluous) `seed` concept, which was synonymous with streams; now there are only streamers and streams.
- `batch` is scrubbed from all core functionality.

Will resolve: #64, #75, #76, #77, #78, #79, #81
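A minimal sketch of what "directly iterable" can mean for a streamer-like class, assuming `iterate` is a generator method and the wrapped function is re-invoked on each pass (`IterableStreamer` is a hypothetical name): defining `__iter__` to delegate to `iterate()` is enough for `for sample in streamer:` to work.

```python
class IterableStreamer(object):
    """Hypothetical streamer: wraps a function that returns an iterable."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def iterate(self, max_iter=None):
        # Call the function afresh so each pass restarts the stream.
        for n, item in enumerate(self.func(*self.args, **self.kwargs)):
            if max_iter is not None and n >= max_iter:
                break
            yield item

    def __iter__(self):
        # Makes ``for x in streamer`` and ``list(streamer)`` work directly.
        return self.iterate()


def count_up(n):
    for i in range(n):
        yield i


streamer = IterableStreamer(count_up, 3)
print(list(streamer))                      # [0, 1, 2]
print(list(streamer))                      # [0, 1, 2] again: a fresh pass
print(list(streamer.iterate(max_iter=2)))  # [0, 1]
```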