Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.1 -> v2.0 Semantics refactor #80

Closed
wants to merge 43 commits into from
Closed

Conversation

ejhumphrey
Copy link
Collaborator

@ejhumphrey ejhumphrey commented Apr 9, 2017

Renames tests for clearer mappings between submodules
Addresses much of the conversation from #75

  • will break the current API, and needs some help signaling deprecations
  • Streamer.generate is now Streamer.iterate, since it returns an iterator
  • Streamertypes are now directly iterable
  • Mux semantics are simplified to remove the (now superfluous) seed concept, which were synonymous with streams; now there are only streamers and streams.
  • the notion of batch is scrubbed in all core functionality.

Will resolve: #64, #75, #76, #77, #78, #79, #81

@ejhumphrey ejhumphrey requested review from cjacoby and bmcfee April 9, 2017 17:38
Copy link
Collaborator

@bmcfee bmcfee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes mostly look good. There are a couple of things I'd like to see added, and we need to properly handle deprecation and parameter renames.

@@ -56,31 +58,31 @@ def activate(self):
"""Activates the stream."""
self.stream_ = self.streamer

def generate(self, max_batches=None):
def iterate(self, max_iter=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

persist the old generate function here. we can wrap the call signature for API compatibility, but there should be a warning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee do you have an example of "@Move"ing a class's method? I could hack it out of if need be, but I'd rather do the fast thing.

def buffer_batch(generator, buffer_size):
'''Buffer an iterable of batches into larger (or smaller) batches
def buffer_batch(streamer, buffer_size):
'''Buffer data samples from an streamer into one data object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't actually require a streamer, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it needs an iterable; not sure what a sane variable name would be here. I also aspire to change this on a subsequent PR, so .... any strong opinions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opted for stream

pescador/core.py Outdated
@@ -8,6 +8,11 @@

class StreamActivator(object):
def __init__(self, streamer):
for mname in ['activate', 'deactivate']:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just check isinstance(streamer, Streamer) instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr: ya, that's better.

commentary: I think I had a lapse of Java ptsd, and wanted to check that the object implements the interface of activate/deactivate, since it doesn't strictly matter that it's a Streamer (although when would we run into that? /shrug). If we really cared, a more Pythonic way of doing this would be to define an ActiveStream mixin, but ... this is total overkill for where we are right now, and mention it now only for completeness.

pescador/core.py Outdated
raise PescadorError('streamer must be a generator, iterable, or '
'Streamer')
raise PescadorError('`streamer` must be a callable function that '
'returns an iterable object.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we actually generalize this and support iterables as well as functions? Call it a stretch goal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it does as of this change 😄 ... I've updated the error message, and added a Streamer(iterable) test in da42dbf.

pescador/core.py Outdated

def deactivate(self):
self.stream_ = None

def generate(self, max_batches=None):
'''Instantiate the generator
def iterate(self, max_iter=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, deprecation rename / persist old signature with warning

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger roger

pescador/core.py Outdated

# TODO: ejhumphrey would like to deprecate this class method and turn it
# into a "transform". @bmcfee please advise? example to follow?
def tuples(self, *items, **kwargs):
'''Generate data in tuple-form instead of dicts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not 100% on what a transform is defined to be.

I find the .tuples() interface quite useful for interfacing with keras though, and I'd prefer to not have to replace it with something more cumbersome.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a long-form response here, including why I think this is an anti-pattern.

tl;dr: Streamer.tuples only works for a subset of data streams that might be fed through it; I'd propose explicitly separating Streamer functionality from processing data streams.

The thing I'd like to replace it with is a separate tuples function (as sketched out in that linked issue), which I'd argue isn't any more cumbersome than what we currently have and Streamer remains data stream agnostic.

pescador/mux.py Outdated
lam=256.0, pool_weights=None, with_replacement=True,
prune_empty_seeds=True, revive=False,
def __init__(self, streamers, k,
lam=256.0, weights=None, with_replacement=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecation renames EVERYWHERE

while we're at it though, I'd like to replace lam=>rate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha roger

@@ -81,15 +81,16 @@ def zmq_recv_batch(socket, flags=0, copy=True, track=False):
return results


def zmq_worker(port, streamer, terminate, copy=False, max_batches=None):
def zmq_worker(port, streamer, terminate, copy=False, max_iter=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renames

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yayaya

@bmcfee bmcfee added the API label Apr 9, 2017
@bmcfee bmcfee modified the milestones: 1.1.0, 2.0.0 Apr 9, 2017
@bmcfee
Copy link
Collaborator

bmcfee commented Apr 9, 2017

@ejhumphrey Re: handling variable rename deprecations, here's how librosa does it. Note that the current stable version has no lingering renames, but rolling back to 0.4.3 you can find some examples.

  1. Check out the util.deprecation submodule
  2. Here's an example of it in use: https://github.com/librosa/librosa/blob/446708889bf04438aaac7fe64aa738a09955bbf6/librosa/core/constantq.py#L327-L329
  3. There's also some helpers in the util.decorators submodule for deprecating entire functions, or moving them to different submodules/renaming them.
  4. Here's an example of move/rename behavior: https://github.com/librosa/librosa/blob/446708889bf04438aaac7fe64aa738a09955bbf6/librosa/core/audio.py#L692-L693
  5. Here's an example of deprecated function behavior: https://github.com/librosa/librosa/blob/446708889bf04438aaac7fe64aa738a09955bbf6/librosa/feature/spectral.py#L1337-L1338

The futurepast package implements some of these things and more, but it's not yet at an initial release.

@ejhumphrey
Copy link
Collaborator Author

thanks @bmcfee, made it pretty far ... still have some work to do, but any advice / pointers will be helpful.

@bmcfee
Copy link
Collaborator

bmcfee commented Apr 9, 2017

@ejhumphrey I don't have an example of @moveing a class method, because that gets a bit more involved. However, I think you don't necessarily want or need that here, since the call signature has changed as well. I would just write a shim function to translate to the new API, and a custom warning message.

@ejhumphrey
Copy link
Collaborator Author

alrighty, updated the examples, and added some slightly more meaningful benchmarking to the zmq_example, which burns cycles to prove that things are happening in the background.

LET'S DO THIS

X = np.atleast_2d(X)
# y's are binary vectors, and should be of shape (1, 10) after this.
# y's are binary vectors, and should be of shape (10,) after this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atleast_2d will make them shape (1,10)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; get rid of the atleast_2d on the Y, if that' swhat you intend (which I think it is).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -61,14 +61,14 @@ def npz_generator(npz_path):
"""Generate data from an npz file."""
npz_data = np.load(npz_path)
X = npz_data['X']
# y's are binary vectors, and should be of shape (1, 10) after this.
# y's are binary vectors, and should be of shape (10,) after this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here, i think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about the shape is almost certainly not right here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pescador/maps.py Outdated
inputs = [inputs]
if outputs and isinstance(outputs, six.string_types):
outputs = [outputs]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer inspection, I think this still isn't correct.

I don't think fit(inputs=x) and fit(inputs=[x]) do the same thing (I might be wrong on that though). If that's the case, then upcast-downcast will not behave as expected.

Copy link
Collaborator

@cjacoby cjacoby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with some minor nits, specifically:

  • version increment in separate PR?
  • Don't forget about documentation updates! (/doc/...). I could help with that though, if you like. I'd also be for that being in another PR.

X = np.atleast_2d(X)
# y's are binary vectors, and should be of shape (1, 10) after this.
# y's are binary vectors, and should be of shape (10,) after this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; get rid of the atleast_2d on the Y, if that' swhat you intend (which I think it is).

@@ -61,14 +61,14 @@ def npz_generator(npz_path):
"""Generate data from an npz file."""
npz_data = np.load(npz_path)
X = npz_data['X']
# y's are binary vectors, and should be of shape (1, 10) after this.
# y's are binary vectors, and should be of shape (10,) after this.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about the shape is almost certainly not right here.

number of *buffered* batches that are generated,
not the number of individual samples.

partial : bool, default=True
Return buffers smaller than the requested size.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the phrasing in the other case this came up, below; Return buffers smaller than the requested size. makes it sound like it always does that, which is not quite what it means. (But, a 'nit really)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Use the `streamers` parameter instead.
The `seed_pool` parameter will be removed in pescador 2.0.

lam : float > 0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 on converting lam to rate - I think that's sooo much more intuitive.

However, you missed including rate in the docs here. I suspect the lam above is supposed to be rate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, caught it

@@ -3,4 +3,4 @@
"""Version info"""

short_version = '1.0'
version = '1.0.0'
version = '1.1.0'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee - Correct me if I misunderstand the case, but I believe last time I tried to do this that you requested all version increments be in a separate PR. I would expect that to apply here.

Also... shouldn't we increment the short_version too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can I trouble y'all for consensus? I'd love to get this over the goal line today 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger roger


T.__eq_lists(reference, estimate)

estimate = pescador.BufferedStreamer(gen_stream, buf_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious; is this copypasta, or have you kept the old interface here on purpose? Same below.

(If these tests are here to make sure the old interface still works, that's fine, just mention that in a comment someplace, and that they should be removed when the deprecations are removed.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, the whole buffered submodule is going away in 2.0, I've added a comment.

@ejhumphrey
Copy link
Collaborator Author

couple comments here because this thread is getting insane.

  • versioning: what's our stance (this vs separate)?
  • docs: help would be great. also I dunno what I'm doing. so help would be really great.

@cjacoby
Copy link
Collaborator

cjacoby commented Apr 17, 2017

Re: Versioning. I'm posting this here, too 'cause embedded is getting too crazy.

The CONTRIBUTING.md says:
Please keep branches as simple as possible: do not implement multiple features in the same branch, unless they are mutually dependent on one another. By the same token, please do not change unrelated portions of the repository (e.g., version numbers or changelog entries).
and from that, I'd say, separate PR for version update.

Re: Docs - let's start an Issue with a checklist?

@cjacoby
Copy link
Collaborator

cjacoby commented Apr 17, 2017

(Once we have those two things, I think this PR is 👍 )

@ejhumphrey
Copy link
Collaborator Author

hey team, two things:

  • for clarity, you're saying create a docs issue and fix it in this PR? or create an issue and fix it elsewhere?
  • in parallel, I started updating a recent project to (what will be) pescador 1.1, and having some difficulties dealing with Mux, specifically, the Mux-of-Muxes case that remains a little bit suspicious, as far as I'm concerned. I'm still in the process of determining whether my current struggles are a result of poor initial design of data streams in that project, a fundamental issue with the Mux API, or some combination of both. Regardless, I don't want to merge this PR until I've got concrete answers about what the hell is going on.

@cjacoby
Copy link
Collaborator

cjacoby commented Apr 17, 2017

  • Re: docs. I'm proposing it gets fixed in a separate PR. The items could get logged in that PR, or in an issue. But I'd start with an issue at least. If you write up an issue with all the components that changed such that they will affect the docs, I could take a stab at updating the docs to reflect the changes made in a PR.

  • Re: Muxes. This sounds like a good candidate for identifying via improved testing, ya?

If ``True``, Streamers that previously produced no data are never
prune_empty_streams : bool
Disable streamers that produce no data.
If ``True``, streamers that previously produced no data are never
revisited.
Note that this may be undesireable for streams where past emptiness
may not imply future emptiness.

revive: bool
If ``with_replacement`` is ``False``, setting ``revive=True``
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmcfee @cjacoby, pop quiz ... if with_replacement=True, what is the significance of revive?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing.

@ejhumphrey
Copy link
Collaborator Author

ejhumphrey commented Apr 18, 2017

@bmcfee, can I trouble you for a ✅ ?

pescador/mux.py Outdated
break

# TODO: Stream uniqueness can't be guaranteed here.
# Other machinery necessary to impose permutation, this is
# implicitly `with_replacement`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? I thought we zero out the probability of active streams when with_replacement=False.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, and I've already removed the comment locally; this was my read at midnight last night, and fixed it this am with some coffee-inspired clarity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I was totally wrong and just thought i did (uhderp)

Copy link
Collaborator

@bmcfee bmcfee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One dangling rejoinder to a todo/comment you left in mux, but otherwise fine.

@ejhumphrey
Copy link
Collaborator Author

replaced by #88

@ejhumphrey ejhumphrey closed this Apr 18, 2017
@bmcfee bmcfee mentioned this pull request Dec 8, 2017
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplified __call__ interface?
3 participants