Proposal: epoch / randomness tracking or auditing, and callback API #85
I like this idea in the abstract. I'm not sure how exactly it should look, though -- a full keras-style callback infrastructure seems like overkill, since we don't really have a top-level controller to trigger events. Thinking a little more about it, this seems like two issues to me. If we have something like:

```python
def my_callback(...):
    # do some stuff
    ...

epoch = EpochCallback(n=1000, callbacks=[my_callback])

for item in epoch(my_streamer):
    # do some other stuff
    ...
```

then it's just a matter of specifying the interface for callback functions.
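For concreteness, here is a minimal sketch of what such a wrapper could look like. Everything in it is hypothetical: `EpochCallback` is not part of pescador, and the callback signature (`callback(epoch, n_items)`) is just one possible choice of the interface mentioned above.

```python
class EpochCallback:
    """Hypothetical wrapper: yield items from an iterable and fire each
    registered callback every `n` items (one "epoch")."""

    def __init__(self, n, callbacks=None):
        self.n = n
        self.callbacks = callbacks if callbacks is not None else []

    def __call__(self, stream):
        # `stream` is assumed to be a plain iterable, e.g. a generator or
        # the output of Streamer.iterate().
        for count, item in enumerate(stream, start=1):
            yield item
            if count % self.n == 0:
                for callback in self.callbacks:
                    # Assumed callback interface: callback(epoch_index, items_seen)
                    callback(count // self.n, count)


def my_callback(epoch, n_items):
    print("epoch {}: {} items seen".format(epoch, n_items))
```

Since all the bookkeeping lives in the wrapper, nothing in pescador itself would have to change, which lines up with the "only adds to the API" point later in the thread.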
Thoughts from the peanut gallery: is this still in for 1.1.0, or is this really a 2.0 feature? #92 is labelled as 2.0. Trying to clean up what really needs to be done for 1.1 so I can prioritize.

I'd say... if the right design is a callback, then 2.0; if it's setting some counters in the object, then maybe 1.1. Maybe. Thoughts?

I think the callback is probably better / more future-proof.
On the auditing front, I was just wrestling with my design of "how" I wanted to sample some data for training, and decided I wanted to log my samplers so I could parse things out later and check my statistics. The important parts look like this... but first! This is entirely proof-of-concept, though I am curious to subsequently discuss better designs, impact on efficiency, etc.

When I'm writing research packages, I like to have my data stream machinery defined in the same submodule. After my import preamble (which includes `stream_logger = logging.getLogger("stream_logger")`):

```python
# Nothing crazy here.
def init_stream_logging(log_file):
    hdlr = logging.FileHandler(log_file)
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    hdlr.setFormatter(formatter)
    stream_logger.addHandler(hdlr)
    stream_logger.setLevel(logging.INFO)
```

Then, say I have a sampler that plucks observations from an NPZ file... I add a logging statement on index selection, before yielding data:

```python
def my_sampler(feature_dir, key):
    with np.load(os.path.join(feature_dir, "{}.npz".format(key))) as data:
        X = data['x_in']
        Y = data['y_true']
        N = len(X)
        while True:
            n = np.random.randint(0, N)
            # Note: depending on dtype, Y[n] may need casting to a native
            # Python type for json.dumps to accept it.
            stream_logger.info(json.dumps(dict(key=key, n=n, y_true=Y[n])))
            yield dict(x_in=X[n], y_true=Y[n])
```

Then, for completeness, I'll init a file handler and mux a stream. Assume I've got a bunch of files that are like `a.npz` ... `g.npz` under `/path/to/features`:

```python
init_stream_logging("samples.log")

stream = pescador.Mux(
    [pescador.Streamer(my_sampler, "/path/to/features", key) for key in 'abcdefg'],
    k=5, rate=10, revive=True, with_replacement=False, prune_empty_streams=True)

list(stream.iterate(max_iter=1000))
```

Now we can go pop open that log file and look at some stats!

```python
from collections import Counter

samples = [json.loads(l.strip().split("INFO ")[-1]) for l in open("samples.log")]

Counter([x['y_true'] for x in samples])
# Produces something like...
# Counter({0: 110, 1: 99, 2: 101, 3: 92, 4: 100, 5: 101, 6: 111, 7: 92, 8: 94, 9: 100})

Counter([x['key'] for x in samples])
# Produces something like...
# Counter({'a': 166, 'b': 127, 'c': 154, 'd': 136, 'e': 152, 'f': 122, 'g': 143})
```

...or whatever. I haven't had much time to mull this over, so I'm sure I'll have more ideas / opinions later, but thought this was worth sharing given the discussion in #104 (assuming @stefan-balke and @cjacoby may care).

In particular, though, I'm somewhat worried about the time this would lose to JSON serialization (there will be so many samples...), and the log-parsing after the fact is quite gross. I'm not sure whether atomic logging or building up a cache would be worth background threading, but... I'm guessing. Also, I kinda like the idea of using logging (rather than keras history) to track training loss / error, but maybe this is out of scope. I'd also be keen to "type" the logs so I could filter on, say, samples versus other events, but I didn't come across any docs on this (nor did I look very hard). Also also, I looked around and...
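On the "typing" question: one way to do it with just the standard library is to put a `type` field in each JSON payload and attach a `logging.Filter` to the handler. The sketch below is purely illustrative under that assumption; the `type` field and the `TypeFilter` class are not part of pescador or the example above.

```python
import json
import logging


class TypeFilter(logging.Filter):
    """Pass only records whose message is a JSON object with a matching 'type' field."""

    def __init__(self, record_type):
        super().__init__()
        self.record_type = record_type

    def filter(self, record):
        try:
            payload = json.loads(record.getMessage())
        except ValueError:
            return False
        return isinstance(payload, dict) and payload.get("type") == self.record_type


# Only "sample" events land in samples.log; other INFO messages are filtered out.
sample_handler = logging.FileHandler("samples.log")
sample_handler.addFilter(TypeFilter("sample"))
stream_logger = logging.getLogger("stream_logger")
stream_logger.addHandler(sample_handler)
stream_logger.setLevel(logging.INFO)

# The sampler would then log, e.g.:
# stream_logger.info(json.dumps(dict(type="sample", key=key, n=n)))
```

An even simpler alternative is a dedicated child logger per event type (e.g. `logging.getLogger("stream_logger.samples")`) with its own handler, which avoids parsing JSON inside the filter entirely.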
Can this be a 2.1 feature? It should only add to the API, not change existing functionality.
👍
@cjacoby got any cycles to look into this? I'd like to get 2.1 off my stack in the short term.

Given the radio silence on this, and the lack of a clear picture of what exactly the API should be, how do you all feel about dropping this feature, @ejhumphrey @cjacoby? I can see its utility, but it also makes things much more complicated.
Sorry! I could make time to work on this this week if you think it's useful. (Reading emails, though, that I am bad at making time for ;) ) I think it might make sense to punt to the next version just to at least clarify the API/approach. Or we make it "provisional"? I have *an* approach in my head, though I don't know if it's the best one.
OK. How about we punt it to some yet-to-be-determined 3.x release, then?
I suspect that some sort of callback or query on the Mux object is the right solution here, but I figured it would be best to initiate a discussion first.
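For the "query on the Mux object" direction, here is a rough sketch of the idea without touching pescador internals: wrap the sample stream in an object that keeps its own counters, which can then be queried directly instead of parsing a log afterwards. The `StreamAuditor` name and its `field` argument are made up for illustration.

```python
from collections import Counter


class StreamAuditor:
    """Hypothetical wrapper: tally one field of every yielded sample so
    statistics can be queried after (or during) iteration."""

    def __init__(self, field):
        self.field = field
        self.counts = Counter()

    def __call__(self, stream):
        for sample in stream:
            self.counts[sample.get(self.field)] += 1
            yield sample


# Usage sketch, reusing the Mux from the logging example above:
# auditor = StreamAuditor(field="y_true")
# for sample in auditor(stream.iterate(max_iter=1000)):
#     ...  # training step
# print(auditor.counts)  # same class-balance check, no log parsing needed
```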