Refactor sinks #386
…_sinks # Conflicts: # streamz/__init__.py
A good refactor, I am in favour.
streamz/sinks.py
Outdated
    def __init__(self, upstream, func, *args, **kwargs):
        self.func = func
        # take the stream specific kwargs out
        stream_name = kwargs.pop("stream_name", None)
This is fine, and I know it's only moved code, but it would be good not to have to update these whenever the superclass gains more keywords.
I don't know how we can both allow mixing stream-specific and func arguments in **kwargs and avoid explicitly listing them.
I can see two solutions here:
- disallow stream kwargs and just call func(*args, **kwargs)
- make it a normal stream and have the user deal with constructing the correct single-arg function (with wrapping, closure, partials etc.)
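The two options above can be sketched in plain Python. This is only an illustration: `emit` and `call_forwarding` are hypothetical names, not part of streamz, and no streamz classes are used.

```python
from functools import partial


def emit(x, prefix, end="\n"):
    """Hypothetical multi-argument function a user wants to run per element."""
    return f"{prefix}{x}{end}"


# Option 1: the node forwards everything it was given to func, so no
# keyword can be reserved for the node itself (e.g. stream_name).
def call_forwarding(func, x, *args, **kwargs):
    return func(x, *args, **kwargs)


# Option 2: the user binds the extra arguments up front and the node
# only ever calls func(x) with a single argument.
emit_one = partial(emit, prefix="> ", end="")
```

With the second option the node needs no knowledge of the function's signature, at the cost of pushing the wrapping onto the caller.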
Please see here: 5c1b3a6#diff-eb662403bf433f3884021435c3f6cd010b387b4174a01367ae415b3e593cde5eR56-R60
Still looks ugly to me, but it does what you asked for :)
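One way to avoid a hard-coded list of keywords is to read the superclass signature at construction time. The sketch below is an assumption about how this could look, not a copy of the linked commit; `BaseNode` and `FuncSink` are hypothetical stand-ins for the stream base class and the function sink.

```python
import inspect


class BaseNode:
    """Hypothetical stand-in for the stream superclass."""

    def __init__(self, upstream=None, stream_name=None, ensure_io_loop=False):
        self.upstream = upstream
        self.stream_name = stream_name
        self.ensure_io_loop = ensure_io_loop


class FuncSink(BaseNode):
    """Hypothetical sink that calls func(x, *args, **kwargs) per element."""

    def __init__(self, upstream, func, *args, **kwargs):
        self.func = func
        # Keyword names accepted by the superclass, discovered dynamically
        base_params = set(inspect.signature(BaseNode.__init__).parameters) - {"self"}
        # Keywords the superclass knows go to it; the rest belong to func
        node_kwargs = {k: v for k, v in kwargs.items() if k in base_params}
        self.kwargs = {k: v for k, v in kwargs.items() if k not in base_params}
        self.args = args
        super().__init__(upstream, **node_kwargs)

    def update(self, x):
        return self.func(x, *self.args, **self.kwargs)
```

The trade-off is that a keyword of `func` sharing a name with a superclass parameter is silently captured by the node, so the ambiguity is moved rather than removed.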
What do you think, @CJ-Wright , is this better or worse?
streamz/sinks.py
Outdated
        1
        """
        def __init__(self, upstream, file, end="\n", mode="a", **kwargs):
            self._fp = open(file, mode=mode, buffering=1) if isinstance(file, str) else file
Integrate with fsspec.open?
Why set the buffering?
> Why set the buffering?
Used line buffering in development to see what's going on without delay, forgot to remove. Also some tests won't work as is without it, but I guess I'll just use wait_for.
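For context, a small sketch of what `buffering=1` changes when a text file is opened. The temporary path is made up for the example. With line buffering, each completed line is flushed as soon as the newline is written, which is why a test reading the file right away can see it.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "sink.txt")

# buffering=1 requests line buffering (only meaningful in text mode)
fp = open(path, mode="a", buffering=1)
fp.write("first\n")

# A second handle reads the line before fp is flushed or closed explicitly
with open(path) as reader:
    seen_before_close = reader.read()

fp.close()
```

Without `buffering=1` the line would normally sit in the writer's buffer until a flush or close, so a test would have to poll for the content instead.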
> Integrate with fsspec.open?
This is really a question about what filesystem functionality should be provided in core. If we want to integrate this sink with fsspec, then from_textfile should be updated too (or some other nodes added, like from_fs/sink_to_fs), which is beyond sink refactoring. And then, we should think about how users might have a way to install fsspec extras (s3, hdfs etc.) and what other fs-related nodes might be useful, like:
- listening for new files in a folder/glob and "tailing" them
- writing to same-sized file partitions (like split command)
- writing to files partitioned by time (e.g. rotating every 5 minutes)
What I'm getting at is that it looks like a plugin to me. We could have streamz[fs] for nodes and fsspec built-ins and then streamz[s3,hdfs] for extras.
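As a rough packaging sketch of that proposal: the extras below are hypothetical and only mirror the names suggested in the comment. They assume that `fsspec` itself exposes `s3` and `hdfs` extras that pull in the backend libraries.

```python
# Hypothetical extras for setup.py; these mirror the proposal in this
# thread and are not an existing streamz feature.
EXTRAS_REQUIRE = {
    "fs": ["fsspec"],          # filesystem nodes plus fsspec built-ins
    "s3": ["fsspec[s3]"],      # assumed to pull in the S3 backend
    "hdfs": ["fsspec[hdfs]"],  # assumed to pull in the HDFS backend
}

# In setup.py this would be passed as:
#     setup(..., extras_require=EXTRAS_REQUIRE)
```

A user would then opt in with `pip install "streamz[fs]"` or `pip install "streamz[s3,hdfs]"`.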
Codecov Report
@@ Coverage Diff @@
## master #386 +/- ##
==========================================
+ Coverage 95.74% 95.77% +0.02%
==========================================
Files 17 18 +1
Lines 2559 2577 +18
==========================================
+ Hits 2450 2468 +18
Misses 109 109
I took the liberty of making the default
The line that's annoying codecov is this, which should now be removed, if we don't have
I also have a question. What is the point of returning I tried commenting it out and running tests, everything seems to work the same.
Hmm I'm not certain. Although at this point I'd trust the tests, if we break something then we should have a test for that behavior
Looks like we're keeping it for now :) Feel free to make a new PR with the change.
Great, thanks!
Just noticed sink_to_file in streamz.sources
Yeah, I mentioned it in PR description. Wasn't confident I should be deleting it, especially because it's not really a source/stream node, but more of a utility function. If you're sure it won't be a breaking change, I'll remove.
Sorry I missed that.
Maybe just move it to sinks for now?
…On November 27, 2020 6:40:30 PM EST, Mikhail Akimov ***@***.***> wrote:
> > Just noticed sink_to_file in streamz.sources
> Yeah, I mentioned it in PR description. Wasn't confident I should be deleting it, especially because it's not really a source/stream node, but more of a utility function. If you're sure it won't be a breaking change, I'll remove.
Resolves #384
- Sink class
- sinks.py
- test_sinks.py
- sink_to_file (which is not a sink, but still someone might be using it)
- sink_to_textfile (for symmetry) since there is from_textfile in core

sink_to_list is a special case since it returns a list. This can be done with a class, but involves some black magic with __new__, don't think it's worth it.
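To illustrate the kind of `__new__` trick meant here, a minimal sketch follows. `ListSink` is a hypothetical class, and the upstream is modelled as a plain list of downstream nodes rather than a real stream.

```python
class ListSink:
    """Hypothetical sink whose constructor call returns a plain list."""

    def __new__(cls, upstream):
        result = []
        node = super().__new__(cls)
        node.result = result
        # Hypothetical registration: here upstream is just a list of nodes
        upstream.append(node)
        # Returning a non-instance means __init__ is never called
        return result

    def update(self, x):
        self.result.append(x)


# Usage: the caller receives the list, while the node stays attached upstream
downstreams = []
out = ListSink(downstreams)
for node in downstreams:
    node.update(1)
```

The call looks like a class instantiation but hands back a list, which is surprising to readers and tools alike. A plain function that creates the list and registers its `append` as the sink callback does the same job without the indirection.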