Replies: 5 comments 2 replies
-
I'd not worry about adding a mapping of hist axes names; you can already set a label to show something different than the name, or change the names later. I'd also like to remove the need to add the Scatter plots would have a different name, right? |
Beta Was this translation helpful? Give feedback.
-
I'm slightly in favour of not adding an In your example you note that the awkward array ends up appearing half-way into the code block. Interestingly, I tend to make my fill calls comprise solely of the identifiers, e.g. ...
n_saturated_left = ak.count_nonzero(left.is_saturated, axis=-1)
n_saturated_right = ak.count_nonzero(right.is_saturated, axis=-1)
return (
Hist.new.Integer(0, 128).Int64().fill(n_saturated_left).fill(n_saturated_right)
) I think I've been subconciously driven to do this because the complexity of the hist construction otherwise adds too much noise to the code. That said, I think this noise is fairly inescapable - histograms need a lot of detail to be constructed, so perhaps they should be their own line of code. I ended up making a list of features that I think we/hist could benefit from. I'll enumerate them here:
I think first-and-foremostly, making arrays = np.broadcast_arrays(
zone, n_saturated > 0, n_ringing_chain, n_non_ringing_chain
)
hist.fill(
*[
np.ravel(x)
for x in arrays
]
) vs arrays = np.broadcast_arrays(
zone, n_saturated > 0, n_ringing_chain, n_non_ringing_chain
)
hist.fill(*arrays) I don't know whether I'd go as far as wanting I wonder whether it would be helpful for hist to support mappings, a bit like matplotlib does with array = ak.zip({
'mag': [1, 0.2, 0.3, 0.9, 0.8, 0.4],
'x': [10, 20, 30, 4, 5, 6],
})
hist = Hist.new.Reg(1024, 0, 1, name="mag").Reg(1024, 0, 10, name="x").Int64().fill(array)
hist.plot() Actually, this now makes me think about #984, which would make it easier to work with matplotlib as well. Right now, we have to flatten the fields manually, and then rebuild the array with hist.fill(*[ak.flatten(x) for x in x in ak.unzip(array)]) But with better hist.fill(array) Having written all of that, I'm slightly nervous that I'm proposing adding Awkward features to Hist, which might really be a bad design choice in terms of inversion of responsibility. If we did add hist support to Awkward, are we making a special exception for it because of the convenience? Would there be merit in adding some kind of "extern" namespace to the array that avoids clashing with user fields and allows for future extension, e.g. array._.hist
array.lib.hist
array.ext.hist Footnotes
|
Beta Was this translation helpful? Give feedback.
-
Regardless of whether hist does broadcasting (it seems like that would make it depend too much on Awkward), what I'm thinking about here is the ergonomics of getting a quick plot. Similarly, we have What I'm thinking about here would intentionally be a thin interface, relying on hist to interpret its arguments. The QuickConstruct method seems like the right thing to do, but only if it doesn't mean that we have to implement hist proxy work-alikes: intermediate objects of the same sort that hist defines to do QuickConstruct, but our own because we need it to do a We have other connections with external libraries, all in the src/awkward/_connect directory; this would probably go there, too. |
Beta Was this translation helpful? Give feedback.
-
Well, I'd suggest it just uses I understand the benefit of the method chaining vs function composition. I actually just switched out my own Dask helper library to use function composition in order to make the abstraction thinner, but I can see that if you don't use Black and/or want to adapt your coding style, it's not as frictionless. Recognising that my opinion is towards "let users type more if it means easier to read code", I'm going to put it aside for now ;) What about a simple hist-fill method? Assuming we want to rely explicitly on hist, the user could be allowed to construct the hist factory, or just pass in the field names if required, e.g. array = ak.zip({
'mag': [1, 0.2, 0.3, 0.9, 0.8, 0.4],
'x': [10, 20, 30, 4, 5, 6],
})
array.hist("mag") # 1D
array.hist("mag", "x") # 2D
array.hist(Hist.new.Reg(32, 0, 1, name="mag").Reg(32, 0, 10, name="x").Int64()) # Fill 2D hist Or, do you think a full-blown ak-namespaced proxy is better, e.g. array.hist.Reg(32, 0, 1, name="mag").Reg(32, 0, 10, name="x").Int64() # Fill 2D hist |
Beta Was this translation helpful? Give feedback.
-
@henryiii this is the discussion I referenced! Today we mused a This also relates to work on |
Beta Was this translation helpful? Give feedback.
-
Description of new feature
One thing that's a little annoying right now is getting data from Awkward Arrays into plots. A lot of my tutorials include stanzas like
but in the heat of interactive data analysis, you have some intermediate array result that you want to quickly plot to see what's happening. I find myself saying that the value of array-oriented interfaces is the quick interleaving of single-operation-multiple-data and feedback about that step, so that you catch mathematical errors early. (For instance, an unexpected spike at zero, a surprisingly long tail, or smeared trigger thresholds, missing mass peaks, etc.) However, the above is not easy to quickly plot.
We generally try to minimize method names in the ak.Array namespace, but suppose we add one more named
hist
, which flattens data and fills a histogram from the hist library? With the exception of selecting fields from the Awkward record array (if any), most of the arguments would be passed directly to hist for hist to take care of (and ModuleNotFoundError if hist can't be imported).Some features that would be nice to have:
[A-Za-z_][A-Za-z_0-9]*
, we should use__getattr__
instead of__getitem__
so that Vector properties like"pt"
can be used on an array of Cartesian vectors. Maybe that field selector can be a dict if one needs to map Awkward field names to hist axis names that aren't exactly the same.fill
. Nones and NaNs are removed (unless hist automatically ignores NaNs). Yes, I like to insist that flattening isn't always the right thing to do, and so it shouldn't be done automatically, but it's so often the right thing to do that it ought to be a default. (Auto-flattening in a method namedhist
is a different proposition from auto-flattening every time__array__
is called; that would be too much/dangerous.)hist=h
, in analogy with Matplotlib'sax=ax
. In this case, the existing histogram is filled.hist
is given, the method should create a new histogram, but what syntax should it use to do so?hist.numpy.histogram
? The hist constructor? Would that require users toimport hist
so they can say things likehist.new.Reg
?array.hist(*args).plot()
is not unwieldy. Or maybe, should it be a proxy that must be continued with ahist.Hist.new
-like chain, excepting that it does thefill
implicitly?hist
return value to sometimes be "a thing that can be plotted" and sometimes be "a plot." It should always be one or the other.Oh, and this would only be for v2. It might even be listed in a "reasons to switch" that we'll be publicizing next year.
@henryiii and @amangoel185 may have opinions about how this hist integration/ergonomics should go.
Beta Was this translation helpful? Give feedback.
All reactions