alternative methods of collecting an axis_iter to ndarray matrix #249
Comments
stack could probably support it, it's just a lot more work to do non-Copy (Copy => no destructor, no ownership semantics). The unknown factor is if there's a perf loss with the new implementation. |
IIRC stack preallocates and then uses |
Also yes, I think it warrants testing, but I'm not sure if the original function should be reimplemented. Any perf loss by using non-Copy is probably only worth it for specific use cases like mine. In that case, maybe we should have a totally new non-Copy stack function. |
Extending an array is not cheap in general (due to flexible memory layout) so for a new stack I would consider writing the operation using |
I'd really recommend to somehow use the native types as the array element type. I.e use an array of f64, not an array of item type. That lifts the item type enum up to be around the array. It's probably not as neat to write, but it is a whole lot more efficient for numerical operations. |
Yeah I’ve thought about that. I think I’m going to create separate DataFrame structs depending on whether you know your element types are uniform.
Suchin Gururangan
|
Oh, I didn't think about that, sorry. I've been thinking that a data frame would fix each "column" to a particular type. |
Oh oh oh, I totally misread your original point: I read "array" as "matrix". You were talking about columns.
Your point is totally valid. I should lift the InnerType to the column level, as column types are fixed. That may change how the library handles missing values, though.
Suchin Gururangan
|
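To make the "lift the enum to the column level" idea concrete, here is a minimal sketch using only the standard library. The names Column, DataFrame, and the F64 variant are hypothetical and are not taken from rust-dataframe or ndarray; a real implementation would presumably hold ndarray arrays rather than Vec.

```rust
/// Hypothetical column type: the enum wraps a whole array of one native type
/// instead of wrapping every single element.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    F64(Vec<f64>),
    Str(Vec<String>),
}

impl Column {
    /// Number of rows in the column.
    fn len(&self) -> usize {
        match self {
            Column::F64(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }

    /// Sum of a numeric column; None for non-numeric columns.
    /// The type is checked once per column, not once per element.
    fn sum(&self) -> Option<f64> {
        match self {
            Column::F64(v) => Some(v.iter().sum()),
            Column::Str(_) => None,
        }
    }
}

/// Hypothetical data frame: named columns, each with one fixed element type.
struct DataFrame {
    columns: Vec<(String, Column)>,
}

impl DataFrame {
    /// Look up a column by name.
    fn column(&self, name: &str) -> Option<&Column> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, c)| c)
    }
}
```

With this layout the numeric columns stay plain f64 buffers, so numerical code never has to match on an enum per element.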
parallelization of |
Have you seen this? Might be a good resource for ideas http://wesmckinney.com/blog/a-roadmap-for-rich-scientific-data-structures-in-python/ |
In the discussion on reddit about the Utah dataframe library for Rust, the Pandas 2.0 rewrite was mentioned. Looks like Pandas is suffering from performance issues because of the way they chose to implement their original data structures. Maybe good to keep in mind. :-) |
That is indeed the same library that @pegasos1 started this issue with. I hadn't seen the reddit post though, so thanks for the link. |
Yeah, both utah and pandas suffer from copies. However, in utah the copies are only necessary because the iterator needs to own the elements it returns. I think building the dataframe around streaming iterators would solve the issue, but I haven't gotten around to looking into it yet. @bluss I've seen traces of your thoughts on streaming iterators on various forums. |
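For readers unfamiliar with the term, a streaming (or "lending") iterator hands out items that borrow from the iterator itself, which the standard Iterator trait cannot express. The sketch below shows one possible shape using generic associated types (Rust 1.65 or later). The LendingIterator trait and RowStream type are hypothetical illustrations, not part of utah, ndarray, or the standard library.

```rust
/// Hypothetical lending ("streaming") iterator: each item may borrow
/// from the iterator itself, so it must be dropped before the next call.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Owns a row-major buffer and lends out one row at a time as a slice.
/// Elements are never copied or cloned, so T does not need Copy or Clone.
struct RowStream<T> {
    data: Vec<T>,
    ncols: usize,
    pos: usize,
}

impl<T> LendingIterator for RowStream<T> {
    type Item<'a> = &'a [T] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        // Stop when no complete row is left (or rows would be empty).
        if self.ncols == 0 || self.pos + self.ncols > self.data.len() {
            return None;
        }
        let start = self.pos;
        self.pos += self.ncols;
        Some(&self.data[start..start + self.ncols])
    }
}
```

Such an iterator is driven with a `while let Some(row) = it.next()` loop rather than a `for` loop, since it does not implement the standard Iterator trait.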
Is this still an issue? We need to formulate the solution here. |
Stack and append support Clone elements (from ndarray 0.15.2), #932 |
I'm working on a dataframe implementation that provides two-dimensional iterator adaptors over ndarray matrices.
The dataframe's data are an enum over something called InnerType, which allows the dataframe to support a variety of types, like dataframes in other languages:
The iterator adaptors impl Iterator<Item = (OuterType, ArrayView<'a, InnerType, usize>)>.
Notice the InnerType::Str(String). Because of this value, InnerType is not Copy, and I'm unable to collect the adaptors' items into a DataFrame via stack. Can you help me think of another way to collect the iterator adaptor into an ndarray matrix, without needing Copy, so I can support Strings in the dataframe? This problem may also affect implementing something like FromCSV, which would go from a CSV reader iterator to a DataFrame.
If you want to check out the project further, you can do so here: https://github.com/pegasos1/rust-dataframe
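One way to collect rows of non-Copy elements, sketched here with only the standard library, is to move every element into a single row-major buffer and record the shape. The InnerType enum below is a hypothetical stand-in modelled on the description in this issue (only the Str(String) variant is confirmed by the text; Float is an assumption), and collect_rows is an illustrative helper, not an ndarray function. With ndarray, the resulting buffer could then be handed to Array2::from_shape_vec((nrows, ncols), data), which places no Copy bound on the element type.

```rust
/// Hypothetical element type, modelled on the InnerType enum described above.
/// Only Str(String) is named in the issue; Float is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum InnerType {
    Float(f64),
    Str(String),
}

/// Collect an iterator of equally long rows into a row-major buffer plus
/// its (rows, columns) shape. Returns None if the rows differ in length.
fn collect_rows<I, R>(rows: I) -> Option<(Vec<InnerType>, (usize, usize))>
where
    I: IntoIterator<Item = R>,
    R: IntoIterator<Item = InnerType>,
{
    let mut data = Vec::new();
    let mut nrows = 0;
    let mut ncols = None;
    for row in rows {
        let before = data.len();
        // Elements are moved into the buffer; neither Copy nor Clone is used.
        data.extend(row);
        let width = data.len() - before;
        match ncols {
            None => ncols = Some(width),
            Some(w) if w != width => return None,
            Some(_) => {}
        }
        nrows += 1;
    }
    Some((data, (nrows, ncols.unwrap_or(0))))
}
```

Because the rows are consumed by value, this only works when the iterator yields owned elements; an iterator over views would need a clone per element instead.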