[ENH] Add pandas_compat.table_to_frame(tab) #3180
Thanks for the PR. Could you add some tests?
Current tests for table_from_frame can be found here:
https://github.com/biolab/orange3/blob/master/Orange/data/tests/test_pandas.py
Orange/data/pandas_compat.py
Outdated
    pandas.DataFrame
    """
    def _column_to_series(col, vals):
        print(col.name)
I guess this was used for debugging. Please remove.
thanks
Orange/data/pandas_compat.py
Outdated
def _column_to_series(col, vals):
    print(col.name)
    if col.is_discrete:
        labels = [col.values[i] for i in np.vectorize(int)(vals)]
vals.astype(int) avoids a call into Python.
But what if some values are NaN?
Alternative:
return col.name, pd.Categorical.from_codes(codes=pd.Series(vals).fillna(-1).astype(int),
                                           categories=col.values, ordered=col.ordered)
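The suggestion above relies on pandas treating the code -1 as a missing value in a Categorical. A minimal sketch of that behaviour follows; the category names and the vals array are hypothetical stand-ins for col.values and one column of the table, not data from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for col.values and one column of tab.X.
categories = ["setosa", "versicolor", "virginica"]
vals = np.array([0.0, np.nan, 2.0])

# Replace NaN with -1 before the integer cast; pandas reads code -1 as "missing".
codes = pd.Series(vals).fillna(-1).astype(int)
cat = pd.Categorical.from_codes(codes=codes, categories=categories, ordered=False)

print(cat)
```

Filling NaN before the cast matters because converting NaN to an integer directly does not produce a well-defined index into the list of category labels.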
Great suggestion, thanks, fixed.
Orange/data/pandas_compat.py
Outdated
        labels = [col.values[i] for i in np.vectorize(int)(vals)]
        return (col.name, pd.Series(labels).astype('category'))
    elif col.is_time:
        return (col.name, pd.to_datetime(vals,unit='s').to_series())
spaces after commas.
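For context, the time branch in the diff passes unit='s', which implies the values are seconds since the Unix epoch. A small sketch of that conversion, with the comma spacing the reviewer asks for, might look like this. The input array is made up for illustration, and the sketch wraps the result in pd.Series rather than calling .to_series() as the diff does, so that the result keeps a default integer index; that substitution is a choice made for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical time column: seconds since 1970-01-01, with one missing value.
vals = np.array([0.0, 86400.0, np.nan])

# unit="s" interprets the numbers as seconds since the epoch; NaN becomes NaT.
stamps = pd.Series(pd.to_datetime(vals, unit="s"))

print(stamps)
```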
Orange/data/pandas_compat.py
Outdated
if domain.attributes:
    x = _columns_to_series(domain.attributes,tab.X)
if domain.class_vars:
    y = _columns_to_series(domain.class_vars,tab.Y.reshape(tab.Y.shape[0],1))
Reshaping Y here doesn't work with multiple target columns. Perhaps:
table.Y.reshape(table.Y.shape[0], len(domain.class_vars))
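The difference between the two reshape calls can be checked with plain NumPy arrays. The sketch below assumes Y is one-dimensional when there is a single class variable and two-dimensional when there are several; the helper name targets_as_2d is hypothetical and not part of the PR.

```python
import numpy as np


def targets_as_2d(y, n_class_vars):
    """Return y with shape (n_rows, n_class_vars)."""
    return y.reshape(y.shape[0], n_class_vars)


single = np.array([0.0, 1.0, 2.0])      # one target column, shape (3,)
multi = np.arange(6.0).reshape(3, 2)    # two target columns, shape (3, 2)

single_2d = targets_as_2d(single, 1)    # shape (3, 1)
multi_2d = targets_as_2d(multi, 2)      # shape (3, 2)

# The hard-coded variant from the diff cannot hold two columns.
try:
    multi.reshape(multi.shape[0], 1)
    hardcoded_ok = True
except ValueError:
    hardcoded_ok = False
```

Passing the number of class variables keeps the single-target case unchanged while letting the multi-target case through.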
Thanks
Codecov Report
@@            Coverage Diff             @@
##           master    #3180      +/-   ##
==========================================
+ Coverage   82.59%   82.61%   +0.01%
==========================================
  Files         340      340
  Lines       58767    58847      +80
==========================================
+ Hits        48541    48616      +75
- Misses      10226    10231       +5
@astaric sure, I added basic tests, and they look good on the Iris data set:
python -m unittest Orange/data/tests/test_pandas.py
Update: fixed the pylint warnings.
OK, this function works across all datasets.
import os
from Orange.data import Table
from Orange.data.pandas_compat import table_to_frame
datasets = [f.split('.')[0] for f in os.listdir("Orange/datasets") if '.tab' in f]
for ds in datasets:
    print(ds)
    table = Table(ds)
    table_to_frame(table)
I don't want to introduce it as part of the test suite since it's pretty slow.
Convert Orange.data.Table instance to pandas dataframe
[FIX] pandas_compat.table_to_frame NaN values
[FIX] table_to_frame persist column order
[ENH] add tests for pandas_compat.table_to_frame
[FIX] pandas_compat.table_to_frame commas
[FIX] pandas_compat.table_to_frame support multiple target columns
I'd add it as a test, but mark it as @skip, so it can be re-enabled when needed. You wrote in an earlier comment that the function did not work on a couple of datasets. Could you test on them as well? (I'm not sure what the problem was.)
@astaric OK, that's a good point. Yes, it helped me troubleshoot a problem with a datetime column. The function works on any Orange built-in dataset.
@astaric I added my script as part of the test suite. It actually looks pretty neat: if something fails, it pinpoints the problem and the exact dataset the function failed on.
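One way to combine the two ideas from this exchange, a skipped slow test that still reports the failing dataset by name, is unittest.skip together with subTest. This is a sketch under assumptions, not the test that was merged: the class name, the row-count assertion, and the use of subTest are all illustrative, and the dataset directory is taken from the script quoted earlier.

```python
import os
import unittest


class TestTableToFrameAllDatasets(unittest.TestCase):
    @unittest.skip("Slow: converts every bundled dataset; remove to run.")
    def test_all_datasets(self):
        # Imported lazily so the module still loads where Orange is absent.
        from Orange.data import Table
        from Orange.data.pandas_compat import table_to_frame

        names = [os.path.splitext(f)[0]
                 for f in os.listdir("Orange/datasets")
                 if f.endswith(".tab")]
        for name in names:
            # subTest labels each failure with the dataset that caused it.
            with self.subTest(dataset=name):
                table = Table(name)
                df = table_to_frame(table)
                self.assertEqual(len(df), len(table))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    TestTableToFrameAllDatasets)
result = unittest.TestResult()
suite.run(result)
```

With the decorator in place the test is reported as skipped rather than silently missing, so it stays visible in the test run and can be re-enabled by deleting one line.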
Feature
Convert Orange.data.Table instance to pandas dataframe
Includes