[ENH] Add pandas_compat.table_to_frame(tab) #3180

apetrov · 2018-08-02T09:21:06Z

Feature

Convert an Orange.data.Table instance to a pandas DataFrame.

Includes

Code changes
Tests
Documentation

CLAassistant · 2018-08-02T09:21:14Z

All committers have signed the CLA.

astaric

Thanks for the PR. Could you add some tests?

Current tests for table_from_frame can be found here:
https://github.com/biolab/orange3/blob/master/Orange/data/tests/test_pandas.py

astaric · 2018-08-02T09:26:54Z

Orange/data/pandas_compat.py

+    pandas.DataFrame
+    """
+    def _column_to_series(col, vals):
+        print(col.name)


I guess this was used for debugging. Please remove.

kernc · 2018-08-02T10:01:16Z

Orange/data/pandas_compat.py

+    def _column_to_series(col, vals):
+        print(col.name)
+        if col.is_discrete:
+            labels = [col.values[i] for i in np.vectorize(int)(vals)]


vals.astype(int) avoids calling into Python for every element.

But what if some values are NaN?

Alternative:

return col.name, pd.Categorical.from_codes(codes=pd.Series(vals).fillna(-1).astype(int), categories=col.values, ordered=col.ordered)

Great suggestion, thanks, fixed.
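A minimal sketch of the from_codes suggestion that runs without Orange. The category labels and the float-encoded column are hypothetical stand-ins for col.values and one column of tab.X. The original list comprehension would break on missing values because int(nan) raises ValueError. Mapping NaN to code -1 instead produces a missing entry in the categorical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: category labels of a discrete column and its
# values as stored in a float array (NaN marks a missing value).
categories = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
vals = np.array([0.0, 2.0, np.nan, 1.0])

# int(np.nan) raises ValueError, so NaN is first replaced with -1;
# Categorical.from_codes treats code -1 as a missing value.
codes = pd.Series(vals).fillna(-1).astype(int)
cat = pd.Categorical.from_codes(codes=codes, categories=categories, ordered=False)
series = pd.Series(cat, name="iris")
print(series)
```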

kernc · 2018-08-02T10:01:53Z

Orange/data/pandas_compat.py

+            labels = [col.values[i] for i in np.vectorize(int)(vals)]
+            return (col.name, pd.Series(labels).astype('category'))
+        elif col.is_time:
+            return (col.name, pd.to_datetime(vals,unit='s').to_series())


spaces after commas.
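The time branch in the diff above converts raw values with pd.to_datetime(vals, unit='s'). The sketch below assumes that a time column holds float seconds since the Unix epoch, with NaN for missing entries. Under that assumption, NaN becomes NaT. Wrapping the result in pd.Series keeps a positional 0..n-1 index. DatetimeIndex.to_series() instead indexes the series by the timestamps themselves, so its rows may not align with other columns when they are combined.

```python
import numpy as np
import pandas as pd

# Assumption: epoch seconds stored as floats, NaN for a missing value.
vals = np.array([0.0, 86400.0, np.nan])

# unit="s" interprets the floats as seconds since 1970-01-01; NaN -> NaT.
# pd.Series(...) keeps a 0..n-1 index, matching the other columns.
series = pd.Series(pd.to_datetime(vals, unit="s"), name="timestamp")

# For comparison: to_series() uses the timestamps as the index.
indexed_by_time = pd.to_datetime(vals, unit="s").to_series()
print(series)
```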

kernc · 2018-08-02T10:05:35Z

Orange/data/pandas_compat.py

+    if domain.attributes:
+        x = _columns_to_series(domain.attributes,tab.X)
+    if domain.class_vars:
+        y = _columns_to_series(domain.class_vars,tab.Y.reshape(tab.Y.shape[0],1))


Reshaping Y here doesn't work with multiple target columns. Perhaps:

tab.Y.reshape(tab.Y.shape[0], len(domain.class_vars))
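A small NumPy sketch of why the width should come from len(domain.class_vars). It assumes that Y is 1-D for a single class variable and 2-D for several. targets_2d is a hypothetical helper written for illustration.

```python
import numpy as np

def targets_2d(y, n_class_vars):
    # Hypothetical helper: reshape Y to (n_rows, n_class_vars). This turns a
    # 1-D single-target Y into a column and leaves a 2-D Y unchanged.
    return y.reshape(y.shape[0], n_class_vars)

single = np.array([0.0, 1.0, 2.0])          # one class variable, shape (3,)
multi = np.array([[0.0, 1.0], [1.0, 0.0]])  # two class variables, shape (2, 2)

print(targets_2d(single, 1).shape)  # (3, 1)
print(targets_2d(multi, 2).shape)   # (2, 2)

# A hard-coded width of 1 cannot hold two targets per row.
try:
    multi.reshape(multi.shape[0], 1)
    fixed_width_failed = False
except ValueError:
    fixed_width_failed = True
```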

codecov-io · 2018-08-02T14:04:04Z

Codecov Report

Merging #3180 into master will increase coverage by 0.01%.
The diff coverage is 65.57%.

@@            Coverage Diff             @@
##           master    #3180      +/-   ##
==========================================
+ Coverage   82.59%   82.61%   +0.01%     
==========================================
  Files         340      340              
  Lines       58767    58847      +80     
==========================================
+ Hits        48541    48616      +75     
- Misses      10226    10231       +5

apetrov · 2018-08-02T15:18:28Z

@astaric Sure, I added basic tests, and the results look good on the Iris data set.
I'm still looking for datasets with datetime columns so I can test datetime conversion properly.

python -m unittest Orange/data/tests/test_pandas.py

apetrov · 2018-08-02T19:53:32Z

Update: fixed pylint warnings.

apetrov · 2018-08-02T20:07:20Z

OK, this function works on all bundled datasets:

import os

from Orange.data import Table
from Orange.data.pandas_compat import table_to_frame

# Names of all bundled .tab datasets (run from the repository root)
datasets = [os.path.splitext(f)[0]
            for f in os.listdir("Orange/datasets") if f.endswith(".tab")]
for ds in datasets:
    print(ds)  # printed first, so a crash shows which dataset caused it
    table = Table(ds)
    table_to_frame(table)

I'd rather not add it to the test suite, since it's fairly slow:

python dataset_test.py  4.64s user 0.49s system 110% cpu 4.643 total

Convert Orange.data.Table instance to pandas dataframe
[FIX] pandas_compat.table_to_frame NaN values
[FIX] table_to_frame persist column order
[ENH] add tests for pandas_compat.table_to_frame
[FIX] pandas_compat.table_to_frame commas
[FIX] pandas_compat.table_to_frame support multiple target columns
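One of the commits above persists column order. The sketch below shows one way to keep the domain's order, assuming each column arrives as a (name, values) pair already in domain order. The pair format and frame_from_columns are hypothetical and do not reflect how the merged code does it. The motivation is that building a frame from a plain dict could reorder columns with some older pandas/Python combinations. Concatenating named Series along axis=1 always places the columns in list order.

```python
import numpy as np
import pandas as pd

def frame_from_columns(pairs):
    # Hypothetical helper: pairs is a list of (name, values) in domain order.
    series = [pd.Series(vals, name=name) for name, vals in pairs]
    # concat along axis=1 places the columns in exactly this list order.
    return pd.concat(series, axis=1)

attributes = [("sepal length", np.array([5.1, 4.9])),
              ("petal width", np.array([0.2, 0.2]))]
class_vars = [("iris", pd.Categorical.from_codes([0, 1], categories=["setosa", "versicolor"]))]

df = frame_from_columns(attributes + class_vars)
print(df)
```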

astaric · 2018-08-03T07:06:17Z

I'd add it as a test but mark it with @unittest.skip, so it can be re-enabled when needed.

You wrote in an earlier comment that the function did not work on a couple of datasets. Could you test on them as well? (not sure what the problem was)

apetrov · 2018-08-03T07:13:41Z

@astaric OK, that's a good point.

Yes, the script helped me troubleshoot a problem with datetime columns. The function works on every built-in Orange dataset.

apetrov · 2018-08-03T09:18:45Z

@astaric I added my script to the test suite. It turned out quite neat: if something fails, it pinpoints the problem and the exact dataset the function failed on.
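A sketch of the dataset loop written as a unittest test. It follows astaric's suggestion to skip it by default and uses subTest so that each failure is reported with the offending dataset's name. The class and test names are hypothetical, and the "Orange/datasets" path assumes the tests run from the repository root, like the script earlier in this thread. This is not the test that was actually merged.

```python
import os
import unittest

try:
    from Orange.data import Table
    from Orange.data.pandas_compat import table_to_frame
    HAVE_ORANGE = True
except ImportError:  # lets this sketch be imported without Orange installed
    HAVE_ORANGE = False


class TestTableToFrameAllDatasets(unittest.TestCase):
    # Skipped by default because it is slow; remove the first decorator to run it.
    @unittest.skip("slow: converts every bundled dataset")
    @unittest.skipUnless(HAVE_ORANGE, "Orange is not installed")
    def test_all_datasets(self):
        # Assumption: working directory is the repository root.
        names = [os.path.splitext(f)[0]
                 for f in os.listdir("Orange/datasets") if f.endswith(".tab")]
        for name in names:
            # subTest reports each failing dataset by name and keeps going.
            with self.subTest(dataset=name):
                table_to_frame(Table(name))


if __name__ == "__main__":
    unittest.main()
```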

astaric reviewed Aug 2, 2018

kernc reviewed Aug 2, 2018

ENH: test table_to_frame over all datasets

789a0c9

astaric approved these changes Aug 3, 2018

lanzagar merged commit c30fa4c into biolab:master Aug 3, 2018
