Skip to content
This repository has been archived by the owner on Aug 17, 2024. It is now read-only.

[FEATURE] Pivot dataframe #99

Open
Irio opened this issue Oct 24, 2019 · 2 comments
Open

[FEATURE] Pivot dataframe #99

Irio opened this issue Oct 24, 2019 · 2 comments

Comments

@Irio
Copy link

Irio commented Oct 24, 2019

Is your feature request related to a problem? Please describe.
Pivoting is a common task in data processing. It is used when we want to calculate statistics of specific attributes.

From Pandas' documentation:

Screen Shot 2019-10-24 at 13 04 20
Screen Shot 2019-10-24 at 13 04 32

Describe the solution you'd like
I've been looking to libraries capable of pivoting dataframes. Pandas, in my view, has a very complete implementation of the method.

Describe alternatives you've considered
This is a need I had at work. Since I could only find one library – AlaSQL – and with only a simple implementation, that does not pivot over multiple columns or allows pivoting without explicit index, I've written my own version of the function. To derive the algorithm, I replicated the behavior seen in the pivot_table function of Pandas.

Additional context
I have already written the function, and it works for a few test cases I listed:

✓ with one value, one index, and one column (130ms)
✓ without index, aggregates values only by columns (61ms)
✓ without columns, aggregates values only by columns (69ms)
✓ with multiple values, one index, and one column (67ms)
✓ with one value, one index, and multiple columns (97ms)
✓ with one value, multiple index, and one column (84ms)
✓ with one value, multiple index, and multiple columns (177ms)

My function is not integrated into dataframe-js, but it works on dataframes generated with dataframe-js. Would you be interested in bringing this feature in? If so, I may need some help on reviewing it to ensure it follows the conventions of the project – of source code and unit tests. I could open a WIP pull request and we discuss the necessary changes on the go.

function pivotTable(dataframe, values = [], index = [], columns = []) {
  if (values.length > 1) {
    throw 'Not implemented';
  }
  const value = values[0];
  const hasSelectedIndex = index && index.length;
  const hasSelectedColumns = columns && columns.length;

  let agg = new Map();
  const aggKeys = index.concat(columns).filter((x) => x);
  const groupedDF = dataframe['groupBy'](...aggKeys);
  let rowKey = value;
  let colKey = value;
  groupedDF.aggregate(group => {
    const row = group.getRow(0);
    if (hasSelectedIndex) {
      rowKey = row.select(...index).toArray().join('_');
    }
    const baseAttrs = Object.fromEntries(index.map((i) => [i, row.get(i)]));
    agg.set(rowKey, agg.get(rowKey) || baseAttrs);
    if (hasSelectedColumns) {
      colKey = row.select(...columns).toArray().join('_');
    }
    agg.get(rowKey)[colKey] = group.stat.sum(value);
  })
  return new DataFrame(Array.from(agg.values()));
}
@Gmousse
Copy link
Owner

Gmousse commented Nov 23, 2019

Hi @Irio, Thank you for your sugesstion.
Sorry for the delay, I wasn't active these days.

Have you look the GroupedDataFrame .pivot method (https://github.com/Gmousse/dataframe-js/blob/develop/src/group.js#L156) ?
It should works similarly.

@Irio
Copy link
Author

Irio commented Dec 16, 2019

Hi @Gmousse,

Yes, I have look at them, but they don't seem to cover all the cases I needed and Pandas supports. There's always the chance they are too specific, so in this case, let me know and we can close this issue.

My major pain-point was doing operations with multiple attributes, either in index, columns, or the attribute to aggregate on.

Here are all the cases I implemented using dataframe-js and are supported by Pandas:

https://colab.research.google.com/drive/1tBXUtOzTZiTli2JR1VYFODYf7wxO6PNF

And here, my attempt to reproduce them using [email protected]. The only case that can be fully reproduced, at least with my understanding of the docs, is the first.

https://gist.github.com/Irio/c23af8d00768d48acecb87a41d62905e

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants