[*.py] Upgrade to latest version of pandas; Python 2/3 compatibility #664

Open: wants to merge 17 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.idea
*.pyc
*.swp
*.DS_Store
45 changes: 13 additions & 32 deletions README.md
@@ -1,9 +1,18 @@
# ggplot
# ggplot - a working, maintained fork

<img src="./examples/example-34d773b9-ec68-40b1-999b-7bb07c208be9.png" width="400px" />
<img src="./examples/example-8f4fbffe-2999-42b0-9c34-de6f0b205733.png" width="400px" />
## Why this fork?
`ggplot` is a great Python library. However, its owner no longer maintains it, and a number of issues remain unresolved, including incompatibility with newer versions of `pandas` and with Python 3.

### What is it?
Many projects still rely on `ggplot`, and their developers must either move to an alternative or patch `ggpy`'s code by hand after installing it to fix compatibility issues. For more context, see [#654 Is this project dead?](https://github.com/yhat/ggpy/issues/654)

This fork is a working copy of `ggplot` that is actively maintained and open to updates and fixes, so that developers do not have to apply those patches manually.

## Installation
```bash
$ pip3 install git+https://github.com/sushinoya/ggpy
```

## What is ggpy?
`ggplot` is a Python implementation of the grammar of graphics. It is not intended
to be a feature-for-feature port of [`ggplot2 for R`](https://github.com/hadley/ggplot2)--though
there is much greatness in `ggplot2`, the Python world could stand to benefit
@@ -19,31 +28,3 @@ ggplot(diamonds, aes(x='price', color='clarity')) + \
facet_wrap('cut')
```
![](./docs/example.png)

### Installation
```bash
$ pip install -U ggplot
# or
$ conda install -c conda-forge ggplot
# or
pip install git+https://github.com/yhat/ggplot.git
```

### Examples
Examples are the best way to learn. There is a Jupyter Notebook full of them.
There are also notebooks that show how to do particular things with ggplot
(e.g., [make a scatter plot](./docs/how-to/Making%20a%20Scatter%20Plot.ipynb) or [make a histogram](./docs/how-to/Making%20a%20Scatter%20Plot.ipynb)).

- [docs](./docs)
- [gallery](./docs/Gallery.ipynb)
- [various examples](./examples.md)


### What happened to the old version that didn't work?
It's gone--the windows, the doors, [everything](https://www.youtube.com/watch?v=YuxCKv_0GZc).
Just kidding, [you can find it here](https://github.com/yhat/ggplot/tree/v0.6.6), though I'm not sure why you'd want to look at it. The data grouping and manipulation bits were re-written
(so they actually worked) with things like facets in mind.

### Contributing
Thanks to all of the ggplot [contributors](./contributors.md#contributors)!
See *[contributing.md](./contributing.md)*.
6 changes: 3 additions & 3 deletions docs/examples.py
@@ -103,7 +103,7 @@
df = pd.DataFrame({"x": np.arange(1000)})
df['y_low'] = df.x * 0.9
df['y_high'] = df.x * 1.1
df['thing'] = ['a' if i%2==0 else 'b' for i in df.x]
df['thing'] = ['a' if i % 2 == 0 else 'b' for i in df.x]
p = ggplot(df, aes(x='x', ymin='y_low', ymax='y_high')) + geom_area()
p.save("./examples/example-" + str(uuid.uuid4()) + ".png")
# # area w/ facet
@@ -131,7 +131,7 @@
#
df = pd.DataFrame({"x": np.arange(100)})
df['y'] = df.x * 10
df['z'] = ["a" if x%2==0 else "b" for x in df.x]
df['z'] = ["a" if x % 2 == 0 else "b" for x in df.x]
#
# # polar coords
p = ggplot(df, aes(x='x', y='y')) + geom_point() + coord_polar()
@@ -158,7 +158,7 @@
p.save("./examples/example-" + str(uuid.uuid4()) + ".png")
#
# # # x dates formatting faceted
pageviews['z'] = ["a" if i%2==0 else "b" for i in range(len(pageviews))]
pageviews['z'] = ["a" if i % 2 == 0 else "b" for i in range(len(pageviews))]
p = ggplot(pageviews, aes(x='date_hour', y='pageviews')) + geom_line() + scale_x_date(labels=date_format('%B %-d, %Y')) + facet_grid(y='z')
p.save("./examples/example-" + str(uuid.uuid4()) + ".png")
#
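
A note on the `'%B %-d, %Y'` format in the hunk above: `%-d` is a platform-specific `strftime` extension (glibc and BSD accept it, the Windows C runtime does not). The following is a minimal sketch of a portable way to get the same text from a plain `datetime`, assuming English month names from the default locale. `format_long_date` is a hypothetical helper and is not part of ggplot.

```python
from datetime import datetime


def format_long_date(value):
    """Format a datetime like 'January 5, 2017' without relying on %-d."""
    # value.day is an int, so there is no leading zero to strip.
    return "{0} {1}, {2}".format(value.strftime("%B"), value.day, value.year)


print(format_long_date(datetime(2017, 1, 5)))
```

Whether `date_format` in this code base can accept such a callable is not shown in the diff, so this only illustrates the formatting step itself.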
7 changes: 3 additions & 4 deletions ggplot/aes.py
@@ -8,11 +8,10 @@

from patsy.eval import EvalEnvironment

from . import utils

import numpy as np
import pandas as pd


class aes(UserDict):
"""
Creates a dictionary that is used to evaluate
@@ -72,7 +71,7 @@ def __init__(self, *args, **kwargs):
self.__eval_env__ = EvalEnvironment.capture(1)

def __deepcopy__(self, memo):
'''deepcopy support for ggplot'''
"""deepcopy support for ggplot"""
result = aes()
for key, item in self.__dict__.items():
# don't make a deepcopy of the env!
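
The `__deepcopy__` hook in this hunk copies every attribute except the captured evaluation environment. As a rough illustration of that pattern, here is a self-contained sketch using a toy class. `Settings` and its `env` attribute are hypothetical stand-ins, not ggplot's actual `aes` internals.

```python
import copy


class Settings(object):
    """Toy container: 'data' is copied deeply, 'env' is shared by reference."""

    def __init__(self, data, env):
        self.data = data
        self.env = env

    def __deepcopy__(self, memo):
        result = Settings.__new__(Settings)
        memo[id(self)] = result  # guard against reference cycles
        for key, item in self.__dict__.items():
            if key == "env":
                # Share the environment instead of copying it.
                result.__dict__[key] = item
            else:
                result.__dict__[key] = copy.deepcopy(item, memo)
        return result


shared_env = {"scope": "caller"}
original = Settings({"x": [1, 2]}, shared_env)
clone = copy.deepcopy(original)
```

After the copy, `clone.data` is an independent object while `clone.env` is the very same object as `original.env`.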
@@ -122,7 +121,7 @@ def _get_discrete_aes(self, df):
for aes_type, column in self.data.items():
if aes_type in ['x', 'y']:
continue
elif aes_type=="group":
elif aes_type == "group":
discrete_aes.append((aes_type, column))
elif column not in non_numeric_columns:
continue
12 changes: 6 additions & 6 deletions ggplot/chart_components.py
@@ -46,13 +46,13 @@ class xlim(object):
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + xlim(0, 20)
"""
def __init__(self, low = None, high = None):
if low != None :
if low is not None:
try:
_ = low - 0
except TypeError:
raise Exception("The 'low' argument to", self.__class__.__name__,
"must be of a numeric type or None")
if high != None :
if high is not None:
try:
_ = high - 0
except TypeError:
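
The switch from `!= None` to `is not None` in this hunk is more than cosmetic, because `!=` can be overloaded while an identity test cannot. A small sketch with a toy class (`AlwaysEqual` is hypothetical and exists only to make the difference visible):

```python
class AlwaysEqual(object):
    """Toy type whose comparison operators ignore the other operand."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False


value = AlwaysEqual()

# The overloaded operator answers, so this claims the value is "None-like".
equality_says_set = value != None  # noqa: E711

# Identity cannot be overloaded, so this gives the intended answer.
identity_says_set = value is not None
```

The same concern applies to pandas objects, where `!=` compares element-wise and the result cannot be used directly in an `if` statement.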
@@ -83,13 +83,13 @@ class ylim(object):
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + ylim(0, 5)
"""
def __init__(self, low = None, high = None):
if low != None :
if low is not None:
try:
_ = low - 0
except TypeError:
raise Exception("The 'low' argument to", self.__class__.__name__,
"must be of a numeric type or None")
if high != None :
if high is not None:
try:
_ = high - 0
except TypeError:
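
The validation above probes for a numeric type with `low - 0` and then raises `Exception` with three positional arguments, which Python renders as a tuple rather than as one sentence. One possible tightening is sketched below. It is not what the PR does: `check_numeric_or_none` is a hypothetical helper, and raising `TypeError` instead of `Exception` is a choice made for this example.

```python
def check_numeric_or_none(value, arg_name, owner_name):
    """Raise TypeError unless value is None or supports numeric subtraction."""
    if value is None:
        return
    try:
        value - 0  # same duck-typing probe as in the diff
    except TypeError:
        raise TypeError(
            "The '{0}' argument to {1} must be of a numeric type or None".format(
                arg_name, owner_name))


# Several positional arguments, as in the diff, produce a tuple-like message.
tuple_style = str(Exception("The 'low' argument to", "xlim", "must be numeric"))
print(tuple_style)
```

Calling `check_numeric_or_none("abc", "low", "xlim")` raises a `TypeError` with a single readable sentence, while `None` and numbers pass silently.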
@@ -140,7 +140,7 @@ class ylab(object):

Examples
--------
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + ylab("Count\n(# of cars)")
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + ylab('''Count\n(# of cars)''')
"""
def __init__(self, ylab):
if ylab is None:
@@ -169,7 +169,7 @@ class labs(object):

Examples
--------
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + labs("Miles / gallon", "Count\n(# of cars)", "MPG Plot")
>>> ggplot(mpg, aes(x='hwy')) + geom_histogram() + labs("Miles / gallon", '''Count\n(# of cars)''', "MPG Plot")
"""
def __init__(self, x=None, y=None, title=None):
self.x = x
11 changes: 6 additions & 5 deletions ggplot/facets.py
@@ -20,7 +20,7 @@ def __init__(self, data, is_wrap, rowvar=None, colvar=None, nrow=None, ncol=None
# assign subplot indices to rowvars and columnvars
self.ndim = ndim = self.calculate_ndimensions(data, rowvar, colvar)

if is_wrap==True:
if is_wrap:
if self.nrow:
self.ncol = ncol = int(math.ceil(ndim / float(self.nrow)))
self.nrow = nrow = int(self.nrow)
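
The `ndim / float(self.nrow)` expression matters for Python 2/3 compatibility: in Python 2, `/` on two integers floors the result, so the `float()` cast is what makes the ceiling round up. A minimal sketch of the grid-shape calculation follows. Only the `nrow` branch is visible in the diff, so the `ncol` and default branches here are assumptions, and `wrap_shape` is a hypothetical name.

```python
import math


def wrap_shape(n_panels, nrow=None, ncol=None):
    """Return (nrow, ncol) large enough to hold n_panels facets."""
    if nrow:
        nrow = int(nrow)
        # float() forces true division under Python 2 as well.
        ncol = int(math.ceil(n_panels / float(nrow)))
    elif ncol:
        # Assumed mirror image of the branch shown in the diff.
        ncol = int(ncol)
        nrow = int(math.ceil(n_panels / float(ncol)))
    else:
        # Assumed default: a roughly square grid.
        ncol = int(math.ceil(math.sqrt(n_panels)))
        nrow = int(math.ceil(n_panels / float(ncol)))
    return nrow, ncol


print(wrap_shape(7, nrow=2))
```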
@@ -47,9 +47,9 @@ def __init__(self, data, is_wrap, rowvar=None, colvar=None, nrow=None, ncol=None
value = next(facet_values)
except Exception as e:
continue
if ncol==1:
if ncol == 1:
self.facet_map[value] = (row, None)
elif nrow==1:
elif nrow == 1:
self.facet_map[value] = (None, col)
else:
self.facet_map[value] = (row, col)
@@ -119,12 +119,13 @@ def __init__(self, x=None, y=None, nrow=None, ncol=None, scales=None):
self.scales = scales

def __radd__(self, gg):
if gg.__class__.__name__=="ggplot":
if gg.__class__.__name__ == "ggplot":
gg.facets = Facet(gg.data, True, self.x_var, self.y_var, nrow=self.nrow, ncol=self.ncol, scales=self.scales)
return gg

return self


class facet_grid(object):
"""
Layout panels from x and (optionally) y variables in a grid format.
@@ -155,7 +156,7 @@ def __init__(self, x=None, y=None, scales=None):
self.scales = scales

def __radd__(self, gg):
if gg.__class__.__name__=="ggplot":
if gg.__class__.__name__ == "ggplot":
gg.facets = Facet(gg.data, False, self.x_var, self.y_var, scales=self.scales)
return gg
return self
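
Both `__radd__` hooks above decide what to do by comparing `gg.__class__.__name__` to the string `"ggplot"`. A sketch of the same dispatch with `isinstance` is given below, using toy classes (`Plot`, `PlotSubclass` and `WrapSpec` are hypothetical). Unlike the diff, which returns `self` for other operands, this sketch returns `NotImplemented` so that Python raises a `TypeError`.

```python
class Plot(object):
    """Toy stand-in for a plot object; it defines no __add__ of its own."""

    def __init__(self):
        self.facets = None


class PlotSubclass(Plot):
    pass


class WrapSpec(object):
    """Toy stand-in for a facet specification added with '+'."""

    def __init__(self, column):
        self.column = column

    def __radd__(self, other):
        # isinstance also accepts subclasses, which a name comparison misses.
        if isinstance(other, Plot):
            other.facets = ("wrap", self.column)
            return other
        return NotImplemented


plot = Plot() + WrapSpec("cut")
sub_plot = PlotSubclass() + WrapSpec("cut")
name_check_matches_subclass = PlotSubclass().__class__.__name__ == "Plot"
```

Here `sub_plot.facets` is set as expected, whereas a class-name comparison would not have recognised the subclass.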
2 changes: 1 addition & 1 deletion ggplot/geoms/geom.py
@@ -59,7 +59,7 @@ def _get_plot_args(self, data, _aes):
for key, value in _aes.items():
if value not in data:
mpl_params[key] = value
elif data[value].nunique()==1:
elif data[value].nunique() == 1:
mpl_params[key] = data[value].iloc[0]
else:
mpl_params[key] = data[value]
2 changes: 1 addition & 1 deletion ggplot/geoms/geom_area.py
@@ -43,7 +43,7 @@ def plot(self, ax, data, _aes):
if self.last_y is None:
self.last_y = pd.Series(np.repeat(0, len(data)))
ymin = self.last_y
if self.DEFAULT_PARAMS['position']=="stack":
if self.DEFAULT_PARAMS['position'] == "stack":
ymax = self.last_y.reset_index(drop=True) + data[variables['y']].reset_index(drop=True)
else:
ymax = data[variables['y']]
12 changes: 6 additions & 6 deletions ggplot/geoms/geom_bar.py
@@ -91,9 +91,9 @@ def plot(self, ax, data, _aes, x_levels, fill_levels, lookups):

xticks = []
for i, x_level in enumerate(x_levels):
mask = data[variables['x']]==x_level
mask = data[variables['x']] == x_level
row = data[mask]
if len(row)==0:
if len(row) == 0:
xticks.append(i)
continue

@@ -111,19 +111,19 @@ def plot(self, ax, data, _aes, x_levels, fill_levels, lookups):
height = 1.0
ypos = 0
else:
mask = (lookups[variables['x']]==x_level) & (lookups[variables['fill']]==fillval)
mask = (lookups[variables['x']] == x_level) & (lookups[variables['fill']] == fillval)
height = lookups[mask]['__calc_weight__'].sum()
mask = (lookups[variables['x']]==x_level) & (lookups[variables['fill']] < fillval)
mask = (lookups[variables['x']] == x_level) & (lookups[variables['fill']] < fillval)
ypos = lookups[mask]['__calc_weight__'].sum()
else:
if fill_levels is not None:
dodge = (width * fill_idx)
dodge = width * fill_idx
else:
dodge = width
ypos = 0.0
height = row[weight_col].sum()

xy = (dodge + i - fill_x_adjustment, ypos)
xy = (dodge + i - fill_x_adjustment, ypos)

ax.add_patch(patches.Rectangle(xy, width, height, **params))
if fill_levels is not None:
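
The stacked branch above computes each rectangle from two masked sums: the height is the total weight for the current `x` level and fill value, and the vertical offset is the total weight of all fill values that sort before it. A plain-Python sketch of that arithmetic, assuming rows are dictionaries with hypothetical `x`, `fill` and `weight` keys rather than a pandas lookup table:

```python
def stacked_bar(rows, x_level, fillval):
    """Return (ypos, height) for one segment of a stacked bar."""
    # Height: all weight recorded for this x level and this fill value.
    height = sum(r["weight"] for r in rows
                 if r["x"] == x_level and r["fill"] == fillval)
    # Offset: weight of every fill value that sorts below the current one.
    ypos = sum(r["weight"] for r in rows
               if r["x"] == x_level and r["fill"] < fillval)
    return ypos, height


rows = [
    {"x": "a", "fill": "p", "weight": 2},
    {"x": "a", "fill": "q", "weight": 3},
    {"x": "a", "fill": "q", "weight": 1},
    {"x": "b", "fill": "p", "weight": 5},
]
print(stacked_bar(rows, "a", "q"))
```

For level `"a"`, the `"q"` segment starts at 2 (the height of `"p"`) and is 4 tall.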
12 changes: 6 additions & 6 deletions ggplot/geoms/geom_boxplot.py
@@ -34,30 +34,30 @@ def plot(self, ax, data, _aes, x_levels):

xticks = []
for (i, xvalue) in enumerate(x_levels):
subset = data[data[variables['x']]==xvalue]
subset = data[data[variables['x']] == xvalue]
xi = np.repeat(i, len(subset))
yvalues = subset[variables['y']]
xticks.append(i)

bounds_25_75 = yvalues.quantile([0.25, 0.75]).values
bounds_5_95 = yvalues.quantile([0.05, 0.95]).values

if self.params.get('outliers', True)==True:
if self.params.get('outliers', True):
mask = ((yvalues > bounds_5_95[1]) | (yvalues < bounds_5_95[0])).values
ax.scatter(x=xi[mask], y=yvalues[mask], c=self.params.get('outlier_color', 'black'))

if self.params.get('lines', True)==True:
if self.params.get('lines', True):
ax.vlines(x=i, ymin=bounds_25_75[1], ymax=bounds_5_95[1])
ax.vlines(x=i, ymin=bounds_5_95[0], ymax=bounds_25_75[0])

if self.params.get('notch', False)==True:
if self.params.get('notch', False):
ax.hlines(bounds_5_95[0], i - 0.25/2, i + 0.25/2, linewidth=2)
ax.hlines(bounds_5_95[1], i - 0.25/2, i + 0.25/2, linewidth=2)

if self.params.get('median', True)==True:
if self.params.get('median', True):
ax.hlines(yvalues.median(), i - 0.25, i + 0.25, linewidth=2)

if self.params.get('box', True)==True:
if self.params.get('box', True):
params = {
'facecolor': 'white',
'edgecolor': 'black',
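
Dropping `== True` from the `self.params.get(...)` checks is idiomatic, but it is not strictly behavior-preserving: a truthiness test accepts any truthy value, while `== True` only accepts values equal to `True`. The sketch below lists where the two checks disagree for a few sample values. `old_check` and `new_check` are hypothetical names for the before and after forms.

```python
def old_check(value):
    """The comparison used before the change."""
    return value == True  # noqa: E712


def new_check(value):
    """The truthiness test used after the change."""
    return bool(value)


candidates = [True, False, 1, 0, 2, "no", None, []]
differences = [v for v in candidates if old_check(v) != new_check(v)]
print(differences)
```

For callers that pass real booleans the two forms agree. Values such as `2` or a non-empty string now enable the option where they previously did not.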
2 changes: 1 addition & 1 deletion ggplot/geoms/geom_density.py
@@ -49,6 +49,6 @@ def plot(self, ax, data, _aes):
params = self._get_plot_args(data, _aes)
variables = _aes.data
x = data[variables['x']]
x = x[x.isnull()==False]
x = x[x.isnull() == False]
x, y = self._calculate_density(x)
ax.plot(x, y, **params)
2 changes: 1 addition & 1 deletion ggplot/geoms/geom_histogram.py
@@ -42,7 +42,7 @@ def plot(self, ax, data, _aes):

variables = _aes.data
x = data[variables['x']]
x = x[x.isnull()==False]
x = x[x.isnull() == False]

if 'binwidth' in self.params:
params['bins'] = np.arange(np.min(x), np.max(x) + self.params['binwidth'], self.params['binwidth'])
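
The `binwidth` branch builds bin edges with `np.arange(min, max + binwidth, binwidth)`. To make the resulting edges concrete, here is a plain-Python sketch that mirrors that call for simple inputs. `bin_edges` is a hypothetical helper, and floating-point widths may behave slightly differently from NumPy because of rounding.

```python
import math


def bin_edges(values, binwidth):
    """Edges from min(values) in steps of binwidth, stopping before max + binwidth."""
    low, high = min(values), max(values)
    # np.arange(start, stop, step) yields ceil((stop - start) / step) items.
    count = int(math.ceil((high + binwidth - low) / float(binwidth)))
    return [low + k * binwidth for k in range(count)]


print(bin_edges([1, 2, 8], 2))
```

With values `[1, 2, 8]` and a width of 2, the last edge (9) lies beyond the maximum, so the largest value still falls inside a bin.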
4 changes: 2 additions & 2 deletions ggplot/geoms/geom_line.py
@@ -37,8 +37,8 @@ def plot(self, ax, data, _aes):
y = data[variables['y']]

nulls = (x.isnull() | y.isnull())
x = x[nulls==False]
y = y[nulls==False]
x = x[nulls == False]
y = y[nulls == False]

if self.is_path:
pass
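
Several geoms in this PR filter missing values with `mask == False`. The PR only adds spacing around the operator. For comparison, this sketch shows that negating the mask with `~`, or using `Series.notnull()`, selects the same rows. The sample series are made up for illustration.

```python
import pandas as pd

x = pd.Series([1.0, float("nan"), 3.0, 4.0])
y = pd.Series([10.0, 20.0, float("nan"), 40.0])

# A row is dropped when either coordinate is missing.
nulls = x.isnull() | y.isnull()

old_style = x[nulls == False]  # noqa: E712  (form used in the diff)
negated = x[~nulls]            # equivalent boolean negation
single_column = x[x.notnull()]  # single-series case, as in geom_density
```

`old_style` and `negated` hold the same values at the same index positions (0 and 3), so the shorter form could be swapped in without changing which points are plotted.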
4 changes: 2 additions & 2 deletions ggplot/geoms/geom_step.py
@@ -37,8 +37,8 @@ def plot(self, ax, data, _aes):
y = data[variables['y']]

nulls = (x.isnull() | y.isnull())
x = x[nulls==False]
y = y[nulls==False]
x = x[nulls == False]
y = y[nulls == False]

xs = [None] * (2 * (len(x)-1))
ys = [None] * (2 * (len(x)-1))
2 changes: 1 addition & 1 deletion ggplot/geoms/geom_tile.py
@@ -55,7 +55,7 @@ def plot(self, ax, data, _aes):
counts = data[[weight, variables['x'] + "_cut", variables['y'] + "_cut"]].groupby([variables['x'] + "_cut", variables['y'] + "_cut"]).count().fillna(0)
weighted = data[[weight, variables['x'] + "_cut", variables['y'] + "_cut"]].groupby([variables['x'] + "_cut", variables['y'] + "_cut"]).sum().fillna(0)

if self.params['interpolate']==False:
if self.params['interpolate'] == False:
def get_xy():
for x in x_bins:
for y in y_bins:
4 changes: 2 additions & 2 deletions ggplot/geoms/geom_violin.py
@@ -27,8 +27,8 @@ def plot(self, ax, data, _aes, x_levels):
variables = _aes.data

xticks = []
for (i, xvalue) in enumerate(x_levels):
subset = data[data[variables['x']]==xvalue]
for i, xvalue in enumerate(x_levels):
subset = data[data[variables['x']] == xvalue]
yi = subset[variables['y']].values

# so this is weird, apparently violinplot is *the only plot that