Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gghighlight doesn't seem to play well with tsibble #110

Open
njtierney opened this issue Jul 18, 2019 · 14 comments
Open

gghighlight doesn't seem to play well with tsibble #110

njtierney opened this issue Jul 18, 2019 · 14 comments

Comments

@njtierney
Copy link

First up - thanks for making such a great package!

Well designed and slick, this is a thing of beauty.

I'm actually getting a bug where gghighlight doens't seem to play well with tsibble data.

I have a reprex below, where in the first instance I have a tsibble, and in the second instance, I convert it to a tibble and get a very different plot.

I'm not sure if I should have maybe posted this under tsibble (cc @earowang), but I am happy to move it there.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(gghighlight)
library(brolgar)

wages_ts %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages)) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() + 
  gghighlight(increase)
#> Warning: Unspecified temporal ordering may yield unexpected results.
#> Suggest to sort by `id`, `xp` first.

#> Warning: Unspecified temporal ordering may yield unexpected results.
#> Suggest to sort by `id`, `xp` first.

wages_ts %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages)) %>%
  as_tibble() %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() + 
  gghighlight(increase)

Created on 2019-07-18 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Owner

Hmm, it seems gghighlight needs to be aware of indices that tsibble uses. Thanks for this good chance to learn about tsibble...

@earowang
Copy link

This warning is meant for interactive analysis such as lag(). It can be safely ignored for plotting, which is probably triggered by dplyr::arrange(). Unfortunately I cannot disable the warning on my side for plotting functions. Alternatively you could as.data.frame() the object up the front to ignore the tsibble object?

@yutannihilation
Copy link
Owner

Oh, thanks. But, I don't think this is just about warnings...

@njtierney
Btw, which plot is the desired result? The second one? Or are both wrong...?

@earowang
Copy link

haven’t noticed the plots difference. i’ll probably look into the issue too.

@njtierney
Copy link
Author

Thanks for looking into this, @yutannihilation and @earowang, The second plot is the correct result.

@yutannihilation
Copy link
Owner

Thanks! Then, the problem now seems simpler than I thought.

It seems dplyr::select() errors here with this message (and this is ignored by tryCatch()):

Error: The result is not a valid tsibble.
Do you need `as_tibble()` to work with data frame?

gghighlight uses only the parts of data and doesn't care about whether it's valid as the original class, so probably it needs to convert the data into tibble or a bare data.frame before calculating the predicates.

I will fix this when I got some time (as I'm not immediately sure where to insert as_tibble() (or fortify()?)...). Meanwhile, please use as_tibble() or as.data.frame() as a workaround.

@earowang
Copy link

A quick workaround could be using [ to replace dplyr::select(), which is silently converted to a tibble.

This also raises another issue that more and more specialist data classes built on top of tibbles with their associated dplyr methods may give contextual errors or warnings for the package code that is context-free (i.e. the package handles a general tabular data input).

@njtierney
Copy link
Author

Hi @yutannihilation - did you want a hand with a PR for this? I think @earowang 's point of using [ instead of dplyr::select() would work well for this.

@yutannihilation
Copy link
Owner

Thanks for the kind reminder! But, I'm still thinking what should happen here. [ seems OK for tsibble, but I'm not fully sure if it's really OK for other data.frame-alikes (e.g. sf).

Let me think a bit more. Nowadays I have more time on coding R, so I expect it won't be so long until I come back to gghighlight :)

@njtierney
Copy link
Author

No worries - and fair enough! I'm not sure what happens with sf for [. As Earo described, it's an interesting issue with building on top of tibbles.

Thanks again for making such a lovely package :)

@yutannihilation
Copy link
Owner

To handle this properly, I think I need to consider these cases:

  1. The object should be passed to ggplot2 as the original class, and it's fine to process them as it is (e.g. sf)
  2. The object should be passed to ggplot2 as the original class, and it can't be processed without converting to a bare data.frame
  3. The object can be passed to ggplot2 as either the original class or a bare data.frame, but it can't be processed without some conversion (e.g. tsibble)

While case 1 and case 3 are straightforward, case 2 seems difficult. Probably, I need to wait for vctrs::vec_proxy() and vctrs::vec_restore().

But, I now feel [ can handle most of the cases; it returns sf for sf, and data.frame for invalid tsibble. Let's employ [ for now and reconsider when I find other problems... Thanks for the suggestion, Earo.

@yutannihilation
Copy link
Owner

@njtierney
Sorry, I found I don't understand the intension of your code. I thought #120 fixes this issue, but it won't...

Do you try to highlight lines whose trends are always increasing? If so, increase is not a proper condition to filter the data by group (so the actual filtering is done by point) and you should write

wages %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() +
  gghighlight(increasing(ln_wages))

and the code above highlights nothing because there are no such lines. Is this what you wanted?

library(dplyr, warn.conflicts = FALSE)
library(brolgar)

wages %>%
  group_by(id) %>%
  summarise(increase = increasing(ln_wages)) %>%
  filter(increase)
#> # A tsibble: 0 x 3 [?]
#> # Key:       id [0]
#> # … with 3 variables: id <int>, xp <dbl>, increase <lgl>

Created on 2019-10-20 by the reprex package (v0.3.0)

(I'm confused partly because of this difference about summarise()...)

library(dplyr, warn.conflicts = FALSE)
library(brolgar)

d <- wages %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages))

d %>% 
  summarise(increase)
#> # A tsibble: 6,402 x 3 [!]
#> # Key:       id [888]
#>       id    xp increase
#>    <int> <dbl> <lgl>   
#>  1    31 0.015 FALSE   
#>  2    31 0.715 FALSE   
#>  3    31 1.73  FALSE   
#>  4    31 2.77  FALSE   
#>  5    31 3.93  FALSE   
#>  6    31 4.95  FALSE   
#>  7    31 5.96  FALSE   
#>  8    31 6.98  FALSE   
#>  9    36 0.315 FALSE   
#> 10    36 0.983 FALSE   
#> # … with 6,392 more rows

d %>%
  as_tibble() %>%
  group_by(id) %>%
  summarise(increase)
#> Error: Column `increase` must be length 1 (a summary value), not 8

Created on 2019-10-20 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Owner

yutannihilation commented Jan 24, 2020

Example from http://brolgar.njtierney.com/articles/exploratory-modelling.html:

library(brolgar)
library(gghighlight)
#> Loading required package: ggplot2
library(dplyr, warn.conflicts = FALSE)

wages_slope <- key_slope(wages,ln_wages ~ xp) %>%
  left_join(wages, by = "id") 

# with `as_tibble()` workaround
p1 <- wages_slope %>% 
  as_tibble() %>% # workaround for gghighlight + tsibble
  ggplot(aes(x = xp, 
             y = ln_wages, 
             group = id)) + 
  geom_line() +
  gghighlight(.slope_xp < 0, use_direct_label = FALSE)
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...

# without `as_tibble()`
p2 <- wages_slope %>% 
  ggplot(aes(x = xp, 
             y = ln_wages, 
             group = id)) + 
  geom_line() +
  gghighlight(.slope_xp < 0, use_direct_label = FALSE)
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...

patchwork::wrap_plots(p1, p2)

Created on 2020-01-24 by the reprex package (v0.3.0)

@yutannihilation
Copy link
Owner

I'm still not sure if the result is expected one (shouldn't this be highlighted group-wise?), but now the result is the same with or without as_tibble()...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants