
[FEATURE REQ] Allow calculating career stats using the calculate_stats() function #501

Closed
1 task done
isaactpetersen opened this issue Dec 21, 2024 · 1 comment

Comments

@isaactpetersen

isaactpetersen commented Dec 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

I'd like to be able to calculate players' career stats (since 1999). I used to be able to do this with the calculate_player_stats() function. However, my understanding is that the calculate_player_stats() function has been superseded by the calculate_stats() function, and I am not able to do this directly with the calculate_stats() function.

If I use the seasonal statistics, I can do post-processing to figure out some of the career statistics (i.e., the variables that can be summed across years, e.g., passing touchdowns). However, some variables cannot be meaningfully summed or averaged across years to get the "true" career statistic (e.g., completion percentage, QBR). These can be estimated from the seasonal statistics with additional post-processing (weighted averages based on the number of games played in each season, etc.), but doing it that way would be hack-ish.
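The issue mentions weighting seasonal rates by games played. A quick check with made-up numbers for one hypothetical quarterback suggests that the weight which reproduces the pooled career rate is the rate's own denominator (attempts, for completion percentage), not games played. This is a base-R sketch for illustration only, not nflfastR code:

```r
# Made-up seasonal line for one hypothetical quarterback.
seasons <- data.frame(
  season      = c(2022, 2023),
  games       = c(17, 4),
  completions = c(400, 60),
  attempts    = c(600, 120)
)

# Per-season completion percentage.
seasons$cmp_pct <- 100 * seasons$completions / seasons$attempts

# Unweighted mean of the seasonal percentages: a short season counts as much as a full one.
naive <- mean(seasons$cmp_pct)

# Weighted by games played, as suggested in the issue.
by_games <- weighted.mean(seasons$cmp_pct, w = seasons$games)

# Weighted by attempts, which is algebraically the same as pooling the raw counts.
by_attempts <- weighted.mean(seasons$cmp_pct, w = seasons$attempts)
pooled      <- 100 * sum(seasons$completions) / sum(seasons$attempts)

print(round(c(naive = naive, by_games = by_games,
              by_attempts = by_attempts, pooled = pooled), 2))
```

Only the attempt-weighted average agrees with the pooled value, which is why rates are usually recomputed from summed counts rather than averaged.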

Describe the solution you'd like

Calculating players' stats across all available seasons (since 1999) used to be possible with calculate_player_stats() and its defensive and kicking counterparts:

nfl_pbp <- nflreadr::load_pbp(seasons = TRUE)

careerStats_offense <- nflfastR::calculate_player_stats(
  nfl_pbp,
  weekly = FALSE)

careerStats_defense <- nflfastR::calculate_player_stats_def(
  nfl_pbp,
  weekly = FALSE)

careerStats_kicking <- nflfastR::calculate_player_stats_kicking(
  nfl_pbp,
  weekly = FALSE)

It would be nice to add this capability to the calculate_stats() function. For instance, it would be helpful to add "career" as an option to the summary_level argument:

calculate_stats(
  seasons = TRUE,
  summary_level = c("season", "week", "career"),
  stat_type = c("player", "team"),
  season_type = c("REG", "POST", "REG+POST")
)

Describe alternatives you've considered

No response

Additional context

No response

@mrcaseb
Member

mrcaseb commented Jan 3, 2025

At the moment, calculate_stats() calculates a total of 118 different variables. Only one of them (!), namely passing_cpoe, cannot simply be summed up. However, the code for it is freely accessible in the nflfastR source and easy to adapt:

# Excerpt from the package source: pbp is the play-by-play data, and
# grp_vars (the grouping variables) is defined elsewhere in that code.
passing_stats_from_pbp <- pbp %>%
  dplyr::filter(.data$play_type %in% c("pass", "qb_spike")) %>%
  dplyr::select(
    "season", "week", "team" = "posteam",
    "player_id" = "passer_player_id", "qb_epa", "cpoe"
  ) %>%
  dplyr::group_by(!!!grp_vars) %>%
  dplyr::summarise(
    passing_epa = sum(.data$qb_epa, na.rm = TRUE),
    # mean will return NaN if all values are NA, because we remove NA
    passing_cpoe = if (any(!is.na(.data$cpoe))) mean(.data$cpoe, na.rm = TRUE) else NA_real_
  ) %>%
  dplyr::ungroup()
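Adapting the cpoe calculation to the career level essentially means grouping by player alone instead of by season and week. The sketch below mirrors the snippet's NA rule in base R on toy play-level data; the data frame and its values are made up, and this is an illustration rather than nflfastR code:

```r
# Toy play-level data (made-up values). In practice these rows would come from
# play-by-play data after the pass/qb_spike filter.
plays <- data.frame(
  player_id = c("QB1", "QB1", "QB1", "QB2", "QB2"),
  season    = c(2022, 2022, 2023, 2023, 2023),
  cpoe      = c(10, -4, 6, NA, NA)
)

# Same rule as the snippet: NA when a player has no non-missing cpoe,
# otherwise the plain mean over all of that player's plays.
career_mean <- function(x) {
  if (any(!is.na(x))) mean(x, na.rm = TRUE) else NA_real_
}

# Career level: group by player_id only; season is deliberately not a key.
career_cpoe <- vapply(split(plays$cpoe, plays$player_id), career_mean, numeric(1))
print(career_cpoe)

# For comparison, averaging QB1's seasonal means would give (3 + 6) / 2 = 4.5,
# whereas the play-level career value is (10 - 4 + 6) / 3 = 4.
```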

All other stats can either be summed or calculated from the summed stats.
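A minimal base-R sketch of "sum, then recompute" on a season-level table. The numbers are made up, and the column names (completions, attempts, passing_yards, passing_tds) are assumed to resemble season-level player output rather than checked against it:

```r
# Made-up season-level player stats.
season_stats <- data.frame(
  player_id     = c("QB1", "QB1", "QB2"),
  season        = c(2022, 2023, 2023),
  completions   = c(350, 280, 300),
  attempts      = c(520, 430, 450),
  passing_yards = c(3900, 3100, 3200),
  passing_tds   = c(25, 18, 20)
)

# Step 1: sum the additive counting stats per player across seasons.
# Note: the formula method of aggregate() drops rows containing NA by default.
career <- aggregate(
  cbind(completions, attempts, passing_yards, passing_tds) ~ player_id,
  data = season_stats,
  FUN  = sum
)

# Step 2: recompute rate stats from the summed counts instead of averaging rates.
career$completion_pct    <- 100 * career$completions / career$attempts
career$yards_per_attempt <- career$passing_yards / career$attempts
print(career)
```

Keeping the counting columns and deriving every rate from them afterwards avoids the weighting question entirely.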

Performing the complete calculation for all available seasons is extremely inefficient and can lead to memory problems on some computers. I will therefore not implement this.
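On the user side, one way to sidestep the memory concern is to process one season at a time and keep only mergeable partial aggregates (a per-player sum and count of cpoe), which combine exactly into the career mean. This is a sketch, not something nflfastR provides; load_season_plays() is a hypothetical stand-in that returns toy data so the example runs offline:

```r
# Hypothetical loader returning made-up rows for one season. In practice this
# could wrap something like nflreadr::load_pbp(seasons = s) plus the pass filter.
load_season_plays <- function(s) {
  toy <- list(
    "2022" = data.frame(player_id = c("QB1", "QB1"), cpoe = c(10, -4)),
    "2023" = data.frame(player_id = c("QB1", "QB2", "QB2"), cpoe = c(6, NA, 2))
  )
  toy[[as.character(s)]]
}

# Mergeable partial aggregate: per-player sum and count of non-missing cpoe.
partial_cpoe <- function(plays) {
  plays <- plays[!is.na(plays$cpoe), , drop = FALSE]
  sums <- rowsum(cbind(cpoe_sum = plays$cpoe, n = rep(1, nrow(plays))),
                 group = plays$player_id)
  data.frame(player_id = rownames(sums), sums, row.names = NULL)
}

# Only the small per-season summaries are kept in memory, never all plays at once.
partials <- lapply(c(2022, 2023), function(s) partial_cpoe(load_season_plays(s)))
all_partials <- do.call(rbind, partials)

# Sum of sums divided by sum of counts equals the mean over all career plays.
# Players whose cpoe is missing on every play drop out here instead of getting NA.
combined <- rowsum(all_partials[, c("cpoe_sum", "n")], group = all_partials$player_id)
career_cpoe <- combined$cpoe_sum / combined$n
names(career_cpoe) <- rownames(combined)
print(career_cpoe)
```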

mrcaseb closed this as not planned on Jan 3, 2025