---
title: "FSU Sports Analytics Series"
opengraph:
  image:
    src: "https://raw.githubusercontent.com/sportsdataverse/fsu-sac/main/media/sdv-blue.png"
twitter:
  site: "@sportsdataverse"
  card: summary_large_image
  creator: "@saiemgilani"
format:
  revealjs:
    css: style.css
    theme: [my_theme.scss]
    slide-number: true
    preview-links: auto
    footer: "Slides link [`fsusac.sportsdataverse.org`](https://fsusac.sportsdataverse.org/){target='_blank'} | [Source code](https://github.com/sportsdataverse/fsu-sac/){target='_blank'} | Author: Saiem Gilani ![](https://img.shields.io/github/followers/saiemgilani?color=eee&label=%40saiemgilani&logo=github&style=for-the-badge){target='_blank'}"
# author: "Saiem Gilani"
#affiliation:
execute:
  echo: true
  eval: false
---
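
```{r setup, include=FALSE}
# Install pak only if it is missing, then use it to install emo (used for the inline emoji)
if (!requireNamespace("pak", quietly = TRUE)) install.packages("pak")
if (!requireNamespace("emo", quietly = TRUE)) pak::pak("hadley/emo")
```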

## FSU Sports Analytics Series {style="text-align: center;"}

::: {style="text-align: center;"}
**Slide Link**: [`fsusac.sportsdataverse.org`](https://fsusac.sportsdataverse.org/)

Saiem Gilani
:::

. . .

## Overview

The topic of our conversation will center on how to access sports data and why you should know your way around the data generated by the sports you are interested in.

. . .

## About me

Saiem Gilani - Lead engineer and founder of the SportsDataverse <br><a href='https://twitter.com/saiemgilani' target='blank'><img src="https://img.shields.io/twitter/follow/saiemgilani?color=blue&label=%40saiemgilani&logo=twitter&style=for-the-badge" alt="@saiemgilani"/></a> <a href='https://github.com/saiemgilani' target='blank'><img src="https://img.shields.io/github/followers/saiemgilani?color=eee&logo=Github&style=for-the-badge" alt="@saiemgilani"/></a>

### Background

Born and raised a Seminole and a proud Tallahassee native. I am an FSU mathematics alumnus and went to graduate school at Georgia Tech for analytics. My general domain of work is machine learning and data science, with a current focus on sports.

## How did you get started? {.smaller}

I attended the FSU Sports Analytics Summit 2020!

::: columns

::: {.column width="100%"}
I wrote up some of my thoughts and observations on incoming coach Mike Norvell's presentation on a handful of analytics-related topics. This became my first article for [Tomahawk Nation](https://www.tomahawknation.com/), the SB Nation blog covering the Seminoles.

```{r, echo=FALSE, eval=TRUE}
knitr::include_graphics("media/tn-article.png")
```
:::

:::

::: callout-tip
## Further reading

- Gilani, Saiem (2020). [*Thoughts on FSU head coach Mike Norvell's sports analytics presentation*](https://www.tomahawknation.com/2020/2/25/21151849/coach-mike-norvell-at-fsu-sports-analytics-summit). Tomahawk Nation.
:::

. . .

## Started meeting great folks online {.smaller}

Simultaneously, I started working with the `{cfbscrapR}` package (now archived) to help write analytics-driven articles.

::: columns

::: {.column width="70%"}
I would not be here without my collaborators from the `cfbscrapR` team:

- [Meyappan Subbaiah](https://twitter.com/msubbaiah1)
- [Parker Fleming](https://twitter.com/statsowar)
:::

::: {.column width="30%"}
```{r cfbscrapR-logo, eval = TRUE, out.width = "45%", echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/saiemgilani/cfbscrapR/master/man/figures/logo.png")
```
:::

:::

::: columns

::: {.column width="70%"}
I quickly got involved contributing to my first open-source package on [GitHub](https://github.com/sportsdataverse) and eventually became a co-author. I then developed the package's successor, [`{cfbfastR}`](https://cfbfastR.sportsdataverse.org).
:::

::: {.column width="30%"}
```{r cfbfastR-logo-1, eval = TRUE, out.width = "45%", echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/sportsdataverse/cfbfastR/main/man/figures/logo.png")
```
:::

:::

. . .

## Went to another conference {.smaller}

- I used my experience from going to the FSU Sports Analytics Summit to sharpen my networking and communication skills
- A couple of weeks later, I went to the 2020 MIT Sloan Sports Analytics Conference
- Got to meet and see some sports analytics celebrities like Seth Partnow, John Hollinger, and [Alok Pattani](https://youtu.be/eNHlsI_8gaw)
- Competed in the Hackathon, an exceptional opportunity to work and chat with very talented individuals about shared research ideas and next steps we could take with our projects

## Everything came to a screeching halt

<div class="tenor-gif-embed" data-postid="16564497" data-share-method="host" data-aspect-ratio="1.14286" data-width="70%" style="text-align: center;justify-content: center"><a href="https://tenor.com/view/rudy-gobert-nba-touching-the-mics-utah-jazz-gif-16564497">Rudy Gobert Nba GIF</a>from <a href="https://tenor.com/search/rudy+gobert-gifs">Rudy Gobert GIFs</a></div> <script type="text/javascript" async src="https://tenor.com/embed.js"></script>

. . .

## Then, I had an idea `r emo::ji("light_bulb")`

I had a thought that I am sure many long-standing members of the sports analytics community have had:

- *what if getting sports data for analysis was easy?*
- *what if we worked together to build the data infrastructure for research?*
- *how much further would we get?*

. . .

## [The SportsDataverse](https://sportsdataverse.org)

::: incremental
- An organization trying to make the sports data and analytics industry more diverse, inclusive, and accessible by providing high-quality resources for end-users and opportunities for practical coding skill development for those who join the effort\
  `r emo::ji("light_bulb")` + `r emo::ji("computer")` + `r emo::ji("chart_increasing")`
- A set of packages for loading and scraping sports data in R, Python, and Node.js, with a focus on play-by-play data\
  <a href="https://r.sportsdataverse.org/" target="_blank" alt="R"> <img src="media/r-project-icon.svg" alt="R" width="40" height="40"/> </a> + <a href="https://py.sportsdataverse.org/" alt="Saiem's Python Packages" target="_blank"> <img src="media/python-original.svg" alt="python" width="40" height="40"/> </a> + <a href="https://js.sportsdataverse.org" target="_blank"> <img src="media/nodejs.png" alt="nodejs" width="60" height="50"/> </a>
:::

. . .

## The strength of the SportsDataverse {.smaller}

::: incremental
- A community of developers committed to building and maintaining open-source sports data packages and pipelines as ongoing public utilities\
  `r emo::ji("group")` + `r emo::ji("speech_balloon")` + `r emo::ji("coder")` + `r emo::ji("package")`
- A set of corresponding data repositories that let users load the data quickly and that collectively form one of the largest open-source sports data resources, with **over 250 GB of data produced** by the packages I contribute to\
  `r emo::ji("key")` + `r emo::ji("crown")`
- Our organization helps build a bench of developers from diverse backgrounds to spearhead projects and make contributions
:::

. . .

## Our progress so far <a href="https://r.sportsdataverse.org/" target="_blank" alt="R"> <img src="media/r-project-icon.svg" alt="R" width="80" height="80"/> </a> {.smaller}

[20+ R packages](https://github.com/sportsdataverse) covering over a dozen sports leagues.

::: columns

::: {.column width="50%"}
**Pro Leagues**

- NBA
- WNBA
- NBA G-League
- MLB
- NHL
- Premier Hockey Federation
- NWSL
- A boatload of soccer leagues
:::

::: {.column width="50%"}
**Collegiate Leagues**

- College Football
- Men's College Basketball
- Women's College Basketball
- College Baseball
- College Softball
- College Football Recruiting
- College Basketball Recruiting
:::

:::

. . .

## Our progress so far <a href="https://py.sportsdataverse.org/" alt="Saiem's Python Packages" target="_blank"> <img src="media/python-original.svg" alt="python" width="80" height="80"/> </a> + <a href="https://js.sportsdataverse.org" target="_blank"> <img src="media/nodejs.png" alt="nodejs" width="84" height="70"/> </a>

The [`sportsdataverse`](https://py.sportsdataverse.org/) Python module provides access to loadable SDV-provided data and functions, as well as ESPN endpoints. Additional modules include [`sportypy`](https://sportypy.sportsdataverse.org/), [`collegebaseball`](https://collegebaseball.readthedocs.io/en/latest/index.html), [`nwslpy`](https://github.com/nwslR/nwslpy), and [`recruitR-py`](https://github.com/sportsdataverse/recruitR-py/). <br> <br>

The [`sportsdataverse`](https://js.sportsdataverse.org/) Node.js module provides access to ESPN endpoints (among other websites) for easy web application development.

. . .

## Why use the SportsDataverse? {.smaller}

The first public conversation on the SportsDataverse projects happened at the 2021 Carnegie Mellon Sports Analytics Conference. The [paper](https://www.stat.cmu.edu/cmsac/conference/2021/assets/pdf/SaiemGilani.pdf) I wrote for the conference was selected as the winner of the Data and Software Contribution, Open Track, in the conference's reproducible research competition.

- It was built for enthusiasts like you and for soon-to-be entrants into the field
- Allows users to quickly access seasons' worth of datasets (updated nightly via automated [GitHub Actions](https://github.com/sportsdataverse/hoopR-nba-data/actions)) through loading function calls, taking the burden of maintaining web scraping scripts off users
- This, in turn, makes reproducible research and reporting significantly easier

::: callout-tip
### Further reading {.smaller}

- Gilani, Saiem (2021). [*The SportsDataverse: An Open Sports Data Initiative*](https://www.stat.cmu.edu/cmsac/conference/2021/assets/pdf/SaiemGilani.pdf). 2021 CMU Sports Analytics Conference.
:::

. . .

## Great... but how does this affect me? {.smaller}

### Well, there is a fairly direct pipeline from...

- contributing to open-source projects
- using open-source resources to create your own open-source sports analytics projects and portfolio
- being an active member of the open-source sports analytics community

### **...to getting a job in Sports Analytics**

::: callout-tip
### More on this topic

- Ventura, Sam (29 October 2022). [*Open-Sourcing the Sports Analytics Hiring Process*](https://youtu.be/_kaGsjbSIPg). 2022 CMU Sports Analytics Conference.
:::

## Using interview projects

- As more teams and organizations adopt these principles, you may get opportunities to share projects created during the interview process on GitHub and in your portfolio
- I am particularly grateful to the Brooklyn Nets for giving me the opportunity to add a project I built in under a couple of weeks to my portfolio

[Blazing the Nets](https://blazingthenets.com)

## Get on GitHub! <a href="https://github.com/" alt="GitHub" target="_blank"> <img src="media/github-mark.png" alt="GitHub" width="80" height="80"/> </a>

- Sign up for an account on [GitHub.com](https://github.com)
- If you're a student, sign up for the extremely generous [GitHub Student Developer Pack](https://education.github.com/pack)
- Start sharing your code and projects online **so that people can see them**
- Build out a portfolio of interesting research topics, data visualizations, and web applications

## Then share on social media!

- There are many examples of people getting hired on the strength of analysis they shared on Twitter
- Build a following to increase your network reach
- Be prepared for some not-so-nice things to be said about your work
- Take feedback constructively and incrementally improve your projects

. . .

## Who knows what can happen?

```{r rockets-email, eval = TRUE, out.width = "45%", echo = FALSE}
knitr::include_graphics("media/rockets_email.png")
```

## So, what do you really do? {.smaller}

I make data things work together to create models, reports, and applications that other stakeholders use in their decision-making (or that perhaps feed upstream/downstream processes).

::: callout-note
### It's about *knowledge* and not just *code*

Tasks can include:

- Creating a useful model for player and team evaluation
- Producing a nightly report of games and player box scores using internal metrics and methods
- Developing APIs and database methods for evolving data service provider offerings
- ... doing it better, communicating it better
:::

. . .

## Some Inspirations and Heroes {.smaller}

#### My beautiful and brilliant wife, Madiha, and my family

::: columns

::: {.column width="50%"}
#### My collaborators from the `cfbfastR` team:

- Akshay Easwaran
- Jared Lee
- Eric Hess
:::

::: {.column width="50%"}
```{r cfbfastR-logo, eval = TRUE, out.width = "20%", echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/saiemgilani/cfbfastR/master/man/figures/logo.png")
```
:::

:::

::: columns

::: {.column width="50%"}
#### The creator of [CollegeFootballData.com](https://collegefootballdata.com):

- Bill Radjewski
:::

::: {.column width="50%"}
```{r cfbd-logo, eval = TRUE, out.width = "23%", echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/saiemgilani/The_SportsDataverse_Initiative/main/figures/CFBDLogo.png")
```
:::

:::

::: columns

::: {.column width="50%"}
#### The `nflverse` team:

- Sebastian Carl
- Ben Baldwin
- Tan Ho
:::

::: {.column width="50%"}
```{r nflverse-logo, eval = TRUE, out.width = "23%", echo = FALSE}
knitr::include_graphics("https://raw.githubusercontent.com/nflverse/nflverse/main/man/figures/logo.png")
```
:::

:::

. . .

## Thank you

- FSU Sports Analytics Club for creating a wonderful speaker series
- The seriously awesome community of developers that helps build and maintain resources
- All y'all for listening in

. . .

## Learn more

- [`sportsdataverse`.org](https://www.sportsdataverse.org/) <a href='https://twitter.com/sportsdataverse' target='blank'><img src="https://img.shields.io/twitter/follow/sportsdataverse?color=blue&label=%40SportsDataverse&logo=twitter&style=for-the-badge" alt="@sportsdataverse"/></a>
- [`cfbfastR` docs](https://saiemgilani.github.io/cfbfastR/) <a href='https://twitter.com/cfbfastR' target='blank'><img src="https://img.shields.io/twitter/follow/cfbfastR?color=blue&label=%40cfbfastR&logo=twitter&style=for-the-badge" alt="@cfbfastR"/></a>
- [Game on Paper](https://gameonpaper.com/cfb/) - for a look at the `sportsdataverse` Python package serving live advanced stats with expected points and win probability metrics.

<a href='https://twitter.com/saiemgilani' target='blank'><img src="https://img.shields.io/twitter/follow/saiemgilani?color=blue&label=%40saiemgilani&logo=twitter&style=for-the-badge" alt="@saiemgilani"/></a> <a href='https://github.com/saiemgilani' target='blank'><img src="https://img.shields.io/github/followers/saiemgilani?color=eee&logo=Github&style=for-the-badge" alt="@saiemgilani"/></a>

## Questions? {style="text-align: center;"}