---
editor_options:
  markdown:
    wrap: 72
---
# Career & Community
These books aren't all strictly R focussed, but they do have a lot of relevance for many R programmers.
## Build a Career in Data Science
[Emily Robinson](https://twitter.com/robinson_es) and [Jacqueline Nolis](https://twitter.com/skyetetra)
You are going to need more than technical knowledge to succeed as a data
scientist. Build a Career in Data Science teaches you what school leaves
out, from how to land your first job to the lifecycle of a data science
project, and even how to become a manager.
Extensive free preview; paid $20
<https://www.manning.com/books/build-a-career-in-data-science>
## Conversations On Data Science
Roger Peng and Hilary Parker
This book collects many of their discussions from the podcast [**Not So
Standard Deviations**](https://soundcloud.com/nssd-podcast) and distills
them into a readable format.
Pay what you want for the ebook, minimum $0.00
<https://leanpub.com/conversationsondatascience>
## Executive Data Science
Brian Caffo, [Roger D. Peng](https://twitter.com/rdpeng), and [Jeffrey
Leek](https://twitter.com/jtleek)
A Guide to Training and Managing the Best Data Scientists. Learn what
you need to know to begin assembling and leading a data science
enterprise.
Pay what you want for the PDF, minimum $0.00
<https://leanpub.com/eds>
## Essays on Data Analysis
Roger Peng
This book draws a complete picture of the data analysis process, filling
out many details that are missing from previous presentations. It
presents a new perspective on what makes for a successful data analysis
and how the quality of data analyses can be judged.
Pay what you want for the ebook, minimum $0.00
<https://leanpub.com/dataanalysisessays>
## Getting Started in Data Science
[Ayodele Odubela](https://twitter.com/DataSciBae)
This book is for anyone interested in Data Science but unsure where
to start. Cut through the noise and learn my best tips for understanding
Machine Learning with insight from my 4 years of industry experience.
Learn the math as it applies to real-life data projects and get an
understanding of fairness, ethics, and accountability in AI.
(Not R focussed but looks good so I'm adding it - Oscar :) )
Paid \~\$30
<https://gumroad.com/l/getting-started-in-data-science>
## Hiring Data Scientists and Machine Learning Engineers
[Roy Keyes](https://twitter.com/roycoding)
It's quite possible that the only thing more confusing than defining data science is actually hiring data scientists. Hiring Data Scientists and Machine Learning Engineers is a concise, practical guide to cut through the confusion. Whether you're the founder of a brand new startup, the senior vice president in charge of "digital transformation" at a global industrial company, the leader of a new analytics effort at a non-profit, or a junior manager of a machine learning team at a tech giant, this book will help walk you through the important questions you need to answer to determine what role and which skills you should hire for, how to source applicants, how to assess those applicants' skills, and how to set your new hires up for success. Special emphasis is placed on in-office vs remote hiring situations.
Paid, varies ~$34
https://leanpub.com/dshiring
## Introduction to Machine Learning Interviews Book
[Chip Huyen](https://huyenchip.com/)
This book is the result of the collective wisdom of many people who have sat on both sides of the table and who have spent a lot of time thinking about the hiring process. It was written with candidates in mind, but hiring managers who saw the early drafts told me that they found it helpful to learn how other companies are hiring, and to rethink their own process.
The book consists of two parts. The first part provides an overview of the machine learning interview process, what types of machine learning roles are available, what skills each role requires, what kinds of questions are often asked, and how to prepare for them. This part also explains the interviewers’ mindset and what kind of signals they look for.
The second part consists of over 200 knowledge questions, each noted with its level of difficulty -- interviews for more senior roles should expect harder questions -- that cover important concepts and common misconceptions in machine learning.
https://huyenchip.com/ml-interviews-book/
## Telling Stories With Data
[Rohan Alexander](https://twitter.com/RohanAlexander)
The aim of this book is to help you learn how to tell stories with data. It establishes a foundation on which you can build and share knowledge, based on data, about an aspect of the world of interest to you.
In this book we explore, prod, push, manipulate, knead, and ultimately, try to understand the implications of, data. The motto of the university from which I took my PhD is ‘Naturam primum cognoscere rerum’ or roughly ‘first to learn the nature of things,’ and we will indeed attempt to do that. But the original quote continues ‘temporis aeterni quoniam,’ or roughly ‘for eternal time,’ and it is tools, approaches, and workflows that enable you to establish lasting knowledge that I focus on in this book.
https://www.tellingstorieswithdata.com/
## Twitter for R Programmers
[Oscar Baruffa](https://twitter.com/OscarBaruffa), [Veerle van
Son](https://twitter.com/veerlevanson)
The R community is very active on Twitter. You can learn a lot about the
language, about new approaches to problems, make friends and even land a
job or next contract. It's a real-time pulse of the R community. What can
you gain from becoming active on Twitter? This book will talk about the
benefits and it will show you how to use Twitter.
<https://www.t4rstats.com>
## Twitter for Scientists
Not R-specific, but as many R users are also Scientists this book can
really help.
<https://t4scientists.com/>
## The Programmer's Brain: What every programmer needs to know about cognition
[Felienne Hermans](https://twitter.com/Felienne)
Explores the way your brain works when it’s thinking about code. In it, you’ll master practical ways to apply these cognitive principles to your daily programming life. You’ll improve your code comprehension by turning confusion into a learning tool, and pick up awesome techniques for reading code and quickly memorizing syntax. This practical guide includes tips for creating your own flashcards and study resources that can be applied to any new language you want to master. By the time you’re done, you’ll not only be better at teaching yourself—you’ll be an expert at bringing new colleagues and junior programmers up to speed.
Free & Paid ~$30
[Manning publications link](https://www.manning.com/books/the-programmers-brain?utm_source=felienne&utm_medium=affiliate&utm_campaign=book_hermans2_programmers_12_8_20&a_aid=felienne&a_bid=d7c7c538)
## Project Management Fundamentals for Data Analysts
[Oscar Baruffa](https://twitter.com/OscarBaruffa)
In Project Management Fundamentals for Data Analysts, I’ve boiled the concepts down to the bare essentials which can be read in under 15 minutes – you can certainly fit that into your crazy schedule (and it will help your future schedule not be so chaotic!).
These concepts can be used to great effect on their own if you wish to never read another word on the topic. It’ll also provide a solid foundation if you want to dive deeper into more formal courses or sophisticated theory.
Paid $15
https://oscarbaruffa.com/pm/
# Archeology
## How To Do Archaeological Science Using R
Ben Marwick (editor)
Archaeological science is becoming increasingly complex, and progress in this area is slowed by a critical limitation: journal articles lack the space to communicate new methods in enough detail to allow others to reproduce and reuse new research. One solution to this is to use a programming language such as R to analyse archaeological data, with authors sharing their R code with their publications to communicate their methods. This practice is becoming widespread in many other disciplines, but few archaeologists currently know how to use R or have an opportunity to learn during their training. In this forum we tackle this problem by discussing ubiquitous research methods of immediate relevance to most archaeologists, by using interactive, live-coded demonstrations of R code by archaeologists who program with R. Topics include getting data into R, working with C14 dates, spatial analysis and map-making, conducting simulations, and exploratory data visualizations.
https://benmarwick.github.io/How-To-Do-Archaeological-Science-Using-R/
## Quantitative Methods in Archaeology Using R
The first hands-on guide to using the R statistical computing system written specifically for archaeologists. It shows how to use the system to analyze many types of archaeological data. Part I includes tutorials on R, with applications to real archaeological data showing how to compute descriptive statistics, create tables, and produce a wide variety of charts and graphs. Part II addresses the major multivariate approaches used by archaeologists, including multiple regression (and the generalized linear model); multiple analysis of variance and discriminant analysis; principal components analysis; correspondence analysis; distances and scaling; and cluster analysis. Part III covers specialized topics in archaeology, including intra-site spatial analysis, seriation, and assemblage diversity.
Loan or buy ~$100
[Cambridge Core site](https://www.cambridge.org/core/books/quantitative-methods-in-archaeology-using-r/DEAE593FA2418EA3B8ECD538C34ED2D5?fbclid=IwAR0guclfEtttfDkVKNUJWfhQ1wgUlXSKAIA3f_6D3hS_9EkUKivSY9AyFD8)
# Big Data
## Exploring, Visualizing, and Modeling Big Data with R
[Okan Bulut](https://twitter.com/drokanbulut), [Christopher
Desjardins](https://github.com/cddesja)
Working with BIG DATA requires a particular suite of data analytics
tools and advanced techniques, such as machine learning (ML). Many of
these tools are readily and freely available in R. This full-day session
will provide participants with a hands-on training on how to use data
analytics tools and machine learning methods available in R to explore,
visualize, and model big data.
<https://okanbulut.github.io/bigdata/>
## Mastering Spark with R
Javier Luraschi, Kevin Kuo, Edgar Ruiz
In this book you will learn how to use Apache Spark with R. The book
intends to take someone unfamiliar with Spark or R and help you become
proficient by teaching you a set of tools, skills and practices
applicable to large-scale data science.
PS the first chapter has a Jon Snow quote ;)
<https://therinspark.com/>
# Blogdown
## blogdown: Creating Websites with R Markdown
We introduce an R package, blogdown, in this short book, to teach you
how to create websites using R Markdown and Hugo.
<https://bookdown.org/yihui/blogdown/>
## Create, Publish, and Analyze Personal Websites Using R and RStudio
[Danny Morris](https://r4sites-book.netlify.app/get-help.html#connect-with-me)
A free, digital handbook with step-by-step instructions for launching your own personal website using R, RStudio, and other freely available technologies including GitHub, Hugo, Netlify, and Google Analytics.
https://r4sites-book.netlify.app/
# Bookdown
## bookdown: Authoring Books and Technical Documents with R Markdown
This short book introduces an R package, bookdown, to change your
workflow of writing books. It should be technically easy to write a
book, visually pleasant to view the book, fun to interact with the book,
convenient to navigate through the book, straightforward for readers to
contribute or leave feedback to the book author(s), and more
importantly, authors should not always be distracted by typesetting
details.
<https://bookdown.org/yihui/bookdown/>
## A Minimal Book Example
This is a sample book written in Markdown.
<https://benmarwick.github.io/bookdown-ort/>
# Data Science
## R for Data Science
[Hadley Wickham](https://twitter.com/hadleywickham) and [Garrett
Grolemund](https://twitter.com/StatGarrett)
This is the website for "R for Data Science". This book will teach you
how to do data science with R: You'll learn how to get your data into R,
get it into the most useful structure, transform it, visualise it and
model it. In this book, you will find a practicum of skills for data
science. Just as a chemist learns how to clean test tubes and stock a
lab, you'll learn how to clean data and draw plots---and many other
things besides. These are the skills that allow data science to happen,
and here you will find the best practices for doing each of these things
with R. You'll learn how to use the grammar of graphics, literate
programming, and reproducible research to save time. You'll also learn
how to manage cognitive resources to facilitate discoveries when
wrangling, visualising, and exploring data.
<https://r4ds.had.co.nz/>
## R for Data Science Solutions
Solutions for the Wickham and Grolemund R4DS book
<https://jrnold.github.io/r4ds-exercise-solutions/>
*Yet another 'R for Data Science' study guide*
An alternative set of solutions for R4DS.
<https://brshallo.github.io/r4ds_solutions/>
## Everyday Data Science
Andrew Carr
Everyday data science is a collection of tools and techniques you can use to master data science in your day-to-day life. There are case studies, tutorials, code snippets, pictures, math, and jokes, all designed as a fun introduction to the world of data science. Some example chapters include A/B testing to make perfect lemonade, word vectors to improve your resume, differential equations for weight loss, and how a man used statistics to qualify for the Olympics. Life is full of decisions. We, as people, have the remarkable ability to make decisions in the face of uncertainty. We, as humans, have only recently developed the ability to use computers to process vast amounts of data to improve our decision making. This innovation has led to the development of the field of Data Science. This book is written to give tools and inspiration to aspiring decision makers. You make decisions daily and the methodology of data science can help.
Paid ~$8
https://gumroad.com/l/everydaydata
## An Introduction to Data Analysis
Michael Franke
This book provides basic reading material for an introduction to data analysis. It uses R to handle, plot and analyze data. After covering the use of R for data wrangling and plotting, the book introduces key concepts of data analysis from a Bayesian and a frequentist tradition. This text is intended for use as a first introduction to statistics for an audience with some affinity towards programming, but no prior exposure to R.
https://michael-franke.github.io/intro-data-analysis/index.html
## Introduction to Data Science
Rafael A Irizarry
The demand for skilled data science practitioners in industry, academia,
and government is rapidly growing. This book introduces concepts and
skills that can help you tackle real-world data analysis challenges. It
covers concepts from probability, statistical inference, linear
regression, and machine learning. It also helps you develop skills such
as R programming, data wrangling with dplyr, data visualization with
ggplot2, algorithm building with caret, file organization with
UNIX/Linux shell, version control with Git and GitHub, and reproducible
document preparation with knitr and R markdown.
<https://rafalab.github.io/dsbook/>
Pay what you want for PDF, minimum \$0.00
<https://leanpub.com/datasciencebook>
## Data Science: A First Introduction
[Tiffany-Anne Timbers](https://twitter.com/TiffanyTimbers), Trevor
Campbell, and Melissa Lee
This is an open source textbook aimed at introducing undergraduate
students to data science. It was originally written for the University
of British Columbia's DSCI 100 - Introduction to Data Science course. In
this book, we define data science as the study and development of
reproducible, auditable processes to obtain value (i.e., insight) from
data.
<https://ubc-dsci.github.io/introduction-to-datascience/>
## Data Science at the Command Line, 2e
Jeroen Janssens
This book is about doing data science at the command line. Our aim is to make you a more efficient and productive data scientist by teaching you how to leverage the power of the command line.
https://www.datascienceatthecommandline.com/2e/
## Practical Data Science with R, Second Edition
Nina Zumel and John Mount
Practical Data Science with R, Second Edition takes a practice-oriented
approach to explaining basic principles in the ever expanding field of
data science. You'll jump right to real-world use cases as you apply the
R programming language and statistical analysis techniques to carefully
explained examples based in marketing, business intelligence, and
decision support.
<https://www.manning.com/books/practical-data-science-with-r-second-edition#toc>
## R Programming for Data Science
Roger Peng
This book is about the fundamentals of R programming. You will get
started with the basics of the language, learn how to manipulate
datasets, how to write functions, and how to debug and optimize code.
With the fundamentals provided in this book, you will have a solid
foundation on which to build your data science toolbox.
<https://bookdown.org/rdpeng/rprogdatascience/>
## Exploratory Data Analysis... by Roger D. Peng
Roger Peng
This book teaches you to use R to effectively visualize and explore
complex datasets. Exploratory data analysis is a key part of the data
science process because it allows you to sharpen your question and
refine your modeling strategies. This book is based on the
industry-leading Johns Hopkins Data Science Specialization.
Pay what you want, minimum \$0.00
<https://leanpub.com/exdata>
## edav.info/
Zach Bogart, Joyce Robbins
With this resource, we try to give you a curated collection of tools and
references that will make it easier to learn how to work with data in R.
In addition, we include sections on basic chart types/tools so you can
learn by doing.
There are also several walkthroughs where we work with data and discuss
problems as well as some tips/tricks that will help you.
<https://edav.info/>
## APS 135: Introduction to Exploratory Data Analysis with R
Dylan Z. Childs
This is the online course book for the Introduction to Exploratory Data
Analysis with R component of APS 135, a module taught by the Department
of Animal and Plant Sciences at the University of Sheffield. You will
be introduced to the R ecosystem. You will learn how to use R to carry
out data manipulation and visualisation. This book provides a foundation
for learning statistics later on.
<https://dzchilds.github.io/eda-for-bio/>
## The Art of Data Science
[Roger D. Peng](https://twitter.com/rdpeng) and Elizabeth Matsui
A Guide for Anyone Who Works with Data
This book describes the process of analyzing data. The authors have
extensive experience both managing data analysts and conducting their
own data analyses, and this book is a distillation of their experience
in a format that is applicable to both practitioners and managers in
data science. Printed copies are available through
[Lulu](https://www.lulu.com/content/paperback-book/the-art-of-data-science/18733039).
Pay what you want for the ebook, minimum \$0.00
<https://leanpub.com/artofdatascience>
## The Elements of Data Analytic Style
[Jeffrey Leek](https://twitter.com/jtleek)
Data analysis is at least as much art as it is science. This book is
focused on the details of data analysis that sometimes fall through the
cracks in traditional statistics classes and textbooks. It is based in
part on the author's blog posts, lecture materials, and tutorials.
Pay what you want for the ebook, minimum \$0.00
<https://leanpub.com/datastyle>
## Beginning Data Science in R
[Thomas Mailund](https://twitter.com/ThomasMailund)
Beginning Data Science in R details how data science is a combination of
statistics, computational science, and machine learning. You'll see how
to efficiently structure and mine data to extract useful patterns and
build mathematical models. It is aimed at those with some data science
or analytics background, but not necessarily experience with the R
programming language.
Paid, \~\$40
<https://amzn.to/2Ns1HHi>
## Business Intelligence with R
[Dwight Barry](https://twitter.com/healthstatsdude)
A desktop reference for busy professionals, giving you fingertip access
to a variety of BI analytic methods done in R as simply as possible.
All proceeds will support mitochondrial disorder research at Seattle
Children's Hospital.
Free or up to \$20 for a good cause!
<https://leanpub.com/businessintelligencewithr>
## R Data Science Quick Reference
[Thomas Mailund](https://twitter.com/ThomasMailund)
In this book, you'll learn about the following APIs and packages that
deal specifically with data science applications: readr, tibble,
forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2,
modelr, and more.
Paid, \~\$30
<https://amzn.to/2WN1mQy>
## Modern Data Science with R
Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton
This book is intended for readers who want to develop the appropriate skills to tackle complex data science projects and “think with data” (as coined by Diane Lambert of Google). The desire to solve problems using data is at the heart of our approach.
We acknowledge that it is impossible to cover all these topics in any level of detail within a single book: Many of the chapters could productively form the basis for a course or series of courses. Instead, our goal is to lay a foundation for analysis of real-world data and to ensure that analysts see the power of statistics and data analysis. After reading this book, readers will have greatly expanded their skill set for working with these data, and should have a newfound confidence about their ability to learn new technologies on-the-fly.
This book was originally conceived to support a one-semester, 13-week undergraduate course in data science. We have found that the book will be useful for more advanced students in related disciplines, or analysts who want to bolster their data science skills. At the same time, Part I of the book is accessible to a general audience with no programming or statistics experience.
https://mdsr-book.github.io/mdsr2e/
## Modern Statistics with R
[Måns Thulin](https://twitter.com/mansthulin)
This book covers the fundamentals of data science and statistics. The
first half deals with the basics of R and R coding, data wrangling,
exploratory data analysis and more advanced programming. The second
half deals with modern statistics (favouring permutation tests, the
bootstrap and Bayesian methods over traditional asymptotic methods),
regression models and predictive modelling. It also contains information
about debugging and explanations of 25 commonly encountered error
messages in R. In addition, there are 170 or so exercises with fully
worked solutions.
<http://www.modernstatisticswithr.com/>
## Model-Based Clustering and Classification for Data Science
[Charles Bouveyron](https://twitter.com/cbouveyron), [Gilles
Celeux](https://www.imo.universite-paris-saclay.fr/~celeux/), [T.
Brendan Murphy](https://twitter.com/tbmurphy), and [Adrian E.
Raftery](https://twitter.com/AdrianRaftery1)
Among the broad field of statistical and machine learning, model-based
techniques for clustering and classification have a central position for
anyone interested in exploiting those data. This text book focuses on
the recent developments in model-based clustering and classification
while providing a comprehensive introduction to the field. It is aimed
at advanced undergraduates, graduates or first year PhD students in data
science, as well as researchers and practitioners.
<https://math.unice.fr/~cbouveyr/MBCbook/>
# Data Visualization
## ggplot2: Elegant Graphics for Data Analysis
Hadley Wickham
ggplot2 is an R package for producing statistical, or data, graphics.
Unlike most other graphics packages, ggplot2 has an underlying grammar,
based on the Grammar of Graphics (Wilkinson 2005), that allows you to
compose graphs by combining independent components. This makes ggplot2
powerful. Rather than being limited to sets of pre-defined graphics, you
can create novel graphics that are tailored to your specific problem.
<https://ggplot2-book.org/>
## A ggplot2 Tutorial for Beautiful Plotting in R
[Cédric Scherer](https://twitter.com/CedScherer)
(Oscar: Not a book per se, but it should be, so I'm adding it!)
A mega tutorial of creating great ggplot2 visuals.
<https://cedricscherer.netlify.app/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/>
## ggplot2 in 2
Lucy D'Agostino McGowan
Pay what you want, minimum \$4.99
Really good overview of ggplot2. The premise is that you'll cover the
fundamentals in 2 hours. Oscar Baruffa made a sped-up
[screencast](https://youtu.be/_G7_J8M9588) while working through it. It
did take 2 hours :).
<https://leanpub.com/ggplot2in2>
## Data Visualization - A practical introduction
[Kieran Healy](https://twitter.com/kjhealy)
This book is a hands-on introduction to the principles and practice of
looking at and presenting data using R and ggplot.
The book is free online.
<https://socviz.co/>
## Data Processing & Visualization
This document provides some tools, demonstrations, and more to make data
processing, programming, modeling, visualization, and presentation
easier. While the programming language focus is on R, where applicable
(which is most of the time), Python notebooks are also available.
<https://m-clark.github.io/data-processing-and-visualization/>
## Data Visualization in R
Brooke Anderson
Workshop for the 2019 Navy and Marine Corps Public Health Conference. I
have based this workshop on examples for you to try yourself, because
you won't be able to learn how to program unless you try it out. I've
picked example data that I hope will be interesting to Navy and Marine
Corps public health researchers and practitioners.
<https://geanders.github.io/navy_public_health/index.html#prerequisites>
## Data Visualization with R
Rob Kabacoff
This book helps you create the most popular visualizations - from quick
and dirty plots to publication-ready graphs. The text relies heavily on
the ggplot2 package for graphics, but other approaches are covered as
well.
<https://rkabacoff.github.io/datavis/>
## Data visualisation using R, for researchers who don’t use R
[Emily Nordmann](https://twitter.com/emilynordmann), [Phil McAleer](https://twitter.com/McAleerP), [Wilhelmiina Toivo](https://twitter.com/wtoivo1), [Helena Paterson](https://twitter.com/PatersonHelena), [Lisa DeBruine](https://twitter.com/LisaDeBruine)
In this tutorial, we aim to provide a practical introduction to data visualisation using R, specifically aimed at researchers who have little to no prior experience of using R. First we detail the rationale for using R for data visualisation and introduce the “grammar of graphics” that underlies data visualisation using the ggplot package. The tutorial then walks the reader through how to replicate plots that are commonly available in point-and-click software such as histograms and boxplots, as well as showing how the code for these “basic” plots can be easily extended to less commonly available options such as violin-boxplots.
https://psyteachr.github.io/introdataviz/
## R Graphics Cookbook, 2nd edition
The goal of the cookbook is to provide solutions to common tasks and
problems in analyzing data.
The book is free online.
<https://r-graphics.org/>
## plotly Interactive web-based data visualization with R, plotly, and shiny
Carson Sievert
In this book, you'll gain insight and practical skills for creating
interactive and dynamic web graphics for data analysis from R. It makes
heavy use of plotly for rendering graphics, but you'll also learn about
other R packages that augment a data science workflow, such as the
tidyverse and shiny. Along the way, you'll gain insight into best
practices for visualization of high-dimensional data, statistical
graphics, and graphical perception.
<https://plotly-r.com/>
## Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code
Jack Dougherty, Ilya Ilyankou
(Oscar: looks like an amazing resource and includes code templates!)
In this book, you'll learn how to create true and meaningful data
visualizations through chapters that blend design principles and
step-by-step tutorials, in order to make your information-based analysis
and arguments more insightful and compelling. Just as sentences become
more persuasive with supporting evidence and source notes, your
data-driven writing becomes more powerful when paired with appropriate
tables, charts, or maps. Words tell us stories, but visualizations show
us data stories by transforming quantitative, relational, or spatial
patterns into images. When visualizations are well-designed, they draw
our attention to what is most important in the data in ways that would
be difficult to communicate through text alone.
<https://handsondataviz.org/>
## BBC Visual and Data Journalism cookbook for R graphics
At the BBC data team, we have developed an R package and an R cookbook
to make the process of creating publication-ready graphics in our
in-house style using R's ggplot2 library a more reproducible process, as
well as making it easier for people new to R to create graphics.
<https://bbc.github.io/rcookbook/>
## Fundamentals of Data Visualization
[Claus Wilke](https://twitter.com/ClausWilke)
The book is meant as a guide to making visualizations that accurately
reflect the data, tell a story, and look professional.
The book is free online.
<https://clauswilke.com/dataviz/>
## Graphical Data Analysis with R
[Antony Unwin](http://www.gradaanwr.net/author/)
The main aim of the book is to show, using real datasets, what
information graphical displays can reveal in data. The target readership
includes anyone carrying out data analyses who wants to understand their
data using graphics.
The book is published by CRC Press and [available to
purchase](https://www.routledge.com/Graphical-Data-Analysis-with-R/Unwin/p/book/9781498715232),
but all the examples and code are freely available on a comprehensive
website accompanying the text at <http://www.gradaanwr.net/>
## JavaScript for R
[John Coene](https://john-coene.com)
Learn how to build your own data visualisation packages, improve shiny
with JavaScript, and use JavaScript for computations.
Freely available online, paid print.
<https://javascript-for-r.com>
# Field specific
## Analyzing Financial and Economic Data with R
Marcelo S. Perlin
Not surprisingly, in fields with abundant access to data and practical
applications, such as economics and finance, a graduate student or a
data analyst is expected to have learned at least one programming
language that allows them to do their work efficiently. Learning how to
program is becoming a requisite for the job market.
<https://www.msperlin.com/afedR/>
## Computer-age Calculus with R
Daniel Kaplan
R is closely associated with statistics, but not with calculus. It turns
out that R is an excellent language for doing calculus.
This book shows how to do common calculus calculations using R.
<https://dtkaplan.github.io/RforCalculus/>
## Data Science in Education Using R
Ryan A. Estrellado, Emily A. Bovee, Jesse Mostipak, Joshua M. Rosenberg,
and Isabella C. Velásquez
Dear Data Scientists, Educators, and Data Scientists who are Educators:
This book is a warm welcome and an invitation. If you're a data
scientist in education or an educator in data science, your role isn't
exactly straightforward. This book is our contribution to a growing
movement to merge the paths of data analysis and education. We wrote
this book to make your first step on that path a little clearer and a
little less scary.
<https://datascienceineducation.com/>
## Data Skills for Reproducible Science
[PsyTeachR team, University of Glasgow](https://psyteachr.github.io/)
This course provides an overview of skills needed for reproducible
research and open science using the statistical programming language R.
Students will learn about data visualisation, data tidying and
wrangling, archiving, iteration and functions, probability and data
simulations, general linear models, and reproducible workflows. Learning
is reinforced through weekly assignments that involve working with
different types of data.
<https://psyteachr.github.io/msc-data-skills/>
## Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data
Michael Friendly, David Meyer
Presents an applied treatment of modern methods for the analysis of categorical data, both discrete response data and frequency data.
It explains how to use graphical methods for exploring data, spotting unusual features, visualizing fitted models, and presenting results.
Paid \~\$80
<http://ddar.datavis.ca/>
## Learning Microeconometrics with R
Christopher P. Adams
This book provides an introduction to the field of microeconometrics
through the use of R. The focus is on applying current learning from the
field to real world problems. It uses R to both teach the concepts of
the field and show the reader how the techniques can be used. It is
aimed at the general reader with the equivalent of a bachelor's degree
in economics, statistics or some more technical field. It covers the
standard tools of microeconometrics, OLS, instrumental variables,
Heckman selection and difference in difference. In addition, it
introduces bounds, factor models, mixture models and empirical Bayesian
analysis.
Paid \~\$100
<https://www.routledge.com/Learning-Microeconometrics-with-R/Adams/p/book/9780367255381>
## Public Policy Analytics: Code & Context for Data Science in Government
Ken Steif, Ph.D
The goal of this book is to make data science accessible to social scientists and city planners, in particular. I hope to convince readers that someone with strong domain expertise plus intermediate data skills can have a greater impact in government than the sharpest computer scientist who has never studied economics, sociology, public health, political science, criminology, etc.
<https://urbanspatial.github.io/PublicPolicyAnalytics/>
## Handbook of Regression Modeling in People Analytics
[Keith McNulty](https://twitter.com/dr_keithmcnulty)
It is the author's firm belief that all people analytics professionals
should have a strong understanding of regression models and how to
implement and interpret them in practice, and the aim with this book is
to provide those who need it with help in getting there.
<http://peopleanalytics-regression-book.org/index.html>
For accompanying solutions to some of the questions, see:
<https://keithmcnulty.github.io/peopleanalytics-regression-book/solutions/>
## R for Excel users
Julie Lowndes & Allison Horst
This course is for Excel users who want to add or integrate R and RStudio into their existing data analysis toolkit. It is a friendly intro to becoming a modern R user, full of tidyverse, RMarkdown, GitHub, collaboration & reproducibility.
<https://rstudio-conf-2020.github.io/r-for-excel/>
## R Programming with Minecraft
Brooke Anderson, Karl Broman, Gergely Daróczi, Mario Inchiosa, David
Smith, and Ali Zaidi
Minecraft is awesome fun, especially in creative mode, where you can
build all sorts of crazy stuff. But ambitious building projects can be
really tedious to create by hand. With the miner R package, you can
write R code to manipulate your Minecraft world and create even more
awesome stuff.
Here's an introductory talk on it from the Rstats NYC conference:
<https://www.youtube.com/watch?v=r_JgPF8MJpY>
<https://kbroman.org/miner_book/?s=09>
## R for SEO
[François Joly](https://twitter.com/tuf)
Even though R is a terrific option for SEO, there are simply not enough resources out there.
This guide is not here to deliver a course about R; there are plenty already. This guide is meant to be as practical as possible. How things should be done in an "R-ish way" is not the purpose of this guide. Grab what you want to grab and feel free to submit your own solution.
<https://www.rforseo.com/>
## R for Water Resources Data Science
[Ryan Peek](https://ryanpeek.org/) and [Rich Pauloo](https://www.richpauloo.com/)
Consists of two courses:
Introductory:
This course is most relevant and targeted at folks who work with data, from analysts and program staff to engineers and scientists. This course provides an introduction to the power and possibility of a reproducible programming language (R) by demonstrating how to import, explore, visualize, analyze, and communicate different types of data. Using water resources based examples, this course guides participants through basic data science skills and strategies for continued learning and use of R.
Intermediate:
In this course, we will move more quickly, assume familiarity with basic R skills, and also assume that the participant has working experience with more complex workflows, operations, and code-bases. Each module in this course functions as a “stand-alone” lesson, and can be read linearly, or out of order according to your needs and interests. Each module doesn’t necessarily require familiarity with the previous module.
This course emphasizes intermediate scripting skills like iteration, functional programming, writing functions, and controlling project workflows for better reproducibility and efficiency. It also covers approaches to working with more complex data structures like lists and timeseries data, the fundamentals of building Shiny Apps, pulling water resources data from APIs, intermediate mapmaking and spatial data processing, and integrating version control in projects with git.
<https://www.r4wrds.com/>
## Technical Foundations of Informatics
Michael Freeman and Joel Ross
This book covers the foundation skills necessary to start writing
computer programs to work with data using modern and reproducible
techniques. It requires no technical background. These materials were
developed for the INFO 201: Technical Foundations of Informatics course
taught at the University of Washington Information School; however, they
have been structured to be an online resource for anyone hoping to learn
to work with information using programmatic approaches.
<https://info201.github.io/>
## An introduction to quantitative analysis of political data in R
[Erik Gahner Larsen](https://twitter.com/erikgahner) & [Zoltán
Fazekas](https://twitter.com/fazol)
In this book, we aim to provide an easily accessible introduction to R
for the collection, study and presentation of different types of
political data. Specifically, the book will teach you how to get
different types of political data into R and manipulate, analyze and
visualize the output. In doing this, we will not only teach you how to
get existing data into R, but also how to collect your own data.
<http://qpolr.com/>
## Machine Learning for Factor Investing
[Guillaume Coqueret](https://twitter.com/g_coqueret) and [Tony
Guida](https://twitter.com/TonyGUIDA_Quant)
This book is intended to cover some advanced modelling techniques
applied to equity investment strategies that are built on firm
characteristics.
<http://www.mlfactor.com/>
## Introduction to Econometrics with R
Christoph Hanck, Martin Arnold, Alexander Gerber, and Martin Schmelzer
Instead of confronting students with pure coding exercises and
complementary classic literature like the book by Venables & Smith
(2010), we figured it would be better to provide interactive learning
material that blends R code with the contents of the well-received
textbook Introduction to Econometrics by Stock & Watson (2015) which
serves as a basis for the lecture.
<https://www.econometrics-with-r.org/>
## How to be a modern scientist
[Jeffrey Leek](https://twitter.com/jtleek)
A book about how to be a scientist the modern, open-source way. The face
of academia is changing. It is no longer sufficient to just publish or
perish. We are now in an era where Twitter, GitHub, Figshare, and Alt
Metrics are regular parts of the scientific workflow. Here I give high
level advice about which tools to use, how to use them, and what to look
out for. This book is appropriate for scientists at all levels who want
to stay on top of the current technological developments affecting
modern scientific careers.
Pay what you want for the ebook, minimum \$0.00
<https://leanpub.com/modernscientist>
## Cryptocurrency Research: Open Source R Tutorial
Riccardo (Ricky) Esclapon --
[LinkedIn](https://www.linkedin.com/in/esclaponriccardo/), [Personal
Website](https://resclapon.com/)
John Chandler Johnson --
[LinkedIn](https://www.linkedin.com/in/john-chandler-johnson-361a666/)
Kai R. Larsen --
[LinkedIn](https://www.linkedin.com/in/kai-r-larsen-4413a01/),
[ResearchGate](https://www.researchgate.net/profile/Kai_Larsen)
**What you will learn**:
R: The tutorial is in R. For those without experience programming in R
we have a [high-level
version](https://cryptocurrencyresearch.org/high-level) to help you
learn before attempting the full version. Scroll down for a [breakdown
of the individual
sections](https://cryptocurrencyresearch.org/index.html#sections) for an
overview of what you will learn throughout.
Tidyverse: You will get more familiar with tools from the tidyverse,
including [dplyr](https://dplyr.tidyverse.org/),
[ggplot2](https://ggplot2.tidyverse.org/),
[tibble](https://tibble.tidyverse.org/), and
[purrr](https://purrr.tidyverse.org/). These tools provide an excellent
complete ecosystem to do data science in R.
Machine Learning: You will learn to [create machine learning
models](https://cryptocurrencyresearch.org/predictive-modeling.html) and
how to fairly [assess their
performance](https://cryptocurrencyresearch.org/model-validation-plan.html).
Cryptocurrency Data: You will learn these tools by analyzing the latest
cryptocurrency data. The tutorial automatically refreshes every 12 hours
and the data is publicly available and refreshed hourly.
This tutorial is free and you can access it via
<https://cryptocurrencyresearch.org/>.
# Getting, cleaning and wrangling data
## A Beginner's Guide to Clean Data
Benjamin Greve
This book will help you to become a better data scientist by showing you
the things that can go wrong when working with data - particularly
low-quality data. A key difference between a junior and a senior data
scientist is the awareness of potential pitfalls. The experienced data
scientist will expect them, navigate around them and avoid costly
iteration cycles. After reading this book, you will be able to spot data
quality problems and deal with them before they can break your work,
saving yourself a lot of time.
<https://b-greve.gitbook.io/beginners-guide-to-clean-data/>
## 21 Recipes for Mining Twitter Data with rtweet
Bob Rudis
The recipes contained in this book use the rtweet package by Michael W.
Kearney.
<https://rud.is/books/21-recipes/>
## Text Mining with R
[Julia Silge](https://twitter.com/juliasilge) and [David
Robinson](https://twitter.com/drob)