englianhu committed Oct 27, 2015
0 parents commit 22adfe8
Showing 95 changed files with 71,779 additions and 0 deletions.
Binary file added .RData
Binary file not shown.
29 changes: 29 additions & 0 deletions .cache/__packages
@@ -0,0 +1,29 @@
base
BBmisc
devtools
stringr
stringi
reshape
reshape2
data.table
plyr
dplyr
magrittr
foreach
iterators
doParallel
knitr
rmarkdown
tidyr
gtable
gridExtra
pander
stringdist
slidify
RColorBrewer
leaflet
installr
plot3D
markdown
broman
foreign
18,205 changes: 18,205 additions & 0 deletions .cache/read-datasetB_6da8ccf3a6b94806526e2dce4d84e550.rdb

Large diffs are not rendered by default.

Binary file added .cache/setting_6be3257ff165449d5a9085a1de766341.rdx
Binary file not shown.
14 changes: 14 additions & 0 deletions .gitignore
@@ -0,0 +1,14 @@
# History files
.Rhistory
.Rapp.history

# Example code in package build process
*-Ex.R

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf
.Rproj.user
673 changes: 673 additions & 0 deletions Betting Strategy and Model Validation.Rmd

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions Betting-Strategy-and-Model-Validation.Rproj
@@ -0,0 +1,13 @@
Version: 1.0

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX
340 changes: 340 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

351 changes: 351 additions & 0 deletions Natural Language Analysis.Rmd

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions Natural_Language_Analysis_cache/html/__packages
@@ -0,0 +1,31 @@
base
methods
datasets
utils
grDevices
graphics
stats
scimapClient
BBmisc
devtools
stringr
stringi
reshape
reshape2
data.table
DT
plyr
dplyr
magrittr
foreach
iterators
parallel
doParallel
rmarkdown
tidyr
grid
gtable
gridExtra
pander
stringdist
knitr
170 changes: 170 additions & 0 deletions PL.R
@@ -0,0 +1,170 @@
## Loading the packages
if(!'BBmisc' %in% installed.packages()){
install.packages('BBmisc')}
if(!'BiocParallel' %in% installed.packages()){
source("http://bioconductor.org/biocLite.R")
biocLite("BiocParallel")}
if(!'seleniumJars' %in% installed.packages()){
devtools::install_github('LluisRamon/seleniumJars')} ## devtools is not attached at this point, so call it by namespace

suppressPackageStartupMessages(library('BBmisc'))
suppressPackageStartupMessages(lib(c('zoo','stringi','stringr','reshape','reshape2','plyr','dplyr','magrittr',
'ggplot2','ggthemes','plotly','foreach','memoise','doMC','doParallel','BiocParallel',
'markdown','parallel','rmarkdown','manipulate','knitr','turner','scales',
'lubridate','whisker'))) #'RStudioAMI','editR'

## ---------------------------------------------------------------------------------------------
## http://stackoverflow.com/questions/22954623/view-markdown-generated-html-in-rstudio-viewer
render(paste0(getwd(),'/Betting Strategy and Model Validation.Rmd'),'all')
#'@ View(paste0(getwd(),'/Betting_Strategy_and_Model_Validation.html'))
browseURL(paste0(getwd(),'/Betting Strategy and Model Validation.html'))

## https://github.com/swarm-lab/editR
editR(paste0(getwd(),'/Betting Strategy and Model Validation.Rmd'))


## Also need to scrape the final scores / half-time scores / results of soccer matches
teamID <- sort(unique(c(as.character(mbase$Home), as.character(mbase$Away))))
dateID <- sort(unique(mbase$Date)); spboDate <- gsub('-','',dateID)
lnk <- paste0('http://www8.spbo.com/history.plex?day=',spboDate,'&l=en')

## http://stackoverflow.com/questions/2158780/r-catching-an-error-and-then-branching-logic
## http://www.win-vector.com/blog/2012/10/error-handling-in-r/
## The scrapSPBO function scraped unmatched data (for example lnk[827]),
## so I rewrote the function as scrapSPBO2
source(paste0(getwd(),'/function/scrapSPBO2.R'))
scrapSPBO2(lnk=lnk, dateID=dateID, path='livescore', parallel=FALSE)
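
## A minimal sketch (not the actual scrapSPBO2 code) of the tryCatch pattern that the
## two links above describe: keep looping when one link fails instead of aborting.
## `safeScrape`, `fetch` and `demoFetch` are hypothetical names used only for illustration.
safeScrape <- function(links, fetch) {
  lapply(links, function(u) {
    tryCatch(fetch(u),
             error = function(e) {
               message('skipped ', u, ': ', conditionMessage(e))
               NULL  ## mark the failed link and carry on with the next one
             })
  })
}
## Usage with a stand-in fetcher that fails on one input:
demoFetch <- function(u) if (u == 'bad') stop('unmatched data') else toupper(u)
demoRes <- safeScrape(c('a', 'bad', 'c'), demoFetch)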

## Read scraped spbo datasets
source(paste0(getwd(),'/function/readSPBO.R'))
spboData <- readSPBO(dateID=dateID, path='livescore', parallel=FALSE)



## https://github.com/pablobarbera/instaR
## https://github.com/pablobarbera/Rfacebook
#'@ install_github ## can try during free time
## ---------------------------------------------------------------------------------------------
## Load the scraped spbo livescore datasets.
##... will take some time since dim(spboData) is [156841 x 17]
source(paste0(getwd(),'/function/readSPBO2.R'))
spboData <- readSPBO2(dateID=dateID, parallel=TRUE)

## filter spboTeamID
spboTeamID <- sort(unique(c(as.vector(spboData$Home), as.vector(spboData$Away)))) ## de-duplicate across Home and Away
tmID <- teamID[!teamID %in% mbase$others]

spboData[(is.na(spboData$Date))&(nchar(as.vector(spboData$Time))==5),]
spboData[subset(spboData, (is.na(data.frame(spboData)$Date))&(nchar(as.vector(spboData$Time))==5))$X,]

## Console output kept for reference:
## > dim(mbase$datasets)
## [1] 48744 17
## > dim(spboData)
## [1] 319744 20

mbase$datasets[mbase$datasets$DateUK %in% spboData$DateUK,]
#Source: local data frame [17,934 x 17]

na.omit(mbase$datasets[mbase$datasets$DateUK %in% spboData$DateUK,][order(mbase$datasets$No,decreasing=FALSE),])
#Source: local data frame [25,489 x 17]

library('tau')
library('textcat')
library('stringdist')











## http://wizardofvegas.com/forum/gambling/sports/10555-halt-time-betting/3/
## http://quant.stackexchange.com/questions/2500/how-to-apply-the-kelly-criterion-when-expected-return-may-be-negative
## https://en.wikipedia.org/wiki/Gambling_and_information_theory
## http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Thorpe_KellyCriterion2007.pdf
## http://www.sportsbookreview.com/betting-tools/kelly-calculator/
## http://thestakingmachine.com/laykelly.php
## http://www.sportsbettingcalculator.co.uk/kelly-staking-calculator/
## http://tipstertables.com/blog/betting-system-using-tipster-statistics-and-kelly-criterion
########################################################################################

## Scrape the league in order to assign the vigorish / spread margins / overrounds
library(RSelenium)
teamID <- sort(unique(c(unlist(mbase$Home), unlist(mbase$Away)))) ## combine first: unique()'s 2nd argument is `incomparables`
lnk <- 'http://www8.spbo.com/history.plex?day=20110107&l=en'

#'@ system('java -jar selenium-server-standalone.jar')
checkForServer() ## if you need the stand-alone Java binary
startServer()
webDr <- remoteDriver$new()
webDr$open()
webDr$navigate(lnk)
webDr$navigate("http://www.bbc.co.uk")
webDr$goBack()
webDr$goForward()
webDr$quit()

## https://github.com/greenore/RSeleniumUtilities
library(RSeleniumUtilities)
RSeleniumUtilities::checkSelenium()
webDr <- ieDriver()
webDr <- firefoxDriver(use_profile=TRUE, profile_name="selenium")
webDr <- chromeDriver(use_profile=TRUE, profile_name="selenium", internal_testing=TRUE)


## Linear regression
llply(split(mbase,mbase$Sess),function(x)lm(PL~Selection+HCap+Price,x))
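
## A hedged illustration of the per-season fit above on made-up data, using base R only.
## `toy` and its values are invented for this example; the real `mbase` has more columns,
## and `Selection` is left out here to keep the sketch small.
set.seed(1)
toy <- data.frame(Sess = rep(2011:2012, each = 20),
                  HCap = rnorm(40), Price = runif(40, 1.5, 2.5))
toy$PL <- 2 * toy$HCap - toy$Price + rnorm(40, sd = 0.1)
toyFits <- lapply(split(toy, toy$Sess), function(x) lm(PL ~ HCap + Price, data = x))
sapply(toyFits, coef) ## one column of coefficients per season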


#'@ stopCluster(cl)

x <- seq(as.Date('2011-01-01'), as.Date('2015-07-31'), by='months')
y <- seq(min(mbase$PL),max(mbase$Stake), by=10000)
labels <- date_format('%b')(x)
breaks <- as.Date(sort(c(as.POSIXct(x), as.POSIXct(seq(min(mbase$Date),
max(mbase$Date), by='months')), ymd('2015-08-01'))))
labels <- c('', as.vector(rbind(labels, rep('', length(labels)))))

ggplot(data=mbase, aes(x=x, y=y, shape=AHOU)) +
geom_line(aes(y = mbase$Stake, colour = 'Stake'), size=1.5) +
geom_line(aes(y = mbase$PL, colour = 'PL'), size=1.5) +
geom_point(size=2, fill='blue') + expand_limits(y=0) + ## Set y range to include 0
scale_colour_hue(name='PL', l=30) + ## Set legend title use darker colors (lightness=30)
scale_shape_manual(name='PL', values=c(22,21)) + ## Use points with a fill color
xlab('Time of Day') + ylab('HK Dollars (HKD)') +
scale_x_date(labels = labels, breaks = breaks, limits=range(breaks)) + ## scale_x_date(labels = date_format("%b"),breaks = date_breaks("months")) +
ggtitle('Stakes and Profit & Loss') + ## Set title
theme_bw() + theme(legend.position=c(.7, .4)) ## Position legend inside the plot; this must go after theme_bw

qplot(Stake, data=mbase, geom='density', fill=AHOU, alpha=I(.5),
main='Turnover and P&L', xlab='Year in Month',
ylab='HKD Amount') + scale_x_date(breaks=date_breaks('months'), labels = date_format("%b"))

### http://statisticalrecipes.blogspot.com/2012/02/simulating-genetic-drift.html
dtm <- factor(sapply(strsplit(as.character(mbase$Date),'-'),function(x) x[2]))
dtm <- data.frame(month=mapvalues(dtm, sort(levels(dtm)),c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')),
mbase$Stake/10000, mbase$PL/10000); names(dtm) <- c('Month','Stake','PL')
sdata <- data.frame(Date=factor(paste0(dtm$Month,'-',mbase$Sess)),dtm[-1]); rm(dtm)
sdata <- ddply(sdata, .(Date), summarise, Stake=sum(Stake), PL=sum(PL))
sdata[order(sdata$Date, decreasing=FALSE),]

## plot on same grid, each series colored differently --
## good if the series have same scale
ggplot(sim_data, aes(Month,'HKD 0000')) + geom_line(aes(colour = Series)) +
scale_x_discrete(labels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))+
theme(axis.text.x=element_text(face="bold",colour="red",size=14))

## ==================================================================================================================================
## http://wenku.baidu.com/view/3574f639580216fc700afdfc.html
## https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/gam.models.html
## http://doc.qkzz.net/article/e6f33685-e220-4803-8c89-3228501b9412.htm






2 changes: 2 additions & 0 deletions README.md
@@ -0,0 +1,2 @@
# Betting-Strategy-and-Model-Validation
In Betting Strategy and Model Validation, I analyse the staking model of a sportsbook agency that follows the bets of consultancy firm A.
70 changes: 70 additions & 0 deletions Testing efficiency of coding.Rmd
@@ -0,0 +1,70 @@
---
title: "Testing efficiency of coding"
author: "Ryo®, Eng Lian Hu"
date: "8/28/2015"
output:
html_document:
fig_height: 3
fig_width: 5
highlight: haddock
theme: cerulean
toc: yes
---

This page tests the efficiency of the code written for the research on `Betting Strategy and Model Validation`.

```{r load-packages}
## Loading the packages
if(!'devtools' %in% installed.packages()){
install.packages('devtools')}
if(!'BBmisc' %in% installed.packages()){
install.packages('BBmisc')}
suppressPackageStartupMessages(library('BBmisc'))
pkgs <- c('devtools','RStudioAMI','zoo','chron','stringr','stringi','reshape','reshape2','data.table','sparkline','DT','plyr','dplyr','magrittr','parallel','foreach','memoise','manipulate','ggplot2','ggthemes','proto','extrafont','directlabels','PerformanceAnalytics','plotly','doMC','doParallel','BiocParallel','rvest','RSelenium','highlightHTML','knitr','rmarkdown','editR','scales','lubridate','tidyr','whisker','gtable','grid','gridExtra')
suppressAll(lib(pkgs)); rm(pkgs)
```

```{r get-data-summary-table-2.1}
nrow(do.call(rbind, llply(as.list(seq(2011,2015)), function(x) data.frame(Sess=x,read.csv(paste0(getwd(),'/datasets/',x,'.csv'))),.parallel=TRUE)))
nrow(rbind_all(llply(as.list(seq(2011,2015)), function(x) data.frame(Sess=x,read.csv(paste0(getwd(),'/datasets/',x,'.csv'))),.parallel=TRUE)))
system.time(do.call(rbind, llply(as.list(seq(2011,2015)), function(x) data.frame(Sess=x,read.csv(paste0(getwd(),'/datasets/',x,'.csv'))),.parallel=TRUE)))
system.time(rbind_all(llply(as.list(seq(2011,2015)), function(x) data.frame(Sess=x,read.csv(paste0(getwd(),'/datasets/',x,'.csv'))),.parallel=TRUE)))
```
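
As a self-contained illustration of the comparison in the chunk above, the sketch below times `do.call(rbind, ...)` against `dplyr::bind_rows()` on synthetic data. The data frames, their size and the name `toyList` are assumptions made for this example only. `bind_rows()` stands in for `rbind_all()`, which was later deprecated in dplyr, and the dplyr timing is skipped when the package is not installed.

```{r toy-rbind-timing}
set.seed(1)
## Five made-up "seasons" of 1000 rows each, standing in for the yearly csv files
toyList <- lapply(2011:2015, function(s) {
  data.frame(Sess = s, Stake = runif(1000), PL = rnorm(1000))
})

## Base R: bind the list of data frames row-wise and time it
baseTime <- system.time(baseRes <- do.call(rbind, toyList))
nrow(baseRes)

## dplyr alternative, only if the package is available
if (requireNamespace('dplyr', quietly = TRUE)) {
  dplyrTime <- system.time(dplyrRes <- dplyr::bind_rows(toyList))
  print(rbind(base = baseTime[1:3], dplyr = dplyrTime[1:3]))
}
```

On data this small the two timings are likely to be close; differences tend to show up only as the number and size of the pieces grow.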

The next chunk compares two ways of merging a list of data frames: `Reduce()` with `merge()` versus `merge_all()`.

```{r merge_all-dataframes-2.2}
#'@ system.time(Reduce(function(x,y) {merge(x,y,all=TRUE)}, llply(list(df1,df1.sps,df1.pst),function(x) x[[1]])))
#'@ system.time(merge_all(list(df1[[1]],df1.sps[[1]],df1.pst[[1]])))
```

The same comparison for two data frames, `socData` and `othData`:

```{r}
#'@ system.time(merge(socData, othData, all=TRUE))
#'@ system.time(merge_all(list(socData, othData)))
```
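
The commented-out timings above depend on objects that are not defined in this document. A minimal base-R sketch of the `Reduce()` plus `merge()` approach, on three made-up data frames (`toyA`, `toyB` and `toyC` are hypothetical), could look like this:

```{r toy-merge-reduce}
## Three small data frames sharing only the key column `id`
toyA <- data.frame(id = 1:3, a = c(10, 20, 30))
toyB <- data.frame(id = 2:4, b = c('x', 'y', 'z'))
toyC <- data.frame(id = c(1, 4), c = c(TRUE, FALSE))

## Fold merge() over the list; all=TRUE keeps rows that lack a match (full outer join)
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(toyA, toyB, toyC))
merged

## Time the same call, as in the chunks above
system.time(Reduce(function(x, y) merge(x, y, all = TRUE), list(toyA, toyB, toyC)))
```

Because `merge()` joins on the columns the inputs have in common, every `id` from any of the three inputs appears once in `merged`, with `NA` where an input had no matching row.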



During this research I read several articles comparing the efficiency of different data-processing approaches, which I will apply in future data analysis and data mining.

- [Comparing performance of by, ddply and data.table](http://www.r-bloggers.com/transforming-subsets-of-data-in-r-with-by-ddply-and-data-table/)

- [Introduction to high-performance R packages and parallel computing](https://mp.weixin.qq.com/s?__biz=MzA3NDUxMjYzMA%3D%3D&mid=216065319&idx=1&sn=31af52816c7e8b937f15480c4d5f6e41&key=0acd51d81cb052bcbc420864d8003491eba2f4bbc722bf3a7bc7da0d59fefc64ea6fc32bdb33673eebd62f201cbc2190&ascene=7&uin=MjAwMTM4MjU0OA%3D%3D&devicetype=android-19&version=26020236&nettype=WIFI&pass_ticket=GdViEIR%2F5PLzVFnzLxc71K39ze4fb6VAwvFp1bhH3inbu5xBjyQ7BLEpDOrQhWZ1)

- [A biased comparsion of JSON packages in R](https://rstudio-pubs-static.s3.amazonaws.com/31702_9c22e3d1a0c44968a4a1f9656f1800ab.html)

- [Video how-to: Speed up R with C++ and Rcpp](http://www.computerworld.com/article/2961056/data-analytics/video-how-to-speed-up-r-with-c-plus-plus-and-rcpp-package.html)

- [benchmarking logistic regression using glm.fit , bigglm, speedglm, glmnet, LiblineaR](http://stackoverflow.com/questions/19532651/benchmarking-logistic-regression-using-glm-fit-bigglm-speedglm-glmnet-libli)

- [Dates and Times Made Easy with lubridate](http://www.jstatsoft.org/article/view/v040i03/v40i03.pdf)


13 changes: 13 additions & 0 deletions custom.css
@@ -0,0 +1,13 @@
/* http://www.w3schools.com/css/css_examples.asp */

table {
max-width: 95%;
border: 1px solid #ccc;
}
th {
background-color: #0000FF;
color: #0000A0;
}
td {
background-color: #00FFFF;
}