Merge pull request #2 from edgararuiz/wip
Wip
edgararuiz committed Mar 12, 2018
2 parents bdaebb8 + 5d651b3 commit 299428b
Showing 2 changed files with 73 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
logs
derby.log
sense.txt
tuberia
73 changes: 72 additions & 1 deletion sparklyr.Rmd
@@ -125,7 +125,6 @@ muestra_vuelos$entrenar
modelo <- muestra_vuelos$entrenar %>%
ml_logistic_regression(tarde ~.)
```

## Visualizations
@@ -170,6 +169,78 @@ vuelos %>%
```


# Pipelines

```{r}
entrenar <- muestra_vuelos$entrenar %>%
  mutate(
    arr_delay = ifelse(arr_delay == "NaN", 0, arr_delay)
  ) %>%
  select(
    month,
    sched_dep_time,
    arr_delay,
    distance
  ) %>%
  mutate_all(as.numeric)
```


```{r}
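tuberia_vuelos <- ml_pipeline(sc) %>%
  # Replay the dplyr steps recorded in `entrenar` as a SQL transformer
  ft_dplyr_transformer(
    tbl = entrenar
  ) %>%
  # tarde = 1 when the arrival delay exceeds 15 minutes, otherwise 0
  ft_binarizer(
    input_col = "arr_delay",
    output_col = "tarde",
    threshold = 15
  ) %>%
  # horas = index of the four-hour window containing the scheduled departure
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "horas",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(tarde ~ horas + distance + arr_delay) %>%
  ml_logistic_regression()

tuberia_vuelos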
```
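
The two feature transformers in the pipeline above can be easier to follow on a handful of values. The block below is a minimal base R sketch of what they are understood to compute, not the Spark implementation: `binarize()` and `bucketize()` are hypothetical helpers written for this illustration, and the sample vectors are made up. It assumes Spark's documented behaviour, where the binarizer maps values strictly above the threshold to 1, and the bucketizer returns a zero-based bucket index with left-closed intervals (the last bucket also includes its upper bound).

```r
# Base R stand-ins for the two feature transformers (illustration only;
# the pipeline itself runs these steps inside Spark).

# Hypothetical helper: 1 when x is strictly above the threshold, else 0.
binarize <- function(x, threshold) {
  as.numeric(x > threshold)
}

# Hypothetical helper: zero-based index of the interval [split_i, split_i+1)
# that contains x; the last interval is closed on the right as well.
bucketize <- function(x, splits) {
  # Values outside the outer splits are rejected here, mirroring the
  # assumption that Spark treats them as invalid by default.
  stopifnot(all(x >= min(splits) & x <= max(splits)))
  findInterval(x, splits, rightmost.closed = TRUE) - 1
}

# Made-up sample values, not taken from the flights data.
arr_delay      <- c(-5, 15, 16, 120)
sched_dep_time <- c(400, 759, 800, 1530, 2359, 2400)
splits         <- c(400, 800, 1200, 1600, 2000, 2400)

tarde <- binarize(arr_delay, threshold = 15)
horas <- bucketize(sched_dep_time, splits)

print(tarde)
print(horas)
```

Note that a delay of exactly 15 minutes is not flagged as late under this rule, and that a scheduled departure earlier than 400 would fall outside the chosen splits.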

```{r}
modelo_nuevo <- ml_fit(
  tuberia_vuelos,
  muestra_vuelos$entrenar
)

modelo_nuevo
```

```{r}
predicciones <- ml_transform(
  x = modelo_nuevo,
  dataset = muestra_vuelos$examinar
)

predicciones
```

```{r}
predicciones %>%
  group_by(tarde, prediction) %>%
  tally()
```
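
Grouping by `tarde` and `prediction` and counting rows amounts to a confusion matrix. As a rough sketch of how such counts can be read, the base R block below builds the same kind of table from two small made-up vectors (they are not results from the model above) and derives an accuracy figure from it.

```r
# Toy labels and predictions, invented for illustration only.
tarde      <- c(0, 0, 0, 1, 1, 1, 0, 1)
prediction <- c(0, 0, 1, 1, 1, 0, 0, 1)

# Rows are the true label, columns are the predicted label.
confusion <- table(tarde = tarde, prediction = prediction)

# Correct predictions sit on the diagonal of the table.
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

The off-diagonal cells separate the two kinds of error: on-time flights predicted as late, and late flights predicted as on time.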

```{r}
ml_save(tuberia_vuelos, "tuberia", overwrite = TRUE)
dir("tuberia")
```


# Text analysis

```{r}
library(janeaustenr)
```