improving readability
brunj7 committed Mar 6, 2024
1 parent ae71c97 commit 2258317
Showing 1 changed file with 19 additions and 8 deletions.
27 changes: 19 additions & 8 deletions hands-on.qmd
@@ -157,6 +157,8 @@ species_db <- tbl(conn, "Species")
species_db
```

You can filter the data and select columns:

```{r}
species_db %>%
filter(Relevance=="Study species") %>%
@@ -165,9 +167,12 @@ species_db %>%
head(3)
```

Note that those are not data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.
:::{.callout-note}
## Note
Note that those are **not** data frames but tables. What `dbplyr` is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.
:::

#### How can I get a "real data frame?"
#### How can I get a "real" data frame?

You add `collect()` to your query.

@@ -180,9 +185,11 @@ species_db %>%
collect()
```

Note it means the full query is going to be ran and save in your environment. This might slow things down so you generally want to collect on the smallest data frame you can
Note that this means the full query will be run and its result saved in your R environment. This might slow things down, so you generally want to collect the smallest data frame you can.
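
As a minimal sketch of collecting as late as possible, the example below uses a small in-memory SQLite database as a stand-in for `conn`. This is an assumption for illustration only: the RSQLite backend and the three toy rows are made up and are not the database used in this tutorial. The filtering and column selection happen in the database, and only the small result is brought into R with `collect()`.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for the tutorial's connection: an in-memory SQLite
# database holding three made-up rows (not the real Species table)
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "Species", data.frame(
  Code = c("a", "b", "c"),
  Scientific_name = c("Species one", "Species two", "Species three"),
  Relevance = c("Study species", "Study species", "Predator")
))

species_db <- tbl(conn, "Species")

# Still a lazy reference to the database table, not a data frame
is.data.frame(species_db)

# Filter and select in the database first, then collect the reduced result
study_species <- species_db %>%
  filter(Relevance == "Study species") %>%
  select(Scientific_name) %>%
  collect()

# Now it is a regular tibble living in the R session
is.data.frame(study_species)

dbDisconnect(conn)
```

Placing `collect()` after `filter()` and `select()` means only the matching rows and the needed column travel from the database to R.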

#### How can you see the SQL query?

#### How can you see the SQL query equivalent to the tidyverse code? => `show_query()`
Adding `show_query()` at the end of your code block will let you see the SQL code that has been used to query the database.

```{r}
# Add show_query() to the end to see what SQL it is sending!
@@ -203,6 +210,7 @@ Here is how you could run the query using the SQL code directly:
dbGetQuery(conn, "SELECT Scientific_name FROM Species WHERE (Relevance = 'Study species') ORDER BY Scientific_name LIMIT 3")
```


You can do pretty much anything with these quasi-tables, including grouping, summarization, joins, etc.

Let's count how many species there are per Relevance category:
@@ -221,13 +229,15 @@ species_db %>%
summarize(num_species = n()) %>%
show_query()
```

You can also create new columns using mutate:

```{r}
species_db %>%
mutate(Code = paste("X", Code)) %>%
head()
```

What does the query look like?

```{r}
@@ -236,21 +246,22 @@ species_db %>%
head() %>%
show_query()
```

:::{.callout-caution}
****Limitation: no way to add or update data in the database, `dbplyr` is view only. If you want to add or update data, you'll need to use the `DBI` package functions.***
***Limitation: `dbplyr` is view only, so there is no way to add or update data in the database with it. If you want to add or update data, you'll need to use the `DBI` package functions.***
:::
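
For illustration, here is one way adding and updating rows with `DBI` could look. This is a sketch under stated assumptions: it runs against a throwaway in-memory SQLite database (via RSQLite) with made-up rows rather than the tutorial's database, and the table layout is simplified to two columns.

```r
library(DBI)

# Hypothetical throwaway database so the real one is never modified
conn <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "Species", data.frame(
  Code = c("a", "b"),
  Relevance = c("Study species", "Predator")
))

# Add a row: dbAppendTable() inserts the rows of a data frame
dbAppendTable(conn, "Species", data.frame(Code = "c", Relevance = "Predator"))

# Update a row: dbExecute() runs a statement that returns no result set;
# the values are passed as parameters instead of being pasted into the SQL
dbExecute(
  conn,
  "UPDATE Species SET Relevance = ? WHERE Code = ?",
  params = list("Study species", "c")
)

# Read the table back to check the changes
species <- dbGetQuery(conn, "SELECT Code, Relevance FROM Species ORDER BY Code")
species

dbDisconnect(conn)
```

Passing values through `params` keeps quoting out of the SQL string, which is generally safer than building the statement with `paste()`.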

### Average egg volume analysis

Let's reproduce the egg volume analysis we just did. We can calculate the average bird eggs volume per species directly on the database
Let's reproduce the egg volume analysis we just did. We can calculate the average bird egg volume per species directly in the database:

```{r}
# loading all the necessary tables
eggs_db <- tbl(conn, "Bird_eggs")
nests_db <- tbl(conn, "Bird_nests")
```

Compute the volume:
Compute the volume using the same code as before:

```{r}
# Compute the egg volume
@@ -272,7 +283,7 @@ species_egg_volume_avg_db <- left_join(nests_db, eggs_area_db, by="Nest_ID") %>%
species_egg_volume_avg_db
```

What does this SQL quert looks like?
What does this SQL query look like?

```{r}
species_egg_volume_avg_db <- left_join(nests_db, eggs_area_db, by="Nest_ID") %>%