Built site for gh-pages

UCSB-Library-Research-Data-Services · Mar 12, 2024 · 1122e44
1 parent 2fd57c5
commit 1122e44

Showing 4 changed files with 12 additions and 12 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-62e0040d
+1ada4d36
diff --git a/hands-on.html b/hands-on.html
@@ -560,13 +560,13 @@ <h4 class="anchored" data-anchor-id="how-can-you-see-the-sql-query">How can you
 # Database: DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]
   Relevance                                num_species
   &lt;chr&gt;                                          &lt;dbl&gt;
-1 Potential predator (eggs; mammal)                  2
-2 Microtine (alternate prey for predators)           5
-3 Study species                                     41
-4 Incidental monitoring                             18
-5 Potential predator (avian)                        25
-6 Potential predator (mammal)                        6
-7 Study species; potential predator (eggs)           2</code></pre>
+1 Incidental monitoring                             18
+2 Study species                                     41
+3 Potential predator (avian)                        25
+4 Potential predator (mammal)                        6
+5 Study species; potential predator (eggs)           2
+6 Potential predator (eggs; mammal)                  2
+7 Microtine (alternate prey for predators)           5</code></pre>
 </div>
 </div>
 <p>Does that code looks familiar? But this time, here is really the query that was used to retrieve this information:</p>

diff --git a/search.json b/search.json
@@ -53,7 +53,7 @@
     "href": "hands-on.html#lets-connect-to-our-first-database",
     "title": "Hands-on DuckDB & dplyr",
     "section": "Let’s connect to our first database",
-    "text": "Let’s connect to our first database\n\nlibrary(dbplyr)       # to query databases in a tidyverse style manner\nlibrary(DBI)          # to connect to databases\n# install.packages(\"duckdb\")  # install this package to get duckDB API\nlibrary(duckdb)       # Specific to duckDB\n\n\nLoad the bird database\nThis database has been built from the csv files we just analyzed, so the data should be very similar - note we did not say identical more on this in the last section:\n\nconn &lt;- dbConnect(duckdb::duckdb(), dbdir = \"./data/bird_database.duckdb\")\n\nList all the tables present in the database:\n\ndbListTables(conn)\n\n[1] \"Bird_eggs\"       \"Bird_nests\"      \"Camp_assignment\" \"Personnel\"      \n[5] \"Site\"            \"Species\"        \n\n\nLet’s have a look at the Species table\n\nspecies_db &lt;- tbl(conn, \"Species\")\nspecies_db\n\n# Source:   table&lt;Species&gt; [?? x 4]\n# Database: DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n   Code  Common_name             Scientific_name        Relevance               \n   &lt;chr&gt; &lt;chr&gt;                   &lt;chr&gt;                  &lt;chr&gt;                   \n 1 agsq  Arctic ground squirrel  Spermophilus parryii   Potential predator (egg…\n 2 amcr  American Crow           Corvus brachyrhynchos  Potential predator (avi…\n 3 amgp  American Golden-Plover  Pluvialis dominica     Study species           \n 4 arfo  Arctic fox              Alopex lagopus         Potential predator (mam…\n 5 arte  Arctic Tern             Sterna paradisaea      Incidental monitoring   \n 6 basa  Baird's Sandpiper       Calidris bairdii       Study species           \n 7 bbis  Broad-billed Sandpiper  Calidris falcinellus   Study species           \n 8 bbpl  Black-bellied Plover    Pluvialis squatarola   Study species           \n 9 bbsa  Buff-breasted Sandpiper Calidris subruficollis Study species           \n10 besw  Bewick's Swan           Cygnus columbianus     
Incidental monitoring   \n# ℹ more rows\n\n\nYou can filter the data and select columns:\n\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3)\n\n# Source:     SQL [3 x 1]\n# Database:   DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n# Ordered by: Scientific_name\n  Scientific_name   \n  &lt;chr&gt;             \n1 Actitis macularius\n2 Calidris acuminata\n3 Calidris alba     \n\n\n\n\n\n\n\n\nNote\n\n\n\nNote that those are not data frames but tables. What dbplyr is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.\n\n\n\nHow can I get a “real” data frame?\nYou add collect() to your query.\n\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3) %&gt;% \n  collect()\n\n# A tibble: 3 × 1\n  Scientific_name   \n  &lt;chr&gt;             \n1 Actitis macularius\n2 Calidris acuminata\n3 Calidris alba     \n\n\nNote it means the full query is going to be ran and save in your R environment. 
This might slow things down, so you generally want to collect on the smallest data frame you can.\n\n\nHow can you see the SQL query?\nAdding show_query() at the end of your code block will let you see the SQL code that has been used to query the database.\n\n# Add show_query() to the end to see what SQL it is sending!\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3) %&gt;% \n  show_query()\n\n&lt;SQL&gt;\nSELECT Scientific_name\nFROM Species\nWHERE (Relevance = 'Study species')\nORDER BY Scientific_name\nLIMIT 3\n\n\nThis is a great way to start getting familiar with the SQL syntax, because although you can do a lot with dbplyr you can not do everything that SQL can do. So at some point you might want to start using SQL directly.\nHere is how you could run the query using the SQL code directly:\n\n# query the database using SQL\ndbGetQuery(conn, \"SELECT Scientific_name FROM Species WHERE (Relevance = 'Study species') ORDER BY Scientific_name LIMIT 3\")\n\n     Scientific_name\n1 Actitis macularius\n2 Calidris acuminata\n3      Calidris alba\n\n\nYou can do pretty much anything with these quasi-tables, including grouping, summarization, joins, etc.\nLet’s count how many species there are per Relevance categories:\n\nspecies_db %&gt;%\n  group_by(Relevance) %&gt;%\n  summarize(num_species = n())\n\n# Source:   SQL [7 x 2]\n# Database: DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n  Relevance                                num_species\n  &lt;chr&gt;                                          &lt;dbl&gt;\n1 Potential predator (eggs; mammal)                  2\n2 Microtine (alternate prey for predators)           5\n3 Study species                                     41\n4 Incidental monitoring                             18\n5 Potential predator (avian)                        25\n6 Potential predator (mammal)                        6\n7 
Study species; potential predator (eggs)           2\n\n\nDoes that code looks familiar? But this time, here is really the query that was used to retrieve this information:\n\nspecies_db %&gt;%\n  group_by(Relevance) %&gt;%\n  summarize(num_species = n()) %&gt;%\n  show_query()\n\n&lt;SQL&gt;\nSELECT Relevance, COUNT(*) AS num_species\nFROM Species\nGROUP BY Relevance\n\n\n\n\n\nAverage egg volume analysis\nLet’s reproduce the egg volume analysis we just did. We can calculate the average bird eggs volume per species directly on the database:\n\n# loading all the necessary tables\neggs_db &lt;- tbl(conn, \"Bird_eggs\")\nnests_db &lt;- tbl(conn, \"Bird_nests\")\n\nCompute the volume using the same code as previously!! Yes, you can use mutate to create new columns on the tables object\n\n# Compute the egg volume\neggs_volume_db &lt;- eggs_db %&gt;%\n  mutate(egg_volume = pi/6*Width^2*Length)\n\n\n\n\n\n\n\nCaution\n\n\n\nLimitation: no way to add or update data in the database, dbplyr is view only. 
If you want to add or update data, you’ll need to use the DBI package functions.\n\n\nNow let’s join this information to the nest table, and average by species\n\n# Join the egg and nest tables to compute average\nspecies_egg_volume_avg_db &lt;- left_join(nests_db, eggs_volume_db, by=\"Nest_ID\") %&gt;%\n  group_by(Species) %&gt;%\n  summarise(egg_volume_avg = mean(egg_volume, na.rm = TRUE)) %&gt;%\n  arrange(desc(egg_volume_avg)) %&gt;% \n  collect() %&gt;%\n  drop_na()\n\nspecies_egg_volume_avg_db\n\n# A tibble: 7 × 2\n  Species egg_volume_avg\n  &lt;chr&gt;            &lt;dbl&gt;\n1 bbpl            33975.\n2 amgp            28545.\n3 rutu            18094.\n4 dunl            11777.\n5 wrsa            10111.\n6 sepl             9903.\n7 reph             8444.\n\n\nWhat does this SQL query looks like?\n\nspecies_egg_volume_avg_db &lt;- left_join(eggs_volume_db, nests_db, by=\"Nest_ID\") %&gt;%\n  group_by(Species) %&gt;%\n  summarise(egg_volume_avg = mean(egg_volume, na.rm = TRUE)) %&gt;%\n  arrange(desc(egg_volume_avg)) %&gt;% \n  show_query()\n\n&lt;SQL&gt;\nSELECT Species, AVG(egg_volume) AS egg_volume_avg\nFROM (\n  SELECT\n    LHS.Book_page AS \"Book_page.x\",\n    LHS.\"Year\" AS \"Year.x\",\n    LHS.Site AS \"Site.x\",\n    LHS.Nest_ID AS Nest_ID,\n    Egg_num,\n    Length,\n    Width,\n    egg_volume,\n    Bird_nests.Book_page AS \"Book_page.y\",\n    Bird_nests.\"Year\" AS \"Year.y\",\n    Bird_nests.Site AS \"Site.y\",\n    Species,\n    Observer,\n    Date_found,\n    how_found,\n    Clutch_max,\n    floatAge,\n    ageMethod\n  FROM (\n    SELECT\n      Bird_eggs.*,\n      ((3.14159265358979 / 6.0) * (POW(Width, 2.0))) * Length AS egg_volume\n    FROM Bird_eggs\n  ) LHS\n  LEFT JOIN Bird_nests\n    ON (LHS.Nest_ID = Bird_nests.Nest_ID)\n) q01\nGROUP BY Species\nORDER BY egg_volume_avg DESC\n\n\n\n\n\n\n\n\nQuestion\n\n\n\nWhy does the SQL query include the volume computation?\n\n\n\n\nDisconnecting from the database\nBefore we close our session, it is 
good practice to disconnect from the database first\n\nDBI::dbDisconnect(conn, shutdown = TRUE)"
+    "text": "Let’s connect to our first database\n\nlibrary(dbplyr)       # to query databases in a tidyverse style manner\nlibrary(DBI)          # to connect to databases\n# install.packages(\"duckdb\")  # install this package to get duckDB API\nlibrary(duckdb)       # Specific to duckDB\n\n\nLoad the bird database\nThis database has been built from the csv files we just analyzed, so the data should be very similar - note we did not say identical more on this in the last section:\n\nconn &lt;- dbConnect(duckdb::duckdb(), dbdir = \"./data/bird_database.duckdb\")\n\nList all the tables present in the database:\n\ndbListTables(conn)\n\n[1] \"Bird_eggs\"       \"Bird_nests\"      \"Camp_assignment\" \"Personnel\"      \n[5] \"Site\"            \"Species\"        \n\n\nLet’s have a look at the Species table\n\nspecies_db &lt;- tbl(conn, \"Species\")\nspecies_db\n\n# Source:   table&lt;Species&gt; [?? x 4]\n# Database: DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n   Code  Common_name             Scientific_name        Relevance               \n   &lt;chr&gt; &lt;chr&gt;                   &lt;chr&gt;                  &lt;chr&gt;                   \n 1 agsq  Arctic ground squirrel  Spermophilus parryii   Potential predator (egg…\n 2 amcr  American Crow           Corvus brachyrhynchos  Potential predator (avi…\n 3 amgp  American Golden-Plover  Pluvialis dominica     Study species           \n 4 arfo  Arctic fox              Alopex lagopus         Potential predator (mam…\n 5 arte  Arctic Tern             Sterna paradisaea      Incidental monitoring   \n 6 basa  Baird's Sandpiper       Calidris bairdii       Study species           \n 7 bbis  Broad-billed Sandpiper  Calidris falcinellus   Study species           \n 8 bbpl  Black-bellied Plover    Pluvialis squatarola   Study species           \n 9 bbsa  Buff-breasted Sandpiper Calidris subruficollis Study species           \n10 besw  Bewick's Swan           Cygnus columbianus     
Incidental monitoring   \n# ℹ more rows\n\n\nYou can filter the data and select columns:\n\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3)\n\n# Source:     SQL [3 x 1]\n# Database:   DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n# Ordered by: Scientific_name\n  Scientific_name   \n  &lt;chr&gt;             \n1 Actitis macularius\n2 Calidris acuminata\n3 Calidris alba     \n\n\n\n\n\n\n\n\nNote\n\n\n\nNote that those are not data frames but tables. What dbplyr is actually doing behind the scenes is translating all those dplyr operations into SQL, sending the SQL code to query the database, retrieving results, etc.\n\n\n\nHow can I get a “real” data frame?\nYou add collect() to your query.\n\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3) %&gt;% \n  collect()\n\n# A tibble: 3 × 1\n  Scientific_name   \n  &lt;chr&gt;             \n1 Actitis macularius\n2 Calidris acuminata\n3 Calidris alba     \n\n\nNote it means the full query is going to be ran and save in your R environment. 
This might slow things down, so you generally want to collect on the smallest data frame you can.\n\n\nHow can you see the SQL query?\nAdding show_query() at the end of your code block will let you see the SQL code that has been used to query the database.\n\n# Add show_query() to the end to see what SQL it is sending!\nspecies_db %&gt;%\n  filter(Relevance==\"Study species\") %&gt;%\n  select(Scientific_name) %&gt;%\n  arrange(Scientific_name) %&gt;%\n  head(3) %&gt;% \n  show_query()\n\n&lt;SQL&gt;\nSELECT Scientific_name\nFROM Species\nWHERE (Relevance = 'Study species')\nORDER BY Scientific_name\nLIMIT 3\n\n\nThis is a great way to start getting familiar with the SQL syntax, because although you can do a lot with dbplyr you can not do everything that SQL can do. So at some point you might want to start using SQL directly.\nHere is how you could run the query using the SQL code directly:\n\n# query the database using SQL\ndbGetQuery(conn, \"SELECT Scientific_name FROM Species WHERE (Relevance = 'Study species') ORDER BY Scientific_name LIMIT 3\")\n\n     Scientific_name\n1 Actitis macularius\n2 Calidris acuminata\n3      Calidris alba\n\n\nYou can do pretty much anything with these quasi-tables, including grouping, summarization, joins, etc.\nLet’s count how many species there are per Relevance categories:\n\nspecies_db %&gt;%\n  group_by(Relevance) %&gt;%\n  summarize(num_species = n())\n\n# Source:   SQL [7 x 2]\n# Database: DuckDB v0.9.2 [unknown@Linux 6.5.0-1015-azure:R 4.3.3/./data/bird_database.duckdb]\n  Relevance                                num_species\n  &lt;chr&gt;                                          &lt;dbl&gt;\n1 Incidental monitoring                             18\n2 Study species                                     41\n3 Potential predator (avian)                        25\n4 Potential predator (mammal)                        6\n5 Study species; potential predator (eggs)           2\n6 Potential predator (eggs; mammal)                  2\n7 
Microtine (alternate prey for predators)           5\n\n\nDoes that code looks familiar? But this time, here is really the query that was used to retrieve this information:\n\nspecies_db %&gt;%\n  group_by(Relevance) %&gt;%\n  summarize(num_species = n()) %&gt;%\n  show_query()\n\n&lt;SQL&gt;\nSELECT Relevance, COUNT(*) AS num_species\nFROM Species\nGROUP BY Relevance\n\n\n\n\n\nAverage egg volume analysis\nLet’s reproduce the egg volume analysis we just did. We can calculate the average bird eggs volume per species directly on the database:\n\n# loading all the necessary tables\neggs_db &lt;- tbl(conn, \"Bird_eggs\")\nnests_db &lt;- tbl(conn, \"Bird_nests\")\n\nCompute the volume using the same code as previously!! Yes, you can use mutate to create new columns on the tables object\n\n# Compute the egg volume\neggs_volume_db &lt;- eggs_db %&gt;%\n  mutate(egg_volume = pi/6*Width^2*Length)\n\n\n\n\n\n\n\nCaution\n\n\n\nLimitation: no way to add or update data in the database, dbplyr is view only. 
If you want to add or update data, you’ll need to use the DBI package functions.\n\n\nNow let’s join this information to the nest table, and average by species\n\n# Join the egg and nest tables to compute average\nspecies_egg_volume_avg_db &lt;- left_join(nests_db, eggs_volume_db, by=\"Nest_ID\") %&gt;%\n  group_by(Species) %&gt;%\n  summarise(egg_volume_avg = mean(egg_volume, na.rm = TRUE)) %&gt;%\n  arrange(desc(egg_volume_avg)) %&gt;% \n  collect() %&gt;%\n  drop_na()\n\nspecies_egg_volume_avg_db\n\n# A tibble: 7 × 2\n  Species egg_volume_avg\n  &lt;chr&gt;            &lt;dbl&gt;\n1 bbpl            33975.\n2 amgp            28545.\n3 rutu            18094.\n4 dunl            11777.\n5 wrsa            10111.\n6 sepl             9903.\n7 reph             8444.\n\n\nWhat does this SQL query looks like?\n\nspecies_egg_volume_avg_db &lt;- left_join(eggs_volume_db, nests_db, by=\"Nest_ID\") %&gt;%\n  group_by(Species) %&gt;%\n  summarise(egg_volume_avg = mean(egg_volume, na.rm = TRUE)) %&gt;%\n  arrange(desc(egg_volume_avg)) %&gt;% \n  show_query()\n\n&lt;SQL&gt;\nSELECT Species, AVG(egg_volume) AS egg_volume_avg\nFROM (\n  SELECT\n    LHS.Book_page AS \"Book_page.x\",\n    LHS.\"Year\" AS \"Year.x\",\n    LHS.Site AS \"Site.x\",\n    LHS.Nest_ID AS Nest_ID,\n    Egg_num,\n    Length,\n    Width,\n    egg_volume,\n    Bird_nests.Book_page AS \"Book_page.y\",\n    Bird_nests.\"Year\" AS \"Year.y\",\n    Bird_nests.Site AS \"Site.y\",\n    Species,\n    Observer,\n    Date_found,\n    how_found,\n    Clutch_max,\n    floatAge,\n    ageMethod\n  FROM (\n    SELECT\n      Bird_eggs.*,\n      ((3.14159265358979 / 6.0) * (POW(Width, 2.0))) * Length AS egg_volume\n    FROM Bird_eggs\n  ) LHS\n  LEFT JOIN Bird_nests\n    ON (LHS.Nest_ID = Bird_nests.Nest_ID)\n) q01\nGROUP BY Species\nORDER BY egg_volume_avg DESC\n\n\n\n\n\n\n\n\nQuestion\n\n\n\nWhy does the SQL query include the volume computation?\n\n\n\n\nDisconnecting from the database\nBefore we close our session, it is 
good practice to disconnect from the database first\n\nDBI::dbDisconnect(conn, shutdown = TRUE)"
   },
   {
     "objectID": "hands-on.html#how-did-we-create-this-database",

diff --git a/sitemap.xml b/sitemap.xml
@@ -2,14 +2,14 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://UCSB-Library-Research-Data-Services.github.io/intro-database-r/about.html</loc>
-    <lastmod>2024-03-11T20:02:37.873Z</lastmod>
+    <lastmod>2024-03-12T16:35:15.466Z</lastmod>
   </url>
   <url>
     <loc>https://UCSB-Library-Research-Data-Services.github.io/intro-database-r/index.html</loc>
-    <lastmod>2024-03-11T20:02:37.885Z</lastmod>
+    <lastmod>2024-03-12T16:35:15.474Z</lastmod>
   </url>
   <url>
     <loc>https://UCSB-Library-Research-Data-Services.github.io/intro-database-r/hands-on.html</loc>
-    <lastmod>2024-03-11T20:02:37.881Z</lastmod>
+    <lastmod>2024-03-12T16:35:15.474Z</lastmod>
   </url>
 </urlset>