diff --git a/.github/workflows/quarto-publish.yml b/.github/workflows/quarto-publish.yml
index de4d441..25aae7b 100644
--- a/.github/workflows/quarto-publish.yml
+++ b/.github/workflows/quarto-publish.yml
@@ -25,9 +25,9 @@ jobs:
       #   uses: actions/setup-python@v3
 
       # From https://github.com/r-lib/actions/tree/v2-branch/setup-r
-      # - name: Setup R
-      #   uses: r-lib/actions/setup-r@v2
-      # - uses: r-lib/actions/setup-renv@v2
+      - name: Setup R
+        uses: r-lib/actions/setup-r@v2
+      - uses: r-lib/actions/setup-renv@v2
 
       # NOTE: If Publishing to GitHub Pages, set the permissions correctly (see top of this yaml)
       - name: Publish to GitHub Pages (and render)
diff --git a/_freeze/location-services/read-data/execute-results/html.json b/_freeze/location-services/read-data/execute-results/html.json
index 18a0853..a87edf0 100644
--- a/_freeze/location-services/read-data/execute-results/html.json
+++ b/_freeze/location-services/read-data/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "0dabfab49dd9b8867c5158c534966c4f",
+  "hash": "40495e036cc52478a5ae52ab87b281c1",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Read data from ArcGIS Online or Enterprise\"\nsubtitle: \"Learn how to read data from ArcGIS Online or Enterprise into R\"\nfreeze: true\n---\n\n\nArcGIS Online and Enterprise hosted Feature Layers can easily be read into R as [`{sf}`](https://r-spatial.github.io/sf/) objects using`{arcgislayers}`. \n\nThis tutorial will teach you the basics of reading data using `arcgis`.\n\n## Objective\n\nThe objective of this tutorial is to teach you how to \n\n- read in a population dataset from ArcGIS Online\n- apply a filter to a Feature Layer\n- read only specified columns\n- find a Feature Layer url \n\n## Obtaining a feature layer url\n\nFor this example we will read in [population data of major US cities](https://www.arcgis.com/home/item.html?id=9df5e769bfe8412b8de36a2e618c7672) from ArcGIS Online. 
\n\nWe need to first create a `FeatureLayer` object using the `arc_open()` function. `arc_open()` requires the url of the hosted feature service. To find this, we can navigate to the item in our portal. \n\n\n![](images/read-data/usa-cities.png)\nWhen you scroll down, on the right hand side, you will see a button to view the service itself. \n\n![](images/read-data/view-url.png){width=45%}\n\nClicking this will bring us to the Feature Service itself. Inside of a Feature Server there may be many layers or table that we can use. In this case, there is only one layer. Click the hyperlinked **USA Major Cities**. \n\n![](images/read-data/usa-cities-server.png)\n\nNow we will be in the Feature Layer itself. \n\n![](images/read-data/usa-cities-layer.png){width=70%}\n\nNavigate to your browsers search bar, and you can copy the url \n\n```\nhttps://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0\n```\n\n## Opening a Feature Layer\n\nBefore we can read in the Feature Layer, we need to load the `arcgis` R package. If you do not have `arcgis` installed, install it with `pak::pak(\"r-arcgis/arcgis\")`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(arcgis)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nAttaching core arcgis packages:\n - {arcgisutils} v0.1.0\n - {arcgislayers} v0.1.0\n```\n\n\n:::\n:::\n\n\nLet's store the Feature Layer url in an object called `furl` (as in feature layer url).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfurl <- \"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0\"\n```\n:::\n\n\nWe then pass this variable to `arc_open()` and save it to `flayer` (feature layer). 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nflayer <- arc_open(furl)\nflayer\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\n```\n\n\n:::\n:::\n\n\n`arc_open()` will create a `FeatureLayer` object. Under this hood this is really just a list with all of the feature layer's metadata. \n\n:::{.callout-note collapse=\"true\" title=\"FeatureLayer details for the curious\"}\nThe `FeatureLayer` object is obtained by adding `?f=json` to the feature layer url and processing the json. All of the metadata in there is stored in the `FeatureLayer` object. You can see this by running `unclass(flayer)`. Be warned! It gets messy. \n:::\n\nWith this `FeatureLayer` object, we can read data from the service into R using it! \n\n## Reading from a Feature Layer\n\nOnce we have a `FeatureLayer` object we can read its data into memory using the `arc_select()` function. By default, if we use `arc_select()` on a `FeatureLayer` without any additional arguments, the entire service will be brought into memory.\n\n:::{.callout-warning}\nBe careful to not try and read in more data than you need! Reading an entire feature services is fine for datasets in the realm of 0 - 5,000 features. But when we have more than 10,000 features performance and memory may be throttled. 
\n\nExceptionally detailed geometries require more data to be transferred across the web and may be slower to process.\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncities <- arc_select(flayer)\ncities\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 4185 features and 11 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928\nGeodetic CRS: WGS 84\nFirst 10 features:\n OBJECTID NAME CLASS STATE_ABBR STATE_FIPS PLACE_FIPS POPULATION\n1 1 Alabaster city AL 01 0100820 33284\n2 2 Albertville city AL 01 0100988 22386\n3 3 Alexander City city AL 01 0101132 14843\n4 4 Anniston city AL 01 0101852 21564\n5 5 Athens city AL 01 0102956 25406\n6 6 Atmore city AL 01 0103004 8391\n7 7 Auburn city AL 01 0103076 76143\n8 8 Bessemer city AL 01 0105980 26019\n9 9 Birmingham city AL 01 0107000 200733\n10 10 Calera city AL 01 0111416 16494\n POP_CLASS POP_SQMI SQMI CAPITAL geometry\n1 6 1300.7 25.59 POINT (-86.81782 33.2445)\n2 6 827.9 27.04 POINT (-86.21205 34.26421)\n3 6 337.4 43.99 POINT (-85.95631 32.94309)\n4 6 469.9 45.89 POINT (-85.81986 33.6565)\n5 6 625.8 40.60 POINT (-86.9508 34.78484)\n6 5 382.5 21.94 POINT (-87.49009 31.02226)\n7 7 1234.5 61.68 POINT (-85.48999 32.60691)\n8 6 641.8 40.54 POINT (-86.9563 33.40092)\n9 8 1342.2 149.55 POINT (-86.79647 33.5288)\n10 6 674.0 24.47 POINT (-86.74549 33.1244)\n```\n\n\n:::\n:::\n\n\nWe store the results of `arc_select()` into the object `cities`. The result is an `sf` object that we can now work with using **`sf`** and any other R package we'd like. \n\n### Specifying output fields \n\nIn some cases we may have Feature Layers with many fields that we might not want. We can specify which fields we want to return to R by using the `fields` argument. \n\n:::{.callout-tip}\nIt's always good to only read in the data that you need. Adding unneeded fields uses more memory and takes longer to process. 
\n:::\n\n`fields` takes a character vector of field names. To see which fields are available in a Feature Layer you can use the utility function `list_fields()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfields <- list_fields(flayer)\nfields[, 1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n name type alias sqlType\n1 OBJECTID esriFieldTypeOID OBJECTID sqlTypeOther\n2 NAME esriFieldTypeString Name sqlTypeOther\n3 CLASS esriFieldTypeString Class sqlTypeOther\n4 STATE_ABBR esriFieldTypeString State Abbreviation sqlTypeOther\n5 STATE_FIPS esriFieldTypeString State FIPS sqlTypeOther\n6 PLACE_FIPS esriFieldTypeString Place FIPS sqlTypeOther\n7 POPULATION esriFieldTypeInteger 2020 Total Population sqlTypeOther\n8 POP_CLASS esriFieldTypeSmallInteger Population Class sqlTypeOther\n9 POP_SQMI esriFieldTypeDouble People per square mile sqlTypeOther\n10 SQMI esriFieldTypeDouble Area in square miles sqlTypeOther\n11 CAPITAL esriFieldTypeString Capital sqlTypeOther\n```\n\n\n:::\n:::\n\n:::{.aside}\nFor the sake of readability, only the first 4 columns are displayed.\n:::\n\nLet's try reading in only the `\"STATE_ABBR\"`, `\"POPULATION\"`, and `\"NAME\"` fields. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer, \n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 4185 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928\nGeodetic CRS: WGS 84\nFirst 10 features:\n STATE_ABBR POPULATION NAME geometry\n1 AL 33284 Alabaster POINT (-86.81782 33.2445)\n2 AL 22386 Albertville POINT (-86.21205 34.26421)\n3 AL 14843 Alexander City POINT (-85.95631 32.94309)\n4 AL 21564 Anniston POINT (-85.81986 33.6565)\n5 AL 25406 Athens POINT (-86.9508 34.78484)\n6 AL 8391 Atmore POINT (-87.49009 31.02226)\n7 AL 76143 Auburn POINT (-85.48999 32.60691)\n8 AL 26019 Bessemer POINT (-86.9563 33.40092)\n9 AL 200733 Birmingham POINT (-86.79647 33.5288)\n10 AL 16494 Calera POINT (-86.74549 33.1244)\n```\n\n\n:::\n:::\n\n\n### Using SQL where clauses\n\nNot only can we limit the number of columns that we return from a Feature Layer, but we can also limit the number of rows that we have returned to us. This is very handy in the case of very, very, massive Feature Layers with hundreds of thousands of features. Reading all of those features into memory would be slow, costly (in terms of memory), and unnecessary!\n\nThe `where` argument of `arc_select()` permits us to provide a very simple SQL where clause to limit what we get back. Let's explore the use of the `where` argument. \n\nLet's modify our above `arc_select()` statement to return only the features in California. 
We do this by using the where clause `STATE_ABBR = 'CA'`\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"STATE_ABBR = 'CA'\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 498 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -124.1662 ymin: 32.57388 xmax: -114.5903 ymax: 40.93734\nGeodetic CRS: WGS 84\nFirst 10 features:\n STATE_ABBR POPULATION NAME geometry\n1 CA 38046 Adelanto POINT (-117.4384 34.5792)\n2 CA 20299 Agoura Hills POINT (-118.7601 34.15363)\n3 CA 78280 Alameda POINT (-122.2614 37.7672)\n4 CA 15314 Alamo POINT (-122.0307 37.84998)\n5 CA 20271 Albany POINT (-122.3002 37.88985)\n6 CA 82868 Alhambra POINT (-118.1355 34.08398)\n7 CA 52176 Aliso Viejo POINT (-117.7289 33.57922)\n8 CA 14696 Alpine POINT (-116.7585 32.84388)\n9 CA 42846 Altadena POINT (-118.1356 34.19342)\n10 CA 12042 Alum Rock POINT (-121.8239 37.3694)\n```\n\n\n:::\n:::\n\n\nWe can also consider finding only the places in the US with more than 1,000,000 people as well.\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"POPULATION > 1000000\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 10 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 29.42354 xmax: -74.01013 ymax: 41.75649\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 AZ 1608139 Phoenix POINT (-112.0739 33.44611)\n2 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n3 CA 1386932 San Diego POINT (-117.1456 32.72033)\n4 CA 1013240 San Jose POINT (-121.8864 37.33941)\n5 IL 2746388 Chicago POINT (-87.64715 41.75649)\n6 NY 8804190 New York POINT (-74.01013 40.71057)\n7 PA 1603797 Philadelphia POINT (-75.16099 39.95136)\n8 TX 1304379 Dallas POINT (-96.79576 32.77865)\n9 TX 2304580 Houston POINT (-95.36751 
29.75876)\n10 TX 1434625 San Antonio POINT (-98.4925 29.42354)\n```\n\n\n:::\n:::\n\n\nNow let's try combining both where clauses using `and` to find only the cities in California with a population greater than 1,000,000.\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"POPULATION > 1000000 and STATE_ABBR = 'CA'\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n2 CA 1386932 San Diego POINT (-117.1456 32.72033)\n3 CA 1013240 San Jose POINT (-121.8864 37.33941)\n```\n\n\n:::\n:::\n\n\n## Using `dplyr`\n\nIf writing the field names out by hand and coming up with SQL where clauses isn't your thing, that's okay. We also provide `dplyr::select()` and `dplyr::filter()` methods for `FeatureLayer` objects.\n\nThe dplyr functionality is modeled off of [`dbplyr`](https://dbplyr.tidyverse.org/). The general concept is that we have a connection object that specifies what we will be querying against. Then we build up our queries using dplyr functions. Unlike using dplyr on `data.frame`s, the results aren't fetched eagerly. Instead they are _lazy_. With `dbplyr` we use the `collect()` function to execute a query and bring it into memory. The same is true with `FeatureLayer` objects. \n\nLet's build up a query and see it in action! 
We need to load dplyr to bring the functions into scope.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\n\nfl_query <- flayer |> \n select(STATE_ABBR, POPULATION, NAME)\n\nfl_query\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n```\n\n\n:::\n:::\n\n\nAfter doing this, we can see that our `FeatureLayer` object now prints out a `Query` field with the `outFields` parameter set to the result of our `select()` function.\n\n:::{.callout-note collapse=\"true\" title=\"A note for advanced useRs\"}\nWe build up and store the query in the `query` attribute of a `FeatureLayer` object. It is a named list that will be passed directly to the API endpoint. The names match endpoint parameters. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(fl_query, \"query\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n$outFields\n[1] \"STATE_ABBR,POPULATION,NAME\"\n```\n\n\n:::\n:::\n\n\nYou can also manually specify parameters using the `update_params()` function. 
Note that there is _no_ parameter validation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nupdate_params(fl_query, key = \"value\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n key: value\n```\n\n\n:::\n:::\n\n\n:::\n\nWe can continue to build up our query using `filter()` \n\n:::{.callout-tip}\nOnly very basic filter statements are supported such as `==`, `<`, `>`, etc.\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfl_query |> \n filter(POPULATION > 1000000, STATE_ABBR = \"CA\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n where: POPULATION > 1000000.0 AND 'CA'\n```\n\n\n:::\n:::\n\n\nThe query is stored in the `FeatureLayer` object and will not be executed until we request it with `collect()`. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nfl_query |> \n filter(POPULATION > 1000000, STATE_ABBR == \"CA\") |> \n collect()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n2 CA 1386932 San Diego POINT (-117.1456 32.72033)\n3 CA 1013240 San Jose POINT (-121.8864 37.33941)\n```\n\n\n:::\n:::\n",
+    "markdown": "---\ntitle: \"Read hosted data\"\nsubtitle: \"Learn how to read data from ArcGIS Online or Enterprise into R\"\nfreeze: true\n---\n\n\nArcGIS Online and Enterprise hosted Feature Layers can easily be read into R as [`{sf}`](https://r-spatial.github.io/sf/) objects using`{arcgislayers}`. 
\n\nThis tutorial will teach you the basics of reading data using `arcgis`.\n\n## Objective\n\nThe objective of this tutorial is to teach you how to \n\n- read in a population dataset from ArcGIS Online\n- apply a filter to a Feature Layer\n- read only specified columns\n- find a Feature Layer url \n\n## Obtaining a feature layer url\n\nFor this example we will read in [population data of major US cities](https://www.arcgis.com/home/item.html?id=9df5e769bfe8412b8de36a2e618c7672) from ArcGIS Online. \n\nWe need to first create a `FeatureLayer` object using the `arc_open()` function. `arc_open()` requires the url of the hosted feature service. To find this, we can navigate to the item in our portal. \n\n\n![](images/read-data/usa-cities.png)\nWhen you scroll down, on the right hand side, you will see a button to view the service itself. \n\n![](images/read-data/view-url.png){width=45%}\n\nClicking this will bring us to the Feature Service itself. Inside of a Feature Server there may be many layers or table that we can use. In this case, there is only one layer. Click the hyperlinked **USA Major Cities**. \n\n![](images/read-data/usa-cities-server.png)\n\nNow we will be in the Feature Layer itself. \n\n![](images/read-data/usa-cities-layer.png){width=70%}\n\nNavigate to your browsers search bar, and you can copy the url \n\n```\nhttps://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0\n```\n\n## Opening a Feature Layer\n\nBefore we can read in the Feature Layer, we need to load the `arcgis` R package. 
If you do not have `arcgis` installed, install it with `pak::pak(\"r-arcgis/arcgis\")`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(arcgis)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nAttaching core arcgis packages:\n - {arcgisutils} v0.1.0\n - {arcgislayers} v0.1.0\n```\n\n\n:::\n:::\n\n\nLet's store the Feature Layer url in an object called `furl` (as in feature layer url).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfurl <- \"https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0\"\n```\n:::\n\n\nWe then pass this variable to `arc_open()` and save it to `flayer` (feature layer). \n\n\n::: {.cell}\n\n```{.r .cell-code}\nflayer <- arc_open(furl)\nflayer\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\n```\n\n\n:::\n:::\n\n\n`arc_open()` will create a `FeatureLayer` object. Under this hood this is really just a list with all of the feature layer's metadata. \n\n:::{.callout-note collapse=\"true\" title=\"FeatureLayer details for the curious\"}\nThe `FeatureLayer` object is obtained by adding `?f=json` to the feature layer url and processing the json. All of the metadata in there is stored in the `FeatureLayer` object. You can see this by running `unclass(flayer)`. Be warned! It gets messy. \n:::\n\nWith this `FeatureLayer` object, we can read data from the service into R using it! \n\n## Reading from a Feature Layer\n\nOnce we have a `FeatureLayer` object we can read its data into memory using the `arc_select()` function. By default, if we use `arc_select()` on a `FeatureLayer` without any additional arguments, the entire service will be brought into memory.\n\n:::{.callout-warning}\nBe careful to not try and read in more data than you need! Reading an entire feature services is fine for datasets in the realm of 0 - 5,000 features. 
But when we have more than 10,000 features performance and memory may be throttled. \n\nExceptionally detailed geometries require more data to be transferred across the web and may be slower to process.\n:::\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncities <- arc_select(flayer)\ncities\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 4185 features and 11 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928\nGeodetic CRS: WGS 84\nFirst 10 features:\n OBJECTID NAME CLASS STATE_ABBR STATE_FIPS PLACE_FIPS POPULATION\n1 1 Alabaster city AL 01 0100820 33284\n2 2 Albertville city AL 01 0100988 22386\n3 3 Alexander City city AL 01 0101132 14843\n4 4 Anniston city AL 01 0101852 21564\n5 5 Athens city AL 01 0102956 25406\n6 6 Atmore city AL 01 0103004 8391\n7 7 Auburn city AL 01 0103076 76143\n8 8 Bessemer city AL 01 0105980 26019\n9 9 Birmingham city AL 01 0107000 200733\n10 10 Calera city AL 01 0111416 16494\n POP_CLASS POP_SQMI SQMI CAPITAL geometry\n1 6 1300.7 25.59 POINT (-86.81782 33.2445)\n2 6 827.9 27.04 POINT (-86.21205 34.26421)\n3 6 337.4 43.99 POINT (-85.95631 32.94309)\n4 6 469.9 45.89 POINT (-85.81986 33.6565)\n5 6 625.8 40.60 POINT (-86.9508 34.78484)\n6 5 382.5 21.94 POINT (-87.49009 31.02226)\n7 7 1234.5 61.68 POINT (-85.48999 32.60691)\n8 6 641.8 40.54 POINT (-86.9563 33.40092)\n9 8 1342.2 149.55 POINT (-86.79647 33.5288)\n10 6 674.0 24.47 POINT (-86.74549 33.1244)\n```\n\n\n:::\n:::\n\n\nWe store the results of `arc_select()` into the object `cities`. The result is an `sf` object that we can now work with using **`sf`** and any other R package we'd like. \n\n### Specifying output fields \n\nIn some cases we may have Feature Layers with many fields that we might not want. We can specify which fields we want to return to R by using the `fields` argument. \n\n:::{.callout-tip}\nIt's always good to only read in the data that you need. 
Adding unneeded fields uses more memory and takes longer to process. \n:::\n\n`fields` takes a character vector of field names. To see which fields are available in a Feature Layer you can use the utility function `list_fields()`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfields <- list_fields(flayer)\nfields[, 1:4]\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n name type alias sqlType\n1 OBJECTID esriFieldTypeOID OBJECTID sqlTypeOther\n2 NAME esriFieldTypeString Name sqlTypeOther\n3 CLASS esriFieldTypeString Class sqlTypeOther\n4 STATE_ABBR esriFieldTypeString State Abbreviation sqlTypeOther\n5 STATE_FIPS esriFieldTypeString State FIPS sqlTypeOther\n6 PLACE_FIPS esriFieldTypeString Place FIPS sqlTypeOther\n7 POPULATION esriFieldTypeInteger 2020 Total Population sqlTypeOther\n8 POP_CLASS esriFieldTypeSmallInteger Population Class sqlTypeOther\n9 POP_SQMI esriFieldTypeDouble People per square mile sqlTypeOther\n10 SQMI esriFieldTypeDouble Area in square miles sqlTypeOther\n11 CAPITAL esriFieldTypeString Capital sqlTypeOther\n```\n\n\n:::\n:::\n\n:::{.aside}\nFor the sake of readability, only the first 4 columns are displayed.\n:::\n\nLet's try reading in only the `\"STATE_ABBR\"`, `\"POPULATION\"`, and `\"NAME\"` fields. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer, \n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 4185 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -159.3191 ymin: 19.58272 xmax: -68.67922 ymax: 64.86928\nGeodetic CRS: WGS 84\nFirst 10 features:\n STATE_ABBR POPULATION NAME geometry\n1 AL 33284 Alabaster POINT (-86.81782 33.2445)\n2 AL 22386 Albertville POINT (-86.21205 34.26421)\n3 AL 14843 Alexander City POINT (-85.95631 32.94309)\n4 AL 21564 Anniston POINT (-85.81986 33.6565)\n5 AL 25406 Athens POINT (-86.9508 34.78484)\n6 AL 8391 Atmore POINT (-87.49009 31.02226)\n7 AL 76143 Auburn POINT (-85.48999 32.60691)\n8 AL 26019 Bessemer POINT (-86.9563 33.40092)\n9 AL 200733 Birmingham POINT (-86.79647 33.5288)\n10 AL 16494 Calera POINT (-86.74549 33.1244)\n```\n\n\n:::\n:::\n\n\n### Using SQL where clauses\n\nNot only can we limit the number of columns that we return from a Feature Layer, but we can also limit the number of rows that we have returned to us. This is very handy in the case of very, very, massive Feature Layers with hundreds of thousands of features. Reading all of those features into memory would be slow, costly (in terms of memory), and unnecessary!\n\nThe `where` argument of `arc_select()` permits us to provide a very simple SQL where clause to limit what we get back. Let's explore the use of the `where` argument. \n\nLet's modify our above `arc_select()` statement to return only the features in California. 
We do this by using the where clause `STATE_ABBR = 'CA'`\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"STATE_ABBR = 'CA'\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 498 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -124.1662 ymin: 32.57388 xmax: -114.5903 ymax: 40.93734\nGeodetic CRS: WGS 84\nFirst 10 features:\n STATE_ABBR POPULATION NAME geometry\n1 CA 38046 Adelanto POINT (-117.4384 34.5792)\n2 CA 20299 Agoura Hills POINT (-118.7601 34.15363)\n3 CA 78280 Alameda POINT (-122.2614 37.7672)\n4 CA 15314 Alamo POINT (-122.0307 37.84998)\n5 CA 20271 Albany POINT (-122.3002 37.88985)\n6 CA 82868 Alhambra POINT (-118.1355 34.08398)\n7 CA 52176 Aliso Viejo POINT (-117.7289 33.57922)\n8 CA 14696 Alpine POINT (-116.7585 32.84388)\n9 CA 42846 Altadena POINT (-118.1356 34.19342)\n10 CA 12042 Alum Rock POINT (-121.8239 37.3694)\n```\n\n\n:::\n:::\n\n\nWe can also consider finding only the places in the US with more than 1,000,000 people as well.\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"POPULATION > 1000000\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 10 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 29.42354 xmax: -74.01013 ymax: 41.75649\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 AZ 1608139 Phoenix POINT (-112.0739 33.44611)\n2 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n3 CA 1386932 San Diego POINT (-117.1456 32.72033)\n4 CA 1013240 San Jose POINT (-121.8864 37.33941)\n5 IL 2746388 Chicago POINT (-87.64715 41.75649)\n6 NY 8804190 New York POINT (-74.01013 40.71057)\n7 PA 1603797 Philadelphia POINT (-75.16099 39.95136)\n8 TX 1304379 Dallas POINT (-96.79576 32.77865)\n9 TX 2304580 Houston POINT (-95.36751 
29.75876)\n10 TX 1434625 San Antonio POINT (-98.4925 29.42354)\n```\n\n\n:::\n:::\n\n\nNow let's try combining both where clauses using `and` to find only the cities in California with a population greater than 1,000,000.\n\n\n::: {.cell}\n\n```{.r .cell-code}\narc_select(\n flayer,\n where = \"POPULATION > 1000000 and STATE_ABBR = 'CA'\",\n fields = c(\"STATE_ABBR\", \"POPULATION\", \"NAME\")\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n2 CA 1386932 San Diego POINT (-117.1456 32.72033)\n3 CA 1013240 San Jose POINT (-121.8864 37.33941)\n```\n\n\n:::\n:::\n\n\n## Using `dplyr`\n\nIf writing the field names out by hand and coming up with SQL where clauses isn't your thing, that's okay. We also provide `dplyr::select()` and `dplyr::filter()` methods for `FeatureLayer` objects.\n\nThe dplyr functionality is modeled off of [`dbplyr`](https://dbplyr.tidyverse.org/). The general concept is that we have a connection object that specifies what we will be querying against. Then we build up our queries using dplyr functions. Unlike using dplyr on `data.frame`s, the results aren't fetched eagerly. Instead they are _lazy_. With `dbplyr` we use the `collect()` function to execute a query and bring it into memory. The same is true with `FeatureLayer` objects. \n\nLet's build up a query and see it in action! 
We need to load dplyr to bring the functions into scope.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\n\nfl_query <- flayer |> \n select(STATE_ABBR, POPULATION, NAME)\n\nfl_query\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n```\n\n\n:::\n:::\n\n\nAfter doing this, we can see that our `FeatureLayer` object now prints out a `Query` field with the `outFields` parameter set to the result of our `select()` function.\n\n:::{.callout-note collapse=\"true\" title=\"A note for advanced useRs\"}\nWe build up and store the query in the `query` attribute of a `FeatureLayer` object. It is a named list that will be passed directly to the API endpoint. The names match endpoint parameters. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nattr(fl_query, \"query\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n$outFields\n[1] \"STATE_ABBR,POPULATION,NAME\"\n```\n\n\n:::\n:::\n\n\nYou can also manually specify parameters using the `update_params()` function. 
Note that there is _no_ parameter validation.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nupdate_params(fl_query, key = \"value\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n key: value\n```\n\n\n:::\n:::\n\n\n:::\n\nWe can continue to build up our query using `filter()` \n\n:::{.callout-tip}\nOnly very basic filter statements are supported such as `==`, `<`, `>`, etc.\n:::\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfl_query |> \n filter(POPULATION > 1000000, STATE_ABBR = \"CA\")\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n>\nName: USA Major Cities\nGeometry Type: esriGeometryPoint\nCRS: 4326\nCapabilities: Query,Extract\nQuery:\n outFields: STATE_ABBR,POPULATION,NAME\n where: POPULATION > 1000000.0 AND 'CA'\n```\n\n\n:::\n:::\n\n\nThe query is stored in the `FeatureLayer` object and will not be executed until we request it with `collect()`. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfl_query |> \n filter(POPULATION > 1000000, STATE_ABBR == \"CA\") |> \n collect()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -121.8864 ymin: 32.72033 xmax: -117.1456 ymax: 37.33941\nGeodetic CRS: WGS 84\n STATE_ABBR POPULATION NAME geometry\n1 CA 3898747 Los Angeles POINT (-118.2706 34.05279)\n2 CA 1386932 San Diego POINT (-117.1456 32.72033)\n3 CA 1013240 San Jose POINT (-121.8864 37.33941)\n```\n\n\n:::\n:::\n",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
diff --git a/location-services/read-data.qmd b/location-services/read-data.qmd
index 2e50bf5..a024311 100644
--- a/location-services/read-data.qmd
+++ b/location-services/read-data.qmd
@@ -1,5 +1,5 @@
 ---
-title: "Read data from ArcGIS Online or Enterprise"
+title: "Read hosted data"
 subtitle: "Learn how to read data from ArcGIS Online or Enterprise into R"
 freeze: true
 ---