Skip to content

Commit

Permalink
add reading tutorial
Browse files Browse the repository at this point in the history
  • Loading branch information
JosiahParry committed Dec 1, 2023
1 parent 48dbb01 commit 9431354
Show file tree
Hide file tree
Showing 6 changed files with 208 additions and 0 deletions.
1 change: 1 addition & 0 deletions _quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ website:
- section: Location Services
contents:
- location-services/connecting-to-a-portal.qmd
- location-services/read-data.qmd
- tutorials.qmd
- workflows.qmd
- packages.qmd
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added location-services/images/read-data/view-url.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
207 changes: 207 additions & 0 deletions location-services/read-data.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
---
title: "Read data from ArcGIS Online or Enterprise"
subtitle: "Learn how to read data from ArcGIS Online or Enterprise into R"
---

ArcGIS Online and Enterprise hosted Feature Layers can easily be read into R as [`{sf}`](https://r-spatial.github.io/sf/) objects using`{arcgislayers}`.

This tutorial will teach you the basics of reading data using `arcgis`.

## Objective

The objective of this tutorial is to teach you how to

- read in a population dataset from ArcGIS Online
- apply a filter to a Feature Layer
- read only specified columns
- find a Feature Layer url

## Obtaining a feature layer url

For this example we will read in [population data of major US cities](https://www.arcgis.com/home/item.html?id=9df5e769bfe8412b8de36a2e618c7672) from ArcGIS Online.

We need to first create a `FeatureLayer` object using the `arc_open()` function. `arc_open()` requires the url of the hosted feature service. To find this, we can navigate to the item in our portal.


![](images/read-data/usa-cities.png)
When you scroll down, on the right hand side, you will see a button to view the service itself.

![](images/read-data/view-url.png){width=45%}

Clicking this will bring us to the Feature Service itself. Inside of a Feature Server there may be many layers or table that we can use. In this case, there is only one layer. Click the hyperlinked **USA Major Cities**.

![](images/read-data/usa-cities-server.png)

Now we will be in the Feature Layer itself.

![](images/read-data/usa-cities-layer.png){width=70%}

Navigate to your browsers search bar, and you can copy the url

```
https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0
```

## Opening a Feature Layer

Before we can read in the Feature Layer, we need to load the `arcgis` R package. If you do not have `arcgis` installed, install it with `pak::pak("r-arcgis/arcgis")`.

```{r}
library(arcgis)
```

Let's store the Feature Layer url in an object called `furl` (as in feature layer url).

```{r}
furl <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Major_Cities_/FeatureServer/0"
```

We then pass this variable to `arc_open()` and save it to `flayer` (feature layer).

```{r}
flayer <- arc_open(furl)
flayer
```

`arc_open()` will create a `FeatureLayer` object. Under this hood this is really just a list with all of the feature layer's metadata.

:::{.callout-note collapse="true" title="FeatureLayer details for the curious"}
The `FeatureLayer` object is obtained by adding `?f=json` to the feature layer url and processing the json. All of the metadata in there is stored in the `FeatureLayer` object. You can see this by running `unclass(flayer)`. Be warned! It gets messy.
:::

With this `FeatureLayer` object, we can read data from the service into R using it!

## Reading from a Feature Layer

Once we have a `FeatureLayer` object we can read its data into memory using the `arc_select()` function. By default, if we use `arc_select()` on a `FeatureLayer` without any additional arguments, the entire service will be brought into memory.

:::{.callout-warning}
Be careful to not try and read in more data than you need! Reading an entire feature services is fine for datasets in the realm of 0 - 5,000 features. But when we have more than 10,000 features performance and memory may be throttled.

Exceptionally detailed geometries require more data to be transferred across the web and may be slower to process.
:::


```{r, message = FALSE}
cities <- arc_select(flayer)
cities
```

We store the results of `arc_select()` into the object `cities`. The result is an `sf` object that we can now work with using **`sf`** and any other R package we'd like.

### Specifying output fields

In some cases we may have Feature Layers with many fields that we might not want. We can specify which fields we want to return to R by using the `fields` argument.

:::{.callout-tip}
It's always good to only read in the data that you need. Adding unneeded fields uses more memory and takes longer to process.
:::

`fields` takes a character vector of field names. To see which fields are available in a Feature Layer you can use the utility function `list_fields()`.

```{r}
fields <- list_fields(flayer)
fields[, 1:4]
```
:::{.aside}
For the sake of readability, only the first 4 columns are displayed.
:::

Let's try reading in only the `"STATE_ABBR"`, `"POPULATION"`, and `"NAME"` fields.

```{r}
arc_select(
flayer,
fields = c("STATE_ABBR", "POPULATION", "NAME")
)
```

### Using SQL where clauses

Not only can we limit the number of columns that we return from a Feature Layer, but we can also limit the number of rows that we have returned to us. This is very handy in the case of very, very, massive Feature Layers with hundreds of thousands of features. Reading all of those features into memory would be slow, costly (in terms of memory), and unnecessary!

The `where` argument of `arc_select()` permits us to provide a very simple SQL where clause to limit what we get back. Let's explore the use of the `where` argument.

Let's modify our above `arc_select()` statement to return only the features in California. We do this by using the where clause `STATE_ABBR = 'CA'`

```{r}
arc_select(
flayer,
where = "STATE_ABBR = 'CA'",
fields = c("STATE_ABBR", "POPULATION", "NAME")
)
```

We can also consider finding only the places in the US with more than 1,000,000 people as well.

```{r}
arc_select(
flayer,
where = "POPULATION > 1000000",
fields = c("STATE_ABBR", "POPULATION", "NAME")
)
```

Now let's try combining both where clauses using `and` to find only the cities in California with a population greater than 1,000,000.

```{r}
arc_select(
flayer,
where = "POPULATION > 1000000 and STATE_ABBR = 'CA'",
fields = c("STATE_ABBR", "POPULATION", "NAME")
)
```

## Using `dplyr`

If writing the field names out by hand and coming up with SQL where clauses isn't your thing, that's okay. We also provide `dplyr::select()` and `dplyr::filter()` methods for `FeatureLayer` objects.

The dplyr functionality is modeled off of [`dbplyr`](https://dbplyr.tidyverse.org/). The general concept is that we have a connection object that specifies what we will be querying against. Then we build up our queries using dplyr functions. Unlike using dplyr on `data.frame`s, the results aren't fetched eagerly. Instead they are _lazy_. With `dbplyr` we use the `collect()` function to execute a query and bring it into memory. The same is true with `FeatureLayer` objects.

Let's build up a query and see it in action! We need to load dplyr to bring the functions into scope.

```{r, message = FALSE}
library(dplyr)
fl_query <- flayer |>
select(STATE_ABBR, POPULATION, NAME)
fl_query
```

After doing this, we can see that our `FeatureLayer` object now prints out a `Query` field with the `outFields` parameter set to the result of our `select()` function.

:::{.callout-note collapse="true" title="A note for advanced useRs"}
We build up and store the query in the `query` attribute of a `FeatureLayer` object. It is a named list that will be passed directly to the API endpoint. The names match endpoint parameters.

```{r}
attr(fl_query, "query")
```

You can also manually specify parameters using the `update_params()` function. Note that there is _no_ parameter validation.

```{r}
update_params(fl_query, key = "value")
```

:::

We can continue to build up our query using `filter()`

:::{.callout-tip}
Only very basic filter statements are supported such as `==`, `<`, `>`, etc.
:::

```{r, message = FALSE}
fl_query |>
filter(POPULATION > 1000000, STATE_ABBR = "CA")
```

The query is stored in the `FeatureLayer` object and will not be executed until we request it with `collect()`.

```{r}
fl_query |>
filter(POPULATION > 1000000, STATE_ABBR == "CA") |>
collect()
```

0 comments on commit 9431354

Please sign in to comment.