sbtools data discovery #261

Merged
merged 15 commits into from
Jul 5, 2017
2 changes: 1 addition & 1 deletion content/usgs-packages/geoknife_Intro.Rmd
@@ -34,7 +34,7 @@ set.seed(1)

This lesson will explore how to find and download large gridded datasets via the R package `geoknife`. The package was created to allow easy access to data stored in the [Geo Data Portal (GDP)](https://cida.usgs.gov/gdp/), or any gridded dataset available through the [OPeNDAP](https://www.opendap.org/) protocol DAP2. `geoknife` refers to the gridded dataset as the `fabric`, the spatial feature of interest as the `stencil`, and the subset algorithm parameters as the `knife` (see below).

![geoknife terminology figure](../static/img/geoknife_summary.png "figure illustrating definitions of fabric, stencil, and knife")
![geoknife terminology figure](../static/img/geoknife_summary.png#inline-img "figure illustrating definitions of fabric, stencil, and knife")

## Lesson Objectives

5 changes: 2 additions & 3 deletions content/usgs-packages/geoknife_Intro.md
@@ -3,20 +3,19 @@ author: Lindsay R. Carr
date: 9999-10-01
slug: geoknife-intro
title: geoknife - Introduction
draft: true
image: img/main/intro-icons-300px/r-logo.png
identifier:
menu:
main:
parent: Introduction to USGS R Packages
weight: 2
draft: true
---
Lesson Summary
--------------

This lesson will explore how to find and download large gridded datasets via the R package `geoknife`. The package was created to allow easy access to data stored in the [Geo Data Portal (GDP)](https://cida.usgs.gov/gdp/), or any gridded dataset available through the [OPeNDAP](https://www.opendap.org/) protocol DAP2. `geoknife` refers to the gridded dataset as the `fabric`, the spatial feature of interest as the `stencil`, and the subset algorithm parameters as the `knife` (see below).

![geoknife terminology figure](../static/img/geoknife_summary.png "figure illustrating definitions of fabric, stencil, and knife")
![geoknife terminology figure](../static/img/geoknife_summary.png#inline-img "figure illustrating definitions of fabric, stencil, and knife")

Lesson Objectives
-----------------
224 changes: 224 additions & 0 deletions content/usgs-packages/sbtools_Discovery.Rmd
@@ -0,0 +1,224 @@
---
title: "sbtools - Data discovery"
date: "9999-07-01"
author: "Lindsay R. Carr"
slug: "sbtools-discovery"
image: "img/main/intro-icons-300px/r-logo.png"
output: USGSmarkdowntemplates::hugoTraining
parent: Introduction to USGS R Packages
weight: 2
draft: true
---

```{r setup, include=FALSE, warning=FALSE, message=FALSE}
library(knitr)

knit_hooks$set(plot=function(x, options) {
sprintf("<img src='../%s%s-%d.%s'/ title='%s'/>",
options$fig.path, options$label, options$fig.cur, options$fig.ext, options$fig.cap)

})

opts_chunk$set(
echo=TRUE,
fig.path="static/sbtools-discovery/",
fig.width = 6,
fig.height = 6,
fig.cap = "TODO"
)

set.seed(1)
```

Although ScienceBase is a great platform for uploading and storing your data, you can also use it to find other available data. You can do that manually through the ScienceBase web interface or programmatically with `sbtools` functions.

## Discovering data via web interface

The most familiar way to search for data is to use the ScienceBase search capabilities available online. You can search for any publicly available data in the [ScienceBase catalog](https://www.sciencebase.gov/catalog/). Search by category (map, data, project, publication, etc.), topic-based tags, or location; or search by your own keywords.

![ScienceBase Catalog Homepage](../static/img/sb_catalog_search.png#inline-img "search ScienceBase catalog")

Learn more about the [catalog search features](https://www.sciencebase.gov/about/content/explore-sciencebase#2.%20Search%20ScienceBase) and explore the [advanced searching capabilities](https://www.sciencebase.gov/about/content/sciencebase-advanced-search) on the ScienceBase help pages.

## Discovering data via sbtools

The ScienceBase search tools can be very powerful, but a search made through the web interface is hard to reproduce. If you want to incorporate dataset queries into a reproducible workflow, you can script them using the `sbtools` query functions. The terminology differs slightly from the web interface. Below are the functions available to query the catalog:

1. `query_sb` (generic SB query)
Member

this list is very helpful - nice to see all the options in one place.

i wonder if we can say any more about query_sb() in particular here...i think it's more flexible than the other 5, right? can it do everything that the other 5 can, and more? when would a user want to use this one instead of one of the others? do we know any of the things this one can do that the others can't? could it make sense to put this at the end of the list so that you can explain this one as a generalization of the others in the text that follows?

2. `query_sb_text` (matches title or description)
3. `query_sb_doi` (use a DOI identifier)
4. `query_sb_spatial` (data within or at a specific location)
5. `query_sb_date` (items within time range)
6. `query_sb_datatype` (type of data, not necessarily file type)

These functions take a variety of inputs, and all return an R list of `sbitems` (a special `sbtools` class). All of these functions default to 20 returned search results, but you can change that by specifying the argument `limit`. Before we practice using these functions, make sure you load the `sbtools` package in your current R session.

```{r}
library(sbtools)
```

### Using `query_sb`

`query_sb` is the "catch-all" function for querying ScienceBase from R. It only takes one argument for specifying query parameters, `query_list`. This is an R list with specific query parameters as the list names and the user query string as the list values. See the `DESCRIPTION` section of the help file for all options (`?query_sb`).
Member

How about **Description** instead of `DESCRIPTION`, so nobody confuses this help file section with the package DESCRIPTION file?


```{r}

##### THESE FIRST TWO SEARCHES STILL NEED WORK #####
Member

what additional work did you want to do on these?

Contributor Author

I think I was trying out a few other query args, like sort or something and wasn't getting what I expected. I just tried adding sort to the query list and it came out just fine. Consider "still need work" to no longer be true.


# search by keyword
precip_query <- list(q = 'precipitation')
precip_data <- query_sb(query_list = precip_query)
Member

i personally prefer to avoid defining variables that only get used once and are already informatively labeled when they're used. therefore, if doing this myself, I'd convert lines 69-70 to

precip_data <- query_sb(query_list = list(q = 'precipitation'))

but i'd like to hear your case for doing it via an intermediate variable, as it currently is, and i'm fine with leaving it this way if you actively prefer it this way.

Contributor Author

I don't think I really had a reason...I think I just did it. Thanks for pointing it out, I agree - no need to clutter the environment for something that is used once.

length(precip_data) # 50 entries, so there is likely more than 50 results
Member

couple of typos in the comment. instead:

length(precip_data) # 20 entries, so there are likely more than 20 results

head(precip_data, 2)

# search by keyword + category
precip_maps_query <- list(q = 'precipitation', browseType = "Static Map Image", sort='title')
Member

i'm confused by the documentation in ?query_sb (i know you didn't write it, but this could be a chance for us to update sbtools docs and/or clarify in this training file). have you figured out what this means?

  • browseCategory One of .... Used as a filter
  • browseType One of .... Used as a filter

And how did you figure out that "Static Map Image" was an option? Is there a place to go find out what the options are?

Contributor Author

Yeah, I was not sure what that meant but sb_datatypes() gave me the list for browseType. For browseCategory, I went to SB, and it didn't look like there were any more categories (https://www.sciencebase.gov/catalog/).

precip_maps_data <- query_sb(query_list = precip_query)
Member

here, precip_query is probably not the argument you intended (b/c precip_maps_query is what you just defined). possibly a minor case in point for not defining intermediate variables?

head(precip_maps_data, 2)
Member

these sbitem lists are hard to look at, as you've acknowledged by only printing 2. that's fine, but what about also showing the following sapply call to show how you can simplify the output?

sapply(precip_maps_data, function(item) item$title)
#  [1] "Change in Precipitation (Projected and Observed) and Change in Standard Precipitation For Emissions Scenarios A2, A1B and B1 for the Gulf of Mexico"
#  [2] "Precipitation as Snow (PAS)"                                                                                                                        
#  [3] "Precipitation"                                                                                                                                      
#  [4] "Mean Annual Precipitation (MAP)"                                                                                                                    
#  [5] "Mean Summer (May to Sep) Precipitation (MSP)"                                                                                                       
#  [6] "Summer (Jun to Aug) Precipitation (PPTSM)"                                                                                                          
#  [7] "Isoscapes of δ18O and δ2H reveal climatic forcings on Alaska and Yukon precipitation"                                                               
#  [8] "Precipitation mm/year projections for years 2010-2080 RCP 8.5"                                                                                      
#  [9] "Isoscapes of δ18O and δ2H reveal climatic forcings on Alaska and Yukon precipitation"                                                               
# [10] "Precipitation mm/year projections for years 2010-2080 RCP 4.5"                                                                                      
# [11] "Isoscapes of δ18O and δ2H reveal climatic forcings on Alaska and Yukon precipitation"                                                               
# [12] "Isoscapes of δ18O and δ2H reveal climatic forcings on Alaska and Yukon precipitation"                                                               
# [13] "Winter (Dec to Feb) Precipitation (PPTWT)"                                                                                                          
# [14] "Isoscapes of δ18O and δ2H reveal climatic forcings on Alaska and Yukon precipitation"                                                               
# [15] "Average, Standard and Projected Precipitation for Emissions Scenarios A2, A1B, and B1 for the Gulf of Mexico"                                       
# [16] "30 Year Mean Annual Precipitation 1960- 1990 PRISM"                                                                                                 
# [17] "Precipitation variability and primary productivity in water-limited ecosystems: how plants 'leverage' precipitation to 'finance' growth"            
# [18] "Climate change and precipitation - Consequences of more extreme precipitation regimes for terrestrial ecosystems"                                   
# [19] "A Numerical Study of the 1996 Saguenay Flood Cyclone: Effect of Assimilation of Precipitation Data on Quantitative Precipitation Forecasts"         
# [20] "A precipitation-runoff model for part of the Ninemile Creek Watershed near Camillus, Onondaga County, New York" 


##### -------------------------------------- #####

# search by 2 keywords
hazard2_query <- list(q = 'flood', q = 'earthquake')
Member

cool, i didn't know you could have 2+ q elements. does this do an AND or an OR for the two words? could you note which in the comment 2 lines above?

Member

I did some exploring. Here's a demo:

> length(query_sb(query_list=list(q='', lq='flood OR earthquake'), limit=100))
[1] 100
> length(query_sb(query_list=list(q='', lq='flood AND earthquake'), limit=100))
[1] 62
> length(query_sb(query_list=list(q='flood AND earthquake'), limit=100))
[1] 62

therefore:

  1. 2 q's, or a q with space-separated words, does an AND query
  2. Lucene queries are available, though not correctly documented in ?query_sb - it's lq, not q
  3. To actually use a Lucene query, it seems that you need to also give a regular query, hence my use of an empty q above. Without that q, no results are returned:
> length(query_sb(query_list=list(lq='flood AND earthquake'), limit=100))
[1] 0
> length(query_sb(query_list=list(lq='flood earthquake'), limit=100))
[1] 0

For this training doc, it would be nice to give a working example of a Lucene query and to report that a regular q is an AND query. It looks to me like there's also an issue to report on the sbtools page (documenting the lq option and, if possible, making it work even if there isn't a q in the query_list)

Contributor Author

Since the behavior of q when given multiple keywords seems unexpected, I'm just going to replace those examples with Lucene query examples instead.

Member

sounds good to me!

hazard2_data <- query_sb(query_list = hazard2_query)
length(hazard2_data)
head(hazard2_data, 2)

# search by 3 keywords
hazard3_query <- list(q = 'flood', q = 'earthquake', q='drought')
Member

does the 3-keyword example show more than the 2? what about turning this example into a Lucene query or a query with the two words space-separated in a single q?

hazard3_data <- query_sb(query_list = hazard3_query)
length(hazard3_data)
head(hazard3_data, 2)
```
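
The review thread on this chunk found that repeated `q` entries behave as an AND query, and that Lucene-style queries go through an `lq` entry paired with an empty `q`. A minimal sketch of that pattern follows. `lucene_query_list` is a hypothetical helper written for this illustration, not an `sbtools` function, and the live queries only run in an interactive session with `sbtools` installed and network access.

```r
# Build a query_list for query_sb() that combines keywords in a Lucene query.
# NOTE: lucene_query_list() is a hypothetical helper, not part of sbtools.
lucene_query_list <- function(terms, operator = c("AND", "OR")) {
  operator <- match.arg(operator)
  lucene_string <- paste(terms, collapse = paste0(" ", operator, " "))
  # an empty `q` accompanies `lq`; the review thread saw no results without it
  list(q = "", lq = lucene_string)
}

either_query <- lucene_query_list(c("flood", "earthquake"), operator = "OR")
both_query <- lucene_query_list(c("flood", "earthquake"), operator = "AND")
str(either_query)

# The live queries need sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  hazard_either <- sbtools::query_sb(query_list = either_query, limit = 100)
  hazard_both <- sbtools::query_sb(query_list = both_query, limit = 100)
  print(length(hazard_either))
  print(length(hazard_both))

  # condense the sbitem list to titles for easier reading
  print(sapply(hazard_both, function(item) item$title))
}
```

Keeping the query list in a small builder makes the AND/OR choice explicit, which the bare `q = 'flood', q = 'earthquake'` form does not.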

Member

i was intrigued to discover today, in the SB documentation you linked to (https://www.sciencebase.gov/about/content/sciencebase-advanced-search), that you can also exclude things that meet any of the query criteria:

Exclude Elements in Search Results

To exclude specific elements, use ! (exclamation point)

For example, to find the term water, but eliminate records in Data Category, the URL query is modified: https://www.sciencebase.gov/catalog/items?q=water&filter=browseCategory!%3DData
The ! is inserted after BrowseCategory to eliminate the Data Category from search results. The exclamation point may be added in other search results, to exclude other elements (such as browseType, tags, communities).

could this be worth including? seems like these searches do pull down a lot more than one usually wants, so filtering things out could be helpful.

if you moved the generic query_sb toward the end of this document, after the other query_sb_... functions, there would be room in that section for a tangent like this.

Contributor Author

Struggling with what that actually looks like via sbtools:

length(query_sb(query_list = list(q = '', lq = 'flood AND earthquake', browserType = "!Application"), limit=200))

Member

hmm, me neither. I tried both 'browseCategory!'='Image' and 'browseCategory'='!Image', both of which put the ! in the right place, but then ! gets converted to %21 in httr::handle_url within query_sb, so it's all for naught. I made an sbtools issue to document the wished-for feature: DOI-USGS/sbtools#238. Maybe for now we should just not introduce this topic in these lessons.

Contributor Author

Thought it might need to be escaped, but that didn't seem to make any difference. Sounds good, I will leave it alone for now!

### Using `query_sb_text`

`query_sb_text` returns a list of `sbitems` that match the title or description fields. Use it to search authors, station names, rivers, states, etc.

```{r}
# search using a contributor's name
contrib_results <- query_sb_text("Robert Hirsch")
head(contrib_results, 2)

# search using place of interest
park_results <- query_sb_text("Yellowstone")
head(park_results, 2)

# search using a site location
loc_results <- query_sb_text("Embudo")
Member

i like these examples but am not sure how Yellowstone and Embudo really differ. They're both places, and they both get handled in exactly the same way by SB, right?

Contributor Author

Hmmm that's very true. All of these get handled in Science Base the same way I think, right? Since it's just searching text. Could do river_results <- query_sb_text("Rio Grande") to show another slightly different example.

Member

sure, either Embudo or Rio Grande is fine if you're just looking for several examples for reinforcement. Rio Grande does seem a touch more different.

length(loc_results)
head(loc_results, 2)
```

### Using `query_sb_doi`

Use a Digital Object Identifier (DOI) to query ScienceBase. This should return only one list item, unless more than one ScienceBase item references the same identifier.

```{r}
# USGS Microplastics study
query_sb_doi('10.5066/F7ZC80ZP')

####### I've tried A TON of DOIs I found through the web interface and just
####### keep getting empty lists returned.
query_sb_doi('10.1016/j.coldregions.2007.05.009')
Member

this particular DOI seems to have a different scheme from what query_sb_doi seeks. maybe many of them do?

using the scheme embedded in query_sb_doi:

> query_item_identifier(scheme='https://www.sciencebase.gov/vocab/category/item/identifier', type='DOI', key="10.1016/j.coldregions.2007.05.009")
list()

using the scheme I see in the Identifiers section of https://www.sciencebase.gov/catalog/item/570bbfa7e4b0ef3b7ca0294a:

> query_item_identifier(scheme='http://sciencebase.gov/vocab/identifierScheme', type='DOI', key='10.1016/j.coldregions.2007.05.009')
[[1]]
<ScienceBase Item> 
  Title: Snow cover effects on acoustic sensors
  Creator/LastUpdatedBy:      / 
  Provenance (Created / Updated):   / 
  Children: 
  Item ID: 5771b85ae4b07657d1a6be8a
  Parent ID: 5771b40fe4b07657d1a6bb5e

[[2]]
<ScienceBase Item> 
  Title: Snow cover effects on acoustic sensors
  Creator/LastUpdatedBy:      / 
  Provenance (Created / Updated):   / 
  Children: 
  Item ID: 5762cd16e4b07657d19a82a3
  Parent ID: 5762cba1e4b07657d19a7248

[[etc. to length 6]]

therefore:

  1. i've made an sbtools issue for adding this other scheme to the options in query_sb_doi (expand query_sb_doi to include all probable Schemes for DOI DOI-USGS/sbtools#235). the ScienceBase/sbtools maintainers may have a more complete understanding of which schemes are possible.
  2. line 115 made perfect sense to me until this query returned 6 items. maybe not worth claiming that this function "should return only one list item", after all.
  3. in the meantime, to acquire a second working example, i did this:
> known_dois <- query_item_identifier(scheme='https://www.sciencebase.gov/vocab/category/item/identifier', type='DOI')
> item_get_fields(known_dois[[12]], fields='identifiers')[[2]]$key
[1] "doi:10.5066/F77W699S"

and therefore can recommend this example:

> query_sb_doi('10.5066/F77W699S')
[[1]]
<ScienceBase Item> 
  Title: Selected Environmental Characteristics of Sampled Sites, Watersheds, and Riparian Zones for the U.S. Geological Survey Midwest Stream Quality Assessment
  Creator/LastUpdatedBy:      / 
  Provenance (Created / Updated):   / 
  Children: 
  Item ID: 5714ec24e4b0ef3b7ca85d75
  Parent ID: 569972c5e4b0ec051295ece5

```
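
The review comments above show two practical wrinkles: ScienceBase stores some DOI keys with a `doi:` prefix, and some items register their DOI under a different identifier scheme than the one `query_sb_doi` looks for. The sketch below is one way to handle both, assuming the scheme URL quoted in the review thread is still valid. `normalize_doi` and `find_by_doi` are hypothetical helpers for illustration, not `sbtools` functions.

```r
# NOTE: normalize_doi() and find_by_doi() are hypothetical helpers,
# not part of sbtools.

# Strip a resolver URL or "doi:" prefix so only the bare DOI remains
normalize_doi <- function(doi) {
  doi <- trimws(doi)
  doi <- sub("^https?://(dx\\.)?doi\\.org/", "", doi, ignore.case = TRUE)
  sub("^doi:\\s*", "", doi, ignore.case = TRUE)
}

normalize_doi("doi:10.5066/F77W699S")

# Try query_sb_doi() first, then the alternate scheme seen during review
find_by_doi <- function(doi) {
  key <- normalize_doi(doi)
  results <- sbtools::query_sb_doi(key)
  if (length(results) == 0) {
    results <- sbtools::query_item_identifier(
      scheme = "http://sciencebase.gov/vocab/identifierScheme",
      type = "DOI",
      key = key
    )
  }
  results
}

# The live lookup needs sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  print(length(find_by_doi("10.1016/j.coldregions.2007.05.009")))
}
```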

### Using `query_sb_spatial`

`query_sb_spatial` accepts three ways of specifying a spatial area in which to look for data. You can specify a bounding box `bbox` as an `sp` spatial data object *[[[[NEED MORE TEXT HERE]]]]*. Alternatively, you can supply a vector of latitudes and a vector of longitudes using the `lat` and `long` arguments. The function will automatically use the minimum and maximum of those vectors to construct a bounding box. The last way to represent a spatial region to query ScienceBase is a POLYGON Well-known text (WKT) object passed as a text string. The format is `"POLYGON(([LONG1 LAT1], [LONG2 LAT2], [LONG3 LAT3]))"`, where `LONG#` and `LAT#` are longitude and latitude pairs as decimals. See the [Open Geospatial Consortium WKT standard](http://www.opengeospatial.org/standards/wkt-crs) for more information.
Member

up to you, but it might be easiest to split up this text description so that an example immediately follows each of the 3 methods. what other info do you want to include at *[[[[NEED MORE TEXT HERE]]]]*?

Contributor Author

the MORE TEXT tag was because I thought I needed to explain what an sp object is. Or at least link out to something that describes it

Member

i like the link-out idea


```{r}
### SPATIAL QUERY EXAMPLES NEED SOME TLC

appalachia <- data.frame(
lat = c(34.576900, 36.114974, 37.374456, 35.919619, 39.206481),
long = c(-84.771119, -83.393990, -81.256731, -81.492395, -78.417345))

conus <- data.frame(
lat = c(49.078148, 47.575022, 32.914614, 25.000481),
long = c(-124.722111, -67.996898, -118.270335, -80.125804))

# verifying where points are supposed to be
maps::map('usa')
points(conus$long, conus$lat, col="red", pch=20)
points(appalachia$long, appalachia$lat, col="green", pch=20)

# query by bounding box
bbox_sp_obj <- sp::SpatialPoints(appalachia)
# query_sb_spatial(bbox=bbox_sp_obj)
Member

well, i can get this one to run without error by specifying a proj4string (i just made up a projection, probably isn't right):

> query_sb_spatial(bbox=sp::SpatialPoints(conus, proj4string = sp::CRS("+proj=longlat +datum=NAD27")))
list()
> query_sb_spatial(bbox=sp::SpatialPoints(appalachia, proj4string = sp::CRS("+proj=longlat +datum=NAD27")))
list()

but they're both returning empty lists, which doesn't seem right. are you getting empty lists for all of the examples in this section? i am.

Member

I ran a query by interactively selecting a bounding box under "Browse by Location" at https://www.sciencebase.gov/catalog/. The resulting URL was

https://www.sciencebase.gov/catalog/items?filter=spatialQuery%3D%7Btype%3A%22envelope%22%2Ccoordinates%3A%5B%5B-91.06475439079985%2C39.171001192353756%5D%2C+%5B-81.22100439080239%2C33.21851920313964%5D%5D%7D

which shares the spatialQuery filter with query_sb_spatial, but otherwise looks different from the WKT format that's ultimately used by all calls to query_sb_spatial (https://github.com/USGS-R/sbtools/blob/master/R/query_sb_spatial.R#L59). do you suppose SB might have changed formats? i don't see reference to WKT in their spatial docs.

Contributor Author

How would you figure out the projection we would need? Can we just pick the proj4string?

And yes, I get an empty list for all of these queries.

Hmmm it does look quite different. I guess it might have changed? Should we make an sbtools issue about spatial queries?

Member

i know pitifully little about projections. maybe ask laura or a jordan?

yeah, if you're getting empty lists too, i guess another sbtools issue is in order.


# query by latitude and longitude vectors
query_sb_spatial(long = appalachia$long, lat = appalachia$lat)
query_sb_spatial(long = conus$long, lat = conus$lat)

# query by WKT polygon
wkt_coord_str <- paste(conus$long, conus$lat, sep=" ", collapse = ",")
wkt_str <- sprintf("POLYGON((%s))", wkt_coord_str)
# query_sb_spatial(bb_wkt = wkt_str)
Member

this runs for me, just returns an unrewarding list(). is that what you're getting, too?

```
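
One detail worth checking in the WKT example above: a WKT polygon ring must be closed, meaning the first coordinate pair is repeated as the last, and the string built from the raw `conus` points is not. The sketch below builds a closed bounding-box polygon from longitude and latitude vectors. `bbox_to_wkt` is a hypothetical helper, not an `sbtools` function, and since the review thread reports empty lists from every spatial query, a well-formed string is no guarantee that ScienceBase returns results.

```r
# NOTE: bbox_to_wkt() is a hypothetical helper, not part of sbtools.
# It builds a closed WKT polygon from the extent of the supplied points.
bbox_to_wkt <- function(long, lat) {
  xmin <- min(long)
  xmax <- max(long)
  ymin <- min(lat)
  ymax <- max(lat)
  # corners in counter-clockwise order; the first is repeated to close the ring
  corners <- rbind(
    c(xmin, ymin),
    c(xmax, ymin),
    c(xmax, ymax),
    c(xmin, ymax),
    c(xmin, ymin)
  )
  coord_str <- paste(corners[, 1], corners[, 2], sep = " ", collapse = ",")
  sprintf("POLYGON((%s))", coord_str)
}

appalachia <- data.frame(
  lat = c(34.576900, 36.114974, 37.374456, 35.919619, 39.206481),
  long = c(-84.771119, -83.393990, -81.256731, -81.492395, -78.417345))

wkt_str <- bbox_to_wkt(appalachia$long, appalachia$lat)
wkt_str

# The live query needs sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  print(length(sbtools::query_sb_spatial(bb_wkt = wkt_str)))
}
```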

### Using `query_sb_date`

`query_sb_date` returns ScienceBase items that fall within a certain time range. Items carry several timestamps, so you need to specify which one the range applies to. By default, the function looks for items last updated between 1970-01-01 and today's date. See `?query_sb_date` for the timestamp types that are available.

```{r}
# find data worked on in the last week
today <- Sys.time()
oneweekago <- today - (7*24*3600) # days * hrs/day * secs/hr
Member

any interest in replacing this with

oneweekago <- today - as.difftime(7, units='days')

?

recent_data <- query_sb_date(start = today, end = oneweekago)
Member

i started by wondering whether it was OK for start to come after end, then happened on this curious behavior:

> recent_data <- query_sb_date(start = today, end = oneweekago, limit=10000)
> length(recent_data)
[1] 114
> recent_data <- query_sb_date(end = today, start = oneweekago, limit=10000)
> length(recent_data)
[1] 122

...i don't know what to make of this. datetimes do get converted to dates (https://github.com/USGS-R/sbtools/blob/master/R/query_sb_date.R#L36), maybe one of start/end is inclusive while the other is exclusive?

i guess i'd recommend demonstrating with dates rather than datetimes, given that query_sb_date simplifies to dates no matter what. the inclusive/exclusive thing is curious but probably not worth teaching here.

Contributor Author

Tried doing this with Dates instead of datetimes, but still seeing this curious behavior...

length(query_sb_date(start = today, end = oneweekago, limit=10000))
[1] 10000
> length(query_sb_date(end = today, start = oneweekago, limit=10000))
[1] 407
> length(query_sb_date(start = as.Date(today), end = as.Date(oneweekago), limit=10000))
[1] 10000
> length(query_sb_date(end = as.Date(today), start = as.Date(oneweekago), limit=10000))
[1] 407

Member

wow, that's a much bigger difference than I saw! i guess it's possible that >=9593 items were created exactly on as.Date(oneweekago)...or that when end < start, strange things happen. couldn't say which without more testing, which i'm not sure is worth our time.

that said, i didn't actually expect the Date approach to change the behavior - just to help clarify that Dates are the best precision you can hope for

Contributor Author

Yeah, that seems like something we don't need to defend against here.

Ah I see your point in doing that. I'll update it.

head(recent_data, 2)

# find data that's been created over the last year
oneyearago <- today - (365*24*3600) # days * hrs/day * secs/hr
recent_data <- query_sb_date(start = today, end = oneyearago, date_type = "dateCreated")
head(recent_data, 2)
```
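
Following the review discussion, the sketch below works with `Date` objects rather than datetimes (the thread notes that `query_sb_date` reduces its inputs to dates anyway) and keeps `start` earlier than `end`, since the reversed order gave puzzling counts. `date_window` is a hypothetical helper for illustration, not an `sbtools` function.

```r
# NOTE: date_window() is a hypothetical helper, not part of sbtools.
# It returns a start/end pair of Dates with start earlier than end.
date_window <- function(days_back, end = Sys.Date()) {
  stopifnot(inherits(end, "Date"), days_back >= 0)
  # subtracting a number from a Date moves it back by that many days
  list(start = end - days_back, end = end)
}

last_week <- date_window(7)
last_year <- date_window(365)
last_week

# The live queries need sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  # items last updated in the past week (the default timestamp)
  recent_data <- sbtools::query_sb_date(
    start = last_week$start, end = last_week$end)
  print(length(recent_data))

  # items created in the past year
  created_data <- sbtools::query_sb_date(
    start = last_year$start, end = last_year$end, date_type = "dateCreated")
  print(length(created_data))
}
```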

### Using `query_sb_datatype`

`query_sb_datatype` is used to search ScienceBase by the type of data an item is listed as. Run `sb_datatypes()` to get a list of 50 available data types.
Member

ooh, good tip! and looks like this is also the list of possible browseTypes? i see that query_sb_datatype creates a browseType filter, and element 46 ("Static Map Image") is what you used for browseType above in line 81...if you do move query_sb() down to below all these more specific query_sb_...()s, you could just refer back to this function when you do the query_sb(browseType...) example


```{r}
# get ScienceBase news items
sbnews <- query_sb_datatype("News")
head(sbnews, 2)

# find shapefiles
shps <- query_sb_datatype("Shapefile")
head(shps, 2)

# find raster data
sbraster <- query_sb_datatype("Raster")
head(sbraster, 2)
```
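
With around 50 data types available, it can help to filter the list by a pattern before choosing a value for `query_sb_datatype`. The sketch below does that with `grep`. `find_datatypes` is a hypothetical helper, and the short `known_types` vector is a stand-in holding only the types named in this lesson. The assumption that the output of `sb_datatypes()` can be flattened to a character vector is not confirmed here.

```r
# NOTE: find_datatypes() is a hypothetical helper, not part of sbtools.
# It returns the data types whose names match a pattern, ignoring case.
find_datatypes <- function(pattern, datatypes) {
  grep(pattern, datatypes, ignore.case = TRUE, value = TRUE)
}

# stand-in for sb_datatypes(): only the types named in this lesson
known_types <- c("Static Map Image", "News", "Shapefile", "Raster")
find_datatypes("map", known_types)

# The live lookup needs sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  # assumption: sb_datatypes() output can be flattened to a character vector
  all_types <- unlist(sbtools::sb_datatypes())
  print(find_datatypes("map|image", all_types))
}
```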

## Best of both methods

Although you can query from R, sometimes it's useful to look at an item on the web interface. You can use the `query_sb_*` functions and then follow each returned item's URL to view it on the web. This is especially handy for viewing maps and metadata, or to check or repair a ScienceBase item if any of the `sbtools`-based commands have failed.

```{r}
sbmaps <- query_sb_datatype("Static Map Image", limit=3)
oneitem <- sbmaps[[1]]

# get and open URL from single sbitem
url_oneitem <- oneitem[['link']][['url']]
browseURL(url_oneitem)
Member

👍


# get and open URLs from many sbitems
lapply(sbmaps, function(sbitem) {
url <- sbitem[['link']][['url']]
browseURL(url)
return(url)
Member

could this return(url) line be left out since the side effect of browseURL is the desired functionality? (at least, i assume it is because you didn't assign the output of the lapply to anything)

Contributor Author

Was thinking that it would be nice to just print out the URLs that were opened. I don't know why, but I generally don't like functions that don't return anything.

Member

ok, fine with me.

})
```
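
If you only want the URLs, for example to paste into notes, you can collect them without opening a browser tab for each. The sketch below assumes every item stores its address at `item[['link']][['url']]`, as in the chunk above. `item_urls` is a hypothetical helper, and `mock_items` holds placeholder items with made-up URLs so the example runs offline.

```r
# NOTE: item_urls() is a hypothetical helper, not part of sbtools.
# It pulls the web URL out of each item in a list of sbitems.
item_urls <- function(items) {
  vapply(items, function(item) item[["link"]][["url"]], character(1))
}

# placeholder items that mimic the nested structure used above
mock_items <- list(
  list(title = "Placeholder item 1", link = list(url = "https://example.com/item/1")),
  list(title = "Placeholder item 2", link = list(url = "https://example.com/item/2"))
)
item_urls(mock_items)

# The live query needs sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  sbmaps <- sbtools::query_sb_datatype("Static Map Image", limit = 3)
  print(item_urls(sbmaps))
}
```

`vapply` with `character(1)` stops with an error if any item lacks a URL, which is usually preferable to silently returning a ragged list.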

## No results
Member

maybe this should go above Best of Both Methods, given that it refers to the first method only?


Some of your queries will probably return no results. When there are no results that match your query, the returned list will have a length of 0.

```{r}
# search for items related to a Water Quality Portal paper DOI
query_results <- query_sb_doi(doi = '10.1002/2016WR019993')
Member

lol. making good use of a dead end from your work on the query_sb_doi section, eh? there's no chance that this paper will ever make its way onto SB, is there?

Contributor Author

Not that I know of!

length(query_results)
head(query_results)
```
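
In a script, it is worth checking the length of the result before indexing into it, because `results[[1]]` on an empty list stops with an error. A minimal sketch of such a check follows. `describe_results` is a hypothetical helper for illustration, not an `sbtools` function.

```r
# NOTE: describe_results() is a hypothetical helper, not part of sbtools.
# It summarises a query result without assuming any items were returned.
describe_results <- function(results) {
  if (length(results) == 0) {
    return("No matching ScienceBase items")
  }
  sprintf("%d matching item(s)", length(results))
}

describe_results(list())

# The live query needs sbtools and a network connection
if (interactive() && requireNamespace("sbtools", quietly = TRUE)) {
  query_results <- sbtools::query_sb_doi(doi = '10.1002/2016WR019993')
  print(describe_results(query_results))
}
```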