---
output:
  md_document:
    variant: markdown_github
---
[![Build Status](https://travis-ci.org/bogdanrau/askchisne.svg?branch=master)](https://travis-ci.org/bogdanrau/askchisne)

askchisne
=========

An R package that connects to the [AskCHIS NE](http://askchisne.ucla.edu) API and retrieves data into R data frames.

NOTE: this package does not come with any warranties.

Installation
============

Install from CRAN: COMING SOON!

Install from GitHub:

```r
devtools::install_github("bogdanrau/askchisne")
```
Usage
=====
Before using this package, you must obtain an API key from the [California Health Interview Survey (CHIS)](http://chis.ucla.edu).
You will use the API key in all functions to connect to the API and get data.
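
Because every function takes the key, one option is to keep it out of your scripts and read it from an environment variable. The sketch below is only an illustration: the helper `get_api_key()` and the variable name `ASKCHISNE_API_KEY` are assumptions and are not part of the package.

```r
# Hypothetical helper (not part of askchisne): read the API key from an
# environment variable so it does not have to be hard-coded in scripts.
get_api_key <- function(var = "ASKCHISNE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Example: set the variable for this session only, then read it back.
# In practice the variable would usually be set outside of R.
Sys.setenv(ASKCHISNE_API_KEY = "<YOUR API KEY>")
apiKey <- get_api_key()
```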

Base Functions
--------------

The following functions can be used to query the AskCHIS NE API:

Function | Description
--------------------------|-----------------------------------------------------
`getMetadata()` | Obtains all of the metadata from AskCHIS NE.
`geoSearch()` | Returns available locations for a search term.
`getEstimate()` | Returns estimate and attributes for one or more locations in the database.
`poolEstimate()` | Pools estimates for two or more locations.
***
> The `getMetadata()` function obtains all of the metadata available in AskCHIS NE. It takes a single
argument, the API key:
```r
getMetadata(apiKey = '<YOUR API KEY>')
```
The resulting data frame will contain:
* `name`: the indicator name (required in the `getEstimate()` function).
* `label`: the indicator label.
* `ageGroup`: the age group for that specific indicator.
* `year`: the data year for the indicator.
* `responseLabel`: the response label (can be used if developing Shiny apps).
* `description`: a description of the indicator and link to additional resources if non-CHIS.
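
Once the metadata is in a data frame, ordinary subsetting is enough to find indicator names and years. The sketch below uses a small mock data frame in place of a real `getMetadata()` result: the column names follow the list above, but every value is invented, and the real column types (for example whether `year` is numeric) are an assumption.

```r
# Mock data frame standing in for the result of getMetadata().
# Column names follow the list above; all values are invented.
metadata <- data.frame(
  name = c("INDICATOR_A", "INDICATOR_A", "INDICATOR_B"),
  label = c("Indicator A", "Indicator A", "Indicator B"),
  ageGroup = c("Adults", "Adults", "Children"),
  year = c(2012, 2014, 2014),
  stringsAsFactors = FALSE
)

# Indicator names available for a given year, ready to pass to getEstimate().
indicators_2014 <- unique(metadata$name[metadata$year == 2014])

# Most recent year available for each indicator.
latest_year <- tapply(metadata$year, metadata$name, max)
```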
***
> The `geoSearch()` function searches the API for all available geographic locations matching the search string.
The function requires a search string and the API key:
```r
geoSearch(search = 'YOUR SEARCH TERM', apiKey = '<YOUR API KEY>')
```
The resulting data frame will contain:
* `geoId`: the geoId for each resulting location. This geoId will be required when searching for estimates.
* `name`: the location name.
* `geoType`: the type of location. This can be a zip code (ZCTA), a city (CITIES), a county (COUNTIES),
a legislative district (ASSEMBLY, CONGRESS, SENATE), or the state (STATE).
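
The `geoId` values returned here are what the estimate functions expect in their `locations` argument, as a comma-separated list. A minimal sketch of that step follows, using a mock data frame instead of a real `geoSearch()` result; the `geoId` and `name` values are invented and do not correspond to real locations.

```r
# Mock data frame standing in for the result of geoSearch().
# The geoId and name values are invented; real values come from the API.
results <- data.frame(
  geoId = c("ID001", "ID002", "ID003"),
  name = c("Example City One", "Example City Two", "Example County"),
  geoType = c("CITIES", "CITIES", "COUNTIES"),
  stringsAsFactors = FALSE
)

# Keep only the cities, then collapse their geoIds into a single
# comma-separated string for the `locations` argument.
keep <- results$geoType == "CITIES"
locations <- paste(results$geoId[keep], collapse = ",")
```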
***
> The `getEstimate()` function retrieves estimates and additional statistical attributes for one or more
requested locations:
```R
getEstimate(indicator = 'INDICATOR NAME', attributes = NULL, geoLevel = NULL, locations = NULL, year = NULL, apiKey = '<YOUR API KEY>')
```
Parameter | Description
------------------|--------------------
indicator | The name of the indicator, which can be obtained using the `getMetadata()` function.
attributes | The attributes to return. If not specified, returns all available attributes: estimate, population, SE, CI_LB95, CI_UB95, CV, MSE.
geoLevel | The level of geography for the query: zcta, cities, counties, assembly, congress, senate, state.
locations | A comma-separated list of geoIds, one for each location queried (use the `geoSearch()` function to obtain them).
year | The year for the data requested. Available years are listed in the `getMetadata()` output. If left `NULL`, the newest data available is returned.
apiKey | Your API key.

The resulting data frame will contain:
* `geoId`: the geoIds for all locations returned.
* `geoName`: the name of all of the locations returned.
* `geoTypeId`: the type of location returned (zip code, city, etc.).
* `isSuppressed`: a TRUE/FALSE describing whether the estimate is suppressed.
* `suppressionReason`: description of reason for suppression.
* `population`: the population universe for that specific indicator.
* `estimate`: the estimate for that indicator.
* `SE`: the Standard Error (SE) for that indicator.
* `CI_LB95`: the lower bound of the 95% Confidence Interval.
* `CI_UB95`: the upper bound of the 95% Confidence Interval.
* `CV`: the Coefficient of Variation.
* `MSE`: the Mean Square Error.
* `year`: the specific year for which the data was either collected or created.
* `unit`: the unit of measurement for the estimate.
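
Since suppressed estimates are flagged in `isSuppressed`, it is usually worth dropping them before further analysis. The sketch below shows one way to do that on a mock data frame with a subset of the columns listed above; all values are invented, and the assumption that suppressed rows carry `NA` in the numeric columns is only for illustration.

```r
# Mock data frame standing in for the result of getEstimate().
# Only a subset of the documented columns is shown; all values are invented.
estimates <- data.frame(
  geoId = c("ID001", "ID002", "ID003"),
  geoName = c("Example City One", "Example City Two", "Example City Three"),
  isSuppressed = c(FALSE, TRUE, FALSE),
  estimate = c(0.12, NA, 0.20),
  CI_LB95 = c(0.08, NA, 0.15),
  CI_UB95 = c(0.16, NA, 0.25),
  stringsAsFactors = FALSE
)

# Drop suppressed rows, then compute the width of the 95% confidence interval.
usable <- estimates[!estimates$isSuppressed, ]
usable$ciWidth <- usable$CI_UB95 - usable$CI_LB95
```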
***
> The `poolEstimate()` function combines multiple locations and returns the pooled estimate for those locations.
```r
poolEstimate(indicator = 'INDICATOR NAME', attributes = NULL, locations = 'LIST OF LOCATION geoIds', year = NULL, apiKey = '<YOUR API KEY>')
```
The resulting data frame will contain the same columns as the response from `getEstimate()`, with the difference that `poolEstimate()` returns pooled locations, not individual locations.