README.Rmd

---
output:
  md_document:
    variant: markdown_github
---
[![Build Status](https://travis-ci.org/bogdanrau/askchisne.svg?branch=master)](https://travis-ci.org/bogdanrau/askchisne)

askchisne
=========

An R package that connects to the [AskCHIS NE](http://askchisne.ucla.edu) API and retrieves data into R data frames.
NOTE: this package does not come with any warranties.

Installation
============
Install from CRAN: COMING SOON!

Install from GitHub:
```r
devtools::install_github("bogdanrau/askchisne")
```

Usage
=====
Before using this package, you must obtain an API key from the [California Health Interview Survey (CHIS)](http://chis.ucla.edu).
You will use the API key in all functions to connect to the API and get data.

Base Functions
--------------
The following functions can be used to query the AskCHIS NE API:

Function                  | Description
--------------------------|-----------------------------------------------------
`getMetadata()`           | Obtains all of the metadata from AskCHIS NE.
`geoSearch()`             | Returns available locations for a search term.
`getEstimate()`           | Returns estimate and attributes for one or more locations in the database.
`poolEstimate()`          | Pools estimates for two or more locations.

***

> `getMetadata()` function obtains all of the metadata available in AskCHIS NE. This function has only
one simple call:
```r
getMetadata(apiKey = '<YOUR API KEY>')
```

The resulting data frame will contain:

* `name`: the indicator name (required in the `getEstimate()` function).

* `label`: the indicator label.

* `ageGroup`: the age group for that specific indicator.

* `year`: the data year for the indicator.

* `responseLabel`: the response label (can be used if developing Shiny apps).

* `description`: a description of the indicator and link to additional resources if non-CHIS.

***

> `geoSearch()` function searches the API for all available geographic locations matching the search string.
The function requires a search string and the API key:
```r
geoSearch(search = 'YOUR SEARCH TERM', apiKey = '<YOUR API KEY>')
```

The resulting data frame will contain:

* `geoId`: the geoId for each resulting location. This geoId will be required when searching for estimates.

* `name`: the location name.

* `geoType`: the type of location. This can be a zip code (ZCTA), a city (CITIES), a county (COUNTIES,
a legislative district (ASSEMBLY, CONGRESS, SENATE), or the state (STATE).

***

> `getEstimate()` function retrieves estimates as well as additional statistical attributes for one or more
requested locations:
```R
getEstimate(indicator = 'INDICATOR NAME', attributes = NULL, geoLevel = NULL, locations = NULL, year = NULL, apiKey = '<YOUR API KEY>')
```

Parameter         | Description
------------------|--------------------
indicator         | The name of the indicator, which can be obtained using the getMetadata function.
attributes        | If not specified, returns all available attributes: estimate, population, SE, CI_LB95, CI_UB95, CV, MSE.
geoLevel          | The level of geography for the query: zcta, cities, counties, assembly, congress, senate, state.
locations         | A comma separated list of geoIds for each location queried (can use the geoSearch() function to obtain those).
year              | The year for the data requested. Years available are accessible through the getMetadata() function. If left null, will return the newest data available.

The resulting data frame will contain:

* `geoId`: the geoIds for all locations returned.

* `geoName`: the name of all of the locations returned.

* `geoTypeId`: the type of location returned (zip code, city, etc.).

* `isSuppressed`: a TRUE/FALSE describing whether the estimate is suppressed.

* `suppressionReason`: description of reason for suppression.

* `population`: the population universe for that specific indicator.

* `estimate`: the estimate for that indicator.

* `SE`: the Standard Error (SE) for that indicator.

* `CI_LB95`: the lower bound of the 95% Confidence Interval.

* `CI_UB95`: the upper bound of the 95% Confidence Interval.

* `CV`: the Coefficient of Variation.

* `MSE`: the Mean Square Error.

* `year`: the specific year for which the data was either collected or created.

* `unit`: the unit of measurement for the estimate.

***

> `poolEstimate()` function combines multiple locations and returns the pooled estimate for those locations.
```r
poolEstimate(indicator = 'INDICATOR NAME', attributes = NULL, locations = 'LIST OF LOCATION geoIds', year = NULL, apyKey = '<YOUR API KEY>')
```

The resulting data frame will contain the same columns as the response from `getEstimate()`. with the difference that `poolEstimate()` returns pooled locations, not individual locations.