census-pandas

For now, idle exploration around making it easier to use the pandas library to analyze Census data.

Background

A while ago, Hunter Owens asked if we knew about anyone using the pandas data analysis package with the Census Reporter API. I whipped up some example code in a gist and went on with things.

Recently I started fooling around with it a little more and decided to put it on GitHub in case anyone else was interested. For a brief moment I considered trying to port Ezra Glenn's acs.R package, but I quickly realized that the package is an enormous accomplishment, and honestly, I don't do enough data analysis on a routine basis to be motivated to replicate it.

For now, it uses the Census Reporter API for data, but it might make sense to use the official Census API, since right now CR only has one year's worth of data.

Usage

For now, there's really one method, get_dataframe. Here's how it works:

df = get_dataframe(tables='B01003', geoids='040|01000US', col_names=True, geo_names=True, include_moe=True)
df.head()
                 name     Total  B01003001_moe
04000US01     Alabama   4833722              0
04000US02      Alaska    735132              0
04000US04     Arizona   6626624              0
04000US05    Arkansas   2959373              0
04000US06  California  38332521              0

As the plural parameter name suggests, you can pass multiple tables. You really should use a list in that case, but if you pass a single string, it adapts.
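The string-or-list behavior can be sketched roughly like this. It is only an illustration of the idea, not the package's actual code, and `as_list` is a hypothetical helper name:

```python
def as_list(value):
    """Return value as a list, wrapping a lone string in a one-item list.

    Hypothetical helper: it shows one way a function could accept
    either 'B01003' or ['B01003', 'B01001'] for the same parameter.
    """
    if isinstance(value, str):
        return [value]
    return list(value)


print(as_list('B01003'))
print(as_list(['B01003', 'B01001']))
```

Checking for `str` first matters because a string is itself iterable; calling `list('B01003')` directly would split it into single characters.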

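The same goes for geoids: pass a string or a list of strings. As the example demonstrates, you can select a group of related geographies using Census Reporter's syntax of sumlev|container-geoid. In the example, 040|01000US means every state (summary level 040) within the United States (01000US).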
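If you build these selectors in code, a pair of small helpers can keep the sumlev|container-geoid format in one place. This is a sketch only: `children_of` and `parse_selector` are hypothetical names, not part of this package or of the Census Reporter API.

```python
def children_of(sumlev, container_geoid):
    """Build a 'sumlev|container-geoid' selector string (hypothetical helper)."""
    return f"{sumlev}|{container_geoid}"


def parse_selector(selector):
    """Split a selector into (sumlev, container_geoid).

    Returns None for a plain geoid that contains no '|' separator.
    """
    sumlev, sep, container = selector.partition('|')
    if not sep:
        return None
    return sumlev, container


all_states = children_of('040', '01000US')           # states in the US
alabama_counties = children_of('050', '04000US01')   # counties in Alabama
print(all_states, alabama_counties)
print(parse_selector(all_states))
```

Either string can then be passed as `geoids`, alone or in a list alongside plain geoids such as 04000US01.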

Contributing

I'm open to input and pull requests. Who knows where this will go.