Slack: #p-india-nfhs
Project Description: This D4D project aims to glean insights from India's National Family Health Survey (NFHS) datasets, with a likely focus on women's empowerment issues.
Project Co-Leads / Maintainers: @lukewolcott / (more needed!)
Data: https://data.world/data4democracy/india-nfhs, and in this repo.
Getting started | Background | Questions to pursue | First steps
-
We welcome contributions from first timers!
-
Browse our help wanted issues or First Steps below. See if there is anything that interests you.
-
Core maintainers and project leads are responsible for reviewing and merging all pull requests. Need to practice working with GitHub in a group setting? Check out github-playground.
-
Updates to the documentation or README are greatly appreciated and make a good first PR.
The NFHS is conducted about every 10 years. The fourth round, NFHS-4, was conducted in 2015-2016, and the full datasets will be available "later this year". At the moment, state-level and district-level data are available in PDF factsheets. Debarghya Das has written scripts to scrape the district-level PDFs and pull the data into one place. The resulting CSV file is available in our repo as nfhs-district-level.csv.
The NFHS-3, conducted in 2005-2006, is available in full from the Demographic and Health Surveys (DHS) program, as are the two earlier NFHS rounds. The datasets come from three questionnaires: female, male, and household. Registration is required to use this data, so we can't put it up on a public repo. Instead, it lives in a private dataset on the D4D data.world site. You are encouraged to contact @lukewolcott through Slack to get access to this dataset.
-
The Hindustan Times created a "Women's Empowerment Index" from questions 101--108 in the state-level data, and has done some analysis with it. Is this an accurate index, or could it be improved? What trends do we see? How does this index correlate with other variables (like alcohol consumption)?
-
What other trends and patterns do we see in the state-level or district-level NFHS-4 data?
-
The NFHS-3 (from 2005-2006) household questionnaire gathers hundreds of variables on 100k households, including some health data. Can we use these household characteristics to predict if the household head is male or female? Or if the household is likely to have clean water? Can we predict if the household is at risk of alcoholism? How do household characteristics cluster the data?
-
Within a few months, the full NFHS-4 (from 2015-2016) will become available. Can we generate interesting models and questions, so that when it drops we can easily investigate the ten years of change?
-
Can we connect the NFHS-3 women's and men's health questionnaire data with World Bank data?
-
What would you like to know?
-
(Peruse the Slack channel for other suggestions, too...)
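As a starting point for the index question above, here is a minimal sketch of one way to build a composite index and correlate it with another variable. Everything in it is an assumption for illustration: the rows are synthetic placeholders (not real NFHS values), the column names `q101`, `q102`, `q103`, and `alcohol_pct` are hypothetical, and the unweighted average is just one simple choice, not necessarily how the Hindustan Times built its index.

```python
from math import sqrt

# Synthetic placeholder rows: NOT real NFHS values.
# Column names are hypothetical stand-ins for a few empowerment
# indicators and one comparison variable.
rows = [
    {"state": "A", "q101": 80.0, "q102": 60.0, "q103": 70.0, "alcohol_pct": 10.0},
    {"state": "B", "q101": 50.0, "q102": 40.0, "q103": 30.0, "alcohol_pct": 30.0},
    {"state": "C", "q101": 65.0, "q102": 55.0, "q103": 60.0, "alcohol_pct": 20.0},
]
INDICATORS = ["q101", "q102", "q103"]


def composite_index(row, indicators):
    """Unweighted mean of the chosen indicator percentages (an assumption)."""
    return sum(row[name] for name in indicators) / len(indicators)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


index_values = [composite_index(row, INDICATORS) for row in rows]
alcohol_values = [row["alcohol_pct"] for row in rows]
corr = pearson(index_values, alcohol_values)

for row, value in zip(rows, index_values):
    print(row["state"], round(value, 1))
print("correlation with alcohol_pct:", round(corr, 3))
```

Swapping the unweighted mean for a weighted one, or standardizing each indicator first, are natural variations to compare when asking whether the index could be improved.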
-
Download Debarghya Das's scrape of the district-level data, nfhs-district-level.csv, and dive in! He has generated some heatmaps with some of the gender-related variables, and this is a good start for generating new questions to ask the data. Questions 101--108 in the data are related to women's empowerment.
-
The nfhs3-metadata folder has some metadata for the NFHS-3 household questionnaire. To access the data, contact @lukewolcott and he'll send you an invitation to the data.world site.
-
The nfhs3-analysis folder has some initial exploratory plots, and a Jupyter notebook with some exploratory data analysis.
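For a first look at the district-level CSV, a sketch like the following may help. It uses only the Python standard library and reads from an in-memory sample so that it runs as-is. The sample rows and the column names (`state`, `district`, `q101`, `q102`) are hypothetical, since the real file's headers may differ; check the actual header row before adapting it.

```python
import csv
import io

# Hypothetical sample standing in for nfhs-district-level.csv.
# These values and column names are made up for illustration.
SAMPLE_CSV = """state,district,q101,q102
StateA,District1,45.2,30.1
StateA,District2,55.0,
StateB,District3,60.8,42.5
"""


def to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def column_means(handle, columns):
    """Mean of each requested column, skipping blank or non-numeric cells."""
    totals = {name: 0.0 for name in columns}
    counts = {name: 0 for name in columns}
    for record in csv.DictReader(handle):
        for name in columns:
            value = to_float(record.get(name))
            if value is not None:
                totals[name] += value
                counts[name] += 1
    return {name: totals[name] / counts[name] for name in columns if counts[name]}


means = column_means(io.StringIO(SAMPLE_CSV), ["q101", "q102"])
print(means)
```

To try it on the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with a handle from `open("nfhs-district-level.csv", newline="")` and pass whichever column names the file actually uses.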