Merge pull request #387 from andersonfrailey/zeroweights
Fix zero weights issue
andersonfrailey authored Jun 10, 2021
2 parents 75fc0f2 + dc7a93d commit f395298
Showing 9 changed files with 32 additions and 372 deletions.
3 changes: 2 additions & 1 deletion Makefile
@@ -87,7 +87,8 @@ puf-files: data/cps-matched-puf.csv \

data/cps-matched-puf.csv: taxdata/puf/finalprep.py \
taxdata/puf/impute_itmexp.py \
- taxdata/puf/impute_pencon.py
+ taxdata/puf/impute_pencon.py\
+ createpuf.py
python createpuf.py
# Above recipe also makes data/puf.csv
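
Listing `createpuf.py` as a prerequisite means `make` treats `data/cps-matched-puf.csv` as out of date whenever the script itself changes, not only when the `taxdata/puf` modules do. The timestamp comparison behind that decision can be sketched in Python as follows. This is a simplified illustration, not how `make` is implemented, and the helper name `needs_rebuild` is hypothetical.

```python
from pathlib import Path


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Hypothetical helper that mimics make's basic staleness rule.
    """
    target = Path(target)
    if not target.exists():
        # A missing target always has to be built.
        return True
    target_mtime = target.stat().st_mtime
    # Rebuild if at least one prerequisite was modified after the target.
    return any(Path(p).stat().st_mtime > target_mtime for p in prerequisites)
```

Under this rule, editing `createpuf.py` after the CSV was last produced would make `needs_rebuild("data/cps-matched-puf.csv", ["createpuf.py"])` return `True`, which corresponds to `make` rerunning `python createpuf.py`.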

332 changes: 0 additions & 332 deletions Manifest.toml

This file was deleted.

4 changes: 0 additions & 4 deletions Project.toml

This file was deleted.

3 changes: 2 additions & 1 deletion README.md
@@ -59,7 +59,8 @@ To run the scripts that produce `puf.csv` and `cps.csv.gz`, activate the

`Julia` must also be installed to solve for the PUF and CPS weights. You
can download `Julia` from their [website](https://julialang.org/downloads/)
- or by using `homebrew`.
+ or by using `homebrew`. After installing Julia, you will need to also install
+ these three packages: `JuMP, Cbc, NPZ`.

Data-Preparation Documentation and Workflow
-------------------------------------------
8 changes: 1 addition & 7 deletions createpuf.py
@@ -96,7 +96,6 @@ def dataprep(data):
print("Prepping PUF")
puf2011 = pd.read_csv(Path(DATA_PATH, "puf2011.csv"))
raw_puf = puf.preppuf(puf2011, PUF_YEAR)
- # raw_puf.to_csv(Path(DATA_PATH, "raw_puf.csv"), index=False)

# rename CPS file to match PUF
print("Prepping CPS")
@@ -118,12 +117,6 @@ def dataprep(data):
raw_cps["e19800"] = raw_cps["charitable"] * cash
raw_cps["e20100"] = raw_cps["charitable"] * non_cash

- # cap number of dependents in CPS to line up with PUF
- # raw_cps["depne"] = np.where(
- # raw_cps["mars"] == 2,
- # np.minimum(5, raw_cps["depne"]),
- # np.minimum(3, raw_cps["depne"]),
- # )
raw_cps = dataprep(raw_cps)
raw_puf = dataprep(raw_puf)
raw_cps["recid"] = range(1, len(raw_cps.index) + 1)
@@ -166,6 +159,7 @@ def dataprep(data):
data.drop(list(data.filter(regex=".*_cps")), axis=1, inplace=True)
# add back non-filers
print("Adding non-filers")
+ nonfilers.rename(columns={"s006": "matched_weight"}, inplace=True)
data = pd.concat([data, nonfilers], sort=False, ignore_index=True)
data = data.fillna(0.0)
data.reset_index(inplace=True)
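
The added `rename` is the only functional change in this file, so it is presumably what the commit title refers to. A plausible reading, sketched below with toy data, is that the matched records carry their weight in `matched_weight` while the non-filers still use `s006`. In that case `pd.concat` leaves `matched_weight` as NaN on the non-filer rows and `fillna(0.0)` turns those into zero weights. The frames and the `combine` helper are hypothetical; only the two column names come from the diff.

```python
import pandas as pd

# Hypothetical toy data: matched filers and non-filers with different weight columns.
matched = pd.DataFrame({"recid": [1, 2], "matched_weight": [150.0, 220.0]})
nonfilers = pd.DataFrame({"recid": [3, 4], "s006": [90.0, 310.0]})


def combine(matched, nonfilers, rename):
    """Concatenate the two frames the way the hunk does, optionally renaming first."""
    if rename:
        # Non-inplace rename so the caller's frame is left untouched.
        nonfilers = nonfilers.rename(columns={"s006": "matched_weight"})
    data = pd.concat([matched, nonfilers], sort=False, ignore_index=True)
    # Columns missing from one frame are NaN after concat; fillna makes them 0.0.
    return data.fillna(0.0)


without_rename = combine(matched, nonfilers, rename=False)
with_rename = combine(matched, nonfilers, rename=True)

print(without_rename["matched_weight"].tolist())
print(with_rename["matched_weight"].tolist())
```

Without the rename, the non-filer rows end up with a `matched_weight` of `0.0` and their real weights sit in a separate `s006` column. With the rename, both groups share one weight column and no weights are zeroed.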
Binary file modified puf_stage2/puf_weights.csv.gz
Binary file not shown.