Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel py3 #43

Open
wants to merge 41 commits into
base: master
Choose a base branch
from
Open

Parallel py3 #43

wants to merge 41 commits into from

Conversation

mxndrwgrdnr
Copy link
Member

  • updated synthpop for python 3
  • updated the census attributes retained in starter2.py
  • updated ipu.py so that rather than erroring out when max_iterations is reached, it generates a warning.
  • added method synthesize_all_in_parallel() to synthesizer.py for processing sub-county geographies in parallel.

Could use an extra set of eyes at L264-265 in synthesizer.py. These two lines account for the fact that when processing geographies in parallel, draw.draw_households() cannot rely on the hh_index_start argument to set the household_id either as the index of the synthetic households table or the hh_id field of the synthetic persons table. Thus we have to adjust the households index and persons hh_id field AFTER all the tables have been generated, which is what's hapenning at L264-265.

@mxndrwgrdnr mxndrwgrdnr requested a review from janowicz April 4, 2018 19:43
@coveralls
Copy link

coveralls commented Apr 4, 2018

Coverage Status

Coverage decreased (-1.2%) to 81.643% when pulling c180116 on parallel_py3 into d1de14a on master.

@janowicz
Copy link
Contributor

Thanks very much for this great contribution @mxndrwgrdnr! We'll review this soon, sorry for the delay. There is an existing earlier open PR that was reveiewed, which we kept open for a bit in the case anyone from the community wanted to review, that we'll merge in first. And then we can help reconcile with this PR.

self.p_pums_cols = ('serialno', 'PUMA00', 'PUMA10', 'RELP', 'AGEP',
'ESR', 'RAC1P', 'HISP', 'SEX')
self.p_pums_cols = ('serialno', 'SPORDER', 'PUMA00', 'PUMA10', 'RELP', 'AGEP',
'ESR', 'SCHL', 'SCH', 'JWTR', 'PERNP', 'WKHP', 'RAC1P', 'HISP', 'SEX')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you adding these variables if they are not used in the synthesis?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will be.

@mxndrwgrdnr
Copy link
Member Author

@cvanoli @janowicz this should be good to go now folks.

@janowicz
Copy link
Contributor

Awesome, thanks @mxndrwgrdnr! I'll check it out today

@cvanoli
Copy link
Collaborator

cvanoli commented Jun 28, 2018

thanks !! @mxndrwgrdnr

Copy link
Collaborator

@cvanoli cvanoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @mxndrwgrdnr

@@ -259,9 +260,12 @@ def household_weights(
iterations += 1

if iterations > max_iterations:
raise RuntimeError(
warnings.warn(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a conv with @janowicz we discussed that it would be better if this ignoring of errors was made optional. Could we make an ignore parameter, where, if set to True, would raise warnings instead of errors?

@@ -169,7 +169,7 @@ def test_household_weights(
def test_household_weights_max_iter(
household_freqs, person_freqs, household_constraints,
person_constraints):
with pytest.raises(RuntimeError):
with pytest.warns(UserWarning):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem comment to ipu.py line263.

@cvanoli
Copy link
Collaborator

cvanoli commented Jul 3, 2018

I tested the branch with a local environment in my computer and two Future Warnings keep coming up:
1 -synthpop\ipu\ipu.py: Line191

RuntimeWarning: divide by zero encountered in double_scalars
adj = constraint / float((column * weights).sum())

2 - synthpop\census_helpers.py: Line 180:

FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'.
pums = pd.concat([pums, pums00], ignore_index=True)

@mxndrwgrdnr
Copy link
Member Author

@janowicz the second FutureWarning that @cvanoli mentions isn't a big deal and should resolve itself in the future. The first FutureWarning, however, is happening during IPU when household weights that are very nearly zero are not close enough to zero to be caught by ipu._drop_zeros but close enough to trigger a divide by zero RuntimeWarning. It's just a warning and not an error so I don't think it is effecting performance but I wanted to bring it to attention and see if you had an opinion.

@msoltadeo msoltadeo mentioned this pull request Nov 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants