
Change param_code to have multiple-statement (rather than single-expression) format #1107

Merged: 29 commits, Jan 14, 2017


martinholmer
Collaborator

@martinholmer martinholmer commented Dec 15, 2016

This is not so much a pull request as a continuation of the conversation that started with issue #429 and continued with merged pull request #1081 and WIP pull request #1103. This extended conversation concerns the technical possibility (and the various pros and cons) of adding logic that handles policy parameters that are expressed as code rather than as numbers.

#1081 added the ability to handle very simple situations in which the code is a single expression (that is, what is on the right-hand side of an assignment statement). This ability was added by using the Python eval function, by eliminating the availability of __builtins__, and by rejecting code that contains high-security-risk strings (lambda and __, for example).
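To make the single-expression approach concrete, here is a minimal sketch of the general technique. It is not the code from #1081: the function names, the SAFE_NAMES table, and the mapping of max/min/where/equal onto NumPy functions are assumptions chosen to match the names used in the parameter code in this conversation.

```python
import numpy as np

# Strings named in this conversation as high-security-risk.
BANNED_STRINGS = ('lambda', '__')

# Assumption: the only callables an expression may use are these
# element-wise NumPy functions (hypothetical whitelist).
SAFE_NAMES = {
    'max': np.maximum,
    'min': np.minimum,
    'where': np.where,
    'equal': np.equal,
}


def scan_expression(expr):
    """Raise ValueError if expr contains a banned string."""
    for banned in BANNED_STRINGS:
        if banned in expr:
            raise ValueError('expression contains banned string: ' + banned)


def eval_expression(expr, variables):
    """Evaluate a single expression with no Python builtins available."""
    scan_expression(expr)
    global_names = {'__builtins__': {}}  # hide abs, open, __import__, ...
    global_names.update(SAFE_NAMES)
    return eval(expr, global_names, dict(variables))
```

For example, eval_expression('max(0, e00300 + e00600)', {...}) works on whole arrays at once, while an expression that calls abs or contains lambda is rejected. Emptying __builtins__ and scanning for strings narrows the attack surface but should not be read as a complete sandbox.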

#1103 was more ambitious in that it tried to use the capabilities introduced in #1081 to compute with code a wide range of new refundable child tax credits. This more complex problem could not be handled with the single-expression capability introduced in #1081.

This WIP pull request #1107 extends #1103 so that a wide range of new refundable child tax credits can be computed with the new code parameter introduced by #1103. The new refundable child tax credit example raises two new issues: (a) the need to execute not just a single expression but a series of statements, and (b) the need to handle policy parameters that are inflation indexed. The revisions in this pull request solve both issues. The solution uses the Python exec function while continuing to eliminate the availability of __builtins__ and to reject code that contains high-security-risk strings (lambda and __, for example). The Policy class has been enhanced to provide the information needed to index policy parameters.
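The multiple-statement idea can be sketched as follows. This is an illustration under stated assumptions, not the pull request's implementation: exec_param_code and SAFE_NAMES are hypothetical names, and the convention of reading the result from a variable called returned_value is taken from the parameter code shown in this conversation.

```python
import numpy as np

# Assumption: parameter code may call only these element-wise functions.
SAFE_NAMES = {
    'max': np.maximum,
    'min': np.minimum,
    'where': np.where,
    'equal': np.equal,
}


def exec_param_code(code, variables):
    """Run a series of statements and return the value they assign
    to returned_value."""
    for banned in ('lambda', '__'):
        if banned in code:
            raise ValueError('code contains banned string: ' + banned)
    global_names = {'__builtins__': {}}  # no Python builtins available
    global_names.update(SAFE_NAMES)
    local_names = dict(variables)  # statements assign into this dict
    exec(code, global_names, local_names)
    return local_names['returned_value']


# Illustrative records (made-up values, three filing units).
CODE = """
posagi = max(0, c00100)
credit = where(n24>0,
               min(posagi, 1000*n24+1000*nu05),
               0)
returned_value = credit
"""
records = {
    'c00100': np.array([500.0, 5000.0, -10.0]),
    'n24': np.array([1, 2, 0]),
    'nu05': np.array([0, 1, 0]),
}
credit = exec_param_code(CODE, records)
```

Unlike eval, exec accepts assignment statements, so intermediate variables such as posagi become possible; the price is a larger execution surface to secure.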

This pull request is meant for discussion. Many questions need to be discussed, including the readability of the parameter code in the sample script below and the security risks involved in allowing multiple statements (rather than just a single expression) in the parameter code.

Below is a script that is similar to the notebook discussed in #1103. It shows that the new parameter code feature produces exactly the same results as the (still present) numerical parameter characterization of two reforms that introduce new refundable child tax credits.

@MattHJensen @feenberg @talumbau @zrisher @Amy-Xu
@GoFroggyRun @andersonfrailey @codykallen

Here is a command-line session that runs only under this pull request:

$ cp ../tax-calculator-data/puf.csv .

$ cat param-code-exec.py
from taxcalc import *

def print_results(desc, calc):
    record_columns = ['s006', '_combined']
    out = [getattr(calc.records, col) for col in record_columns]
    dfx = pd.DataFrame(data=np.column_stack(out), columns=record_columns)
    wsum = weighted_sum(dfx, '_combined') * 1.0e-9
    print "{}:COMBINED_TAXES($b)={:9.3f}".format(desc, wsum)

pol1 = Policy()
reform1 = {
    2016: {'_CTC_new_rt': [1],
           '_CTC_new_c_under5_bonus': [1000],
           '_CTC_new_c': [1000],
           '_CTC_new_prt': [0.00]}
}
pol1.implement_reform(reform1)
calc1 = Calculator(policy=pol1, records=Records(), verbose=False)
calc1.advance_to_year(2020)
calc1.calc_all()
print_results('CALC1', calc1)

pol2 = Policy()
reform2 = {
    2016: {'_new_refundable_credit_code_active': [True]},
    0: {'new_refundable_credit_code': """
posagi = max(0, c00100)
credit = where(n24>0,
               min(posagi, 1000*n24+1000*nu05),
               0)
returned_value = credit
"""}
}
pol2.implement_reform(reform2)
calc2 = Calculator(policy=pol2, records=Records(), verbose=False)
calc2.advance_to_year(2020)
calc2.calc_all()
print_results('CALC2', calc2)

pol3 = Policy()
reform3 = {
    2016: {'_CTC_new_rt': [1],
           '_CTC_new_c_under5_bonus': [1000],
           '_CTC_new_c': [1000],
           '_CTC_new_prt': [0.05],
           '_CTC_new_ps': [[75000, 110000, 55000, 75000, 75000, 55000]]}
}
pol3.implement_reform(reform3)
calc3 = Calculator(policy=pol3, records=Records(), verbose=False)
calc3.advance_to_year(2016)
# print calc3.policy.CTC_new_ps
calc3.advance_to_year(2020)
# print calc3.policy.CTC_new_ps
calc3.calc_all()
print_results('CALC3', calc3)

pol4 = Policy()
reform4 = {
    2016: {'_new_refundable_credit_code_active': [True]},
    0: {'new_refundable_credit_code': """
posagi = max(0, c00100)
credit = where(n24>0,
               min(posagi, 1000*n24+1000*nu05),
               0)
ymax = where(equal(MARS,1),  75000*cpi,
       where(equal(MARS,2), 110000*cpi,
       where(equal(MARS,3),  55000*cpi,
       where(equal(MARS,4),  75000*cpi,
       where(equal(MARS,5),  75000*cpi,
       where(equal(MARS,6),  55000*cpi, 0))))))
credit = where(posagi>ymax,
               max(0, credit-0.05*(posagi-ymax)),
               credit)
returned_value = credit
"""}
}
pol4.implement_reform(reform4)
calc4 = Calculator(policy=pol4, records=Records(), verbose=False)
calc4.advance_to_year(2020)
calc4.calc_all()
print_results('CALC4', calc4)

$ python param-code-exec.py
CALC1:COMBINED_TAXES($b)= 2837.004
CALC2:COMBINED_TAXES($b)= 2837.004
CALC3:COMBINED_TAXES($b)= 2853.539
CALC4:COMBINED_TAXES($b)= 2853.539
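On the readability question, the nested where chain that sets ymax in reform4 is a table lookup keyed on MARS. The sketch below compares the nested form with an array-indexing form in plain NumPy; ymax_nested and ymax_lookup are hypothetical helper names, and whether array indexing could be allowed inside parameter code is not something this conversation settles.

```python
import numpy as np


def ymax_nested(MARS, cpi):
    """Phase-out start written as a nested where chain, as in reform4."""
    return np.where(np.equal(MARS, 1), 75000 * cpi,
           np.where(np.equal(MARS, 2), 110000 * cpi,
           np.where(np.equal(MARS, 3), 55000 * cpi,
           np.where(np.equal(MARS, 4), 75000 * cpi,
           np.where(np.equal(MARS, 5), 75000 * cpi,
           np.where(np.equal(MARS, 6), 55000 * cpi, 0))))))


def ymax_lookup(MARS, cpi):
    """Same values via a lookup table; slot 0 is the fallback for any
    MARS value outside 1..6."""
    table = np.array([0, 75000, 110000, 55000, 75000, 75000, 55000])
    valid = (MARS >= 1) & (MARS <= 6)
    index = np.where(valid, MARS, 0)
    return table[index] * cpi
```

Both functions expect MARS as an integer array and cpi as a scalar index value.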

@codecov-io

codecov-io commented Dec 15, 2016

Current coverage is 98.87% (diff: 100%)

Merging #1107 into master will increase coverage by 0.01%

@@             master      #1107   diff @@
==========================================
  Files            38         38          
  Lines          2988       3020    +32   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           2954       2986    +32   
  Misses           34         34          
  Partials          0          0          

Powered by Codecov. Last update 56d75bf...d0b1b48

@MattHJensen MattHJensen self-requested a review December 16, 2016 18:49
@zrisher
Contributor

zrisher commented Dec 22, 2016

Regarding sandboxing and security, this follows the line of progression we expected in #429:

  • we provided a small, very simple execution surface that meets some basic needs but is relatively easy to secure
  • we find ourselves needing more execution control
  • the sandbox gets bigger, more complex, and less secure

That said, the security discussion in #429 highlighted the fact that it's actually ok to have this vulnerability, because code parameters are not usable via TaxBrain yet, and before they are we'll implement some sort of docker-based VM-level sandbox.

So I'm 👍 on merging this and seeing what users come up with.

In the long term, I think editing the particular calculation lines in place and comparing diffs is a simpler and more portable way of modifying tax code (as opposed to passing code through params). I'm working on a suggested method of doing that and will share when it's ready.

@martinholmer
Collaborator Author

@zrisher said:

So I'm 👍 on merging this [i.e., pull request #1107] and seeing what users come up with.

OK. I'll begin working on a full-blown pull request in early January unless there are concerns from others on the development team.

@zrisher continued:

In the long term, I think editing the particular calculation lines in place and comparing diffs is a simpler and more portable way of modifying tax code (as opposed to passing code through params). I'm working on a suggested method of doing that and will share when it's ready.

I have little doubt that we can do better than the approach in #1107, so we're all looking forward to your "suggested method". Remember that a major goal of all this work is to provide a "way of modifying tax code" for those who are using only TaxBrain. Everyone is comfortable with Tax-Calculator users modifying the Python source code to do what they want. We just hope that some of them will submit these Python code changes (that handle new kinds of reforms) as pull requests.

@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @andersonfrailey @codykallen

@martinholmer
Collaborator Author

@talumbau, I was under the impression that the appveyor tests would work after pull request #1111 was merged into the master branch. Given that understanding, I'm puzzled about these test results for pull request #1107, which conclude with the following lines:

(taxcalc-dev) C:\projects\tax-calculator>set PYTEST=py.test --capture=sys 
(taxcalc-dev) C:\projects\tax-calculator>py.test --capture=sys -v -m "not requires_pufcsv" --pep8 
'py.test' is not recognized as an internal or external command,
operable program or batch file.
Command exited with code 1

@MattHJensen @zrisher

@talumbau
Member

talumbau commented Jan 1, 2017

We need to merge #1114, which updates conda to 4.2.13 to deal with the yaml file error. @zrisher has a link to the original bug report in his PR.

@martinholmer martinholmer changed the title from "[WIP] Follow-on to #429, #1081 and #1103" to "Change param_code to have multiple-statement (rather than one-expression) format" Jan 3, 2017
@martinholmer
Collaborator Author

Pull request #1107 has now been converted into a regular pull request for merging into master.

I expect a lengthy review of this pull request while its merits are discussed and as we work on resolving issue #1119.

Here is a test that shows that the new multiple-statement param_code format can simulate a complex new refundable child tax credit and produce exactly the same results as when using the pre-existing numeric policy parameters (which remain part of Tax-Calculator).

$ cat param-nocode.json
{ // JSON reform file using numeric CTC_new_* parameters
"policy": {
    // Tax LTCG+QDIV like other income and add CG+QDIV exclusion
    "_CG_nodiff": {"2017": [true]},
    "_CG_ec": {"2017": [10000]},
    "_CG_reinvest_ec_rt": {"2017": [0.50]},

    // Replace CTC and ACTC with larger refundable CTC
    "_CTC_c": {"2017": [0]}, // replace current CTC/ACTC beginning in 2017
    "_CTC_new_rt": {"2017": [0.124]}, // phase-in rate
    "_CTC_new_c": {"2017": [3000]},   // maximum credit per kid
    "_CTC_new_ps": {"2017":           // phase-out start AGI
                    // indexed by filing-unit status, MARS
                    [[175000, 350000, 175000, 175000, 175000, 175000]]},
    "_CTC_new_prt": {"2017": [0.10]}, // phase-out rate
    "_CTC_new_refund_limited": {"2017": [true]},
    "_CTC_new_refund_limit_payroll_rt": {"2017": [1.0]}
           // sets refund limit equal to OASDI payroll taxes paid
},
"behavior": {}, "consumption": {}, "growth": {}
}

$ cat param-code.json 
{ // JSON reform file using CTC_new_* code parameters
"policy": {
    // Replace CTC and ACTC with larger refundable CTC
    "_CTC_c": {"2017": [0]}, // replace current CTC/ACTC beginning in 2017
    "_CTC_new_code_active": {"2017": [true]},

    // Tax LTCG+QDIV like other income and add CG+QDIV exclusion
    "_CG_nodiff": {"2017": [true]},
    "_CG_ec": {"2017": [10000]},
    "_CG_reinvest_ec_rt": {"2017": [0.50]},
    "_ALD_InvInc_ec_base_code_active": {"2017": [true]},

    "param_code": { // all the parameter code must go in one place
"CTC_new_code":
||
posagi = max(0, c00100)
// basic credit is $3000 per kid but no more than 12.4% of positive AGI
credit = where(n24>0,
               min(0.124*posagi, 3000*n24),
               0)
// basic credit is phased out above MARS- and inflation-indexed AGI levels
ymax = where(equal(MARS,1), 175000*cpi,
       where(equal(MARS,2), 350000*cpi,
       where(equal(MARS,3), 175000*cpi,
       where(equal(MARS,4), 175000*cpi,
       where(equal(MARS,5), 175000*cpi,
       where(equal(MARS,6), 175000*cpi, 0))))))
credit = where(posagi>ymax,
               max(0, credit-0.10*(posagi-ymax)),  // 10% phase-out rate
               credit)
refund = max(0, credit - c09200)
limited = max(0, refund - ptax_oasdi)  // refund limited to OASDI payroll taxes
returned_value = max(0, credit - limited)
||
,
"ALD_InvInc_ec_base_code":
||
returned_value = (e00300 + e00600 +
                  max(-3000/_sep, p23250 + p22250) +
                  e01100 + e01200)
||
} // end of "param_code"
}, // end of "policy"
"behavior": {}, "consumption": {}, "growth": {}
}
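The param-code.json file above is not strict JSON: it carries // comments and puts each piece of parameter code between lines that contain only ||. One way such a file could be converted to strict JSON before parsing is sketched below. This is a guess at the general idea, not the reader used by Tax-Calculator; to_strict_json is a hypothetical name, and the sketch assumes that // never appears inside a string value or as an operator in the code.

```python
import json
import re


def to_strict_json(text):
    """Strip // comments and turn each ||-delimited block into one
    JSON string whose statements are separated by newlines."""
    out = []
    block = None  # list of code lines while inside a || block
    for raw in text.splitlines():
        line = re.sub(r'//.*$', '', raw).rstrip()
        if line.strip() == '||':
            if block is None:
                block = []  # opening delimiter
            else:
                out.append(json.dumps('\n'.join(block)))
                block = None  # closing delimiter
            continue
        if block is not None:
            if line.strip():
                block.append(line)  # keep code indentation as written
        else:
            out.append(line)
    if block is not None:
        raise ValueError('unterminated || block')
    return '\n'.join(out)


def read_reform(text):
    """Parse the relaxed format into a Python dictionary."""
    return json.loads(to_strict_json(text))
```

After conversion, each entry under "param_code" is an ordinary string that could be scanned and executed like any other parameter code.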

$ git checkout pr-1103-alt

$ cp ../tax-calculator-data/puf.csv .

$ python inctax.py puf.csv 2020 --blowup --weights --reform param-nocode.json 
You loaded data for 2009.
Your data have been extrapolated to 2020.

$ awk '{r+=$4*$29}END{print r}' puf-20.out-inctax-param-nocode
1.52662e+12

$ python inctax.py puf.csv 2020 --blowup --weights --reform param-code.json 
You loaded data for 2009.
Your data have been extrapolated to 2020.

$ awk '{r+=$4*$29}END{print r}' puf-20.out-inctax-param-code
1.52662e+12
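For readers less familiar with awk, the two awk commands above sum the product of fields 4 and 29 over all output lines. A Python equivalent might look like the sketch below. The conversation does not say what those two columns hold (presumably a sampling weight and a tax amount, but that is an assumption), and weighted_total is a hypothetical helper that assumes whitespace-separated numeric fields.

```python
def weighted_total(lines, weight_col=4, value_col=29):
    """Sum field[weight_col] * field[value_col] over lines, using
    1-based column numbers as awk does."""
    needed = max(weight_col, value_col)
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < needed:
            continue  # awk would treat the missing field as zero
        total += float(fields[weight_col - 1]) * float(fields[value_col - 1])
    return total


def weighted_total_of_file(path, weight_col=4, value_col=29):
    """Apply weighted_total to every line of a text file."""
    with open(path) as infile:
        return weighted_total(infile, weight_col, value_col)
```

Calling weighted_total_of_file on each of the two output files and comparing the results would reproduce the check made in the session.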

@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen

@martinholmer
Collaborator Author

martinholmer commented Jan 3, 2017

@talumbau, here is a list of the renamed and new current_law_policy.json parameters in pull request #1107.

REN  _ALD_Investment_ec_rt  -->  _ALD_InvInc_ec_rt
REN  _ALD_Investment_ec_base_code_active  -->  _ALD_InvInc_ec_base_code_active
NEW  _CTC_new_code_active

@martinholmer martinholmer changed the title from "Change param_code to have multiple-statement (rather than one-expression) format" to "Change param_code to have multiple-statement (rather than single-expression) format" Jan 4, 2017
@martinholmer
Collaborator Author

Pull request #1107 has been ready to merge into the master branch for ten days. Unless I hear any concerns, it will be merged at the end of the work day on Friday, January 13th.

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @GoFroggyRun @zrisher @codykallen

def cpi(self, param_code_name):
    """
    Return inflation index for current_year that has
    a value of one in first year param_code is active.
Member


should this be "a value of one if first year param_code is active"?

Collaborator Author


No but I expanded the docstring to give a more complete explanation.
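The docstring's meaning can be illustrated with a small sketch: the index equals one in the first year the parameter code is active and compounds annual inflation after that. The function below is hypothetical and is not the method merged in this pull request; inflation_index, its arguments, and the layout of the rates list are assumptions made for illustration.

```python
def inflation_index(rates, start_year, first_active_year, current_year):
    """Return an index equal to 1.0 in first_active_year that grows with
    inflation afterwards.

    rates[i] is the assumed inflation rate between start_year + i and
    start_year + i + 1.
    """
    if first_active_year < start_year:
        raise ValueError('first_active_year precedes start_year')
    if current_year < first_active_year:
        raise ValueError('param_code is not active in current_year')
    index = 1.0
    for year in range(first_active_year, current_year):
        index *= 1.0 + rates[year - start_year]
    return index
```

Under this reading, an amount written as 75000*cpi in parameter code that first becomes active in 2016 equals 75000 in 2016 and rises with cumulative inflation in later years.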

@@ -305,6 +312,26 @@ def scan_param_code(code):
msg += code
raise ValueError(msg)

def cpi(self, param_code_name):
Member


I think it would be good to add a unit test for this function. Also, it seems to me that the use case is specific enough that a more descriptive name should be used.

Collaborator Author


Added tests and expanded function name for clarity.

@martinholmer martinholmer merged commit 1cf5e0d into PSLmodels:master Jan 14, 2017
@martinholmer
Collaborator Author

The merge of #1107 renders #1103 obsolete.

5 participants