Change param_code to have multiple-statement (rather than single-expression) format #1107

martinholmer · 2016-12-15T17:05:32Z

This is not so much a pull request as a continuation of the conversation that started with issue #429 and continued with merged pull request #1081 and WIP pull request #1103. This extended conversation concerns the technical possibility (and the various pros and cons) of adding logic that handles policy parameters that are expressed as code rather than as numbers.

#1081 added the ability to handle very simple situations in which the code is a single expression (that is, what appears on the right-hand side of an assignment statement). This ability was added using the Python eval function, with __builtins__ made unavailable and with code that contains high-security-risk strings (lambda and __, for example) being rejected.
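
To make the single-expression approach concrete, here is a minimal sketch of that pattern. It is not the code merged in #1081: the names `FORBIDDEN`, `scan_expression`, and `evaluate_expression` are hypothetical, and the forbidden list holds only the two strings mentioned above.

```python
# Hypothetical sketch of "eval with no builtins plus a string filter".
FORBIDDEN = ('lambda', '__')  # only the two strings named in the text


def scan_expression(code):
    """Raise ValueError if code contains a forbidden substring."""
    for bad in FORBIDDEN:
        if bad in code:
            raise ValueError('forbidden string {!r} in: {}'.format(bad, code))


def evaluate_expression(code, variables):
    """Evaluate one expression with no Python builtins available."""
    scan_expression(code)
    safe_globals = {'__builtins__': {}}  # hides open, __import__, len, ...
    return eval(code, safe_globals, dict(variables))


result = evaluate_expression('0.5 * (e00300 + e00600)',
                             {'e00300': 1000.0, 'e00600': 200.0})
print(result)  # 600.0
```

Because eval accepts only an expression, an assignment such as `x = 1` is a syntax error here, which is the limitation the rest of this thread addresses.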

#1103 was more ambitious in that it tried to use the capabilities introduced in #1081 to compute with code a wide range of new refundable child tax credits. The single-expression capability introduced in #1081 could not handle this more complex problem.

This WIP pull request #1107 extends #1103 so that a wide range of new refundable child tax credits can be computed with the new code parameter introduced by #1103. The new refundable child tax credit example raises two new issues: (a) the need to execute not just a single expression, but rather a series of statements, and (b) the need to handle policy parameters that are inflation indexed. The revisions in this pull request solve both of these new issues. The solution involves using the Python exec function while continuing to eliminate the availability of __builtins__ and to reject code that contains high-security-risk strings (lambda and __, for example). The Policy class has been enhanced to provide the information needed to index policy parameters.
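
The multiple-statement pattern can be sketched as follows. This is an illustration under stated assumptions, not the code in this pull request: `run_param_code` is a hypothetical name, the helpers `max`, `min`, and `where` are scalar stand-ins (the real parameter code operates on arrays), and the `returned_value` convention is taken from the sample script in this comment.

```python
# Hypothetical sketch of "exec with no builtins plus a string filter".
def run_param_code(code, variables):
    """Execute statements and return whatever they bind to returned_value."""
    for bad in ('lambda', '__'):  # only the two strings named in the text
        if bad in code:
            raise ValueError('forbidden string {!r} in param code'.format(bad))
    namespace = {'__builtins__': {}}  # no builtins reachable by name
    # expose only the few helpers the statements are allowed to call
    namespace.update({'max': max, 'min': min,
                      'where': lambda cond, a, b: a if cond else b})
    namespace.update(variables)
    exec(code, namespace)
    return namespace['returned_value']


CODE = """
posagi = max(0, c00100)
credit = where(n24 > 0, min(posagi, 1000 * n24 + 1000 * nu05), 0)
returned_value = credit
"""

print(run_param_code(CODE, {'c00100': 40000.0, 'n24': 2, 'nu05': 1}))  # 3000
```

Unlike the eval version, intermediate names such as `posagi` can be assigned and reused, which is what the phase-out logic in the sample script needs.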

This pull request is meant for discussion. There are many questions that need to be discussed including the readability of the parameter code in the sample script below and the security risks involved in allowing multiple statements (rather than just a single expression) in the parameter code.

Below is a script that is similar to the notebook discussed in #1103. It shows that the new parameter code feature produces exactly the same results as the (still present) numerical parameter characterization of two reforms that introduce new refundable child tax credits.

@MattHJensen @feenberg @talumbau @zrisher @Amy-Xu
@GoFroggyRun @andersonfrailey @codykallen

Here is a command-line session that runs only under this pull request:

$ cp ../tax-calculator-data/puf.csv .

$ cat param-code-exec.py
from taxcalc import *

def print_results(desc, calc):
    record_columns = ['s006', '_combined']
    out = [getattr(calc.records, col) for col in record_columns]
    dfx = pd.DataFrame(data=np.column_stack(out), columns=record_columns)
    wsum = weighted_sum(dfx, '_combined') * 1.0e-9
    print "{}:COMBINED_TAXES($b)={:9.3f}".format(desc, wsum)

pol1 = Policy()
reform1 = {
    2016: {'_CTC_new_rt': [1],
           '_CTC_new_c_under5_bonus': [1000],
           '_CTC_new_c': [1000],
           '_CTC_new_prt': [0.00]}
}
pol1.implement_reform(reform1)
calc1 = Calculator(policy=pol1, records=Records(), verbose=False)
calc1.advance_to_year(2020)
calc1.calc_all()
print_results('CALC1', calc1)

pol2 = Policy()
reform2 = {
    2016: {'_new_refundable_credit_code_active': [True]},
    0: {'new_refundable_credit_code': """
posagi = max(0, c00100)
credit = where(n24>0,
               min(posagi, 1000*n24+1000*nu05),
               0)
returned_value = credit
"""}
}
pol2.implement_reform(reform2)
calc2 = Calculator(policy=pol2, records=Records(), verbose=False)
calc2.advance_to_year(2020)
calc2.calc_all()
print_results('CALC2', calc2)

pol3 = Policy()
reform3 = {
    2016: {'_CTC_new_rt': [1],
           '_CTC_new_c_under5_bonus': [1000],
           '_CTC_new_c': [1000],
           '_CTC_new_prt': [0.05],
           '_CTC_new_ps': [[75000, 110000, 55000, 75000, 75000, 55000]]}
}
pol3.implement_reform(reform3)
calc3 = Calculator(policy=pol3, records=Records(), verbose=False)
calc3.advance_to_year(2016)
# print calc3.policy.CTC_new_ps
calc3.advance_to_year(2020)
# print calc3.policy.CTC_new_ps
calc3.calc_all()
print_results('CALC3', calc3)

pol4 = Policy()
reform4 = {
    2016: {'_new_refundable_credit_code_active': [True]},
    0: {'new_refundable_credit_code': """
posagi = max(0, c00100)
credit = where(n24>0,
               min(posagi, 1000*n24+1000*nu05),
               0)
ymax = where(equal(MARS,1),  75000*cpi,
       where(equal(MARS,2), 110000*cpi,
       where(equal(MARS,3),  55000*cpi,
       where(equal(MARS,4),  75000*cpi,
       where(equal(MARS,5),  75000*cpi,
       where(equal(MARS,6),  55000*cpi, 0))))))
credit = where(posagi>ymax,
               max(0, credit-0.05*(posagi-ymax)),
               credit)
returned_value = credit
"""}
}
pol4.implement_reform(reform4)
calc4 = Calculator(policy=pol4, records=Records(), verbose=False)
calc4.advance_to_year(2020)
calc4.calc_all()
print_results('CALC4', calc4)

$ python param-code-exec.py
CALC1:COMBINED_TAXES($b)= 2837.004
CALC2:COMBINED_TAXES($b)= 2837.004
CALC3:COMBINED_TAXES($b)= 2853.539
CALC4:COMBINED_TAXES($b)= 2853.539
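
As a hand check of the phase-out logic in reform4, the sketch below applies the same statements to one hypothetical filing unit using plain scalars. The input values are made up for illustration, `new_credit` is not a Tax-Calculator function, and `cpi` is simply set to 1.0, so this does not reproduce the PUF totals above.

```python
# Phase-out start by filing status (MARS), copied from reform4 above.
PHASE_OUT_START = {1: 75000, 2: 110000, 3: 55000,
                   4: 75000, 5: 75000, 6: 55000}


def new_credit(c00100, n24, nu05, mars, cpi=1.0):
    """Scalar version of the reform4 parameter code for one filing unit."""
    posagi = max(0, c00100)
    # $1000 per child plus $1000 per child under 5, capped at positive AGI
    credit = min(posagi, 1000 * n24 + 1000 * nu05) if n24 > 0 else 0
    # phase-out threshold, indexed by cpi; unknown MARS falls back to 0
    ymax = PHASE_OUT_START.get(mars, 0) * cpi
    if posagi > ymax:
        credit = max(0, credit - 0.05 * (posagi - ymax))  # 5% phase-out
    return credit


# joint filer (MARS=2), AGI 120000, two children of whom one is under 5
print(new_credit(c00100=120000, n24=2, nu05=1, mars=2))  # 2500.0
```

The credit before phase-out is 3000; AGI exceeds the 110000 threshold by 10000, so 5 percent of that excess (500) is subtracted.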

codecov-io · 2016-12-15T17:11:32Z

Current coverage is 98.87% (diff: 100%)

Merging #1107 into master will increase coverage by 0.01%

@@             master      #1107   diff @@
==========================================
  Files            38         38          
  Lines          2988       3020    +32   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           2954       2986    +32   
  Misses           34         34          
  Partials          0          0

Powered by Codecov. Last update 56d75bf...d0b1b48

zrisher · 2016-12-22T18:14:18Z

Regarding sandboxing and security, this follows the line of progression we expected in #429:

1. we provided a small, very simple execution surface that meets some basic needs but is relatively easy to secure
2. we find ourselves needing more execution control
3. the sandbox gets bigger, more complex, and less secure

That said, the security discussion in #429 highlighted the fact that it's actually ok to have this vulnerability, because code parameters are not usable via TaxBrain yet, and before they are we'll implement some sort of docker-based VM-level sandbox.
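
To illustrate why moving from expressions to statements enlarges the surface, the sketch below uses Python's standard `ast` module to list the statement types present in a piece of code. It is not part of this pull request, and `statement_types` is a hypothetical helper. The code is only parsed, never executed; the point is that a loop is valid statement code and contains neither of the filtered strings.

```python
import ast


def statement_types(code):
    """Return the sorted names of all statement node types found in code."""
    tree = ast.parse(code)
    return sorted({type(node).__name__
                   for node in ast.walk(tree)
                   if isinstance(node, ast.stmt)})


benign = "posagi = max(0, c00100)\nreturned_value = posagi\n"
hostile = "x = 0\nwhile True:\n    x = x + 1\n"  # would never terminate

print(statement_types(benign))   # ['Assign']
print(statement_types(hostile))  # ['Assign', 'While']
```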

So I'm 👍 on merging this and seeing what users come up with.

In the long term, I think editing the particular calculation lines in place and comparing diffs is a simpler and more portable way of modifying tax code (as opposed to passing code through params). I'm working on a suggested method of doing that and will share when it's ready.

martinholmer · 2016-12-22T19:07:44Z

@zrisher said:

So I'm 👍 on merging this [i.e., pull request #1107] and seeing what users come up with.

OK. I'll begin working on a full-blown pull request in early January unless there are concerns from others on the development team.

@zrisher continued:

In the long term, I think editing the particular calculation lines in place and comparing diffs is a simpler and more portable way of modifying tax code (as opposed to passing code through params). I'm working on a suggested method of doing that and will share when it's ready.

I have little doubt that we can do better than the approach in #1107, so we're all looking forward to your "suggested method". Remember that a major goal of all this work is to provide a "way of modifying tax code" for those who are using only TaxBrain. Everyone is comfortable with Tax-Calculator users modifying the Python source code to do what they want. We just hope that some of them will submit these Python code changes (that handle new kinds of reforms) as pull requests.

@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @andersonfrailey @codykallen

martinholmer · 2017-01-01T20:02:49Z

@talumbau, I was under the impression that the appveyor tests would work once pull request #1111 was merged into the master branch. Given that understanding, I'm puzzled by these test results for pull request #1107, which conclude with the following lines:

(taxcalc-dev) C:\projects\tax-calculator>set PYTEST=py.test --capture=sys 
(taxcalc-dev) C:\projects\tax-calculator>py.test --capture=sys -v -m "not requires_pufcsv" --pep8 
'py.test' is not recognized as an internal or external command,
operable program or batch file.
Command exited with code 1

@MattHJensen @zrisher

talumbau · 2017-01-01T20:21:40Z

We need to merge #1114, which updates conda to 4.2.13 to deal with the yaml file error. @zrisher has a link to the original bug report in his PR.

martinholmer · 2017-01-03T19:04:32Z

Pull request #1107 has now been converted into a regular pull request for merging into master.

I expect a lengthy review of this pull request while its merits are discussed and as we work on resolving issue #1119.

Here is a test that shows that the new multiple-statement param_code format can simulate a complex new refundable child tax credit and produce exactly the same results as when using the pre-existing numeric policy parameters (which remain part of Tax-Calculator).

$ cat param-nocode.json
{ // JSON reform file using numeric CTC_new_* parameters
"policy": {
    // Tax LTCG+QDIV like other income and add CG+QDIV exclusion
    "_CG_nodiff": {"2017": [true]},
    "_CG_ec": {"2017": [10000]},
    "_CG_reinvest_ec_rt": {"2017": [0.50]},

    // Replace CTC and ACTC with larger refundable CTC
    "_CTC_c": {"2017": [0]}, // replace current CTC/ACTC beginning in 2017
    "_CTC_new_rt": {"2017": [0.124]}, // phase-in rate
    "_CTC_new_c": {"2017": [3000]},   // maximum credit per kid
    "_CTC_new_ps": {"2017":           // phase-out start AGI
                    // indexed by filing-unit status, MARS
                    [[175000, 350000, 175000, 175000, 175000, 175000]]},
    "_CTC_new_prt": {"2017": [0.10]}, // phase-out rate
    "_CTC_new_refund_limited": {"2017": [true]},
    "_CTC_new_refund_limit_payroll_rt": {"2017": [1.0]}
           // sets refund limit equal to OASDI payroll taxes paid
},
"behavior": {}, "consumption": {}, "growth": {}
}

$ cat param-code.json 
{ // JSON reform file using CTC_new_* code parameters
"policy": {
    // Replace CTC and ACTC with larger refundable CTC
    "_CTC_c": {"2017": [0]}, // replace current CTC/ACTC beginning in 2017
    "_CTC_new_code_active": {"2017": [true]},

    // Tax LTCG+QDIV like other income and add CG+QDIV exclusion
    "_CG_nodiff": {"2017": [true]},
    "_CG_ec": {"2017": [10000]},
    "_CG_reinvest_ec_rt": {"2017": [0.50]},
    "_ALD_InvInc_ec_base_code_active": {"2017": [true]},

    "param_code": { // all the parameter code must go in one place
"CTC_new_code":
||
posagi = max(0, c00100)
// basic credit is $3000 per kid but no more than 12.4% of positive AGI
credit = where(n24>0,
               min(0.124*posagi, 3000*n24),
               0)
// basic credit is phased out above MARS- and inflation-indexed AGI levels
ymax = where(equal(MARS,1), 175000*cpi,
       where(equal(MARS,2), 350000*cpi,
       where(equal(MARS,3), 175000*cpi,
       where(equal(MARS,4), 175000*cpi,
       where(equal(MARS,5), 175000*cpi,
       where(equal(MARS,6), 175000*cpi, 0))))))
credit = where(posagi>ymax,
               max(0, credit-0.10*(posagi-ymax)),  // 10% phase-out rate
               credit)
refund = max(0, credit - c09200)
limited = max(0, refund - ptax_oasdi)  // refund limited to OASDI payroll taxes
returned_value = max(0, credit - limited)
||
,
"ALD_InvInc_ec_base_code":
||
returned_value = (e00300 + e00600 +
                  max(-3000/_sep, p23250 + p22250) +
                  e01100 + e01200)
||
} // end of "param_code"
}, // end of "policy"
"behavior": {}, "consumption": {}, "growth": {}
}
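
The reform file above is not strict JSON: it carries `//` comments and wraps each block of parameter code in `||` delimiters. A minimal sketch of how such a file could be converted to strict JSON before parsing follows. It is an illustration only, not Tax-Calculator's actual reader: `read_reform_text` and both regular expressions are assumptions, and the comment stripping is naive (it would also cut a `//` that appears inside a JSON string).

```python
import json
import re


def read_reform_text(text):
    """Parse reform text that uses // comments and ||code|| blocks."""
    # 1. drop // comments through the end of each line (naive)
    text = re.sub(r'//.*', '', text)

    # 2. turn each ||...|| block into a JSON string; the match is
    #    non-greedy so that two separate blocks are not merged into one
    def to_json_string(match):
        return json.dumps(match.group(1).strip())

    text = re.sub(r'\|\|(.*?)\|\|', to_json_string, text, flags=re.DOTALL)
    return json.loads(text)


SAMPLE = """
{ // toy reform file
"policy": {
    "_CTC_new_code_active": {"2017": [true]},
    "param_code": {
"CTC_new_code":
||
posagi = max(0, c00100)  // positive AGI
returned_value = posagi
||
,
"ALD_InvInc_ec_base_code":
||
returned_value = e00300 + e00600
||
}
}
}
"""

reform = read_reform_text(SAMPLE)
print(reform['policy']['param_code']['ALD_InvInc_ec_base_code'])
```

With a greedy pattern, the single match would run from the first `||` to the last one and swallow the second parameter name, which is why the non-greedy form matters when a file holds more than one code block.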

$ git checkout pr-1103-alt

$ cp ../tax-calculator-data/puf.csv .

$ python inctax.py puf.csv 2020 --blowup --weights --reform param-nocode.json 
You loaded data for 2009.
Your data have been extrapolated to 2020.

$ awk '{r+=$4*$29}END{print r}' puf-20.out-inctax-param-nocode
1.52662e+12

$ python inctax.py puf.csv 2020 --blowup --weights --reform param-code.json 
You loaded data for 2009.
Your data have been extrapolated to 2020.

$ awk '{r+=$4*$29}END{print r}' puf-20.out-inctax-param-code
1.52662e+12
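
Each of the two `awk` commands sums, over all output lines, the product of the 4th and 29th whitespace-separated fields. For readers less familiar with `awk`, a Python equivalent might look like the sketch below. The helper names and the two-line sample are made up, and what the two columns hold (presumably a tax amount and a sampling weight) is an assumption not stated in this thread.

```python
def weighted_total(lines, value_col=4, weight_col=29):
    """Sum field value_col times field weight_col (1-based) over lines."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < max(value_col, weight_col):
            continue  # awk would treat a missing field as zero
        total += float(fields[value_col - 1]) * float(fields[weight_col - 1])
    return total


def make_line(value, weight):
    """Build a fake 29-field output line for the demonstration."""
    fields = ['0'] * 29
    fields[3] = str(value)
    fields[28] = str(weight)
    return ' '.join(fields)


sample = [make_line(1000.0, 2.0), make_line(250.0, 4.0)]
print(weighted_total(sample))  # 3000.0
```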

@MattHJensen @feenberg @talumbau @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen

martinholmer · 2017-01-03T19:12:29Z

@talumbau, here is a list of the renamed and new current_law_policy.json parameters in pull request #1107.

REN  _ALD_Investment_ec_rt  -->  _ALD_InvInc_ec_rt
REN  _ALD_Investment_ec_base_code_active  -->  _ALD_InvInc_ec_base_code_active
NEW  _CTC_new_code_active

martinholmer · 2017-01-13T11:38:28Z

Pull request #1107 has been ready to merge into the master branch for ten days. Unless I hear any concerns, it will be merged at the end of the work day on Friday, January 13th.

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @GoFroggyRun @zrisher @codykallen

talumbau · 2017-01-13T20:27:15Z

taxcalc/policy.py

+    def cpi(self, param_code_name):
+        """
+        Return inflation index for current_year that has
+        a value of one in first year param_code is active.


Should this be "a value of one if first year param_code is active"?

martinholmer replied: No, but I expanded the docstring to give a more complete explanation.
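
The docstring under review describes an index that equals one in the first year the parameter code is active and grows with inflation afterwards, which is how an expression such as `75000*cpi` in the parameter code stays inflation indexed. A minimal sketch of such an index is given below. It is not the Policy method itself: `inflation_index`, its arguments, and the rates are hypothetical, and it assumes each rate carries the index from one year to the next.

```python
def inflation_index(rates, start_year, active_year, current_year):
    """
    Return an index equal to 1.0 in active_year.

    rates[i] is the assumed inflation rate between start_year + i
    and start_year + i + 1.
    """
    if not start_year <= active_year <= current_year:
        raise ValueError('need start_year <= active_year <= current_year')
    index = 1.0
    for year in range(active_year, current_year):
        index *= 1.0 + rates[year - start_year]
    return index


# made-up rates for 2016->2017, 2017->2018, and 2018->2019
rates = [0.02, 0.03, 0.01]
print(inflation_index(rates, 2016, 2017, 2017))            # 1.0
print(round(inflation_index(rates, 2016, 2017, 2019), 4))  # 1.0403
```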

talumbau · 2017-01-13T20:28:36Z

taxcalc/policy.py

@@ -305,6 +312,26 @@ def scan_param_code(code):
            msg += code
            raise ValueError(msg)

+    def cpi(self, param_code_name):


I think it would be good to add a unit test for this function. Also, it seems to me that the use case is specific enough that a more descriptive name should be used.

martinholmer replied: Added tests and expanded function name for clarity.

martinholmer · 2017-01-14T03:17:54Z

The merge of #1107 renders #1103 obsolete.

MattHJensen and others added 5 commits December 12, 2016 09:36

new_refundable_credit_code_function

bb27bb6

Switch from eval() to exec() in new_refundable_credit_code_function.

b61eab1

Add np.equal function to visible.

05fc3f0

Merge branch 'master' into pr-1103-alt

96c218f

Add Policy cpi method and use in functions.py file.

98544f9

talumbau added the in progress label Dec 15, 2016

Revise Policy cpi method to use assert rather than raise.

ec49b73

MattHJensen self-requested a review December 16, 2016 18:49

martinholmer added 15 commits December 23, 2016 20:26

Add &&<param-code>&& to JSON reform file syntax.

3b7a6d5

Revise ALD_Investment_ec_base_code to be statement not expression.

6131848

Merge in recent master changes.

bc9b7d2

Remove duplicate entry in current_law_policy.json file.

47e6ea4

Rename parameters related to new refundable CTC code.

0b97087

Change && to || in JSON reform files.

054f717

Make re.sub pattern matching be non-greedy.

bf44afb

Rename/reorganize investment income exclusion base logic.

6502d8d

Rename ALD_invinc as ALD_InvInc.

e0949f2

Move logic from IITAX to CTC_new_nocode function.

bdd7ce5

Add comments to test_calculate.py param_code section.

4c45322

Update taxcalc/reforms documentation.

d7e221b

Eliminate trailing whitespace in test_calculate.py file.

dffbc6c

Streamline test_reforms.py logic.

b44d551

Merge branch 'master' into pr-1103-alt

19aeb39

martinholmer added 2 commits January 1, 2017 16:43

Merge branch 'master' into pr-1103-alt

4274dfe

Merge branch 'master' into pr-1103-alt

ae2d0d1

martinholmer added 2 commits January 2, 2017 14:06

Merge branch 'master' into pr-1103-alt

8cf50c0

Update test_dropq.py for ALD_InvInc_ renames.

d1d27a6

martinholmer changed the title ~~[WIP] Follow-on to #429, #1081 and #1103~~ Change param_code to have multiple-statement (rather than one-expression) format Jan 3, 2017

martinholmer removed the in progress label Jan 3, 2017

martinholmer changed the title ~~Change param_code to have multiple-statement (rather than one-expression) format~~ Change param_code to have multiple-statement (rather than single-expression) format Jan 4, 2017

martinholmer added 2 commits January 9, 2017 18:34

Update var name in taxcalc/taxbrain files.

3fc3609

Merge branch 'master' into pr-1103-alt

fa71bf3

talumbau reviewed Jan 13, 2017

martinholmer added 2 commits January 13, 2017 21:24

Expand error checking in Policy.cpi_for_param_code(); add tests.

6a3aec0

Add test and remove Policy code that can never be reached.

d0b1b48

martinholmer merged commit 1cf5e0d into PSLmodels:master Jan 14, 2017

martinholmer deleted the pr-1103-alt branch January 14, 2017 19:25

martinholmer mentioned this pull request Mar 7, 2017

Remove all logic and tests related to param_code capability #1221

Merged
