Add ability to simulate many ALD_Investment_ec_base definitions #1081
Current coverage is 98.79% (diff: 100%)
@@             master    #1081    diff @@
==========================================
  Files          38       38
  Lines        2795     2830     +35
  Methods         0        0
  Messages        0        0
  Branches        0        0
==========================================
+ Hits         2761     2796     +35
  Misses         34       34
  Partials        0        0
|
Here is another example of a Python expression as a policy parameter using the code contained in pull request #1081. This useless example demonstrates how to use an expression to duplicate the default calculation of the investment income exclusion base. This example is interesting in that it shows how numpy array functions (maximum, minimum, and where) can be used in expressions.
So, we get exactly the same results using either the (faster) built-in method (ref-new1.json) or the (slower) expression method (ref-new2.json) of calculating the exclusion base. @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |
Does this suggest an approach to user-written reforms such as requested
in
#429
I realize a single expression isn't perfectly general, and there are
security implications, but it would be a valuable first step.
If necessary, we could require human approval for the code.
dan
…On Wed, 23 Nov 2016, Martin Holmer wrote:
Here is another example of a Python expression as a policy parameter using the code contained in pull
request #1081. This useless example demonstrates how to use an expression to duplicate the default
calculation of the investment income exclusion base. This example is interesting in that it shows how
numpy array functions (maximum, minimum, and where) can be used in expressions.
$ date
Wed Nov 23 18:03:11 EST 2016
$ cd ~/work/OSPC/tax-calculator
$ git checkout invinc-ec-base-code
$ cp ../tax-calculator-data/puf.csv .
$ cat ref-new1.json
{
  "_ALD_Investment_ec_rt": {"2015": [0.50]}
}
$ cat ref-new2.json
{
  "param_code": {"ALD_Investment_ec_base_code":
    "e00300 + e00600 + max(-3000./_sep, p22250+p23250) + e01100 + e01200"},
  "_ALD_Investment_ec_base_code_active": {"2015": [true]},
  "_ALD_Investment_ec_rt": {"2015": [0.50]}
}
$ python inctax.py puf.csv 2015 --reform ref-new1.json
$ python inctax.py puf.csv 2015 --reform ref-new2.json
$ ls -l puf-15*
-rw-r--r-- 1 mrh staff 37748055 Nov 23 18:05 puf-15.out-inctax-ref-new1
-rw-r--r-- 1 mrh staff 37748055 Nov 23 18:06 puf-15.out-inctax-ref-new2
$ diff puf-15.out-inctax-ref-new1 puf-15.out-inctax-ref-new2
$
So, we get exactly the same results using either the (faster) built-in method (ref-new1.json) or the
(slower) expression method (ref-new2.json) of calculating the exclusion base.
@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen
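The role of the numpy array functions in the expression above can be illustrated with a small sketch. The array values below are made up for illustration and are not taken from puf.csv; the only names borrowed from the thread are the variable names and the expression string. Mapping max to np.maximum makes the comparison element-by-element across filing units, which Python's built-in max would not do for arrays.

```python
import numpy as np

# Made-up stand-ins for Tax-Calculator record arrays (three filing units).
records = {
    'e00300': np.array([100.0, 0.0, 50.0]),
    'e00600': np.array([200.0, 10.0, 0.0]),
    'e01100': np.array([0.0, 0.0, 5.0]),
    'e01200': np.array([0.0, 20.0, 0.0]),
    'p22250': np.array([-5000.0, 300.0, -1000.0]),
    'p23250': np.array([1000.0, 200.0, -4000.0]),
    '_sep': np.array([1.0, 1.0, 2.0]),
}

# Expression string copied from ref-new2.json in the comment above.
expression = ("e00300 + e00600 + max(-3000./_sep, p22250+p23250)"
              " + e01100 + e01200")

# 'max', 'min', and 'where' resolve to numpy's elementwise functions.
names = {'max': np.maximum, 'min': np.minimum, 'where': np.where}
names.update(records)
base = eval(expression, {'__builtins__': None}, names)

# The same calculation written directly with numpy, for comparison.
direct = (records['e00300'] + records['e00600'] +
          np.maximum(-3000. / records['_sep'],
                     records['p22250'] + records['p23250']) +
          records['e01100'] + records['e01200'])
```

In this toy data the capital-loss limit of -3000./_sep binds for the first and third units but not the second, so the expression and the direct calculation agree row by row.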
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the
thread.[AHvQVUKjAl3Km-Ace4M11XWhZm6fXDBUks5rBMysgaJpZM4K6-6h.gif]
|
@feenberg commented (after receiving the original comment in the #1081 conversation):
I'm glad you are generally positive about the approach taken in pull request #1081. Let me address your security concerns, which were expressed quite clearly in the discussion of issue #429. The simple answer is that this approach is very secure and cannot have any undesirable side effects. Let me explain the approach in more detail so that you can see why I answered your question the way I did. First, the param_code is not one or more statements (like in your example above), but rather a single expression. An expression in Python is anything that can be on the right-hand side of an assignment statement. Second, the Tax-Calculator variables that can be used in the expression can be limited to variables that are appropriate to the expression. And third, the functions that can be used in the expression can be limited to functions that are appropriate to the expression. So, in pull request #1081, the key code is as follows:
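So, in this case only the numpy minimum, maximum, and where (if-like) functions are available in the expression and only variables that are components of investment income are available to the expression writer. The Python compile function will generate an error if there are statements (rather than just one right-hand-side expression), and the Python eval function will generate an error if any function or variable that is not in the visible dictionary is used. So, things seem pretty secure to me. Can you see any security holes in this approach? @MattHJensen @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |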
I was thinking of:
https://xkcd.com/327/
but I see the "eval" function is the key. That seems very effective.
Thanks
dan
…On Tue, 29 Nov 2016, Martin Holmer wrote:
@feenberg commented (after receiving the original comment in the #1081 conversation):
That is great, important progress.
We have to worry about security though. Is there some way to know that the
submitted expression can't have any side-effects? Is python susceptible to
the old
x = y;<bad code>;
trick, where the bad code is hidden behind a semicolon which terminates
the expression? I did figure out that Python compilers don't have a flag
to emit only "pure code" with no side effects, so I think it is up to us.
I'm glad you are generally positive about the approach taken in pull request #1081.
Let me address your security concerns, which were expressed quite clearly in the
discussion of issue #429.
The simple answer is that this approach is very secure and cannot have any undesirable
side effects.
Let me explain the approach in more detail so that you can see why I answered your
question the way I did. First, the param_code is not one or more statements (like in
your example above), but rather a single expression. An expression in Python is
anything that can be on the right-hand side of an assignment statement. Second, the
Tax-Calculator variables that can be used in the expression can be limited to
variables that are appropriate to the expression. And third, the functions that can be
used in the expression can be limited to functions that are appropriate to the
expression. So, in pull request #1081, the key code is as follows:
def ALD_Investment_ec_base_code_function(calc):
    """
    Compute investment_ec_base from code
    """
    code = calc.policy.param_code['ALD_Investment_ec_base_code']
    visible = {'min': np.minimum, 'max': np.maximum, 'where': np.where}
    vars = ['e00300', 'e00600', 'e00650', 'e01100', 'e01200',
            'p22250', 'p23250', '_sep']
    for var in vars:
        visible[var] = getattr(calc.records, var)
    # pylint: disable=eval-used
    calc.records.investment_ec_base = eval(compile(code, '<str>', 'eval'),
                                           {'__builtins__': None}, visible)
So, in this case only the numpy minimum, maximum, and where (if-like) functions are
available in the expression and only variables that are components of investment
income are available to the expression writer. The Python compile function will
generate an error if there are statements (rather than just one right-hand-side
expression), and the Python eval function will generate an error if any function or
variable that is not in the visible dictionary is used. So, things seem pretty secure
to me. Can you see any security holes in this approach?
@MattHJensen @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen
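The two error paths described above (compile rejecting statements, eval rejecting names outside the visible dictionary) can be tried in isolation. This is a minimal sketch, not the code in the pull request: evaluate_expression and error_name are hypothetical helpers, and the two arrays are made-up data. The exact exception raised for an unknown name is not asserted here because it can differ between Python versions.

```python
import numpy as np


def evaluate_expression(code, variables):
    """Evaluate a single expression with only whitelisted names visible."""
    visible = {'min': np.minimum, 'max': np.maximum, 'where': np.where}
    visible.update(variables)
    # mode='eval' accepts one expression; statements raise SyntaxError.
    compiled = compile(code, '<str>', 'eval')
    return eval(compiled, {'__builtins__': None}, visible)


def error_name(code):
    """Return the exception class name raised by code, or None if it runs."""
    try:
        evaluate_expression(code, variables)
    except Exception as err:  # broad on purpose: we only report the type
        return type(err).__name__
    return None


# Made-up data standing in for two record variables.
variables = {'e00300': np.array([1.0, 2.0]),
             'e00600': np.array([3.0, -4.0])}

ok = evaluate_expression('where(e00600 > 0, e00300 + e00600, e00300)',
                         variables)
statement_error = error_name('x = 1; y = 2')     # statements, not expression
unknown_error = error_name('e00300 + open')      # 'open' is not visible
```

Here ok holds the elementwise result, statement_error is 'SyntaxError', and unknown_error is the name of whatever exception the interpreter raises when a name is neither visible nor a builtin.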
|
@feenberg said:
Agreed. Python has an @MattHJensen @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |
@martinholmer I like where you're going with this. I don't think it's safe for direct use by web users by any means (not sure if that's what you're asking), but it's totally safe for local use. It's making me think about ways we could package general changes to tax calculation logic that would support exposing control in a safe manner online, making use of a user review component as @feenberg smartly suggested. I'm going to mull it over and will get back with more commentary later today. |
@zrisher said:
Why do you say "I don't think it's [#1081] safe for direct use by web users by any means ..."? |
I would worry a bit about the information in
https://2013.picoctf.com//problems/pyeval/stage1.html
https://2013.picoctf.com/problems/pyeval/stage3.html
http://www.floyd.ch/?p=584
As far as I can tell, it is possible to screen dangerous calls (they
appear to be signalled by a double underscore) but some special care is
required. Not being much of a Python programmer, I should probably not be
central to this conversation.
dan
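The double-underscore concern can be made concrete with a short sketch. Removing the builtins does not stop attribute access on a literal, so names beginning and ending with double underscores remain reachable from inside an expression. The reject_double_underscore helper below is hypothetical and only illustrates the kind of screening mentioned above; it is not claimed to be the check used in pull request #1081, and it is not claimed to be sufficient on its own.

```python
# Even with builtins removed, attribute access on a literal still works.
leaked = eval("().__class__.__name__", {'__builtins__': None}, {})


def reject_double_underscore(code):
    """Hypothetical pre-scan: refuse any expression containing '__'."""
    if '__' in code:
        raise ValueError("expression contains '__'")
    return code


# Ordinary Tax-Calculator variable names (including _sep) pass the scan.
accepted = reject_double_underscore("e00300 + e00600 + max(0., p22250/_sep)")
```

The value of leaked shows that the expression reached the tuple class without any builtin function, which is the first step in the escapes described in the linked write-ups.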
…On Thu, 1 Dec 2016, Martin Holmer wrote:
@zrisher said:
I like where you're going with this. I don't think it's safe for direct
use by web users by any means (not sure if that's what you're asking), but
it's totally safe for local use. It's making me think about ways we could
package general changes to tax calculation logic that would support
exposing control in a safe manner online, making use of a user review
component as @feenberg smartly suggested. I'm going to mull it over and
will get back with more commentary later today.
Why do you say "I don't think it's [#1081] safe for direct use by web users by any
means ..."?
@MattHJensen @feenberg
|
@martinholmer said:
The links I listed when discussing sandboxing in #429 lay this out, but it's a bit of reading and I should have summarized:
HOWEVER, if you structure your code submission process in such a way to make visual review of top additions by project maintainers feasible, web-based code modification could become a reality for this project. I would still recommend running the calculations within a PyPy sandbox, or else your entire system is only as safe as the user accounts that can approve submissions. And I'm not sure about the data exposure issues, but I'm hoping |
On Thu, 1 Dec 2016, zrisher wrote:
@martinholmer said:
Why do you say "I don't think it's [#1081] safe for direct use by web
users by any means ..."?
The links I listed when discussing sandboxing in #429 lay this out, but it's a bit of
reading and I should have summarized:
* A user could cause any exception they want - Ok so we catch exceptions and don't
return them explicitly via the web
* A user could lock up the CPU or overflow the stack - Ok so we put some controls
around its execution. But you quickly find holes, and you keep further removing
the control from execution until you arrive at the managed-thread architecture
provided by PyPy's sandbox.
Don't the simulations run on a separate computer? If that computer
crashes, does it affect any other visitor? Do we have a way to kill a job
in an infinite loop before too much money has been spent?
* A user could control things we don't want them to - This is inherently difficult
to control because Python provides such powerful metaprogramming abilities
regardless of your execution level. This is exemplified by the links @feenberg
provided, explained by the top answer to this SO post, and proven out by the death
of the language-level sandbox pysandbox. PyPy attempts to solve this by replacing
even the lowest level references with its safer constructs.
* Also, in our particular case, a user could modify tax logic to try to access the
underlying taxpayer data. I don't know how dropq works, so I'm not sure if it
relies on a bit of indirection existing in the dependent system or not.
You probably could do something to reveal the underlying data, but it
would take multiple runs and would stand out to anyone looking at the
code. If this is the only consideration we should still allow users of CPS
based dataset to submit without prior approval.
dan
…
HOWEVER, if you structure your code submission process in such a way to make visual
review of top additions by project maintainers feasible, web-based code modification
could become a reality for this project. I would still recommend running the
calculations within a PyPy sandbox, or else your entire system is only as safe as the
user accounts that can approve submissions. And I'm not sure about the data exposure
issues, but I'm hoping dropq already solves them.
|
@feenberg said:
I'm pretty sure there are separate servers for webapp-public vs the taxcalc workers they queue up, but if an anonymous user has the ability to run unsandboxed code without review, they can bring down the entire tax calculation side by simply locking up all the workers with multiple long-running requests. Even with auto-scaling, it's an easy way to deny that service. I'm sure the worker controller has some safeguards built in here that @talumbau could speak to, but I'd bet that no matter how it's architected, a malicious user could easily deny the tax calculation service in that situation. But in terms of service ability and data security, the greater danger is actually this:
With the ability to run arbitrary code, a user might be able to gain access to the underlying OS, and possibly even the entire AWS account. Simply put, without at least the PyPy sandbox or code review, I would not recommend allowing the execution of user-supplied code if you care about the availability or security of your AWS assets. And in the long term I highly suggest both. @talumbau is probably best suited to speak to the current architecture and its security, though. |
Commit e545fa9 in pull request #1081 adds some security protection by scanning the parameter code for dangerous elements. The first version of the new Are there other dangerous character strings that should be prohibited? How far does this go in making secure our particular use of @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |
Both of these lock up my process for a long time and could be amended to overflow memory instead:
The approach you're taking is language-level sandboxing, the same path that the creator of I am curious if our incredibly simplified use case for tax calculation logic with a few variables and basic operations might be an exception to that rule, but I am doubtful. Even if we massage it to the point that our team can't prove it's not safe, we don't have the security background of |
It would be amazing if we had a working version of that though, and I feel like I'm just repeating myself, so I'm going to lay off this issue a bit and see if other people can make more useful contributions. |
On Thu, 1 Dec 2016, zrisher wrote:
@martinholmer
Both of these lock up my process for a long time and could be amended to overflow memory instead:
* 9999**99999999 - Basic mathematical operations with large numbers
* ''.join(str(x / (y + 1)) for x in xrange(999999) for y in xrange(999999)) - List comprehension
without the use of brackets
We need to distinguish between abuse and real problems. Anyone can abuse
TB by making many simultaneous requests. That hasn't been a problem as far
as I can tell. More but less obvious paths don't really change that
situation.
Again, if we whitelist functions, there would be no need to whitelist any
string functions. As for large numbers, I am shocked that Python can't
deal with an exponential overflow. Fortran could handle that in 1966.
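One way to picture the whitelisting idea is to inspect the parsed expression before evaluating it. The sketch below uses the standard ast module; the function name check_expression and the particular sets of allowed functions, variables, and syntax nodes are assumptions made for illustration, not something taken from pull request #1081. With only arithmetic, comparisons, and three named functions allowed, both of the CPU-heavy examples quoted above are refused before any evaluation happens. It is a sketch of the idea, not a proof that the approach is safe.

```python
import ast

# Assumed whitelists, for illustration only.
ALLOWED_FUNCTIONS = {'min', 'max', 'where'}
ALLOWED_VARIABLES = {'e00300', 'e00600', 'e00650', 'e01100', 'e01200',
                     'p22250', 'p23250', '_sep'}
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.Call,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd,
    ast.Lt, ast.LtE, ast.Gt, ast.GtE, ast.Eq, ast.NotEq,
)


def check_expression(code):
    """Raise ValueError unless code uses only whitelisted syntax and names."""
    tree = ast.parse(code, mode='eval')  # SyntaxError if not one expression
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError('disallowed syntax: ' + type(node).__name__)
        if isinstance(node, ast.Constant):
            if not isinstance(node.value, (int, float)):
                raise ValueError('only numeric constants are allowed')
        if isinstance(node, ast.Call):
            func = node.func
            if not isinstance(func, ast.Name) or \
               func.id not in ALLOWED_FUNCTIONS:
                raise ValueError('disallowed function call')
        if isinstance(node, ast.Name):
            if node.id not in ALLOWED_FUNCTIONS | ALLOWED_VARIABLES:
                raise ValueError('disallowed name: ' + node.id)
    return True


def is_rejected(code):
    """Return True if check_expression refuses the code."""
    try:
        check_expression(code)
    except ValueError:
        return True
    return False
```

Because the power operator, attribute access, string constants, and generator expressions are all absent from the whitelist, no string functions need to be considered at all, which is the point made in the reply above.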
The approach you're taking is language-level sandboxing, the same path that the creator of pysandbox
tried for years and declared fundamentally flawed. The current consensus seems to be that only
OS-level sandboxes are effective for Python.
Doesn't the failure of language-level sandboxing have a lot to do with
wanting to include more facilities than we would need, and situations where
breaking the sandbox is more rewarding?
I am curious if our incredibly simplified use case for tax calculation logic with a few variables and
basic operations might be an exception to that rule, but I am doubtful. Even if we massage it to the
point that our team can't prove it's not safe, we don't have the security background of pysandbox's
maintainers, nor its hundreds of experienced black-hat engineers openly testing it for years.
What is the downside? I suppose there is a small possibility that TB would
be down for a few hours, or an entire weekend. That doesn't bother me if
the possibility is small, and we could backtrack if it happened. Again, if
we restrict the ability to registered users, there is no real problem at
all.
dan
|
@zrisher and @feenberg, Thank you both for the citations in the security literature and for your thoughtful comments on the pros/cons of using in pull request #1081 the following type of code:
There is a lot in the citations and in what you say, so I'm going to make a number of #1081 comments to discuss the issues raised by your comments. Thanks again for all the help. @MattHJensen @Amy-Xu @GoFroggyRun @andersonfrailey @codykallen |
@zrisher said (among a number of insightful things):
Since making that statement, #1081 has added a
The combination of code scanning and the no builtins means that all the provided examples of malicious code that gained access to the operating system running the Python interpreter have been avoided. Also, the kinds of CPU-wasting expressions that @zrisher pointed out have been avoided (see more detail on my experiments at the end of this comment). Despite these efforts, it seems likely (as @zrisher imagines) that a skilled hacker could still figure out some kind of security vulnerability with this arrangement. So, the question about whether or not to deploy the enhancements in #1081 in TaxBrain is (as @feenberg points out) really a benefit-cost analysis. There is no incentive to engage in this sort of hacking when running Tax-Calculator on your local computer because the hacker would pay the price. As @zrisher points out, the issue is whether or not to deploy #1081 on TaxBrain. Let me point out that it would be simple to delay implementing #1081 on TaxBrain by simply having TaxBrain reject the upload of any JSON reform files that contains the phrase What would the situation be like if that were not done and TaxBrain did accept for execution JSON reform files that contain @MattHJensen @Amy-Xu @GoFroggyRun @andersonfrailey @codykallen POSTSCRIPT ON TRYING MALICIOUS CODE
Here is the content of the
|
In the course of trying to stress Tax-Calculator using the parameter-code capability in pull request #1081, I discovered that the current version of TaxBrain is not immune to abuse. Obviously, you can't enter any numerical values for policy parameters that allow access to the underlying operating system and I haven't found any numerical values that cause TaxBrain to crash, but I have found that you can enter numerical values that produce absurd results. So, for example, a common way of characterizing a reform that subjects all earnings to the OASDI payroll tax is to enter for the maximum taxable earnings parameter a value of So, as we consider the benefits and costs of merging pull request #1081, let's not forget that the current version of TaxBrain is already open to minor abuse by devious users. @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |
On Sat, 3 Dec 2016, Martin Holmer wrote:
In the course of trying to stress Tax-Calculator using the parameter-code capability in
pull request #1081, I discovered that the current version of TaxBrain is not immune to
abuse. Obviously, you can't enter any numerical values for policy parameters that
allow access to the underlying operating system and I haven't found any numerical
values that cause TaxBrain to crash, but I have found that you can enter numerical
values that produce absurd results.
So, for example, a common way of characterizing a reform that subjects all earnings to
the OASDI payroll tax is to enter for the maximum taxable earnings parameter a value
of 9.9e99, which generates a 2020 increase in payroll tax revenue of 183.1 billion
dollars (complete results on this page). But if I enter for this same policy parameter
a value of 9.9e99**9.9e99, TaxBrain generates a 2020 increase in payroll tax revenue
That looks like an expression, not a numeric literal. What value does TB
store? If TB treats the input as an expression it suggests a user could
include variables in the string. This seems like a (possibly dangerous)
feature. Actually, if we could handle the security issues it seems like a
potentially great feature.
When I enter 9.9e99**9.9e99 into plain python, it returns an overflow
error (Numeric result out of range). Shouldn't it return an error to the
user? It seems as if we would have to go out of our way for TB to even
continue processing.
dan
|
The decrease in payroll taxes resulting from setting the maximum taxable earnings for Social Security to 9.9e99**9.9e99 is due to Python's method of handling NaN entries in data. This value results in an error for the calculation of payroll taxes in the reform. When the change in revenue is calculated, Python treats the NaN value as zero by default. The payroll tax decrease of $956.7 billion in 2020 is simply the total revenue from Social Security taxes under current law. The same revenue change can be achieved by setting the maximum taxable earnings to zero. |
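The behaviours discussed here can be separated with a small sketch. Plain Python raises an OverflowError for this power, numpy returns inf instead, and whether a NaN turns into zero in a total depends on which aggregation function is used. How TaxBrain actually aggregates its results is not shown in this thread, so the NaN-skipping sum below is only an assumption about one way NaNs could end up counted as zero; the tax amounts are made up.

```python
import numpy as np

big = 9.9e99

# Plain Python floats: the power raises instead of returning a value.
try:
    big ** big
    python_result = 'no error'
except OverflowError:
    python_result = 'OverflowError'

# numpy scalars: the same power overflows to inf (warning silenced here).
with np.errstate(over='ignore'):
    numpy_result = np.float64(big) ** np.float64(big)

# Made-up tax amounts with one NaN among them.
taxes = np.array([100.0, np.nan, 250.0])
plain_total = np.sum(taxes)        # NaN propagates into the total
skipping_total = np.nansum(taxes)  # NaN is skipped, as if it were zero
```

If the aggregation step uses a NaN-skipping sum, an all-NaN reform result would total zero and the reported change would equal the negative of current-law revenue, which matches the pattern described above.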
I just went to TB and entered
.5*min(100000,e00100)
into the box for the taxable maximum wages. The result seems plausible, so
I have to think that TB accepts expressions in the boxes that are
documented to require numeric literals.
This is a fantastic feature, which will allow many wonderful reforms to be
tested directly but it does mean that we have to vet the inputs for
possibly abusive content. I believe Martin has the code to do that.
The practice of converting NaNs to zero seems unjustified to me. My local
Python interpreter does not do that - can we turn it off and what would we
lose if we did so? I would prefer TB to return a server error, preferably
with some diagnostic information.
dan
…On Sat, 3 Dec 2016, codykallen wrote:
The decrease in payroll taxes resulting from setting the maximum taxable earnings for
Social Security to 9.9e99**9.9e99 is due to Python's method of handling NaN entries in
data. This value results in an error for the calculation of payroll taxes in the
reform. When the change in revenue is calculated, Python treats the NaN value as zero
by default. The payroll tax decrease of $956.7 billion in 2020 is simply the total
revenue from Social Security taxes under current law. The same revenue change can be
achieved by setting the maximum taxable earnings to zero.
|
@codykallen said:
Thanks for this informative comment. Then @feenberg said:
If expressions are being evaluated when entered into TaxBrain numerical parameter boxes, then it would seem as if the current version of TaxBrain (without pull request #1081) may have some security risks. |
@feenberg said:
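When I did what you describe above on TaxBrain it did run to completion and showed me results. When you say that the results "seems plausible" I wasn't sure what your expectations were. The results show a decline in 2020 payroll tax revenue of $956.7 billion from a pre-reform amount of $1,045.3 billion. This seems the same as @codykallen described above with the maximum taxable earnings parameter being interpreted as zero. This could be for one of (at least) two reasons. First, the expression may have been evaluated by TaxBrain as zero because when the _SS_Earnings_c parameter is used in the EI_PayrollTax function that e00100 is not visible (and perhaps, therefore, interpreted as zero). Or second, it could be that the expression that you entered is simply not comprehended as a numerical value, but TaxBrain goes ahead and does all the computations anyway. In an attempt to see which of these two things is going on in TaxBrain, I tried an expression that includes a variable that is visible in the EI_PayrollTax function. When I enter 1.0*min(118500, e00200) for the value of the maximum taxable earnings, I expect to see little or no change in payroll tax revenue. The TaxBrain results that I get look just like what you got with your expression with 2020 payroll tax revenue declining by about 91 percent. This suggests to me that there is no evaluation of the expression by TaxBrain, it is just that TaxBrain is totally confused and assigns the _SS_Earnings_c parameter a value of zero or (as @codykallen suggests) NaN. So, rather than viewing your results as an indication that the current version of TaxBrain has hidden in it "a fantastic feature", I view these results as indicating that the current version of TaxBrain has a serious bug. My suspicion is that this behavior is rooted in the philosophy of the project to try to avoid any validation testing of user-supplied input, but that is just a guess based on no systematic investigation. By the way, if you enter these kinds of expressions for the maximum taxable earnings parameter using the JSON reform file approach to using TaxBrain, error messages are generated and no tax calculations are done. @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen |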
On Sun, 4 Dec 2016, Martin Holmer wrote:
@feenberg said:
I just went to TB and entered 0.5*min(100000,e00100) into the box for the
taxable maximum wages [that is, the OASDI maximum taxable earnings]. The
result seems plausible, so I have to think that TB accepts expressions in
the boxes that are documented to require numeric literals.
When I did what you describe above on TaxBrain it did run to completion and showed me
results. When you say that the results "seems plausible" I wasn't sure what your
expectations were. The results show a decline in 2020 payroll tax revenue of $956.7
billion from a pre-reform amount of $1,045.3 billion. This seems the same as
@codykallen described above with the maximum taxable earnings parameter being
interpreted as zero. This could be for one of (at least) two reasons. First, the
expression may have been evaluated by TaxBrain as zero because when the _SS_Earnings_c
parameter is used in the EI_PayrollTax function that e00100 is not visible (and
perhaps, therefore, interpreted as zero). Or second, it could be that the expression
that you entered is simply not comprehended as a numerical value, but TaxBrain goes
ahead and does all the computations anyway.
In an attempt to see which of these two things is going on in TaxBrain, I tried an
expression that includes a variable that is visible in the EI_PayrollTax function.
When I enter 1.0*min(118500, e00200) for the value of the maximum taxable earnings, I
expect to see little or no change in payroll tax revenue. The TaxBrain results that I
get look just like what you got with your expression with 2020 payroll tax revenue
declining by about 91 percent (see the complete results here). This suggests to me that
there is no evaluation of the expression by TaxBrain, it is just that TaxBrain is
totally confused and assigns the _SS_Earnings_c parameter a value of zero or (as
@codykallen suggests) NaN.
So, rather than viewing your results as an indication that the current version of
TaxBrain has hidden in it "a fantastic feature", I view these results as indicating
that the current version of TaxBrain has a serious bug. My suspicion is that this
behavior is rooted in the philosophy of the project to try to avoid any validation
testing of user-supplied input, but that is just a guess based on no systematic
investigation.
OK, but we have to do one of two things:
1) implement the fantastic feature
2) stop treating errors as zeroes
The current situation is a serious bug.
dan
…
By the way, if you enter these kinds of expressions for the maximum taxable earnings
parameter using the JSON reform file approach to using TaxBrain, error messages are
generated and no tax calculations are done.
@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen
|
I have been out-of-pocket for some time, but I saw some of the messages here so I decided to get caught up on this thread. Overall, I have two main pieces of feedback:
For item 1, the salient points from @zrisher are as follows:
The issues with
The first is more likely than the second, which is mostly just annoying and would create a higher maintenance burden on TaxBrain. The second is obviously very bad, but does not seem very likely. Still the consequences are bad enough that we should try very hard to avoid it. For sandboxing, PyPy is not a viable alternative due to the fact that Tax-Calculator uses The suggestion I put forward about this issue (executing user-submitted code on TaxBrain) at our meeting in April is to use Docker as the container mechanism. The Dockerized system would be configured in a way similar to what is described in the StackExchange post linked above. The idea is that any malicious code would only be able to be malicious on an isolated system that could not impact the actual worker node. The effort level here is not small. For item 2, it's clear that the TaxBrain input page needs additional logic for better custom input parsing. This is a consequence of the decision to make these input fields a custom field, which means we are responsible for parsing what the user types. For example, initially we allowed just this kind of input:
but then we modified this to include things like:
If we only allowed integer literals or float literals or something like that, we could just hand the input off to the I'm back from my break on Tuesday, so I will be able to engage on the TaxBrain input errors at that time. |
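For the simplest case mentioned above (a box that should hold one numeric literal), a strict parser could report an error instead of letting an unparsed value flow into the calculation. This is a sketch under that assumption only: parse_numeric_parameter is a hypothetical helper, it does not reproduce the richer input forms TaxBrain accepts, and it is not the TaxBrain implementation.

```python
import math


def parse_numeric_parameter(text):
    """Return text as a finite float, or raise ValueError with a reason."""
    try:
        value = float(text.strip())
    except ValueError:
        raise ValueError('not a numeric literal: %r' % text)
    # float() accepts 'nan' and 'inf', and overflows to inf for huge inputs.
    if not math.isfinite(value):
        raise ValueError('not a finite number: %r' % text)
    return value


def rejection_reason(text):
    """Return the error message for bad input, or None if text parses."""
    try:
        parse_numeric_parameter(text)
    except ValueError as err:
        return str(err)
    return None
```

With this check, 9.9e99 is accepted as a number, while the expressions tried in this thread, such as 9.9e99**9.9e99 and 0.5*min(100000,e00100), produce an error message that could be shown to the user rather than being treated as zero.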
I'm pretty sure there are separate servers for webapp-public vs the taxcalc workers they
queue up, but if an anonymous user has the ability to run unsandboxed code without review,
they can bring down the entire tax calculation side by simply locking up all the workers
with multiple long-running requests. Even with auto-scaling, it's an easy way to deny that
service. I'm sure the worker controller has some safeguards built in here that @talumbau
could speak to, but I'd bet that no matter how it's architected, a malicious user could
easily deny the tax calculation service in that situation.
Doesn't any user have the ability to DOS the system by simply opening
multiple windows and selecting calculate? How does this DOS protection
have anything to do with user-submitted code?
If we have a limit of 20 workers, and someone submits 100 requests, won't
the system appear to be essentially dead for an hour? I actually have no
idea how many workers we have, or what protection we have for our AWS
bill.
dan feenberg
|
A more comprehensive report on TaxBrain problems when entering an expression (rather than a number) into a traditional parameter box is provided in TaxBrain issue 427. @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @talumbau |
I would like to merge pull request #1081 at the end of the work day on Wednesday, December 7th. We seem to have a consensus (in the discussion so far) that the parameter-code enhancements in #1081 are OK for use with Tax-Calculator on local computers. There does seem to be some difference of opinion about the consequences of allowing the parameter-code enhancements to run on TaxBrain. But that is no problem for #1081 because it would be very easy for TaxBrain to refuse to execute any uploaded JSON reform file that contains the phrase So, if you have any concerns about merging #1081 into the master branch, now is the time to share those concerns with the development team. @MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @talumbau @codykallen |
I'm 👍 on merging this for use by local taxcalc users. The discussion on TB incorrectly parsing param input has been moved to ospc-org/ospc.org#427. I suggest we continue the discussion of improving the ability of users to programmatically customize tax calculation logic in #429. This PR is our first step in that direction. |
After merging recent master branch changes into the pull-request-1081 development branch, we have the following results showing that the new
|
+1 from me. Thanks for adding the switch @martinholmer |
Recently an alternative definition of the investment income exclusion base was requested by a model user, and the boolean _ALD_Investment_ec_base_all parameter was added to handle this alternative. But this is not a very good solution. What happens when a third (or fourth or fifth) definition is requested? In that case, we would have to replace the boolean parameter _ALD_Investment_ec_base_all with _ALD_Investment_ec_base_type that has numerical values corresponding to each definition. The code would have a long if-elif statement and it would have to be changed each time a new definition was requested. Plus, users would have to remember what definition each _ALD_Investment_ec_base_type value referred to. Not a pretty picture.
This pull request handles this problem in a different way, using one possible implementation of the ideas discussed in issue #429. The basic idea is to create a policy parameter that is a Python expression rather than a number. In this pull request, the boolean _ALD_Investment_ec_base_all parameter is replaced by two parameters: (a) the ALD_Investment_ec_base_code parameter that is a string containing the Python expression that defines alternative definitions of the base, and (b) the boolean _ALD_Investment_ec_base_code_active parameter that specifies which years the expression is used to compute the base. In years the code expression is inactive, the base is computed the way it is now when _ALD_Investment_ec_base_all is true.
This pull request can be viewed as a trial-run of the general ideas discussed in #429. I'm sure there are other ways to implement those ideas, so this pull request should restart the #429 discussion.
Below I show how Tax-Calculator works on the branch contained in this pull request.
Comments are welcome after Thanksgiving.
@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @andersonfrailey @zrisher @codykallen
So, when we use the code parameter, we get exactly the same results as when using the old boolean parameter.