Conversation
I want to make it so that most fuzzer queries can be run against a data source, as opposed to failing at compile time. I think some large changes must happen to that end:

1. Generation of some structures must go bottom-up. In particular, the subqueries for a given query need to be figured out before that query can be generated.
2. Generation of some other structures must go top-down. For example, some expressions are forbidden in subqueries, so we would like to know whether we're in the context of a top-level query or a subquery when generating those.
3. Currently the fuzzer generates either incredibly complex queries with many subqueries and very complex expressions, or very simple queries. I want to take direct control of the complexity of what's being generated, such that the amount of complexity "allowed" is "invested" in various ways: either building more complex expressions or a more complex overall query structure. This should make the fuzzer generate queries of medium complexity, making it more likely that such a query will be both useful and able to run.

Achieving all of this with stream_data is somewhat tricky. The size parameter, which is supposed to guide generator complexity, is difficult to manage properly. It's also somewhat cumbersome to generate data when both a top-down and a bottom-up approach are needed. Because of this I'm trying a hand-crafted approach.
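As a rough illustration of the "complexity budget" idea, a generator can spend one unit of complexity on the current node and invest the remainder in its children, so the size of the generated tree is directly bounded by the initial budget. This is only a sketch of the concept, not the actual fuzzer code; `constant/0` stands in for a leaf generator and all names are made up:

```elixir
# Hypothetical sketch of budget-driven generation.
defp generate_expression(complexity) when complexity <= 1, do: constant()

defp generate_expression(complexity) do
  # Spend one unit on this node, split the rest between the two operands.
  left = div(complexity - 1, 2)
  right = complexity - 1 - left
  {:+, nil, [generate_expression(left), generate_expression(right)]}
end
```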
Standard tests have passed 🎉
|> Function.type_specs()
|> Enum.map(fn {argument_types, return_type} ->
  {function, argument_types, return_type}

defmacrop generate(complexity, {:%{}, line, options}) do
`line` doesn't seem to be used. You could also use a list of tuples instead of a map, and then you wouldn't need to depend on the quoted format (because a quoted list of pairs is a list of pairs).
Fixed later. I didn't like the extra parens required in case of a list of tuples, so I settled on this format.
Perhaps a more important question is why this needs to be a macro? Macros have all kinds of weird semantics, and I can't immediately tell what the gains are here.
I wouldn't make it a macro, except for the need for lazy evaluation of the options. The other alternative would be to make it a function with calling syntax something like:

frequency(complexity, [
  {1, fn -> first_option(complexity) end},
  {1, fn -> second_option(complexity / 2, complexity / 2) end}
])

I had that in the beginning, but it looks rather messy with the extra `fn`s.
end

defp generate_scaffold(tables, complexity) do
  generate(complexity, %{
    3 => %Scaffold{from: Enum.random(tables), complexity: complexity},
What are these 1,2,3 keys?
These are weights associated with the options. So in this case the first option is 3 times more likely than the other two. Furthermore if complexity is low, only the earlier options will be chosen. Here, if complexity is 4, it will only ever choose the first or second option.
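A sketch of how such a weighted, complexity-capped choice could work (this is my reading of the behaviour described above, not the actual macro's expansion; the names are made up):

```elixir
# Options are {weight, thunk} pairs, in order. Only options whose cumulative
# weight fits within the complexity budget are eligible; among those, one is
# picked with probability proportional to its weight, then evaluated lazily.
def pick(complexity, options) do
  {eligible, _} =
    Enum.flat_map_reduce(options, 0, fn {weight, thunk}, sum ->
      sum = sum + weight
      if sum <= complexity, do: {[{weight, thunk}], sum}, else: {[], sum}
    end)

  total = eligible |> Enum.map(fn {w, _} -> w end) |> Enum.sum()
  roll = :rand.uniform(total)

  {_, chosen} =
    Enum.reduce_while(eligible, roll, fn {weight, thunk}, left ->
      if left <= weight, do: {:halt, {weight, thunk}}, else: {:cont, left - weight}
    end)

  chosen.()
end
```

With weights 3, 1, 1 and complexity 4, the cumulative sums are 3, 4, 5, so only the first two options are eligible, matching the example above.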
Maybe this could be made more explicit if you used a list of tuples instead of a map? Something like:
[
{%Scaffold{...}, weight: 3},
# ...
]
The current map representation means we can't specify more options with the same likelihood. Is that intended?
We can, because it goes through the macro.
Hahaha, you're right, but I gotta say that's quite hacky and confusing. If I ever saw code like
%{
1 => foo(),
1 => bar()
}
I'd be very puzzled.
OK, I'll change it. WDYT about using the macro to make them evaluate lazily?
> WDYT about using the macro to make them evaluate lazily?
I'm not completely convinced whether it's more helpful or distracting, but it's fairly simple, so I'll leave that decision up to you.
In that case I'll leave as-is for now. It's just a mechanical task to change this later if it proves problematic for some reason.
defp group_by_elements({:select, _, items}) do
  items
  |> Enum.with_index()
You could also do `|> Enum.with_index(1)`, and then you wouldn't need the `+ 1` two lines below.
Smart!
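For reference, `Enum.with_index/2` takes a starting offset as its second argument:

```elixir
Enum.with_index(["a", "b", "c"], 1)
# => [{"a", 1}, {"b", 2}, {"c", 3}]
```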
  })
end
defp simple_condtion(), do: {:=, nil, [constant(), constant()]}
Small typo in `condtion`.
Pull request can be merged 🎉
Standard tests have passed 🎉
Ready for review
Pull request can be merged 👍
As a further note on stream_data: I was hoping we'd be able to use the minimization features to identify a small failing example with much less human intervention. However, because of a complex-looking issue in the lib (whatyouhide/stream_data#97), this didn't pan out. It seems like it should be possible to do hand-crafted example minimization, but, of course, we don't need stream_data for that.
The version in this PR generates queries that don't really reference data, except for selecting user ids. However, it already uses most of the high-level query features (WHERE, HAVING, etc.) and retains an over 95% success rate (only querying Postgres for now), despite generating deeply nested subqueries. Even this simple version has already uncovered three issues (#3049, #3061, #3059), so I think it can be called a small success.