feat(fuzzer): Support custom input generator in VectorFuzzer #11466
This pull request was exported from Phabricator. Differential Revision: D65576377
…bator#11466) Summary: Pull Request resolved: facebookincubator#11466 DO NOT REVIEW NOW. Differential Revision: D65576377
b69e0d2
to
6e3c393
Compare
6e3c393
to
e283b96
Compare
e283b96
to
595c9a0
Compare
Please set WIP PRs into draft mode (does not notify codeowners) until you want a review to reduce notification volume for codeowners :)
Hi @assignUser, I tried but unfortunately couldn't find the convert-to-draft button from my side. I guess it's because this PR was exported from our internal system. By the way, the failed CI jobs report an error during Configure Build as follows:
But the reported CMakeLists file doesn't use an ALIAS target with target_compile_options. Do you know how I could address this error?
The mono library creates alias targets for all targets, so you have to check with
595c9a0
to
4ec5014
Compare
Hi @assignUser, could you please share an example of the suggested approach? I tried adding the following code in the CMakeLists.txt file but it didn't work on my Mac (i.e., I still see errors of using the deprecated thing).
I've also tried adding
4ec5014
to
d12b361
Compare
d12b361
to
9b2d277
Compare
Why should the generator, which afaiu is only supposed to be used for fuzzing, be part of libvelox and link to the actual fuzzer? As type is core, would we ALWAYS need to build with fuzzers?
config.seed_,
JSON(),
config.nullRatio_,
fuzzer::getRandomInputGenerator(
This is the cause of the missing symbol. The link to velox_fuzzer... is removed by the monolithic wrapper function. We could link this explicitly, but to be honest I don't think the type definition is the right place for an input generator?
Hi @assignUser, every custom type needs logic to generate valid data, so custom types and input generators are naturally bound together. That is why we add a new API, getInputGenerator(), to CustomTypeFactories: it reminds every custom type author to define the data generation logic intentionally. I'll try to move ConstrainedInputGenerators.h/cpp to another place that is not under the fuzzer directories. Thank you for helping pinpoint the problem!
I'm moving ConstrainedInputGenerators.h/cpp to velox/common/fuzzer: #12080.
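To illustrate the design being discussed, here is a minimal sketch of a factory interface whose generator hook is pure virtual, so that a custom type author has to make an explicit choice. All names below (`GeneratorConfigSketch`, `StringGeneratorSketch`, `CustomTypeFactorySketch`, `JsonLikeFactory`, `OpaqueTypeFactory`) are hypothetical stand-ins and not the actual Velox declarations; only the idea of a `getInputGenerator()` method that may return nullptr comes from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <random>
#include <string>

// Hypothetical, simplified stand-ins; names and signatures are illustrative.
struct GeneratorConfigSketch {
  std::uint32_t seed;
};

class StringGeneratorSketch {
 public:
  virtual ~StringGeneratorSketch() = default;
  virtual std::string generate() = 0;
};

class CustomTypeFactorySketch {
 public:
  virtual ~CustomTypeFactorySketch() = default;
  // Pure virtual: a factory that does not implement this cannot be
  // instantiated, so data generation is always a deliberate decision.
  virtual std::shared_ptr<StringGeneratorSketch> getInputGenerator(
      const GeneratorConfigSketch& config) const = 0;
};

// Emits tiny JSON-like objects such as {"k":17}.
class JsonLikeGenerator : public StringGeneratorSketch {
 public:
  explicit JsonLikeGenerator(std::uint32_t seed) : rng_(seed) {}
  std::string generate() override {
    return "{\"k\":" + std::to_string(rng_() % 100) + "}";
  }

 private:
  std::mt19937 rng_;
};

// A custom type that needs constrained input.
class JsonLikeFactory : public CustomTypeFactorySketch {
 public:
  std::shared_ptr<StringGeneratorSketch> getInputGenerator(
      const GeneratorConfigSketch& config) const override {
    return std::make_shared<JsonLikeGenerator>(config.seed);
  }
};

// A custom type that explicitly opts out by returning nullptr.
class OpaqueTypeFactory : public CustomTypeFactorySketch {
 public:
  std::shared_ptr<StringGeneratorSketch> getInputGenerator(
      const GeneratorConfigSketch& /*config*/) const override {
    return nullptr;
  }
};
```

Returning nullptr keeps the opt-out visible in the custom type's own code instead of making it a silent default.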
Summary: Presto may perform constant folding on queries before sending the fragment to Velox workers. However, the fragments that the workers receive may contain values of types whose Java implementation differs from how Velox implements the type. This incompatibility produces incorrect results. For example, this PR fixes the type incompatibility between the Java coordinator and the C++ worker for `ipaddress` types.
- Java coordinator: ipaddress is represented as a slice of 16 bytes which, if interpreted as a number, is big endian.
- C++ worker: ipaddress is represented as an int128_t, in little endian form.
The discrepancy between the two can be seen with the query below: on the native engine, the result set will be `::ffff:1.2.3.4` represented in reverse byte order.
```
SELECT CAST(ip AS ipaddress) as casted_ip FROM ( VALUES ('::ffff:1.2.3.4') ) AS t (ip)
```
To address this issue, we can reverse the byte order of the ipaddress type sent from and to Java.
**Note**:
- This issue is not exclusive to ipaddress; other custom types in Velox whose underlying type/implementation differs from Java may suffer from it as well.
- We can likely enhance the fuzzer to help catch cases like this at diff time once custom fuzzer inputs are landed (facebookincubator#11466)
Differential Revision: D68039050
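The byte-order mismatch described in that summary can be shown with a small sketch. It works on a plain 16-byte buffer rather than on int128_t, and the helper names (`IpBytes`, `ipv4MappedBigEndian`, `reverseByteOrder`) are hypothetical; this is not the code from that diff.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; not Velox or Presto API.
using IpBytes = std::array<std::uint8_t, 16>;

// Builds the big-endian (network order) bytes of ::ffff:a.b.c.d.
inline IpBytes ipv4MappedBigEndian(
    std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
  IpBytes bytes{}; // All 16 bytes start as zero.
  bytes[10] = 0xff;
  bytes[11] = 0xff;
  bytes[12] = a;
  bytes[13] = b;
  bytes[14] = c;
  bytes[15] = d;
  return bytes;
}

// Reverses the byte order; applying it twice restores the input.
inline IpBytes reverseByteOrder(const IpBytes& in) {
  IpBytes out = in;
  std::reverse(out.begin(), out.end());
  return out;
}
```

For `::ffff:1.2.3.4`, the reversed buffer starts with the bytes 4, 3, 2, 1, which matches the reversed address that the summary reports when no conversion is applied.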
…kincubator#11466) Summary: Custom types often require custom logic to generate valid values, such as JSON. To support such custom data generation for expression fuzzer, this diff makes two changes: 1. Require a custom type to provide a custom input generator that is automatically used when VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case VectorFuzzer generates random data in the old way. 2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions that require some arguments to be positive numbers). Differential Revision: D65576377
6b0bf0e
to
6ba8691
Compare
6ba8691
to
2e51e14
Compare
2e51e14
to
ed17783
Compare
ed17783
to
37b0528
Compare
velox/type/Type.h
Outdated
// Type of data represented by JSON. This config should be ignored by non-JSON
// input generators.
const TypePtr& representedType_;
Could we just let the JSON Generator call velox::randType(...) with a FuzzerGenerator created from seed_?
Alternatively, JSON really only supports a handful of types: string, integer, float, object (map), array, and boolean (any of which can be null), so JSON could just randomly construct one of those if you don't want to link against VectorFuzzer in the Presto Type code.
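As a rough sketch of this second suggestion, a generator can pick one of the JSON kinds at random from a seeded engine and recurse for arrays and objects. The function name `randomJson`, the depth cap, and the value ranges are assumptions made for illustration; this is not the generator that landed in the PR, and it uses `std::mt19937` instead of Velox's FuzzerGenerator.

```cpp
#include <cassert>
#include <random>
#include <string>

// Hypothetical sketch; not the generator added in this PR.
// Produces one random JSON document as text. Nesting is capped so that the
// recursion always terminates.
inline std::string randomJson(std::mt19937& rng, int depth = 0) {
  constexpr int kMaxDepth = 2;
  // Kinds 0-4 are scalars; 5 (array) and 6 (object) are only allowed while
  // depth is below the cap.
  const int maxKind = depth >= kMaxDepth ? 4 : 6;
  const int kind = std::uniform_int_distribution<int>(0, maxKind)(rng);
  const int size = std::uniform_int_distribution<int>(0, 3)(rng);
  switch (kind) {
    case 0:
      return "null";
    case 1:
      return size % 2 == 0 ? "true" : "false";
    case 2:
      return std::to_string(
          std::uniform_int_distribution<int>(-1000, 1000)(rng));
    case 3:
      return std::to_string(
          std::uniform_real_distribution<double>(-1.0, 1.0)(rng));
    case 4:
      return "\"s" + std::to_string(size) + "\"";
    case 5: {
      std::string out = "[";
      for (int i = 0; i < size; ++i) {
        out += (i > 0 ? "," : "") + randomJson(rng, depth + 1);
      }
      return out + "]";
    }
    default: {
      std::string out = "{";
      for (int i = 0; i < size; ++i) {
        out += (i > 0 ? "," : "") + ("\"k" + std::to_string(i) + "\":") +
            randomJson(rng, depth + 1);
      }
      return out + "}";
    }
  }
}
```

Because all randomness flows through the engine passed in, two engines built from the same seed produce the same document, which is what makes a fuzzer failure reproducible.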
velox/vector/fuzzer/VectorFuzzer.cpp
Outdated
@@ -336,13 +355,16 @@ VectorPtr VectorFuzzer::fuzzConstant(const TypePtr& type, vector_size_t size) {
opts_.maxConstantContainerSize.value(), opts_.containerLength);
opts_.complexElementsMaxSize = std::min<int32_t>(
opts_.maxConstantContainerSize.value(), opts_.complexElementsMaxSize);
// TODO: incorporate fuzzer options into customGenerator.
This comment feels like it would be more appropriate where we create the InputGeneratorConfig in VectorFuzzer.cpp.
37b0528
to
09f893c
Compare
Differential Revision: D68137136
09f893c
to
c6a645b
Compare
c6a645b
to
67691da
Compare
Thanks! LGTM
Summary:
Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:
1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.
2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).
Differential Revision: D65576377
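The fallback behaviour in the summary can be sketched with a toy fuzzer. This is a simplified illustration under stated assumptions, not VectorFuzzer's real API: `InputGenerator`, `PositiveDoubleGenerator`, and `fuzzDoubles` are hypothetical names, values are plain doubles instead of Velox vectors, and nulls are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <random>
#include <vector>

// Hypothetical stand-in for a custom input generator.
class InputGenerator {
 public:
  virtual ~InputGenerator() = default;
  virtual double generate() = 0;
};

// Example constraint from the summary: arguments that must be positive.
class PositiveDoubleGenerator : public InputGenerator {
 public:
  explicit PositiveDoubleGenerator(std::uint32_t seed) : rng_(seed) {}
  double generate() override {
    return value_(rng_);
  }

 private:
  std::mt19937 rng_;
  // Lower bound is the smallest positive normal double, so results are > 0.
  std::uniform_real_distribution<double> value_{
      std::numeric_limits<double>::min(), 1e6};
};

// Toy fuzzer entry point. With a generator, every value comes from it;
// with nullptr, values are drawn unconstrained, as before the change.
inline std::vector<double> fuzzDoubles(
    std::size_t size,
    std::mt19937& rng,
    const std::shared_ptr<InputGenerator>& customGenerator = nullptr) {
  std::vector<double> out;
  out.reserve(size);
  std::uniform_real_distribution<double> anyValue(-1e6, 1e6);
  for (std::size_t i = 0; i < size; ++i) {
    out.push_back(customGenerator ? customGenerator->generate()
                                  : anyValue(rng));
  }
  return out;
}
```

A caller testing a function with a positive-only argument would pass `std::make_shared<PositiveDoubleGenerator>(seed)`; every other caller omits the argument and keeps the old behaviour.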