Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fuzzer): Support custom input generator in VectorFuzzer #11466

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kagamiori
Copy link
Contributor

@kagamiori kagamiori commented Nov 7, 2024

Summary:
Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

  1. Require a custom type to provide a custom input generator that is automatically used when
    VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
    VectorFuzzer generates random data in the old way.

  2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
    needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
    that require some arguments to be positive numbers).

Differential Revision: D65576377

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 7, 2024
Copy link

netlify bot commented Nov 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 67691da
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6789c1fe8f678d000867bb24

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

@kagamiori kagamiori changed the title Add input generator for json_parse in expression fuzzer [WIP] Add input generator for json_parse in expression fuzzer Nov 7, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 7, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 7, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 7, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

@assignUser
Copy link
Collaborator

Please set WIP PRs into draft mode (does not notify codeowners) until you want a review to reduce notification volume for codeowners :)

@kagamiori
Copy link
Contributor Author

Please set WIP PRs into draft mode (does not notify codeowners) until you want a review to reduce notification volume for codeowners :)

Hi @assignUser, I tried but unfortunately couldn't find the convert-to-draft button from my side. I guess it's because this PR was exported from our internal system.

By the way, the failed CI jobs report an error during Configure Build as follows:

CMake Error at velox/experimental/fuzzer_input_generator/CMakeLists.txt:26 (target_compile_options):
target_compile_options can not be used on an ALIAS target.

But the reported CMakeLists file doesn't use an ALIAS target with target_compile_options. Do you know how I could address this error?

velox_add_library(
  velox_fuzzer_constrained_input_generators ConstrainedGenerators.cpp
  ConstrainedVectorGenerator.cpp)

velox_link_libraries(
  velox_fuzzer_constrained_input_generators
  Folly::folly
  velox_expression
  velox_type
  velox_vector_fuzzer_util)

if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  target_compile_options(velox_fuzzer_constrained_input_generators
                         PRIVATE -Wno-deprecated-declarations)
endif()

@assignUser
Copy link
Collaborator

But the reported CMakeLists file doesn't use an ALIAS target with target_compile_options. Do you know how I could address this error?

The mono library creates alias targets for all targets, so you have to check with if(VELOX_MONO_LIBRARY) and use velox as the target. Or if it's only one TU with the warnings you can set the flag on that source file with https://cmake.org/cmake/help/latest/command/set_source_files_properties.html this would then not set the flag on all files in velox.

@assignUser assignUser marked this pull request as draft November 8, 2024 14:59
kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 12, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

@kagamiori
Copy link
Contributor Author

But the reported CMakeLists file doesn't use an ALIAS target with target_compile_options. Do you know how I could address this error?

The mono library creates alias targets for all targets, so you have to check with if(VELOX_MONO_LIBRARY) and use velox as the target. Or if it's only one TU with the warnings you can set the flag on that source file with https://cmake.org/cmake/help/latest/command/set_source_files_properties.html this would then not set the flag on all files in velox.

Hi @assignUser, could you please share an example of the suggested approach? I tried adding the following code in the CMakeLists.txt file but it didn't work on my Mac (i.e., I still see errors of using the deprecated thing).

if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
    set_source_files_properties(Utils.h PROPERTIES COMPILE_OPTIONS "-Wno-deprecated-declarations")
    set_source_files_properties(Utils.cpp PROPERTIES COMPILE_OPTIONS "-Wno-deprecated-declarations")
endif()

I've also tried adding #pragma clang diagnostic ignored "-Wdeprecated-declarations" in Utils.h and Utils.cpp that have the deprecated usage. It worked on my Mac but fails in CI (https://github.com/facebookincubator/velox/actions/runs/11804302396/job/32884198994?pr=11466).

kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 15, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Nov 16, 2024
…bator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

DO NOT REVIEW NOW.

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

Copy link
Collaborator

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the generator that, afaiu is only supposed to be used for fuzzing be part of libvelox and link to the actual fuzzer? As type is core we would ALWAYS need to build with fuzzers?

config.seed_,
JSON(),
config.nullRatio_,
fuzzer::getRandomInputGenerator(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cause of the missing symbol. The link to velox_fuzzer... is removed by the monolithic wrapper function. We could link this explicitly but to be honest I don't think the type defintion is the right place for an input generator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @assignUser, every custom type needs a logic to generate valid data, so the custom types and input generators are naturally bounded. So we add a new API getInputGenerator() to CustomTypeFactories to remind every custom type author to define the data generation logic intentionally. I'll try to move ConstrainedInputGenerators.h/cpp to another place not under fuzzer directories. Thank you for helping pinpoint the problem!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving ConstrainedInputGenerators.h/cpp to velox/common/fuzzer: #12080.

yuandagits added a commit to yuandagits/velox that referenced this pull request Jan 11, 2025
Summary:
Presto may perform constant folding on queries before sending the fragment to velox workers. However, when the workers receive the fragments, the fragments may contain types which had a different implementation than how velox implemented the type. This incompatibility results in incorrect results.

For example, this PR fixes the type incompatibility between Java coordinator and C++ worker for `ipaddress` types. 
 - Java coordinator, ipaddress is represented as a slice of 16 bytes which if represented as a number, would be big endian. 
- C++ worker, ipaddress is represented as an int128_t, in little endian form.

The discrepancy between these two can be see with on native engine, the result set will be `::ffff:1.2.3.4` represented in reverse byte order
```
SELECT 
  CAST(ip AS ipaddress) as casted_ip 
FROM 
  (
    VALUES 
      ('::ffff:1.2.3.4')
  ) AS t (ip)
```

To address this issue, we can reverse the byte order of the ipaddress type sent from and to Java.

**Note**: 

- This issue is not exclusive to ipaddrss, and other custom types in velox which have different underlying type/implementation than Java may suffer from this issue as well.

- We can likely enhance the fuzzer to help catch cases like this at diff time once custom fuzzer inputs are landed (facebookincubator#11466)

Differential Revision: D68039050
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 14, 2025
…kincubator#11466)

Summary:


Custom types often require custom logic to generate valid values, such as JSON. To support such 
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when 
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case 
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be 
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions 
that require some arguments to be positive numbers).

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 14, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 15, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 15, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
Comment on lines 2018 to 2020
// Type of data represented by JSON. This config should be ignored by non-JSON
// input generators.
const TypePtr& representedType_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just let the JSON Generator call velox::randType(...) with a FuzzerGenerator created from seed_?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, JSON really only supports a handful of types: string, integer, float, object (map), array, and boolean (any of which can be null), so JSON could just randomly construct one of those if you don't want to link against VectorFuzzer in the Presto Type code.

@@ -336,13 +355,16 @@ VectorPtr VectorFuzzer::fuzzConstant(const TypePtr& type, vector_size_t size) {
opts_.maxConstantContainerSize.value(), opts_.containerLength);
opts_.complexElementsMaxSize = std::min<int32_t>(
opts_.maxConstantContainerSize.value(), opts_.complexElementsMaxSize);
// TODO: incorporate fuzzer options into customGenerator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment feels like it would be more appropriate where we create the InputGeneratorConfig in VectorFuzzer.cpp.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 16, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 17, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such
custom data generation for expression fuzzer, this diff makes two changes:

1. Require a custom type to provide a custom input generator that is automatically used when
VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case
VectorFuzzer generates random data in the old way.

2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be
needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions
that require some arguments to be positive numbers).

Differential Revision: D65576377
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D65576377

Copy link
Contributor

@kevinwilfong kevinwilfong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 17, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such custom data generation for expression fuzzer, this diff makes two changes:
1. Require a custom type to provide a custom input generator that is automatically used when VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case VectorFuzzer generates random data in the old way.
2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions that require some arguments to be positive numbers).

Differential Revision: D65576377
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 17, 2025
…kincubator#11466)

Summary:
Pull Request resolved: facebookincubator#11466

Custom types often require custom logic to generate valid values, such as JSON. To support such custom data generation for expression fuzzer, this diff makes two changes:
1. Require a custom type to provide a custom input generator that is automatically used when VectorFuzzer generates vectors of this type. The custom type can provide a nullptr, in which case VectorFuzzer generates random data in the old way.
2. Allow users of VectorFuzzer to provide a custom input generator to the API calls. (This will be needed for custom input generation for non-custom types in expression fuzzer, such as cdf functions that require some arguments to be positive numbers).

Differential Revision: D65576377
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants