Upgrade DuckDB to 0.8.1 #6725

Closed
wants to merge 2 commits into from

majetideepak
Collaborator

@majetideepak majetideepak commented Sep 25, 2023

Move from the DuckDB amalgamation to using the latest DuckDB 0.8.1 as an external dependency.
See the discussion in #5589.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 25, 2023
@majetideepak majetideepak force-pushed the pgparser branch 4 times, most recently from 11a73b0 to 8edb4fd Compare September 27, 2023 00:24
@majetideepak majetideepak changed the title Upgrade DuckDB to 0.8.1 Upgrade DuckDB to 0.9.0 Oct 6, 2023
@majetideepak majetideepak force-pushed the pgparser branch 6 times, most recently from 8ab1418 to 1d1b603 Compare October 9, 2023 12:00
@majetideepak majetideepak marked this pull request as ready for review October 9, 2023 12:25
@majetideepak
Collaborator Author

majetideepak commented Oct 9, 2023

Only one test is failing:
https://app.circleci.com/pipelines/github/facebookincubator/velox/35337/workflows/486c084d-e899-496d-9c10-7cedf0c2a96e/jobs/232713/tests
HashJoinTest.semiProjectWithFilter:
According to the DuckDB documentation, the EXISTS operator should never return NULL. However, it does for this particular query. Is this likely a bug in DuckDB?

../../velox/exec/tests/utils/QueryAssertions.cpp:1061
Failed
Expected 15, got 15
3 extra rows, 3 missing rows
3 of extra rows:
	null | 40 | false
	null | 40 | false
	null | 40 | false

3 of missing rows:
	null | 40 | null
	null | 40 | null
	null | 40 | null

Note: DuckDB only supports timestamps of millisecond precision. If this test involves timestamp inputs, please make sure you use the right precision.
DuckDB query: SELECT t0, t1, EXISTS (SELECT * FROM u WHERE u0 = t0 AND t1 <> u1) FROM t
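
For reference, here is a minimal sketch of why EXISTS is expected to be two-valued under standard SQL three-valued logic. It is not Velox or DuckDB code: the names `Tri`, `Int`, `URow`, `eq`, `ne`, `and3`, and `existsMatch` are hypothetical helpers, and `std::nullopt` stands in for SQL NULL. The WHERE clause keeps only rows whose predicate is TRUE, so a NULL `t0` makes every comparison unknown, no row qualifies, and EXISTS comes out false rather than NULL.

```cpp
#include <optional>
#include <vector>

// Hypothetical model of SQL three-valued logic; std::nullopt means NULL.
using Tri = std::optional<bool>;
using Int = std::optional<int>;

// a = b: NULL if either side is NULL.
Tri eq(Int a, Int b) {
  if (!a || !b) return std::nullopt;
  return *a == *b;
}

// a <> b: NULL if either side is NULL.
Tri ne(Int a, Int b) {
  if (!a || !b) return std::nullopt;
  return *a != *b;
}

// SQL AND: FALSE dominates; otherwise NULL if either side is NULL.
Tri and3(Tri a, Tri b) {
  if ((a && !*a) || (b && !*b)) return false;
  if (!a || !b) return std::nullopt;
  return true;
}

struct URow {
  Int u0;
  Int u1;
};

// Models EXISTS (SELECT * FROM u WHERE u0 = t0 AND t1 <> u1).
// The return type is plain bool: the result is never NULL.
bool existsMatch(Int t0, Int t1, const std::vector<URow>& u) {
  for (const auto& row : u) {
    Tri predicate = and3(eq(row.u0, t0), ne(t1, row.u1));
    // WHERE keeps a row only when the predicate is TRUE (not FALSE, not NULL).
    if (predicate.has_value() && *predicate) return true;
  }
  return false;
}
```

Under this model, the `null | 40` input rows from the failure above yield false for any contents of `u`.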

@majetideepak
Collaborator Author

@kgpai, @mbasmanova, can you import this and share what the failures look like internally? Thanks.

@assignUser
Collaborator

Wow, very nice to get rid of the amalgamation! Will review later but 🎉

@majetideepak
Collaborator Author

The tests are taking a lot longer, likely because DuckDB is being built in Debug mode. We should always build it in Release mode. For some reason, I am unable to enable Release mode for DuckDB alone when it is bundled.
See the failures in 4fe7d1d.
We should probably always make it SYSTEM in CI.

@mbasmanova
Contributor

@majetideepak

Can you import and share what the failures look like internally?

Deepak, this is a significant undertaking, so it will need to be prioritized first. I see that CI is red, and you mentioned that the tests are taking a long time. Maybe resolve these issues first while we look for ways to free up bandwidth.

@majetideepak
Collaborator Author

majetideepak commented Oct 11, 2023

@mbasmanova and I discussed offline.
Meta's internal build system uses DuckDB 0.8.1. We will use this version in CI as well. 0.8.1 is one release behind the most recent release (0.9.*) and it should be more stable.
The Velox TPC-H dbgen depends on the DuckDB TPC-H extension. We must remove this dependency first.
See #7001.

@pedroerp
Contributor

Thanks @majetideepak for looking into this. Now that the dbgen dependency is decoupled, what are the next steps on this PR?

@majetideepak majetideepak force-pushed the pgparser branch 3 times, most recently from 65dd120 to 3f46b47 Compare October 24, 2023 16:01
@majetideepak
Collaborator Author

@pedroerp, this is ready for review.

Collaborator

@assignUser assignUser left a comment

LGTM, really nice change!

@@ -31,6 +31,28 @@ commands:
git submodule sync --recursive
git submodule update --init --recursive

install-duckdb:
Collaborator

This is good for now, but in a follow-up we should add this to the Docker images.

Collaborator Author

Agreed! I don't have the power to update the Docker images.

Collaborator

Actually, you do :) We have a workflow in place that updates the dockerfiles after changes to the files are merged to main: https://github.com/facebookincubator/velox/actions/workflows/docker.yml

Collaborator Author

Ah! Good to know. Thanks.

@@ -458,6 +458,9 @@ if("${CMAKE_CXX_COMPILER_ID}" MATCHES "GNU")
message(
FATAL_ERROR "VELOX requires gcc > 8. Found ${CMAKE_CXX_COMPILER_VERSION}")
endif()

# Find Threads library
find_package(Threads REQUIRED)
Collaborator

Why? If DuckDB requires Threads, it will/should look for it itself. Feel free to resolve this if it is needed for our changed code rather than for DuckDB.

Collaborator Author

@majetideepak majetideepak Oct 25, 2023

With this PR, I started seeing errors in our code such as the one below.

CMake Error at velox/common/memory/tests/CMakeLists.txt:15 (add_executable):
  Target "velox_memory_test" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?

See https://app.circleci.com/pipelines/github/facebookincubator/velox/36404/workflows/3bdaa340-22b1-46df-8377-449100e5bbe9/jobs/241760

Collaborator

Ah, I see. So it was likely a transitive dependency that the DuckDB amalgamation satisfied and that is now missing. It makes sense to add it in that case.

@majetideepak majetideepak force-pushed the pgparser branch 2 times, most recently from 3c62b5a to e8e8ad5 Compare October 25, 2023 13:32
@pedroerp
Contributor

@majetideepak I can help import, but GitHub says there are conflicts in the DuckWrapper files that must be resolved first. I guess you need to rebase on top of the other PR first?

@majetideepak
Collaborator Author

@pedroerp Rebased.

@facebook-github-bot
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@assignUser
Collaborator

(I am going to rerun the benchmark to see if the regression sticks around)

@assignUser
Collaborator

assignUser commented Nov 16, 2023

FYI: the regression was a false positive: https://github.com/facebookincubator/velox/actions/runs/6662181171/job/18731909455?pr=6725#step:16:736

(The Conbench alert has all the runs; another commit would override that, IIRC.)

@facebook-github-bot
Contributor

@pedroerp merged this pull request in 98d6c05.

Conbench analyzed the 1 benchmark run on commit 98d6c053.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@mbasmanova
Contributor

Seeing build failures: #7692

facebook-github-bot pushed a commit that referenced this pull request Nov 28, 2023
Summary:
After the DuckDB upgrade (PR #6725), we see build errors:

```
/opt/gluten/ep/build-velox/build/velox_ep/_build/release/_deps/duckdb-src/src/planner/binder/expression/bind_star_expression.cpp:123:4: error: 'duckdb_re2' has not been declared
  123 |    duckdb_re2::RE2 regex(regex_str);
      |    ^~~~~~~~~~
```

Compiling re2 and fmt before DuckDB causes a dependency conflict.

The fix is to compile DuckDB before re2 and fmt.

Pull Request resolved: #7722

Reviewed By: pedroerp

Differential Revision: D51605184

Pulled By: mbasmanova

fbshipit-source-id: 54acb0a0f672abe710f6f398fa465ec46d38573c
@majetideepak majetideepak deleted the pgparser branch January 3, 2024 14:00
array.reserve(size);
for (auto i = 0; i < size; i++) {
  auto innerRow = offset + i;
  if (elements->isNullAt(innerRow)) {
    array.emplace_back(::duckdb::Value(nullptr));
Contributor

It looks like this change is responsible for breaking the Join Fuzzer: #7943

CC: @majetideepak @kgpai @pedroerp
