Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(go/adbc/drivermgr): Implement SetSubstraitPlan in CGO wrapper and add DuckDB wrapper tests #1709

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

joellubi
Copy link
Member

@joellubi joellubi commented Apr 6, 2024

No description provided.

@github-actions github-actions bot added this to the ADBC Libraries 1.0.0 milestone Apr 6, 2024
@joellubi joellubi changed the title feat(go/adbc/drivermgr): Implement SetSubstraitPlan in CGO Wrapper and Add DuckDB to Test feat(go/adbc/drivermgr): Implement SetSubstraitPlan in CGO wrapper and add DuckDB wrapper tests Apr 6, 2024
dm.drv = drivermgr.Driver{}
var err error
dm.db, err = dm.drv.NewDatabase(map[string]string{
"driver": "libduckdb",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm honestly surprised this works without needing the suffix. That's really cool to know and can likely help us simplify some other CI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should prefer just "duckdb", and the C++ driver manger will look for "libduckdb.so" and "duckdb.dll"

_, err = stmt.ExecuteUpdate(dm.ctx)
dm.NoError(err)

dm.NoError(stmt.SetSqlQuery("INSERT INTO test_table (id, name) VALUES (1, 'test')"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not just insert all three rows in one query?

INSERT INTO test_table (id, name) VALUES (1, 'test'), (2, 'test'), (3, 'tested') or does DuckDB not support that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to use a single statement as you suggested. I think I was just rushing lol.

Comment on lines +125 to +116
countFn, err := bldr.AggregateFn(extensions.SubstraitDefaultURIPrefix+"functions_aggregate_generic.yaml", "count", nil)
dm.Require().NoError(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh i should really improve this in the substrait library lol

@zeroshade
Copy link
Member

Thanks so much for this, this is great both for adding the CI using duckdb and for SetSubtraitPlan being used.

It looks like the issue is purely that conda isn't finding libduckdb for windows which is odd....

@lidavidm
Copy link
Member

lidavidm commented Apr 7, 2024

It looks like the issue is purely that conda isn't finding libduckdb for windows which is odd....

because on windows, it's probably duckdb.dll; Windows doesn't use the lib prefix. Use just "duckdb" and on Windows it'll add .dll for you and otherwise it'll prepend lib and append .so/.dylib

@joellubi
Copy link
Member Author

joellubi commented Apr 8, 2024

It looks like the issue is purely that conda isn't finding libduckdb for windows which is odd....

because on windows, it's probably duckdb.dll; Windows doesn't use the lib prefix. Use just "duckdb" and on Windows it'll add .dll for you and otherwise it'll prepend lib and append .so/.dylib

I tried just duckdb without lib and it works, which is nice. I think there's another issue though which is that there doesn't seem to even be a windows version hosted on conda-forge...

That said, I had been testing this with a nightly release of duckdb because some substrait issues were recently fixed but hadn't yet been included in a stable release. Testing against nightly might help surface issues/drift early. I suppose I could just download those in the github workflow and branch on the os / arch. Kind of complicated unfortunately. What do you think?

@lidavidm
Copy link
Member

lidavidm commented Apr 8, 2024

If someone knows the right things to do here (I outlined what I think is necessary but don't really have the time: need to add the mingw gcc on Windows + configure the Go builds here to use a different compiler than CC/CXX) conda-forge/arrow-adbc-split-feedstock#17

@lidavidm
Copy link
Member

lidavidm commented Apr 8, 2024

I think it's fine to download nightly duckdb/in the future we could even test against both in the workflow (just repeat the steps)

@joellubi
Copy link
Member Author

joellubi commented Apr 9, 2024

@lidavidm @zeroshade

As I looked more closely I realized it's not only Windows failing but also Unix CGO (e.g. macos-14). Is there anything that needs to be done to properly locate the library after installing with conda? I didn't see libsqlite getting copied into another directory, so I wasn't sure why only duckdb wouldn't work the same way. Here's the test failure:

=== RUN   TestDriverMgrDuckDB
    wrapper_duckdb_test.go:54: 
        	Error Trace:	/Users/runner/work/arrow-adbc/arrow-adbc/go/adbc/drivermgr/wrapper_duckdb_test.go:54
        	            				/Users/runner/go/pkg/mod/github.com/stretchr/[email protected]/suite/suite.go:157
        	            				/Users/runner/work/arrow-adbc/arrow-adbc/go/adbc/drivermgr/wrapper_duckdb_test.go:162
        	Error:      	Received unexpected error:
        	            	Internal: [Driver Manager] [DriverManager] dlopen() failed: dlopen(duckdb, 0x0006): tried: '/duckdb' (no such file), '/Users/runner/local/lib/duckdb' (no such file), 'duckdb' (no such file), '/System/Volumes/Preboot/Cryptexes/OSduckdb' (no such file), '/usr/lib/duckdb' (no such file, not in dyld cache), 'duckdb' (no such file)
        	            	dlopen() failed: dlopen(libduckdb.dylib, 0x0006): tried: '/libduckdb.dylib' (no such file), '/Users/runner/local/lib/libduckdb.dylib' (no such file), 'libduckdb.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibduckdb.dylib' (no such file), '/usr/lib/libduckdb.dylib' (no such file, not in dyld cache), 'libduckdb.dylib' (no such file)

@lidavidm
Copy link
Member

lidavidm commented Apr 9, 2024

I think we install into /Users/runner/local/lib/ and set DYLD_LIBRARY_PATH when we build the SQLite driver, so we'd need to make sure DuckDB ends up in the same path.

@lidavidm
Copy link
Member

lidavidm commented Apr 9, 2024

Hmm, but duckdb should be in the Conda env. Possibly add $CONDA_PREFIX/lib explicitly to DYLD_LIBRARY_PATH?

@joellubi joellubi force-pushed the impl-wrapper-substrait branch 2 times, most recently from 044df4f to c4a7afe Compare April 10, 2024 18:25
@joellubi
Copy link
Member Author

Hmm, but duckdb should be in the Conda env. Possibly add $CONDA_PREFIX/lib explicitly to DYLD_LIBRARY_PATH?

That did the trick. I think previously we were using libraries that were either built as part of the workflow or included in the distribution itself (I think this may have been the case with sqlite).

@joellubi
Copy link
Member Author

I looked into running the workflow against nightly duckdb...

While I think it would be nice to test against nightlies, they are not currently uploaded to any package manager. The files need to be downloaded and unzipped (link). This isn't too bad but the naming and paths aren't very consistent across platforms and architectures, so while we could get this working now it will very likely be a brittle workflow if/when the names/paths change in the future.

According to the release calendar, an official release including the required substrait fixes should be available within a week or so. In the interest of simplicity, I think we can just leave this PR as a draft and re-run the workflow in a week to confirm the tests pass.

@lidavidm
Copy link
Member

Cool, thanks for investigating.

@lidavidm lidavidm removed this from the ADBC Libraries 1.0.0 milestone May 3, 2024
@joellubi joellubi force-pushed the impl-wrapper-substrait branch from 8826917 to 83a0ab9 Compare May 13, 2024 17:25
@joellubi
Copy link
Member Author

I rebased and reran the workflows with the newer release of duckdb. The test failures resolved as expected. The issue remains however that there is unfortunately no libduckdb package published to conda-forge for windows...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants