Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nightly Arrow tests are failing because of datafusion.substrait no longer being available #36

Closed
jorisvandenbossche opened this issue Dec 1, 2023 · 6 comments

Comments

@jorisvandenbossche
Copy link

The arrow nightly build with substrait integration tests is currently failing because of a datafusion import problem (latest logs):

Substrait Integration Tests
Validating imports
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/arrow/lib/python3.10/site-packages/substrait_consumer-0.0.1-py3.10.egg/substrait_consumer/consumers.py", line 13, in <module>
    from datafusion import substrait as ds
  File "/opt/conda/envs/arrow/lib/python3.10/site-packages/datafusion/substrait.py", line 19, in <module>
    from ._internal import substrait
ImportError: cannot import name 'substrait' from 'datafusion._internal' (/opt/conda/envs/arrow/lib/python3.10/site-packages/datafusion/_internal.abi3.so)
1

I think this might be caused by apache/datafusion-python#527, which appeared in datafusion 33.0.
(short term workaround to get the tests green again might be to pin datafusion to 32.0)

@jdye64
Copy link

jdye64 commented Dec 1, 2023

I'll take a look at this. I think I just need to enable the feature in the CI nightly build.

@jorisvandenbossche
Copy link
Author

The tests on this repo are also failing I see, but so I am talking about the tests included in Arrow, which is managed by https://github.com/apache/arrow/blob/main/ci/scripts/install_substrait_consumer.sh, and is essentially doing a simple pip install datafusion (from the requirements.txt here).

So I think for this to work, there will need to be a new release of datafusion that adds it again to the wheel (or, as mentioned, short term pin to 32.0 until there is a new release)

@jdye64
Copy link

jdye64 commented Dec 1, 2023

pip install datafusion (from the requirements.txt here).

Sorry I can't seem to find that while grepping the codebase? Am I overlooking something? Can you point me to the code snippet performing that pip install datafusion? I just made a PR to Arrow DataFusion Python to include substrait in the nightly builds and if that command is pulling from the nightlies that should hopefully help out.

I opened a PR apache/datafusion-python#544 where we can at least start to understand if that will help this issue or not and discuss there.

@jorisvandenbossche
Copy link
Author

jorisvandenbossche commented Dec 1, 2023

pip install datafusion (from the requirements.txt here).

Sorry I can't seem to find that while grepping the codebase? Am I overlooking something? Can you point me to the code snippet performing that pip install datafusion?

Yeah, I said "essentially doing a pip install" because you will indeed not find that by grepping... It's in the file I linked above, it loops over the packages in the requirements.txt file in this repo, and then installs those. And datafusion is in there:

But anyway, fixing this on the datafusion-python side as you are doing with apache/datafusion-python#544 is the proper fix I assume.

@jdye64
Copy link

jdye64 commented Dec 1, 2023

Ahh ok yeah. Sorry, I saw arrow in the managed bit you had above and incorrectly assumed the pip install datafusion python bit was in there. Thanks for the clarification. I think once #544 is merged and CI gets a chance to build the nightlies that should button everything up but if not will look into it further. Thanks for pointing this out.

@richtia
Copy link
Member

richtia commented Dec 1, 2023

Thanks for pointing this out @jorisvandenbossche! I completely forgot that the arrow nightlies used the dependencies in this repo. I had seen this issue last week, but only did the pinning in CI here. I'll push the pinning in the requirements.txt too.

@richtia richtia closed this as completed Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants