-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure query-planning is disabled in dask #371
Ensure query-planning is disabled in dask #371
Conversation
Documentation preview |
Notes on testing with RAPIDS 24.04This PR disables both query-planning and automatic string conversion, which mitigate most test failures with 24.04. Remaining failures are described below (TODO: file issues where necessary)...
|
…don't import dd anywere before disabling these settings)
from merlin.config import validate_dask_configs | ||
|
||
validate_dask_configs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benfred @jperez999 - It doesn't seem like I can add a top-level merlin/__init__.py
file to perform the dask-config validation without messing up the namespace for libraries like merlin.dataloader
.
What do you think about the new merlin.config
module I added in this PR?
As far as I can tell, we can be sure query-planning is disabled across merlin/nvt if we add this validate_dask_configs()
call at the top of merlin/core/__init__.py
, merlin/io/__init__.py
, and merlin/dag/__init__.py
. All internal merlin code using dask will need to import one of these modules first. Therefore, we only need to worry about the user importing dask.dataframe
before importing something from merlin, and I think validate_dask_configs
raises a suitable error message when this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine. Is there a way for us to test this works, right now? Or do we need to wait until we start using 24.06+ cudf/dask packages?
Adds logic to raise an error if "query-planning" is enabled in dask when
merlin.core
is imported.We will need to make sure
merlin.core
is imported (across all of Merlin) beforedask.dataframe
is ever imported.This is most important for RAPIDS 24.06+, but may apply to edge cases in 24.04.