-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. #7436
FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. #7436
Conversation
…"native" execution. Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI overview of changes here:
- I added some clarifying comments here around where I made changes.
- We now test all native dataframe mode capabilities on every pull request or push to main, instead of conditionally on
test-native-dataframe-mode
. - "native" mostly runs the same tests that we run with "PandasOnPython", and it also runs the native interoperability suite.
modin/core/io/io.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: The changes here are all about enabling skipping the "default to pandas" warning in the native I/O subclass of BaseIO
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the changes here are about enabling subclasses to choose whether to warn on default to pandas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple questions. It may also be worth pinging the people that added the Native QC in the first place, in case this would cause any breaking changes for them.
@arunjose696 @anmyachev we are trying to add a complete execution mode for Please let us know if anyone is relying on |
Signed-off-by: sfc-gh-mvashishtha <[email protected]>
Hi @sfc-gh-mvashishtha, based on the following comment and the pull request description, the change is essentially only in the mode name (from user's perspective). If so, no problem, we'll rewrite it if necessary. Thanks for letting me know.
|
Replace NativeDataFrameMode with
"NativeOnNative"
execution mode.User-facing changes
Prior to this commit, users had to switch the config variable
NativeDataFrameMode
from the default of"Default"
to"Pandas"
to use native execution. Now native execution is another modin execution mode withStorageFormat
of"Native"
andEngine
of"Native.
"Integration tests and CI
Prior to this commit, we ran 1) a set of tests checking that native Modin dataframes could interoperate with non-native dataframes 2) a subset of tests in native dataframe mode.
Now, we run the interoperability test suite, but also run the entire rest of the test suite (except for some partitioned-execution-only tests) in native execution mode via the
test-all
job matrix. This commit also renames the interoperability test suite from modin/tests/pandas/native_df_mode to modin/tests/pandas/native_df_interoperability/.Deleting most of the NativeQueryCompiler implementation
NativeQueryCompiler had a long implementation which was mostly the same as the BaseQueryCompiler implementation. However, there were some bugs in NativeQueryCompiler, including some correctness bugs related to copying the underlying pandas dataframe (see #7435). This commit deletes most of the NativeQueryCompiler implementation, so that the native query compiler mostly works just like the BaseQueryCompiler. The main difference is that while
BaseQueryCompiler
uses a partitioned pandas dataframe (under thePython
execution, so all in a single process), the native query compiler does not use partitions.Warning messages about default to pandas
While BaseQueryCompiler and BaseIO warn when they default to pandas, they should not do so when using native execution. We add class-level fields to these classes that tell whether to warn on default to pandas.
By default, we treat warnings as errors in our test suite, so in many places we have to look for the default to pandas warning only if we are not native execution mode. For convenience, this PR adds testing utility methods to 1) detect the global native execution mode 2) detect whether a dataframe or series is using native execution 3) conditionally expect a warning about defaulting to pandas.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date