[enhancement] add sklearnex version of validate_data, _check_sample_weight #2177

Open
wants to merge 136 commits into
base: main
136 commits
32fe269
add finiteness_checker pybind11 bindings
icfaust Oct 23, 2024
cdbf1b5
added finiteness checker
icfaust Oct 23, 2024
62674a2
Update finiteness_checker.cpp
icfaust Oct 23, 2024
c75c23b
Update finiteness_checker.cpp
icfaust Oct 23, 2024
6a20938
Update finiteness_checker.cpp
icfaust Oct 23, 2024
382d7a1
Update finiteness_checker.cpp
icfaust Oct 23, 2024
c8ffd9c
Update finiteness_checker.cpp
icfaust Oct 23, 2024
9aa13d5
Update finiteness_checker.cpp
icfaust Oct 23, 2024
84e15d5
Rename finiteness_checker.cpp to finiteness_checker.cpp
icfaust Oct 23, 2024
63073c6
Update finiteness_checker.cpp
icfaust Oct 24, 2024
d915da5
Merge branch 'intel:main' into dev/new_assert_all_fininte
icfaust Oct 28, 2024
3dddf2d
add next step
icfaust Oct 31, 2024
1e1213e
follow conventions
icfaust Oct 31, 2024
0531713
make xtable explicit
icfaust Oct 31, 2024
e831167
remove comment
icfaust Oct 31, 2024
d6eb1d0
Update validation.py
icfaust Oct 31, 2024
fb30d6e
Update __init__.py
icfaust Nov 1, 2024
63a18c2
Update validation.py
icfaust Nov 1, 2024
76c0856
Update __init__.py
icfaust Nov 1, 2024
7deb2bb
Update __init__.py
icfaust Nov 1, 2024
ed46b29
Update validation.py
icfaust Nov 1, 2024
67d6273
Update _data_conversion.py
icfaust Nov 1, 2024
054f0a1
Merge branch 'main' into dev/new_assert_all_fininte
icfaust Nov 1, 2024
8abead9
Update _data_conversion.py
icfaust Nov 1, 2024
47d0f8b
Update policy_common.cpp
icfaust Nov 1, 2024
e48c2bd
Update policy_common.cpp
icfaust Nov 1, 2024
c6751c4
Update _policy.py
icfaust Nov 1, 2024
f3e4a3a
Update policy_common.cpp
icfaust Nov 2, 2024
39cdb5f
Rename finiteness_checker.cpp to finiteness_checker.cpp
icfaust Nov 2, 2024
0f39613
Create finiteness_checker.py
icfaust Nov 2, 2024
b42cfe3
Update validation.py
icfaust Nov 2, 2024
0ed615e
Update __init__.py
icfaust Nov 2, 2024
f101aff
attempt at fixing circular imports again
icfaust Nov 2, 2024
24c0e94
fix isort
icfaust Nov 2, 2024
3f96166
remove __init__ changes
icfaust Nov 2, 2024
d985053
last move
icfaust Nov 2, 2024
90ec48b
Update policy_common.cpp
icfaust Nov 2, 2024
8c2c854
Update policy_common.cpp
icfaust Nov 2, 2024
6fa38d7
Update policy_common.cpp
icfaust Nov 2, 2024
9c1ca9c
Update policy_common.cpp
icfaust Nov 2, 2024
4b67dbd
Update validation.py
icfaust Nov 2, 2024
fa59a3c
add testing
icfaust Nov 2, 2024
3330b33
isort
icfaust Nov 2, 2024
4895940
attempt to fix module error
icfaust Nov 2, 2024
0c6dd5d
add fptype
icfaust Nov 2, 2024
e2182fa
fix typo
icfaust Nov 2, 2024
982ef2c
Update validation.py
icfaust Nov 2, 2024
2fb52a8
remove sua_ifcae from to_table
icfaust Nov 3, 2024
28dc267
isort and black
icfaust Nov 3, 2024
2f85fd4
Update test_memory_usage.py
icfaust Nov 3, 2024
8659248
format
icfaust Nov 3, 2024
3827d6f
Update _data_conversion.py
icfaust Nov 3, 2024
55fa7d2
Update _data_conversion.py
icfaust Nov 3, 2024
175cd78
Update test_validation.py
icfaust Nov 3, 2024
7016ad0
remove unnecessary code
icfaust Nov 3, 2024
1a01859
Merge branch 'main' into dev/new_assert_all_fininte
icfaust Nov 18, 2024
2fbcdd9
merge master
icfaust Nov 18, 2024
fb7375f
make reviewer changes
icfaust Nov 19, 2024
30816bf
make dtype check change
icfaust Nov 19, 2024
abb3b16
add sparse testing
icfaust Nov 19, 2024
97aef73
try again
icfaust Nov 19, 2024
6e29651
try again
icfaust Nov 19, 2024
59363a8
try again
icfaust Nov 19, 2024
12de703
temporary commit
icfaust Nov 20, 2024
07ec3d8
first attempt
icfaust Nov 20, 2024
32c565d
missing change?
icfaust Nov 20, 2024
a571a4e
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 20, 2024
5093ed7
modify DummyEstimator for testing
icfaust Nov 20, 2024
f04deba
generalize DummyEstimator
icfaust Nov 20, 2024
740a5e7
switch test
icfaust Nov 20, 2024
27050bd
further testing changes
icfaust Nov 20, 2024
53c8f7b
add initial validate_data test, will be refactored
icfaust Nov 20, 2024
90f59c4
fixes for CI
icfaust Nov 20, 2024
7f170e2
Update validation.py
icfaust Nov 20, 2024
81e2bbc
Update validation.py
icfaust Nov 20, 2024
116bdba
Update test_memory_usage.py
icfaust Nov 20, 2024
076ebc4
Update base.py
icfaust Nov 20, 2024
e1d0743
Update base.py
icfaust Nov 20, 2024
f59cdd3
improve tests
icfaust Nov 20, 2024
7f9ea25
fix logic
icfaust Nov 20, 2024
51247c0
fix logic
icfaust Nov 20, 2024
6e5c0ef
fix logic again
icfaust Nov 20, 2024
8d47744
rename file
icfaust Nov 20, 2024
1ae9af5
Revert "rename file"
icfaust Nov 20, 2024
bf9b46e
remove duplication
icfaust Nov 20, 2024
3101c3f
fix imports
icfaust Nov 20, 2024
6da176b
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 20, 2024
ee799f6
Rename test_finite.py to test_validation.py
icfaust Nov 20, 2024
db4a6c6
Revert "Rename test_finite.py to test_validation.py"
icfaust Nov 20, 2024
b5acbac
updates
icfaust Nov 21, 2024
ed57c15
Update validation.py
icfaust Nov 21, 2024
414f897
fixes for some test failures
icfaust Nov 21, 2024
83253b3
fix text
icfaust Nov 21, 2024
b22e23a
fixes for some failures
icfaust Nov 21, 2024
2f8ec16
make consistent
icfaust Nov 21, 2024
1fd9973
fix bad logic
icfaust Nov 21, 2024
c20c8cc
fix in string
icfaust Nov 21, 2024
1ce1b10
attempt tp see if dataframe conversion is causing the issue
icfaust Nov 21, 2024
5355039
fix iter problem
icfaust Nov 21, 2024
b5b8442
fix testing issues
icfaust Nov 21, 2024
d025c89
formatting
icfaust Nov 21, 2024
428bfb6
revert change
icfaust Nov 21, 2024
da23138
fixes for pandas
icfaust Nov 21, 2024
1d0c330
there is a slowdown with pandas that needs to be solved
icfaust Nov 21, 2024
f3f63a6
swap to transpose for speed
icfaust Nov 21, 2024
56c8054
more clarity
icfaust Nov 21, 2024
1580d77
add _check_sample_weight
icfaust Nov 22, 2024
ffc9f1f
add more testing'
icfaust Nov 22, 2024
d184ed0
rename
icfaust Nov 22, 2024
c68616f
remove unnecessary imports
icfaust Nov 22, 2024
e7ea94e
fix test slowness
icfaust Nov 22, 2024
dbe108d
focus get_dataframes_and_queues
icfaust Nov 22, 2024
7284b59
put config_context around
icfaust Nov 22, 2024
e1be91d
Update test_validation.py
icfaust Nov 24, 2024
8a0f9e9
Update base.py
icfaust Nov 24, 2024
5272207
Update test_validation.py
icfaust Nov 24, 2024
21a7896
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 24, 2024
56b5c4c
generalize regex
icfaust Nov 25, 2024
0d1b306
add fixes for sklearn 1.0 and input_name
icfaust Nov 25, 2024
8ff312e
fixes for test failures
icfaust Nov 25, 2024
87b7e3b
Update validation.py
icfaust Nov 25, 2024
29e8f8c
Update test_validation.py
icfaust Nov 25, 2024
527ce22
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 25, 2024
27ce5fc
Update validation.py
icfaust Nov 27, 2024
5d31988
formattintg
icfaust Nov 27, 2024
c4dccd6
make suggested changes
icfaust Nov 27, 2024
f83f1ef
follow changes made in #2126
icfaust Nov 27, 2024
0356a90
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 27, 2024
e43c047
fix future device problem
icfaust Nov 27, 2024
a9504a8
Merge branch 'dev/sklearnex_assert_all_finite' of https://github.com/…
icfaust Nov 27, 2024
b799d44
Merge branch 'intel:main' into dev/sklearnex_assert_all_finite
icfaust Nov 27, 2024
5c81f9d
Update validation.py
icfaust Nov 27, 2024
6ef96b1
merge main
icfaust Nov 28, 2024
1db7575
Merge branch 'uxlfoundation:main' into dev/sklearnex_assert_all_finite
icfaust Dec 2, 2024
8fca003
Merge branch 'uxlfoundation:main' into dev/sklearnex_assert_all_finite
icfaust Dec 3, 2024
164435d
minor changes based on #2206, suggestions
icfaust Dec 4, 2024
45 changes: 7 additions & 38 deletions sklearnex/tests/test_memory_usage.py
@@ -35,10 +35,14 @@
     get_dataframes_and_queues,
 )
 from onedal.tests.utils._device_selection import get_queues, is_dpctl_device_available
-from onedal.utils._array_api import _get_sycl_namespace
 from onedal.utils._dpep_helpers import dpctl_available, dpnp_available
 from sklearnex import config_context
-from sklearnex.tests.utils import PATCHED_FUNCTIONS, PATCHED_MODELS, SPECIAL_INSTANCES
+from sklearnex.tests.utils import (
+    PATCHED_FUNCTIONS,
+    PATCHED_MODELS,
+    SPECIAL_INSTANCES,
+    DummyEstimator,
+)
 from sklearnex.utils._array_api import get_namespace

 if dpctl_available:
@@ -131,41 +135,6 @@ def gen_functions(functions):
 ORDER_DICT = {"F": np.asfortranarray, "C": np.ascontiguousarray}


-if _is_dpc_backend:
-
-    from sklearn.utils.validation import check_is_fitted
-
-    from onedal.datatypes import from_table, to_table
-
-    class DummyEstimatorWithTableConversions(BaseEstimator):
-
-        def fit(self, X, y=None):
-            sua_iface, xp, _ = _get_sycl_namespace(X)
-            X_table = to_table(X)
-            y_table = to_table(y)
-            # The presence of the fitted attributes (ending with a trailing
-            # underscore) is required for the correct check. The cleanup of
-            # the memory will occur at the estimator instance deletion.
-            self.x_attr_ = from_table(
-                X_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
-            )
-            self.y_attr_ = from_table(
-                y_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
-            )
-            return self
-
-        def predict(self, X):
-            # Checks if the estimator is fitted by verifying the presence of
-            # fitted attributes (ending with a trailing underscore).
-            check_is_fitted(self)
-            sua_iface, xp, _ = _get_sycl_namespace(X)
-            X_table = to_table(X)
-            returned_X = from_table(
-                X_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
-            )
-            return returned_X
-
-
 def gen_clsf_data(n_samples, n_features, dtype=None):
     data, label = make_classification(
         n_classes=2, n_samples=n_samples, n_features=n_features, random_state=777
@@ -369,7 +338,7 @@ def test_table_conversions_memory_leaks(dataframe, queue, order, data_shape, dty
         pytest.skip("SYCL device memory leak check requires the level zero sysman")

     _kfold_function_template(
-        DummyEstimatorWithTableConversions,
+        DummyEstimator,
         dataframe,
         data_shape,
         queue,
2 changes: 2 additions & 0 deletions sklearnex/tests/utils/__init__.py
@@ -21,6 +21,7 @@
     SPECIAL_INSTANCES,
     UNPATCHED_FUNCTIONS,
     UNPATCHED_MODELS,
+    DummyEstimator,
     _get_processor_info,
     call_method,
     gen_dataset,
@@ -39,6 +40,7 @@
     "gen_models_info",
     "gen_dataset",
     "sklearn_clone_dict",
+    "DummyEstimator",
 ]

 _IS_INTEL = "GenuineIntel" in _get_processor_info()
41 changes: 41 additions & 0 deletions sklearnex/tests/utils/base.py
@@ -32,8 +32,11 @@
 )
 from sklearn.datasets import load_diabetes, load_iris
 from sklearn.neighbors._base import KNeighborsMixin
+from sklearn.utils.validation import check_is_fitted

+from onedal.datatypes import from_table, to_table
 from onedal.tests.utils._dataframes_support import _convert_to_dataframe
+from onedal.utils._array_api import _get_sycl_namespace
 from sklearnex import get_patch_map, patch_sklearn, sklearn_is_patched, unpatch_sklearn
 from sklearnex.basic_statistics import BasicStatistics, IncrementalBasicStatistics
 from sklearnex.linear_model import LogisticRegression
@@ -369,3 +372,41 @@ def _get_processor_info():
         )

     return proc
+
+
+class DummyEstimator(BaseEstimator):
+
+    def fit(self, X, y=None):
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        X_table = to_table(X)
+        y_table = to_table(y)
+        # The presence of the fitted attributes (ending with a trailing
+        # underscore) is required for the correct check. The cleanup of
+        # the memory will occur at the estimator instance deletion.
+        if sua_iface:
+            self.x_attr_ = from_table(
+                X_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
+            )
+            self.y_attr_ = from_table(
+                y_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
+            )
+        else:
+            self.x_attr = from_table(X_table)
+            self.y_attr = from_table(y_table)
+
+        return self
+
+    def predict(self, X):
+        # Checks if the estimator is fitted by verifying the presence of
+        # fitted attributes (ending with a trailing underscore).
+        check_is_fitted(self)
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        X_table = to_table(X)
+        if sua_iface:
+            returned_X = from_table(
+                X_table, sua_iface=sua_iface, sycl_queue=X.sycl_queue, xp=xp
+            )
+        else:
+            returned_X = from_table(X_table)
+
+        return returned_X
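
The comments in `DummyEstimator` rely on the scikit-learn convention that an estimator counts as fitted once it has at least one instance attribute ending in a trailing underscore. The sketch below is a simplified, pure-Python stand-in for that rule, written only to illustrate it. `fitted_attributes`, `check_is_fitted_sketch`, `WithUnderscore`, `WithoutUnderscore` and the local `NotFittedError` are hypothetical names, not part of sklearnex or scikit-learn, and the real `check_is_fitted` has more options (explicit attribute lists, `__sklearn_is_fitted__`) that are left out here.

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError (assumption)."""


def fitted_attributes(estimator):
    # Simplified rule: instance attributes that end with "_" and do not
    # start with "__" are treated as fitted attributes.
    return [
        name
        for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    ]


def check_is_fitted_sketch(estimator):
    # Raise if no fitted attribute is present on the instance.
    if not fitted_attributes(estimator):
        raise NotFittedError(f"{type(estimator).__name__} is not fitted yet.")


class WithUnderscore:
    def fit(self, X):
        self.x_attr_ = list(X)  # trailing underscore: counts as fitted
        return self


class WithoutUnderscore:
    def fit(self, X):
        self.x_attr = list(X)  # no trailing underscore: not counted
        return self


if __name__ == "__main__":
    check_is_fitted_sketch(WithUnderscore().fit([1, 2]))  # passes silently
    try:
        check_is_fitted_sketch(WithoutUnderscore().fit([1, 2]))
    except NotFittedError as exc:
        print("raised:", exc)
```

Under this simplified rule, attributes named `x_attr` and `y_attr`, as set in the non-SYCL branch of `fit` above, would not be counted as fitted attributes, so it may be worth checking whether `predict` is expected to pass `check_is_fitted` for such inputs.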
4 changes: 2 additions & 2 deletions sklearnex/utils/__init__.py
@@ -14,6 +14,6 @@
 # limitations under the License.
 # ===============================================================================

-from .validation import _assert_all_finite
+from .validation import assert_all_finite

-__all__ = ["_assert_all_finite"]
+__all__ = ["assert_all_finite"]
89 changes: 0 additions & 89 deletions sklearnex/utils/tests/test_finite.py

This file was deleted.
