
Merge branch 'intel:main' into dev/bs_zero
icfaust authored Nov 28, 2024
2 parents 1175a98 + d9a25a5 commit e583ef5
Showing 11 changed files with 61 additions and 53 deletions.
16 changes: 9 additions & 7 deletions INSTALL.md
@@ -202,6 +202,14 @@ The build-process (using setup.py) happens in 4 stages:
```bash
python setup.py develop --no-deps
```

Where:

* The `--single-version-externally-managed` and `--no-deps` options are required so that daal4py is not downloaded after the installation of Intel(R) Extension for Scikit-learn (a combined usage sketch follows this list).
* The `develop` mode does not install the package but creates a `.egg-link` in the deployment directory
back to the project source-code directory. That way, you can edit the source code and see the changes
without reinstalling the package after a small change.
* `--single-version-externally-managed` is an option for Python packages instructing the setuptools module to create a package that the host's package manager can easily manage.
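
For illustration, a sketch combining these options (the `--record` file name is illustrative; setuptools requires `--record` together with `--single-version-externally-managed`):

```bash
# Editable install: creates the .egg-link back to the source tree,
# skipping dependency resolution
python setup.py develop --no-deps

# Regular install that the host's package manager can track;
# --record writes the list of installed files
python setup.py install --single-version-externally-managed --record=record.txt
```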
- To build the python module without installing it:
@@ -217,13 +225,7 @@
```bash
python setup.py build_ext --inplace --force --abs-rpath
python setup.py build --abs-rpath
```
Where:

* Keys `--single-version-externally-managed` and `--no-deps` are required to not download daal4py after the installation of Intel(R) Extension for Scikit-learn.
* The `develop` mode does not install the package but creates a `.egg-link` in the deployment directory
back to the project source-code directory. That way, you can edit the source code and see the changes
without reinstalling the package after a small change.
* `--single-version-externally-managed` is an option for Python packages instructing the setup tools module to create a package the host's package manager can easily manage.
**Note:** when building `scikit-learn-intelex` from source with this option, it will use the oneDAL library with which it was compiled. oneDAL has dependencies on other libraries such as TBB, which is also distributed as a python package through `pip` and as a `conda` package. By default, a conda environment will first try to load TBB from its own packages if it is installed in the environment, which might cause issues if oneDAL was compiled with a system TBB instead of a conda one. In such cases, it is advised to either uninstall TBB from pip/conda (it will be loaded from the oneDAL library which links to it), or modify the order of search paths in environment variables like `${LD_LIBRARY_PATH}`.
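
As a sketch of the suggested workarounds (the TBB location shown is an assumption and varies by system):

```bash
# Option 1: remove the pip-installed TBB so that the TBB library
# oneDAL was linked against is loaded instead
pip uninstall -y tbb

# Option 2: put the TBB used to build oneDAL first on the loader's search path
export LD_LIBRARY_PATH=/opt/intel/oneapi/tbb/latest/lib:${LD_LIBRARY_PATH}
```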
## Build from Sources with `conda-build`
2 changes: 1 addition & 1 deletion dependencies-dev
@@ -3,5 +3,5 @@ Jinja2==3.1.4
numpy==2.0.1 ; python_version <= '3.9'
numpy==2.1.3 ; python_version > '3.9'
pybind11==2.13.6
cmake==3.31.0.1
cmake==3.31.1
setuptools==75.6.0
12 changes: 6 additions & 6 deletions onedal/datatypes/_data_conversion.py
@@ -35,12 +35,12 @@ def _convert_one_to_table(arg):
def to_table(*args):
"""Create oneDAL tables from scalars and/or arrays.
Note: this implementation can be used with contiguous scipy.sparse, numpy
ndarrays, DPCTL/DPNP usm_ndarrays and scalars. Tables will use pointers to the
original array data. Scalars will be copies. Arrays may be modified in-
place by oneDAL during computation. This works for data located on CPU and
SYCL-enabled Intel GPUs. Each array may only be of a single datatype (i.e.
each must be homogeneous).
Note: this implementation can be used with scipy.sparse, numpy ndarrays,
DPCTL/DPNP usm_ndarrays and scalars. Tables will use pointers to the
original array data. Scalars and non-contiguous arrays will be copies.
Arrays may be modified in-place by oneDAL during computation. This works
for data located on CPU and SYCL-enabled Intel GPUs. Each array may only
be of a single datatype (i.e. each must be homogeneous).
Parameters
----------
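A usage sketch of the copy semantics described in the new docstring (the import path follows this package's layout and is an assumption):

```python
import numpy as np

from onedal.datatypes import to_table  # assumed import path

a = np.ones((4, 2))  # contiguous: the resulting table points at a's buffer
t_zero_copy = to_table(a)

b = np.ones((10, 10))[:, :3]  # non-contiguous view: converted through a deep copy
t_copied = to_table(b)
```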
28 changes: 18 additions & 10 deletions onedal/datatypes/data_conversion.cpp
@@ -155,18 +155,26 @@ dal::table convert_to_table(PyObject *obj) {
}
if (is_array(obj)) {
PyArrayObject *ary = reinterpret_cast<PyArrayObject *>(obj);
if (array_is_behaved_C(ary) || array_is_behaved_F(ary)) {
if (!PyArray_ISCARRAY_RO(ary) && !PyArray_ISFARRAY_RO(ary)) {
// NOTE: this will make a C-contiguous deep copy of the data
// this is expected to be a special case
ary = PyArray_GETCONTIGUOUS(ary);
if (ary) {
res = convert_to_table(reinterpret_cast<PyObject *>(ary));
Py_DECREF(ary);
return res;
}
else {
throw std::invalid_argument(
"[convert_to_table] Numpy input could not be converted into onedal table.");
}
}
#define MAKE_HOMOGEN_TABLE(CType) res = convert_to_homogen_impl<CType>(ary);
SET_NPY_FEATURE(array_type(ary),
array_type_sizeof(ary),
MAKE_HOMOGEN_TABLE,
throw std::invalid_argument("Found unsupported array type"));
#undef MAKE_HOMOGEN_TABLE
}
else {
throw std::invalid_argument(
"[convert_to_table] Numpy input Could not convert Python object to onedal table.");
}
}
else if (strcmp(Py_TYPE(obj)->tp_name, "csr_matrix") == 0 || strcmp(Py_TYPE(obj)->tp_name, "csr_array") == 0) {
PyObject *py_data = PyObject_GetAttrString(obj, "data");
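The new branch accepts any well-behaved array and falls back to a C-contiguous deep copy for the rest; in plain NumPy terms, the arrays it targets and the copy it makes look like this:

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
v = a[:, ::2]  # strided view: neither C- nor F-contiguous
assert not v.flags.c_contiguous and not v.flags.f_contiguous

c = np.ascontiguousarray(v)  # the same deep copy PyArray_GETCONTIGUOUS performs
assert c.flags.c_contiguous
```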
24 changes: 22 additions & 2 deletions onedal/datatypes/data_conversion_sua_iface.cpp
@@ -34,6 +34,7 @@

namespace oneapi::dal::python {

using namespace pybind11::literals; // enables the "copy"_a argument shorthand used below
// Please follow <https://intelpython.github.io/dpctl/latest/
// api_reference/dpctl/sycl_usm_array_interface.html#sycl-usm-array-interface-attribute>
// for the description of `__sycl_usm_array_interface__` protocol.
@@ -42,6 +43,8 @@ namespace oneapi::dal::python {
// of `__sycl_usm_array_interface__` protocol.
template <typename Type>
dal::table convert_to_homogen_impl(py::object obj) {
dal::table res{};

// Get `__sycl_usm_array_interface__` dictionary representing USM allocations.
auto sua_iface_dict = get_sua_interface(obj);

@@ -64,6 +67,25 @@ dal::table convert_to_homogen_impl(py::object obj) {
// Get oneDAL Homogen DataLayout enumeration from input object shape and strides.
const auto layout = get_sua_iface_layout(sua_iface_dict, r_count, c_count);

if (layout == dal::data_layout::unknown){
// NOTE: this will make a C-contiguous deep copy of the data
// if possible, this is expected to be a special case
py::object copy;
if (py::hasattr(obj, "copy")){
copy = obj.attr("copy")();
}
else if (py::hasattr(obj, "__array_namespace__")){
const auto space = obj.attr("__array_namespace__")();
copy = space.attr("asarray")(obj, "copy"_a = true);
}
else {
throw std::runtime_error("Wrong strides");
}
res = convert_to_homogen_impl<Type>(copy);
copy.dec_ref();
return res;
}

// Get `__sycl_usm_array_interface__['data'][0]`, the first element of data entry,
// which is a Python integer encoding USM pointer value.
const auto* const ptr = reinterpret_cast<const Type*>(get_sua_ptr(sua_iface_dict));
@@ -79,8 +101,6 @@ dal::table convert_to_homogen_impl(py::object obj) {
// Use read-only accessor for onedal table.
bool is_readonly = is_sua_readonly(sua_iface_dict);

dal::table res{};

if (is_readonly) {
res = dal::homogen_table(queue,
ptr,
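On the Python side, the copy fallback above amounts to one of these calls; a sketch with `dpctl.tensor` (assuming a SYCL device is available):

```python
import dpctl.tensor as dpt

x = dpt.ones((10, 10))[:, :3]  # non-contiguous USM view, layout reported as unknown
ns = x.__array_namespace__()  # the __array_namespace__ branch of the fallback
x_copy = ns.asarray(x, copy=True)  # C-contiguous deep copy, converted as usual
```

dpnp arrays would take the `.copy()` branch instead, since they expose a NumPy-style `copy()` method.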
15 changes: 3 additions & 12 deletions onedal/datatypes/tests/test_data.py
@@ -377,23 +377,14 @@ def test_sua_iface_interop_unsupported_dtypes(dataframe, queue, dtype):
def test_to_table_non_contiguous_input(dataframe, queue):
if dataframe in "dpnp,dpctl" and not _is_dpc_backend:
pytest.skip("__sycl_usm_array_interface__ support requires DPC backend.")
X = np.mgrid[:10, :10]
X, _ = np.mgrid[:10, :10]
X = _convert_to_dataframe(X, sycl_queue=queue, target_df=dataframe)
X = X[:, :3]
sua_iface, _, _ = _get_sycl_namespace(X)
# X expected to be non-contiguous.
assert not X.flags.c_contiguous and not X.flags.f_contiguous

# TODO:
# consistent error message.
if dataframe in "dpnp,dpctl":
expected_err_msg = (
"Unable to convert from SUA interface: only 1D & 2D tensors are allowed"
)
else:
expected_err_msg = "Numpy input Could not convert Python object to onedal table."
with pytest.raises(ValueError, match=expected_err_msg):
to_table(X)
X_t = to_table(X)
assert X_t and X_t.shape == (10, 3) and X_t.has_data


@pytest.mark.skipif(
2 changes: 1 addition & 1 deletion onedal/datatypes/utils/sua_iface_helpers.cpp
@@ -163,7 +163,7 @@ dal::data_layout get_sua_iface_layout(const py::dict& sua_dict,
return dal::data_layout::column_major;
}
else {
throw std::runtime_error("Wrong strides");
return dal::data_layout::unknown;
}
}
else {
9 changes: 0 additions & 9 deletions onedal/utils/validation.py
@@ -158,15 +158,6 @@ def _check_array(

if sp.issparse(array):
return array

# TODO: Convert this kind of arrays to a table like in daal4py
if not array.flags.aligned and not array.flags.writeable:
array = np.array(array.tolist())

# TODO: If data is not contiguous, copy it to a contiguous layout
# Needs a numpy table implementation in oneDAL
if not array.flags.c_contiguous and not array.flags.f_contiguous:
array = np.ascontiguousarray(array, array.dtype)
return array


2 changes: 1 addition & 1 deletion requirements-test.txt
@@ -9,7 +9,7 @@ numpy>=2.0.0 ; python_version >= '3.12'
scikit-learn==1.5.2
pandas==2.1.3 ; python_version < '3.11'
pandas==2.2.3 ; python_version >= '3.11'
xgboost==2.1.2
xgboost==2.1.3
lightgbm==4.5.0
catboost==1.2.7 ; python_version < '3.11' # TODO: Remove 3.11 condition when catboost supports numpy 2.0
shap==0.46.0 ; python_version < '3.13'
2 changes: 0 additions & 2 deletions sklearnex/cluster/tests/test_kmeans.py
@@ -133,8 +133,6 @@ def test_dense_vs_sparse(queue, init, algorithm, dims):
from sklearnex.cluster import KMeans

if init == "random" or (not _IS_INTEL and init == "k-means++"):
if daal_check_version((2025, "P", 200)):
pytest.fail("Re-verify failure of k-means++ in 2025.2 oneDAL")
pytest.skip(f"{init} initialization for sparse K-means is non-conformant.")

# For higher level of sparsity (smaller density) the test may fail
2 changes: 0 additions & 2 deletions sklearnex/tests/test_run_to_run_stability.py
@@ -153,8 +153,6 @@ def _skip_neighbors(estimator, method):
and method
in ["score", "predict", "kneighbors", "kneighbors_graph", "predict_proba"]
):
if daal_check_version((2025, "P", 200)):
pytest.fail("Re-verify failure of algorithms in oneDAL 2025.2")
pytest.skip(f"{estimator} shows instability on non-Intel(R) hardware")

