Commit

[FSTORE-1633] Fix engine choice in case of connection to serverless (…
…4.1) (#427)

* [FSTORE-1633] Fix engine choice in case of connection to serverless

* Bump version to 4.1.3
aversey authored Dec 6, 2024
1 parent 2515818 commit c9d6d74
Showing 8 changed files with 30 additions and 26 deletions.
2 changes: 1 addition & 1 deletion java/beam/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>hsfs-parent</artifactId>
<groupId>com.logicalclocks</groupId>
-<version>4.1.2</version>
+<version>4.1.3</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/flink/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>hsfs-parent</artifactId>
<groupId>com.logicalclocks</groupId>
-<version>4.1.2</version>
+<version>4.1.3</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/hsfs/pom.xml
@@ -5,7 +5,7 @@
<parent>
<artifactId>hsfs-parent</artifactId>
<groupId>com.logicalclocks</groupId>
-<version>4.1.2</version>
+<version>4.1.3</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/pom.xml
@@ -7,7 +7,7 @@
<groupId>com.logicalclocks</groupId>
<artifactId>hsfs-parent</artifactId>
<packaging>pom</packaging>
-<version>4.1.2</version>
+<version>4.1.3</version>
<modules>
<module>hsfs</module>
<module>spark</module>
2 changes: 1 addition & 1 deletion java/spark/pom.xml
@@ -22,7 +22,7 @@
<parent>
<artifactId>hsfs-parent</artifactId>
<groupId>com.logicalclocks</groupId>
-<version>4.1.2</version>
+<version>4.1.3</version>
</parent>
<modelVersion>4.0.0</modelVersion>

42 changes: 23 additions & 19 deletions python/hopsworks_common/connection.py
@@ -24,7 +24,7 @@
import weakref
from typing import Any, Optional

-from hopsworks_common import client, usage, util, version
+from hopsworks_common import client, constants, usage, util, version
from hopsworks_common.core import (
hosts_api,
project_api,
@@ -98,13 +98,12 @@ class Connection:
project: The name of the project to connect to. When running on Hopsworks, this
defaults to the project from where the client is run from.
Defaults to `None`.
-engine: Which engine to use, `"spark"`, `"python"` or `"training"`. Defaults to `None`,
-which initializes the engine to Spark if the environment provides Spark, for
-example on Hopsworks and Databricks, or falls back to Python if Spark is not
-available, e.g. on local Python environments or AWS SageMaker. This option
-allows you to override this behaviour. `"training"` engine is useful when only
-feature store metadata is needed, for example training dataset location and label
-information when Hopsworks training experiment is conducted.
+engine: Specifies the engine to use. Possible options are "spark", "python", "training", "spark-no-metastore", or "spark-delta". The default value is None, which automatically selects the engine based on the environment:
+"spark": Used if Spark is available and the connection is not to serverless Hopsworks, such as in Hopsworks or Databricks environments.
+"python": Used in local Python environments or AWS SageMaker when Spark is not available or the connection is done to serverless Hopsworks.
+"training": Used when only feature store metadata is needed, such as for obtaining training dataset locations and label information during Hopsworks training experiments.
+"spark-no-metastore": Functions like "spark" but does not rely on the Hive metastore.
+"spark-delta": Minimizes dependencies further by avoiding both Hive metastore and HopsFS.
hostname_verification: Whether or not to verify Hopsworks' certificate, defaults
to `True`.
trust_store_path: Path on the file system containing the Hopsworks certificates,
@@ -338,30 +337,35 @@ def connect(self) -> None:
self._connected = True
finalizer = weakref.finalize(self, self.close)
try:
+external = client.base.Client.REST_ENDPOINT not in os.environ
+serverless = self._host == constants.HOSTS.APP_HOST
# determine engine, needed to init client
-if (self._engine is not None and self._engine.lower() == "spark") or (
-self._engine is None and importlib.util.find_spec("pyspark")
+if (
+self._engine is None
+and importlib.util.find_spec("pyspark")
+and (not external or not serverless)
):
self._engine = "spark"
-elif (self._engine is not None and self._engine.lower() == "python") or (
-self._engine is None and not importlib.util.find_spec("pyspark")
-):
+elif self._engine is None:
self._engine = "python"
+elif self._engine.lower() == "spark":
+self._engine = "spark"
+elif self._engine.lower() == "python":
+self._engine = "python"
-elif self._engine is not None and self._engine.lower() == "training":
+elif self._engine.lower() == "training":
self._engine = "training"
-elif (
-self._engine is not None
-and self._engine.lower() == "spark-no-metastore"
-):
+elif self._engine.lower() == "spark-no-metastore":
self._engine = "spark-no-metastore"
+elif self._engine.lower() == "spark-delta":
+self._engine = "spark-delta"
else:
raise ConnectionError(
"Engine you are trying to initialize is unknown. "
"Supported engines are `'spark'`, `'python'` and `'training'`."
)

# init client
-if client.base.Client.REST_ENDPOINT not in os.environ:
+if external:
client.init(
"external",
self._host,
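The engine selection that this change introduces in `connect()` can be read as a small pure function. The sketch below is only an illustration of that branch order, not code from the commit: `resolve_engine`, `SUPPORTED_ENGINES`, and the boolean parameters are hypothetical names, with `pyspark_available` standing in for `importlib.util.find_spec("pyspark")`, `external` for the `REST_ENDPOINT` environment check, and `serverless` for the host comparison against `constants.HOSTS.APP_HOST`.

```python
from typing import Optional

# Engine names accepted by the branches in connect(); the tuple itself is a
# hypothetical helper introduced for this sketch.
SUPPORTED_ENGINES = (
    "spark",
    "python",
    "training",
    "spark-no-metastore",
    "spark-delta",
)


def resolve_engine(
    engine: Optional[str],
    pyspark_available: bool,
    external: bool,
    serverless: bool,
) -> str:
    """Return the engine name following the branch order of the new code."""
    if engine is None:
        # Auto-detection: choose Spark only when pyspark is importable and the
        # connection is not an external connection to the serverless host.
        if pyspark_available and (not external or not serverless):
            return "spark"
        return "python"
    # An explicit choice is honoured regardless of the environment.
    normalized = engine.lower()
    if normalized in SUPPORTED_ENGINES:
        return normalized
    raise ConnectionError(f"Unknown engine: {engine!r}")


if __name__ == "__main__":
    # pyspark installed locally, connecting externally to serverless Hopsworks
    print(resolve_engine(None, pyspark_available=True, external=True, serverless=True))
```

Under these assumptions, an explicit `engine="spark"` still selects Spark against a serverless host; only the auto-detected default changes. The unchanged error message in the hunk above names three engines, while the branches accept five.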
2 changes: 1 addition & 1 deletion python/hopsworks_common/version.py
@@ -14,4 +14,4 @@
# limitations under the License.
#

-__version__ = "4.1.2"
+__version__ = "4.1.3"
2 changes: 1 addition & 1 deletion utils/java/pom.xml
@@ -5,7 +5,7 @@

<groupId>com.logicalclocks</groupId>
<artifactId>hsfs-utils</artifactId>
-<version>4.1.2</version>
+<version>4.1.3</version>

<properties>
<hops.version>3.2.0.0-SNAPSHOT</hops.version>