Replace hsfs with hopsworks where it is possible in docs
aversey committed Oct 25, 2024
1 parent 36ee228 commit fea4590
Showing 8 changed files with 86 additions and 18 deletions.
2 changes: 1 addition & 1 deletion python/hopsworks_common/client/online_store_rest_client.py
@@ -305,7 +305,7 @@ def _check_hopsworks_connection(self) -> None:
assert (
client.get_instance() is not None and client.get_instance()._connected
), """Hopsworks Client is not connected. Please connect to Hopsworks cluster
via hopsworks.login or hsfs.connection before initialising the Online Store REST Client.
via hopsworks.login before initialising the Online Store REST Client.
"""
_logger.debug("Hopsworks connection is active.")

69 changes: 68 additions & 1 deletion python/hopsworks_common/connection.py
@@ -477,7 +477,74 @@ def connection(
api_key_file: Optional[str] = None,
api_key_value: Optional[str] = None,
) -> Connection:
"""Connection factory method, accessible through `hopsworks.connection()`."""
"""Connection factory method, accessible through `hopsworks.connection()`.
This class provides convenience classmethods accessible from the `hopsworks`-module:
!!! example "Connection factory"
For convenience, `hopsworks` provides a factory method, accessible from the top level
module, so you don't have to import the `Connection` class manually:
```python
import hopsworks
conn = hopsworks.connection()
```
!!! hint "Save API Key as File"
To get started quickly, you can simply create a file with the previously
created Hopsworks API Key and place it on the environment from which you
wish to connect to Hopsworks.
You can then connect by simply passing the path to the key file when
instantiating a connection:
```python hl_lines="6"
import hopsworks
conn = hopsworks.connection(
'my_instance', # DNS of your Hopsworks instance
443, # Port to reach your Hopsworks instance, defaults to 443
api_key_file='hopsworks.key', # The file containing the API key generated above
hostname_verification=True, # Set to False for self-signed certificates
)
project = conn.get_project("my_project")
```
Clients in external clusters need to connect to Hopsworks using an
API key. The API key is generated inside the Hopsworks platform and requires at
least the "project" scope to access a project.
For more information, see the [integration guides](../setup.md).
# Arguments
host: The hostname of the Hopsworks instance in the form of `[UUID].cloud.hopsworks.ai`,
defaults to `None`. Do **not** include `https://` in the URL when connecting
programmatically.
port: The port on which the Hopsworks instance can be reached,
defaults to `443`.
project: The name of the project to connect to. When running on Hopsworks, this
defaults to the project from which the client is run.
Defaults to `None`.
engine: Which engine to use, `"spark"`, `"python"` or `"training"`. Defaults to `None`,
which initializes the engine to Spark if the environment provides Spark, for
example on Hopsworks and Databricks, or falls back on Hive in Python if Spark is not
available, e.g. on local Python environments or AWS SageMaker. This option
allows you to override this behaviour. The `"training"` engine is useful when only
feature store metadata is needed, for example the training dataset location and label
information when a Hopsworks training experiment is conducted.
hostname_verification: Whether or not to verify Hopsworks' certificate, defaults
to `True`.
trust_store_path: Path on the file system containing the Hopsworks certificates,
defaults to `None`.
cert_folder: The directory to store retrieved HopsFS certificates, defaults to
`"/tmp"`. Only required when running without a Spark environment.
api_key_file: Path to a file containing the API Key, defaults to `None`.
api_key_value: API Key as a string. If provided, `api_key_file` is ignored;
however, this should be used with care, especially if the notebook or
job script is accessible by multiple parties. Defaults to `None`.
# Returns
`Connection`. Connection handle to perform operations on a
Hopsworks project.
"""
return cls(
host,
port,
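Two of the defaults documented in the `connection` docstring are easy to misread: `api_key_value` takes precedence over `api_key_file`, and `engine=None` selects Spark only when the environment provides it. The sketch below restates those two rules as plain functions. `resolve_api_key` and `resolve_engine` are hypothetical helpers, not part of `hopsworks_common.connection`, and treating "Spark is available" as "`pyspark` is importable" is an assumption made only for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def resolve_api_key(
    api_key_file: Optional[str] = None, api_key_value: Optional[str] = None
) -> Optional[str]:
    """Return the API key, preferring the inline value (hypothetical helper)."""
    if api_key_value is not None:
        # Per the docstring: if api_key_value is provided, api_key_file is ignored.
        return api_key_value
    if api_key_file is not None:
        # Strip the trailing newline most editors add when saving the key file.
        return Path(api_key_file).read_text().strip()
    return None


def resolve_engine(engine: Optional[str] = None) -> str:
    """Pick an engine name; an explicit choice always wins (hypothetical helper)."""
    if engine is not None:
        if engine not in ("spark", "python", "training"):
            raise ValueError(f"Unknown engine: {engine!r}")
        return engine
    # Assumption: "the environment provides Spark" ~ pyspark can be imported.
    return "spark" if importlib.util.find_spec("pyspark") is not None else "python"


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.NamedTemporaryFile("w", suffix=".key", delete=False) as f:
        f.write("secret-from-file\n")
    print(resolve_api_key(api_key_file=f.name))  # secret-from-file
    print(resolve_api_key(api_key_file=f.name, api_key_value="inline"))  # inline
    os.unlink(f.name)
    print(resolve_engine())  # "spark" or "python", depending on the environment
```

Keeping the file-versus-value precedence in one place makes the warning in the docstring concrete: an inline key silently shadows whatever file path is also passed.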
2 changes: 1 addition & 1 deletion python/hopsworks_common/project.py
@@ -129,7 +129,7 @@ def get_feature_store(
name: Project name of the feature store.
engine: Which engine to use, `"spark"`, `"python"` or `"training"`.
Defaults to `"python"` when connected to [Serverless Hopsworks](https://app.hopsworks.ai).
See hsfs.Connection.connection documentation for more information.
See [`hopsworks.connection`](connection.md#connection) documentation for more information.
# Returns
`hsfs.feature_store.FeatureStore`: The Feature Store API
# Raises
2 changes: 1 addition & 1 deletion python/hsfs/feature_store.py
@@ -458,7 +458,7 @@ def sql(
For spark engine: Dictionary of read options for Spark.
For python engine:
If running queries on the online feature store, users can provide an entry `{'external': True}`,
this instructs the library to use the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) to establish the connection to the online feature store.
this instructs the library to use the `host` parameter in the [`hopsworks.login()`](login.md#login) to establish the connection to the online feature store.
If not set, or set to False, the online feature store storage connector is used, which relies on
the private IP.
Defaults to `{}`.
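The `{'external': True}` read option documented for `sql` is effectively a switch between two addresses for the online feature store. Here is a minimal sketch of that choice, assuming the public host and the private IP are already known. `online_store_address`, `login_host`, and `private_ip` are hypothetical names, not `hsfs` internals, and the host and IP values are placeholders.

```python
from typing import Any, Dict, Optional


def online_store_address(
    read_options: Optional[Dict[str, Any]], login_host: str, private_ip: str
) -> str:
    """Pick the address used to reach the online feature store (hypothetical helper)."""
    # None stands in for the documented default of `{}`, avoiding a mutable default.
    options = read_options or {}
    if options.get("external", False):
        # External: reuse the `host` that was passed to hopsworks.login().
        return login_host
    # Not set, or False: fall back to the storage connector's private IP.
    return private_ip


print(online_store_address({"external": True}, "example.cloud.hopsworks.ai", "10.0.0.5"))
# example.cloud.hopsworks.ai
print(online_store_address({}, "example.cloud.hopsworks.ai", "10.0.0.5"))
# 10.0.0.5
```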
14 changes: 7 additions & 7 deletions python/hsfs/feature_view.py
@@ -337,7 +337,7 @@ def init_serving(
Transformation statistics are fetched from training dataset and applied to the feature vector.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used which relies on the private IP.
Defaults to True if the connection to Hopsworks is established from an external environment (e.g. AWS
SageMaker or Google Colab), otherwise to False.
@@ -592,7 +592,7 @@ def get_feature_vector(
providing feature values which are not available in the feature store.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -705,7 +705,7 @@ def get_feature_vectors(
providing feature values which are not available in the feature store.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -777,7 +777,7 @@ def get_inference_helper(
Set of required primary keys is [`feature_view.primary_keys`](#primary_keys)
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -835,7 +835,7 @@ def get_inference_helpers(
Set of required primary keys is [`feature_view.primary_keys`](#primary_keys)
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -912,7 +912,7 @@ def find_neighbors(
filter: A filter expression to restrict the search space (optional).
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -3567,7 +3567,7 @@ def transform(
feature_vector: `Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame]`. The feature vector to be transformed.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
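Every `external` parameter touched in `feature_view.py` documents the same rule: an explicit `True` or `False` wins, and otherwise the value follows where the client runs. Below is a minimal sketch of that rule, assuming the parameter uses `None` as its "not provided" sentinel and that whether the client runs externally is already known. `resolve_external` and `client_is_external` are hypothetical names, not `hsfs` API.

```python
from typing import Optional


def resolve_external(external: Optional[bool], client_is_external: bool) -> bool:
    """Apply the documented default for `external` (hypothetical helper)."""
    if external is not None:
        # An explicit True/False from the caller always wins.
        return external
    # Default: True when connected from an external environment
    # (e.g. AWS SageMaker or Google Colab), otherwise False.
    return client_is_external


for external in (None, True, False):
    for client_is_external in (True, False):
        print(external, client_is_external, "->", resolve_external(external, client_is_external))
```

In this sketch, using `None` rather than `False` as the default is what lets the helper tell "not provided" apart from an explicit `False`.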
6 changes: 3 additions & 3 deletions python/hsfs/training_dataset.py
@@ -1007,7 +1007,7 @@ def init_prepared_statement(
initialised for retrieving serving vectors as a batch.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -1024,7 +1024,7 @@ def get_serving_vector(
serving application.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
@@ -1046,7 +1046,7 @@ def get_serving_vectors(
serving application.
external: boolean, optional. If set to True, the connection to the
online feature store is established using the same host as
for the `host` parameter in the [`hsfs.connection()`](connection_api.md#connection) method.
for the `host` parameter in the [`hopsworks.login()`](login.md#login) method.
If set to False, the online feature store storage connector is used
which relies on the private IP. Defaults to True if the connection to Hopsworks is established from
an external environment (e.g. AWS SageMaker or Google Colab), otherwise to False.
5 changes: 3 additions & 2 deletions python/hsml/core/dataset_api.py
@@ -61,10 +61,11 @@ def upload(
"""Upload a file to the Hopsworks filesystem.
```python
import hopsworks
conn = hsml.connection(project="my-project")
project = hopsworks.login(project="my-project")
dataset_api = conn.get_dataset_api()
dataset_api = project.get_dataset_api()
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
4 changes: 2 additions & 2 deletions python/tests/test_connection.py
@@ -14,12 +14,12 @@
# limitations under the License.
#

from hsml.connection import (
from hopsworks_common.connection import (
HOPSWORKS_PORT_DEFAULT,
HOSTNAME_VERIFICATION_DEFAULT,
Connection,
)
from hsml.constants import HOSTS
from hopsworks_common.constants import HOSTS


class TestConnection: