[FSTORE-1439] Merge hopsworks-api, feature-store-api and machine-learning-api #230

Merged
aversey merged 929 commits into main from dev-merge
Jul 17, 2024

Conversation

aversey
Contributor

@aversey aversey commented Jul 16, 2024

No description provided.

moritzmeister and others added 30 commits May 5, 2023 17:55
…atures (#1011)

* accept datetime object in filter

* removed unused variable

* support for unbinded features

* Update python/tests/core/test_arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

* Update python/tests/core/test_arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

* simplified diff

* Update python/hsfs/core/arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

---------

Co-authored-by: Ralf <[email protected]>
…ing (#1008)

* cleaned up read printout

* updated message

* filter only userwarnings
* change wait default for fg.insert/save

* fixed test

* added get_state to job class

* added final state and updated docs

* updated docs

* reverted import change

* moved job config below job

* added wait parameter

* fixed tests

* do not override write_options with wait= parameter

* one more place

---------

Co-authored-by: Moritz Meister <[email protected]>
* working state

* refactored query parameters

* removed numeric parameter

* fix bug

* look in all features in all featuregroups

* support excluded features

* added features property to featureview

* include transformation functions in repr

* fix column check on join

* updated warning messages

* more refactoring, fixed tests

* fixed tests, added filter check

* fixed test

* fixed ambiguity issue with filter params

* fixed tests and cleanup

* check full filter

* fix append feature and check filter there as well

* don't check merged filter on .filter

* working state

* removed checks

* some renaming

* made lookup function private/protected

* added docs

* fixed cached check

* updated error messages, reverted transformation function test changes

* added tests

* added add_features_to_collection method

* collect features on request

* fixed comment
* removed pandas version pinning

* patched pandas, patches jdbc tests, added unittests

* removed pandas dep from dev

* removed pandas1sqlachemy2, pandas2sqlalchemy2

* rename action

* added Dev-pandas1 for ci testing on pandas 1.5.3

* added fsspec dependency, required by pandas.read

* fixed statistics serialization
* Decamelize dict to pass correct args

* Fix id in from_ge_type

* Added ids to dict for conversions

* allow extra kwargs
gibchikafa and others added 15 commits July 3, 2024 19:52
… path (#1358)

* Append MATERIAL_DIRECTORY env var value to certificates path

* Also include MATERIAL_DIR in token.jwt
* Fix test_base_client.py

The tests were impure: they changed environment variables and thereby broke other tests. This commit wraps them in a decorator that saves and restores os.environ.

* Move changes_environ to tests/utils.py
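
A minimal sketch of what a decorator like `changes_environ` could look like, assuming it snapshots `os.environ` before the test and restores it afterwards. The body is an illustration, not the code from this PR; the `set_material_directory` function and the `/tmp/material` value are hypothetical.

```python
import functools
import os


def changes_environ(test_func):
    """Run test_func, then restore os.environ to its prior contents."""

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        saved = dict(os.environ)  # snapshot taken before the test runs
        try:
            return test_func(*args, **kwargs)
        finally:
            # Restore in place so existing references to os.environ stay valid.
            os.environ.clear()
            os.environ.update(saved)

    return wrapper


@changes_environ
def set_material_directory():
    # Hypothetical test body that mutates the environment.
    os.environ["MATERIAL_DIRECTORY"] = "/tmp/material"
    return os.environ["MATERIAL_DIRECTORY"]
```

Restoring in a `finally` block means the environment is reset even when the wrapped test raises, which is what keeps a failing test from leaking state into later ones.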
* Refactor mysql dependencies out of util

* Fix tests for mysql

* Start cleaning some dependencies under type_checking
* Start removing hive related dependencies

* More hive cleanups

* Update ruff pre-commit hook

* Start fixing pytest

* Remove hive related tests

* Review
…imary_key" (#1357)

* primary_keys -> primary_key

* backwards compatibility

* backwards compatibility

* Review and fix missed foc change
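
The `primary_keys -> primary_key` rename with backwards compatibility can be sketched as follows. This is an assumption about the general pattern, not the code from this PR: `resolve_primary_key` is a hypothetical helper, and the deprecation warning is one common way to keep the old name working.

```python
import warnings


def resolve_primary_key(primary_key=None, **kwargs):
    """Return the primary key list, still accepting the old `primary_keys` name."""
    if "primary_keys" in kwargs:
        legacy = kwargs.pop("primary_keys")
        warnings.warn(
            "`primary_keys` is deprecated, use `primary_key` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new name wins when both are given.
        if primary_key is None:
            primary_key = legacy
    return list(primary_key or [])
```

Callers using the old keyword keep working and see a warning, while new code passes `primary_key` directly.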

---------

Co-authored-by: davitbzh <[email protected]>
Co-authored-by: Victor Jouffrey <[email protected]>
Co-authored-by: Victor Jouffrey <[email protected]>
* feature logging

* clean up

* default write option

* doc

* fix

* fix style

* revert pd.StringDtype()

* fix doc

* fix doc

* doc and copy df

* get untransformed features

* rename transformed_features

* get untransformed features

* return batch df

* address comments

* rename internal methods

* fix style

* add warning

* fix circular import

* set logging_enabled to True

* fix style
* Start Kafka refactor

* Continue refactor of kafka/avro out of python engine

* Minor tidy up of args

* Tidy up python engine

* Minor refactoring of multi_part_insert and kafka headers

* Options to get_kafka_config with spark options

* Fix mocking

* Move and fix kafka tests

* More fixing of mocking

* More work on unit test fixing

* Fix with import reload

* Fix mocking in test_feature_group_writer

* Fix mocking in engine/test_python

---------

Co-authored-by: Aleksey Veresov <[email protected]>
* left join

* snowflake

* test_join

---------

Co-authored-by: davitbzh <[email protected]>
Co-authored-by: Dhananjay Mukhedkar <[email protected]>
* hopsworks_udf first version

* working code for running hopsworks udf without saving in backend using python client

* removing debugging logs

* statistics working with python client

* basic functionality working with backend

* code with statistics working and saved to backend

* working code for feature vector

* reformatted and documented Hopsworks UDF class

* unit tests for transformation functions

* cleaning transformations engine and adding unit tests

* feature view api formatted

* reformatting and fixing feature_view_engine

* reformatted and added unit tests for feature view

* updating documentation for feature store

* updating documentation for feature store

* fixed tests for training dataset features

* reformatted and added unit tests for python engine

* most unit tests fixed

* all unit tests working

* removed print

* adding test for hopsworks_udf

* correcting merge for vector server

* reformatting with ruff

* fixing vector server

* fixing docs

* fixing vector server

* fixing built-in transformations

* correcting get feature vector

* adding missed changes for built-in transformations

* shallow copying scope dictionary to not overwrite statistics variable for different UDFs having same statistics parameter name

* adding deep copy to create multiple transformation functions with different features

* sorting transformation function to maintain consistent order

* sorting transformation functions in transformation function engine to maintain same order

* using feature view transformation functions

* addressing review comments

* using PYARROW_EXTENSION_ENABLE during import rather than as a function

* skipping transformation function test on Windows, where the Spark UDF fails due to dependencies with Great Expectations

* changing transformed_feature_vector_col_name to transformed_features to obtain feature names after transformations

* adding property transformed_features in feature view to obtain feature names after transformations

* updating doc string and adding property decorator missed during rebase

* refactoring transformation functions to update parsing of statistics parameters and also renaming decorator name

* refactoring transformation functions to update parsing of statistics parameters and also renaming decorator name

* reformatting with ruff

* adding statistics to udf only if required

* converting extended statistics to dictionary

* sorting built in label encoder to maintain consistency

* adding type hints for class TransformationStatistics

* adapting to backend update of returning output_types, transformation_features and statistics_argument_names as Lists

* fixing unit tests

* removing space in doc string

* replace - from output column names with _

* reverting unwanted spark test _ replace changes

* adding missed import

* correcting to_dict feature view

* reverting python.py unintentional changes during rebase

* rebase and adding back missed import
@aversey aversey changed the base branch from main to dev July 16, 2024 14:29
@aversey aversey changed the base branch from dev to main July 16, 2024 14:29
@aversey aversey requested a review from vatj July 16, 2024 14:31
…ning-api

* Remove redundant java-related files

* Move java code of hsfs

* Move locust_benchmark of hsfs

* Move python code of hsfs

* Move python code of hsml

* Move hsml tests

* Remove redundant __init__.py in hsml/tests

* Move hsfs tests

* Merge hsfs tests

* Fix the problem with impure tests changing backend_fixtures

One of the tests is impure: it changes backend_fixtures, which breaks test_model.
This commit makes backend_fixtures return a deep copy of backend_fixtures_json instead of returning it directly.

* Fix ruff check of test_base_client

* Move hsfs utils

* Merge hsfs docs

* Fix pyproject dependencies for hsfs docs

* Merge the rest of hsfs docs

* Merge hsfs mkdocs

* Merge hsml docs

* Add files generated by auto_doc to gitignore

* Merge hsfs pyproject.toml

* Merge hsml pyproject.toml

* Remove hsfs and hsml pyproject.toml

* Remove hive from pyproject dependencies

* Fix mistypes in pyproject

* Fix documentation generation

* Fix circular dependencies

* Fix circular dependency in hsml (CONNECTION_SAAS_HOSTNAME)

* Fix circular dependency in hsml (model/util)

* Ruff fix hsml/util

* Fix docgen

There is a very strange bug: hsml.python for some reason is available only as python.hsml.python.

* Fix hsml.python import errors

* Fix import problem in docgen

* Ruff fix auto_doc

* Revert the move of requirements-docs to an extra

See PR #209

* Move workflows

* Merge workflows

* Remove redundant github files

* Skip test_login until Robin is back

* Fix test_connection

* Fix test_hopsworks_udf

* Fix import problem in docgen

* Fix pyproject optional docgen dependency

* Revert the move of requirements-docs to an extra

See PR #209

* Rename unit_tests_pandas to unit_tests_pandas1 for clarity

* Rename unit_tests_typechecked

* Merge workflows

* Fix import problem in docgen

* Merge and remove the rest of python files

* Merge gitignore

* Move docker and jenkins files

* Merge CONTRIBUTING

* Merge README and remove hsfs and hsml subdirectories

* Revert the move of requirements-docs to an extra

See PR #209

* Add hopsworks_common
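
The `backend_fixtures` fix described in the commit list above ("Fix the problem with impure tests changing backend_fixtures") can be sketched roughly like this. It is an illustration under assumptions: the dictionary contents are a made-up stand-in for the parsed fixture JSON, and in a real test suite the function would presumably be registered as a pytest fixture.

```python
import copy

# Stand-in for the parsed fixture JSON; the real content is not shown in this PR.
backend_fixtures_json = {"model": {"get": {"response": {"id": 1}}}}


def backend_fixtures():
    """Return a fresh deep copy so a test cannot mutate the shared data."""
    return copy.deepcopy(backend_fixtures_json)
```

A deep copy is needed rather than a shallow one because the fixtures are nested: a shallow copy would still share the inner dictionaries between tests.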
@aversey aversey force-pushed the dev-merge branch 3 times, most recently from 474ebff to 67484a9 Compare July 17, 2024 10:29
@aversey aversey merged commit 640b360 into main Jul 17, 2024
52 checks passed
@aversey aversey self-assigned this Jul 25, 2024
@aversey aversey deleted the dev-merge branch September 23, 2024 07:52