[FSTORE-1440] Merge the python libraries into their subdirectories #211

Merged 930 commits into logicalclocks:dev-merge on Jul 15, 2024

@aversey (Contributor) commented Jun 27, 2024

The end goal of these changes is to merge hopsworks-api, feature-store-api, and machine-learning-api into a single project and repository, and to enable dependency management via Python extras.

Several PRs implement this. This is the first, and it simply merges hsfs and hsml into their corresponding subdirectories.

The next PR: #215

moritzmeister and others added 30 commits May 9, 2023 15:09
…atures (#1011)

* accept datetime object in filter

* removed unused variable

* support for unbinded features

* Update python/tests/core/test_arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

* Update python/tests/core/test_arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

* simplified diff

* Update python/hsfs/core/arrow_flight_client.py

Co-authored-by: Ralf <[email protected]>

---------

Co-authored-by: Ralf <[email protected]>
…ing (#1008)

* cleaned up read printout

* updated message

* filter only userwarnings
* change wait default for fg.insert/save

* fixed test

* added get_state to job class

* added final state and updated docs

* updated docs

* reverted import change

* moved job config below job

* added wait parameter

* fixed tests

* do not override write_options with wait= parameter

* one more place

---------

Co-authored-by: Moritz Meister <[email protected]>
* working state

* refactored query parameters

* removed numeric parameter

* fix bug

* look in all features in all featuregroups

* support excluded features

* added features property to featureview

* include transformation functions in repr

* fix column check on join

* updated warning messages

* more refactoring, fixed tests

* fixed tests, added filter check

* fixed test

* fixed ambiguity issue with filter params

* fixed tests and cleanup

* check full filter

* fix append feature and check filter there as well

* don't check merged filter on .filter

* working state

* removed checks

* some renaming

* made lookup function private/protected

* added docs

* fixed cached check

* updated error messages, reverted transformation function test changes

* added tests

* added add_features_to_collection method

* collect features on request

* fixed comment
* removed pandas version pinning

* patched pandas, patches jdbc tests, added unittests

* removed pandas dep from dev

* removed pandas1sqlachemy2, pandas2sqlalchemy2

* rename action

* added Dev-pandas1 for ci testing on pandas 1.5.3

* added fsspec dependency, required by pandas.read

* fixed statistics serialization
* Decamelize dict to pass correct args

* Fix id in from_ge_type

* Added ids to dict for conversions

* allow extra kwargs
gibchikafa and others added 7 commits July 3, 2024 19:52
… path (#1358)

* Append MATERIAL_DIRECTORY env var value to certificates path

* Also include MATERIAL_DIR in token.jwt
* Fix test_base_client.py

The test was impure: it changed environment variables and thereby interfered with other tests. This commit wraps these tests in a decorator that saves and restores os.environ.

* Move changes_environ to tests/utils.py
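
The decorator described in the commit above could look roughly like the following. This is a minimal sketch, not the code from the PR: only the name `changes_environ` comes from the commit message, while the implementation and the example function `sets_material_directory` are assumptions made for illustration.

```python
import functools
import os


def changes_environ(func):
    """Run func, then restore os.environ to its state before the call.

    Assumed implementation; the actual helper in tests/utils.py may differ.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        saved = dict(os.environ)  # snapshot taken before the test mutates anything
        try:
            return func(*args, **kwargs)
        finally:
            # Restore in place so existing references to os.environ stay valid.
            os.environ.clear()
            os.environ.update(saved)

    return wrapper


@changes_environ
def sets_material_directory():
    # Hypothetical test body: mutates the environment, as the impure tests did.
    os.environ["MATERIAL_DIRECTORY"] = "/tmp/material"
    return os.environ["MATERIAL_DIRECTORY"]
```

Restoring inside `finally` means the environment is reset even when the wrapped test raises, so a failing test cannot leak state into later ones.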
* Refactor mysql dependencies out of util

* Fix tests for mysql

* Start cleaning some dependencies under type_checking
* Start removing hive related dependencies

* More hive cleanups

* Update ruff pre-commit hook

* Start fixing pytest

* Remove hive related tests

* Review
…imary_key" (#1357)

* primary_keys -> primary_key

* backwards compatibility

* backwards compatibility

* Review and fix missed doc change

---------

Co-authored-by: davitbzh <[email protected]>
Co-authored-by: Victor Jouffrey <[email protected]>
Co-authored-by: Victor Jouffrey <[email protected]>
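
A rename such as primary_keys -> primary_key is commonly kept backwards compatible by still accepting the old keyword and warning about it. The sketch below shows that general pattern only; the function `resolve_primary_key` is hypothetical and is not taken from the PR, whose affected API is not shown here.

```python
import warnings


def resolve_primary_key(primary_key=None, **kwargs):
    """Return the primary key, accepting the deprecated `primary_keys` keyword.

    Hypothetical helper illustrating a keyword rename with a fallback.
    """
    legacy = kwargs.pop("primary_keys", None)
    if legacy is not None:
        warnings.warn(
            "`primary_keys` is deprecated, use `primary_key` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The new keyword wins if both are supplied.
        if primary_key is None:
            primary_key = legacy
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return primary_key
```

Callers using the old keyword keep working and see a DeprecationWarning, while unknown keywords still fail loudly.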
* feature logging

* clean up

* default write option

* doc

* fix

* fix style

* revert pd.StringDtype()

* fix doc

* fix doc

* doc and copy df

* get untransformed features

* rename transformed_features

* get untransformed features

* return batch df

* address comments

* rename internal methods

* fix style

* add warning

* fix circular import

* set logging_enabled to True

* fix style
aversey pushed a commit to aversey/hopsworks-api that referenced this pull request Jul 5, 2024
vatj and others added 8 commits July 8, 2024 12:48
* Start Kafka refactor

* Continue refactor of kafka/avro out of python engine

* Minor tidy up of args

* Tidy up python engine

* Minor refactoring of multi_part_insert and kafka headers

* Options to get_kafka_config with spark options

* Fix mocking

* Move and fix kafka tests

* More fixing of mocking

* More work on unit test fixing

* Fix with import reload

* Fix mocking in test_feature_group_writer

* Fix mocking in engine/test_python

---------

Co-authored-by: Aleksey Veresov <[email protected]>
* left join

* snowflake

* test_join

---------

Co-authored-by: davitbzh <[email protected]>
Co-authored-by: Dhananjay Mukhedkar <[email protected]>
* hopsworks_udf first version

* working code for running hopsworks udf without saving in backend using python client

* removing debugging logs

* statistics working with python client

* basic functionality working with backend

* code with statistics working and saved to backend

* working code for feature vector

* reformatted and documented Hopsworks UDF class

* unit tests for transformation functions

* cleaning transformations engine and adding unit tests

* feature view api formatted

* reformatting and fixing feature_view_engine

* reformatted and added unit tests for feature view

* updating documentation for feature store

* updating documentation for feature store

* fixed tests for training dataset features

* reformatted and added unit tests for python engine

* most unit tests fixed

* all unit tests working

* removed print

* adding test for hopsworks_udf

* correcting merge for vector server

* reformatting with ruff

* fixing vector server

* fixing docs

* fixing vector server

* fixing built-in transformations

* correcting get feature vector

* adding missed changes for built-in transformations

* shallow copying scope dictionary to avoid overwriting the statistics variable for different UDFs that share a statistics parameter name

* adding deep copy to create multiple transformation functions with different features

* sorting transformation function to maintain consistent order

* sorting transformation functions in transformation function engine to maintain same order

* using feature view transformation functions

* addressing review comments

* using PYARROW_EXTENSION_ENABLE during import rather than as a function

* skipping transformation function test on Windows, where the Spark UDF fails due to dependencies with Great Expectations

* changing transformed_feature_vector_col_name to transformed_features to obtain feature names after transformations

* adding property transformed_features in feature view to obtain feature names after transformations

* updating doc string and adding property decorator missed during rebase

* refactoring transformation functions to update parsing of statistics parameters and also renaming decorator name

* refactoring transformation functions to update parsing of statistics parameters and also renaming decorator name

* reformatting with ruff

* adding statistics to udf only if required

* converting extended statistics to dictionary

* sorting built in label encoder to maintain consistency

* adding type hints for class TransformationStatistics

* adapting to backend update returning output_types, transformation_features and statistics_argument_names as lists

* fixing unit tests

* removing space in doc string

* replace - from output column names with _

* reverting unwanted spark test _ replace changes

* adding missed import

* correcting to_dict feature view

* reverting python.py unintentional changes during rebase

* rebase and adding back missed import
…#206)

* load test

* adding udf decorator to hopsworks repo

* reverting pyproject modified for loadtest

* reverting pyproject modified for loadtest

* adding api documentation for udf decorator
@aversey (Contributor, Author) commented Jul 13, 2024

In this PR, git blame on the hsml and hsfs files attributes them to me; this is fixed later by using git mv.

@aversey changed the title from "[FSTORE-1440] The merge of the python libraries" to "[FSTORE-1440] Merge the python libraries into their subdirectories" Jul 13, 2024
@aversey aversey merged commit bdb33a5 into logicalclocks:dev-merge Jul 15, 2024
2 checks passed
@aversey aversey self-assigned this Jul 25, 2024
@aversey aversey deleted the the-merge branch August 21, 2024 08:26