Commit
[FSTORE-978] Hopsworks recommender system tutorial (#193)
* recommender-system
Showing 14 changed files with 5,625 additions and 0 deletions.
1,307 changes: 1,307 additions & 0 deletions
advanced_tutorials/recommender-system/1_feature_engineering.ipynb
Large diffs are not rendered by default.
232 changes: 232 additions & 0 deletions
advanced_tutorials/recommender-system/2a_create_retrieval_dataset.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Connected. Call `.close()` to terminate connection gracefully.\n", | ||
"\n", | ||
"Logged in to project, explore it here https://hopsworks0.logicalclocks.com/p/119\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"import hopsworks\n", | ||
"\n", | ||
"project = hopsworks.login() # insert API Key from https://app.hopsworks.ai" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Create Retrieval Dataset\n", | ||
"\n", | ||
"In this notebook, we'll create a dataset for our retrieval model." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Connected. Call `.close()` to terminate connection gracefully.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"fs = project.get_feature_store()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Feature Selection\n", | ||
"\n", | ||
"First, we'll load the feature groups we created in the previous tutorial." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"trans_fg = fs.get_feature_group(\"transactions\",version=1)\n", | ||
"customers_fg = fs.get_feature_group(\"customers\",version=1)\n", | ||
"articles_fg = fs.get_feature_group(\"articles\",version=1)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We'll need to join these three data sources to make the data compatible with out retrieval model. Recall that each row in the `transactions` feature group relates information about which customer bought which item. We'll join this feature group with the `customers` and `articles` feature groups to inject customer and item features into each row." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query = trans_fg.select([\"customer_id\", \"article_id\", \"t_dat\", \"month_sin\", \"month_cos\"])\\\n", | ||
" .join(customers_fg.select([\"age\"]), on=\"customer_id\")\\\n", | ||
" .join(articles_fg.select([\"garment_group_name\", \"index_group_name\"]), on=\"article_id\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Feature View Creation\n", | ||
"In Hopsworks, you write features to feature groups (where the features are stored) and you read features from feature views. A feature view is a logical view over features, stored in feature groups, and a feature view typically contains the features used by a specific model. This way, feature views enable features, stored in different feature groups, to be reused across many different models." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Transformation functions available:\n", | ||
"- min_max_scaler - version: 1\n", | ||
"- standard_scaler - version: 1\n", | ||
"- label_encoder - version: 1\n", | ||
"- month_sin - version: 1\n", | ||
"- robust_scaler - version: 1\n", | ||
"- month_cos - version: 1\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# explore available transformation functions\n", | ||
"\n", | ||
"print(\"Transformation functions available:\")\n", | ||
"for tr_fn in fs.get_transformation_functions():\n", | ||
" print(\"- \" + tr_fn.name + \" - version: \" + str(tr_fn.version))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Feature view created successfully, explore it at \n", | ||
"https://hopsworks0.logicalclocks.com/p/119/fs/67/fv/retrieval/version/1\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"month_to_sin = fs.get_transformation_function(name=\"month_sin\", version=1)\n", | ||
"month_to_cos = fs.get_transformation_function(name=\"month_cos\", version=1)\n", | ||
"\n", | ||
"feature_view = fs.create_feature_view(\n", | ||
" name='retrieval',\n", | ||
" query=query,\n", | ||
" transformation_functions={\n", | ||
" \"month_sin\": month_to_sin,\n", | ||
" \"month_cos\": month_to_cos,\n", | ||
" }\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To view and explore data in the feature view we can retrieve batch data using the `get_batch_data()` method." | ||
] | ||
}, | ||
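{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# A minimal sketch, assuming the `retrieval` feature view created above is still in scope.\n", | ||
"# With no time range given, get_batch_data() reads all available rows, which may be slow on large feature groups.\n", | ||
"batch_df = feature_view.get_batch_data()\n", | ||
"batch_df.head()" | ||
] | ||
}, | ||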
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Training Dataset Creation\n", | ||
"\n", | ||
"Finally, we can create our dataset." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Training dataset job started successfully, you can follow the progress at \n", | ||
"https://hopsworks0.logicalclocks.com/p/119/jobs/named/retrieval_1_create_fv_td_10072023185611/executions\n" | ||
] | ||
}, | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"VersionWarning: Incremented version to `1`.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"feature_view = fs.get_feature_view(\"retrieval\", version=1)\n", | ||
"\n", | ||
"td_version, td_job = feature_view.create_train_validation_test_split(\n", | ||
" validation_size = 0.1, \n", | ||
" test_size = 0.1,\n", | ||
" description = 'Retrieval dataset splits',\n", | ||
" data_format = 'csv',\n", | ||
" write_options = {'wait_for_job': True},\n", | ||
" coalesce = True\n", | ||
")" | ||
] | ||
}, | ||
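{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"As a quick check, the splits can be read back from the feature view. The cell below is a sketch that assumes the job above has finished and that `td_version` holds the version it returned. Since the query defines no label, the label outputs are ignored here." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Sketch only: read the three splits back and compare their sizes.\n", | ||
"train_df, val_df, test_df, _, _, _ = feature_view.get_train_validation_test_split(\n", | ||
"    training_dataset_version=td_version\n", | ||
")\n", | ||
"print(train_df.shape, val_df.shape, test_df.shape)" | ||
] | ||
}, | ||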
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Next Steps\n", | ||
"\n", | ||
"In the next notebook, we'll train a model on the dataset we created in this notebook." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.11" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |