[FSTORE-978] Hopsworks recommender system tutorial (#193)
* recommender-system
davitbzh authored Sep 12, 2023
1 parent a9ebc34 commit e0812e8
Showing 14 changed files with 5,625 additions and 0 deletions.
1,307 changes: 1,307 additions & 0 deletions advanced_tutorials/recommender-system/1_feature_engineering.ipynb

Large diffs are not rendered by default.

@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connected. Call `.close()` to terminate connection gracefully.\n",
"\n",
"Logged in to project, explore it here https://hopsworks0.logicalclocks.com/p/119\n"
]
}
],
"source": [
"import hopsworks\n",
"\n",
"project = hopsworks.login() # insert API Key from https://app.hopsworks.ai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Retrieval Dataset\n",
"\n",
"In this notebook, we'll create a dataset for our retrieval model."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connected. Call `.close()` to terminate connection gracefully.\n"
]
}
],
"source": [
"fs = project.get_feature_store()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature Selection\n",
"\n",
"First, we'll load the feature groups we created in the previous tutorial."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"trans_fg = fs.get_feature_group(\"transactions\", version=1)\n",
"customers_fg = fs.get_feature_group(\"customers\", version=1)\n",
"articles_fg = fs.get_feature_group(\"articles\", version=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll need to join these three data sources to make the data compatible with our retrieval model. Recall that each row in the `transactions` feature group records which customer bought which item. We'll join this feature group with the `customers` and `articles` feature groups to add customer and item features to each row."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"query = (\n",
"    trans_fg.select([\"customer_id\", \"article_id\", \"t_dat\", \"month_sin\", \"month_cos\"])\n",
"    .join(customers_fg.select([\"age\"]), on=\"customer_id\")\n",
"    .join(articles_fg.select([\"garment_group_name\", \"index_group_name\"]), on=\"article_id\")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature View Creation\n",
"\n",
"In Hopsworks, you write features to feature groups (where the features are stored) and you read features from feature views. A feature view is a logical view over features stored in feature groups, and it typically contains the features used by a specific model. This way, feature views let features stored in different feature groups be reused across many different models."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Transformation functions available:\n",
"- min_max_scaler - version: 1\n",
"- standard_scaler - version: 1\n",
"- label_encoder - version: 1\n",
"- month_sin - version: 1\n",
"- robust_scaler - version: 1\n",
"- month_cos - version: 1\n"
]
}
],
"source": [
"# explore available transformation functions\n",
"\n",
"print(\"Transformation functions available:\")\n",
"for tr_fn in fs.get_transformation_functions():\n",
"    print(f\"- {tr_fn.name} - version: {tr_fn.version}\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Feature view created successfully, explore it at \n",
"https://hopsworks0.logicalclocks.com/p/119/fs/67/fv/retrieval/version/1\n"
]
}
],
"source": [
"month_to_sin = fs.get_transformation_function(name=\"month_sin\", version=1)\n",
"month_to_cos = fs.get_transformation_function(name=\"month_cos\", version=1)\n",
"\n",
"feature_view = fs.create_feature_view(\n",
"    name='retrieval',\n",
"    query=query,\n",
"    transformation_functions={\n",
"        \"month_sin\": month_to_sin,\n",
"        \"month_cos\": month_to_cos,\n",
"    },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To explore the data in the feature view, we can retrieve batch data with the `get_batch_data()` method."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training Dataset Creation\n",
"\n",
"Finally, we can create our dataset, split into training, validation, and test sets."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training dataset job started successfully, you can follow the progress at \n",
"https://hopsworks0.logicalclocks.com/p/119/jobs/named/retrieval_1_create_fv_td_10072023185611/executions\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"VersionWarning: Incremented version to `1`.\n"
]
}
],
"source": [
"feature_view = fs.get_feature_view(\"retrieval\", version=1)\n",
"\n",
"td_version, td_job = feature_view.create_train_validation_test_split(\n",
"    validation_size=0.1,\n",
"    test_size=0.1,\n",
"    description='Retrieval dataset splits',\n",
"    data_format='csv',\n",
"    write_options={'wait_for_job': True},\n",
"    coalesce=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next Steps\n",
"\n",
"In the next notebook, we'll train a model on the dataset we created in this notebook."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
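
As a rough illustration of the rows the notebook's joined query describes, here is a minimal plain-Python sketch that does not use the Hopsworks API. The toy rows, the helper names `month_sin_cos` and `join_rows`, and the feature values are all hypothetical. Two things are assumptions rather than facts from the notebook: the join is treated as an inner join (the join type Hopsworks applies by default may differ between versions), and the month is encoded as a point on the unit circle, which is one common definition of `month_sin` and `month_cos` but not necessarily the one registered in this project.

```python
import math

# Hypothetical toy rows standing in for the three feature groups.
transactions = [
    {"customer_id": "c1", "article_id": "a1", "t_dat": "2020-01-15"},
    {"customer_id": "c2", "article_id": "a2", "t_dat": "2020-07-03"},
    {"customer_id": "c3", "article_id": "a1", "t_dat": "2020-10-21"},  # no customer row for c3
]
customers = {"c1": {"age": 31}, "c2": {"age": 45}}
articles = {
    "a1": {"garment_group_name": "Jersey Basic", "index_group_name": "Ladieswear"},
    "a2": {"garment_group_name": "Knitwear", "index_group_name": "Menswear"},
}


def month_sin_cos(date_str):
    """Assumed cyclic encoding: place months 1..12 evenly on the unit circle."""
    month = int(date_str.split("-")[1])
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def join_rows(transactions, customers, articles):
    """Attach customer and article features to each transaction (inner join assumed)."""
    rows = []
    for t in transactions:
        customer = customers.get(t["customer_id"])
        article = articles.get(t["article_id"])
        if customer is None or article is None:
            continue  # inner join: skip transactions without a match on both keys
        m_sin, m_cos = month_sin_cos(t["t_dat"])
        rows.append({**t, "month_sin": m_sin, "month_cos": m_cos, **customer, **article})
    return rows


retrieval_rows = join_rows(transactions, customers, articles)
for row in retrieval_rows:
    print(row)
```

Each resulting row carries the transaction keys and date, the two cyclic month features, the customer's `age`, and the article's `garment_group_name` and `index_group_name`, which are the columns the notebook selects for the `retrieval` feature view.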