scikit-learn custom prediction routines sample (only the training part) (GoogleCloudPlatform#63)

* New sample to show how to use custom prediction routines (training).

* A number of fixes and adjustments.

* Fixed README files.

* Added kokoro test files.

* Fixed the typo

* flake8 fixes

* flake8 fixes

* Changed pantheon to console.cloud everywhere.
happyhuman authored Sep 9, 2019
1 parent ef9c655 commit 3efc3b0
Showing 28 changed files with 1,004 additions and 28 deletions.
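For context, since this commit covers only the training part, below is a minimal sketch of the predictor interface that AI Platform custom prediction routines are built around: a class with a `predict()` method and a `from_path()` factory method. The `TaxiPredictor` name and the `model.joblib` file name are illustrative assumptions, not part of this commit.

import os

import joblib


class TaxiPredictor(object):
    """Minimal predictor sketch; the class name and model file name are assumptions."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Run the scikit-learn estimator/pipeline on a batch of instances
        # and return a JSON-serializable list of predictions.
        return self._model.predict(instances).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # AI Platform calls this factory with the directory containing
        # the deployed model artifacts.
        model = joblib.load(os.path.join(model_dir, 'model.joblib'))
        return cls(model)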
49 changes: 49 additions & 0 deletions .kokoro/tests/training/sklearn_structured_custom_routines.sh
@@ -0,0 +1,49 @@
#!/bin/bash
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -eo pipefail


download_files() {
# Download files for testing.
GCS_FOLDER="gs://cloud-samples-data/ml-engine/chicago_taxi"

echo "Downloading files"
gsutil cp ${GCS_FOLDER}/training/small/taxi_trips_train.csv data/taxi_trips_train.csv
gsutil cp ${GCS_FOLDER}/training/small/taxi_trips_eval.csv data/taxi_trips_eval.csv
gsutil cp ${GCS_FOLDER}/prediction/taxi_trips_prediction_dict.ndjson data/taxi_trips_prediction_dict.ndjson

# Define ENV for `train-local.sh` script
export TAXI_TRAIN_SMALL=data/taxi_trips_train.csv
export TAXI_EVAL_SMALL=data/taxi_trips_eval.csv
export TAXI_PREDICTION_DICT_NDJSON=data/taxi_trips_prediction_dict.ndjson
}


run_tests() {
# Run base tests.
echo "Running code tests in `pwd`."
download_files
# Run local training and local prediction
source scripts/train-local.sh
}


main() {
cd ${KOKORO_ARTIFACTS_DIR}/github/ai-platform-samples/${CAIP_TEST_DIR}
run_tests
echo 'Test was successful'
}

main
50 changes: 50 additions & 0 deletions .kokoro/training/sklearn/structured/custom_routines/common.cfg
@@ -0,0 +1,50 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Format: //devtools/kokoro/config/proto/build.proto

# Download trampoline resources.
gfile_resources: "/bigstore/cloud-devrel-kokoro-resources/trampoline"


# Download credentials from Cloud Storage.
gfile_resources: "/bigstore/cloud-devrel-kokoro-resources/ai-platform-samples"


# Use the trampoline script to run in docker.
build_file: "ai-platform-samples/.kokoro/trampoline.sh"


# Environment Variables.
env_vars: {
key: "TRAMPOLINE_IMAGE"
value: "gcr.io/cloud-devrel-kokoro-resources/python"
}

# Tell the trampoline which tests to run.
env_vars: {
key: "TRAMPOLINE_BUILD_FILE"
value: "github/ai-platform-samples/.kokoro/tests/run_tests.sh"
}

env_vars: {
key: "CAIP_TEST_DIR"
value: "training/sklearn/structured/custom_routines"
}

# Run specific tests
env_vars: {
key: "CAIP_TEST_SCRIPT"
value: "github/ai-platform-samples/.kokoro/tests/training/sklearn_structured_custom_routines.sh"
}
15 changes: 15 additions & 0 deletions .kokoro/training/sklearn/structured/custom_routines/continuous.cfg
@@ -0,0 +1,15 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Format: //devtools/kokoro/config/proto/build.proto
15 changes: 15 additions & 0 deletions .kokoro/training/sklearn/structured/custom_routines/periodic.cfg
@@ -0,0 +1,15 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Format: //devtools/kokoro/config/proto/build.proto
15 changes: 15 additions & 0 deletions .kokoro/training/sklearn/structured/custom_routines/presubmit.cfg
@@ -0,0 +1,15 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Format: //devtools/kokoro/config/proto/build.proto
6 changes: 3 additions & 3 deletions setup/README.md
@@ -21,9 +21,9 @@ and follow the instructions.

5- Enable the APIs for the following services:

-* [Compute Engine](https://pantheon.corp.google.com/compute)
-* [Storage](https://pantheon.corp.google.com/storage)
-* [AI Platform](https://pantheon.corp.google.com/mlengine)
+* [Compute Engine](https://console.cloud.google.com/compute)
+* [Storage](https://console.cloud.google.com/storage)
+* [AI Platform](https://console.cloud.google.com/mlengine)

From your terminal, run:

Empty file added training/__init__.py
Empty file.
8 changes: 7 additions & 1 deletion training/sklearn/structured/base/README.md
@@ -30,7 +30,7 @@ executed on your local machine.
* [task.py](trainer/task.py) initializes and parses task arguments. This is the entry point to the trainer.
* [model.py](trainer/model.py) includes a function to create the scikit-learn estimator or pipeline
* [metadata.py](trainer/metadata.py) contains the definition for the target and feature names, among other configuration variables
-* [util.py](trainer/task.py) contains a number of helper functions used in task.py
+* [util.py](trainer/util.py) contains a number of helper functions used in task.py
* [scripts](./scripts) directory: command-line scripts to train the model locally or on AI Platform.
We recommend running the scripts in this directory in the following order, using
the `source` command, so that the environment variables exported at each step persist:
@@ -71,6 +71,12 @@ This will create a training job on AI Platform and display some instructions on
At the end of a successful training job, it will upload the trained model object to a GCS
bucket and set the `$MODEL_DIR` environment variable to the directory containing the model.

+### Monitoring
+Once the training starts and the models are generated, you may view the training job on
+the [AI Platform page](https://console.cloud.google.com/mlengine/jobs). If you click on the
+corresponding training job, you will be able to view the chosen hyperparameters, along with the
+metric scores for each model. All the generated model objects will be stored on GCS.

## Explaining Key Elements

In this section, we'll highlight the main elements of this sample.
4 changes: 3 additions & 1 deletion training/sklearn/structured/base/trainer/metadata.py
@@ -22,7 +22,9 @@
# Target name
TARGET_NAME = 'tip'

-# The features to be used for training
+# The features to be used for training.
+# If FEATURE_NAMES is None, then all the available columns will be
+# used as features, except for the target column.
FEATURE_NAMES = [
'trip_miles',
'trip_seconds',
12 changes: 9 additions & 3 deletions training/sklearn/structured/base/trainer/utils.py
@@ -35,8 +35,14 @@ def data_train_test_split(data_df):
pandas.DataFrame, pandas.Series)
"""

-    # Only use metadata.FEATURE_NAMES + metadata.TARGET_NAME
-    features = data_df[metadata.FEATURE_NAMES]
+    if metadata.FEATURE_NAMES is None:
+        # Use all the columns as features, except for the target column
+        feature_names = list(data_df.columns)
+        feature_names.remove(metadata.TARGET_NAME)
+        features = data_df[feature_names]
+    else:
+        # Only use metadata.FEATURE_NAMES
+        features = data_df[metadata.FEATURE_NAMES]
    target = data_df[metadata.TARGET_NAME]

    x_train, x_val, y_train, y_val = ms.train_test_split(features,
@@ -70,7 +76,7 @@ def read_df_from_bigquery(full_table_path, project_id=None, num_samples=None):


def read_df_from_gcs(file_pattern):
"""Read data from Google Cloud Storage, split into train and validation sets.
"""Read data from Google Cloud Storage, split into train and validation sets
Assume that the data on GCS is in csv format without header.
The column names will be provided through metadata
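As a self-contained illustration of the `FEATURE_NAMES is None` branch added above, the sketch below reproduces the selection logic on a toy DataFrame (the column values are made up for the example):

import pandas as pd

TARGET_NAME = 'tip'
FEATURE_NAMES = None  # None means: use every column except the target

data_df = pd.DataFrame({
    'trip_miles': [1.2, 3.4],
    'trip_seconds': [300, 900],
    'tip': [0.0, 2.5],
})

if FEATURE_NAMES is None:
    # Use all the columns as features, except for the target column.
    feature_names = list(data_df.columns)
    feature_names.remove(TARGET_NAME)
    features = data_df[feature_names]
else:
    features = data_df[FEATURE_NAMES]
target = data_df[TARGET_NAME]

print(list(features.columns))  # ['trip_miles', 'trip_seconds']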