adding a full ML example and pushing it into the docs
brifordwylie committed Dec 29, 2023
1 parent 6863cad commit 278508a
Showing 2 changed files with 118 additions and 4 deletions.
71 changes: 67 additions & 4 deletions docs/api_classes/overview.md
@@ -2,13 +2,76 @@
!!! tip inline end "Just Getting Started?"
    You're in the right place. The SageWorks API Classes are the best way to get started with SageWorks!

Welcome to the SageWorks API Classes
## Welcome to the SageWorks API Classes

Diagram/image of the classes here

These class provide high-level APIs for the SageWorks package, offering easy access to its core classes:
These classes provide high-level APIs for the SageWorks package and enable your team to build full AWS Machine Learning Pipelines. They handle the details of updating and managing a complex set of AWS Services. Each class provides an essential component of the overall ML Pipeline. Simply combine the classes to build production-ready, AWS-powered machine learning pipelines.

- **[DataSource](data_source.md):** Manages AWS Data Catalog and Athena
- **[FeatureSet](feature_set.md):** Manages AWS Feature Store and Feature Groups
- **[Model](model.md):** Manages the training and deployment of AWS Model Groups and Packages
- **[Endpoint](endpoint.md):** Manages the deployment and invocations/inference on AWS Endpoints

![ML Pipeline](../images/sageworks_concepts.png)

## Example ML Pipeline

```py title="full_ml_pipeline.py"
from sageworks.api.data_source import DataSource
from sageworks.api.feature_set import FeatureSet
from sageworks.api.model import Model, ModelType
from sageworks.api.endpoint import Endpoint

# Create the abalone_data DataSource
ds = DataSource("s3://sageworks-public-data/common/abalone.csv")

# Now create a FeatureSet
ds.to_features("abalone_features")

# Create the abalone_regression Model
fs = FeatureSet("abalone_features")
fs.to_model(
    ModelType.REGRESSOR,
    name="abalone-regression",
    target_column="class_number_of_rings",
    tags=["abalone", "regression"],
    description="Abalone Regression Model",
)

# Create the abalone_regression Endpoint
model = Model("abalone-regression")
model.to_endpoint(name="abalone-regression-end", tags=["abalone", "regression"])

# Now we'll run inference on the endpoint
endpoint = Endpoint("abalone-regression-end")

# Get a DataFrame of data (not used to train) and run predictions
athena_table = fs.get_training_view_table()
df = fs.query(f"SELECT * FROM {athena_table} where training = 0")
results = endpoint.predict(df)
print(results[["class_number_of_rings", "prediction"]])
```

**Output**

```
Processing...
class_number_of_rings prediction
0 12 10.477794
1 11 11.11835
2 14 13.605763
3 12 11.744759
4 17 15.55189
.. ... ...
826 7 7.981503
827 11 11.246113
828 9 9.592911
829 6 6.129388
830 8 7.628252
```
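To put a number on how close the predictions are, the printed columns can be summarized with standard regression error metrics. The block below is a minimal sketch, not part of SageWorks: the helper functions are hypothetical names introduced here, and the data is only the ten rows visible in the output above (the elided rows are not included), so the resulting values are illustrative rather than a real evaluation of the model.

```python
import math

# The ten rows visible in the output above, as
# (class_number_of_rings, prediction) pairs. The rows hidden behind
# the ".." line are NOT included, so these metrics are illustrative only.
rows = [
    (12, 10.477794),
    (11, 11.11835),
    (14, 13.605763),
    (12, 11.744759),
    (17, 15.55189),
    (7, 7.981503),
    (11, 11.246113),
    (9, 9.592911),
    (6, 6.129388),
    (8, 7.628252),
]


def mean_absolute_error(pairs):
    """Average of |actual - predicted| over all pairs (hypothetical helper)."""
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


def root_mean_squared_error(pairs):
    """Square root of the mean squared difference (hypothetical helper)."""
    return math.sqrt(
        sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs)
    )


mae = mean_absolute_error(rows)
rmse = root_mean_squared_error(rows)
print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

With the full `results` DataFrame from the script, the same helpers could be fed with `zip(results["class_number_of_rings"], results["prediction"])`, assuming both columns are numeric.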

!!! success "Full AWS ML Pipeline Achievement Unlocked!"
    Bing! You just built and deployed a full AWS Machine Learning Pipeline. You can now use the SageWorks Dashboard web interface to inspect your AWS artifacts. A comprehensive set of Exploratory Data Analysis techniques and Model Performance Metrics are available for your entire team to review, inspect and interact with.
    <img alt="sageworks_new_light" src="https://github.com/SuperCowPowers/sageworks/assets/4806709/ed2ed1bd-e2d8-49a1-b350-b2e19e2b7832">

!!! note "Examples"
    All of the SageWorks Examples are in the SageWorks Repository under the examples/ directory. For a full code listing of any example, please visit our [SageWorks Examples](https://github.com/SuperCowPowers/sageworks/blob/main/examples).
51 changes: 51 additions & 0 deletions examples/full_ml_pipeline.py
@@ -0,0 +1,51 @@
"""This Script creates a full AWS ML Pipeline with SageWorks
DataSource:
- abalone_data
FeatureSet:
- abalone_features
Model:
- abalone-regression
Endpoint:
- abalone-regression-end
"""
import logging
from sageworks.api.data_source import DataSource
from sageworks.api.feature_set import FeatureSet
from sageworks.api.model import Model, ModelType
from sageworks.api.endpoint import Endpoint

# Setup the logger
log = logging.getLogger("sageworks")

if __name__ == "__main__":

    # Create the abalone_data DataSource
    ds = DataSource("s3://sageworks-public-data/common/abalone.csv")

    # Now create a FeatureSet
    ds.to_features("abalone_features")

    # Create the abalone_regression Model
    fs = FeatureSet("abalone_features")
    fs.to_model(
        ModelType.REGRESSOR,
        name="abalone-regression",
        target_column="class_number_of_rings",
        tags=["abalone", "regression"],
        description="Abalone Regression Model",
    )

    # Create the abalone_regression Endpoint
    model = Model("abalone-regression")
    model.to_endpoint(name="abalone-regression-end", tags=["abalone", "regression"])

    # Now we'll run inference on the endpoint
    endpoint = Endpoint("abalone-regression-end")

    # Get a DataFrame of data (not used to train) and run predictions
    athena_table = fs.get_training_view_table()
    df = fs.query(f"SELECT * FROM {athena_table} where training = 0")
    results = endpoint.predict(df)
    print(results[["class_number_of_rings", "prediction"]])
