diff --git a/README.md b/README.md
index e110430..93f276d 100644
--- a/README.md
+++ b/README.md
@@ -15,20 +15,20 @@ The 12-line code snippet below is all you need to create your first Aqueduct wor
 ```python
 from aqueduct import Client, op
 
-# Create an Aqueduct client. If we're running on the same machine as the
+# Create an Aqueduct client. If we're running on the same machine as the
 # Aqueduct server, we can create a client without providing an API key or a
 # server address.
 client = Client()
 
-# The @op decorator here allows Aqueduct to run this function as
-# a part of an Aqueduct workflow. It tells Aqueduct that when
+# The @op decorator here allows Aqueduct to run this function as
+# a part of an Aqueduct workflow. It tells Aqueduct that when
 # we execute this function, we're defining a step in the workflow.
 @op
 def transform_data(reviews):
     '''
     This simple Python function takes in a DataFrame with hotel reviews
     and adds a column called strlen that has the string length of the
-    review.
+    review.
     '''
     reviews['strlen'] = reviews['review'].str.len()
     return reviews
@@ -84,8 +84,7 @@ For more on this pipeline, check our [Quickstart Guide](quickstart-guide.md).
 
 * [Updating Aqueduct](installation-and-configuration/updating-aqueduct.md)
 * [Debugging a Prediction Pipeline](guides/debugging-a-failed-workflow.md)
-* [Running on Airflow](broken-reference)
-* [Changing the Aqueduct Metadata Store](broken-reference)
+* [Running on Airflow](resources/compute-systems/airflow.md)
 * [Porting a Workflow to Aqueduct](guides/porting-a-workflow-to-aqueduct.md)
 
 ### API Reference
diff --git a/api-reference/aqueduct-cli.md b/api-reference/aqueduct-cli.md
index ca38d7c..64e7d82 100644
--- a/api-reference/aqueduct-cli.md
+++ b/api-reference/aqueduct-cli.md
@@ -26,7 +26,7 @@ This page provide a detailed walkthrough of the Aqueduct CLI.
 
 #### install
 
-`aqueduct install ` installs the dependencies required for `` on your machine. In most cases, these are `pip` packages on a system-by-system basis, but certain connectors (MySQL & Microsoft SQL Server) require special configuration -- see [Broken link](broken-reference "mention") for more details.
+`aqueduct install ` installs the dependencies required for `` on your machine. In most cases, these are `pip` packages on a system-by-system basis, but certain connectors (MySQL & Microsoft SQL Server) require special installations.
 
 #### apikey
 
diff --git a/guides/porting-a-workflow-to-aqueduct.md b/guides/porting-a-workflow-to-aqueduct.md
index 4986da3..1def41d 100644
--- a/guides/porting-a-workflow-to-aqueduct.md
+++ b/guides/porting-a-workflow-to-aqueduct.md
@@ -50,7 +50,7 @@ Once you have your code running on Aqueduct, you probably are going to want to s
 
 The first thing we'll need to do is figure out where our data inputs are coming from and where our predictions are going to. You'll need to connect those systems as Aqueduct [resources](../resources/ "mention").
 
-Once we have our resources connected, we can get a handle to that resource in our Python code. For our example here, we're going to use the [aqueduct-demo-resource.md](../resources/data-systems/aqueduct-demo-resource.md "mention"). Once we have a handle to the demo database, we can then run a SQL query on it (see [Broken link](broken-reference "mention") for details on using non-relational data systems) to get our input data. You can use any SQL query that works for your underlying database.
+Once we have our resources connected, we can get a handle to that resource in our Python code. For our example here, we're going to use the [aqueduct-demo-resource.md](../resources/data-systems/aqueduct-demo-resource.md "mention"). Once we have a handle to the demo database, we can then run a SQL query on it (see [non-sql-systems](../resources/data-systems/non-sql-systems "mention") for details on using non-relational data systems) to get our input data. You can use any SQL query that works for your underlying database.
 
 ```python
 from aqueduct import Client, op
@@ -58,7 +58,7 @@ import pandas as pd
 
 client = Client()
 
-db = client.resource('aqueduct_demo')
+db = client.resource('aqueduct_demo')
 input_data = db.sql('SELECT * FROM wine;')
 ```
 
@@ -108,16 +108,16 @@ input_data = db.sql('SELECT * FROM wine;')
 
 @op
 def clean_data(input_data):
-    # First, clean our data.
+    # First, clean our data.
     cleaned_data = pd.DataFrame([])
     return cleaned_data
-
-@op
+
+@op
 def featurize_data(cleaned_data):
     # Next, featurize our data.
     features = pd.DataFrame([])
     return features
-
+
 @op
 def predict(features):
     # Finally, load our model and make some predictions.
diff --git a/operators.md b/operators.md
index 112607c..fe46b38 100644
--- a/operators.md
+++ b/operators.md
@@ -11,6 +11,6 @@ This guide will walk you through:
 * [Creating a Python Operator](operators/creating-a-python-operator.md)
 * [Specifying a `requirements.txt`](operators/specifying-a-requirements.txt.md)
 * [Adding File Dependencies in Python](operators/file-dependencies-in-python.md)
-* [Improve Dependencies and Python Version Management Using Conda](broken-reference)
+* [Improve Dependencies and Python Version Management Using Conda](resources/compute-systems/conda.md)
 * [Eager vs Lazy Execution](operators/lazy-vs-eager-execution.md)
 * [Configuring GPUs, CPUs, and Memory](operators/configuring-resource-constraints.md)
diff --git a/quickstart-guide.md b/quickstart-guide.md
index cfa1322..4a34670 100644
--- a/quickstart-guide.md
+++ b/quickstart-guide.md
@@ -59,7 +59,7 @@ def transform_data(reviews):
     '''
     This simple Python function takes in a DataFrame with hotel reviews
     and adds a column called strlen that has the string length of the
-    review.
+    review.
     '''
     reviews['strlen'] = reviews['review'].str.len()
     return reviews
@@ -105,7 +105,7 @@ Note that checks are denoted with the @check decorator. Checks can also computed
 
 ### Saving Data
 
-Finally, we can save the transformed table `strlen_table` back to the Aqueduct demo database. See [here](broken-reference) for more details around using resources.
+Finally, we can save the transformed table `strlen_table` back to the Aqueduct demo database. See [data-systems](resources/data-systems/ "mention") for more details.
 
 ```python
 demo_db.save(strlen_table, table_name="strlen_table", update_mode="replace")
diff --git a/workflows/creating-a-workflow.md b/workflows/creating-a-workflow.md
index b1f2738..76013da 100644
--- a/workflows/creating-a-workflow.md
+++ b/workflows/creating-a-workflow.md
@@ -34,7 +34,7 @@ All of the code we've written here is simple Pandas code. The only change we've
 ```python
 from aqueduct import Client
 
-client = Client()
+client = Client()
 db = client.resource('aqueduct_demo')
 wine_data = db.sql('SELECT * FROM wine;')
 
@@ -75,11 +75,11 @@ acidity_by_group.get() # Shows a preview of the results of `get_average_acidity`
 Once we've defined our whole workflow, the final step is to publish it to Aqueduct. Intuitively, the name of the method we'll use for this is `publish_flow`.
 
 ```python
-flow = client.publish_flow(name='average_acidity',
+flow = client.publish_flow(name='average_acidity',
                            artifacts=[acidity_by_group])
 ```
 
-By default, the workflow is published using Aqueduct Python execution engine that runs on the same machine as the server, but if we want to customize the execution engine, check out [Broken link](broken-reference "mention").
+By default, the workflow is published using Aqueduct Python execution engine that runs on the same machine as the server, but if we want to customize the execution engine, check out [compute-systems](resources/compute-systems/ "mention").
 
 There are a few key arguments here, and we'll go through the one by one:
 