Reframe README to match latest positioning (#1114)
vsreekanti authored Mar 28, 2023
1 parent 57e23dc commit 63c33f4
Showing 1 changed file with 31 additions and 26 deletions.
57 changes: 31 additions & 26 deletions README.md
@@ -1,58 +1,63 @@
[<img src="https://aqueduct-public-assets-bucket.s3.us-east-2.amazonaws.com/webapp/logos/aqueduct-logo-two-tone/1x/aqueduct-logo-two-tone-1x.png" width= "35%" />](https://www.aqueducthq.com)

## Aqueduct: The easiest way to run ML on any cloud infrastructure
## Aqueduct: The easiest way to run ML on any cloud

[![Start Sandbox](https://img.shields.io/static/v1?label=%20&logo=github&message=Start%20Sandbox&color=black)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=496844646)
[![Downloads](https://pepy.tech/badge/aqueduct-ml/month)](https://pypi.org/project/aqueduct-ml/)
[![Slack](https://img.shields.io/static/v1.svg?label=chat&message=on%20slack&color=27b1ff&style=flat)](https://join.slack.com/t/aqueductusers/shared_invite/zt-11hby91cx-cpmgfK0qfXqEYXv25hqD6A)
[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/aqueducthq/aqueduct/blob/master/LICENSE)
[![PyPI version](https://badge.fury.io/py/aqueduct-ml.svg)](https://pypi.org/project/aqueduct-ml/)
[![Tests](https://github.com/aqueducthq/aqueduct/actions/workflows/integration-tests.yml/badge.svg)](https://github.com/aqueducthq/aqueduct/actions/workflows/integration-tests.yml)

**Aqueduct enables you to define, deploy and monitor robust ML pipelines on any cloud infrastructure.** Check out our [quickstart guide](https://docs.aqueducthq.com/quickstart-guide)!
**Aqueduct enables you to easily run machine learning tasks on any cloud infrastructure. [Check out our quickstart guide! →](https://docs.aqueducthq.com/quickstart-guide)**

Aqueduct gives you a simple Python-native API to define machine learning pipelines, the ability to deploy those pipelines on your existing infrastructure (e.g., Spark, Kubernetes, Lambda), and visibility into the code, data, and metadata associated with your workflows.
Aqueduct is fully open-source and runs securely in your cloud.
Aqueduct is an open-source MLOps framework that allows you to define ML tasks in vanilla Python, run those tasks on any infrastructure you'd like to use, and gain visibility into the execution and performance of your ML. **[See what tools Aqueduct works with. →](https://aqueducthq.com/integrations/)**

Here's how you can get started:

You can install Aqueduct via `pip`:
```bash
pip3 install aqueduct-ml
aqueduct start
```

Now, we can create our first workflow:

```python
from aqueduct import Client, op, metric

client = Client()

@op
def transform_data(reviews):
    reviews['strlen'] = reviews['review'].str.len()
    return reviews
### How it works

Aqueduct's Python-native API allows you to define ML tasks in regular Python code. You can connect Aqueduct to your existing cloud infrastructure ([docs](https://docs.aqueducthq.com/integrations)), and Aqueduct will seamlessly move your code from your laptop to the cloud or between different cloud infrastructure layers.

demo_db = client.integration("aqueduct_demo")
reviews_table = demo_db.sql("select * from hotel_reviews;")
<!--- TODO(vikram): Modify this once we add support for switching into/out of Databricks in a single workflow. --->
For example, we can define a pipeline that trains a model on Kubernetes using a GPU and validates that model in AWS Lambda in a few lines of Python:

strlen_table = transform_data(reviews_table)
demo_db.save(strlen_table, "strlen_table", "replace")

client.publish_flow(name="review_strlen", artifacts=[strlen_table])
```python
@op(
    engine='eks-us-east-2',
    resources={'gpu_resource_name': 'nvidia.com/gpu'}
)
def train(features):
    return model.train(features)

@metric(engine='lambda-us-east-2')
def validate(model):
    return validation_test(model)

validate(train(features))
```
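
The snippet above leaves `model`, `validation_test`, and `features` undefined, so it will not run as written. To show just the decorator pattern, here is a minimal sketch that runs anywhere without an Aqueduct server. The `op` and `metric` decorators below are hypothetical stand-ins defined in the snippet itself, not Aqueduct's implementation: they only record which engine each function was assigned to and then run it locally. The toy `train` and `validate` bodies and the `execution_log` list are likewise assumptions made for illustration.

```python
from functools import wraps

# Hypothetical stand-ins for Aqueduct's `op` and `metric` decorators.
# They only record where each function was asked to run; nothing is
# actually shipped to Kubernetes or Lambda in this sketch.
execution_log = []


def op(engine=None, resources=None):
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Note the requested engine, then run the function locally.
            execution_log.append((fn.__name__, engine))
            return fn(*args, **kwargs)
        return wrapper
    return decorate


# In this sketch a metric is simply an op that returns a number.
metric = op


@op(engine="eks-us-east-2", resources={"gpu_resource_name": "nvidia.com/gpu"})
def train(features):
    # Toy "model": the mean of the training features.
    return sum(features) / len(features)


@metric(engine="lambda-us-east-2")
def validate(model):
    # Toy validation: absolute distance from an expected value of 2.0.
    return abs(model - 2.0)


score = validate(train([1.0, 2.0, 3.0]))
print(score)
print(execution_log)
```

Unlike the real decorators, these stand-ins must always be called with parentheses. The point is only that the per-function `engine` argument is the single place where placement is declared, while the function bodies stay plain Python.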

Once we've created a workflow, we can view that workflow in the Aqueduct UI:
Once you publish this workflow to Aqueduct, you can see it on the UI:

![image](https://user-images.githubusercontent.com/867892/196529730-3c9582d5-8692-495d-a7df-8eb62ddf305f.png)
![image](https://user-images.githubusercontent.com/867892/228295996-4ba3de23-3106-431d-93a9-afd8d77a707b.png)

To see how to build your first workflow, check out our **[quickstart guide! →](https://docs.aqueducthq.com/quickstart-guide)**

## Why Aqueduct?

The engineering required to get data science & machine learning projects in production slows down data teams. Aqueduct automates away that engineering and allows you to define robust data & ML pipelines in a few lines of code and run them anywhere.
MLOps has become a [tangled mess of siloed infrastructure](https://aqueducthq.com/post/the-mlops-knot/). Most teams need to set up and operate many different cloud infrastructure tools to run ML effectively, but these tools have disparate APIs and interoperate poorly.

Aqueduct provides a single interface for running machine learning tasks on your existing cloud infrastructure, such as Kubernetes, Spark, and Lambda. From the same Python API, you can run code across any or all of these systems seamlessly and gain visibility into how your code is performing.

* **Python-native pipeline API**: Aqueduct’s API allows you to define your workflows in vanilla Python, so you can get code into production quickly and effectively. No more DSLs or YAML configs to worry about.
* **Integrated with your infrastructure**: Workflows defined in Aqueduct can run on any cloud infrastructure you use, like Kubernetes, Spark, Airflow, or AWS Lambda. You can get all the benefits of Aqueduct without having to rip-and-replace your existing tooling.
* **Centralized visibility into code, data, & metadata**: Once your workflows are in production, you need to know what’s running, whether it’s working, and when it breaks. Aqueduct gives you visibility into what code, data, metrics, and metadata are generated by each workflow run, so you can have confidence that your pipelines work as expected — and know immediately when they don’t.
* **Runs securely in your cloud**: Aqueduct is fully open-source and runs in any Unix environment. It runs entirely in your cloud and on your infrastructure, so you can be confident that nothing is ever leaving your cloud.
* **Runs securely in your cloud**: Aqueduct is fully open-source and runs in any Unix environment. It runs entirely in your cloud and on your infrastructure, so you can be confident that your data and code are secure.

## Overview & Examples
