
Commit: Edit the front page a bit. (#683)
This is intended to improve alignment with the current "PMF Theory".
kerinin authored Aug 21, 2023
1 parent 6212196 commit da8ca8e
Showing 1 changed file (python/docs/source/index.md) with 77 additions and 56 deletions.
hide-toc: true
html_theme.sidebar_secondary.remove: true
---
# Introduction

<div class="px-4 py-5 my-5 text-center">
<img class="d-block mx-auto mb-4" src="_static/kaskada.svg" alt="" width="auto">
<h1 class="display-5 fw-bold">Real-Time AI without the fuss.</h1>
<div class="col-lg-7 mx-auto">
<p class="lead mb-4">Kaskada is a next-generation streaming engine that connects AI models to real-time & historical data.
</p>
</div>
</div>

## Kaskada completes the Real-Time AI stack, providing...

```{gallery-grid}
:grid-columns: 1 2 2 3
- header: "{fas}`timeline;pst-color-primary` Real-time Aggregation"
  content: "Precompute model inputs from streaming data with robust data connectors, transformations & aggregations."
- header: "{fas}`binoculars;pst-color-primary` Event Detection"
  content: "Trigger proactive AI behaviors by identifying important activities as they happen."
- header: "{fas}`backward;pst-color-primary` History Replay"
  content: "Backtest and fine-tune from historical data using per-example time travel and point-in-time joins."
```
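The "History Replay" capability rests on point-in-time joins: each training example may only see data that existed at that example's time, which rules out label leakage. The semantics can be sketched in plain Python (an illustration of the idea only, not Kaskada's API):

```python
from bisect import bisect_right

def point_in_time_value(history, t):
    """Return the latest value in history (a time-sorted list of
    (time, value) pairs) at or before time t, or None if none exists."""
    times = [ts for ts, _ in history]
    i = bisect_right(times, t)
    return history[i - 1][1] if i else None

# Hypothetical per-user cumulative purchase totals: (time, total)
purchase_totals = {"alice": [(1, 10.0), (5, 35.0), (9, 60.0)]}

# A leakage-free training example: features as of t=6, label from t=9.
features = point_in_time_value(purchase_totals["alice"], 6)  # 35.0
label = point_in_time_value(purchase_totals["alice"], 9)     # 60.0
```

Because the feature lookup is pinned to the example's own timestamp, replaying history produces exactly what a live model would have seen at that moment.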

* * *

### Real-time AI in minutes

Connect and compute over databases, streaming data, _and_ data loaded dynamically using Python.
Kaskada is seamlessly integrated with Python's ecosystem of AI/ML tooling so you can load data, process it, and train and serve models all in the same place.

There's no infrastructure to provision (and no JVM hiding under the covers), so you can jump right in - check out the [Quick Start](quickstart).

### Built for scale and reliability

Implemented in [Rust](https://www.rust-lang.org/) using [Apache Arrow](https://arrow.apache.org/), Kaskada's compute engine uses columnar data to efficiently execute large historic and high-throughput streaming queries.
Every operation in Kaskada is implemented incrementally, allowing automatic recovery if the process is terminated or killed.

With Kaskada, most jobs are fast enough to run locally, so it's easy to build and test your real-time queries.
As your needs grow, Kaskada's cloud-native design and support for partitioned execution give you the volume and throughput you need to scale.
Kaskada was built by core contributors to [Apache Beam](https://beam.apache.org/), [Google Cloud Dataflow](https://cloud.google.com/dataflow), and [Apache Cassandra](https://cassandra.apache.org/), and is under active development.
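The incremental-execution idea can be pictured with a toy example: if an aggregation is a fold over events, persisting the fold state is enough to resume after a failure without replaying history. This is a hypothetical sketch of the concept, not Kaskada's actual checkpointing:

```python
def running_sum(events, state=0):
    """Incrementally fold events into a running sum,
    recording each partial state as it is produced."""
    out = []
    for value in events:
        state += value
        out.append(state)
    return out

events = [3, 1, 4, 1, 5]
full = running_sum(events)  # [3, 4, 8, 9, 14]

# Simulate a crash after three events: persist the state, then
# resume from the snapshot instead of reprocessing all of history.
snapshot = full[2]
resumed = running_sum(events[3:], state=snapshot)
assert resumed == full[3:]  # resuming matches the uninterrupted run
```

Because each partial state fully summarizes the events seen so far, a restarted process picks up exactly where the last snapshot left off.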

* * *
## Example Real-Time App: BeepGPT

[BeepGPT](https://github.com/kaskada-ai/beep-gpt/tree/main) keeps you in the loop without disturbing your focus. Its personalized, intelligent AI continuously monitors your Slack workspace, alerting you to important conversations and freeing you to concentrate on what’s most important.

The core of BeepGPT's real-time processing requires only a few lines of code using Kaskada:

```python
import pyarrow.parquet
import kaskada as kd
import openai

kd.init_session()

# Bootstrap from historical data
messages = kd.sources.PyList(
    rows=pyarrow.parquet.read_table("./messages.parquet").to_pylist(),
    time_column_name="ts",
    key_column_name="channel",
)

# Send each Slack message to Kaskada
# (`slack` is the app's Slack socket-mode client)
def handle_message(client, req):
    messages.add_rows(req.payload["event"])

slack.socket_mode_request_listeners.append(handle_message)
slack.connect()

# Aggregate multiple messages into a "conversation"
conversations = (
    messages
    .select("user", "text")
    .collect(max=20)
)

# Handle each conversation as it occurs
async for row in conversations.run(materialize=True).iter_rows_async():

    # Use a fine-tuned model to identify interested users
    prompt = "\n\n".join(
        f' {msg["user"]} --> {msg["text"]} ' for msg in row["result"]
    )
    res = openai.Completion.create(
        model="davinci:ft-personal:coversation-users-full-kaskada-2023-08-05-14-25-30",
        prompt=prompt + "\n\n###\n\n",
        logprobs=5,
        max_tokens=1,
        stop=" end",
        temperature=0.25,
    )

    # Notify interested users using the Slack API
    # (`interested_users` and `notify_user` are app-specific helpers)
    for user_id in interested_users(res):
        notify_user(row, user_id)
```

For more details, check out the [BeepGPT Github project](https://github.com/kaskada-ai/beep-gpt).
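The `collect(max=20)` step in the snippet above maintains a bounded window of the most recent messages for each key. Its behavior can be approximated in plain Python with a per-key bounded deque (an illustration of the windowing semantics, not the engine itself):

```python
from collections import deque

def collect_per_key(events, max_len):
    """For each (key, value) event, emit (key, current window for that key)."""
    windows = {}
    out = []
    for key, value in events:
        # deque(maxlen=...) silently drops the oldest item when full
        win = windows.setdefault(key, deque(maxlen=max_len))
        win.append(value)
        out.append((key, list(win)))
    return out

events = [("general", "hi"), ("general", "yo"),
          ("random", "hey"), ("general", "sup")]
result = collect_per_key(events, max_len=2)
# With max_len=2, the "general" window drops "hi" once "sup" arrives.
assert result[-1] == ("general", ["yo", "sup"])
```

Each incoming event produces an updated window for its own key, which is why the pipeline can react to every message as a fresh "conversation".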

* * *
## Get Started

Getting started with Kaskada is a `pip install kaskada` away.
Check out the [Quick Start](quickstart) now!

```{toctree}
:hidden:
:maxdepth: 3
why
tour
quickstart
examples/index
guide/index
```
```{toctree}
:hidden:

reference/timestream/index
reference/windows
reference/sources
reference/results
```
