diff --git a/python/docs/source/index.md b/python/docs/source/index.md
index 8da638952..a28ed2af3 100644
--- a/python/docs/source/index.md
+++ b/python/docs/source/index.md
@@ -2,94 +2,115 @@
 hide-toc: true
 html_theme.sidebar_secondary.remove: true
 ---
+# Introduction
-

Event-processing for AI applications.

+

Real-Time AI without the fuss.

-

Next-generation, real-time and historic event processing. +

Kaskada is a next-generation streaming engine that connects AI models to real-time & historical data.

+## Kaskada completes the Real-Time AI stack, providing...
+
 ```{gallery-grid}
 :grid-columns: 1 2 2 3
-- header: "{fas}`timeline;pst-color-primary` Real-time processing for all"
-  content: "Quickly process events so you can respond in real-time."
-  link: ".#stream"
-- header: "{fab}`python;pst-color-primary` Python-native"
-  content: "Use Python so you can load data, process it, and train and serve models from one place."
-  link: ".#python"
-- header: "{fas}`gauge-high;pst-color-primary` Get started immediately"
-  content: "No infrastructure to provision let's you jump right in."
-  link: ".#get-started"
-
-- header: "{fas}`fast-forward;pst-color-primary` Real-time, Batch and Streaming"
-  content: "Execute large-historic queries or materialize in real-time. Or both."
-  link: ".#real-time-and-historic"
-- header: "{fas}`rocket;pst-color-primary` Local, Remote and Distributed"
-  content: "Develop and test locally. Deploy to Docker, K8s or a service for production."
-  link: ".#local-and-distributed"
-- header: "{fas}`backward;pst-color-primary` Time-travel"
-  content: "Generate training examples from the past to predict the future."
-  link: ".#time-travel"
+- header: "{fas}`timeline;pst-color-primary` Real-time Aggregation"
+  content: "Precompute model inputs from streaming data with robust data connectors, transformations & aggregations."
+- header: "{fas}`binoculars;pst-color-primary` Event Detection"
+  content: "Trigger proactive AI behaviors by identifying important activities as they happen."
+- header: "{fas}`backward;pst-color-primary` History Replay"
+  content: "Backtest and fine-tune from historical data using per-example time travel and point-in-time joins."
 ```
-* * *
-
-(stream)=
-# Real-time event-processing
-Kaskada is built on Apache Arrow, providing an efficient, columnar representation of data.
-The same approach is at the core of many analytic databases as well as Pandas and Polars.
+### Real-time AI in minutes
-Kaskada goes beyond the columnar representation, by introduce a Timestream -- a columnar representation of events, ordered by time and grouped by key.
-This representation is a perfect fit for all kinds of events, modern event streams as well as events stored in a database.
-Specializing for Timestreams allows Kaskada to optimize temporal queries and execute them much faster.
+Connect and compute over databases, streaming data, _and_ data loaded dynamically using Python.
+Kaskada is seamlessly integrated with Python's ecosystem of AI/ML tooling so you can load data, process it, and train and serve models all in the same place.
-(python)=
-# Python-native
+There's no infrastructure to provision (and no JVM hiding under the covers), so you can jump right in - check out the [Quick Start](quickstart).
-Connect to existing data in streams or databases, or load data using Python.
-Wherever your events are stored, Kaskada can help you process them.
-Build temporal queries and process the results using Python.
-Connect straight to your visualizations, dashboards or machine learning systems.
+### Built for scale and reliability
-Kaskada lets you do it all in one place.
+Implemented in [Rust](https://www.rust-lang.org/) using [Apache Arrow](https://arrow.apache.org/), Kaskada's compute engine uses columnar data to efficiently execute large historic and high-throughput streaming queries.
+Every operation in Kaskada is implemented incrementally, allowing automatic recovery if the process is terminated or killed.
-(get_started)=
-# Get Started
+With Kaskada, most jobs are fast enough to run locally, so it's easy to build and test your real-time queries.
+As your needs grow, Kaskada's cloud-native design and support for partitioned execution give you the volume and throughput you need to scale.
+Kaskada was built by core contributors to [Apache Beam](https://beam.apache.org/), [Google Cloud Dataflow](https://cloud.google.com/dataflow), and [Apache Cassandra](https://cassandra.apache.org/), and is under active development.
-With no infrastructure to deploy, get started processing events immediately.
-Check out the [Quick Start](quickstart) now!
-
-(local-and-distributed)=
-# Local, Remote and Distributed
-
-Fast enough to run locally, Kaskada makes it easy to build and test your real-time queries.
+* * *
-Built for the cloud and supporting partitioned and distributed execution, Kaskada scales to the volume and throughput you need.
+## Example Real-Time App: BeepGPT
+
+[BeepGPT](https://github.com/kaskada-ai/beep-gpt/tree/main) keeps you in the loop without disturbing your focus. Its personalized, intelligent AI continuously monitors your Slack workspace, alerting you to important conversations and freeing you to concentrate on what’s most important.
+
+The core of BeepGPT's real-time processing requires only a few lines of code using Kaskada:
+
+```python
+import kaskada as kd
+kd.init_session()
+
+# Bootstrap from historical data
+messages = kd.sources.PyList(
+    rows = pyarrow.parquet.read_table("./messages.parquet")
+        .to_pylist(),
+    time_column_name = "ts",
+    key_column_name = "channel",
+)
+
+# Send each Slack message to Kaskada
+def handle_message(client, req):
+    messages.add_rows(req.payload["event"])
+slack.socket_mode_request_listeners.append(handle_message)
+slack.connect()
+
+# Aggregate multiple messages into a "conversation"
+conversations = ( messages
+    .select("user", "text")
+    .collect(max=20)
+)
+
+# Handle each conversation as it occurs
+async for row in conversations.run(materialize=True).iter_rows_async():
+
+    # Use a pre-trained model to identify interested users
+    prompt = "\n\n".join([f' {msg["user"]} --> {msg["text"]} ' for msg in row["result"]])
+    res = openai.Completion.create(
+        model="davinci:ft-personal:coversation-users-full-kaskada-2023-08-05-14-25-30",
+        prompt=prompt + "\n\n###\n\n",
+        logprobs=5,
+        max_tokens=1,
+        stop=" end",
+        temperature=0.25,
+    )
+
+    # Notify interested users using the Slack API
+    for user_id in interested_users(res):
+        notify_user(row, user_id)
+```
+For more details, check out the [BeepGPT GitHub project](https://github.com/kaskada-ai/beep-gpt).
-(real_time_and_historic)=
-# Real-time and Historic
+* * *
-Process events in real-time as they arrive.
-Backfill materializations by starting with history and switching to the stream.
+## Get Started
-(time-travel)=
-# Time Travel
-Compute temporal joins at the correct times, without risk of leakage.
+Getting started with Kaskada is a `pip install kaskada` away.
+Check out the [Quick Start](quickstart) now!
 ```{toctree}
 :hidden:
 :maxdepth: 3
+quickstart
 why
 tour
-quickstart
 examples/index
 guide/index
 ```
@@ -103,4 +124,4 @@ reference/timestream/index
 reference/windows
 reference/sources
 reference/results
-```
\ No newline at end of file
+```
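
As a rough illustration of what the `.collect(max=20)` step in the BeepGPT example appears to do conceptually (keep a bounded list of the most recent messages for each key, here the Slack channel), the sketch below reproduces that idea in plain Python. It does not use Kaskada, and it is not Kaskada's implementation: `ConversationWindow` and its `add` method are hypothetical names made up for this example.

```python
from collections import defaultdict, deque


class ConversationWindow:
    """Keep the most recent `max_len` messages per key (hypothetical helper)."""

    def __init__(self, max_len=20):
        # One bounded deque per key; appending past `max_len` drops the oldest item.
        self._windows = defaultdict(lambda: deque(maxlen=max_len))

    def add(self, key, message):
        """Record a message and return the current window for its key as a list."""
        window = self._windows[key]
        window.append(message)
        return list(window)


if __name__ == "__main__":
    convo = ConversationWindow(max_len=2)
    convo.add("general", {"user": "a", "text": "hi"})
    convo.add("general", {"user": "b", "text": "hello"})
    latest = convo.add("general", {"user": "a", "text": "anyone around?"})
    # Only the two most recent messages for "general" remain in the window.
    print([m["text"] for m in latest])
```

Each call to `add` plays the role of one incoming event: the returned list is the "conversation" that would be handed to the model at that point in time, and windows for different keys never mix.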