docs: Use pydata theme and setup ablog (#732)
bjchambers authored Sep 2, 2023
1 parent 178c4a1 commit 5cbbbff
Showing 17 changed files with 945 additions and 255 deletions.
15 changes: 11 additions & 4 deletions .github/workflows/ci_python.yml
@@ -88,7 +88,6 @@ jobs:
- uses: actions/setup-python@v4
with:
python-version: |
3.8
3.9
3.10
3.11
@@ -117,7 +116,7 @@ jobs:
# This installs the kaskada package using the wheel.
# This ensures that we don't accidentally install the version from pypi.
run: |
for V in 3.8 3.9 3.10 3.11; do
for V in 3.9 3.10 3.11; do
echo "::group::Install for Python $V"
poetry env use $V
poetry env info
@@ -133,14 +132,22 @@
echo "::endgroup::"
deactivate
done
- name: Setup QT
# Needed by sphinx-social-cards.
# https://github.com/2bndy5/sphinx-social-cards/blob/main/.github/workflows/build.yml#L54
run: |
sudo apt-get install -y libgl1-mesa-dev libxkbcommon-x11-0
echo "QT_QPA_PLATFORM=offscreen" >> "$GITHUB_ENV"
- name: Build docs
# ablog doesn't currently indicate whether it supports parallel reads,
# leading to a warning.
# when possible, add `"-j", "auto",` to do parallel builds (and in nox).
run: |
sudo apt install -y libegl1
poetry env use 3.11
source $(poetry env info --path)/bin/activate
poetry install --with=docs
pip install ${WHEEL} --force-reinstall
sphinx-build docs/source docs/_build -j auto -W
sphinx-build docs/source docs/_build -W # -j auto
deactivate
- name: Upload docs
uses: actions/upload-pages-artifact@v2
6 changes: 2 additions & 4 deletions .github/workflows/release_python.yml
@@ -94,7 +94,6 @@ jobs:
- uses: actions/setup-python@v4
with:
python-version: |
3.8
3.9
3.10
3.11
@@ -123,7 +122,7 @@ jobs:
run: |
WHEEL="dist/kaskada-${{ needs.version.outputs.version }}-cp38-abi3-${{ matrix.wheel_suffix }}.whl"
echo "WHEEL:${WHEEL}"
for V in 3.8 3.9 3.10 3.11; do
for V in 3.9 3.10 3.11; do
echo "::group::Install for Python $V"
poetry env use $V
source $(poetry env info --path)/bin/activate
@@ -219,7 +218,6 @@ jobs:
- uses: actions/setup-python@v4
with:
python-version: |
3.8
3.9
3.10
3.11
@@ -249,7 +247,7 @@ jobs:
run: |
WHEEL="dist/kaskada-${{ needs.version.outputs.version }}-cp38-abi3-manylinux_2_28_${{ matrix.target }}.whl"
echo "WHEEL:${WHEEL}"
for V in 3.8 3.9 3.10 3.11; do
for V in 3.9 3.10 3.11; do
echo "::group::Install for Python $V"
poetry env use $V
poetry env info
46 changes: 46 additions & 0 deletions python/docs/source/_layouts/default.yml
@@ -0,0 +1,46 @@
layers:
# the base layer for the background
- background:
color: "#26364a"
image: >-
#% if page.meta.card_image -%#
'{{ page.meta.card_image }}'
#%- elif layout.background_image -%#
'{{ layout.background_image }}'
#%- endif %#
# the layer for the logo image
- size: { width: 300, height: 83 }
offset: { x: 60, y: 60 }
icon:
image: "_static/kaskada-negative.svg"
# the layer for the page's title
- size: { width: 920, height: 300 }
offset: { x: 60, y: 180 }
typography:
content: >-
#% if page.meta.title -%#
'{{ page.meta.title }}'
#%- elif page.title -%#
'{{ page.title }}'
#%- endif %#
line:
# height: 0.85
amount: 3
font:
weight: 500
color: white
# the layer for the site's (or page's) description
- offset: { x: 60, y: 480 }
size: { width: 1080, height: 90 }
typography:
content: >-
#% if page.meta and page.meta.description -%#
'{{ page.meta.description }}'
#%- else -%#
'{{ config.site_description }}'
#%- endif %#
line:
height: 0.87
amount: 2
align: start bottom
color: white
10 changes: 10 additions & 0 deletions python/docs/source/blog/index.md
@@ -0,0 +1,10 @@
# Blog

```{eval-rst}
.. postlist::
:list-style: circle
:format: {title}
:excerpts:
:sort:
:expand: Read more ...
```
76 changes: 76 additions & 0 deletions python/docs/source/blog/posts/2023-03-28-announcing-kaskada-oss.md
@@ -0,0 +1,76 @@
---
blogpost: true
author: ben
date: 2023-03-28
tags: releases
excerpt: 1
description: From Startup to Open Source Project
---

# Announcing Kaskada OSS

Today, we’re announcing the open-source release of Kaskada – a modern event-processing engine.

# How it began: Simplifying ML

Kaskada technology has evolved a lot since we began developing it three years ago. Initially, we were laser-focused on the machine learning (ML) space. We saw many companies working on different approaches to the same ML problems -- managing computed feature values (what is now called a feature store), applying existing algorithms to train a model from those values, and serving that model by applying it to computed feature values. We saw a different problem.

With our background in the data processing space we identified a critical gap -- no one was looking at the process of going from raw, event-based data to computed feature values. This meant that users had to choose – use SQL and treat the events as a table, losing important information in the process, or use lower-level data pipeline APIs and worry about all the details. Our experience working on data processing systems at Google and as part of Apache Beam led us to create a compute engine designed for the needs of feature engineering — we called it a feature engine.

We are extremely proud of where Kaskada technology is today. Unlike a feature store, it focuses on computing the features a user describes using a simple, declarative language. Unlike existing data processing systems, it delivers on the needs of machine learning – expressing sophisticated, temporal features without leakage, working with raw events without pre-processing, and scalability that just works for training and serving.

The unique characteristics of Kaskada make it ideal for the time-based event processing required for accurate, real-time machine learning. While we see that ML will always be a great use case for Kaskada, we’ve realized it can be used for so much more.

# Modern, Open-Source Event Processing

When [DataStax acquired Kaskada](https://www.datastax.com/press-release/datastax-acquires-machine-learning-company-kaskada-to-unlock-real-time-ai) a few months ago, we began the process of open-sourcing the core Kaskada technology. In the conversations that followed, we realized that the capabilities of Kaskada that make it ideal for real-time ML – easy to use, high-performance columnar computations over event-based data – also make it great for general event processing. These features include:

1. **Rich, Temporal Operations**: The ability to easily express computations over time beyond windowed aggregations. For instance, when computing training data it was often necessary to compute values at a point in time in the past and combine those with a label value computed at a later point in time. This led to a powerful set of operations for working with time.
2. **Events all the way down**: The ability to run a query both to get all results over time and just the final results. This means that Kaskada operates directly on the events – turning a sequence of events into a sequence of changes, which may be observed directly or materialized to a table. By treating everything as events, the temporal operations are always available and you never need to think about the difference between streams and tables, nor do you need to use different APIs for each.
3. **Modern and easy to use**: Kaskada is built in Rust and uses Apache Arrow for high-performance, columnar computations. It consists of a single binary which makes for easy local and cloud deployments.
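
To make the first point concrete, here is a minimal, stdlib-only Python sketch of the leakage-free pattern it describes – a feature computed as of a point in time, paired with a label observed later. The helper names are hypothetical illustrations, not Kaskada's API:

```python
from bisect import bisect_right

# Events for one entity: (timestamp, amount), sorted by timestamp.
events = [(1, 10.0), (3, 5.0), (7, 2.0), (9, 8.0)]

def feature_asof(events, t):
    """Sum of amounts observed at or before time t (a point-in-time feature)."""
    idx = bisect_right([ts for ts, _ in events], t)
    return sum(amount for _, amount in events[:idx])

def label_after(events, t, horizon):
    """Label: did any event occur in the window (t, t + horizon]?"""
    return any(t < ts <= t + horizon for ts, _ in events)

# The feature at t=5 uses only events up to t=5 (no leakage);
# the label looks ahead into (5, 10].
example = (feature_asof(events, 5), label_after(events, 5, 5))
print(example)  # (15.0, True)
```

A temporal engine generalizes this: rather than hand-writing the bookkeeping per feature, the point-in-time semantics are built into every operation.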


This led to the decision to open source Kaskada as a modern, open-source event-processing language and native engine. Machine learning is still a great use case of Kaskada, but we didn’t want the feature engine label to constrain community creativity and innovation. It’s all available today in the [GitHub repository](https://github.com/kaskada-ai/kaskada) under the Apache 2.0 License.

# Why use Kaskada?

Kaskada is for you if…

1. **You want to compute the results of your query over time.**
Operating over time all the way down means that Kaskada makes it easy to compute the result of any query over time.

2. **You want to express temporal computations without writing pages of SQL.**
Kaskada provides a declarative language for event processing. Because of its focus on temporal computations and composability, queries are much easier to write and shorter than comparable SQL.

3. **You want to process events today without setting up other tools.**
The columnar event-processing engine within Kaskada scales to X million events/second running on a single machine. This lets you get started and iterate quickly without becoming an expert in cluster management or big-data tools.


# What’s coming next?

Our first goal was getting the project released. Now that it is, we are excited to see where the project goes!

Some improvements on our mind are shown below. We look forward to hearing your thoughts on what would help you process events.

1. **Increase extensibility and participate in the larger open-source community.**
- Introduce extension points for I/O connectors and contribute connectors for a larger set of supported formats.
- Expose a logical execution plan after the language constructs have been compiled away, so that other executors may be developed using the same parsing and type-checking rules.
- Introduce extension points for custom schema catalogs, allowing Kaskada queries to be compiled against existing data catalogs.

2. **Align query capabilities with more general, event-processing use cases.**
    - Ability to create composite events from patterns of existing events and subsequently process those composite events (complex event processing, or “CEP”).
- Improvements to the declarative language to reduce surprises, make it more familiar to new users, and make it even easier to express temporal computations over events.

3. **Continue to improve local performance and usability.**
- Make it possible to use the engine more easily in a variety of ways – via a command line REPL, via an API, etc.
- Improve performance and latency of real-time and partitioned execution within the native engine.

# How can I contribute?

Give it a try – [download one of the releases](https://github.com/kaskada-ai/kaskada/releases) and run some computations on your event data. Let us know how it works for you, and what you’d like to see improved!

We’d love to hear what you think - please comment or ask on our [Kaskada GitHub discussions page](https://github.com/kaskada-ai/kaskada/discussions).

Help spread the word – Star and Follow the project on GitHub!

Please file issues, start discussions or join us on GitHub to chat about the project or event-processing in general.
85 changes: 85 additions & 0 deletions python/docs/source/blog/posts/2023-08-25-new-kaskada.md
@@ -0,0 +1,85 @@
---
blogpost: true
date: 2023-08-25
author: ryan
tags: releases
excerpt: 2
description: Embedded in Python for accessible Real-Time AI
---

# Introducing the New Kaskada

We started Kaskada with the goal of simplifying the real-time AI/ML lifecycle, and in the past year AI has exploded in usefulness and accessibility. Generative models and Large Language Models (LLMs) have revolutionized how we approach AI. Their accessibility and incredible capabilities have made AI more valuable than it has ever been and democratized the practice of AI.

Still, a challenge remains: building and managing real-time AI applications.

## The Challenge of using Real-Time Data in AI Applications

Real-time data for AI Applications has always been surrounded by an array of challenges. For example:

1. **Infrastructure Hurdles**: Accessing real-time data often means struggling to acquire data and deploying complex infrastructure, requiring significant time and expertise to get right.

2. **Cumbersome Tools**: Traditional tools for streaming data are bulky, with steep learning curves and complex JVM-based setups.

3. **Analysis Disconnect**: AI models thrive on historical data, but the tools designed for bulk historical analysis are often worlds apart from those made for real-time or streaming data processing.

4. **Challenges of Time-Travel**: AI applications frequently require a unique kind of historical analysis – one that can time-travel through your data. Expressing such analyses is challenging with conventional analytic tools that weren’t designed with time in mind.

These challenges have made it difficult for all but the largest companies with the deepest development budgets to deliver on the promise of real-time AI, and these are the challenges we built Kaskada to solve.

## Welcome to the New Kaskada

We originally built Kaskada as a managed service. Earlier this year, we [released Kaskada as an open-source, self-managed service](./2023-03-28-announcing-kaskada-oss.md), simplifying data onboarding and allowing Kaskada to be deployed anywhere.

Today, we take the next step in improving Kaskada’s usability by providing its core compute engine as an embedded Python library. Because Kaskada is written in Rust, we’re able to leverage the excellent [PyO3](https://pyo3.rs/) project to compile Python-native bindings for our compute engine and support Python-defined UDFs. Additionally, Kaskada is built using [Apache Arrow](https://arrow.apache.org/), which allows zero-copy data transfers between Kaskada and other Python libraries such as [Pandas](https://pandas.pydata.org/), so you can operate on your data in place.

We’re also changing how you query Kaskada by implementing our query DSL as Python functions. This change makes it easier to get started by eliminating the learning curve of a new language and improving integration with code editors, syntax highlighters, and AI coding assistants.

The result is an easy-to-use, Python-native library with all the efficiency and performance of our low-level Rust implementation, fully integrated with the rich Python ecosystem of AI/ML tools, visualization libraries, and more.

## Features for Real-Time AI Applications

Real-Time AI is easier today than it's ever been:

* Foundation models built by OpenAI, Facebook and others can be used as a starting point, allowing sophisticated applications to be built with a fraction of the data that would otherwise be necessary.
* Services such as OpenAI eliminate the need to manage complex infrastructure.
* Platforms like HuggingFace have made it easier than ever to share and collaborate on open LLMs.

The New Kaskada complements these resources, making it easier than ever to utilize real-time data by providing several key components:

### 1. Real-time Aggregation

In a world where data is continuously flowing, being able to efficiently precompute model inputs is invaluable. With Kaskada's real-time aggregation, you can effortlessly:

- Connect with multiple data streams using our robust data connectors.
- Transform data on-the-go, ensuring that the model receives the most relevant inputs.
- Perform complex aggregations to derive meaningful insights from streams of data, making sure your AI models always have the most pertinent information.
- Pause and resume aggregations in the event of process termination.

The result? Faster decision-making, timely insights, and AI models that are always a step ahead.
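
The core idea of incremental, per-entity aggregation can be sketched in a few lines of stdlib Python. This is a simplified illustration of the pattern, not Kaskada's engine or API:

```python
from collections import defaultdict

class RunningMean:
    """A mean maintained incrementally, updated one event at a time."""
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def value(self):
        return self.total / self.count if self.count else 0.0

# One aggregate per entity key, updated as events stream in;
# no batch recomputation is ever needed.
means = defaultdict(RunningMean)
for user, amount in [("a", 10.0), ("b", 4.0), ("a", 20.0)]:
    means[user].add(amount)

print(means["a"].value)  # 15.0
```

A real engine adds windowing, persistence for pause/resume, and columnar execution on top of this basic shape.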

### 2. Event Detection

Real-time event detection can mean the difference between catching an anomaly and letting it slip through the cracks. The New Kaskada’s event detection system is designed to:

- Expressively describe complex cross-event and cross-entity conditions to use as triggers.
- Identify important activities and patterns as they occur, ensuring nothing goes unnoticed.
- Trigger proactive AI behaviors, allowing for immediate actions or notifications based on the detected events.

From spotting fraudulent activities to identifying high-priority user behaviors, Kaskada ensures that important activities are always on your radar.
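
As a toy illustration of a cross-event trigger (again stdlib Python, not Kaskada's API), consider firing when an entity produces three failed logins within a sliding 60-second window:

```python
from collections import defaultdict, deque

WINDOW = 60.0   # seconds
THRESHOLD = 3   # failures within the window that fire the trigger

recent = defaultdict(deque)  # entity -> timestamps of recent failures

def on_failed_login(entity, ts):
    """Return True when `entity` reaches THRESHOLD failures within WINDOW."""
    q = recent[entity]
    q.append(ts)
    # Drop failures that have aged out of the sliding window.
    while q and ts - q[0] > WINDOW:
        q.popleft()
    return len(q) >= THRESHOLD

fired = [on_failed_login("user1", t) for t in (0.0, 10.0, 30.0, 200.0)]
print(fired)  # [False, False, True, False]
```

A declarative engine lets you state the condition once and evaluate it across all entities and event sources, instead of maintaining this state by hand.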

### 3. History Replay

Past data holds the keys to effective future decisions. With Kaskada's history replay, you can:

- Backtest AI models by revisiting historical data points.
- Fine-tune models using per-example time travel, ensuring your models are always optimized based on past and present data.
- Use point-in-time joins to seamlessly merge data from different data sources at a single point in history, unlocking deeper insights and more accurate predictions.
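
A point-in-time (as-of) join of this kind can be sketched with stdlib Python – for each left-side event, attach the latest right-side value known at that moment. This is an illustrative toy, not Kaskada's implementation:

```python
from bisect import bisect_right

def asof_join(left, right):
    """For each (ts, value) in `left`, attach the latest `right` value at or before ts."""
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        i = bisect_right(right_ts, ts)
        rval = right[i - 1][1] if i else None  # None when no right value exists yet
        out.append((ts, lval, rval))
    return out

purchases = [(5, "book"), (12, "game")]
prices    = [(1, 9.99), (10, 12.50)]  # price updates over time
print(asof_join(purchases, prices))
# [(5, 'book', 9.99), (12, 'game', 12.5)]
```

The join never looks into the future: the purchase at time 5 sees the price from time 1, not the later update at time 10 – exactly the property that keeps backtests honest.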

Kaskada ties together the modern real-time AI stack, providing a data foundation for developing and operating AI applications.

## Join the Community

We believe in the transformative power of real-time AI and the possibilities it holds. We believe that real-time data will allow AI to go beyond question-answering to provide proactive, intelligent applications. We want to hear what excites you about real-time and generative AI - [Join our Slack community](https://kaskada.io/community/) and share your use cases, insights and experiences with the New Kaskada.

*"Real-Time AI without the fuss."* Embrace the future with Kaskada.