feat: improve structure and linking of common concepts (#763)
* feat: improve structure and linking of common concepts

* fix depth

* fix duplication

* capitalize

* moving concepts to concept section

* triggers to concepts

* Update index.md

* task types

* fix links

* links

* more links

* Update 02b.using-expressions.md

* links

* fixing broken links

* Update 04.others.md

* task defaults + flow properties pages
anna-geller authored Jan 9, 2024
1 parent 588c07e commit 4fd7cf8
Showing 180 changed files with 1,195 additions and 1,000 deletions.
content/blogs/2022-10-05-kestra-snowflake.md (4 changes: 2 additions & 2 deletions)
@@ -26,7 +26,7 @@ The platform enables organizations to avoid large-scale licensing costs commonly

Data warehouse workloads are typically part of a larger technological stack. To streamline operations, the orchestration and scheduling of data pipelines are crucial. This is where Kestra comes into play.

-Kestra is designed to orchestrate and schedule scalable data workflows, thereby enhancing DataOps teams' productivity. It can construct, operate, manage, and monitor a [variety of complex workflows](../docs/02.tutorial/05.flowable.md) sequentially or in parallel.
+Kestra is designed to orchestrate and schedule scalable data workflows, thereby enhancing DataOps teams' productivity. It can construct, operate, manage, and monitor a [variety of complex workflows](../docs/01.tutorial/05.flowable.md) sequentially or in parallel.

Kestra can execute workflows based on event-based, time-based, and API-based scheduling, giving complete control.
Snowflake already offers many cost optimization processes like data compression and auto-scaling. However, Kestra makes it simpler to [download](../plugins/plugin-jdbc-snowflake/tasks/io.kestra.plugin.jdbc.snowflake.Download.md), [upload](../plugins/plugin-jdbc-snowflake/tasks/io.kestra.plugin.jdbc.snowflake.Upload.md), and [query](../plugins/plugin-jdbc-snowflake/tasks/io.kestra.plugin.jdbc.snowflake.Query.md) data by integrating with Snowflake's storage and compute resources.
@@ -141,5 +141,5 @@ Kestra provides flexibility and control to data teams, it can orchestrate any ki
Kestra's Snowflake plugin makes data warehousing simple even for non-developers thanks to YAML. Your Snowflake storage pipeline can accommodate raw data from multiple sources and transform it using ETL operations. Additionally, you can skip the transformation and directly load data into the warehouse using the [ELT pipeline](./2022-04-27-etl-vs-elt.md). Kestra can manage both workflows simultaneously. In any case, Kestra ensures that the data is readily available to perform analysis and learn valuable patterns.

Join the Slack [community](https://kestra.io/slack) if you have any questions or need assistance.
-Follow us on [Twitter](https://twitter.com/kestra_io) for the latest news.
+Follow us on [Twitter](https://twitter.com/kestra_io) for the latest news.
Check the code in our [GitHub repository](https://github.com/kestra-io/kestra) and give us a star if you like the project.
@@ -25,7 +25,7 @@ While building Kestra, we wanted to rely only on the queue as a database for our

## Same Kafka Topic for Source and Destination

-In Kestra, we have a [Kafka topic](https://kafka.apache.org/intro#intro_concepts_and_terms) for the current flow [execution](../docs/03.concepts/02.executions.md). That topic is both the source and the destination. We update the current execution to add some information and send it back to Kafka for further processing.
+In Kestra, we have a [Kafka topic](https://kafka.apache.org/intro#intro_concepts_and_terms) for the current flow [execution](../docs/03.concepts/execution.md). That topic is both the source and the destination. We update the current execution to add some information and send it back to Kafka for further processing.
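
As a rough illustration of this read-from-and-write-back-to-the-same-topic pattern (a sketch only: the topic name, String values, and the enrichment step are assumed, not Kestra's actual topology):

```java
// Sketch only: "executions" topic name, String values, and the enrichment step are
// assumed for illustration; this is not Kestra's actual topology.
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class SameTopicTopology {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("executions", Consumed.with(Serdes.String(), Serdes.String()))
            // add information to the current execution (placeholder transformation)
            .mapValues(execution -> execution + " | updated")
            // send the updated record back to the very same topic for further processing
            .to("executions", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "same-topic-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Note that a real topology needs a condition that eventually stops re-emitting a record, otherwise it would cycle through the topic forever.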

Initially, we were unsure if this design was possible with Kafka. We [asked](https://twitter.com/tchiotludo/status/1252197729406783488) Matthias J. Sax, one of the primary maintainers of Kafka Streams, who responded on [Stack Overflow](https://stackoverflow.com/questions/61316312/does-kafka-stream-with-same-sink-source-topics-with-join-is-supported).

@@ -124,7 +124,7 @@ Our first assumption was that `all()` returns an object (Flow in our case), as t
- Fetch all the data from RocksDB
- Deserialize the data from RocksDB that is stored as bytes, and map it to a concrete Java POJO

-So each time we call the `all()` method, all values are deserialized, which can lead to high CPU usage and latency on your stream. We are talking about all [flow revisions](../docs/03.concepts/01.flows.md#revision) on our cluster. The last revision had 2.5K flows, but we don't see people creating a lot of revisions. Imagine 100K `byte[]` to deserialize to POJO for every call. 🤯
+So each time we call the `all()` method, all values are deserialized, which can lead to high CPU usage and latency on your stream. We are talking about all [flow revisions](../docs/03.concepts/flow.md#revision) on our cluster. The last revision had 2.5K flows, but we don't see people creating a lot of revisions. Imagine 100K `byte[]` to deserialize to POJO for every call. 🤯
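
To make that cost concrete, here is a sketch of the expensive pattern (store name, key layout, and value type are assumed for illustration; this is not Kestra's code): every entry the iterator touches is read from RocksDB and deserialized, even when only one flow is needed.

```java
// Sketch of the expensive pattern: store name ("flows"), key layout, and value type
// are assumed for illustration; this is not Kestra's code.
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class CostlyFlowLookup {

    public static Object findRevision(KafkaStreams streams, String flowUid) {
        ReadOnlyKeyValueStore<String, Object> store = streams.store(
            StoreQueryParameters.fromNameAndType("flows", QueryableStoreTypes.keyValueStore()));

        Object found = null;
        // all() walks every entry in RocksDB and deserializes every value,
        // even though we only care about a single flow here.
        try (KeyValueIterator<String, Object> iterator = store.all()) {
            while (iterator.hasNext()) {
                KeyValue<String, Object> entry = iterator.next();
                if (entry.key.startsWith(flowUid)) { // assumed key layout: "<uid>_<revision>"
                    found = entry.value;             // keep the last match (revision check omitted)
                }
            }
        }
        return found;
    }
}
```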

Since we only need the last revision in our use case, we create an in-memory Map with all the flows using the following:
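
The snippet itself is truncated in this hunk; a minimal sketch of the idea, with assumed names and a simplified Flow type rather than Kestra's actual code, could look like this:

```java
// Sketch only: topic name, Flow shape, and field names are assumed; a serde for the
// Flow type is expected to be configured. This is not Kestra's actual implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class LatestFlowCache {

    // Simplified, hypothetical flow representation.
    public record Flow(String uid, int revision) {}

    private final Map<String, Flow> latestByUid = new ConcurrentHashMap<>();

    // Consume the flow topic and keep only the highest revision seen per flow uid.
    public void register(StreamsBuilder builder, String flowTopic) {
        KStream<String, Flow> flows = builder.stream(flowTopic);
        flows.foreach((key, flow) -> latestByUid.merge(
            flow.uid(), flow,
            (current, incoming) -> incoming.revision() >= current.revision() ? incoming : current));
    }

    // O(1) lookup from memory: no RocksDB scan, no per-call deserialization.
    public Flow latest(String uid) {
        return latestByUid.get(uid);
    }
}
```

Lookups then hit the map directly instead of deserializing every stored revision on each call.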

