[KPIP-2] Kyuubi Support Flink SQL Engine Design #1132

yanghua · 2021-09-22T07:09:43Z

yanghua
Sep 22, 2021
Collaborator

Motivation

Kyuubi currently mainly builds an efficient tenant management mechanism around the Spark ecosystem, which greatly reduces the threshold and cost for users to use Spark SQL. However, the enterprise-level big data platform is usually an ecosystem composed of multiple engines, and some enhancement requirements for the ease of use of Spark naturally extend to other engines, such as Flink1.

Flink has similar design goals to Spark, and its positioning is also a general-purpose computing and analysis engine that can support streaming and batching. Based on the design concept of DataFlow, Flink performs particularly well in real-time stream computing scenarios, which has attracted a large number of users. For well-known reasons, it has become popular in China. We have seen some explorations and practices of Flink users in building a Gateway that can support multi-tenancy and simplify the cost of SQL submission 2.

Flink's parent company, Ververica, once open-sourced a Gateway3 for Flink SQL and a JDBC Driver4 based on it. These two components can reduce the threshold for users to submit SQL to a certain extent. Unfortunately, the community and ecology of these two projects are not very active after opening the source code. And on the whole, it is not production-ready and its version is 1.12 much older than the newest released version 1.14.x. Moreover, There are many problems reported by users that have not been resolved 5. One user has summarized some of its drawbacks2 as follows:

Does not support metadata persistence
Does not support multi-tenancy (authentication and resources)
Cannot scale horizontally
Missing job status management

In addition, Flink's codebase provides a sql-client component, which contains some basic ability(e.g. Executor, SessionContext, ExecutionContext, DefaultContext) to implement a general SQL submitter like the ververica's gateway3.

Taking into account the needs of the community, and Kyuubi is a multi-tenant framework dedicated to the development of multiple engines. It is definitely better for users to build a consistent and smooth experience for enterprises around a unified multi-tenant framework than vertical growth in a chimney style. Therefore, I submit this proposal to discuss the design and implementation of the integration of Kyuubi and Flink.

Design

Before proceeding with the specific design, let's first analyze the design of Flink SQL Gateway and Flink SQL JDBC driver. It provides a basic implementation for simplifying the use of Flink SQL and introducing the Session mechanism. On the whole, the modes of interaction between them and Flink Cluster (here we do not distinguish the type of Cluster for the time being, assuming that SQL Gateway can interact with all types of Flink Cluster normally) are as follows:

From a functional point of view, it is somewhat similar to Kyuubi Server, of course, there are big differences in the supported protocols and the implementation of the engine. We expect Kyuubi to support Flink on the engine side, not on the peer server side. Therefore, related capabilities should be pushed down to the engine side of the back end of the server.

There is a big difference between integrating Flink and Spark on the engine side. The Spark application runs as an independent process set on the cluster, coordinated by the SparkContext object in the main program (called the driver) 6. Unfortunately, Flink does not have such a component. Flink's native client is not a resident component. Its JobMaster acts as a coordinator, but it cannot run user-defined logic. After Flink v1.11, it provided a new deployment model named Application Mode which is similar to Spark Driver but has some limitations7.

Therefore, considering the difference in architecture design between Flink and Spark, we need to provide a driver-like role for Flink Engine in Kyuubi, which we can call "Engine container". In order to focus on this proposal (providing the basic implementation of Kyuubi and Flink integration), we will start an Engine container process locally as its runtime, which is responsible for communicating with Kyuubi Server and Flink Cluster. In the future, we will consider adapting it to the requirements related to the deployment model and resource management framework supported by Kyuubi and load balancing.

Let Kyuubi integrate Flink and adapt to Kyuubi's overall architecture, it is impossible to move Flink SQL Gateway directly to Kyuubi. We need it to implement FrontendService and BackendService on the Kyuubi Engine side, and the front link is still based on the Thrift protocol. It looks like the picture below:

Next, based on some of the above design considerations, let's explain how to implement Flink SQL Engine.

Implementation

There are currently two operations supported by Flink SQL Gateway: JobOperation / NonJobOperation.

JobOperation: Need to explicitly interact with Flink Cluster, currently there are only two types of Operation (SELECT/INSERT);
NonJobOperation: No need, just access local metadata;

This is somewhat different from the current implementation of Spark integration in Kyuubi. Kyuubi fetches all this information generated by Spark's own execution mechanism. Therefore, it can be considered that Spark Engine maintains metadata based on the Server-side, and Flink Engine maintains metadata based on the Client-side.

The metadata of the current Flink SQL Gateway is stored inside memory by default, which means it is volatile and unreliable. In the future, we will provide a set of persistence solutions so that metadata can be persisted to specific storage in a plug-in manner.

Based on the design of the Kyuubi, we expect the support of the Flink engine to follow the unified abstraction of the Kyuubi Server. Drawing on Kyuubi's implementation of Spark, the preliminary class diagram for Flink Engine integration is as follows:

In short, we need to introduce:

A EngineType to distinguish spark and flink when creating engine.
FlinkThriftFrontendService: to override some methods extends from ThriftFrontendService and TCLIService
FlinkSQLBackendService: to specify the detailed SessionManager to FlinkSessionManager
FlinkSQLSessionManager: to implement the interface methods comes from SessionManager, specify detailed OperationManager and manage session lifestyle.
FlinkSessionImpl: the necessity of this class would be decided on the implement-time.
FlinkSQLOperationManager: used to manage operations
FlinkOperation: to override some aspect methods and interface methods of AbstractOperation
FlinkSQLEngine: the main class of the Flink SQL Engine

Scope and limitation

Supporting FlinkSQL engine is a big feature for Kyuubi and we are exploring the major capacity of the first version. So some limitations list below:

We based on the 1.12 major Flink version to align with Flink SQL Gateway3
We choose the widely used deploy mode that is: Per Job mode.
Currently, we only focus on JDBC protocol and Streaming SQL summission.

zinking · 2022-02-16T09:01:08Z

zinking
Feb 16, 2022

excellent write up.

0 replies

a49a · 2022-02-17T12:09:41Z

a49a
Feb 17, 2022

Thank you to drive it. It's extremely cool.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[KPIP-2] Kyuubi Support Flink SQL Engine Design #1132

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments

{{title}}

{{title}}

Select a reply

[KPIP-2] Kyuubi Support Flink SQL Engine Design #1132

yanghua Sep 22, 2021 Collaborator

Motivation

Design

Implementation

Scope and limitation

Replies: 2 comments

zinking Feb 16, 2022

a49a Feb 17, 2022

yanghua
Sep 22, 2021
Collaborator

zinking
Feb 16, 2022

a49a
Feb 17, 2022