Replies: 2 comments
-
excellent write up. |
Beta Was this translation helpful? Give feedback.
0 replies
-
Thank you to drive it. It's extremely cool. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Motivation
Kyuubi currently mainly builds an efficient tenant management mechanism around the Spark ecosystem, which greatly reduces the threshold and cost for users to use Spark SQL. However, the enterprise-level big data platform is usually an ecosystem composed of multiple engines, and some enhancement requirements for the ease of use of Spark naturally extend to other engines, such as Flink1.
Flink has similar design goals to Spark, and its positioning is also a general-purpose computing and analysis engine that can support streaming and batching. Based on the design concept of DataFlow, Flink performs particularly well in real-time stream computing scenarios, which has attracted a large number of users. For well-known reasons, it has become popular in China. We have seen some explorations and practices of Flink users in building a Gateway that can support multi-tenancy and simplify the cost of SQL submission 2.
Flink's parent company, Ververica, once open-sourced a Gateway3 for Flink SQL and a JDBC Driver4 based on it. These two components can reduce the threshold for users to submit SQL to a certain extent. Unfortunately, the community and ecology of these two projects are not very active after opening the source code. And on the whole, it is not production-ready and its version is 1.12 much older than the newest released version 1.14.x. Moreover, There are many problems reported by users that have not been resolved 5. One user has summarized some of its drawbacks2 as follows:
In addition, Flink's codebase provides a sql-client component, which contains some basic ability(e.g.
Executor
,SessionContext
,ExecutionContext
,DefaultContext
) to implement a general SQL submitter like the ververica's gateway3.Taking into account the needs of the community, and Kyuubi is a multi-tenant framework dedicated to the development of multiple engines. It is definitely better for users to build a consistent and smooth experience for enterprises around a unified multi-tenant framework than vertical growth in a chimney style. Therefore, I submit this proposal to discuss the design and implementation of the integration of Kyuubi and Flink.
Design
Before proceeding with the specific design, let's first analyze the design of Flink SQL Gateway and Flink SQL JDBC driver. It provides a basic implementation for simplifying the use of Flink SQL and introducing the Session mechanism. On the whole, the modes of interaction between them and Flink Cluster (here we do not distinguish the type of Cluster for the time being, assuming that SQL Gateway can interact with all types of Flink Cluster normally) are as follows:
From a functional point of view, it is somewhat similar to Kyuubi Server, of course, there are big differences in the supported protocols and the implementation of the engine. We expect Kyuubi to support Flink on the engine side, not on the peer server side. Therefore, related capabilities should be pushed down to the engine side of the back end of the server.
There is a big difference between integrating Flink and Spark on the engine side. The Spark application runs as an independent process set on the cluster, coordinated by the SparkContext object in the main program (called the driver) 6. Unfortunately, Flink does not have such a component. Flink's native client is not a resident component. Its JobMaster acts as a coordinator, but it cannot run user-defined logic. After Flink v1.11, it provided a new deployment model named Application Mode which is similar to Spark Driver but has some limitations7.
Therefore, considering the difference in architecture design between Flink and Spark, we need to provide a driver-like role for Flink Engine in Kyuubi, which we can call "Engine container". In order to focus on this proposal (providing the basic implementation of Kyuubi and Flink integration), we will start an Engine container process locally as its runtime, which is responsible for communicating with Kyuubi Server and Flink Cluster. In the future, we will consider adapting it to the requirements related to the deployment model and resource management framework supported by Kyuubi and load balancing.
Let Kyuubi integrate Flink and adapt to Kyuubi's overall architecture, it is impossible to move Flink SQL Gateway directly to Kyuubi. We need it to implement FrontendService and BackendService on the Kyuubi Engine side, and the front link is still based on the Thrift protocol. It looks like the picture below:
Next, based on some of the above design considerations, let's explain how to implement Flink SQL Engine.
Implementation
There are currently two operations supported by Flink SQL Gateway: JobOperation / NonJobOperation.
This is somewhat different from the current implementation of Spark integration in Kyuubi. Kyuubi fetches all this information generated by Spark's own execution mechanism. Therefore, it can be considered that Spark Engine maintains metadata based on the Server-side, and Flink Engine maintains metadata based on the Client-side.
The metadata of the current Flink SQL Gateway is stored inside memory by default, which means it is volatile and unreliable. In the future, we will provide a set of persistence solutions so that metadata can be persisted to specific storage in a plug-in manner.
Based on the design of the Kyuubi, we expect the support of the Flink engine to follow the unified abstraction of the Kyuubi Server. Drawing on Kyuubi's implementation of Spark, the preliminary class diagram for Flink Engine integration is as follows:
In short, we need to introduce:
EngineType
to distinguish spark and flink when creating engine.ThriftFrontendService
andTCLIService
SessionManager
toFlinkSessionManager
SessionManager
, specify detailedOperationManager
and manage session lifestyle.AbstractOperation
Scope and limitation
Supporting FlinkSQL engine is a big feature for Kyuubi and we are exploring the major capacity of the first version. So some limitations list below:
Beta Was this translation helpful? Give feedback.
All reactions