Implement FlintJob to handle all query types in warmpool mode #979
base: main
Conversation
Force-pushed 6747ab9 to 59aa26b
Can you clarify and document how WarmPool is abstracted and can be enabled/disabled?
def getSegmentName(sparkSession: SparkSession): String = {
  val maxExecutorsCount =
    sparkSession.conf.get(FlintSparkConf.MAX_EXECUTORS_COUNT.key, "unknown")
  String.format("%se", maxExecutorsCount)
}
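The formatting in the snippet is a plain `String.format("%se", count)` over the configured max-executors value. A standalone sketch of that behavior, with the `SparkSession` lookup replaced by an `Option` so it runs on its own (`SegmentName` and `fromMaxExecutors` are hypothetical names, not part of the PR):

```scala
// Standalone sketch of the segment-name formatting above. SegmentName and
// fromMaxExecutors are made-up names; the real code reads the count from
// the SparkSession conf via FlintSparkConf.MAX_EXECUTORS_COUNT.
object SegmentName {
  def fromMaxExecutors(maxExecutorsCount: Option[String]): String =
    String.format("%se", maxExecutorsCount.getOrElse("unknown"))
}
```

For example, `fromMaxExecutors(Some("10"))` yields `"10e"`, and a missing setting falls back to `"unknowne"`.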
This segmentName is specific to warmpool logic; let us create abstractions on warmpool and record metrics via AOP.
Force-pushed 044aeea to adef5b6
Can we remove the concept of interactive / batch / streaming job for warm pool?
def queryLoop(commandContext: CommandContext): Unit = {
Why do we need the concept of a query loop for warm pool?
Warmpool requires multiple iterations as well before running the actual query.
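As a rough illustration of why a loop is needed: the job keeps polling the client for the next statement and only exits when the client signals there is nothing left to run. All names below (`Statement`, `nextStatement`, `execute`) are illustrative stand-ins, not the PR's actual API, which takes a `CommandContext`:

```scala
// Illustrative polling loop, assuming a client that returns None when the
// job should terminate. Statement and both function parameters are made up
// for this sketch.
final case class Statement(query: String)

def queryLoop(nextStatement: () => Option[Statement])(execute: Statement => Unit): Unit = {
  var running = true
  while (running) {
    nextStatement() match {
      case Some(stmt) => execute(stmt)   // run the fetched query
      case None       => running = false // client signals termination
    }
  }
}
```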
Force-pushed e195862 to b19028b
Signed-off-by: Shri Saran Raj N <[email protected]>
Force-pushed b19028b to e1db8de
// osClient needs spark session to be created first to get FlintOptions initialized.
// Otherwise, we will have connection exception from EMR-S to OS.
val osClient = new OSClient(FlintSparkConf().flintOptions())

// QueryResultWriter depends on sessionManager to fetch the sessionContext
val sessionManager = instantiateSessionManager(sparkSession, Some(resultIndex))
Since JobOperator needs to support interactive queries, QueryResultWriter will be used. QueryResultWriterImpl, which handles the writing of query results, depends on sessionManager. That's why sessionManager is being introduced here to satisfy this dependency (for interactive queries).
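A minimal sketch of the dependency being described, with simplified trait shapes (the real SessionManager and QueryResultWriterImpl interfaces differ): the writer asks the session manager for the session context before writing a result.

```scala
// Simplified shapes for illustration only: the result writer needs the
// session manager to resolve session context before it can write.
trait SessionManager {
  def getSessionContext(sessionId: String): Map[String, String]
}

class QueryResultWriterImpl(sessionManager: SessionManager) {
  def write(sessionId: String, result: String): String = {
    val ctx = sessionManager.getSessionContext(sessionId)
    s"[session=${ctx.getOrElse("id", sessionId)}] $result"
  }
}
```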
Signed-off-by: Shri Saran Raj N <[email protected]>
Description
This PR introduces support for FlintJob to handle all types of queries — interactive, streaming, and batch — with all data sources in warmpool mode. Additionally, FlintJob will also support non-warmpool mode for streaming and batch queries, configurable via a Spark configuration setting.
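A sketch of what such a configuration gate could look like. The property name `spark.flint.job.warmpoolEnabled` is a placeholder, since the PR's actual configuration key is not shown here, and a plain `Map` stands in for the Spark conf so the sketch runs on its own:

```scala
// Hypothetical boolean conf gating warmpool mode. The key name is a
// placeholder, and Map[String, String] stands in for SparkConf.
def isWarmpoolEnabled(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.flint.job.warmpoolEnabled", "false").toBoolean
```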
FlintJob invokes Warmpool.scala, which in turn calls the client to continuously fetch queries for execution. The client sets various Spark configurations, such as the datasource, resultIndex, and other parameters. It also controls when to terminate the loop and stop the job. When a valid query is received, the JobOperator flow is triggered to execute the query and write the results accordingly.

Changes:
- Calls getNextStatement() in a loop.
- Updated JobOperator to write the query results either to QueryResultWriter or an OpenSearch Index, depending on the job type.
- … JobOperator.
.Related Issues
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.