-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KYUUBI #5328] Batch supports priority scheduling #5352
Conversation
@@ -26,6 +26,7 @@ object KyuubiReservedKeys { | |||
final val KYUUBI_SESSION_USER_SIGN = "kyuubi.session.user.sign" | |||
final val KYUUBI_SESSION_REAL_USER_KEY = "kyuubi.session.real.user" | |||
final val KYUUBI_SESSION_CONNECTION_URL_KEY = "kyuubi.session.connection.url" | |||
final val KYUUBI_SESSION_PRIORITY = "kyuubi.session.priority" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kyuubi.batch.priority
Codecov Report
@@ Coverage Diff @@
## master #5352 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 590 588 -2
Lines 33493 33434 -59
Branches 4424 4392 -32
======================================
+ Misses 33493 33434 -59
... and 9 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
} | ||
} | ||
|
||
override def pickMetadata(kyuubiInstance: String): Option[Metadata] = synchronized { | ||
JdbcUtils.executeQueryWithRowMapper( | ||
s"""SELECT identifier FROM $METADATA_TABLE | ||
|WHERE state=? | ||
|ORDER BY create_time ASC LIMIT 1 | ||
|ORDER BY priority DESC, create_time ASC LIMIT 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we discussed before, there would cause significant performance issues on MySQL 5.7(which is adopted widely at the current stage), we should provide a switch to allow user disable this feature
As this involves a basic algorithm, we'd better add UT at the MetadataManager level to make sure the priority-based FIFO algorithm works as expected. |
Sure, i'll go on with schedule UT. |
kyuubi-server/src/test/scala/org/apache/kyuubi/server/metadata/MetadataManagerSuite.scala
Show resolved
Hide resolved
.doc("Whether to pick up job from metadata store order by priority." + | ||
"If disabled this, set kyuubi.batch.priority wont take effect." + | ||
"We recommend to enabled this when you already have a metadata store, " + | ||
"which can support mix order index(such as Mysql8)." + | ||
"See why recommend this in https://github.com/apache/kyuubi/pull/5329") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text need to polish, again, "MySQL"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall.
Some optional comments to follow.
@@ -78,6 +78,7 @@ case class Metadata( | |||
engineState: String = null, | |||
engineError: Option[String] = None, | |||
endTime: Long = 0L, | |||
priority: Int = 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a magic code here. Default priority is worth to be extracted to explicit static value , and further mentioned in config docs. Or it's confusing for user and developers reviewing the UTs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's at least document it should be consistent with the metadata table DDL definition
val mockBatchJob1 = Metadata( | ||
identifier = "mock_batch_job_1", | ||
sessionType = SessionType.BATCH, | ||
realUser = "kyuubi", | ||
username = "kyuubi", | ||
engineType = EngineType.SPARK_SQL.toString, | ||
state = OperationState.INITIALIZED.toString, | ||
priority = 20, | ||
createTime = time + 10000) | ||
val mockBatchJob2 = Metadata( | ||
identifier = "mock_batch_job_2", | ||
sessionType = SessionType.BATCH, | ||
realUser = "kyuubi", | ||
username = "kyuubi", | ||
engineType = EngineType.SPARK_SQL.toString, | ||
state = OperationState.INITIALIZED.toString, | ||
createTime = time) | ||
val mockBatchJob3 = Metadata( | ||
identifier = "mock_batch_job_3", | ||
sessionType = SessionType.BATCH, | ||
realUser = "kyuubi", | ||
username = "kyuubi", | ||
engineType = EngineType.SPARK_SQL.toString, | ||
state = OperationState.INITIALIZED.toString, | ||
createTime = time + 10000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be extracted to a dedicate method for generating mocked batch job objects?
And make sure same job objects are the same in positive and negative tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Reused and modified newMetadata
method to build mock batch job and keep use same mock object in both positive and negative case.
Could you please update the PR description to match the real change? |
Updated, mentioned new changes. |
.doc("Whether to enable picking up jobs from the metadata store in order of priority." + | ||
"If this feature is disabled, setting kyuubi.batch.priority will not take effect." + | ||
"However, it is recommended to enable this feature if you already have a metadata store " + | ||
"that can support mixed order indexing(such as MySQL 8)." + | ||
"See why recommend this in https://github.com/apache/kyuubi/pull/5329") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested change:
Whether to enable the priority scheduling for batch impl v2. When false, ignore kyuubi.batch.priority
and use the FIFO ordering strategy for batch job scheduling. Note: it's recommended to use MySQL 8.0 as metadata backend when enabling this feature, since it may cause significant performance issues on MySQL 5.7 due to the lack of support for mixed order index. See more details at KYUUBI #5329.
Also, please take care of the space between each line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only a suggestion of configuration docs
kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala
Outdated
Show resolved
Hide resolved
…/jdbc/JDBCMetadataStore.scala
### _Why are the changes needed?_ Follow #5329 and close #5328: 1. Add new config `kyuubi.metadata.store.jdbc.priority.enabled` to control whether enable priority scheduling, due to users may experience performance issues when using MySQL5.7 as metastore backend and enabling kyuubi batch v2 priority feature. 2. When priority scheduling is enabled, `KyuubiBatchService` picks metadata job with `ORDER BY priority DESC, create_time ASC`. 3. Insert metadata with priority field, default priority value is `10`. 4. Add new config `kyuubi.batch.priority` for each batch priority. ### _How was this patch tested?_ - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request ### _Was this patch authored or co-authored using generative AI tooling?_ No Closes #5352 from zwangsheng/KYUUBI#5328. Closes #5328 687ed1e [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/server/metadata/jdbc/JDBCMetadataStore.scala 58621b5 [zwangsheng] fix comments 1bf81e7 [zwangsheng] fix style 7ed2551 [zwangsheng] update default priority desc & improve UT 21ceccb [zwangsheng] fix doc 27fc5e8 [zwangsheng] enrich desc c0bbc0d [zwangsheng] fix style 6b8d0f0 [zwangsheng] fix comment 67eb252 [zwangsheng] fix comment e1705c3 [zwangsheng] Add config to control whether pick order by priority or not 129a467 [zwangsheng] Add unit test for pickBatchForSubmitting fcaf85d [zwangsheng] Fix unit test f7ca221 [zwangsheng] Fix unit test 8d4b276 [wangsheng] fix code style 4c6b990 [wangsheng] fix comments 654ad84 [zwangsheng] [KYUUBI #5328][V2] Kyuubi Server Pick Metadata job with priority Lead-authored-by: zwangsheng <[email protected]> Co-authored-by: wangsheng <[email protected]> Co-authored-by: Cheng Pan <[email protected]> Signed-off-by: Cheng Pan <[email protected]> (cherry picked from commit b24d94e) Signed-off-by: Cheng Pan <[email protected]>
@@ -167,9 +169,10 @@ class JDBCMetadataStore(conf: KyuubiConf) extends MetadataStore with Logging { | |||
|request_args, | |||
|create_time, | |||
|engine_type, | |||
|cluster_manager | |||
|cluster_manager, | |||
|priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the metadata table schema change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review, metadata table schema changed in #5329
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @zwangsheng
Why are the changes needed?
Follow #5329 and close #5328:
kyuubi.metadata.store.jdbc.priority.enabled
to control whether enable priority scheduling, due to users may experience performance issues when using MySQL5.7 as metastore backend and enabling kyuubi batch v2 priority feature.KyuubiBatchService
picks metadata job withORDER BY priority DESC, create_time ASC
.10
.kyuubi.batch.priority
for each batch priority.How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before make a pull request
Was this patch authored or co-authored using generative AI tooling?
No