Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RESTful API supports killing engine forcibly #6008

Closed
wants to merge 32 commits into from

Conversation

zhaohehuhu
Copy link
Contributor

@zhaohehuhu zhaohehuhu commented Jan 23, 2024

🔍 Description

Issue References 🔗

Describe Your Solution 🔧

I'd like to introduce the feature that allows users to forcibly kill an engine through API.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklist 📝

Be nice. Be informative.

@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 36 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (23a6ee1) to head (00c208a).
Report is 7 commits behind head on master.

Files Patch % Lines
...rg/apache/kyuubi/server/api/v1/AdminResource.scala 0.00% 15 Missing ⚠️
...rg/apache/kyuubi/engine/ApplicationOperation.scala 0.00% 12 Missing ⚠️
...in/java/org/apache/kyuubi/client/AdminRestApi.java 0.00% 4 Missing ⚠️
...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala 0.00% 3 Missing ⚠️
...ubi/engine/spark/SparkTBinaryFrontendService.scala 0.00% 1 Missing ⚠️
...pache/kyuubi/service/AbstractFrontendService.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #6008    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files         675     677     +2     
  Lines       41631   41775   +144     
  Branches     5687    5707    +20     
=======================================
- Misses      41631   41775   +144     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhaohehuhu
Copy link
Contributor Author

Done. Thanks

@github-actions github-actions bot added the kind:documentation Documentation is a feature! label Jun 25, 2024
@zhaohehuhu
Copy link
Contributor Author

Please don't forget to update the RESTful API docs.

Done.

@zhaohehuhu
Copy link
Contributor Author

the failed UTs are related, because this PR appends a new configuration to the engine launch command, they should be easy to debug.

OK. I'm looking into it

| subdomain | the engine subdomain | String(optional) |
| proxyUser | the proxy user to impersonate | String(optional) |
| hive.server2.proxy.user | the proxy user to impersonate | String(optional) |
| kill | whether to kill the engine forcibly | Boolean(optional) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's mention the default value is false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

StandardCharsets.UTF_8)
mapper.readValue(json, classOf[ApplicationManagerInfo])
} catch {
case _: Throwable => ApplicationManagerInfo(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log the original encodedStr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks.

.createWithDefault(Seq("spark.driver.memory", "spark.executor.memory"))
.createWithDefault(Seq(
"spark.driver.memory",
"spark.executor.memory"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore

Copy link
Contributor Author

@zhaohehuhu zhaohehuhu Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format issue. Restored.


def deserialize(encodedStr: String): ApplicationManagerInfo = {
try {
info(s"The original string encoded:$encodedStr")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to catch block. we only need to print the base64 string when something goes wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done

@pan3793
Copy link
Member

pan3793 commented Jun 26, 2024

cc @turboFei, this patch now adopts your suggestions, please take a look when you have time.

@turboFei
Copy link
Member

would you update the AdminRestApi and CommandLine?

@pan3793 pan3793 closed this in ab273c8 Jul 1, 2024
@pan3793
Copy link
Member

pan3793 commented Jul 1, 2024

Thanks, merged to master

@zhaohehuhu zhaohehuhu deleted the dev-0123 branch July 22, 2024 07:48
val engineNodes = discoveryClient.getServiceNodesInfo(engineSpace, silent = true)
engineNodes.foreach { engineNode =>
val nodePath = s"$engineSpace/${engineNode.nodeName}"
val engineRefId = engineNode.engineRefId.orNull
info(s"Deleting engine node:$nodePath")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late comment.
the engineSpace is not cleaned right? Why don't we delete it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except for CONNECTION level engine, we cannot guarantee that there are no other engines access the same engineSpace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean there can be multiple engines under same subdomain?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Even in GROUP level, we can only create one engine per subdomain, right. And subdomain has been included in engineSpace right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some corner cases. for example, if the engine has no response due to full GC or overload, kyuubi will create a new one on the same engine space. I also see sometimes the dist lock does not work properly(which should not, may be bugs), that also causes multi engines live in the same subdomain. finally, in theory, there are race conditions for kyuubi server deleting and engine registering under the same subdomain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants