Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exited spark-submit process should not block batch submit queue #6028

Closed
wants to merge 1 commit into from

Conversation

pan3793
Copy link
Member

@pan3793 pan3793 commented Jan 30, 2024

🔍 Description

Issue References 🔗

While enabling batch implementation V2 with the following configurations

kyuubi.batch.impl.version=2
kyuubi.batch.submitter.enabled=true
kyuubi.batch.submitter.threads=48
spark.master=yarn
spark.submit.deployMode=cluster
spark.yarn.submit.waitAppCompletion=false

I found that the batch jobs will be blocked in the DB queue once a YARN queue has no resources, this brings an issue, the subsequential batch jobs that are going to be submitted to another YARN queue also be queued in DB, rather than YARN queue.

mysql> select state, engine_state, count(1) from metadata where state in ('INITIALIZED', 'PENDING', 'RUNNING') group by state, engine_state;
+-------------+--------------+----------+
| state       | engine_state | count(1) |
+-------------+--------------+----------+
| INITIALIZED | NULL         |      166 |
| PENDING     | NULL         |        1 |
| RUNNING     | PENDING      |      148 |
| RUNNING     | RUNNING      |      415 |
+-------------+--------------+----------+

Describe Your Solution 🔧

The submitter queue whose size is controlled by kyuubi.batch.submitter.threads is designed to address the spark-submit process concurrency issue, too many spark-submit processes may run out of the Kyuubi server's node CPU/memory resources and eventually crash the service. For Spark YARN cluster mode, if set spark.yarn.submit.waitAppCompletion=false, the local spark-submit process exits immediately once the Application goes ACCEPTED status, even no resource could be allocated for the AM container, we should not block such case in submitter queue.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Pass GA, and roll out into internal cluster.


Checklist 📝

Be nice. Be informative.

BruceKellan pushed a commit to BruceKellan/kyuubi that referenced this pull request Jan 30, 2024
…r/web-ui

Bumps [axios](https://github.com/axios/axios) from 0.27.2 to 1.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/axios/axios/releases">axios's releases</a>.</em></p>
<blockquote>
<h2>Release v1.6.0</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>) (<a href="https://github.com/axios/axios/commit/96ee232bd3ee4de2e657333d4d2191cd389e14d0">96ee232</a>)</li>
<li><strong>dns:</strong> fixed lookup function decorator to work properly in node v20; (<a href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>) (<a href="https://github.com/axios/axios/commit/5aaff532a6b820bb9ab6a8cd0f77131b47e2adb8">5aaff53</a>)</li>
<li><strong>types:</strong> fix AxiosHeaders types; (<a href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>) (<a href="https://github.com/axios/axios/commit/a1c8ad008b3c13d53e135bbd0862587fb9d3fc09">a1c8ad0</a>)</li>
</ul>
<h3>PRs</h3>
<ul>
<li>CVE 2023 45857 ( <a href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a> )</li>
</ul>
<pre><code>
⚠️ Critical vulnerability fix. See https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459
</code></pre>
<h3>Contributors to this release</h3>
<ul>
<li> <a href="https://github.com/DigitalBrainJS" title="+449/-114 ([apache#6032](axios/axios#6032) [apache#6021](axios/axios#6021) [apache#6011](axios/axios#6011) [apache#5932](axios/axios#5932) [apache#5931](axios/axios#5931) )">Dmitriy Mozgovoy</a></li>
<li> <a href="https://github.com/valentin-panov" title="+4/-4 ([apache#6028](axios/axios#6028) )">Valentin Panov</a></li>
<li> <a href="https://github.com/therealrinku" title="+1/-1 ([apache#5889](axios/axios#5889) )">Rinku Chaudhari</a></li>
</ul>
<h2>Release v1.5.1</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>adapters:</strong> improved adapters loading logic to have clear error messages; (<a href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>) (<a href="https://github.com/axios/axios/commit/e4107797a7a1376f6209fbecfbbce73d3faa7859">e410779</a>)</li>
<li><strong>formdata:</strong> fixed automatic addition of the <code>Content-Type</code> header for FormData in non-browser environments; (<a href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>) (<a href="https://github.com/axios/axios/commit/bc9af51b1886d1b3529617702f2a21a6c0ed5d92">bc9af51</a>)</li>
<li><strong>headers:</strong> allow <code>content-encoding</code> header to handle case-insensitive values (<a href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>) (<a href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>) (<a href="https://github.com/axios/axios/commit/4c89f25196525e90a6e75eda9cb31ae0a2e18acd">4c89f25</a>)</li>
<li><strong>types:</strong> removed duplicated code (<a href="https://github.com/axios/axios/commit/9e6205630e1c9cf863adf141c0edb9e6d8d4b149">9e62056</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li> <a href="https://github.com/DigitalBrainJS" title="+89/-18 ([apache#5919](axios/axios#5919) [apache#5917](axios/axios#5917) )">Dmitriy Mozgovoy</a></li>
<li> <a href="https://github.com/DavidJDallas" title="+11/-5 ()">David Dallas</a></li>
<li> <a href="https://github.com/fb-sean" title="+2/-8 ()">Sean Sattler</a></li>
<li> <a href="https://github.com/0o001" title="+4/-4 ()">Mustafa Ateş Uzun</a></li>
<li> <a href="https://github.com/sfc-gh-pmotacki" title="+2/-1 ([apache#5892](axios/axios#5892) )">Przemyslaw Motacki</a></li>
<li> <a href="https://github.com/Cadienvan" title="+1/-1 ()">Michael Di Prisco</a></li>
</ul>
<h2>Release v1.5.0</h2>
<h2>Release notes:</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>adapter:</strong> make adapter loading error more clear by using platform-specific adapters explicitly (<a href="https://redirect.github.com/axios/axios/issues/5837">#5837</a>) (<a href="https://github.com/axios/axios/commit/9a414bb6c81796a95c6c7fe668637825458e8b6d">9a414bb</a>)</li>
<li><strong>dns:</strong> fixed <code>cacheable-lookup</code> integration; (<a href="https://redirect.github.com/axios/axios/issues/5836">#5836</a>) (<a href="https://github.com/axios/axios/commit/b3e327dcc9277bdce34c7ef57beedf644b00d628">b3e327d</a>)</li>
<li><strong>headers:</strong> added support for setting header names that overlap with class methods; (<a href="https://redirect.github.com/axios/axios/issues/5831">#5831</a>) (<a href="https://github.com/axios/axios/commit/d8b4ca0ea5f2f05efa4edfe1e7684593f9f68273">d8b4ca0</a>)</li>
<li><strong>headers:</strong> fixed common Content-Type header merging; (<a href="https://redirect.github.com/axios/axios/issues/5832">#5832</a>) (<a href="https://github.com/axios/axios/commit/8fda2766b1e6bcb72c3fabc146223083ef13ce17">8fda276</a>)</li>
</ul>
<h3>Features</h3>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/axios/axios/blob/v1.x/CHANGELOG.md">axios's changelog</a>.</em></p>
<blockquote>
<h1><a href="https://github.com/axios/axios/compare/v1.5.1...v1.6.0">1.6.0</a> (2023-10-26)</h1>
<h3>Bug Fixes</h3>
<ul>
<li><strong>CSRF:</strong> fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>) (<a href="https://github.com/axios/axios/commit/96ee232bd3ee4de2e657333d4d2191cd389e14d0">96ee232</a>)</li>
<li><strong>dns:</strong> fixed lookup function decorator to work properly in node v20; (<a href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>) (<a href="https://github.com/axios/axios/commit/5aaff532a6b820bb9ab6a8cd0f77131b47e2adb8">5aaff53</a>)</li>
<li><strong>types:</strong> fix AxiosHeaders types; (<a href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>) (<a href="https://github.com/axios/axios/commit/a1c8ad008b3c13d53e135bbd0862587fb9d3fc09">a1c8ad0</a>)</li>
</ul>
<h3>PRs</h3>
<ul>
<li>CVE 2023 45857 ( <a href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a> )</li>
</ul>
<pre><code>
⚠️ Critical vulnerability fix. See https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459
</code></pre>
<h3>Contributors to this release</h3>
<ul>
<li> <a href="https://github.com/DigitalBrainJS" title="+449/-114 ([apache#6032](axios/axios#6032) [apache#6021](axios/axios#6021) [apache#6011](axios/axios#6011) [apache#5932](axios/axios#5932) [apache#5931](axios/axios#5931) )">Dmitriy Mozgovoy</a></li>
<li> <a href="https://github.com/valentin-panov" title="+4/-4 ([apache#6028](axios/axios#6028) )">Valentin Panov</a></li>
<li> <a href="https://github.com/therealrinku" title="+1/-1 ([apache#5889](axios/axios#5889) )">Rinku Chaudhari</a></li>
</ul>
<h2><a href="https://github.com/axios/axios/compare/v1.5.0...v1.5.1">1.5.1</a> (2023-09-26)</h2>
<h3>Bug Fixes</h3>
<ul>
<li><strong>adapters:</strong> improved adapters loading logic to have clear error messages; (<a href="https://redirect.github.com/axios/axios/issues/5919">#5919</a>) (<a href="https://github.com/axios/axios/commit/e4107797a7a1376f6209fbecfbbce73d3faa7859">e410779</a>)</li>
<li><strong>formdata:</strong> fixed automatic addition of the <code>Content-Type</code> header for FormData in non-browser environments; (<a href="https://redirect.github.com/axios/axios/issues/5917">#5917</a>) (<a href="https://github.com/axios/axios/commit/bc9af51b1886d1b3529617702f2a21a6c0ed5d92">bc9af51</a>)</li>
<li><strong>headers:</strong> allow <code>content-encoding</code> header to handle case-insensitive values (<a href="https://redirect.github.com/axios/axios/issues/5890">#5890</a>) (<a href="https://redirect.github.com/axios/axios/issues/5892">#5892</a>) (<a href="https://github.com/axios/axios/commit/4c89f25196525e90a6e75eda9cb31ae0a2e18acd">4c89f25</a>)</li>
<li><strong>types:</strong> removed duplicated code (<a href="https://github.com/axios/axios/commit/9e6205630e1c9cf863adf141c0edb9e6d8d4b149">9e62056</a>)</li>
</ul>
<h3>Contributors to this release</h3>
<ul>
<li> <a href="https://github.com/DigitalBrainJS" title="+89/-18 ([apache#5919](axios/axios#5919) [apache#5917](axios/axios#5917) )">Dmitriy Mozgovoy</a></li>
<li> <a href="https://github.com/DavidJDallas" title="+11/-5 ()">David Dallas</a></li>
<li> <a href="https://github.com/fb-sean" title="+2/-8 ()">Sean Sattler</a></li>
<li> <a href="https://github.com/0o001" title="+4/-4 ()">Mustafa Ateş Uzun</a></li>
<li> <a href="https://github.com/sfc-gh-pmotacki" title="+2/-1 ([apache#5892](axios/axios#5892) )">Przemyslaw Motacki</a></li>
<li> <a href="https://github.com/Cadienvan" title="+1/-1 ()">Michael Di Prisco</a></li>
</ul>
<h3>PRs</h3>
<ul>
<li>CVE 2023 45857 ( <a href="https://api.github.com/repos/axios/axios/pulls/6028">#6028</a> )</li>
</ul>
<pre><code>
⚠️ Critical vulnerability fix. See https://security.snyk.io/vuln/SNYK-JS-AXIOS-6032459
</code></pre>
<h1><a href="https://github.com/axios/axios/compare/v1.4.0...v1.5.0">1.5.0</a> (2023-08-26)</h1>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/axios/axios/commit/f7adacdbaa569281253c8cfc623ad3f4dc909c60"><code>f7adacd</code></a> chore(release): v1.6.0 (<a href="https://redirect.github.com/axios/axios/issues/6031">#6031</a>)</li>
<li><a href="https://github.com/axios/axios/commit/9917e67cbb6c157382863bad8c741de58e3f3c2b"><code>9917e67</code></a> chore(ci): fix release-it arg; (<a href="https://redirect.github.com/axios/axios/issues/6032">#6032</a>)</li>
<li><a href="https://github.com/axios/axios/commit/96ee232bd3ee4de2e657333d4d2191cd389e14d0"><code>96ee232</code></a> fix(CSRF): fixed CSRF vulnerability CVE-2023-45857 (<a href="https://redirect.github.com/axios/axios/issues/6028">#6028</a>)</li>
<li><a href="https://github.com/axios/axios/commit/7d45ab2e2ad6e59f5475e39afd4b286b1f393fc0"><code>7d45ab2</code></a> chore(tests): fixed tests to pass in node v19 and v20 with <code>keep-alive</code> enabl...</li>
<li><a href="https://github.com/axios/axios/commit/5aaff532a6b820bb9ab6a8cd0f77131b47e2adb8"><code>5aaff53</code></a> fix(dns): fixed lookup function decorator to work properly in node v20; (<a href="https://redirect.github.com/axios/axios/issues/6011">#6011</a>)</li>
<li><a href="https://github.com/axios/axios/commit/a48a63ad823fc20e5a6a705f05f09842ca49f48c"><code>a48a63a</code></a> chore(docs): added AxiosHeaders docs; (<a href="https://redirect.github.com/axios/axios/issues/5932">#5932</a>)</li>
<li><a href="https://github.com/axios/axios/commit/a1c8ad008b3c13d53e135bbd0862587fb9d3fc09"><code>a1c8ad0</code></a> fix(types): fix AxiosHeaders types; (<a href="https://redirect.github.com/axios/axios/issues/5931">#5931</a>)</li>
<li><a href="https://github.com/axios/axios/commit/2ac731d60545ba5c4202c25fd2e732ddd8297d82"><code>2ac731d</code></a> chore(docs): update readme.md (<a href="https://redirect.github.com/axios/axios/issues/5889">#5889</a>)</li>
<li><a href="https://github.com/axios/axios/commit/88fb52b5fad7aabab0532e7ad086c5f1b0178905"><code>88fb52b</code></a> chore(release): v1.5.1 (<a href="https://redirect.github.com/axios/axios/issues/5920">#5920</a>)</li>
<li><a href="https://github.com/axios/axios/commit/e4107797a7a1376f6209fbecfbbce73d3faa7859"><code>e410779</code></a> fix(adapters): improved adapters loading logic to have clear error messages; ...</li>
<li>Additional commits viewable in <a href="https://github.com/axios/axios/compare/v0.27.2...v1.6.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=axios&package-manager=npm_and_yarn&previous-version=0.27.2&new-version=1.6.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by yaooqinn.

[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/apache/kyuubi/network/alerts).

</details>

Closes apache#5671 from dependabot[bot]/dependabot/npm_and_yarn/kyuubi-server/web-ui/axios-1.6.0.

Closes apache#5671

243d0c6 [Cheng Pan] license
2fa3217 [Cheng Pan] nit
c21a873 [Cheng Pan] nit
ae46d5d [Cheng Pan] Update kyuubi-server/web-ui/pnpm-lock.yaml
4adccd2 [dependabot[bot]] Bump axios from 0.27.2 to 1.6.0 in /kyuubi-server/web-ui

Lead-authored-by: Cheng Pan <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Cheng Pan <[email protected]>
@codecov-commenter
Copy link

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (3f993f4) 61.11% compared to head (05fcc75) 61.07%.
Report is 2 commits behind head on master.

Files Patch % Lines
.../org/apache/kyuubi/server/KyuubiBatchService.scala 0.00% 1 Missing and 1 partial ⚠️
...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala 50.00% 1 Missing ⚠️
...g/apache/kyuubi/operation/BatchJobSubmission.scala 0.00% 1 Missing ⚠️
...org/apache/kyuubi/session/KyuubiBatchSession.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6028      +/-   ##
============================================
- Coverage     61.11%   61.07%   -0.04%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37136    37140       +4     
  Branches       5031     5031              
============================================
- Hits          22695    22685      -10     
- Misses        11997    12005       +8     
- Partials       2444     2450       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pan3793 pan3793 self-assigned this Jan 30, 2024
@pan3793 pan3793 added this to the v1.8.1 milestone Jan 30, 2024
@pan3793 pan3793 closed this in 208354c Jan 30, 2024
pan3793 added a commit that referenced this pull request Jan 30, 2024
…mit queue

While enabling batch implementation V2 with the following configurations
```
kyuubi.batch.impl.version=2
kyuubi.batch.submitter.enabled=true
kyuubi.batch.submitter.threads=48
spark.master=yarn
spark.submit.deployMode=cluster
spark.yarn.submit.waitAppCompletion=false
```

I found that the batch jobs will be blocked in the DB queue once a YARN queue has no resources, this brings an issue, the subsequential batch jobs that are going to be submitted to another YARN queue also be queued in DB, rather than YARN queue.

```
mysql> select state, engine_state, count(1) from metadata where state in ('INITIALIZED', 'PENDING', 'RUNNING') group by state, engine_state;
+-------------+--------------+----------+
| state       | engine_state | count(1) |
+-------------+--------------+----------+
| INITIALIZED | NULL         |      166 |
| PENDING     | NULL         |        1 |
| RUNNING     | PENDING      |      148 |
| RUNNING     | RUNNING      |      415 |
+-------------+--------------+----------+
```

The submitter queue whose size is controlled by `kyuubi.batch.submitter.threads` is designed to address the `spark-submit` process concurrency issue, too many `spark-submit` processes may run out of the Kyuubi server's node CPU/memory resources and eventually crash the service. For Spark YARN cluster mode, if set `spark.yarn.submit.waitAppCompletion=false`, the local `spark-submit` process exits immediately once the Application goes ACCEPTED status, even no resource could be allocated for the AM container, we should not block such case in submitter queue.

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

Pass GA, and roll out into internal cluster.

---

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6028 from pan3793/batch-submit.

Closes #6028

05fcc75 [Cheng Pan] Exited spark-submit process should not block batch submit queue

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 208354c)
Signed-off-by: Cheng Pan <[email protected]>
@pan3793
Copy link
Member Author

pan3793 commented Jan 30, 2024

Thanks, merged to master/1.8

zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Feb 5, 2024
…ch submit queue

# 🔍 Description
## Issue References 🔗

While enabling batch implementation V2 with the following configurations
```
kyuubi.batch.impl.version=2
kyuubi.batch.submitter.enabled=true
kyuubi.batch.submitter.threads=48
spark.master=yarn
spark.submit.deployMode=cluster
spark.yarn.submit.waitAppCompletion=false
```

I found that the batch jobs will be blocked in the DB queue once a YARN queue has no resources, this brings an issue, the subsequential batch jobs that are going to be submitted to another YARN queue also be queued in DB, rather than YARN queue.

```
mysql> select state, engine_state, count(1) from metadata where state in ('INITIALIZED', 'PENDING', 'RUNNING') group by state, engine_state;
+-------------+--------------+----------+
| state       | engine_state | count(1) |
+-------------+--------------+----------+
| INITIALIZED | NULL         |      166 |
| PENDING     | NULL         |        1 |
| RUNNING     | PENDING      |      148 |
| RUNNING     | RUNNING      |      415 |
+-------------+--------------+----------+
```

## Describe Your Solution 🔧

The submitter queue whose size is controlled by `kyuubi.batch.submitter.threads` is designed to address the `spark-submit` process concurrency issue, too many `spark-submit` processes may run out of the Kyuubi server's node CPU/memory resources and eventually crash the service. For Spark YARN cluster mode, if set `spark.yarn.submit.waitAppCompletion=false`, the local `spark-submit` process exits immediately once the Application goes ACCEPTED status, even no resource could be allocated for the AM container, we should not block such case in submitter queue.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Pass GA, and roll out into internal cluster.

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6028 from pan3793/batch-submit.

Closes apache#6028

05fcc75 [Cheng Pan] Exited spark-submit process should not block batch submit queue

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
pan3793 added a commit that referenced this pull request Feb 22, 2024
…se notes

# 🔍 Description
## Issue References 🔗

Currently, we use a rather primitive way to manually write release notes from scratch, and some of the mechanical and repetitive work can be simplified by the scripts.

## Describe Your Solution 🔧

Adds a script to simplify the process of creating release notes.

Note: it just simplifies some processes, the release manager still needs to tune the outputs by hand.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

```
RELEASE_TAG=v1.8.1 PREVIOUS_RELEASE_TAG=v1.8.0 build/release/pre_gen_release_notes.py
```

```
$ head build/release/commits-v1.8.1.txt
[KYUUBI #5981] Deploy Spark Hive connector with Scala 2.13 to Maven Central
[KYUUBI #6058] Make Jetty server stop timeout configurable
[KYUUBI #5952][1.8] Disconnect connections without running operations after engine maxlife time graceful period
[KYUUBI #6048] Assign serviceNode and add volatile for variables
[KYUUBI #5991] Error on reading Atlas properties composed of multi values
[KYUUBI #6045] [REST] Sync the AdminRestApi with the AdminResource Apis
[KYUUBI #6047] [CI] Free up disk space
[KYUUBI #6036] JDBC driver conditional sets fetchSize on opening session
[KYUUBI #6028] Exited spark-submit process should not block batch submit queue
[KYUUBI #6018] Speed up GetTables operation for Spark session catalog
```

```
$ head build/release/contributors-v1.8.1.txt
* Shaoyun Chen        -- [KYUUBI #5857][KYUUBI #5720][KYUUBI #5785][KYUUBI #5617]
* Chao Chen           -- [KYUUBI #5750]
* Flyangz             -- [KYUUBI #5832]
* Pengqi Li           -- [KYUUBI #5713]
* Bowen Liang         -- [KYUUBI #5730][KYUUBI #5802][KYUUBI #5767][KYUUBI #5831][KYUUBI #5801][KYUUBI #5754][KYUUBI #5626][KYUUBI #5811][KYUUBI #5853][KYUUBI #5765]
* Paul Lin            -- [KYUUBI #5799][KYUUBI #5814]
* Senmiao Liu         -- [KYUUBI #5969][KYUUBI #5244]
* Xiao Liu            -- [KYUUBI #5962]
* Peiyue Liu          -- [KYUUBI #5331]
* Junjie Ma           -- [KYUUBI #5789]
```
---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6074 from pan3793/release-script.

Closes #6074

3d5ec20 [Cheng Pan] credits
1765279 [Cheng Pan] Add a script to simplify the process of creating release notes

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
…ch submit queue

# 🔍 Description
## Issue References 🔗

While enabling batch implementation V2 with the following configurations
```
kyuubi.batch.impl.version=2
kyuubi.batch.submitter.enabled=true
kyuubi.batch.submitter.threads=48
spark.master=yarn
spark.submit.deployMode=cluster
spark.yarn.submit.waitAppCompletion=false
```

I found that the batch jobs will be blocked in the DB queue once a YARN queue has no resources, this brings an issue, the subsequential batch jobs that are going to be submitted to another YARN queue also be queued in DB, rather than YARN queue.

```
mysql> select state, engine_state, count(1) from metadata where state in ('INITIALIZED', 'PENDING', 'RUNNING') group by state, engine_state;
+-------------+--------------+----------+
| state       | engine_state | count(1) |
+-------------+--------------+----------+
| INITIALIZED | NULL         |      166 |
| PENDING     | NULL         |        1 |
| RUNNING     | PENDING      |      148 |
| RUNNING     | RUNNING      |      415 |
+-------------+--------------+----------+
```

## Describe Your Solution 🔧

The submitter queue whose size is controlled by `kyuubi.batch.submitter.threads` is designed to address the `spark-submit` process concurrency issue, too many `spark-submit` processes may run out of the Kyuubi server's node CPU/memory resources and eventually crash the service. For Spark YARN cluster mode, if set `spark.yarn.submit.waitAppCompletion=false`, the local `spark-submit` process exits immediately once the Application goes ACCEPTED status, even no resource could be allocated for the AM container, we should not block such case in submitter queue.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Pass GA, and roll out into internal cluster.

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6028 from pan3793/batch-submit.

Closes apache#6028

05fcc75 [Cheng Pan] Exited spark-submit process should not block batch submit queue

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
… release notes

# 🔍 Description
## Issue References 🔗

Currently, we use a rather primitive way to manually write release notes from scratch, and some of the mechanical and repetitive work can be simplified by the scripts.

## Describe Your Solution 🔧

Adds a script to simplify the process of creating release notes.

Note: it just simplifies some processes, the release manager still needs to tune the outputs by hand.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

```
RELEASE_TAG=v1.8.1 PREVIOUS_RELEASE_TAG=v1.8.0 build/release/pre_gen_release_notes.py
```

```
$ head build/release/commits-v1.8.1.txt
[KYUUBI apache#5981] Deploy Spark Hive connector with Scala 2.13 to Maven Central
[KYUUBI apache#6058] Make Jetty server stop timeout configurable
[KYUUBI apache#5952][1.8] Disconnect connections without running operations after engine maxlife time graceful period
[KYUUBI apache#6048] Assign serviceNode and add volatile for variables
[KYUUBI apache#5991] Error on reading Atlas properties composed of multi values
[KYUUBI apache#6045] [REST] Sync the AdminRestApi with the AdminResource Apis
[KYUUBI apache#6047] [CI] Free up disk space
[KYUUBI apache#6036] JDBC driver conditional sets fetchSize on opening session
[KYUUBI apache#6028] Exited spark-submit process should not block batch submit queue
[KYUUBI apache#6018] Speed up GetTables operation for Spark session catalog
```

```
$ head build/release/contributors-v1.8.1.txt
* Shaoyun Chen        -- [KYUUBI apache#5857][KYUUBI apache#5720][KYUUBI apache#5785][KYUUBI apache#5617]
* Chao Chen           -- [KYUUBI apache#5750]
* Flyangz             -- [KYUUBI apache#5832]
* Pengqi Li           -- [KYUUBI apache#5713]
* Bowen Liang         -- [KYUUBI apache#5730][KYUUBI apache#5802][KYUUBI apache#5767][KYUUBI apache#5831][KYUUBI apache#5801][KYUUBI apache#5754][KYUUBI apache#5626][KYUUBI apache#5811][KYUUBI apache#5853][KYUUBI apache#5765]
* Paul Lin            -- [KYUUBI apache#5799][KYUUBI apache#5814]
* Senmiao Liu         -- [KYUUBI apache#5969][KYUUBI apache#5244]
* Xiao Liu            -- [KYUUBI apache#5962]
* Peiyue Liu          -- [KYUUBI apache#5331]
* Junjie Ma           -- [KYUUBI apache#5789]
```
---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6074 from pan3793/release-script.

Closes apache#6074

3d5ec20 [Cheng Pan] credits
1765279 [Cheng Pan] Add a script to simplify the process of creating release notes

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants