
feat: Add auto table scan scaling based on memory usage #11879

Closed
wants to merge 1 commit

Conversation

xiaoxmeng
Contributor

Summary:
Adds auto table scan scaling support to address query OOMs caused by highly concurrent, memory-intensive
table scan operations. Instead of running all the table scan threads at the start of the query, we start by running
a single table scan thread and gradually schedule more table scan threads when there is sufficient free
memory capacity for the query (measured as the currently used memory versus the query's max capacity). When the
query approaches its max limit, we stop scheduling more table scan threads and even stop currently running scan
threads to free up memory and prevent OOM.

The scale up/down decision happens when a table scan operator finishes processing a non-empty split. A scale
controller is added to each table scan plan node for this coordinated control. Two memory ratios are defined for
the scale up/down decision, and the controller estimates the per-scan-driver memory usage from the memory usage reported when
a table scan thread finishes a non-empty split. In a Meta internal test, a query failed with OOM after 1 min of
execution with 10 leaf threads, and finished in 2 hrs when the leaf thread count was reduced to 5. Java took 1 hour to finish with
persistent shuffle (LBM). With auto scaling, the query finished on Prestissimo in 30 mins. We don't expect this
feature to be enabled by default, since ad hoc small queries need to run all the scan threads in parallel from the start. It will
only be used for some large pipelines, and can be enabled by a query config in Velox and a session property in Prestissimo.
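The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration of the decision logic based solely on this summary, not the actual Velox scale controller API: the class name, method names, and the exact semantics of the two ratios are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the per-plan-node scale controller described
// above. One instance coordinates all scan drivers of a table scan node;
// the real Velox implementation differs in names and details.
class ScanScaleController {
 public:
  ScanScaleController(
      uint64_t queryMaxBytes,
      double scaleUpRatio,
      double scaleDownRatio)
      : queryMaxBytes_(queryMaxBytes),
        scaleUpRatio_(scaleUpRatio),
        scaleDownRatio_(scaleDownRatio) {}

  // Called when a scan thread finishes a non-empty split, reporting the
  // peak memory it used for that split.
  void reportSplitFinished(uint64_t peakDriverBytes) {
    // Estimate per-driver usage as the maximum observed so far.
    if (peakDriverBytes > estimatedDriverBytes_) {
      estimatedDriverBytes_ = peakDriverBytes;
    }
  }

  // Decide how many scan threads may run, given the query's current
  // memory usage, the current thread count, and the configured maximum.
  int decideNumThreads(uint64_t usedBytes, int current, int maxThreads) const {
    const double usedRatio =
        static_cast<double>(usedBytes) / queryMaxBytes_;
    if (usedRatio >= scaleDownRatio_) {
      // Approaching the query limit: stop a running scan thread to free
      // up memory, but always keep at least one.
      return current > 1 ? current - 1 : 1;
    }
    // Scale up only if adding one more driver's estimated usage still
    // keeps the query below the scale-up ratio.
    const uint64_t projected = usedBytes + estimatedDriverBytes_;
    if (current < maxThreads &&
        static_cast<double>(projected) / queryMaxBytes_ < scaleUpRatio_) {
      return current + 1;
    }
    return current;
  }

 private:
  const uint64_t queryMaxBytes_;
  const double scaleUpRatio_;   // below this, adding a thread is allowed
  const double scaleDownRatio_; // at or above this, stop a running thread
  uint64_t estimatedDriverBytes_{0};
};
```

Tracking the maximum observed per-driver usage is a conservative choice: the projected usage is never underestimated, so scale-up errs on the side of fewer threads.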

Differential Revision: D67114511
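Since the feature is opt-in, enabling it for a large pipeline would look roughly like the following. The property name below is an illustrative assumption; this page does not list the actual config key or session property names.

```sql
-- Hypothetical Prestissimo session property mapping to the Velox query
-- config (name is a placeholder, not confirmed by this PR page).
SET SESSION native_table_scan_scaled_processing_enabled = true;
```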

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 16, 2024
netlify bot commented Dec 16, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 0a20c48
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/6763494b1c09cb0008d7cf9d

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D67114511

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Dec 17, 2024
…bator#11879)

@Yuhta Yuhta left a comment
Contributor

There are some format errors that need to be linted before merge.

xiaoxmeng added a commit to xiaoxmeng/velox that referenced this pull request Dec 18, 2024
…bator#11879)

Reviewed By: Yuhta

Differential Revision: D67114511
@facebook-github-bot
Contributor

This pull request has been merged in a82450a.

@xiaoxmeng xiaoxmeng deleted the export-D67114511 branch December 19, 2024 01:32
athmaja-n pushed a commit to athmaja-n/velox that referenced this pull request Jan 10, 2025
…bator#11879)

Summary:
Pull Request resolved: facebookincubator#11879

Reviewed By: Yuhta, oerling

Differential Revision: D67114511

fbshipit-source-id: f45e8be2b76b0383ff6e0ac1b8d8219adbde60ef
Labels: CLA Signed, fb-exported, Merged
3 participants