fix: vacuum more runs needed error #703
When there are too many files to process during a vacuum, the dbt model fails with this error: "ICEBERG_VACUUM_MORE_RUNS_NEEDED: Removed 20000 files in this round of vacuum, but there are more files remaining. Please run another VACUUM command to process the remaining files." We therefore apply the same logic as we did for the optimize: run the command again until no more runs are needed. This PR also attempts to factor the two code paths together, since they share the same logic.
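The retry logic described above can be sketched roughly as follows. This is a minimal illustration and not the adapter's actual code: `QueryError`, `run_until_done`, the `execute` callable, and the `max_runs` cap are all hypothetical names, and the optimize marker string is assumed to be the analogue of the vacuum one quoted in the description.

```python
from typing import Callable

# Error markers that mean "run the same statement again".
# The vacuum marker is quoted in the PR description; the optimize one is an
# assumption about the analogous error handled by the existing optimize logic.
MORE_RUNS_MARKERS = (
    "ICEBERG_VACUUM_MORE_RUNS_NEEDED",
    "ICEBERG_OPTIMIZE_MORE_RUNS_NEEDED",
)


class QueryError(Exception):
    """Hypothetical stand-in for the exception raised by the query runner."""


def run_until_done(execute: Callable[[str], None], sql: str, max_runs: int = 10) -> int:
    """Re-run `sql` while the engine reports that more runs are needed.

    Returns the number of runs it took. Any other error is re-raised as-is.
    """
    for attempt in range(1, max_runs + 1):
        try:
            execute(sql)
            return attempt
        except QueryError as exc:
            if not any(marker in str(exc) for marker in MORE_RUNS_MARKERS):
                raise  # unrelated failure: do not retry
    raise RuntimeError(f"{sql!r} still needs more runs after {max_runs} attempts")
```

Because the same function handles both markers, a single helper can serve `VACUUM` and `OPTIMIZE` statements, which is the kind of shared code path the description alludes to.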
@Jrmyy do we have a way to test this in the CI? I can imagine setting up an Iceberg table with vacuum_max_snapshot_age_seconds set to 1 second, then inserting into the same table many times so that Iceberg has many snapshots to expire, and finally running the vacuum on that table with its many commits. PS: the code looks good. I re-triggered the CI, which failed randomly due to a functional test where we run concurrent Iceberg inserts.
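As a rough sketch of the test setup suggested in this comment, the statements could be generated like this. The helper name, table name, column, and S3 location are hypothetical, and whether this produces enough files to actually trigger the error is not established here.

```python
from typing import List


def build_vacuum_test_statements(table: str, location: str, n_inserts: int) -> List[str]:
    """Build the SQL for the suggested scenario: short snapshot age, many commits, then VACUUM.

    `table` and `location` are placeholders supplied by the caller.
    """
    create = (
        f"CREATE TABLE {table} (id bigint) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ("
        "'table_type'='ICEBERG', "
        "'vacuum_max_snapshot_age_seconds'='1')"
    )
    # One INSERT per commit, so each statement creates a new snapshot.
    inserts = [f"INSERT INTO {table} VALUES ({i})" for i in range(n_inserts)]
    return [create, *inserts, f"VACUUM {table}"]
```

For example, `build_vacuum_test_statements("my_schema.vacuum_test", "s3://my-bucket/vacuum_test/", 100)` yields one `CREATE TABLE`, 100 single-row inserts, and a final `VACUUM`.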
I can try what you suggest! Since the VACUUM fails when there are more than 20000 files to remove, it means we will have to insert a lot of rows ahah 🙈 I will give it a try and let you know if it works 🔥
@Jrmyy I totally understand that reproducing a vacuum failure can be cumbersome. If we don't manage to reproduce it, we'll leave it like that.
I tried to create some SQL queries that maximise entropy in order to generate as many files as possible, to be sure the vacuum would need to run several times. But it did not scale very well.
Thanks @Jrmyy, let's leave it as it is.
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | patch | `==1.8.3` -> `==1.8.4` |

---

### Release Notes

<details>
<summary>dbt-athena/dbt-athena (dbt-athena-community)</summary>

### [`v1.8.4`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.4)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.3...v1.8.4)

#### What's Changed

##### Fixes

- fix: Remove catalog from the DDL SQL generated by on_schema_change=sync_all_columns by [@iconara](https://togithub.com/iconara) in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)
- fix: Query comment for create table statement by [@sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/702](https://togithub.com/dbt-athena/dbt-athena/pull/702)
- fix: remove leading whitespaces on post-hook operations by [@sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/705](https://togithub.com/dbt-athena/dbt-athena/pull/705)
- fix: vacuum more runs needed error by [@Jrmyy](https://togithub.com/Jrmyy) in [https://github.com/dbt-athena/dbt-athena/pull/703](https://togithub.com/dbt-athena/dbt-athena/pull/703)

##### Dependencies

- chore: Update dbt-tests-adapter requirement from ~=1.9.1 to ~=1.9.2 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/687](https://togithub.com/dbt-athena/dbt-athena/pull/687)
- chore: Update pytest requirement from ~=8.2 to ~=8.3 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/690](https://togithub.com/dbt-athena/dbt-athena/pull/690)
- chore: Update pyupgrade requirement from ~=3.16 to ~=3.17 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/692](https://togithub.com/dbt-athena/dbt-athena/pull/692)
- chore: Update tenacity requirement from ~=8.2 to >=8.2,<10.0 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/693](https://togithub.com/dbt-athena/dbt-athena/pull/693)
- chore: Update black requirement from ~=24.4 to ~=24.8 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/694](https://togithub.com/dbt-athena/dbt-athena/pull/694)
- chore: Update boto3-stubs\[s3] requirement from ~=1.34 to ~=1.35 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/707](https://togithub.com/dbt-athena/dbt-athena/pull/707)
- chore: Update moto requirement from ~=5.0.12 to ~=5.0.13 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/708](https://togithub.com/dbt-athena/dbt-athena/pull/708)
- chore: Update pyparsing requirement from ~=3.1.2 to ~=3.1.4 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/709](https://togithub.com/dbt-athena/dbt-athena/pull/709)

#### New Contributors

- [@iconara](https://togithub.com/iconara) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)

**Full Changelog**: dbt-labs/dbt-athena@v1.8.3...v1.8.4

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).