feat: add table function fuse_vacuum2() #16049

Open
wants to merge 148 commits into
base: main

@SkyFan2002 SkyFan2002 commented Jul 15, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR introduces a new table function fuse_vacuum2() aimed at enhancing performance.

Object Key of Snapshot v5, Segment, and Block

  • UUID Version: Use UUID v7
    • Lexicographically ordered
    • Timestamp can be extracted from the object key
  • Prefix: Keys are prefixed with the character 'g', so every v5 key sorts after every v4 key in lexicographical order

For segments and blocks newly created by an operation based on snapshot s, the timestamp embedded in their object keys equals s.timestamp. If there is no base snapshot, the embedded timestamp is 0. For snapshots, the timestamp embedded in the object key equals the snapshot's own timestamp.
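The key layout described above can be sketched with the standard library only. This is an illustration, not the PR's implementation: the helper names (`uuid_v7_bits`, `object_key`, `timestamp_from_key`), the millisecond unit, the 32-digit hex rendering, and the caller-supplied random bits are all assumptions. It relies only on the UUID v7 property that the top 48 bits hold a Unix timestamp in milliseconds.

```rust
/// Prefix for v5 object keys, as stated in the PR description.
const V5_PREFIX: char = 'g';

/// Build a UUID v7 style 128-bit value (hypothetical helper):
/// 48-bit Unix timestamp in ms | 4-bit version (7) | 12 random bits |
/// 2-bit variant (0b10) | 62 random bits.
/// Random bits are passed in so the sketch stays deterministic.
fn uuid_v7_bits(ts_millis: u64, rand_a: u16, rand_b: u64) -> u128 {
    let ts = ((ts_millis as u128) & 0xFFFF_FFFF_FFFF) << 80;
    let version = 0x7u128 << 76;
    let a = ((rand_a as u128) & 0x0FFF) << 64;
    let variant = 0b10u128 << 62;
    let b = (rand_b as u128) & 0x3FFF_FFFF_FFFF_FFFF;
    ts | version | a | variant | b
}

/// Render the key as 'g' followed by 32 lowercase hex digits (assumed format).
fn object_key(ts_millis: u64, rand_a: u16, rand_b: u64) -> String {
    format!("{}{:032x}", V5_PREFIX, uuid_v7_bits(ts_millis, rand_a, rand_b))
}

/// Recover the embedded timestamp: the first 12 hex digits after the prefix.
fn timestamp_from_key(key: &str) -> Option<u64> {
    let hex = key.strip_prefix(V5_PREFIX)?;
    let ts_hex = hex.get(..12)?;
    u64::from_str_radix(ts_hex, 16).ok()
}
```

Because the timestamp occupies the leading hex digits, comparing two such keys as strings orders them by timestamp first, and the 'g' prefix places them after any key made only of hex digits.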

New Field lvt in Snapshot v5

For every v5 snapshot s, the candidate value of s.lvt (lvt_candidate) is calculated as s.timestamp - settings.get_retention_period(). s.lvt may be set to a value larger than lvt_candidate when the retention period has been changed or when clocks are skewed. For example, if lvt_candidate <= s.prev.lvt, then s.lvt could be set to s.prev.lvt + 1. s.lvt should respect the retention period setting in effect at the time the snapshot is created. s.lvt must be strictly greater than the lvt of the previous snapshot; decreasing it or leaving it equal is not allowed.
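A minimal sketch of the lvt rule described above, assuming timestamps and the retention period are plain millisecond integers and that the previous snapshot may have no lvt at all. The function name `next_lvt` is hypothetical, and the `prev + 1` bump is only the example adjustment mentioned in the description, not necessarily what the PR implements.

```rust
/// Compute the lvt for a new snapshot (hypothetical helper).
/// `prev_lvt` is `None` when the previous snapshot carries no lvt.
/// Note: this sketch does not verify `lvt <= snapshot_ts`; under heavy
/// clock skew the bumped value could exceed the snapshot's timestamp.
fn next_lvt(snapshot_ts: i64, retention_millis: i64, prev_lvt: Option<i64>) -> i64 {
    let candidate = snapshot_ts - retention_millis;
    match prev_lvt {
        // The candidate would not advance past the previous lvt: bump it.
        Some(prev) if candidate <= prev => prev + 1,
        // Otherwise the candidate already satisfies lvt > prev.lvt.
        _ => candidate,
    }
}
```

For instance, with a timestamp of 1000 and a retention period of 100, the candidate is 900; if the previous lvt is already 900, the result is bumped to 901 so that the lvt stays strictly increasing.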

Properties

  • s.lvt <= s.timestamp
  • s.lvt > s.prev.lvt
  • If s is based on another snapshot s': s.lvt <= s'.timestamp (enforced during commitment: transactions with stale base snapshots are not allowed).

Notes

  • Some(_) > None
  • s.prev may not equal s'
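The properties and notes above can be expressed as a small check. This is a sketch under assumptions: `SnapshotMeta` and `lvt_is_valid` are hypothetical names, timestamps are integers, and the note "Some(_) > None" is read as applying to an optional lvt (a previous snapshot without an lvt compares as smaller). Rust's built-in ordering on `Option` already behaves this way.

```rust
/// Only the fields needed for the check (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct SnapshotMeta {
    timestamp: i64,
    lvt: Option<i64>,
}

/// `prev` is the previous snapshot, `base` is the snapshot the operation
/// was based on; as the notes say, the two may differ.
fn lvt_is_valid(
    s: &SnapshotMeta,
    prev: Option<&SnapshotMeta>,
    base: Option<&SnapshotMeta>,
) -> bool {
    // Assumption: a v5 snapshot always carries an lvt.
    let lvt = match s.lvt {
        Some(v) => v,
        None => return false,
    };
    // s.lvt <= s.timestamp
    let within_own_ts = lvt <= s.timestamp;
    // s.lvt > s.prev.lvt, where Some(_) > None via Option's ordering
    let after_prev = prev.map_or(true, |p| s.lvt > p.lvt);
    // s.lvt <= s'.timestamp: rejects commits built on a stale base
    let base_not_stale = base.map_or(true, |b| lvt <= b.timestamp);
    within_own_ts && after_prev && base_not_stale
}
```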

Steps of Vacuum2

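fn vacuum2() {
  // Set the table's lvt and list the snapshots whose keys fall before it.
  let lvt = set_lvt();
  let snapshots_before_lvt = list_until(snapshot_dir, lvt);

  // Pick the GC root; every snapshot listed before it can be removed.
  let (gc_root_idx, gc_root) = select_gc_root(&snapshots_before_lvt);
  let snapshots_to_gc = &snapshots_before_lvt[..gc_root_idx];

  // Segments and blocks are listed up to the GC root's lvt;
  // anything the GC root still references is kept.
  let lvt = gc_root.lvt;
  let segments_before_gc_root = list_until(segment_dir, lvt);
  let segments_to_gc = segments_before_gc_root - gc_root.segments;

  let blocks_before_gc_root = list_until(block_dir, lvt);
  let blocks_to_gc = blocks_before_gc_root - gc_root.blocks;

  remove(snapshots_to_gc, segments_to_gc, blocks_to_gc);
}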
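The `list_until` and set-difference steps in the pseudocode above can be sketched over an in-memory ordered set. Everything here is an assumption made for illustration: keys are modelled as (embedded timestamp, name) pairs rather than real object keys, `list_until` is taken to be exclusive of its bound, and `key` and `keys_to_gc` are hypothetical helpers. The point is only that ordered keys let the listing stop at the first key at or past the bound.

```rust
use std::collections::BTreeSet;

/// (timestamp embedded in the object key, rest of the key); assumed model.
type Key = (u64, String);

/// Convenience constructor for the sketch.
fn key(ts: u64, name: &str) -> Key {
    (ts, name.to_string())
}

/// Return all keys whose embedded timestamp is strictly below `until_ts`.
/// The set iterates in ascending order, so the scan stops at the first
/// key that reaches the bound.
fn list_until(dir: &BTreeSet<Key>, until_ts: u64) -> BTreeSet<Key> {
    dir.iter()
        .take_while(|k| k.0 < until_ts)
        .cloned()
        .collect()
}

/// Keys listed before the bound, minus the ones the GC root still references.
fn keys_to_gc(dir: &BTreeSet<Key>, until_ts: u64, referenced: &BTreeSet<Key>) -> BTreeSet<Key> {
    let candidates = list_until(dir, until_ts);
    candidates.difference(referenced).cloned().collect()
}
```

With keys at timestamps 1, 2, 5, and 9, a bound of 5, and the key at timestamp 2 still referenced, only the key at timestamp 1 is selected for removal.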

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • Long run

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jul 15, 2024
# Conflicts:
#	src/query/service/src/pipelines/builders/builder_commit.rs
#	src/query/service/src/pipelines/builders/builder_recluster.rs
#	src/query/service/tests/it/storages/fuse/conflict.rs
#	src/query/storages/fuse/src/operations/common/generators/mutation_generator.rs
@SkyFan2002 SkyFan2002 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Oct 24, 2024

Docker Image for PR

  • tag: pr-16049-98872f7-1729759362

note: this image tag is only available for internal use,
please check the internal doc for more details.

@SkyFan2002 SkyFan2002 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Oct 28, 2024

Docker Image for PR

  • tag: pr-16049-a6ea003-1730124329

note: this image tag is only available for internal use,
please check the internal doc for more details.

# Conflicts:
#	src/query/catalog/src/catalog/interface.rs
#	src/query/ee/Cargo.toml
#	src/query/service/src/catalogs/default/database_catalog.rs
#	src/query/service/src/catalogs/default/mutable_catalog.rs
#	src/query/service/src/catalogs/default/session_catalog.rs
#	src/query/service/tests/it/storages/fuse/operations/commit.rs
#	src/query/storages/fuse/src/fuse_table.rs
#	src/query/storages/fuse/src/io/locations.rs
#	tests/sqllogictests/suites/base/06_show/06_0014_show_table_functions.test
# Conflicts:
#	src/query/catalog/src/table_context.rs
#	src/query/service/src/pipelines/builders/builder_column_mutation.rs
#	src/query/service/src/sessions/query_ctx.rs
#	src/query/service/tests/it/sql/exec/get_table_bind_test.rs
#	src/query/settings/src/settings_getter_setter.rs
# Conflicts:
#	src/query/service/src/sessions/query_ctx.rs
#	src/query/service/src/sessions/query_ctx_shared.rs
#	src/query/service/tests/it/storages/fuse/operations/commit.rs
#	src/query/settings/src/settings_getter_setter.rs
#	src/query/storages/fuse/src/fuse_table.rs
#	src/query/storages/memory/src/memory_table.rs
@SkyFan2002 SkyFan2002 marked this pull request as ready for review December 23, 2024 06:21
# Conflicts:
#	src/query/service/src/sessions/query_ctx_shared.rs
#	src/query/service/src/test_kits/fuse.rs
#	src/query/service/tests/it/storages/fuse/meta/snapshot.rs
#	src/query/service/tests/it/storages/fuse/operations/commit.rs
#	src/query/service/tests/it/storages/fuse/operations/mutation/block_compact_mutator.rs
#	src/query/service/tests/it/storages/fuse/operations/mutation/recluster_mutator.rs
#	src/query/storages/common/table_meta/src/meta/v4/snapshot.rs
#	src/query/storages/fuse/src/fuse_table.rs
#	src/query/storages/fuse/src/io/write/meta_writer.rs
#	src/query/storages/fuse/src/operations/common/generators/mutation_generator.rs
#	src/query/storages/fuse/src/operations/common/generators/snapshot_generator.rs
#	src/query/storages/fuse/src/operations/common/generators/truncate_generator.rs
# Conflicts:
#	src/query/service/src/pipelines/builders/builder_column_mutation.rs
#	src/query/service/tests/it/storages/fuse/operations/commit.rs
#	src/query/sql/src/executor/physical_plans/physical_commit_sink.rs
#	src/query/sql/src/executor/physical_plans/physical_recluster.rs
#	src/query/storages/fuse/src/io/locations.rs
#	src/query/storages/fuse/src/operations/append.rs
#	src/query/storages/fuse/src/operations/common/processors/transform_mutation_aggregator.rs
#	tests/sqllogictests/suites/base/06_show/06_0014_show_table_functions.test
Labels
ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase

3 participants