dedup-tool: add basic crawling
Create crawling threads which crawl objects in the base pool and deduplicate
them based on their deduplication efficiency. The crawler samples objects and
finds duplicated chunks within the sample. An object whose share of duplicated
chunks is higher than object_dedup_threshold is regarded as efficient to
deduplicate. In addition, a chunk that is duplicated more than
chunk_dedup_threshold times is also deduplicated.
This commit contains basic crawling, which crawls all objects in the base pool
instead of sampling among them.

[usage]
  ceph_dedup_tool --op sample-dedup --pool POOL --chunk-pool POOL \
    --fingerprint-algorithm FP --object-dedup-threshold <percentile> \
    --chunk-dedup-threshold <number>
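
[selection logic sketch]
A minimal, self-contained sketch of the selection logic described above; it is
not the tool's implementation. The fixed-size chunker, std::hash as a stand-in
fingerprint, and the hard-coded sample objects are assumptions made only for
illustration; the real crawler reads objects from the base pool and uses the
fingerprint algorithm given by --fingerprint-algorithm.

  // Build (assumption): g++ -std=c++17 -o dedup_sketch dedup_sketch.cc
  #include <cstddef>
  #include <functional>
  #include <iostream>
  #include <map>
  #include <string>
  #include <vector>

  using Fingerprint = std::size_t;

  // Split an object's data into fixed-size chunks and fingerprint each one.
  // std::hash stands in for the real fingerprint algorithm.
  static std::vector<Fingerprint> fingerprint_chunks(const std::string& data,
                                                     std::size_t chunk_size) {
    std::vector<Fingerprint> fps;
    for (std::size_t off = 0; off < data.size(); off += chunk_size) {
      fps.push_back(std::hash<std::string>{}(data.substr(off, chunk_size)));
    }
    return fps;
  }

  int main() {
    // Hypothetical sampled objects (the crawler would read these from the pool).
    std::map<std::string, std::string> sample = {
      {"obj-a", std::string(8192, 'x') + std::string(4096, 'y')},
      {"obj-b", std::string(4096, 'x') + std::string(4096, 'z')},
      {"obj-c", std::string(8192, 'q')},
    };

    const std::size_t chunk_size = 4096;         // assumed fixed chunk size
    const double object_dedup_threshold = 50.0;  // --object-dedup-threshold (percent)
    const unsigned chunk_dedup_threshold = 2;    // --chunk-dedup-threshold (count)

    // Pass 1: count how often each chunk fingerprint appears across the sample.
    std::map<Fingerprint, unsigned> ref_count;
    std::map<std::string, std::vector<Fingerprint>> per_object;
    for (const auto& [oid, data] : sample) {
      per_object[oid] = fingerprint_chunks(data, chunk_size);
      for (Fingerprint fp : per_object[oid]) {
        ++ref_count[fp];
      }
    }

    // Pass 2a: an object whose share of duplicated chunks is higher than
    // object_dedup_threshold is regarded as efficient to deduplicate.
    for (const auto& [oid, fps] : per_object) {
      unsigned duplicated = 0;
      for (Fingerprint fp : fps) {
        if (ref_count[fp] > 1) {
          ++duplicated;
        }
      }
      const double ratio = fps.empty() ? 0.0 : 100.0 * duplicated / fps.size();
      if (ratio > object_dedup_threshold) {
        std::cout << oid << ": " << ratio << "% duplicated chunks -> dedup object\n";
      }
    }

    // Pass 2b: a chunk duplicated more than chunk_dedup_threshold times is
    // deduplicated regardless of which object it belongs to.
    for (const auto& [fp, cnt] : ref_count) {
      if (cnt > chunk_dedup_threshold) {
        std::cout << "chunk " << fp << " seen " << cnt << " times -> dedup chunk\n";
      }
    }
    return 0;
  }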
jyha200 committed Feb 24, 2022
1 parent cee3cae commit c344821
Showing 1 changed file with 746 additions and 41 deletions.