deduped always 0 when deduped?? #30

Open · i00 opened this issue Aug 20, 2020 · 16 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments


i00 commented Aug 20, 2020

When I run it using:

dduper --device /dev/sdb1 --dir /Databases/_/BNE/

files appear to be deduped (used space goes down) ... but ...
the output is always "deduped: 0" (see below)
[screenshot: dduper output showing "deduped: 0"]

Lakshmipathi (Owner) commented

That's strange. Are you using some type of btrfs RAID?
Can you share the output of dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --dry-run?

Lakshmipathi self-assigned this Aug 20, 2020
Lakshmipathi added the bug (Something isn't working) label Aug 20, 2020
Lakshmipathi added the question (Further information is requested) label Aug 24, 2020

i00 commented Aug 28, 2020

Just standard BTRFS (no raid)

The dry-run:
[screenshot: dry-run output]

Also, the size does not appear to go down by the amount that it should (basically most of the file is the same).
I should probably also mention that the command without the dry run takes AGES to run (over 10 mins!).
The run:
[screenshot: dduper run output]

Lakshmipathi (Owner) commented

Thanks. The dry-run seems to indicate we have 1192600 KB (approx. 11.03 GB) of duplicate data, and both files are around 11.05 GB. Is that correct? But I still can't figure out why this shows deduped: 0. Please let me know the following details:

  1. How did you install dduper?
  2. Can you share the output of uname -a and dduper --version?

> Also the size does not appear to go down by the amount that it should

What's the disk usage before and after you ran the dduper command?

> the command without the dry run takes AGES to run (over 10 mins!)

It's because we are using a 32 KB chunk size for an 11 GB file, so it makes a lot of syscalls (roughly 11 GB / 32 KB ≈ 360,000 chunks, versus about 11,000 chunks at 1 MB). Can you try increasing the chunk size to 1 MB?
dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --chunk-size 1024

You can also run the --analyze option to decide which chunk size gives the best disk savings and performance:
dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --analyze
https://github.com/Lakshmipathi/dduper#analyze-with-different-chunk-size

PS: Please use this suggestion with care: if you already have a backup of these files on another system, you may want to check out the --fast-mode option; it is faster than the default mode. https://github.com/Lakshmipathi/dduper#dedupe-files-faster-fast-mode
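For reference, a sketch of what that invocation might look like, assuming --fast-mode is the flag described in the README link above (the file paths are the same placeholders used earlier in this thread):

dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --fast-mode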


i00 commented Aug 28, 2020

dduper is installed in a Docker container from laks/dduper

It's hard to tell how much space was reclaimed, since this server gets used by others, but the volume usage only seemed to drop by ~100 MB.

[screenshot: volume usage]

Lakshmipathi (Owner) commented

Thanks for the details. I didn't update the Docker image with the recent changes. Let me push those changes, re-create the image, and ping here. After that you can pull the latest image and give it a try.

Lakshmipathi (Owner) commented

I was testing the issue without the Docker image; let me test there and try to reproduce it too.

Lakshmipathi removed the question (Further information is requested) label Aug 28, 2020
Lakshmipathi (Owner) commented

This is a bug; I managed to reproduce the issue. Fixed it (hopefully :P) and am currently running some basic tests.
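For anyone who wants to try reproducing it, a minimal sketch on a throwaway loop device (the device name, sizes, and paths here are illustrative assumptions, not the exact setup from this issue):

$ truncate -s 512M /tmp/btrfs.img
$ sudo losetup /dev/loop12 /tmp/btrfs.img
$ sudo mkfs.btrfs /dev/loop12
$ sudo mount /dev/loop12 /mnt
$ sudo dd if=/dev/urandom of=/mnt/a.bak bs=1M count=64
$ sudo cp /mnt/a.bak /mnt/b.bak && sync
$ sudo ./dduper --device /dev/loop12 --files /mnt/a.bak /mnt/b.bak
# before the fix this printed "deduped: 0" even though
# btrfs fi du /mnt/a.bak /mnt/b.bak showed the extents as shared afterwards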

Lakshmipathi pushed a commit that referenced this issue Aug 28, 2020
Until now, we skipped making use of the bytes_deduped status code. This caused a bug
discussed here #30

Signed-off-by: Lakshmipathi <[email protected]>

Lakshmipathi commented Aug 29, 2020

@i00, the changes seem to be working; please pull the latest Docker image and give it a try.


i00 commented Aug 31, 2020

OK, thanks, it appears to be working now. :)

Just wondering: when I run it now, it takes ~399 seconds the first time (2 files), then ~274 seconds when I re-run it.
The --dry-run on these takes ~1 second. Can it not tell that it has already been deduped?


i00 commented Aug 31, 2020

Also...
I ran the dedupe twice (below), then dry-ran it:
[screenshot: two dedupe runs followed by a dry-run]

  1. Why did the matched / unmatched chunks change between one run and the next?
  2. Why did the 2nd run dedupe another ~54MB?

Thanks


i00 commented Aug 31, 2020

Also...
I just copied 2 files (total size: 24,887,927,808 bytes), then deduped:
[screenshot: dedupe output]
Then I checked the space on the drive (268.9 GB free in System Monitor).
Then I deleted both files (246.9 GB free in System Monitor).
The space was reduced by ~22 GB? How is this possible when almost 12 GB of the file was deduped?

Thanks

Lakshmipathi (Owner) commented

> can it not tell that it has already been deduped?

dduper doesn't maintain any state on-disk (like storing details in a db).
I plan to add this check in the coming days.
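Until that lands, a rough manual pre-check is possible (a sketch; it relies on the shared extent flag that filefrag reports, shown in a later comment below):

$ filefrag -e /path/to/backup1.bak | grep -c shared
# a non-zero count suggests the file already has shared (deduped) extents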

> Why did the matched / unmatched chunks change between one run and the next?

Can you try something like this:

run-1

sleep 10 && sync && sleep 10

run-2

and verify the results.
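
Concretely, reusing the placeholder paths from earlier (a sketch; your actual device and file paths will differ):

$ dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --chunk-size 1024   # run-1
$ sleep 10 && sync && sleep 10
$ dduper --device /dev/sdb1 --files /path/to/backup1.bak /path/to/backup2.bak --dry-run           # run-2
# the idea is that the sync flushes pending filesystem updates, so run-2's
# matched/unmatched counts reflect what run-1 actually deduped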

> The space was reduced by ~22 GB? How is this possible when almost 12 GB of the file was deduped?

Sorry, I didn't understand this issue. Could you please provide pseudo-code for it?
Thanks

Lakshmipathi (Owner) commented

filefrag -e /path/to/file should display which extents are shared and which are not.

#### 2mb file with extents 

$ filefrag -e /mnt/fn_aa_1_2mb 
Filesystem type is: 9123683e
File size of /mnt/fn_aa_1_2mb is 2097152 (512 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     511:      35888..     36399:    512:             last,eof
/mnt/fn_aa_1_2mb: 1 extent found

#### perform dedupe on 2mb file

$  sudo ./dduper --device /dev/loop12 --files /mnt/fn_a_1_1mb /mnt/fn_aa_1_2mb --chunk-size 1024


#### now filefrag shows it has 2 extents and both of them are `shared` 

$ filefrag -e /mnt/fn_aa_1_2mb 
Filesystem type is: 9123683e
File size of /mnt/fn_aa_1_2mb is 2097152 (512 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     255:       4896..      5151:    256:             shared
   1:      256..     511:       4896..      5151:    256:       5152: last,shared,eof
/mnt/fn_aa_1_2mb: 2 extents found

Lakshmipathi (Owner) commented

Another simple option:

btrfs fi du /path/to/deduped_files

The output below shows 30 MB of deduped/shared content in the file:

btrfs fi du /mnt/fn_xbyczd_10_60mb 
     Total   Exclusive  Set shared  Filename
  60.00MiB    30.00MiB    30.00MiB  /mnt/fn_xbyczd_10_60mb
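
If you want a whole-directory view instead of per-file numbers, btrfs fi du also accepts -s/--summarize; a sketch using the mount point from the example above:

$ btrfs fi du -s /mnt
# prints a single Total / Exclusive / Set shared line for the whole subtree,
# which makes before/after comparisons around a dedupe run easier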


i00 commented Sep 14, 2020

Hey, sorry, the project has been on hold for a while; I hope to investigate this further next week.

Lakshmipathi (Owner) commented

Sure, no problem.

Lakshmipathi added the question (Further information is requested) label Sep 15, 2020