Errors with --workers via cron #4
Can you do a tcpdump while the cronjob is running? And what is your network setup like?
Is this related to #5 and can be closed, or is it a different issue?
This is a different issue. I'll try to send you the desired info soon.
I had the same issue, with the error:
@ct16k that fix doesn't always work. I have a replication job that succeeds 100% of the time when run by hand with 6 workers, and fails 100% of the time on the first status update (I run with
Running with multiple workers definitely helps overcome network latency, even when your disks don't increase throughput with multiple workers.
To be clear, I mean it works when run in a terminal or even in a disconnected
@theraser I have tried adding debug logging on both the sender and the receiver. The sender ends up failing with (line numbers may differ as I have added logging):
To enable logging on the receiver, I used a custom python: mypython.txt
The worker that fails always has much more logging to stdout (maybe coincidental?) and nothing in stderr, whereas the other workers always have a broken pipe error (presumably because the ssh is killed).
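For anyone who wants to reproduce that kind of receiver-side logging, below is a minimal sketch of such an interpreter wrapper. This is only an illustration of the idea, not the actual mypython.txt; the log path and the `python3` binary name are assumptions. It routes the receiver's stderr into a file and then hands control to the real interpreter, so the stdin/stdout protocol stays untouched:

```python
#!/usr/bin/env python3
# Illustrative stand-in for a "custom python" wrapper (not the actual
# mypython.txt). It appends the receiver's stderr to a log file and then
# execs the real interpreter with the original arguments.
# The log path below is an assumption made for this example.
import os
import sys

log = os.open("/tmp/blocksync-receiver.log",
              os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
os.dup2(log, 2)                                     # route stderr into the log
os.execvp("python3", ["python3"] + sys.argv[1:])    # hand off to the real python
```

Pointing the remote side at a wrapper like this instead of the system python captures whatever the receiver writes to stderr without disturbing the data stream.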
With some further digging, it looks like the sender process is attempting to write after the receiver thinks it is finished:
It looks like the ssh finishes normally, but the sender continues to send.
The error seems to be on the very last block:
It loops over and writes all the other blocks fine, but by the time it reaches the last block the receiver has already closed, presumably after writing the hash.
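To illustrate that failure mode in isolation, here is a standalone POSIX sketch (not blocksync's actual code path, block size and count are made up): the "receiver" stops one block early and closes its end of the pipe, so the "sender" hits a broken pipe on the very last write, matching the errors described above.

```python
# Standalone sketch (POSIX only, not blocksync code): the "receiver" stops one
# block early and closes its end of the pipe, so the "sender" gets a broken
# pipe on the very last block.
import os
import signal
import sys

BLOCK = b"x" * 4096
BLOCKS = 4

signal.signal(signal.SIGPIPE, signal.SIG_IGN)   # surface EPIPE as BrokenPipeError

r, w = os.pipe()
pid = os.fork()
if pid == 0:                                    # child plays the receiver
    os.close(w)
    wanted = (BLOCKS - 1) * len(BLOCK)          # reads one block fewer than sent
    got = 0
    while got < wanted:
        got += len(os.read(r, len(BLOCK)))
    os.close(r)                                 # receiver thinks it is finished
    os._exit(0)

os.close(r)
for _ in range(BLOCKS - 1):                     # these blocks go through fine
    os.write(w, BLOCK)
os.waitpid(pid, 0)                              # receiver has exited by now
try:
    os.write(w, BLOCK)                          # the last block
except BrokenPipeError:
    print("broken pipe on the last block", file=sys.stderr)
```

The point is only the ordering: once the read end is gone, any remaining write fails, which matches "the sender is continuing to send" after the ssh has exited.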
It also fails when sending a file to a localhost path, so it is not tied to SSH. Once a worker has finished, all other workers fail immediately. Here is the output I receive when using 8 workers (when issuing
This happens when starting the process with
Actually, it only takes a stdout redirection to reproduce the issue. This command line has the exact same problem:
It seems like the first fork that finishes its blocks closes its file descriptor, and the other forks end up with broken pipes on their stdin/stdout.
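As a rough illustration of that theory, here is another standalone POSIX sketch with made-up worker logic (not what blocksync actually does): several forked workers share one pipe as their status channel, and whatever is reading that pipe goes away after the first "done" message, so every remaining worker gets a broken pipe.

```python
# Standalone sketch (POSIX only, not blocksync itself): forked workers share a
# single pipe; the consumer closes it after the first worker finishes, so the
# remaining workers all hit broken pipes.
import os
import signal
import sys
import time

signal.signal(signal.SIGPIPE, signal.SIG_IGN)   # inherited by all children

r, w = os.pipe()

if os.fork() == 0:                              # consumer of the shared pipe
    os.close(w)
    with os.fdopen(r) as status:
        status.readline()                       # take the first "done" line...
    os._exit(0)                                 # ...then close the read end

os.close(r)
workers = []
for n in range(4):                              # every worker inherits fd `w`
    pid = os.fork()
    if pid == 0:
        time.sleep(n * 0.2)                     # stagger so worker 0 finishes first
        try:
            os.write(w, f"worker {n} done\n".encode())
        except BrokenPipeError:
            print(f"worker {n}: broken pipe", file=sys.stderr)
        os._exit(0)
    workers.append(pid)

os.close(w)
for pid in workers:
    os.waitpid(pid, 0)
```

Only worker 0 gets through; workers 1-3 report broken pipes, which looks a lot like the "all other workers fail immediately" behaviour described above.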
theraser#4 ssh ${server} "vmsync ${file_remote} ${localhost} ${file_local}";
As this project seems unmaintained, I allow myself to promote one of my own tools, which I wrote to replace blocksync and which mainly solves this issue. Unfortunately I didn't find a way to fix this issue by patching this project, so I needed to start a new one. So if this issue still impacts some people, you can check out deltasync. It also has some performance improvements to maximize IO bandwidth and the CPU usage spent computing checksums, so you should see an improvement in effective bandwidth too.
There is also https://github.com/nethappen/blocksync-fast
If I run a script in crontab using blocksync with --workers, it always fails:
Environment:
/root/bin/blocksync.py '--extraparams=-p 2222' --cipher=aes256-ctr --blocksize=4194304 --workers=2 --script=/root/bin/blocksync.py /dev/loop1 [email protected] /srv/libvirt_images/myserver.root.img
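Given the finding above that a plain stdout redirection already reproduces the failure, it may be worth checking what stdout actually is when the job runs under cron (it is normally a pipe to cron's mail handling, not a terminal). A throwaway probe like the one below, run from the same crontab entry, shows this; it is an illustration only and not part of blocksync:

```python
#!/usr/bin/env python3
# Throwaway probe (not part of blocksync): report what kind of stdout the
# cron job actually receives, since a non-terminal stdout appears to be
# enough to trigger the failure described above.
import os
import stat
import sys

mode = os.fstat(sys.stdout.fileno()).st_mode
print("stdout is a tty:         ", sys.stdout.isatty(), file=sys.stderr)
print("stdout is a pipe (FIFO): ", stat.S_ISFIFO(mode), file=sys.stderr)
print("stdout is a regular file:", stat.S_ISREG(mode), file=sys.stderr)
```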