zfsbackup seems to hang on Solaris with B2 #17
Comments
(As to whether I simply wasn't patient enough: I left it running overnight, which should be plenty of time to compress and encrypt 1GB even on a lowly Celeron.) Edit: this is irrelevant, the file backend was quick. |
setting B2_LOG_LEVEL=10, I can see an API call for Compared to a successful upload (with a smaller pool), the difference is that |
Thanks for reporting this - I assume you compiled your own version from tip? Can you try compiling from the last release version and let me know if the problem persists? I'm curious as to whether the issue is with the latest change I made to the b2 backend, the b2 backend library code I'm using itself, or Go on Solaris. Can you also provide more details on your system (version, architecture, etc)? |
I can try, although the only material changes since v0.3-rc1 seem to be related to the Azure backend (which I'm not using, but I did need to make some changes to the SDK to get it to compile on Solaris, hence my reluctance to rebase) and to SHA1 support. I'm just running a vanilla Oracle Solaris 11.3 system. Small pools back up and restore to B2 successfully, and large pools back up OK with the file backend. |
|
Just recompiled with go1.11.5, no change. |
I added some debugging log statements to If I change |
Perhaps I need to refile this against blazer, unless it's a usage issue. |
PS. zfsbackup prints the command line usage when there is an error, which sometimes obscures the error message. It would be good to fix this. |
Here's the patch I'm using which I think works but I don't know enough about B2 nor Go to say confidently (and obviously, for something as important as backups I need to do some verification).
diff --git a/b2/writer.go b/b2/writer.go
index 209cfce..bedf626 100644
--- a/b2/writer.go
+++ b/b2/writer.go
@@ -311,7 +311,12 @@ func (w *Writer) getLargeFile() (beLargeFileInterface, error) {
     cur := &Cursor{name: w.name}
     objs, _, err := w.o.b.ListObjects(w.ctx, 1, cur)
     if err != nil {
-        return nil, err
+        if err == io.EOF {
+            w.Resume = false
+            return w.getLargeFile()
+        } else {
+            return nil, err
+        }
     }
     if len(objs) < 1 || objs[0].name != w.name {
         w.Resume = false
 |
Thanks for all the debugging! Looks like a fix may have been made to blazer. Can you please provide details as to what SHA1 change you mentioned you made? |
I didn't make a change to SHA1, I was just listing the things that appeared to have changed in master since v0.3-rc1 (sorry, I worded it very confusingly). Anyway, with the upgraded blazer, zfsbackup appears to work as long as I set w.Resume = false in the B2 backend. (I can confirm successfully restoring from a backup.) |
The change to set The intended goal was:
I will admit I did not release a new version after this change since I hadn't gotten around to testing the change just yet (funny how "I'll get to this next week" turns into 6+ months...). From the code, it reads as if we didn't list any files, then it would start a normal upload (e.g. set I think the path forward here will be to make the following adjustments:
The other backends work differently in ways:
I'll hopefully have some fixes in for you to try soon! |
I'm running zfsbackup on Solaris 11. It seems to work fine for a small pool of a few MB, but appears to hang when I try on a real pool (still small, around 1GB).
I'm using the B2 backend. The file backend works fine.
I'm wondering if it may have something to do with the way the tasks are parallelized (although setting --maxFileBuffer=0 doesn't make any difference).
Here's where it hangs (pool names etc changed):
Here are a few pertinent stack traces from gdb: