stop while uploading #105

Open
pqkhanhvn opened this issue Dec 29, 2014 · 22 comments

@pqkhanhvn

I'm using mt-aws-glacier version 1.120 to upload a 6 GB file to Glacier, but the upload process stops without any error message.
Below are the command line and the log:
$ mtglacier sync --config=glacier.cfg --dir /storage/DATA --vault=data --journal=journal.info --concurrency=4 --partsize=64

MT-AWS-Glacier, Copyright 2012-2014 Victor Efimov http://mt-aws.com/ Version 1.120

PID 7123 Started worker
PID 7124 Started worker
PID 7125 Started worker
PID 7126 Started worker
PID 7124 Created an upload_id StFe-J5vFcAsOYwO5FCPtqHJTyZGeJZHnRse5vgr2epw3iCMH--bDzRgbdIe5pwCW4S56QBzquResYHEoBvaXotjYVtS
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [0]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [201326592]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [134217728]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [67108864]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [335544320]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [469762048]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [268435456]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [402653184]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [738197504]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [536870912]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [603979776]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [671088640]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [872415232]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [805306368]
PID 7125 HTTP 408 This might be normal. Will retry (322 seconds spent for request)
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1073741824]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [939524096]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1207959552]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1275068416]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1140850688]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1006632960]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1342177280]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1409286144]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1476395008]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1543503872]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1677721600]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1744830464]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1610612736]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1811939328]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1879048192]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2080374784]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [1946157056]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2013265920]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2281701376]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2214592512]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2147483648]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2348810240]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2550136832]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2415919104]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2483027968]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2617245696]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2684354560]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2751463424]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2818572288]
PID 7125 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [3087007744]
PID 7124 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2885681152]
PID 7123 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [3019898880]
PID 7126 Uploaded part for DATA/20141225_14_1_DATA.tar.gz.gpg at offset [2952790016]

The upload stops after the last log line above, without any error message.
Could you please look into this problem?
Thanks for the great tool!
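The offsets in the log are multiples of the part size. With --partsize=64, each part is 64 × 1024 × 1024 = 67108864 bytes, so offset [N] corresponds to part N / 67108864. Below is a minimal bash sketch that reads such a log and reports how far the upload got without gaps. The function name `summarize_parts` is illustrative and is not part of mt-aws-glacier.

```bash
#!/usr/bin/env bash
# Read mtglacier console output on stdin and summarise the
# "Uploaded part ... at offset [N]" lines.
# $1 = --partsize in MiB (64 in the command above).
summarize_parts() {
  local part_bytes=$(( ${1:-64} * 1024 * 1024 ))
  local re='at offset \[([0-9]+)]'
  local -A seen=()
  local line off n=0 max=0 next=0
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue          # skip retries and other messages
    off=${BASH_REMATCH[1]}
    [[ -n ${seen[$off]:-} ]] && continue    # count each offset once
    seen[$off]=1
    n=$(( n + 1 ))
    if (( off > max )); then max=$off; fi
  done
  # Walk forward from offset 0 while consecutive parts are present.
  while [[ -n ${seen[$next]:-} ]]; do
    next=$(( next + part_bytes ))
  done
  echo "parts=$n max_offset=$max contiguous_bytes=$next"
}
```

To use it, run `summarize_parts 64 < upload.log`. For the log in this report it should print `parts=47 max_offset=3087007744 contiguous_bytes=3154116608`, which means roughly half of the 6 GB file had been sent when the output stopped.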

@vsespb
Owner

vsespb commented Dec 29, 2014

  1. I need the output of strace -p $PID for each of the last PIDs (7126, 7123, 7124, etc.).

  2. Is the problem repeatable?
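If `strace -p` is run by hand after a stop, the workers may already have exited. The minimal, Linux-only sketch below scans /proc for every process whose command line contains `mtglacier` and attaches a single strace to all of them, writing one output file per PID. Both function names are illustrative. The strace options used are `-p` (repeatable), `-f`, `-ff`, `-tt` and `-o`.

```bash
#!/usr/bin/env bash
# List PIDs of running processes whose command line contains PATTERN,
# by scanning /proc directly (Linux only). Skips the current shell.
pids_matching() {
  local pattern=$1 dir pid cmdline
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    [[ $pid == "$$" || $pid == "$BASHPID" ]] && continue
    # cmdline is NUL-separated; the process may exit while we read it.
    cmdline=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
    [[ $cmdline == *"$pattern"* ]] && echo "$pid"
  done
}

# Attach one strace to every matching process (parent and workers).
# -f follows new children, -tt adds timestamps, and -ff with -o writes
# one file per process: mtglacier.trace.<PID>.
strace_all() {
  local pattern=${1:-mtglacier} args=() pid
  for pid in $(pids_matching "$pattern"); do args+=(-p "$pid"); done
  (( ${#args[@]} )) || { echo "no process matches '$pattern'" >&2; return 1; }
  strace -f -ff -tt -o mtglacier.trace "${args[@]}"
}
```

Run `strace_all` as root while the upload is still in progress. The files stay behind after the processes are gone.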

@pqkhanhvn
Author

  1. Here is the strace output for each PID:
    root@backup:/home/glacier# strace -p 7126
    attach: ptrace(PTRACE_ATTACH, ...): No such process
    root@backup:/home/glacier# strace -p 7123
    attach: ptrace(PTRACE_ATTACH, ...): No such process
    root@backup:/home/glacier# strace -p 7124
    attach: ptrace(PTRACE_ATTACH, ...): No such process
    root@backup:/home/glacier# strace -p 7125
    attach: ptrace(PTRACE_ATTACH, ...): No such process

  2. Yes, the problem is repeatable. It stops at random, not after a fixed period of time.

@vsespb
Owner

vsespb commented Dec 30, 2014

I need the following:

  1. perl -MJSON::XS -E 'say JSON::XS->VERSION'
  2. perl -MDigest::SHA -E 'say Digest::SHA->VERSION'
  3. perl -V (note the capital V)
  4. Your OS and distribution version
  5. Run echo $? in the same terminal after you see the failure again
  6. Check syslog for OOM (out of memory) errors
  7. Is it possible that there is not enough memory during the run?
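The version and system information requested above can be collected in one pass. Below is a small bash sketch that runs each command and labels its output. If a command fails, for example because a module is missing, the failure is recorded and the script continues. The name `collect_diag` is illustrative. The remaining items (exit status, syslog, memory) depend on a failed run and are not covered here.

```bash
#!/usr/bin/env bash
# Run each diagnostic command and label its output so the whole report
# can be pasted into the issue at once.
collect_diag() {
  local cmd
  for cmd in \
    "perl -MJSON::XS -E 'say JSON::XS->VERSION'" \
    "perl -MDigest::SHA -E 'say Digest::SHA->VERSION'" \
    "perl -V" \
    "lsb_release -a" \
    "uname -a"
  do
    echo "== $cmd =="
    # Record failures (e.g. missing module or command) instead of stopping.
    bash -c "$cmd" 2>&1 || echo "(command failed with status $?)"
  done
}
```

To use it, run `collect_diag > diag.txt`.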

@pqkhanhvn
Author

  1. root@backup:/home/glacier# perl -MJSON::XS -E 'say JSON::XS->VERSION'
    2.32
  2. root@backup:/home/glacier# perl -MDigest::SHA -E 'say Digest::SHA->VERSION'
    5.61
  3. root@backup:/home/glacier# perl -V
    Summary of my perl5 (revision 5 version 14 subversion 2) configuration:

Platform:
osname=linux, osvers=2.6.42-26-generic, archname=i686-linux-gnu-thread-multi-64int
uname='linux roseapple 2.6.42-26-generic #41-ubuntu smp thu jun 14 17:49:24 utc 2012 i686 i686 i386 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i686-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.14 -Darchlib=/usr/lib/perl/5.14 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.14.2 -Dsitearch=/usr/local/lib/perl/5.14.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.14.2 -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.6.3', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=, so=so, useshrplib=true, libperl=libperl.so.5.14.2
gnulibc_version='2.15'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'

Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
PERL_PRESERVE_IVUV USE_64_BIT_INT USE_ITHREADS
USE_LARGE_FILES USE_PERLIO USE_PERL_ATOF
USE_REENTRANT_API
Locally applied patches:
DEBPKG:debian/arm_thread_stress_timeout - http://bugs.debian.org/501970 Raise the timeout of ext/threads/shared/t/stress.t to accommodate slower build hosts
DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @inc directories.
DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
DEBPKG:fixes/respect_umask - Respect umask during installation
DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
DEBPKG:debian/prefix_changes - Fiddle with PREFIX and variables written to the makefile
DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
DEBPKG:debian/m68k_thread_stress - http://bugs.debian.org/517938 http://bugs.debian.org/495826 Disable some threads tests on m68k for now due to missing TLS.
DEBPKG:debian/mod_paths - Tweak @inc ordering for Debian
DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
DEBPKG:debian/disable-zlib-bundling - Disable zlib bundling in Compress::Raw::Zlib
DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/580034 Point users to Debian packages of deprecated core modules
DEBPKG:fixes/hurd-ccflags - [a190e64] http://bugs.debian.org/587901 [perl #92244] Make hints/gnu.sh append to $ccflags rather than overriding them
DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
DEBPKG:fixes/extutils-cbuilder-cflags - [011e8fb] http://bugs.debian.org/624460 [perl #89478] Append CFLAGS and LDFLAGS to their Config.pm counterparts in EU::CBuilder
DEBPKG:fixes/module-build-home-directory - http://bugs.debian.org/624850 [rt.cpan.org #67893] Fix failing tilde test when run under a UID without a passwd entry
DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.14.2-6ubuntu2.1 in patchlevel.h
DEBPKG:fixes/h2ph-multiarch - [e7ec705] http://bugs.debian.org/625808 [perl #90122] Make h2ph correctly search gcc include directories
DEBPKG:fixes/index-tainting - [3b36395] http://bugs.debian.org/291450 [perl #64804] RT 64804: tainting with index() of a constant
DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
DEBPKG:fixes/sys-syslog-socket-timeout-kfreebsd.patch - http://bugs.debian.org/627821 [rt.cpan.org #69997] Use a socket timeout on GNU/kFreeBSD to catch ICMP port unreachable messages
DEBPKG:fixes/hurd-hints - http://bugs.debian.org/636609 Improve general GNU hints, needed for GNU/Hurd.
DEBPKG:fixes/pod_fixes - [7698aed] http://bugs.debian.org/637816 Fix typos in several pod/perl*.pod files
DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
DEBPKG:fixes/digest_eval_hole - http://bugs.debian.org/644108 Close the eval "require $module" security hole in Digest->new($algorithm)
DEBPKG:fixes/hurd-ndbm - [f0d0a20] [perl #102680] http://bugs.debian.org/645989 Add GNU/Hurd hints for NDBM_File
DEBPKG:fixes/sysconf.t-posix - [8040185] [perl #102888] http://bugs.debian.org/646016 Fix hang in ext/POSIX/t/sysconf.t on GNU/Hurd
DEBPKG:fixes/hurd-largefile - [1fda587] [perl #103014] http://bugs.debian.org/645790 enable LFS on GNU/Hurd
DEBPKG:debian/hurd_test_todo_syslog - http://bugs.debian.org/650093 Disable failing GNU/Hurd tests in cpan/Sys-Syslog/t/syslog.t
DEBPKG:fixes/hurd_skip_itimer_virtual - [rt.cpan.org #72754] http://bugs.debian.org/650094 Skip interval timer tests in Time::HiRes on GNU/Hurd
DEBPKG:debian/hurd_test_skip_socketpair - http://bugs.debian.org/650186 Disable failing GNU/Hurd tests ext/Socket/t/socketpair.t
DEBPKG:debian/hurd_test_skip_sigdispatch - http://bugs.debian.org/650188 Disable failing GNU/Hurd tests op/sigdispatch.t
DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
DEBPKG:debian/hurd_test_skip_recv - http://bugs.debian.org/650095 Disable failing GNU/Hurd tests cpan/autodie/t/recv.t
DEBPKG:debian/hurd_test_skip_libc - http://bugs.debian.org/650097 Disable failing GNU/Hurd tests dist/threads/t/libc.t
DEBPKG:debian/hurd_test_skip_pipe - http://bugs.debian.org/650187 Disable failing GNU/Hurd tests io/pipe.t
DEBPKG:debian/hurd_test_skip_io_pipe - http://bugs.debian.org/650096 Disable failing GNU/Hurd tests dist/IO/t/io_pipe.t
Built under linux
Compiled at Aug 10 2012 21:26:09
@inc:
/etc/perl
/usr/local/lib/perl/5.14.2
/usr/local/share/perl/5.14.2
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.14
/usr/share/perl/5.14
/usr/local/lib/site_perl
.

4a) root@backup:/home/glacier# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

4b) root@backup:/home/glacier# uname -a
Linux backup 3.2.0-29-generic-pae #46-Ubuntu SMP Fri Jul 27 17:25:43 UTC 2012 i686 i686 i386 GNU/Linux

  5. root@backup:/home/glacier# echo $?
    0
  6. There aren't any memory-related messages in the syslog* files.
  7. This may be the reason, since my computer has only 1 GB of RAM. However, I have uploaded 1.3 TB to Amazon Glacier from this computer with mt-aws-glacier version 1.112 without this problem. It only started after I updated mt-aws-glacier to the latest version, 1.120.
    I will terminate all other programs on this computer to free memory for mt-aws-glacier and try again.
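The `echo $?` result above is 0. In bash, 0 means a normal, successful exit. Values above 128 mean the process was terminated by a signal numbered status - 128. The OOM killer sends SIGKILL (9), which bash reports as 137. The small sketch below spells this out; `describe_status` is an illustrative name.

```bash
#!/usr/bin/env bash
# Interpret a bash exit status: 0 = success, >128 = killed by signal
# (status - 128), anything else = the program's own error code.
describe_status() {
  local status=$1
  if (( status == 0 )); then
    echo "exited normally (success)"
  elif (( status > 128 )); then
    local sig=$(( status - 128 ))
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "exited with error status $status"
  fi
}
```

To use it, run `mtglacier sync ...; describe_status "$?"`. If the output is piped (for example through `tee`), `$?` holds the status of the last command in the pipeline. In that case, use `"${PIPESTATUS[0]}"` instead.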

Thanks for your support!

@pqkhanhvn
Author

I stopped all other programs, and free memory is now 370 MB. I reduced the concurrency parameter to 2, but the problem still occurs (free memory > concurrency × partsize).
$ mtglacier sync --config=glacier.cfg --dir /storage/DATA --vault=data --journal=journal.info --concurrency=2 --partsize=64
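The rule of thumb used above (free memory > concurrency × partsize) can be checked directly. The bash sketch below is a rough lower bound only, because Perl itself, I/O buffers and the journal also need memory. The name `check_memory` is illustrative. It reads `/proc/meminfo` and falls back to MemFree + Buffers + Cached on kernels older than 3.14, which lack MemAvailable.

```bash
#!/usr/bin/env bash
# Compare available memory with concurrency * partsize (rough lower bound).
# $1 = --concurrency, $2 = --partsize (MiB), $3 = available KiB (optional).
check_memory() {
  local concurrency=$1 partsize_mib=$2 avail_kib=${3:-}
  if [[ -z $avail_kib ]]; then
    # Prefer MemAvailable (kernel 3.14+); on older kernels such as 3.2,
    # fall back to MemFree + Buffers + Cached.
    avail_kib=$(awk '/^MemAvailable:/ { a = $2 }
                     /^(MemFree|Buffers|Cached):/ { f += $2 }
                     END { print (a ? a : f) }' /proc/meminfo)
  fi
  local need_kib=$(( concurrency * partsize_mib * 1024 ))
  echo "need>=${need_kib}KiB avail=${avail_kib}KiB"
  (( avail_kib >= need_kib ))
}
```

For the run above (about 370 MB free, --concurrency=2 --partsize=64), the rule requires at least 128 MiB. By this measure memory should have been sufficient, yet the problem persisted.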

@vsespb
Owner

vsespb commented Dec 31, 2014

Could you please try the branch debug_for_issue_105: https://github.com/vsespb/mt-aws-glacier/tree/debug_for_issue_105

I've added some debugging output.

@vsespb
Owner

vsespb commented Dec 31, 2014

Also, please try running strace mtglacier ... and paste the last lines of its output here after the process is gone.
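Starting mtglacier under strace avoids the race of attaching too late. With `-f`, strace follows the forked workers. With `-ff -o PREFIX`, it writes one file per process, named `PREFIX.<PID>`. The bash sketch below does this and then prints the tail of every file. The function names and the prefix are illustrative.

```bash
#!/usr/bin/env bash
# Print the last N lines of every per-process trace file PREFIX.<PID>.
tail_traces() {
  local prefix=$1 n=$2 f
  for f in "$prefix".*; do
    [[ -e $f ]] || continue   # glob matched nothing
    echo "== $f =="
    tail -n "$n" -- "$f"
  done
}

# Run a command under strace from the start, following forked workers,
# then show the tail of each trace once everything has exited.
trace_and_tail() {
  local prefix=$1 n=$2
  shift 2
  strace -f -ff -tt -o "$prefix" "$@"
  local status=$?             # strace passes on the command's exit status
  tail_traces "$prefix" "$n"
  return "$status"
}
```

For example: `trace_and_tail /tmp/mtg 30 mtglacier sync --config=glacier.cfg --dir /storage/DATA --vault=data --journal=journal.info --concurrency=2 --partsize=64`.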

@pqkhanhvn
Author

I've cloned the new branch debug_for_issue_105 on the same machine and uploaded 50 GB without hitting this problem. I will continue uploading the remaining data and will update you if any error messages appear.

@pqkhanhvn
Author

pqkhanhvn commented Jul 18, 2017

I hit this problem again with the debug_for_issue_105 branch. The upload process stopped without an error message. It repeats many times. Below is the upload log:
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [0]
PID 6621 HTTP connection problem (timeout?). Will retry (20 seconds spent for request)
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [67108864]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [134217728]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [201326592]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [268435456]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [335544320]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [402653184]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [469762048]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [536870912]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [671088640]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [603979776]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [738197504]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [805306368]
PID 6622 Uploaded part for DATA/part1.tar.gz.gpg at offset [939524096]
PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [872415232]

This is my environment information:

  • OS: Ubuntu 14.04.3 LTS

  • Upload parameters:
    concurrency=2 partsize=64

  • Code version:

        commit 8344b0045b523e983d4b44843ccf18bbc065a40f
        Author: Victor <[email protected]>
        Date:   Wed Dec 31 14:56:54 2014 +0300

            What if parent dies? Need error message.

        commit 4b563cff58661ee881577a8089ee780a32996ddf
        Author: Victor <[email protected]>
        Date:   Sun Aug 10 00:06:34 2014 +0400

            Version 1.120 released
    
  • perl -MJSON::XS -E 'say JSON::XS->VERSION': 2.34

  • perl -MDigest::SHA -E 'say Digest::SHA->VERSION': 5.84_01

  • perl -V: Summary of my perl5 (revision 5 version 18 subversion 2) configuration

  • No OOM messages in the log files

Could you please fix this problem?
Thank you for the great tool.

@vsespb
Owner

vsespb commented Jul 18, 2017

So "PID 6621 Uploaded part for DATA/part1.tar.gz.gpg at offset [872415232]" is the very last line of the output?

@pqkhanhvn
Author

Yes, the PID 6621 line is the last line.

@vsespb
Owner

vsespb commented Jul 18, 2017

Again, as I asked two years ago: please check syslog/dmesg for out-of-memory messages, OOM killer activity, segfaults, etc.
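Below is a short bash sketch that scans kernel and syslog text for the kinds of messages asked about: out-of-memory kills, OOM killer activity and segfaults. The function name is illustrative, and the patterns are a reasonable starting set rather than an exhaustive list.

```bash
#!/usr/bin/env bash
# Filter kernel/syslog text on stdin for OOM kills and segfaults.
find_kill_messages() {
  grep -Ei 'out of memory|oom-killer|killed process|segfault'
}
```

To use it, run `{ dmesg; zcat -f /var/log/syslog* /var/log/kern.log*; } 2>/dev/null | find_kill_messages`. `zcat -f` also passes uncompressed files through unchanged, so rotated `.gz` logs and current logs are read the same way.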

@pqkhanhvn
Author

I checked the log files but didn't find any out-of-memory messages. I am uploading now, and the system has 158 MB free:

root@backup:/var/log# free
             total       used       free     shared    buffers     cached
Mem:       2055684    1897656     158028       4672     274024    1201712
-/+ buffers/cache:     421920    1633764
Swap:      1030140       1388    1028752
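The 158028 KiB in the "free" column excludes buffers and page cache, which the kernel can reclaim. Adding them gives 158028 + 274024 + 1201712 = 1633764 KiB (about 1.6 GB), the same figure as the free value on the "-/+ buffers/cache" line. The sketch below computes this from the older procps `free` layout shown here; the function name is illustrative. Newer `free` versions print a combined buff/cache column and an "available" column, and that column should be read directly instead.

```bash
#!/usr/bin/env bash
# From old-style `free` output on stdin (Mem: total used free shared
# buffers cached), print free + buffers + cached in KiB.
effective_free_kib() {
  awk '$1 == "Mem:" { print $4 + $6 + $7; exit }'
}
```

To use it, run `free | effective_free_kib`.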

Please advise!

@pqkhanhvn
Author

I've doubled the RAM in the system, but the upload process still stopped, with the error messages shown in the attached screenshot:
[screenshot not preserved]

Could you please fix it?
Thank you very much!
Great tool.

@vsespb
Owner

vsespb commented Jul 23, 2017

Yes, but fix what? SHA-256 computation has worked without flaws for years for me and other users.

You opened this ticket in 2014. Were 1) the hardware and 2) the software the same as now?

  1. Please check the hardware with memtest86 or memtester (http://manpages.ubuntu.com/manpages/xenial/man8/memtester.8.html).

The last signature error could be caused by faulty RAM, all the more so since you've just added new RAM modules.

@pqkhanhvn
Author

The memory passed memtest86, but the problem still happens. I have also built another machine with Ubuntu 14 and the same Perl version, and I'm still facing the problem. Below are the software versions on the new computer:

  • perl -MJSON::XS -E 'say JSON::XS->VERSION': 2.34
  • perl -MDigest::SHA -E 'say Digest::SHA->VERSION': 5.84_01
  • perl -V: Summary of my perl5 (revision 5 version 18 subversion 2) configuration
  • mt-aws-glacier version 1.120
  • Parameters --concurrency=2 with --partsize=64, --partsize=128, or --partsize=256 all hit the SHA-256 computation problem

[screenshot not preserved]

Please advise!

@vsespb
Owner

vsespb commented Jul 28, 2017

You said "It repeats many times.", but how often? Every 1 or 10 minutes, every 1 or 10 hours, or less often?

@pqkhanhvn
Author

It is random. Here is the information collected from my log:

Fri Jul 14 19:29:41 ICT 2017 :Stopped after: 3820 (seconds)
Fri Jul 14 20:57:11 ICT 2017 :Stopped after: 3310 (seconds)
Fri Jul 14 23:26:29 ICT 2017 :Stopped after: 1288 (seconds)
Sat Jul 15 07:37:12 ICT 2017 :Stopped after: 2171 (seconds)
Sat Jul 15 08:32:29 ICT 2017 :Stopped after: 1948 (seconds)
Sat Jul 15 11:41:16 ICT 2017 :Stopped after: 5235 (seconds)
Sat Jul 15 13:47:38 ICT 2017 :Stopped after: 2257 (seconds)
Sat Jul 15 21:31:30 ICT 2017 :Stopped after: 5189 (seconds)
Sun Jul 16 07:41:47 ICT 2017 :Stopped after: 526 (seconds)
Sun Jul 16 14:27:44 ICT 2017 :Stopped after: 16422 (seconds)
Sun Jul 16 20:37:13 ICT 2017 :Stopped after: 13272 (seconds)
Mon Jul 17 04:45:06 ICT 2017 :Stopped after: 23225 (seconds)
Mon Jul 17 11:43:31 ICT 2017 :Stopped after: 9030 (seconds)
Mon Jul 17 13:54:11 ICT 2017 :Stopped after: 7630 (seconds)
Mon Jul 17 15:13:23 ICT 2017 :Stopped after: 4582 (seconds)
Mon Jul 17 16:52:41 ICT 2017 :Stopped after: 5320 (seconds)
Mon Jul 17 22:18:19 ICT 2017 :Stopped after: 18918 (seconds)
Tue Jul 18 12:48:47 ICT 2017 :Stopped after: 14866 (seconds)
Tue Jul 18 14:43:24 ICT 2017 :Stopped after: 6203 (seconds)
Tue Jul 18 17:40:58 ICT 2017 :Stopped after: 10137 (seconds)
Tue Jul 18 21:00:03 ICT 2017 :Stopped after: 3722 (seconds)
Fri Jul 21 18:20:55 ICT 2017 :Stopped after: 3054 (seconds)
Sat Jul 22 00:31:19 ICT 2017 :Stopped after: 10698 (seconds)
Mon Jul 24 17:35:37 ICT 2017 :Stopped after: 2136 (seconds)
Tue Jul 25 01:20:24 ICT 2017 :Stopped after: 18563 (seconds)
Tue Jul 25 14:07:24 ICT 2017 :Stopped after: 143 (seconds)
Wed Jul 26 19:35:21 ICT 2017 :Stopped after: 5660 (seconds)
Wed Jul 26 20:55:56 ICT 2017 :Stopped after: 1795 (seconds)
Thu Jul 27 18:36:45 ICT 2017 :Stopped after: 9944 (seconds)
Fri Jul 28 00:48:48 ICT 2017 :Stopped after: 8327 (seconds)
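A quick summary of these intervals answers the "how often" question. Below is a bash sketch that counts the "Stopped after: N (seconds)" lines and reports the shortest, longest and mean run. The function name is illustrative, and the mean is rounded down by integer arithmetic.

```bash
#!/usr/bin/env bash
# Summarise "Stopped after: N (seconds)" lines on stdin:
# number of stops, shortest and longest run, and integer mean.
stop_stats() {
  local re='Stopped after: ([0-9]+)' line s n=0 sum=0 min='' max=0
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    s=$(( 10#${BASH_REMATCH[1]} ))   # force base 10
    n=$(( n + 1 ))
    sum=$(( sum + s ))
    if [[ -z $min ]] || (( s < min )); then min=$s; fi
    if (( s > max )); then max=$s; fi
  done
  if (( n == 0 )); then echo "no entries"; return 1; fi
  echo "count=$n min=${min}s max=${max}s mean=$(( sum / n ))s"
}
```

For the 30 entries above, this prints `count=30 min=143s max=23225s mean=7313s`. Runs stopped anywhere from about 2 minutes to about 6.5 hours after they started.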

@bpmckinnon

Hi, I just started using the software and I'm getting a similar issue. I'll try to get you some useful debug info on my next run. For me it repeats every few hours, and I'm running it on many smallish files (15,000 uploads totaling 50 GB). So far I have:

dmesg
[158309.634505] Out of memory: Kill process 15899 (mdadm) score 870 or sacrifice child
[158309.634510] Killed process 15899 (mdadm) total-vm:1818512kB, anon-rss:1801668kB, file-rss:0kB

I've restarted the process with strace and I'll let you know if I find anything.

@bpmckinnon

One thing I've noticed is that the first perl process is holding onto quite a bit of memory, given that I'm running it on an old server with 2 GB of RAM.

root 16199 1.5 0.0 5204 1276 pts/0 S 11:18 0:09 strace -o mtglacier.strace /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --
root 16201 4.1 4.1 2734840 83748 pts/0 S 11:18 0:26 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16202 0.4 1.8 101152 36396 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16203 0.4 1.8 101964 37336 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16204 0.4 1.7 100104 35552 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16205 0.4 1.9 102984 38460 pts/0 S 11:18 0:03 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16206 0.4 1.7 100240 35464 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16207 0.4 1.9 103400 38704 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16208 0.4 1.8 101428 36724 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16209 0.4 1.7 100416 35764 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16210 0.4 1.9 103036 38408 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
root 16211 0.4 1.9 102776 38296 pts/0 S 11:18 0:02 perl /usr/local/src/mt-aws-glacier/mtglacier sync --new --config=/usr/local/bin/glacier/samba-glacier.cfg --filter=-glacier-journa
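In `ps aux` output, column 5 is VSZ and column 6 is RSS, both in KiB. The parent process (16201) shows a VSZ of 2734840 KiB (about 2.6 GiB of address space) but an RSS of only 83748 KiB. VSZ counts mapped address space, not memory that is actually resident. The sketch below sums the RSS of the perl processes running mtglacier and skips the strace wrapper. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Sum the RSS column (KiB) of `ps aux` lines on stdin whose command is
# perl running mtglacier (parent plus workers).
mtglacier_rss_kib() {
  awk '$11 == "perl" && /mtglacier/ { sum += $6 } END { print sum + 0 }'
}
```

To use it, run `ps aux | mtglacier_rss_kib`. For the listing above (the parent plus ten workers), it comes to 454852 KiB, about 444 MiB.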

@bpmckinnon

I have two strace excerpts. Let me know if you want the full files (I'm not 100% sure what data is in there, so I'd rather not post the entire files).
The first is:
write(16, "00000036", 8) = 8
write(16, "16493\tupload_part\t9142\t354\t40024"..., 36) = 36
write(16, "{"mtime":1279327706,"part_final_"..., 354) = 354
write(16, "RIFF\26\270b\2AVI LISTR\3\0\0hdrlavih8\0\0\0"..., 40024094) = 40024094
select(24, [3 4 5 7 9 11 13 15 17 19], NULL, NULL, NULL) = 1 (in [9])
read(9, "00000026", 8) = 8
read(9, "16498\tresponse\t9141\t238\t0\n", 26) = 26
read(9, "{"console_out":"Created an uploa"..., 238) = 238
write(1, "PID 16498 Created an upload_id K"..., 124) = 124
mmap(NULL, 268439552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x15c94000) = 0x5c94000
mmap(NULL, 268570624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 268439552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
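Reading the first trace, the parent appears to hand each part to a worker over a pipe (fd 16). It writes an 8-byte zero-padded length ("00000036"), then a 36-byte tab-separated header line, then a 354-byte JSON blob, and finally the 40024094-byte part payload in a single write. This suggests the parent holds part data in memory while sending it. The mmap calls that fail with ENOMEM request 268439552 bytes, which is 256 MiB + 4 KiB. The header fields (sender PID, action, task id, JSON length, attachment length) are inferred from the trace and are not documented mt-aws-glacier protocol. The sketch below rebuilds the prefix to confirm the lengths line up. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Rebuild the frame prefix seen in the strace output (inferred layout):
# an 8-digit zero-padded byte count, then a tab-separated header line
# terminated by "\n".
# Arguments: sender PID, action, task id, JSON length, attachment length.
frame_header() {
  local header
  # $(...) strips trailing newlines, so append the "\n" afterwards.
  header=$(printf '%s\t%s\t%s\t%s\t%s' "$1" "$2" "$3" "$4" "$5")$'\n'
  printf '%08d%s' "${#header}" "$header"
}
```

`frame_header 16493 upload_part 9142 354 40024094 | head -c 8` prints `00000036`, matching the first `write(16, "00000036", 8)` in the trace.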
The second is:
read(32, "", 8192) = 0
brk(0x40f0000) = 0x40f0000
brk(0x41f0000) = 0x41f0000
write(10, "00000036", 8) = 8
write(10, "16201\tupload_part\t8582\t296\t62247"..., 36) = 36
write(10, "{"relfilename":"Pre-2012 incl. e"..., 296) = 296
write(10, "RIFF>\323\265\3AVI LISTR\3\0\0hdrlavih8\0\0\0"..., 62247750) = 62247750
select(24, [3 4 5 7 9 11 13 15 17 19], NULL, NULL, NULL) = 1 (in [19])
read(19, "00000026", 8) = 8
read(19, "16211\tresponse\t8576\t118\t0\n", 26) = 26
read(19, "{"console_out":"Uploaded part fo"..., 118) = 118
write(1, "PID 16211 Uploaded part for Pre-"..., 96) = 96
munmap(0x7fb517273000, 268439552) = 0
mmap(NULL, 268439552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x1414e000) = 0x41f0000
mmap(NULL, 268570624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 268439552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x1414e000) = 0x41f0000
mmap(NULL, 268570624, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fb51f274000
Let me know if there is a way to provide more precise error data.
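Both traces show anonymous mmap requests of about 256 MiB failing with ENOMEM. On Linux, whether such a request succeeds depends on the kernel's overcommit policy (vm.overcommit_memory and the CommitLimit/Committed_AS accounting in /proc/meminfo) and on any address-space limit set with `ulimit -v`. Below is a small bash sketch that prints these settings so they can be posted alongside the trace. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Print the settings that decide whether a large anonymous mmap() can
# succeed: overcommit policy, commit accounting, and the address-space rlimit.
show_alloc_limits() {
  echo "overcommit_memory=$(cat /proc/sys/vm/overcommit_memory 2>/dev/null || echo unknown)"
  echo "overcommit_ratio=$(cat /proc/sys/vm/overcommit_ratio 2>/dev/null || echo unknown)"
  grep -E '^(MemTotal|SwapTotal|CommitLimit|Committed_AS):' /proc/meminfo 2>/dev/null
  echo "ulimit -v=$(ulimit -v)"
}
```

Run it in the same shell that starts mtglacier, so that `ulimit -v` reflects the limit the process inherits.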

@vsespb
Owner

vsespb commented Feb 18, 2019

It's not a similar issue; please open a new issue. And I am not sure what I could do here if there is not enough RAM for processing.

3 participants