Error when running cifar100 examples #4
Comments
You can run with a single GPU. To do this, open solver.prototxt, keep only the line 'device_id: 0', and remove the other line, 'device_id: 1'.
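The edit described above can also be scripted. The following is only a minimal sketch: it assumes the solver file lists one `device_id: N` entry per line, as the comment describes, and the helper name `keep_single_device` and the sample solver text are hypothetical, not taken from the repository.

```python
import re


def keep_single_device(solver_text: str, device_id: int = 0) -> str:
    """Drop every `device_id: N` line except the one matching `device_id`."""
    kept = []
    for line in solver_text.splitlines():
        match = re.match(r"\s*device_id\s*:\s*(\d+)", line)
        if match and int(match.group(1)) != device_id:
            continue  # skip entries for the GPUs we do not want to use
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical solver contents, for illustration only.
example = """net: "train_test.prototxt"
device_id: 0
device_id: 1
base_lr: 0.1
"""
result = keep_single_device(example)
print(result)
```

To apply it to a real file, read solver.prototxt into a string, pass it through the function, and write the result back after keeping a backup copy.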
Thanks for your reply, @stephenyan1984. I've actually tried your method, but it didn't help, so I'd like to figure out where the problem is. Even if I use only one GPU, I still get the following error: [cliu@ycao-hadoop3 HD-CNN]$ ./examples/cifar100/train_cifar100_NIN_float_crop_v2_train_val.sh Any new ideas or suggestions? Thanks.
From the error message, you should check whether the leveldb database exists. Its path is 'examples/cifar100/cifar100-float-train-train-val-leveldb/cifar100-train-leveldb'.
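A check along these lines can look at more than whether the folder is present. The sketch below relies on the usual LevelDB on-disk layout (a `CURRENT` file, a `MANIFEST-*` file, and table files ending in `.ldb` or `.sst`); the function name `describe_leveldb_dir` is hypothetical, and an empty `table_files` list is only a hint that something may be wrong, not proof of a broken database.

```python
from pathlib import Path


def describe_leveldb_dir(path: str) -> dict:
    """Report which of the usual LevelDB files are present in `path`."""
    db = Path(path)
    names = [p.name for p in db.iterdir()] if db.is_dir() else []
    return {
        "exists": db.is_dir(),
        "has_current": "CURRENT" in names,
        "has_manifest": any(n.startswith("MANIFEST-") for n in names),
        # Older LevelDB releases write .sst table files, newer ones .ldb.
        "table_files": sorted(n for n in names if n.endswith((".ldb", ".sst"))),
    }


if __name__ == "__main__":
    report = describe_leveldb_dir(
        "examples/cifar100/cifar100-float-train-train-val-leveldb/cifar100-train-leveldb"
    )
    print(report)
```

Run it from the repository root so that the relative path resolves the same way it does for the training script.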
Thanks for your advice, @stephenyan1984. I have checked, and the folder exists. I also tried removing the repo and redoing all the steps, but it still doesn't work. I'm not sure whether the leveldb files you provide are broken. I used the … I will try to generate my own leveldb files and see if that works. Thanks.
Hi, Zhicheng,
I successfully built Caffe following your tutorial here: https://sites.google.com/site/homepagezhichengyan/home/hdcnn/code, but when I run the cifar100 example in the 2nd step (./examples/cifar100/train_cifar100_NIN_float_crop_v2_train_val.sh), I get a strange error, shown below. I think the problem may be related to multiple GPUs, since I can run a single experiment on one GPU with Caffe's official example. Could you please give me some advice? Thank you very much!
I0302 18:59:01.165179 5920 caffe.cpp:105] Use GPUs with device IDs below
I0302 18:59:01.165335 5920 caffe.cpp:107] device id 0
I0302 18:59:01.165354 5920 caffe.cpp:107] device id 1
I0302 18:59:01.165369 5920 caffe.cpp:117] Starting Optimization
I0302 18:59:11.525671 5920 solver.cpp:77] Creating training net from net file: models/cifar100_NIN_float_crop_v2/train_val/train_test.prototxt
I0302 18:59:11.525739 5920 upgrade_proto.cpp:928] start ReadNetParamsFromTextFileOrDie
I0302 18:59:11.526916 5920 solver.cpp:80] create net
I0302 18:59:11.527045 5920 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer cifar
I0302 18:59:11.527104 5920 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0302 18:59:11.527258 5920 data_transformer.cpp:24] Loading mean file from: data/cifar100/float_mean.binaryproto
I0302 18:59:11.530076 5920 db.cpp:20] Opened leveldb examples/cifar100/cifar100-float-train-train-val-leveldb/cifar100-train-leveldb
I0302 18:59:11.530103 5920 data_manager.cpp:97] new database cursor
I0302 18:59:11.530953 5920 data_manager.cpp:99] new database transaction
*** Aborted at 1456963151 (unix time) try "date -d @1456963151" if you are using GNU date ***
PC: @ 0x7f9d86a53644 leveldb::(anonymous namespace)::MergingIterator::key()
*** SIGSEGV (@0x18) received by PID 5920 (TID 0x7f9d8c7339c0) from PID 24; stack trace: ***
@ 0x7f9d80d81670 (unknown)
@ 0x7f9d86a53644 leveldb::(anonymous namespace)::MergingIterator::key()
@ 0x7f9d86a3dc7e leveldb::(anonymous namespace)::DBIter::key()
@ 0x4fb102 caffe::db::LevelDBCursor::key()
@ 0x535fa1 caffe::DataManager<>::DataManager()
@ 0x5496b2 caffe::Net<>::InitDataManager()
@ 0x5676be caffe::Net<>::Init()
@ 0x567880 caffe::Net<>::Net()
@ 0x573eab caffe::Solver<>::InitTrainNet()
@ 0x574ebc caffe::Solver<>::Init()
@ 0x575046 caffe::Solver<>::Solver()
@ 0x4229f0 caffe::GetSolver<>()
@ 0x41c1f8 train()
@ 0x414091 main
@ 0x7f9d80d6db15 __libc_start_main
@ 0x41bd6d (unknown)
./examples/cifar100/train_cifar100_NIN_float_crop_v2_train_val.sh: line 5: 5920 Segmentation fault (core dumped) GLOG_logtostderr=1 ./build/tools/caffe train --solver=models/cifar100_NIN_float_crop_v2/train_val/solver.prototxt
I0302 18:59:13.376158 6839 caffe.cpp:105] Use GPUs with device IDs below
I0302 18:59:13.376302 6839 caffe.cpp:107] device id 0
I0302 18:59:13.376323 6839 caffe.cpp:107] device id 1
I0302 18:59:13.376339 6839 caffe.cpp:117] Starting Optimization
I0302 18:59:24.075724 6839 solver.cpp:77] Creating training net from net file: models/cifar100_NIN_float_crop_v2/train_val/train_test.prototxt
I0302 18:59:24.075798 6839 upgrade_proto.cpp:928] start ReadNetParamsFromTextFileOrDie
I0302 18:59:24.076957 6839 solver.cpp:80] create net
I0302 18:59:24.077093 6839 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer cifar
I0302 18:59:24.077142 6839 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0302 18:59:24.077369 6839 data_transformer.cpp:24] Loading mean file from: data/cifar100/float_mean.binaryproto
I0302 18:59:24.080384 6839 db.cpp:20] Opened leveldb examples/cifar100/cifar100-float-train-train-val-leveldb/cifar100-train-leveldb
I0302 18:59:24.080411 6839 data_manager.cpp:97] new database cursor
I0302 18:59:24.081198 6839 data_manager.cpp:99] new database transaction
*** Aborted at 1456963164 (unix time) try "date -d @1456963164" if you are using GNU date ***
PC: @ 0x7f9b196d2644 leveldb::(anonymous namespace)::MergingIterator::key()
*** SIGSEGV (@0x18) received by PID 6839 (TID 0x7f9b1f3b29c0) from PID 24; stack trace: ***
@ 0x7f9b13a00670 (unknown)
@ 0x7f9b196d2644 leveldb::(anonymous namespace)::MergingIterator::key()
@ 0x7f9b196bcc7e leveldb::(anonymous namespace)::DBIter::key()
@ 0x4fb102 caffe::db::LevelDBCursor::key()
@ 0x535fa1 caffe::DataManager<>::DataManager()
@ 0x5496b2 caffe::Net<>::InitDataManager()
@ 0x5676be caffe::Net<>::Init()
@ 0x567880 caffe::Net<>::Net()
@ 0x573eab caffe::Solver<>::InitTrainNet()
@ 0x574ebc caffe::Solver<>::Init()
@ 0x575046 caffe::Solver<>::Solver()
@ 0x4229f0 caffe::GetSolver<>()
@ 0x41c1f8 train()
@ 0x414091 main
@ 0x7f9b139ecb15 __libc_start_main
@ 0x41bd6d (unknown)
./examples/cifar100/train_cifar100_NIN_float_crop_v2_train_val.sh: line 8: 6839 Segmentation fault (core dumped) GLOG_logtostderr=1 ./build/tools/caffe train --solver=models/cifar100_NIN_float_crop_v2/train_val/solver_lr1.prototxt --snapshot=models/cifar100_NIN_float_crop_v2/train_val/cifar100_NIN_float_crop_v2_iter_100000.solverstate
I0302 18:59:25.059556 7868 caffe.cpp:105] Use GPUs with device IDs below
I0302 18:59:25.059707 7868 caffe.cpp:107] device id 0
I0302 18:59:25.059726 7868 caffe.cpp:107] device id 1
I0302 18:59:25.059741 7868 caffe.cpp:117] Starting Optimization
I0302 18:59:35.792002 7868 solver.cpp:77] Creating training net from net file: models/cifar100_NIN_float_crop_v2/train_val/train_test.prototxt
I0302 18:59:35.792084 7868 upgrade_proto.cpp:928] start ReadNetParamsFromTextFileOrDie
I0302 18:59:35.793494 7868 solver.cpp:80] create net
I0302 18:59:35.793649 7868 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer cifar
I0302 18:59:35.793747 7868 net.cpp:475] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0302 18:59:35.793915 7868 data_transformer.cpp:24] Loading mean file from: data/cifar100/float_mean.binaryproto
I0302 18:59:35.796743 7868 db.cpp:20] Opened leveldb examples/cifar100/cifar100-float-train-train-val-leveldb/cifar100-train-leveldb
I0302 18:59:35.796777 7868 data_manager.cpp:97] new database cursor
I0302 18:59:35.797546 7868 data_manager.cpp:99] new database transaction
*** Aborted at 1456963175 (unix time) try "date -d @1456963175" if you are using GNU date ***
PC: @ 0x7f10c8d4a644 leveldb::(anonymous namespace)::MergingIterator::key()
*** SIGSEGV (@0x18) received by PID 7868 (TID 0x7f10cea2a9c0) from PID 24; stack trace: ***
@ 0x7f10c3078670 (unknown)
@ 0x7f10c8d4a644 leveldb::(anonymous namespace)::MergingIterator::key()
@ 0x7f10c8d34c7e leveldb::(anonymous namespace)::DBIter::key()
@ 0x4fb102 caffe::db::LevelDBCursor::key()
@ 0x535fa1 caffe::DataManager<>::DataManager()
@ 0x5496b2 caffe::Net<>::InitDataManager()
@ 0x5676be caffe::Net<>::Init()
@ 0x567880 caffe::Net<>::Net()
@ 0x573eab caffe::Solver<>::InitTrainNet()
@ 0x574ebc caffe::Solver<>::Init()
@ 0x575046 caffe::Solver<>::Solver()
@ 0x4229f0 caffe::GetSolver<>()
@ 0x41c1f8 train()
@ 0x414091 main
@ 0x7f10c3064b15 __libc_start_main
@ 0x41bd6d (unknown)
./examples/cifar100/train_cifar100_NIN_float_crop_v2_train_val.sh: line 11: 7868 Segmentation fault (core dumped) GLOG_logtostderr=1 ./build/tools/caffe train --solver=models/cifar100_NIN_float_crop_v2/train_val/solver_lr2.prototxt --snapshot=models/cifar100_NIN_float_crop_v2/train_val/cifar100_NIN_float_crop_v2_iter_115000.solverstate
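The "Aborted at" lines in the log carry Unix timestamps. Besides the `date -d` hint printed in the log itself, they can be converted in Python to confirm that the three crashes line up with the glog line prefixes. This is a small sketch; the variable names are illustrative only.

```python
from datetime import datetime, timezone

# The three "Aborted at ... (unix time)" values from the log above.
abort_times = [1456963151, 1456963164, 1456963175]

# Convert each value to a timezone-aware UTC datetime.
utc_times = [datetime.fromtimestamp(t, tz=timezone.utc) for t in abort_times]

for t in utc_times:
    print(t.isoformat())
```

The first value corresponds to 2016-03-02 23:59:11 UTC. The glog prefixes show 18:59:11 on 03/02, which would be consistent with a machine clock five hours behind UTC.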