OutOfRangeError: RandomShuffleQueue '_1_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0) #10

Open
FredHaa opened this issue Mar 9, 2017 · 20 comments

Comments

@FredHaa

FredHaa commented Mar 9, 2017

Hello,

I am trying to use the framework to segment images of bacteria.

I am using the provided recipe for FCN_32s, with a few adaptations for my custom data set (a different LUT, a changed image size and a different number of classes).

The entire script looks like this:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
import skimage.io as io
import os, sys
from matplotlib import pyplot as plt

root_dir = '/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/'
sys.path.append(root_dir + 'models/slim/')
sys.path.append(root_dir + 'tf-image-segmentation/')

from tf_image_segmentation.models.fcn_32s import FCN_32s, extract_vgg_16_mapping_without_fc8
from tf_image_segmentation.utils.tf_records import read_tfrecord_and_decode_into_image_annotation_pair_tensors
from tf_image_segmentation.utils.training import get_valid_logits_and_labels
from tf_image_segmentation.utils.augmentation import flip_randomly_left_right_image_with_annotation, scale_randomly_image_with_annotation_with_fixed_size_output
from tf_image_segmentation.utils.look_up_tables import alive_and_dead_cell_lut

checkpoints_dir = root_dir + 'checkpoints/'
log_folder = root_dir + 'log_folder/'
vgg_checkpoint_path = checkpoints_dir + 'vgg_16.ckpt'

image_train_size = [704, 320]
number_of_classes = 3

tfrecord_filename = 'bacteria.tfrecords'

cell_lut = alive_and_dead_cell_lut()
class_labels = cell_lut.keys()

filename_queue = tf.train.string_input_producer(
    [tfrecord_filename], num_epochs=10)

image, annotation = read_tfrecord_and_decode_into_image_annotation_pair_tensors(filename_queue)

resized_image, resized_annotation = scale_randomly_image_with_annotation_with_fixed_size_output(image, annotation, image_train_size)

resized_annotation = tf.squeeze(resized_annotation)

image_batch, annotation_batch = tf.train.shuffle_batch( [resized_image, resized_annotation],
                                             batch_size=1,
                                             capacity=3000,
                                             num_threads=2,
                                             min_after_dequeue=1000)

upsampled_logits_batch, vgg_16_variables_mapping = FCN_32s(image_batch_tensor=image_batch,
                                                           number_of_classes=number_of_classes,
                                                           is_training=True)

valid_labels_batch_tensor, valid_logits_batch_tensor = get_valid_logits_and_labels(annotation_batch_tensor=annotation_batch,
                                                                                     logits_batch_tensor=upsampled_logits_batch,
                                                                                    class_labels=class_labels)

cross_entropies = tf.nn.softmax_cross_entropy_with_logits(logits=valid_logits_batch_tensor,
                                                          labels=valid_labels_batch_tensor)

# Normalize the cross entropy -- the number of elements
# is different during each step due to mask out regions
cross_entropy_sum = tf.reduce_mean(cross_entropies)

pred = tf.argmax(upsampled_logits_batch, dimension=3)

probabilities = tf.nn.softmax(upsampled_logits_batch)


with tf.variable_scope("adam_vars"):
    train_step = tf.train.AdamOptimizer(learning_rate=0.000001).minimize(cross_entropy_sum)


# Variable's initialization functions
vgg_16_without_fc8_variables_mapping = extract_vgg_16_mapping_without_fc8(vgg_16_variables_mapping)


init_fn = slim.assign_from_checkpoint_fn(model_path=vgg_checkpoint_path,
                                         var_list=vgg_16_without_fc8_variables_mapping)

global_vars_init_op = tf.global_variables_initializer()

tf.summary.scalar('cross_entropy_loss', cross_entropy_sum)

merged_summary_op = tf.summary.merge_all()

summary_string_writer = tf.summary.FileWriter(log_folder)

# Create the log folder if doesn't exist yet
if not os.path.exists(log_folder):
     os.makedirs(log_folder)

#The op for initializing the variables.
local_vars_init_op = tf.local_variables_initializer()

combined_op = tf.group(local_vars_init_op, global_vars_init_op)

# We need this to save only model variables and omit
# optimization-related and other variables.
model_variables = slim.get_model_variables()
saver = tf.train.Saver(model_variables)

with tf.Session()  as sess:

    sess.run(combined_op)
    init_fn(sess)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # 10 epochs
    for i in xrange(11127 * 10):

        cross_entropy, summary_string, _ = sess.run([ cross_entropy_sum,
                                                      merged_summary_op,
                                                      train_step ])

        print("Current loss: " + str(cross_entropy))

        summary_string_writer.add_summary(summary_string, i)

        if i % 11127 == 0:
            save_path = saver.save(sess, checkpoints_dir + "model_fcn32s_epoch_" + str(i / 11127) + ".ckpt")
            print("Model saved in file: %s" % save_path)


    coord.request_stop()
    coord.join(threads)

    save_path = saver.save(sess, checkpoints_dir + "model_fcn32s_final.ckpt")
    print("Model saved in file: %s" % save_path)

summary_string_writer.close()

When I run the script I get the following error:

Traceback (most recent call last):
  File "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/fcn_32s_train.py", line 111, in <module>
    train_step ])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_UINT8, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

Caused by op u'shuffle_batch', defined at:
  File "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/fcn_32s_train.py", line 43, in <module>
    min_after_dequeue=1000)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 917, in shuffle_batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
	 [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_UINT8, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

bacteria.tfrecords is a file of 11127 image/annotation pairs (copies of the same image), created using

from tf_image_segmentation.utils.tf_records import write_image_annotation_pairs_to_tfrecord

Do you have any idea of what might be wrong?

@FredHaa
Author

FredHaa commented Mar 9, 2017

This happens at the first training step, so it appears that the queue is never filled.

@FredHaa
Author

FredHaa commented Mar 12, 2017

I think I have narrowed the problem down to how the tfrecord is made. I create the file using the following script:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
import skimage.io as io
import os, sys
from os import walk

root_dir = "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/"

# Add a path to a custom fork of TF-Slim
# Get it from here:
# https://github.com/warmspringwinds/models/tree/fully_conv_vgg
sys.path.append(root_dir + "models/slim/")

# Add path to the cloned library
sys.path.append(root_dir + "tf-image-segmentation/")



from tf_image_segmentation.utils.tf_records import write_image_annotation_pairs_to_tfrecord, read_image_annotation_pairs_from_tfrecord

img_path = []
annotation_path = []
for (dirpath, dirnames, filenames) in walk(root_dir + "annotated/"):
    for image in filenames:
        if image[-3:] == "jpg":
            img_path.append(dirpath + image)
        elif image[-3:] == "png":
            annotation_path.append(dirpath + image)
    break

file_pairs = []
if len(img_path) == len(annotation_path):
    for i in range(0, len(img_path)):
        file_pairs.append((img_path[i], annotation_path[i]))


write_image_annotation_pairs_to_tfrecord(file_pairs, "bacteria.tfrecords")

pairs = read_image_annotation_pairs_from_tfrecord("bacteria.tfrecords")

But there seem to be some inconsistencies:

read_image_annotation_pairs_from_tfrecord expects the annotation image to only have 1 channel
annotation = annotation_1d.reshape((height, width)) in tf_records.py

However, the FCN_32s model requires the annotations to have the same shape as the logits, which have 3 channels.

read_image_annotation_pairs_from_tfrecord can be fixed by changing the line to
annotation = annotation_1d.reshape((height, width, 3))
assuming that 3-channel annotations are the correct behavior.
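
A quick way to see what a record actually holds is to compare the length of the decoded buffer with `height * width`. The following is only a sketch, assuming the annotation is stored as a flat buffer, as the `reshape` call above suggests; `reshape_annotation` is a hypothetical helper and not a function of tf_records.py.

```python
import numpy as np


def reshape_annotation(annotation_1d, height, width):
    """Reshape a flat annotation buffer, inferring the channel count.

    Hypothetical helper: returns (height, width) for a single-channel
    buffer and (height, width, channels) otherwise.
    """
    pixels = height * width
    if pixels == 0 or annotation_1d.size % pixels != 0:
        raise ValueError(
            "Buffer of %d elements does not fit a %dx%d annotation"
            % (annotation_1d.size, height, width))
    channels = annotation_1d.size // pixels
    if channels == 1:
        return annotation_1d.reshape((height, width))
    return annotation_1d.reshape((height, width, channels))
```

If this returns a 3-D array, the annotation file was stored with several channels rather than as a single-channel label map.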

Regarding the original issue, I assume that I am using write_image_annotation_pairs_to_tfrecord correctly?
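
One thing the script above does not guard against: `os.walk` returns filenames in arbitrary order, so pairing `img_path[i]` with `annotation_path[i]` only works if both lists happen to line up. In addition, if the two lists differ in length, `file_pairs` stays empty and an empty file is written without any warning. A minimal sketch of pairing by base name instead, assuming each image `name.jpg` has its annotation stored as `name.png` in the same folder (`pair_by_stem` is a hypothetical helper, not part of tf-image-segmentation):

```python
import os


def pair_by_stem(dirpath, filenames):
    """Pair every .jpg with the .png that shares its base name.

    Hypothetical helper; assumes image "x.jpg" is annotated by "x.png".
    """
    images = {}
    annotations = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".jpg":
            images[stem] = os.path.join(dirpath, name)
        elif ext.lower() == ".png":
            annotations[stem] = os.path.join(dirpath, name)

    # Any stem present in only one of the two dicts has no partner.
    missing = sorted(set(images) ^ set(annotations))
    if missing:
        raise ValueError("Files without a partner: %s" % ", ".join(missing))

    # Sorting makes the record order reproducible between runs.
    return [(images[stem], annotations[stem]) for stem in sorted(images)]
```

The returned list can be passed to `write_image_annotation_pairs_to_tfrecord` in place of `file_pairs`.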

@vj-1988

vj-1988 commented Mar 29, 2017

I am also facing the same issue while training the VOC dataset. Any updates on this error?

@jhjang

jhjang commented Apr 18, 2017

I got this error, too. Did you find a solution?

@vaklyuenkov

So, I have the same error on FCN_8s. Any ideas?

@ahundt

ahundt commented May 15, 2017

@FrederikHaa Can you create a pull request with these fixes?

@ghost

ghost commented Jun 14, 2017

This training code assumes that the number of training samples is 11127. If your dataset has a different number of samples, you need to change this value accordingly. I faced the same issue because my custom dataset contains fewer training samples. After this fix the code works fine.

@jhjang

jhjang commented Jul 13, 2017

@nirmaljith Can you explain how to change the number of training samples? I can't find the variable to change the numbers.

@ghost

ghost commented Jul 21, 2017

@jhjang It's not assigned to any variable in the code, so you may find it difficult to figure out. In tf-image-segmentation/tf_image_segmentation/recipes/pascal_voc/FCNs/fcn_32s_train.ipynb you have to change the value in xrange. The original code assumes 11127 training samples:

for i in xrange(11127 * 10):
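
The relationship can be made explicit so that the loop bound is derived rather than hard-coded. This is a minimal sketch, assuming every training step dequeues exactly `batch_size` examples and the input producer yields each record `num_epochs` times; `max_training_steps` is a hypothetical helper.

```python
def max_training_steps(num_samples, num_epochs, batch_size=1):
    """Number of full batches the input queue can supply before it closes."""
    if num_samples <= 0 or num_epochs <= 0 or batch_size <= 0:
        raise ValueError("All arguments must be positive")
    return (num_samples * num_epochs) // batch_size


# Values from the original recipe: 11127 samples, 10 epochs, batch size 1.
steps = max_training_steps(11127, 10)
```

The loop then becomes `for i in xrange(steps):`, and the checkpoint interval `i % 11127` needs the same sample count. Requesting more steps than this makes the dequeue fail with OutOfRangeError once the queue is drained.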

@jhjang

jhjang commented Jul 24, 2017

@nirmaljith I got it. Thanks for your help :)

@vinayakarannil

I am still facing this issue. Any solutions? I am running the training script on my own dataset. The script runs successfully on Pascal VOC with the same number of training samples, but not on my dataset, so the number of training samples is not the issue in my case.

@vinayakarannil

I followed the comment by @ahundt in "tfrecords should also include depth and format #13" and now my problem is solved.

@deepk91

deepk91 commented Dec 21, 2017

@vinayakkailas I am also running the same script, but on my own dataset, which has very few images (around 250). How did you solve this error? What values of depth and format did you use? It would be great if you could share this.

@kheffah

kheffah commented Jan 10, 2018

I'm facing the same error, and there doesn't seem to be an available answer. Can anyone help? Thanks!

@kheffah

kheffah commented Jan 11, 2018

Got it! The dataset was corrupted as numpy decided to expand the dimensions of the image (M,N) -> (M,N,1) when I passed a slice of the image to another method rather than defining a separate np array. Hope this helps others facing the same issue.

@bohelion

@kheffah Can you explain in more detail how to do this? I am a beginner, and I would be grateful if you could share it.

@kheffah

kheffah commented Jan 25, 2018

@bohelion Sure. Actually the error was not from numpy, but from scipy.misc. In my case, I was reading the label mask with scipy.misc, but forgot to specify the mode='I' parameter, which resulted in my label having a 3rd dim (height, width, extra). So, when it was saved to the .tfrecords file, it had the wrong dimensions and did not fit the pre-specified dimensions of my TF graph. Hope this helps.
import scipy.misc

# Read the image as 3-channel RGB
im = scipy.misc.imread(impath, mode='RGB')
# Read the label as a single-channel (height, width) array
lbl = scipy.misc.imread(lblpath, mode='L')
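
The same shape check can also be done programmatically before the pairs are written to the .tfrecords file. This is a sketch, assuming labels are meant to be 2-D `(height, width)` arrays as described above; `as_label_map` is a hypothetical helper.

```python
import numpy as np


def as_label_map(label):
    """Return a 2-D (height, width) label array or raise if it has channels."""
    label = np.asarray(label)
    if label.ndim == 3 and label.shape[2] == 1:
        # A trailing singleton axis carries no information; drop it.
        label = label[:, :, 0]
    if label.ndim != 2:
        raise ValueError(
            "Expected a (height, width) label, got shape %s" % (label.shape,))
    return label
```

Calling it on every label before writing turns a silent shape mismatch into an immediate, readable error.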

@DiyuanLu

In your code, changing num_epochs to a larger number should solve the problem. I had the same problem and this worked fine for me.
filename_queue = tf.train.string_input_producer(
    [tfrecord_filename], num_epochs=10)

@dhKwang

dhKwang commented Nov 28, 2018

Maybe you should check your file name; try changing it to an absolute path.
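
A sketch of that check, to be run before the filename is handed to `tf.train.string_input_producer` (the helper name `resolve_tfrecord_path` is hypothetical). A relative path is resolved against the current working directory, which may differ from the folder the script lives in.

```python
import os


def resolve_tfrecord_path(path):
    """Return the absolute path of a .tfrecords file, failing early if absent."""
    absolute = os.path.abspath(path)
    if not os.path.isfile(absolute):
        raise IOError("TFRecord file not found: %s" % absolute)
    if os.path.getsize(absolute) == 0:
        raise IOError("TFRecord file is empty: %s" % absolute)
    return absolute
```

For example, `tfrecord_filename = resolve_tfrecord_path('bacteria.tfrecords')` fails immediately with a clear message if the file is missing or empty.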

@cdcky

cdcky commented Apr 16, 2019

@kheffah Thank you!!!!!! Respect from China, you saved my life. Thanks~~
