How to read the tfrecord? #4

DidiD1 · 2024-06-21T06:37:04Z

Great work! When I try to read the TFRecord data, I get errors, and the file appears to be corrupted. When I check the number of elements with `num_elements = tf.data.experimental.cardinality(record_iter).numpy()`, the terminal shows 'Number of elements in dataset: -2'.
Could you release a script that shows how to read or update the TFRecord files?
Thanks for your answer!

leebird · 2024-06-22T05:08:28Z

Hello, do the file sizes look correct (e.g., the training set should be ~144M)? If not, you might need to install Git Large File Storage (Git LFS) first and then run git clone again: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

Updated README.
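
One quick way to tell whether a clone fetched the real data or only Git LFS pointer files: a pointer is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below checks for that prefix. The function name `is_git_lfs_pointer` and the command-line usage are illustrative assumptions, not part of the repository.

```python
import os
import sys

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Hypothetical usage: python check_lfs.py path/to/file.tfrecord ...
    for path in sys.argv[1:]:
        kind = "Git LFS pointer (data not downloaded)" if is_git_lfs_pointer(path) else "regular file"
        print(f"{path}: {os.path.getsize(path)} bytes, {kind}")
```

If a file is reported as a pointer, installing Git LFS and cloning again, as suggested above, should replace it with the actual data.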

leebird · 2024-06-24T23:53:08Z

We have also added a simple script to show how to retrieve the labels from the dataset at https://github.com/google-research/google-research/blob/master/richhf_18k/parse_tfrecord_file.py.
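
Regarding the `-2` from the original question: this is the value of `tf.data.experimental.UNKNOWN_CARDINALITY`, which `tf.data` reports when a dataset's length cannot be determined without reading the data. Assuming `record_iter` is a `tf.data.TFRecordDataset`, that value alone does not indicate corruption. As a rough, TensorFlow-free check, the sketch below counts records by walking the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload). It does not verify the CRCs, and `count_tfrecord_records` is a hypothetical helper, not part of the released script.

```python
import os
import struct


def count_tfrecord_records(path):
    """Count records in an uncompressed TFRecord file by walking its framing.

    CRC fields are skipped, not verified. Raises ValueError if the file ends
    in the middle of a record, which suggests a truncated or non-TFRecord file.
    """
    size = os.path.getsize(path)
    count = 0
    pos = 0
    with open(path, "rb") as f:
        while pos < size:
            f.seek(pos)
            header = f.read(12)  # uint64 payload length + uint32 CRC of the length
            if len(header) < 12:
                raise ValueError(f"truncated record header after {count} records")
            (length,) = struct.unpack("<Q", header[:8])
            pos += 12 + length + 4  # header + payload + uint32 CRC of the payload
            if pos > size:
                raise ValueError(f"record {count} extends past the end of the file")
            count += 1
    return count
```

A Git LFS pointer file or a truncated download should make this raise `ValueError` instead of returning a count, because the bytes at the start of the file would not decode to a plausible record length.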
