How to read the tfrecord? #4

Open
DidiD1 opened this issue Jun 21, 2024 · 2 comments

DidiD1 commented Jun 21, 2024

Great work! When I try to read the TFRecord data, I get errors, and the file seems to be corrupted. When I check the number of elements with `num_elements = tf.data.experimental.cardinality(record_iter).numpy()`, the terminal shows 'Number of elements in dataset: -2'.
Could you release a script that shows how to read or update the TFRecord?
Thanks for your answer!

leebird (Collaborator) commented Jun 22, 2024

Hello, do the file sizes look correct (e.g., the training set should be ~144M)? If not, you may need to install Git Large File Storage (LFS) first and then run git clone again: https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

We have updated the README accordingly.
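
One quick way to tell whether a clone fetched the real data or only Git LFS placeholders is to look at the first bytes of the file: a Git LFS pointer is a tiny text file whose first line is `version https://git-lfs.github.com/spec/v1`. The snippet below is a minimal sketch of that check using only the Python standard library. The helper names `is_lfs_pointer` and `describe` are made up for illustration and are not part of this repository.

```python
from pathlib import Path

# Git LFS pointer files begin with this line (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def describe(path):
    """Summarize a file's size and whether it is still an LFS pointer."""
    size = Path(path).stat().st_size
    if is_lfs_pointer(path):
        kind = "LFS pointer (data not downloaded)"
    else:
        kind = "regular file"
    return f"{path}: {size} bytes, {kind}"
```

Calling `print(describe("train.tfrecord"))` (the file name here is a placeholder) should report a size of roughly a hundred bytes and "LFS pointer" if Git LFS was not installed at clone time.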

leebird (Collaborator) commented Jun 24, 2024

We have also added a simple script that shows how to retrieve the labels from the dataset: https://github.com/google-research/google-research/blob/master/richhf_18k/parse_tfrecord_file.py
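
A note on the `-2` reported at the top of the thread: that value is `tf.data.experimental.UNKNOWN_CARDINALITY`, which TensorFlow returns for TFRecord datasets because the record count cannot be known without reading the file. On its own it does not indicate a corrupted file. To actually count records and detect truncation, one option is to walk the TFRecord framing directly. The following is a minimal sketch, assuming an uncompressed file in the standard TFRecord layout; the function name `count_tfrecords` is hypothetical, and the CRC fields are skipped rather than verified.

```python
import os
import struct


def count_tfrecords(path):
    """Count records by walking the TFRecord framing; raise if truncated.

    Each record is stored little-endian as:
      uint64 length | uint32 CRC of length | payload | uint32 CRC of payload
    The CRCs are skipped here, not verified.
    """
    size = os.path.getsize(path)
    pos = 0
    count = 0
    with open(path, "rb") as f:
        while pos < size:
            if size - pos < 12:  # 8-byte length + 4-byte length CRC
                raise ValueError(f"truncated header after {count} records")
            f.seek(pos)
            (length,) = struct.unpack("<Q", f.read(8))
            end = pos + 12 + length + 4  # header + payload + payload CRC
            if end > size:
                raise ValueError(f"truncated record after {count} records")
            pos = end
            count += 1
    return count
```

A file that is really a Git LFS pointer, or one that was cut off during download, should raise `ValueError` here instead of returning a count. With TensorFlow installed, iterating over the dataset and counting the elements serves the same purpose.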
