RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #5

Open
peonycabbage opened this issue Nov 20, 2023 · 3 comments


@peonycabbage

Thank you for sharing the code with us.

I would like to report an error.

\signjoey\batch.py", line 148, in sort_by_sgn_lengths
self.gls = self.gls[perm_index]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Any advice/suggestion is appreciated. Thank you very much~

@peonycabbage
Author

I was able to resolve it now~
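
The thread does not record how the error was resolved. This message usually means the index tensor lives on a different device (for example CUDA) than the tensor being indexed (here, CPU). A minimal sketch of one common workaround, assuming that is the cause: move the index to the indexed tensor's device before indexing. The names `gls` and `perm_index` come from the traceback; `lengths` and the helper `index_on_tensor_device` are hypothetical stand-ins, not code from the repository.

```python
import torch


def index_on_tensor_device(tensor, index):
    """Index `tensor` with `index` after moving the index to the tensor's device."""
    return tensor[index.to(tensor.device)]


# Stand-in for self.gls: a tensor that stays on the CPU.
gls = torch.tensor([[1, 2], [3, 4], [5, 6]])

# Hypothetical stand-in for the sequence lengths used for sorting.
lengths = torch.tensor([2, 5, 3])
if torch.cuda.is_available():
    # Reproduces the mismatch: the permutation index ends up on the GPU.
    lengths = lengths.cuda()

# Sort by length, longest first; perm_index is on the same device as `lengths`.
_, perm_index = lengths.sort(0, descending=True)

# Safe regardless of where perm_index lives.
sorted_gls = index_on_tensor_device(gls, perm_index)
```

Applied to the line in the traceback, this would amount to something like `self.gls = self.gls[perm_index.to(self.gls.device)]`, though the surrounding code in `batch.py` may call for a different fix.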

@FrorsttzNguyen

Hi peonycabbage,

I'm also researching this model to retrain it on a new dataset and noticed that you seem to have successfully run the code. I’m encountering issues with preprocessing the BPE and SIM folders, so I was wondering if you could provide some guidance on the preprocessing steps. It seems that you're also interested in topics related to Sign Language, so if possible, would you be open to sharing your contact information for further discussion? Alternatively, if you’re able to share your BPE and SIM folders, that would also be very helpful.

Thank you!

@peonycabbage
Author

Hi @FrorsttzNguyen,

I used the following code to generate cos_sim.pkl:

from sentence_transformers import SentenceTransformer
import pandas as pd
import pickle

# Read the video map, a '|'-delimited text file with the translation for each video
df = pd.read_csv('path to your video map that contains the translation for each video', delimiter='|')

# Extract the column to embed as a plain list of strings
sentences = df['word'].astype(str).tolist()
print(sentences)

# Initialize the sentence transformer model
model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v1')

# Generate one embedding per sentence
embeddings = model.encode(sentences)

# Define the path to save the embeddings
file_path = 'data/pht/bpe/cos_sim.pkl'
with open(file_path, 'wb') as file:
    # Serialize and write the embeddings to the file
    pickle.dump(embeddings, file)
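
As a sanity check on the saved file, the embeddings can be loaded back and compared pairwise. The sketch below is an assumption-laden illustration: the thread does not say whether the training code expects raw embeddings or a similarity matrix in cos_sim.pkl, and the helpers `load_embeddings` and `cosine_similarity_matrix` are hypothetical names, not part of the repository.

```python
import pickle

import numpy as np


def load_embeddings(path):
    """Load the object that was pickled to `path` (here, the embedding array)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine_similarity_matrix(embeddings):
    """Return the (n, n) matrix of pairwise cosine similarities between rows."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    # Clip the norms to avoid dividing by zero for all-zero rows.
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

For example, `cosine_similarity_matrix(load_embeddings('data/pht/bpe/cos_sim.pkl'))` should give a square matrix with ones on the diagonal, one row per sentence.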

For name_to_video.json, I am sorry, I no longer have the relevant code. I suggest following https://github.com/neccam/slt/tree/master: download their feature files and see how they generated name_to_video_id.

For BPE, the authors provide instructions in their readme.md.

Hope this helps!

@peonycabbage peonycabbage reopened this Nov 20, 2024