Generating embeddings of source code #16

Open
Avv22 opened this issue Oct 13, 2021 · 2 comments
Comments

Avv22 commented Oct 13, 2021

Hello,

Can you please explain how to use your model to generate embeddings for Python and for Java separately?

Thanks.

Owner

basedrhys commented Oct 26, 2021

Hi @Avra2,

You'll want to follow the usage instructions for the dataset pipeline.

This will only generate embeddings for Java files. To embed Python files, you'll need a Python extractor. The code2vec authors have referenced a Python extractor made by JetBrains, which might be of use: Link.

Let me know if you get stuck on generating embeddings for Java. Unfortunately, Python isn't currently supported, so you'll have to do some hacking to get that working (e.g., by using the Python extractor linked above and updating the path here).

Thanks

Author

Avv22 commented Nov 24, 2021

@basedrhys,

Thank you. It has been a while, but I tried code2vec and code2seq. code2vec did not work because the astminer tool does not produce all the files code2vec needs to run: the dict file is missing, and I would have to construct it myself. For Java embeddings, I have a dataset of 20k files. If I ran code2vec, I would get a file name prediction for each file; is that correct? If so, I am looking for a context vector representing the whole file, not just a single method name. I hope my question is clear, and thanks in advance.
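
For reference, a minimal sketch of one possible way to get a single vector per file: code2vec produces one code vector per method, so a whole-file vector could be approximated by averaging the vectors of the methods in that file. Mean pooling is an assumption here, not something this repository or code2vec provides, and the function name `file_level_vectors` and the `(file_path, vector)` input format are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def file_level_vectors(
    method_vectors: Iterable[Tuple[str, List[float]]],
) -> Dict[str, List[float]]:
    """Average per-method vectors into one vector per source file.

    `method_vectors` yields (file_path, vector) pairs, one per method.
    This input format is hypothetical; adapt it to however the
    per-method vectors were exported.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for path, vec in method_vectors:
        if path not in sums:
            # First method seen for this file: start the running sum.
            sums[path] = list(vec)
            counts[path] = 1
            continue
        if len(vec) != len(sums[path]):
            raise ValueError(f"dimension mismatch for {path}")
        # Element-wise add this method's vector to the running sum.
        sums[path] = [a + b for a, b in zip(sums[path], vec)]
        counts[path] += 1
    # Divide each running sum by the number of methods in that file.
    return {
        path: [x / counts[path] for x in total]
        for path, total in sums.items()
    }


if __name__ == "__main__":
    example = [
        ("A.java", [1.0, 3.0]),
        ("A.java", [3.0, 5.0]),
        ("B.java", [0.5, 0.5]),
    ]
    print(file_level_vectors(example))
```

A plain average treats every method equally; weighting by method length or using a learned aggregation are alternatives, and whether any of them suits a given downstream task would need to be checked empirically.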
