preprocessing of texts and source code #24
Comments
Have you solved the question?
No, but I built a new dataset (https://github.com/ds4an/CoDas4CG) and applied a sequence of preprocessing steps as I saw fit :)
I think datasets/conala/dataset.py may do the preprocessing.
Hi, Professor Liu, I am also interested in how to pre-process the data.
I'm not sure why, but some people have been having trouble downloading the dataset in the Chrome browser. Here's a direct link that should work. The dataset is still available:
Oh great @neubig
Great work on source code generation.
The details of the preprocessing of texts (natural languages) and source code are missing from the paper. Would you kindly let me know what kind of preprocessing has been conducted, e.g., unifying identifiers?
Thanks.
Hui
[email protected]