preprocessing of texts and source code #24

liuhuigmail · 2020-03-05T06:45:23Z

Great work on source code generation.
The details of the preprocessing of texts (natural language) and source code are missing from the paper. Would you kindly let me know what kind of preprocessing has been conducted, e.g., unifying identifiers?

Thanks.

Hui
[email protected]

jason-hanling · 2020-12-08T14:38:44Z

Have you solved this question?
I am also curious about it.

liuhuigmail · 2020-12-09T00:18:32Z

No. But I built a new dataset (https://github.com/ds4an/CoDas4CG) and applied a sequence of preprocessing steps of my own choosing :)

jason-hanling · 2020-12-10T03:05:47Z

I think datasets/conala/dataset.py may do the preprocessing.
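
To make "unifying identifiers" concrete, here is a minimal sketch of one common approach: replacing quoted literals in the natural-language text with numbered placeholders. This is only an illustration. I have not verified that dataset.py works this way, and the names `replace_quoted_spans`, `QUOTED`, and the `slot_` prefix are hypothetical.

```python
import re

# Matches a span wrapped in matching backticks, single quotes, or double quotes.
QUOTED = re.compile(r"(?P<quote>[`'\"])(?P<body>.+?)(?P=quote)")


def replace_quoted_spans(text, prefix="slot_"):
    """Replace each distinct quoted span with a numbered placeholder.

    Returns the rewritten text and a mapping from placeholder to original span.
    (Hypothetical helper for illustration, not taken from the repository.)
    """
    slots = {}  # placeholder -> original span
    seen = {}   # original span -> placeholder

    def substitute(match):
        body = match.group("body")
        if body not in seen:
            name = f"{prefix}{len(seen)}"
            seen[body] = name
            slots[name] = body
        return seen[body]

    return QUOTED.sub(substitute, text), slots
```

The same span always maps to the same placeholder, so the mapping can be used to restore the original literals later. Note that a plain regex like this will misfire on apostrophes inside words (e.g. "don't"), so real data would need extra care.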

ShangwenWang · 2022-05-20T07:32:19Z

@jason-hanling @liuhuigmail

Hi, Professor Liu,

I am also interested in how the data is pre-processed.
I see that the pre-processing is done by the dataset.py script (you are right). I'd like to know what the files look like before pre-processing (conala-train.json). However, I found that the official CoNaLa webpage (https://conala-corpus.github.io/) no longer supports downloading.
I wonder whether you have a copy you could share. Thanks!

neubig · 2022-05-20T10:16:02Z

I'm not sure why, but some people have been having trouble downloading the dataset with the Chrome browser. The dataset is still available; here's a direct link that should work:
http://www.phontron.com/download/conala-corpus-v1.1.zip
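
For anyone who wants to see what the raw records look like before any preprocessing, a minimal sketch along these lines should work once the archive is unzipped. It assumes the file is a JSON list of objects, and the path `conala-corpus/conala-train.json` is an assumption about the archive layout, so adjust it as needed. `summarize_records` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_records(path):
    """Load a JSON file holding a list of objects and count how often each key appears."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    key_counts = Counter(key for record in records for key in record)
    return len(records), dict(key_counts)


# Assumed location after unzipping; nothing happens if the file is not there.
raw_path = Path("conala-corpus/conala-train.json")
if raw_path.exists():
    print(summarize_records(raw_path))
```

It only reports the number of records and which keys they carry, which is enough to compare against what dataset.py reads.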

ShangwenWang · 2022-05-20T10:24:06Z

Oh great @neubig
Thanks a lot.

pcyin self-assigned this Mar 13, 2020
