Reproducing results of ICSE 16 paper #31

Open
kanghj opened this issue Feb 3, 2019 · 2 comments

kanghj commented Feb 3, 2019

Hi, I am trying to reproduce the results of https://arxiv.org/pdf/1512.06448.pdf.

In my setup, I am using the file-level tokenizer, and I've changed MIN_TOKENS in sourcerer-cc.properties to 1:

# Ignore all files outside these bounds
MIN_TOKENS=1
MAX_TOKENS=500000

I've also changed the threshold in runnodes.sh to threshold="${3:-7}".

With BigCloneEval, I'm using the flags "-st both -mit 50 -mil 6" and the default clone matcher. I'm getting the following results for Type-1 and Type-2 clones:

Type-1: 34301 / 35787 = 0.9584765417609746
Type-2: 3334 / 4573 = 0.7290618849770392

According to the ICSE paper, SourcererCC is able to get 1.0 on Type-1, and 0.98 on Type-2.
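
To put a number on the gap, here is a minimal Python sketch that recomputes the ratios from the counts above and compares them with the figures quoted from the paper. It is not part of SourcererCC or BigCloneEval; the helper names `recall` and `missed_clones` are made up for illustration, and the paper values are simply the ones cited in this issue.

```python
# Counts copied from the BigCloneEval output quoted above: (detected, total).
observed = {"Type-1": (34301, 35787), "Type-2": (3334, 4573)}

# Values this issue attributes to the ICSE paper.
paper = {"Type-1": 1.0, "Type-2": 0.98}


def recall(found, total):
    """Fraction of reference clone pairs that were detected."""
    return found / total


def missed_clones(found, total):
    """Number of reference clone pairs that were not detected."""
    return total - found


for kind, (found, total) in observed.items():
    r = recall(found, total)
    gap = paper[kind] - r
    print(f"{kind}: ratio={r:.4f}, missed={missed_clones(found, total)}, "
          f"gap to paper={gap:.4f}")
```

The Type-2 gap is much larger than the Type-1 gap, which is why I suspect a configuration difference rather than noise.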

Is there any step in particular that I missed, or is there another configuration to change, in order to reproduce the ICSE paper's results?


kanghj commented Feb 3, 2019

Ah, perhaps I should be using the block-level tokenizer?

Edit: using the block-level tokenizer seems to give worse results.


qw3ry commented Jan 17, 2020

If I understand the paper correctly (Table 1), they used a method-level tokenizer. Sadly, it doesn't seem to be available :(
