Reproducing results of ICSE 16 paper #31

kanghj · 2019-02-03T11:29:39Z

Hi, I am trying to reproduce the results of https://arxiv.org/pdf/1512.06448.pdf.

In my setup, I am using the file-level tokenizer, and I've changed MIN_TOKENS in sourcerer-cc.properties to 1:

```properties
# Ignore all files outside these bounds
MIN_TOKENS=1
MAX_TOKENS=500000
```

I also changed the threshold in runnodes.sh to threshold="${3:-7}" (i.e., the script's third argument, defaulting to 7).
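For context on what this threshold controls: the ICSE'16 paper treats two code blocks A and B as clones when the overlap of their token bags is at least ⌈θ · max(|A|, |B|)⌉. The sketch below illustrates that test under two loudly stated assumptions: the integer 7 passed via runnodes.sh corresponds to θ = 0.7, and tokens are split naively on whitespace (SourcererCC's real tokenizer differs). `overlap` and `is_clone` are hypothetical names for illustration, not SourcererCC's API.

```python
from collections import Counter


def overlap(bag_a: Counter, bag_b: Counter) -> int:
    """Number of tokens shared by two token bags (multiset intersection)."""
    return sum((bag_a & bag_b).values())


def is_clone(bag_a: Counter, bag_b: Counter, threshold: int = 7) -> bool:
    """Overlap-similarity clone test: O(A, B) >= ceil(theta * max(|A|, |B|)).

    ASSUMPTION: the integer threshold (7 in runnodes.sh) is read as tenths,
    i.e. theta = threshold / 10. Integer arithmetic avoids float rounding.
    """
    size = max(sum(bag_a.values()), sum(bag_b.values()))
    required = (threshold * size + 9) // 10  # ceil(threshold * size / 10)
    return overlap(bag_a, bag_b) >= required


if __name__ == "__main__":
    # Hypothetical fragments, tokenized by whitespace for illustration only.
    a = Counter("int x = a + b ;".split())
    b = Counter("int y = a + b ;".split())
    print(overlap(a, b), is_clone(a, b, 7), is_clone(a, b, 9))  # 6 True False
```

With 7 tokens per fragment and 6 shared, θ = 0.7 requires 5 shared tokens (a match), while θ = 0.9 requires 7 (no match), so a lower threshold admits more Type-2 candidates.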

In BigCloneEval, I'm using the flags "-st both -mit 50 -mil 6" with the default clone matcher. These are the results I get for Type-1 and Type-2 clones:

Type-1: 34301 / 35787 = 0.9584765417609746
Type-2: 3334 / 4573 = 0.7290618849770392

According to the ICSE paper, SourcererCC achieves a recall of 1.0 on Type-1 and 0.98 on Type-2 clones.
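A quick sanity check of the gap: BigCloneEval's figures above are detected reference clones divided by all reference clones of that type. The sketch below uses only the counts and paper values quoted in this thread to estimate how many additional clones would have to be detected to match the paper. It assumes the reference set has the same size as in the paper's evaluation, which may not hold if the BigCloneBench version differs. `recall` and `shortfall` are hypothetical helper names.

```python
import math

# Detected / total reference clones reported by BigCloneEval in this setup.
OBSERVED = {"Type-1": (34301, 35787), "Type-2": (3334, 4573)}
# Recall values quoted from the ICSE'16 paper.
PAPER = {"Type-1": 1.00, "Type-2": 0.98}


def recall(detected: int, total: int) -> float:
    """Fraction of reference clones that were detected."""
    return detected / total


def shortfall(detected: int, total: int, target: float) -> int:
    """Extra clones that must be detected to reach the target recall,
    assuming the same reference set size (total) as the paper."""
    return max(0, math.ceil(target * total) - detected)


for clone_type, (detected, total) in OBSERVED.items():
    target = PAPER[clone_type]
    print(f"{clone_type}: recall={recall(detected, total):.4f}, "
          f"paper={target:.2f}, "
          f"short by {shortfall(detected, total, target)} clones")
# Type-1: recall=0.9585, paper=1.00, short by 1486 clones
# Type-2: recall=0.7291, paper=0.98, short by 1148 clones
```

The Type-2 gap is proportionally much larger, which suggests the difference lies in how near-identical fragments are matched rather than in size filtering alone.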

Is there a step I missed, or another configuration setting I need to change, to reproduce the ICSE paper's results?


kanghj · 2019-02-03T13:00:46Z

Ah, perhaps I should be using the block-level tokenizer?

Edit: using the block-level tokenizer seems to give worse results.

qw3ry · 2020-01-17T10:55:56Z

If I understand the paper correctly (Table 1), they used a method-level tokenizer. Sadly, it doesn't seem to be available :(
