yorktown-class/AST-clone-detection

AST-clone-detection

Code clone detection based on abstract syntax trees (ASTs) and an attention mechanism.

Requirements

tree_sitter
sentence_transformers

Dataset

The main dataset used is OJClone.

Download and preprocess the dataset

  1. Download the dataset from Google Drive
cd dataset/OJClone
pip install gdown
gdown "https://drive.google.com/uc?id=0B2i-vWnOu7MxVlJwQXN6eVNONUU"
tar -xvf programs.tar.gz
  2. Preprocess the data
python preprocess.py
cd ../..

This produces three files: dataset/OJClone/train.jsonl, dataset/OJClone/test.jsonl, and dataset/OJClone/valid.jsonl.
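To check that preprocessing worked, the generated splits can be inspected with a few lines of Python. The sketch below is not part of this repository: `read_jsonl` and `summarize_splits` are hypothetical helper names, and it assumes only that each line of a `.jsonl` file holds one JSON object. The actual field names written by `preprocess.py` are not documented here, so the code reports whatever keys it finds instead of assuming any.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize_splits(dataset_dir, splits=("train", "valid", "test")):
    """Map each split that exists to (record count, keys of the first record)."""
    summary = {}
    for split in splits:
        path = Path(dataset_dir) / f"{split}.jsonl"
        if not path.exists():
            continue  # skip splits that were not generated
        records = list(read_jsonl(path))
        first = records[0] if records else {}
        # Assumption: records are JSON objects; other values report no keys.
        keys = sorted(first.keys()) if isinstance(first, dict) else []
        summary[split] = (len(records), keys)
    return summary


if __name__ == "__main__":
    for name, (count, keys) in summarize_splits("dataset/OJClone").items():
        print(f"{name}: {count} records, fields: {keys}")
```

Run from the repository root, this prints one line per split that exists, which is a quick way to see the record layout before writing a data loader.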

Build the language parser

mkdir build
cd build
git clone https://github.com/tree-sitter/tree-sitter-c
cd ..
python build_tree_sitter.py
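The contents of `build_tree_sitter.py` are not shown here, so the following is only a sketch of what such a build step might look like, not the repository's actual script. It assumes a `tree_sitter` release earlier than 0.22, where `Language.build_library` is still available (later releases removed it in favour of prebuilt grammar packages). The helper names `find_grammar_dirs` and `build_parser_library` and the output file name `my-languages.so` are hypothetical.

```python
from pathlib import Path


def find_grammar_dirs(build_dir="build"):
    """Return sorted paths of cloned grammar repos named tree-sitter-*."""
    root = Path(build_dir)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.glob("tree-sitter-*") if p.is_dir())


def build_parser_library(build_dir="build", output_name="my-languages.so"):
    """Compile every grammar found under build_dir into one shared library."""
    # Imported lazily so find_grammar_dirs works without tree_sitter installed.
    from tree_sitter import Language

    grammar_dirs = find_grammar_dirs(build_dir)
    if not grammar_dirs:
        raise FileNotFoundError(
            f"no tree-sitter-* grammar found under {build_dir!r}"
        )
    output_path = str(Path(build_dir) / output_name)
    # Assumption: tree_sitter < 0.22, where Language.build_library exists.
    Language.build_library(output_path, grammar_dirs)
    return output_path


if __name__ == "__main__":
    print("built", build_parser_library())
```

Scanning `build/` for `tree-sitter-*` directories means additional grammars can be cloned next to `tree-sitter-c` without changing the script.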
