BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.



Table of Contents

Overview

Dataset

  • We extend existing datasets with additional AI-generated data produced by five of the latest commercial LLMs: GPT-3.5-Turbo, GPT-4-Turbo, Claude-3-Sonnet, Claude-3-Opus, and Gemini-1.0-Pro.
  • Our datasets consist of two short natural language datasets (Arxiv, Yelp), two long natural language datasets (Creative, Essay), and one code dataset (Code).
  • For all AI-generated data, we provide both a non-paraphrased version (./Dataset) and a paraphrased version (./Paraphrased_Dataset).
  • Detailed dataset statistics:

Code Implementation

Due to delays in the university's internal processing, we have to postpone the release of our code to ensure compliance with its policies. We will publish the code as soon as the internal procedures are completed.