This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.
- This is the official implementation of the NeurIPS 2024 paper "BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens".
- [video] | [slides] | [poster] | [paper]
- We extend existing datasets by crafting more AI-generated data using five recent commercial LLMs: GPT-3.5-Turbo, GPT-4-Turbo, Claude-3-Sonnet, Claude-3-Opus, and Gemini-1.0-Pro.
- Our datasets consist of two short natural language datasets (Arxiv, Yelp), two long natural language datasets (Creative, Essay), and one code dataset (Code).
- We craft both a non-paraphrased version (`./Dataset`) and a paraphrased version (`./Paraphrased_Dataset`) of the AI-generated data.
- Detailed dataset statistics:
Due to delays in the university's internal processing, we need to postpone the release of our code to ensure compliance with their policies. We will update and publish the code as soon as the internal procedures are completed.