This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License.
- This is the official implementation of the NeurIPS 2024 paper "BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens".
- [video] | [slides] | [poster] | [paper]
- We extend existing datasets by crafting more AI-generated data with five recent commercial LLMs: GPT-3.5-Turbo, GPT-4-Turbo, Claude-3-Sonnet, Claude-3-Opus, and Gemini-1.0-Pro.
- Our datasets consist of 2 short natural language datasets (Arxiv, Yelp), 2 long natural language datasets (Creative, Essay), and 1 code dataset (Code).
- We craft both a non-paraphrased version (`./Dataset`) and a paraphrased version (`./Paraphrased_Dataset`) of the AI-generated data.
- Detailed dataset statistics:
Coming soon...
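
As a rough illustration of the dataset grid described above, the following Python sketch enumerates every (version, dataset, generator) combination and checks whether the two root directories are present. The dataset names, LLM names, and root paths come from the list above; the helper names `list_configurations` and `existing_roots` are hypothetical, and the sketch deliberately assumes nothing about the file names or layout inside `./Dataset` and `./Paraphrased_Dataset`, which are not documented here.

```python
from itertools import product
from pathlib import Path

# Names taken from the dataset description in this README.
DATASETS = ["Arxiv", "Yelp", "Creative", "Essay", "Code"]
GENERATORS = [
    "GPT-3.5-Turbo",
    "GPT-4-Turbo",
    "Claude-3-Sonnet",
    "Claude-3-Opus",
    "Gemini-1.0-Pro",
]
# Root directories named in this README; their internal layout is NOT assumed.
ROOTS = {
    "non-paraphrased": Path("./Dataset"),
    "paraphrased": Path("./Paraphrased_Dataset"),
}


def list_configurations():
    """Hypothetical helper: every (version, dataset, generator) combination."""
    return list(product(ROOTS, DATASETS, GENERATORS))


def existing_roots():
    """Hypothetical helper: report which root directories exist locally."""
    return {name: path.is_dir() for name, path in ROOTS.items()}


if __name__ == "__main__":
    configs = list_configurations()
    print(f"{len(configs)} configurations")
    for name, present in existing_roots().items():
        status = "found" if present else "missing"
        print(f"{name}: {ROOTS[name]} ({status})")
```

Running the script from the repository root prints the number of configurations and whether each root directory was found; mapping a configuration to actual files would require checking the repository contents.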