This repository contains the datasets and evaluation framework for the paper "Bot or Human? Detecting ChatGPT Imposters with A Single Question". The paper proposes a new framework named FLAIR (Finding Large Language Model Authenticity via a Single Inquiry and Response) to detect conversational bots in an online manner. The approach aims to differentiate human users from bots using single-question scenarios.
The questions are divided into two categories:
- Questions that are easy for humans but difficult for bots (e.g., counting, substitution, positioning, noise filtering, and ASCII art)
- Questions that are easy for bots but difficult for humans (e.g., memorization and computation)
Below is a description of each FLAIR question type:
- Counting - Questions require counting the occurrences of a target character in a randomly generated string.
- Reverse - Questions require reversing the characters of a random word with consecutive double or triple letters.
- Substitution - Questions require deciphering a string where each character is substituted with another character based on a substitution table.
- Positioning - Questions require finding the k-th character after the j-th appearance of a character c in a randomly generated string.
- Random Editing - Questions require performing drop, insert, swap, and substitute operations on a random string and providing three different outputs.
- Noise Injection - Questions are common sense questions with noise injected by replacing the spaces between words with uppercase random words.
- ASCII Art - Questions present an ASCII art and require providing the corresponding label as the answer.
- Memorization - Questions require enumerating items within a category or answering domain-specific questions that are difficult for humans to recall.
- Computation - Questions require calculating the product of two randomly sampled four-digit numbers.
We chose ten users for our user study.
- To conduct this experiment, we will first generate a candidate character set by randomly sampling 3 to 5 letters from the entire alphabet.
- Using the generated character set, we will create a random string by sampling k times, where k is set to 30 for this experiment.
- Next, we will randomly select a character from the generated string and ask users to count the number of times it appears.
- Each participant is allocated 10 counting questions. Answers should match the results exactly (see the sketch below).
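A minimal sketch of how such a counting question can be generated, assuming a hypothetical helper `make_counting_question` (the name and defaults are ours, not from the paper):

```python
import random
import string

def make_counting_question(set_size_range=(3, 5), length=30, seed=None):
    """Sample a small candidate alphabet, build a random string over it,
    and pick a target character to count."""
    rng = random.Random(seed)
    alphabet = rng.sample(string.ascii_lowercase, rng.randint(*set_size_range))
    s = "".join(rng.choice(alphabet) for _ in range(length))
    target = rng.choice(s)  # guaranteed to appear at least once
    return s, target, s.count(target)

s, target, answer = make_counting_question(seed=0)
print(f'How many times does "{target}" appear in "{s}"?')  # `answer` is the exact count
```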
- We randomly choose 100 different English words with consecutive double or triple letters. The words are drawn from dictionaries and this document.
- The letters of the words are reversed to create the dataset.
- Each participant is allocated 10 reverse questions. Answers should match the results exactly (see the sketch below).
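A sketch of the two pieces this task needs: selecting words with consecutive repeated letters and reversing them (helper names are illustrative):

```python
import re

def has_consecutive_repeats(word):
    """True if the word contains consecutive double (or triple) letters."""
    return re.search(r"(.)\1", word) is not None

def reverse_answer(word):
    """The expected answer is the word with its characters reversed."""
    return word[::-1]

assert has_consecutive_repeats("committee")
assert reverse_answer("committee") == "eettimmoc"
```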
- We randomly choose 100 different English words as the original strings.
- Then, we designed a random substitution rule to substitute characters within the words.
- Given the words and different substitution rules, participants should perform substitution and output the correct results.
- To standardize the experiment, each user will be allocated 10 substitution questions. Answers should match the results exactly (see the sketch below).
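One way to realize the substitution step, assuming the rule is a random one-to-one mapping over the letters of the word (our reading of the setup; `make_substitution_question` is a hypothetical name):

```python
import random
import string

def make_substitution_question(word, seed=None):
    """Build a random one-to-one substitution table over the word's letters
    and apply it to produce the expected answer."""
    rng = random.Random(seed)
    letters = sorted(set(word))
    table = dict(zip(letters, rng.sample(string.ascii_lowercase, len(letters))))
    answer = "".join(table[c] for c in word)
    return table, answer

table, answer = make_substitution_question("apple", seed=1)
print(table, answer)  # participants apply `table` to "apple" by hand
```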
- For our experiment, we will start by generating a candidate character set by randomly sampling 6 to 10 letters from the entire alphabet.
- Using the generated character set, we will create a random string by sampling 30 times.
- Next, we will randomly select a character from the generated string. Users should find the k-th character after the j-th occurrence of the selected character.
- Each participant is allocated 10 positioning questions. Answers should match the results exactly (see the sketch below).
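The answer key for a positioning question can be computed as below (a sketch; the function name is ours):

```python
def positioning_answer(s, c, j, k):
    """Return the k-th character after the j-th occurrence of c in s,
    or None if s is too short for that position to exist."""
    seen = 0
    for i, ch in enumerate(s):
        if ch == c:
            seen += 1
            if seen == j:
                return s[i + k] if i + k < len(s) else None
    return None

# The 1st character after the 1st "a" in "banana" is "n":
assert positioning_answer("banana", "a", 1, 1) == "n"
```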
- For the first category of questions, we will randomly drop k zeros or ones from a sequence of 20 bits.
- For the second category of questions, we will randomly add k zeros or ones to a sequence of 20 bits.
- In the third category, we will randomly substitute k zeros with ones or k ones with zeros in a sequence of 20 bits.
- The fourth category of questions will involve randomly swapping zeros and ones k times in a sequence of 20 bits.
- Each participant is allocated 10 random editing questions from 2 categories. Answers should pass our answer checker (see the sketch below).
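A sketch of the four editing operations on a 20-bit string; we read "swap" as exchanging the characters at two random positions, which is one plausible interpretation (the function name and structure are ours):

```python
import random

def random_edit(bits, op, k, rng):
    """Apply one of the four editing operations k times to a bit string."""
    s = list(bits)
    for _ in range(k):
        if op == "drop":
            s.pop(rng.randrange(len(s)))
        elif op == "insert":
            s.insert(rng.randrange(len(s) + 1), rng.choice("01"))
        elif op == "substitute":          # flip a 0 to 1 or a 1 to 0
            i = rng.randrange(len(s))
            s[i] = "1" if s[i] == "0" else "0"
        elif op == "swap":                # exchange two random positions
            i, j = rng.sample(range(len(s)), 2)
            s[i], s[j] = s[j], s[i]
    return "".join(s)

rng = random.Random(42)
bits = "".join(rng.choice("01") for _ in range(20))
print(bits, "->", random_edit(bits, "drop", 3, rng))
```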
- To design our experiment, we first collected a set of 100 common sense questions along with their corresponding answers. Additionally, we generated a set of 400 random words to serve as noise.
- In order to inject noise into the common sense questions, we replaced the spaces within the questions with uppercase random words.
- Users will be presented with the noisy questions and are required to remove the random words and answer the questions correctly.
- Each participant is allocated 10 noise injection questions. It is important to note that all answers that make sense will be considered correct (see the sketch below).
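A sketch of the noise-injection step, replacing each space in a question with a random uppercased word (the helper name is ours):

```python
import random

def inject_noise(question, noise_words, rng):
    """Replace every space in the question with an uppercased random word."""
    tokens = question.split(" ")
    out = tokens[0]
    for tok in tokens[1:]:
        out += rng.choice(noise_words).upper() + tok
    return out

rng = random.Random(0)
noisy = inject_noise("What color is the sky?", ["apple", "river", "stone"], rng)
print(noisy)  # the question with its spaces replaced by uppercase noise words
```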
- To conduct our experiment, we first collected a set of 50 ASCII art images from https://www.asciiart.eu/.
- For the experiment, users will be presented with the ASCII art images and are required to identify what is depicted in each image.
- Each participant is allocated 5 ASCII questions. It is important to note that all answers that make sense will be considered correct.
- We have collected 100 questions from various professional fields, including both numerical and knowledge-based questions.
- For numerical questions, users are required to provide an answer with an error margin of no more than 5%.
- For knowledge-based questions, users must provide accurate answers.
- Each participant is allocated 10 randomly selected memorization questions (see the grading sketch below).
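For the numerical questions, the 5% margin can be checked with a relative-error test like the one below (a sketch; knowledge-based answers are graded for exactness instead):

```python
def within_margin(answer, reference, tolerance=0.05):
    """Accept a numerical answer whose relative error is at most 5%."""
    return abs(answer - reference) <= tolerance * abs(reference)

assert within_margin(9_800_000, 10_000_000)      # 2% off: accepted
assert not within_margin(9_000_000, 10_000_000)  # 10% off: rejected
```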
- Users are required to complete a multiplication question involving two randomly generated four-digit numbers within a time limit of 10 seconds.
- Any answers submitted after the time limit will be marked as incorrect.
- For the answer to be considered correct, the margin of error must be within 5% (see the sketch below).
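A minimal sketch of the timed computation check, reusing the same 5% margin; the post-hoc timing around `input()` is our simplification, not the paper's interface:

```python
import random
import time

def ask_multiplication(rng, time_limit=10.0, tolerance=0.05):
    """Pose a four-digit multiplication; a reply is correct only if it
    arrives within the time limit and is within a 5% relative error."""
    a, b = rng.randint(1000, 9999), rng.randint(1000, 9999)
    start = time.monotonic()
    reply = input(f"What is {a} * {b}? ")
    if time.monotonic() - start > time_limit:
        return False  # late answers are marked incorrect
    try:
        value = float(reply)
    except ValueError:
        return False
    return abs(value - a * b) <= tolerance * (a * b)
```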
We welcome contributions to expand the dataset and improve the detection of conversational bots. If you have a new question that you believe can effectively differentiate human users from bots, please feel free to contribute by submitting a pull request to this repository.
Please cite our paper if you find this repository helpful in your research or use our data:
@article{FLAIR,
  title={Bot or Human? Detecting ChatGPT Imposters with A Single Question},
  author={Wang, Hong and Luo, Xuan and Wang, Weizhi and Yan, Xifeng},
  journal={arXiv preprint arXiv:2305.06424},
  year={2023}
}