Can't Find Me - Obscuring LLM Traces

Created as a final project for CS6973 Trustworthy Generative AI. The paper is included in this repository as paper.pdf.

The Hide and Seek approach uses a two-model iterative learning process. It creates prompts that reveal the model family of the LLM being queried, then compares the resulting outputs against generations from other models. The two models are called the 'Auditor' and the 'Detective'.
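The comparison step can be roughly illustrated as follows. The sketch scores how similar each pair of model outputs is and returns the closest pair. It swaps in a deliberately simple stand-in, word-level Jaccard similarity, whereas the actual Detective is an LLM. The function names `jaccard` and `most_similar_pair` are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two responses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty responses are treated as identical
    return len(sa & sb) / len(sa | sb)


def most_similar_pair(responses: dict[str, str]) -> tuple[str, str]:
    """Return the two model labels whose responses overlap the most.

    Hypothetical stand-in for the Detective: a lexical score, not an LLM judgment.
    """
    return max(
        combinations(sorted(responses), 2),
        key=lambda pair: jaccard(responses[pair[0]], responses[pair[1]]),
    )
```

For example, given `{"model_a": "the capital of france is paris", "model_b": "paris is the capital of france", "model_c": "i think it is lyon"}`, the pair `("model_a", "model_b")` scores highest. A lexical score like this is easy to fool, which is part of why the approach relies on an LLM Detective instead.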

The experiment is conducted over T trials:

1. The Auditor generates an initial set of prompts.
2. These prompts are presented to N different LLMs, two of which come from the same source.
3. The Detective analyzes the outputs and attempts to identify the two models that share a source.
4. The Results block is provided to the Auditor.
5. Steps 2-4 are repeated, with the Auditor refining its prompts based on this feedback, until T trials have been completed.
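The trial loop above can be sketched as follows. This is a minimal sketch, not the repository's implementation. The Auditor, the queried models, and the Detective are passed in as plain callables. Model identities are hidden behind shuffled labels so the Detective sees only outputs. The full history of earlier results is the feedback given to the Auditor. `TrialResult` and `run_hide_and_seek` are hypothetical names, and the anonymization scheme is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrialResult:
    trial: int
    prompts: list[str]
    guess: tuple[str, str]   # pair the Detective picked
    answer: tuple[str, str]  # pair that actually shares a source

    @property
    def correct(self) -> bool:
        return set(self.guess) == set(self.answer)


def run_hide_and_seek(
    auditor: Callable[[list[TrialResult]], list[str]],
    models: dict[str, Callable[[str], str]],
    detective: Callable[[dict[str, str]], tuple[str, str]],
    answer: tuple[str, str],
    num_trials: int,
    seed: int = 0,
) -> list[TrialResult]:
    rng = random.Random(seed)
    history: list[TrialResult] = []
    for t in range(num_trials):
        # Step 1 (and the refinement in step 5): the Auditor writes prompts,
        # seeing every earlier result once history is non-empty.
        prompts = auditor(history)
        # Step 2: query each model under a shuffled anonymous label (an assumption).
        names = list(models)
        rng.shuffle(names)
        labels = {f"model_{i}": name for i, name in enumerate(names)}
        responses = {
            label: "\n".join(models[name](p) for p in prompts)
            for label, name in labels.items()
        }
        # Step 3: the Detective names the two labels it thinks share a source.
        first, second = detective(responses)
        guess = (labels[first], labels[second])
        # Step 4: record the result; the Auditor sees it on the next trial.
        history.append(TrialResult(t, prompts, guess, answer))
    return history
```

Because the real Auditor and Detective are LLMs, `auditor` and `detective` would in practice wrap model calls. Passing the whole history is only one way to realize the feedback step.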

To account for the Auditor’s learning curve, the first W trials serve as a warm-up period. The Auditor’s performance is analyzed only after it has had the opportunity to refine its prompt generation strategy based on feedback.
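A minimal sketch of scoring with a warm-up follows. It assumes the Auditor's performance is measured as how often the Detective identifies the matching pair, a metric this README does not specify. Only trials W+1 through T count. `post_warmup_accuracy` is a hypothetical helper name.

```python
def post_warmup_accuracy(outcomes: list[bool], warmup: int) -> float:
    """Fraction of correct Detective guesses after the first `warmup` trials.

    outcomes[t] is True if the Detective identified the matching pair in trial t.
    """
    if not 0 <= warmup < len(outcomes):
        raise ValueError("warm-up W must be non-negative and smaller than T")
    scored = outcomes[warmup:]
    return sum(scored) / len(scored)


# W = 2 of T = 6 trials; only the last four trials are scored.
print(post_warmup_accuracy([False, False, True, True, False, True], warmup=2))  # 0.75
```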

Quickstart

source trust/bin/activate

pip install -r utils/requirements.txt

export TOGETHER_API_KEY="insert_key_here"

python -m algo_helpers.adversarial_helpers --save_response --num_trials 10 --models_file utils/models.yaml --output_path ./results
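The quickstart flags appear to map onto the quantities described above. For instance, --num_trials presumably sets T. Below is a hypothetical argparse definition that mirrors only the flags shown in the command. It is not the repository's actual parser. The types and the treatment of --save_response as an on/off switch (it takes no value above) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the quickstart flags; not the repository's real parser.
    parser = argparse.ArgumentParser(prog="algo_helpers.adversarial_helpers")
    # Passed without a value in the quickstart, so modeled as a boolean switch.
    parser.add_argument("--save_response", action="store_true")
    # Number of trials, T.
    parser.add_argument("--num_trials", type=int)
    # YAML file describing the models to query.
    parser.add_argument("--models_file")
    # Directory where results are written.
    parser.add_argument("--output_path")
    return parser


args = build_parser().parse_args(
    ["--save_response", "--num_trials", "10",
     "--models_file", "utils/models.yaml", "--output_path", "./results"]
)
print(args.num_trials, args.save_response)  # 10 True
```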

lukecurrier/CantFindMe

Folders and files

Latest commit

History

Repository files navigation

Can't Find Me - Obscuring LLM Traces

Quickstart

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages