COLING 2025 Workshop on Detecting AI Generated Content
Title: Nota AI at GenAI Detection Task 1: Unseen Language-Aware Detection System for Multilingual Machine-Generated Text
Authors: Hancheol Park, Jaeyeon Kim and Geonmin Kim
We propose a novel system that distinguishes, at prediction time, between languages seen and unseen during training. The system comprises two components:
- A multilingual pre-trained language model (PLM) fine-tuned on the multilingual training dataset.
- A custom model trained on token-level predictive distributions extracted from a large language model (LLM).
Our findings indicate that the multilingual PLM is more accurate on text in languages encountered during training; this knowledge, however, is language-dependent and does not carry over to unseen languages.
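As a rough illustration of the first component, the following minimal sketch scores a text with a multilingual PLM fine-tuned for binary classification. The checkpoint `xlm-roberta-base`, the label mapping, and the function name are assumptions for illustration, not the repository's actual configuration, and the classification head must be fine-tuned on the task data before the score is meaningful.

```python
# Minimal sketch of the language-dependent component (assumptions noted above).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

PLM_NAME = "xlm-roberta-base"  # assumed checkpoint; any multilingual encoder works

plm_tokenizer = AutoTokenizer.from_pretrained(PLM_NAME)
plm = AutoModelForSequenceClassification.from_pretrained(PLM_NAME, num_labels=2)
plm.eval()  # assumes the head has already been fine-tuned on the task data

def plm_score(text: str) -> float:
    """Probability that `text` is machine-generated (label 1 by assumption)."""
    inputs = plm_tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = plm(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()
```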
The features extracted from the LLM's token-level predictive distributions include:
- The log probability of the predicted (most likely) token,
- The log probability of the generated (observed) token,
- The entropy of the predictive distribution.
Because an LLM is trained to minimize cross-entropy, i.e., to maximize the log-likelihood of observed tokens, machine-generated text tends to receive higher token log probabilities and lower predictive entropy; this signal is largely language-independent.
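As a minimal sketch (not the repository's actual feature extractor), these three statistics can be computed per token with Hugging Face transformers; the checkpoint name `gpt2` and the function name are illustrative assumptions.

```python
# Minimal sketch: per-token statistics from a causal LM's predictive distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LLM_NAME = "gpt2"  # assumed checkpoint; any causal LM exposing logits works

tokenizer = AutoTokenizer.from_pretrained(LLM_NAME)
llm = AutoModelForCausalLM.from_pretrained(LLM_NAME)
llm.eval()

def token_level_features(text: str):
    """Per-token log p(observed), log p(argmax), and entropy under the LLM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = llm(ids).logits                           # (1, seq_len, vocab)
    # Position t predicts token t+1: align distributions with the next tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # (seq_len-1, vocab)
    observed = ids[0, 1:]                                  # tokens actually in the text
    obs_logp = log_probs.gather(1, observed.unsqueeze(1)).squeeze(1)
    top_logp = log_probs.max(dim=-1).values                # log p of the most likely token
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)   # entropy of each distribution
    return obs_logp, top_logp, entropy
```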
Based on these insights, we propose a hybrid approach (see the routing sketch after this list):
- Use the multilingual PLM for inference on text in languages seen during training.
- Employ the custom model for languages unseen during training.
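A minimal sketch of this routing logic, assuming a language identifier and two trained detector callables: `identify_language`, `plm_detector`, and `dist_detector` are hypothetical stand-ins (e.g., a fastText language-ID model and the two components sketched above), not the repository's actual API.

```python
# Minimal sketch of the hybrid routing (hypothetical names, see lead-in above).
from typing import Callable, Set

def route_and_detect(
    text: str,
    identify_language: Callable[[str], str],  # hypothetical: returns an ISO 639-1 code
    plm_detector: Callable[[str], float],     # language-dependent: fine-tuned multilingual PLM
    dist_detector: Callable[[str], float],    # language-independent: token-level statistics model
    seen_languages: Set[str],                 # languages present in the training data
    threshold: float = 0.5,
) -> str:
    """Route seen languages to the PLM and unseen languages to the custom model."""
    lang = identify_language(text)
    score = plm_detector(text) if lang in seen_languages else dist_detector(text)
    return "machine" if score >= threshold else "human"
```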
Using this method, we achieved third place among 25 teams in Subtask B (binary multilingual machine-generated text detection) of Shared Task 1, with a macro F1 score of 0.7532.
Setup:
conda create -yn mgt-env python=3.10
conda activate mgt-env
pip install -r requirements.txt
Inference:
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --input_text "<input_text>" \
    --hf_token "<hf_token>"
TBA