Evaluating MGT Detector Bias: Native vs Non-Native English Authors
Lab: Ubiquitous Knowledge Processing Lab (UKP), TU Darmstadt
Period: December 2025 – April 2026
Supervisor: Prof. Rozovskaya
Overview
This project investigates whether machine-generated text (MGT) detectors exhibit systematic bias against non-native English speakers (NNES) in scientific writing, and how the degree of LLM assistance, prompt language (L1 vs. L2), and model choice jointly shape this bias.
Contributions
- Constructed two benchmarks: an essay-writing benchmark and a scientific-writing benchmark built from the ACL Anthology, using venue-based filtering and native-language identification to assign NNES/NES labels (labeling step sketched below)
- Designed an LLM-based augmentation pipeline that rewrites a controlled proportion of each text using multiple LLMs (LLaMA, T5) and prompts in different languages (L1 vs. L2), enabling systematic analysis of how editing degree, model choice, and prompt language affect detector bias (see the rewriting sketch after this list)
- Evaluated multiple detector families and measured group-wise fairness metrics across the NNES and NES groups under varying levels of LLM modification (a false-positive-rate example appears below)
- Planned an evaluation of data-centric bias mitigation strategies, including resampling, counterfactual data augmentation, and group-specific threshold optimization, to quantify fairness–utility trade-offs for MGT detection (thresholding sketch below)
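
A minimal sketch of the venue filtering and NNES/NES labeling step, assuming each paper record already carries the first author's identified native language. The `Paper` fields, `NES_L1S` set, and venue list are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical labeling sketch; field names and venues are illustrative.
from dataclasses import dataclass

NES_L1S = {"English"}  # first languages treated as native English

@dataclass
class Paper:
    paper_id: str
    venue: str  # e.g. "ACL", "EMNLP"
    l1: str     # first author's identified native language
    text: str

def label_group(paper: Paper, allowed_venues: set[str]) -> str | None:
    """Return 'NES'/'NNES' for papers from allowed venues, else None."""
    if paper.venue not in allowed_venues:
        return None  # venue-based filtering: drop out-of-scope venues
    return "NES" if paper.l1 in NES_L1S else "NNES"

papers = [
    Paper("P1", "ACL", "English", "..."),
    Paper("P2", "EMNLP", "Mandarin", "..."),
    Paper("P3", "W-NUT", "German", "..."),
]
benchmark = {p.paper_id: label_group(p, {"ACL", "EMNLP"}) for p in papers}
print(benchmark)  # {'P1': 'NES', 'P2': 'NNES', 'P3': None} (None = filtered out)
```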
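The rewriting pipeline can be sketched as sampling a fixed fraction of a document's sentences and passing each through the model. Here `rewrite_fn` is a placeholder for the actual LLaMA or T5 call (with an L1- or L2-language prompt); the sentence splitting and toy rewriter are assumptions for illustration.

```python
# Sketch: rewrite a controlled proportion of sentences with an LLM.
import random
from typing import Callable

def rewrite_proportion(
    sentences: list[str],
    proportion: float,                 # e.g. 0.25 rewrites ~25% of sentences
    rewrite_fn: Callable[[str], str],  # wraps the LLM call (LLaMA, T5, ...)
    seed: int = 0,
) -> list[str]:
    rng = random.Random(seed)  # fixed seed keeps edits reproducible
    k = round(len(sentences) * proportion)
    chosen = set(rng.sample(range(len(sentences)), k))
    return [rewrite_fn(s) if i in chosen else s for i, s in enumerate(sentences)]

def toy_rewriter(sentence: str) -> str:
    return sentence.upper()  # placeholder for the model's paraphrase

doc = ["First sentence.", "Second sentence.", "Third one.", "Last sentence."]
print(rewrite_proportion(doc, proportion=0.5, rewrite_fn=toy_rewriter))
```

Sweeping `proportion` from 0 to 1 yields the controlled editing-degree axis along which detector behavior is compared.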
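One concrete group-wise fairness measurement is the detector's false positive rate (human-written text flagged as MGT) computed per group, and the NNES-NES gap between them. The scores, labels, and 0.5 threshold below are illustrative, not measured results.

```python
# Sketch: per-group false positive rate and the NNES-NES gap.
from collections import defaultdict

def fpr_by_group(scores, labels, groups, threshold=0.5):
    """scores: detector MGT probabilities; labels: 1 = machine-generated;
    groups: 'NES' or 'NNES' per example. Returns per-group FPR."""
    fp = defaultdict(int)   # human texts flagged as machine-generated
    neg = defaultdict(int)  # all human texts
    for s, y, g in zip(scores, labels, groups):
        if y == 0:
            neg[g] += 1
            fp[g] += int(s >= threshold)
    return {g: fp[g] / neg[g] for g in neg}

scores = [0.9, 0.3, 0.7, 0.2, 0.8, 0.4]
labels = [1, 0, 0, 0, 1, 0]
groups = ["NES", "NES", "NNES", "NNES", "NNES", "NES"]
rates = fpr_by_group(scores, labels, groups)
print(rates, "gap:", rates["NNES"] - rates["NES"])  # gap > 0 disfavors NNES
```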
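For the planned mitigation work, group-specific threshold optimization could look like the sketch below: per group, pick the lowest threshold whose FPR on human text stays under a shared target, then compare detection rates (TPR) on machine-generated text. The target FPR and score values are illustrative assumptions.

```python
# Sketch: group-specific thresholds that cap FPR, and the resulting TPR.
def threshold_for_fpr(human_scores: list[float], target_fpr: float) -> float:
    """Smallest threshold whose FPR on human-written scores <= target."""
    for t in sorted(set(human_scores)):
        fpr = sum(s >= t for s in human_scores) / len(human_scores)
        if fpr <= target_fpr:
            return t
    return 1.0  # fall back to flagging nothing

human = {"NES": [0.1, 0.2, 0.3], "NNES": [0.4, 0.6, 0.7]}   # scores on human text
machine = {"NES": [0.8, 0.9], "NNES": [0.65, 0.95]}          # scores on MGT

for g in human:
    t = threshold_for_fpr(human[g], target_fpr=1 / 3)
    tpr = sum(s >= t for s in machine[g]) / len(machine[g])
    print(f"{g}: threshold={t:.2f}, TPR={tpr:.2f}")
```

In this toy example, capping FPR equally across groups forces a higher NNES threshold and lowers NNES detection rate, which is precisely the fairness-utility trade-off the evaluation aims to quantify.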
