DetectGPT: Zero-Shot Machine-Generated Text Detection

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.
The fluency and factual knowledge of large language models (LLMs) heighten
the need for corresponding systems to detect whether a piece of text is
machine-written. For example, students may use LLMs to complete written
assignments, leaving instructors unable to accurately assess student learning.
In this paper, we first demonstrate that text sampled from an LLM tends to
occupy negative curvature regions of the model’s log probability function.
Leveraging this observation, we then define a new curvature-based criterion for
judging if a passage is generated from a given LLM. This approach, which we
call DetectGPT, does not require training a separate classifier, collecting a
dataset of real or generated passages, or explicitly watermarking generated
text. It uses only log probabilities computed by the model of interest and
random perturbations of the passage from another generic pre-trained language
model (e.g., T5). We find DetectGPT is more discriminative than existing
zero-shot methods for model sample detection, notably improving detection of
fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the
strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See
the DetectGPT project page for code, data, and other project
information.
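The abstract does not spell out the criterion itself, so here is a rough sketch of one way a curvature-style score could be computed from the two ingredients it names: log probabilities from the model of interest and random perturbations of the passage. The assumption is that the score compares log p(x) for the passage with the average log p of its perturbed rewrites. If model samples sit in negative-curvature regions of log p, as the abstract observes, small rewrites should on average lower the log probability, giving a clearly positive gap. Human text should show a smaller gap. The log-probability values are assumed to be precomputed (e.g. by scoring T5 mask-filled rewrites with the target LLM). The function names, the optional normalization by the standard deviation, and the numbers in the usage block are illustrative assumptions, not the authors' released code. A small AUROC helper is included to show what the reported 0.81 vs 0.95 figures measure: the probability that a randomly chosen machine-generated passage scores higher than a randomly chosen human one.

```python
import statistics
from typing import Sequence


def perturbation_discrepancy(
    logp_original: float,
    logp_perturbed: Sequence[float],
    normalize: bool = True,
) -> float:
    """Hypothetical curvature-style score (illustrative, not the official code).

    logp_original  -- log p(x) of the passage under the model of interest
    logp_perturbed -- log p(x~_i) for each perturbed rewrite x~_i of the passage
    normalize      -- if True, divide the gap by the sample std of logp_perturbed
    """
    if len(logp_perturbed) < 2:
        raise ValueError("need at least two perturbations to estimate spread")
    # Average log probability of the rewrites approximates E[log p(x~)].
    mean_perturbed = statistics.fmean(logp_perturbed)
    # Positive gap: the original scores higher than its neighbours.
    score = logp_original - mean_perturbed
    if normalize:
        std = statistics.stdev(logp_perturbed)
        if std > 0:
            score /= std
    return score


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Made-up log probabilities, purely for illustration.
    machine = perturbation_discrepancy(-120.0, [-135.0, -130.0, -128.0, -137.0])
    human = perturbation_discrepancy(-150.0, [-149.0, -152.0, -151.0, -148.0])
    print(f"machine-like score: {machine:.2f}, human-like score: {human:.2f}")
    print("AUROC on this toy pair:", auroc([machine], [human]))
```

In practice a passage would be flagged as machine-generated when its score exceeds some chosen threshold. AUROC sidesteps picking that threshold by summarizing performance over all of them.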
