5 months ago

Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

Zeng Zijie ; Liu Shiqi ; Sha Lele ; Li Zhuang ; Yang Kaixun ; Liu Sannyuya ; Gašević Dragan ; Chen Guanliang


Abstract

This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers' selecting and even editing AI-generated sentences based on personal preferences adds difficulty in identifying the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within the hybrid text creates difficulties for segment detectors in identifying authorship-consistent segments; (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments.
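The strategy choice in finding (2) can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository: the function names, the label strings, and especially the `threshold` value are hypothetical, and it assumes a sample of hybrid texts with known sentence-level authorship labels (for example, a development set) is available for estimating segment length.

```python
from itertools import groupby


def segments_from_labels(labels):
    """Collapse per-sentence authorship labels into (label, length) segments.

    A segment is a maximal run of consecutive sentences with the same author.
    """
    return [(label, len(list(run))) for label, run in groupby(labels)]


def average_segment_length(labels):
    """Average number of sentences per authorship-consistent segment."""
    segments = segments_from_labels(labels)
    return len(labels) / len(segments) if segments else 0.0


def choose_strategy(avg_len, threshold=3.0):
    """Pick a detection strategy from the average segment length.

    The threshold is a hypothetical placeholder, not a value from the paper;
    in practice it would need to be tuned on labeled data.
    """
    return "segmentation" if avg_len >= threshold else "sentence-by-sentence"


# Hypothetical labeled text: one authorship label per sentence.
labels = ["human", "human", "ai", "human", "ai", "ai", "ai"]
avg_len = average_segment_length(labels)
print(segments_from_labels(labels))
print(avg_len, choose_strategy(avg_len))
```

For the hypothetical labels above, the text splits into four segments over seven sentences, an average of 1.75 sentences per segment, so the sketch falls back to sentence-by-sentence classification.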

Code Repositories

douglashiwo/aisentencedetection
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark: boundary-detection-on-coauthor
Methodology: DeBERTa-v3 (Naive)
Metrics: Cohen’s Kappa score: 0.4002
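
For readers unfamiliar with the metric, Cohen's kappa measures agreement between two labelings after correcting for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration of that standard formula. How the benchmark derives its labels is not stated on this page, so the per-sentence boundary flags in the example are an assumption made for illustration only.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical flags: 1 marks a sentence that starts a new authorship segment.
gold = [1, 0, 0, 1, 0, 0]
pred = [1, 0, 0, 0, 0, 1]
print(cohens_kappa(gold, pred))
```

In this example the two sequences agree on 4 of 6 positions (p_o ≈ 0.667) while chance agreement is p_e ≈ 0.556, giving a kappa of 0.25.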
