Prompt-GAN – Customisable hate speech and extremist datasets via radicalised neural language models

Govers, Jarod

Abstract

Online hate speech and violent extremism know no borders, no political boundaries, no remorse. Researchers face an uphill battle to collect hate speech data in volumes and topical diversity suitable for training state-of-the-art content-moderation systems. Neural language models have ushered in a new era of synthetic data generation, now in use across various businesses, despite calls for research to protect against unintended toxic output. We present a method for radicalising pre-trained neural language models to identify real online hate speech, and we outline the risks of rogue radicalised AI bots that could undermine our trust in social media. We present Prompt-GAN, a prompt-tuning adversarial approach with three achievements. First, we demonstrate prompt-tuning’s ability to generate realistic types of hate and non-hate speech that mimic political extremist discourse. Second, Prompt-GAN’s architecture offers a twofold reduction in memory and runtime requirements compared to fine-tuning. Finally, Prompt-GAN improves hate speech classification F1-scores by up to 10.1% and sets a new record in neural language simulation compared to the current state-of-the-art across three benchmark datasets.

Type

Thesis

Date

2022

Publisher

The University of Waikato

Degree

Master of Cyber Security (MCS)

Supervisors

Patros, Panos

Feldman, Philip

Dant, Aaron
