Prompt-GAN – Customisable hate speech and extremist datasets via radicalised neural language models

Govers, Jarod

Authors

Govers, Jarod

Files

thesis.pdf (8.75 MB)

Permanent Link

https://hdl.handle.net/10289/15259

Abstract

Online hate speech and violent extremism know no borders, no political boundaries, and no remorse. Researchers face an uphill battle to collect hate speech data in the volume and topical diversity needed to train state-of-the-art content-moderation systems. Neural language models have ushered in a new era of synthetic data generation, now in use across various businesses despite calls for research to protect against unintended toxic output. We present a method for radicalising pre-trained neural language models to identify real online hate speech, and we outline the risks of rogue radicalised AI bots, which could undermine our trust in social media. We present Prompt-GAN, a prompt-tuning adversarial approach with three achievements. First, we demonstrate prompt-tuning’s ability to generate realistic types of hate and non-hate speech that mimics political extremist discourse. Second, Prompt-GAN’s architecture offers a twofold reduction in memory and runtime requirements compared to fine-tuning. Finally, Prompt-GAN improves hate speech classification F1-scores by up to 10.1% and sets a new record in neural language simulation compared to the current state of the art across three benchmark datasets.

Type

Thesis

Date

2022

Publisher

The University of Waikato

Degree

Master of Cyber Security (MCS)

Supervisor

Patros, Panos
Feldman, Philip
Dant, Aaron
