
Prompt-GAN – Customisable hate speech and extremist datasets via radicalised neural language models

Online hate speech and violent extremism know no borders, no political boundaries, no remorse. Researchers face an uphill battle to collect hate speech data in the volume and topical diversity needed to train state-of-the-art content-moderation systems. Neural language models have ushered in a new era of synthetic data generation, now in use across various businesses, despite calls for research to protect against unintended toxic output. We present a method for radicalising pre-trained neural language models to identify real online hate speech, and we outline the risks of rogue radicalised AI bots that could undermine our trust in social media. We present Prompt-GAN, a prompt-tuning adversarial approach with three achievements. First, we demonstrate prompt-tuning’s ability to generate realistic types of hate and non-hate speech that mimics political extremist discourse. Second, Prompt-GAN’s architecture offers a twofold reduction in memory and runtime requirements compared to fine-tuning. Finally, Prompt-GAN improves hate speech classification F1-scores by up to 10.1% and sets a new record in neural language simulation compared to the current state of the art across three benchmark datasets.
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.