
Using character-grams to automatically generate pseudowords and how to evaluate them

This paper offers a practical solution to the problem of generating (good) pseudowords, which are commonly used in vocabulary testing and experimental research in applied linguistics. It also introduces an empirically founded approach to evaluating how suitable pseudowords are for different tasks. In the first part of the paper, we propose a novel way of generating pseudowords: a character-gram chaining algorithm. A major advantage of the algorithm is that it requires no knowledge of the target language, which makes it possible to generate pseudowords in any language. Second, there is currently a lack of formal criteria for evaluating pseudowords, both in terms of (i) their orthographic fit in the target language they are intended for and (ii) their suitability for use in various lexical processing and language teaching tasks. In the second part of the paper, we argue for the need to evaluate pseudowords, propose a set of linguistic criteria for evaluating the generated pseudowords, and use these criteria to compare our output with other current pseudoword lists.
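The abstract does not spell out how the character-gram chaining algorithm works. The sketch below is a minimal illustration of generic character-trigram chaining and is not the authors' implementation. It makes the following assumptions: the input is a plain word list, words never contain the boundary markers `^` and `$`, and a candidate is rejected if it is a real word, a duplicate, or outside a length range. The names `build_grams` and `generate` and all of their parameters are hypothetical. The sketch uses only character sequences observed in the word list, so it needs no language-specific rules. That mirrors the language-independence claim above.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; assumed never to occur inside a word.
START, END = "^", "$"


def build_grams(words, n=3):
    """Map each (n-1)-character context to every character observed after it."""
    table = defaultdict(list)
    for word in words:
        padded = START * (n - 1) + word.lower() + END
        for i in range(len(padded) - n + 1):
            context, nxt = padded[i:i + n - 1], padded[i + n - 1]
            table[context].append(nxt)  # duplicates keep the observed frequencies
    return table


def generate(table, lexicon, n=3, count=5, min_len=4, max_len=9,
             seed=0, max_tries=10000):
    """Chain character n-grams into candidates, keeping only non-words."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    real = {w.lower() for w in lexicon}
    results, tries = [], 0
    while len(results) < count and tries < max_tries:
        tries += 1
        context, chars = START * (n - 1), []
        # Every context reached here was seen in training, so it has a successor.
        while len(chars) <= max_len:
            nxt = rng.choice(table[context])
            if nxt == END:
                break
            chars.append(nxt)
            context = context[1:] + nxt  # slide the window one character
        candidate = "".join(chars)
        if (min_len <= len(candidate) <= max_len
                and candidate not in real and candidate not in results):
            results.append(candidate)
    return results


words = ["planet", "plant", "slant", "table", "stable", "cable",
         "mantle", "candle", "handle", "lantern"]
table = build_grams(words)
print(generate(table, words))
```

Rejecting real words is only the most basic filter. The paper's evaluation criteria (orthographic fit and task suitability) would be applied to the candidates afterwards and are not modelled in this sketch.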
Journal Article
König, J. L., Calude, A. S., & Coxhead, A. (2019). Using character-grams to automatically generate pseudowords and how to evaluate them. Applied Linguistics, amz045. https://doi.org/10.1093/applin/amz045
Oxford University Press
This is an author’s accepted version of an article published in the journal: Applied Linguistics. © The Author(s) (2019). Published by Oxford University Press.