Using character-grams to automatically generate pseudowords and how to evaluate them

This paper provides a practical solution to the problem of generating (good) pseudowords, which are commonly used in vocabulary testing and experimental research in applied linguistics, and introduces an empirically founded approach to evaluating the suitability of pseudowords for different tasks. In the first part of the paper, we propose a novel way of generating pseudowords: a character-gram chaining algorithm. A major advantage of the algorithm is that it does not require any knowledge of the language, thereby facilitating the generation of pseudowords in any language. In the second part of the paper, we address the current lack of formal criteria for evaluating pseudowords, both in terms of (i) their orthographic fit in the target language they are intended for and (ii) their suitability for use in various lexical processing and language teaching tasks. We argue for the need to evaluate pseudowords, propose a set of linguistic criteria for evaluating the generated pseudowords, and provide a comparison with other current pseudoword lists using these criteria.
König, J. L., Calude, A. S., & Coxhead, A. (2019). Using character-grams to automatically generate pseudowords and how to evaluate them. Applied Linguistics, amz045. https://doi.org/10.1093/applin/amz045
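
The abstract names a character-gram chaining algorithm but does not spell out its steps, so the following is only a minimal sketch of one plausible reading: a chain over overlapping character n-grams learned from a word list, with no language-specific rules. It is not the authors' implementation. The function names (build_chain, generate, pseudowords), the boundary markers "^" and "$", the default gram size of 3, and the toy word list are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; they must not occur in the input words.
START, END = "^", "$"


def build_chain(words, n=3):
    """Map each (n-1)-character context to the characters observed after it."""
    chain = defaultdict(list)
    for word in words:
        padded = START * (n - 1) + word.lower() + END
        for i in range(len(padded) - n + 1):
            context = padded[i:i + n - 1]
            following = padded[i + n - 1]
            chain[context].append(following)
    return chain


def generate(chain, n=3, max_len=12, rng=random):
    """Chain character-grams from the start context until END or max_len."""
    context = START * (n - 1)
    letters = []
    while len(letters) < max_len:
        following = rng.choice(chain[context])
        if following == END:
            break
        letters.append(following)
        # Slide the context window forward by one character.
        context = (context + following)[1:]
    return "".join(letters)


def pseudowords(words, count, n=3, seed=None, max_tries=10000):
    """Return up to `count` distinct generated strings that are not in `words`."""
    rng = random.Random(seed)
    chain = build_chain(words, n)
    known = {w.lower() for w in words}
    found = set()
    for _ in range(max_tries):
        if len(found) >= count:
            break
        candidate = generate(chain, n, rng=rng)
        if candidate and candidate not in known:
            found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    # Toy word list, for illustration only; a non-empty list is required.
    sample = ["table", "cable", "stable", "tablet",
              "candle", "handle", "cradle", "marble"]
    print(pseudowords(sample, count=3, n=3, seed=1))
```

Because the sketch only counts which characters follow which contexts, it can be pointed at a word list in any alphabetic script without modification, which is the property the abstract highlights. It does not attempt the evaluation criteria from the second part of the paper; filtering out real words here is just a lookup against the input list.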
