Wu, S., Franken, M. & Witten, I. H. (2009). Refining the use of the web (and web search) as a language teaching and learning resource. Computer Assisted Language Learning, 22(3), 249–268.
Permanent Research Commons link: http://hdl.handle.net/10289/2827
The web is a potentially useful corpus for language study because it provides examples of language that are contextualized and authentic, and is large and easily searchable. However, web contents are heterogeneous in the extreme, uncontrolled and hence 'dirty,' and exhibit features different from the written and spoken texts in other linguistic corpora. This article explores the use of the web and web search as a resource for language teaching and learning. We describe how a particular derived corpus containing a trillion word tokens in the form of n-grams has been filtered by word lists and syntactic constraints and used to create three digital library collections, linked with other corpora and the live web, that exploit the affordances of web text and mitigate some of its constraints.
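The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word list, the part-of-speech lookup table, the VERB-DET-NOUN pattern, and the sample n-grams with their counts are all hypothetical, chosen only to show how a word-list filter and a simple syntactic constraint might be combined over n-gram records.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# An n-gram record: (tokens, frequency count). The real corpus format is an assumption.
NGram = Tuple[Tuple[str, ...], int]

# Hypothetical word list and part-of-speech lookup (a real system would use a tagger).
WORD_LIST: Set[str] = {"make", "take", "a", "the", "decision", "tea", "strong"}
POS: Dict[str, str] = {
    "make": "VERB", "take": "VERB",
    "a": "DET", "the": "DET",
    "decision": "NOUN", "tea": "NOUN",
    "strong": "ADJ",
}


def in_word_list(tokens: Tuple[str, ...], word_list: Set[str]) -> bool:
    """Keep an n-gram only if every token appears in the word list."""
    return all(t.lower() in word_list for t in tokens)


def matches_pattern(tokens: Tuple[str, ...], pattern: Tuple[str, ...],
                    pos: Dict[str, str]) -> bool:
    """Syntactic constraint: token tags must equal the pattern, position by position."""
    if len(tokens) != len(pattern):
        return False
    return all(pos.get(t.lower()) == tag for t, tag in zip(tokens, pattern))


def filter_ngrams(ngrams: Iterable[NGram], word_list: Set[str],
                  pattern: Tuple[str, ...], pos: Dict[str, str]) -> Iterator[NGram]:
    """Yield the n-grams that pass both the word-list and the syntactic filter."""
    for tokens, count in ngrams:
        if in_word_list(tokens, word_list) and matches_pattern(tokens, pattern, pos):
            yield tokens, count


# Made-up sample records, including one piece of typical web 'noise'.
SAMPLE = [
    (("make", "a", "decision"), 120),
    (("click", "here", "now"), 900),
    (("strong", "tea"), 45),
    (("take", "the", "tea"), 7),
]

kept = list(filter_ngrams(SAMPLE, WORD_LIST, ("VERB", "DET", "NOUN"), POS))
```

In this toy run, "click here now" is rejected by the word list and "strong tea" by the three-tag pattern, so only the two verb-determiner-noun sequences remain.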
This is the author's accepted version of an article published in the journal Computer Assisted Language Learning. Copyright 2009 Taylor & Francis.