Publication:
Language inference from function words

Abstract

Language surface structures exhibit regularities that make it possible to learn a capacity for producing an infinite number of well-formed expressions. This paper outlines a system that uncovers and characterizes these regularities through principled, wholesale pattern analysis of large amounts of machine-readable text. The system uses the notion of closed-class lexemes to divide the input into phrases, and from these phrases it infers lexical and syntactic information. The set of closed-class lexemes is derived from the text, and these lexemes are then clustered into functional types. Next, the open-class words are categorized according to how they tend to appear in phrases and are then clustered into a smaller number of open-class types. Finally, these types are used to infer, and generalize, grammar rules. Statistical criteria are employed for each of these inference operations. The result is a relatively compact grammar that is guaranteed to cover every sentence in the source text from which it was formed. Closed-class inferencing compares well with current linguistic theories of syntax and offers a wide range of potential applications.
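The first stage described above, deriving closed-class lexemes from the text and using them to divide the input into phrases, can be illustrated with a small sketch. It is not the authors' method. The abstract does not state which statistical criteria are used, so this sketch assumes a simple stand-in: the `k` most frequent word types are treated as closed-class candidates. It also assumes a hypothetical boundary rule under which a closed-class word that follows an open-class word opens a new phrase. The function names `closed_class_candidates` and `split_into_phrases` are illustrative only.

```python
from collections import Counter


def closed_class_candidates(sentences, k):
    """Assumption: approximate the closed class by the k most frequent word types."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, _ in counts.most_common(k)}


def split_into_phrases(sentence, closed_class):
    """Hypothetical rule: a closed-class word after an open-class word starts a
    new phrase; consecutive closed-class words stay together in one phrase."""
    phrases, current = [], []
    prev_open = False
    for word in sentence:
        is_closed = word in closed_class
        if is_closed and prev_open and current:
            phrases.append(current)
            current = []
        current.append(word)
        prev_open = not is_closed
    if current:
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat".split(),
        "the dog lay on a rug".split(),
        "a bird sang in the tree".split(),
    ]
    closed = closed_class_candidates(corpus, 3)
    print(sorted(closed))                         # ['a', 'on', 'the']
    print(split_into_phrases(corpus[0], closed))  # [['the', 'cat', 'sat'], ['on', 'the', 'mat']]
```

In this toy corpus, "in" falls below the frequency cut-off and is treated as open-class. This shows why a raw frequency threshold is only a rough proxy for the statistical criteria the abstract refers to.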

Citation

Smith, T. C., & Witten, I. H. (1993). Language inference from function words (Computer Science Working Papers 93/3). Hamilton, New Zealand: Department of Computer Science, University of Waikato.

Date

1993

Publisher

Department of Computer Science, University of Waikato
