Applications of graph theory to an English rhyming corpus


http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WCW-504BT0B-2&_user=10&_coverDate=05%2F21%2F2010&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1386248107&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=476a072a4191ae733fbf0360d39a8c82

 


Morgan Sonderegger (a), corresponding author

a University of Chicago, Department of Computer Science, 1100 East 58th Street, Chicago, IL 60637, USA

Received 16 October 2009; 
revised 10 March 2010; 
accepted 7 May 2010. 
Available online 21 May 2010.

Abstract

How much can we infer about the pronunciation of a language – past or present – by observing which words its speakers rhyme? This paper explores the connection between pronunciation and network structure in sets of rhymes. We consider the rhyme graphs corresponding to rhyming corpora, where nodes are words and edges are observed rhymes. We describe the graph G corresponding to a corpus of ∼12000 rhymes from English poetry written c. 1900, and find a close correspondence between graph structure and pronunciation: most connected components show community structure that reflects the distinction between full and half rhymes. We build classifiers for predicting which components correspond to full rhymes, using a set of spectral and non-spectral features. Feature selection gives a small number (1–5) of spectral features, with accuracy and F-measure of ∼90%, reflecting that positive components are essentially those without any good partition. We partition components of G via maximum modularity, giving a new graph, G′, in which the “quality” of components, by several measures, is much higher than in G. We discuss how rhyme graphs could be used for historical pronunciation reconstruction.
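The ideas in the abstract can be illustrated with a minimal sketch in Python. It builds a rhyme graph (nodes are words, edges are observed rhymes), finds its connected components, and evaluates the standard Newman modularity Q of a candidate partition. Everything specific here is an assumption made for illustration: the word pairs are invented rather than taken from the corpus, the function names are hypothetical, and the partition is chosen by hand. The paper searches for a modularity-maximizing partition; this sketch only scores a given one.

```python
from collections import defaultdict


def build_rhyme_graph(rhyme_pairs):
    """Return a weighted adjacency map: word -> {neighbor: times rhymed}."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in rhyme_pairs:
        if a == b:
            continue  # ignore a word rhymed with itself
        graph[a][b] += 1
        graph[b][a] += 1
    return graph


def connected_components(graph):
    """Return the connected components as a list of sets of words."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def subgraph(graph, nodes):
    """Restrict the graph to the given set of nodes."""
    return {u: {v: w for v, w in graph[u].items() if v in nodes} for u in nodes}


def modularity(graph, communities):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    total = sum(w for nbrs in graph.values() for w in nbrs.values()) / 2
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(
            w for u in community for v, w in graph[u].items() if v in community
        ) / 2
        degree = sum(w for u in community for w in graph[u].values())
        q += internal / total - (degree / (2 * total)) ** 2
    return q


# Invented example: two groups of full rhymes joined by one cross-group rhyme.
pairs = [
    ("love", "dove"), ("dove", "above"), ("love", "above"),
    ("move", "prove"), ("prove", "remove"), ("move", "remove"),
    ("love", "move"),  # hypothetical half rhyme linking the two groups
    ("day", "say"),
]
graph = build_rhyme_graph(pairs)
components = connected_components(graph)
largest = max(components, key=len)
component_graph = subgraph(graph, largest)

q_whole = modularity(component_graph, [largest])
q_split = modularity(
    component_graph,
    [{"love", "dove", "above"}, {"move", "prove", "remove"}],
)
print(len(components), q_whole, q_split)
```

In this toy case the split partition scores higher than leaving the component whole, which is the kind of signal the abstract associates with components that mix full and half rhymes.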

Keywords: Rhymes; Graph theory; Complex networks; Poetry; Phonology; English

Article Outline

1. Introduction
2. Data
2.1. Rhyming corpora
2.2. Pronunciations, rhyme stems
3. Rhyme graphs
3.1. Notation
3.2. The rhyme graph G
3.3. Summary
4. Classification
4.1. Feature set
4.1.1. Non-spectral features
4.1.2. Spectral features
4.2. Experiments
4.3. Sensitivity analysis
4.4. Summary
5. Partitioning
5.1. Modularity
5.2. Modularity maximization
5.3. Experiment
5.4. Examples
5.5. Measuring the quality of G′ vs. G
5.5.1. General measures of similarity between clusterings
5.5.2. Intuitive measures of rhyme graph quality
5.6. Summary
6. Discussion
6.1. Future work
6.2. Summary
Acknowledgements
References