On the evolution of word usage of classical Chinese poetry
Liang Liu
Department of Statistics and Institute of Bioinformatics
University of Georgia
《Computer Science》2015
ABSTRACT
The hierarchy of classical Chinese poetry has been broadly acknowledged by a number of studies in Chinese literature. However, quantitative investigations of the evolution of classical Chinese poetry are limited. The primary goal of this study is to provide quantitative evidence of the evolutionary linkages, with emphasis on word usage, among different period genres of classical Chinese poetry. Specifically, various statistical analyses were performed to find and compare the patterns of word usage in the poems of nine period genres: shi jing, chu ci, Han shi, Jin shi, Tang shi, Song shi, Yuan shi, Ming shi, and Qing shi. The results indicate that each of the nine period genres has unique patterns of word usage, with some Chinese characters used preferentially in the poems of a particular period genre. The analysis of the general pattern of word preference implies a decreasing trend in the use of ancient Chinese characters along the timeline of dynastic types of classical Chinese poetry. The phylogenetic analysis based on the distance matrix suggests that the evolution of different types of classical Chinese poetry is congruent with their chronological order, which indicates that word frequencies contain useful phylogenetic information and thus can be used to infer evolutionary linkages among various types of classical Chinese poetry. The statistical analyses conducted in this study can be applied to data sets of general Chinese literature, where they can provide quantitative insights into its evolution.
SUBJECT KEYWORDS: evolutionary linguistics, classical Chinese poetry, phylogenetic tree
1. INTRODUCTION
Quantitative measurements have been commonly used in linguistic studies for understanding various language phenomena and language structures (Liu and Huang 2012). The analysis of affinity among Chinese dialects adopted correlation coefficients to quantify dialect similarity (Cheng 1991). Peng et al. (2008) utilized average distances, clustering coefficients, and the degree distribution to quantitatively compare the networks of Chinese syllables and characters. Moreover, it has become common practice in quantitative linguistics that language laws should be tested on empirical data using statistical methods. Increasing involvement of statistics and mathematics in linguistic studies has significantly changed the way of conducting scientific research for understanding important aspects of language. The origins and development of human language are perhaps the most fundamental questions in linguistics. A closely related but much more complicated question is how human language evolves over time. Various hypotheses regarding the origins and evolution of human language have been proposed and described in the context of probabilistic and biological models (Freedman and Wang 1996; Wang 2013). Since August Schleicher first introduced the representation of language families as an evolutionary tree, phylogenetic trees have been fundamental tools for understanding the history of language in the context of human evolution (Wang 1998a; Wang 1998b). Neighbor-joining (Saitou and Nei 1987) is one of the most commonly used methods for building phylogenetic trees in evolutionary linguistics. In addition, the efficiency of the algorithm searching for the best tree was greatly improved by matrix decomposition of phylogenetic trees (Qiao and Wang 1998). In this paper, phylogenetic trees are used as primary tools to investigate the evolution of poetic styles in word usage. 
Specifically, the similarity of word usage is measured by the average distance between the frequencies of Chinese characters in classical Chinese poems from nine period genres. A neighbor-joining tree is then built from the matrix of similarity scores to illustrate the evolutionary linkage among the nine period genres of classical Chinese poetry. The earliest Chinese poem, tan ge (谈歌), was first cited in an archer's response to an inquiry from King Gou Jian of Yue about the secret of accurate bow shooting. Although this short poem was recorded in the history book wu yue chun qiu《吴越春秋》, dated back to 80 C.E., it has been speculated that the poem was actually passed down orally from Chinese primitive society (2600 B.C.E – 2100 B.C.E) and was documented in books much later by descendants. In general, Chinese poetry consists of two types, namely, classical Chinese poetry and modern Chinese poetry (Yip 1997). Classical Chinese poetry is characterized as written in traditional Chinese with certain traditional modes associated with particular historical periods (Hinton 2008). The tradition of classical Chinese poetry begins at least as early as the publication of shi jing (i.e., the Book of Songs), a collection of 305 poems from over two millennia ago (Watson 1984). Classical Chinese poetry continued to grow until the May Fourth Movement of 1919 (Grieder 1980), which is commonly considered the stimulus for the emergence of modern Chinese poetry (Yeh 1991). Over the years, classical Chinese poetry has formed its unique style of rhythm and word usage (Zhong 2010). The earliest anthology of classical Chinese poetry is shi jing, in which poems or songs are predominantly composed of four-character lines developed during the period between the Western Zhou Dynasty and the Spring and Autumn period (Dobson 1964). There are three chapters in shi jing, namely, folkway (feng), elegance (ya), and praise (song).
The chapter of folkway primarily features folk songs, while the chapter of elegance includes songs from high-class officials and nobility. The songs in the chapter of praise were predominantly used to sing with dance in sacrificial ceremonies for ancestors.
Another early anthology, chu ci, known as the Songs of the South, consists of poems particularly associated with the state of Chu in southern China. Most poems in chu ci are attributed to Qu Yuan and Song Yu. In contrast to shi jing, chu ci is typified by irregular line lengths (Hawkes 2011). Shi jing and chu ci reflect the social and political conditions of the pre-Qin period. A new form of classical Chinese poetry, called the yue fu style, was developed in the Han Dynasty (Birrell 1993). Compared with shi jing and chu ci, yue fu poems are composed of five-character lines (Liu 1966). Yue fu poetry developed during the Han Dynasty and the Jian'an period (the end of Han and the beginning of the Six Dynasties) and afterwards became known as old style poetry, gu ti shi (Watson 1971), to distinguish it from the poetry developed from the Tang Dynasty through the Qing Dynasty, namely jin ti shi, i.e., new style poetry (Yip 1976). Unlike its predecessor (i.e., old style poetry), new style poetry consists of five-character or seven-character lines with strict rules on the number of lines, rhyme, and a certain degree of mandatory parallelism (Graham 1977). The study by Mair and Mei (1991) suggests that the tonal prosody of new style poetry is rooted in Sanskrit prosody and poetics (Deo 2007). Classical Chinese poetry continued to evolve against the background of social, political, and cultural changes over the past three thousand years (Olga 2013). The hierarchy of classical Chinese poetry has been broadly acknowledged by studies in Chinese literature (1976). The two ancient anthologies, shi jing and chu ci, are at the top of the hierarchy, followed by yue fu poetry (Han Dynasty) and new style poetry (Tang to Qing), representing the evolution of classical Chinese poetry along the line of dynastic changes. However, quantitative studies of the evolution of classical Chinese poetry are extremely limited. The primary goal of this paper is to provide quantitative evidence of the evolutionary linkages, with emphasis on word usage, among different period genres (Table 2) of classical Chinese poetry.
2. METHODS AND RESULTS
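The data sets of classical Chinese poems were collected from nine period genres in chronological order: shi jing, chu ci, Han shi, Jin shi, Tang shi, Song shi, Yuan shi, Ming shi, and Qing shi (Table 2). Specifically, the data sets consist of 305 poems in shi jing, 15 poems in chu ci, 675 Han shi selected from the Han book shi ji《史记》, 1821 Jin shi selected from the Jin book, 313 Tang shi selected from Three Hundred Tang Poems《唐诗三百首》(Sun 1763), 100 Song shi selected from song shi xuan zhu (Qian 1958), 651 Yuan shi selected from yuan shi bie cai ji《元诗别裁集》(Zhang 2012), 14602 Ming shi selected from lie chao shi ji《列朝诗集》(Qian 2007), and 27801 Qing shi selected from qing shi hui (Xu 1929). Three statistical analyses were performed: (1) an analysis of the distribution of word frequencies in the poems of the nine period genres; (2) a significance analysis of word usage in the poems of the nine period genres; and (3) a similarity analysis of the evolutionary linkages among the poems of the nine period genres.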
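The third analysis rests on a matrix of pairwise distances between the character frequencies of the period genres. The text does not give the exact formula for the "average distance", so the sketch below assumes one plausible reading: the mean absolute difference in relative frequency, taken over every character that occurs in either genre. The function names and the toy corpora are hypothetical and are not the study's data.

```python
from collections import Counter
from itertools import combinations

def char_frequencies(poems):
    """Relative frequency of each Chinese character in a list of poem strings."""
    counts = Counter(
        ch for poem in poems for ch in poem
        if "\u4e00" <= ch <= "\u9fff"  # keep CJK ideographs, drop punctuation and spaces
    )
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def average_distance(freq_a, freq_b):
    """ASSUMPTION: mean absolute frequency difference over the union of characters."""
    chars = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in chars) / len(chars)

def distance_matrix(corpora):
    """Symmetric nested dict of pairwise distances between named corpora."""
    freqs = {name: char_frequencies(poems) for name, poems in corpora.items()}
    names = list(corpora)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = average_distance(freqs[a], freqs[b])
    return dist

# Toy corpora for illustration only
corpora = {"A": ["风风云"], "B": ["风云云"], "C": ["兮兮兮"]}
dist = distance_matrix(corpora)
for name in corpora:
    print(name, [round(dist[name][other], 3) for other in corpora])
```

A matrix of this shape is the kind of input a neighbor-joining implementation expects. In the toy data, A and B share their characters and so sit closer to each other than either does to C.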
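In the analysis of the distribution of word frequencies, the frequency of each Chinese character was calculated separately for the poems of each of the nine period genres. The characters were then ranked by frequency from highest to lowest. In addition, the twenty most frequently used Chinese characters (MFUCC) were selected for downstream analysis. This analysis involves two types of comparisons among the poems of the nine period genres. The comparison of the distributions of word frequencies shows that the word frequencies of shi jing and chu ci differ markedly from those of the other period genres. The distributions for Han shi through Qing shi are relatively flat, indicating that words are used with similar frequencies in Han shi, Jin shi, Tang shi, Song shi, Yuan shi, Ming shi, and Qing shi (Figure 1). In contrast, shi jing and chu ci rely more heavily on a small number of frequently used words. This pattern is more pronounced in chu ci, where the word xi (兮) dominates the distribution of word frequencies (Figure 1). The high frequency of xi (兮) is one of the distinctive features of word usage in chu ci. Many of the MFUCC in shi jing and chu ci are function words, whereas most of the MFUCC in the other period genres are content words (e.g., nouns). Moreover, chu ci uses yu (余) for the first person, whereas the other period genres use wo (我). Finally, xi (兮) is included among the MFUCC of shi jing, chu ci, and Han shi, but not among those of Jin shi, Tang shi, Song shi, Yuan shi, Ming shi, and Qing shi. The most frequently used words (一 "one", 风 "wind", 云 "cloud", 天空 "sky") are shared among Jin shi, Tang shi, Song shi, Yuan shi, Ming shi, and Qing shi (Figure 1), which is an interesting pattern in their word preference.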
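The frequency ranking described above can be sketched in a few lines. This is a minimal illustration, assuming each genre is available as a list of poem strings and that only characters in the basic CJK Unified Ideographs block are counted. The function names and the toy input are hypothetical, and the default of 20 mirrors the twenty most frequently used characters selected in the text.

```python
from collections import Counter

def char_frequencies(poems):
    """Relative frequency of each Chinese character in a list of poem strings."""
    counts = Counter(
        ch for poem in poems for ch in poem
        if "\u4e00" <= ch <= "\u9fff"  # keep CJK ideographs, drop punctuation and spaces
    )
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def most_frequent(freqs, k=20):
    """The k most frequent characters, highest first (ties broken by code point)."""
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))[:k]

# Toy input for illustration only, not the study's data
poems = ["风兮，云兮", "兮风"]
freqs = char_frequencies(poems)
top = most_frequent(freqs, k=2)
for ch, f in top:
    print(ch, round(f, 3))
```

Running the same two functions on each genre separately gives the per-genre rankings that the comparison in Figure 1 is based on.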
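Preference for particular words in the poems. In the significance analysis of word usage in the poems of the nine period genres, I calculated the frequency of each word in the poems of a particular period genre. I further calculated the frequency of the same word in the poems of all other period genres combined. A binomial hypothesis test (Howell 2007) was used to evaluate whether the frequency of a word in the poems of a particular period genre is significantly higher than in the other period genres. Because the test involves multiple comparisons, the Bonferroni correction (Dunnett 1955) was applied, resulting in the adjusted p-value Padj. If Padj ≤ 0.01, the frequency of the word in the poems of a particular period genre is significantly higher than in the other period genres, and the word is therefore defined as a "characteristic word". The analysis shows that Tang shi and Song shi have zero characteristic words. The number of characteristic words increases toward the two ends (shi jing and Qing shi) of the spectrum of the nine period genres on the x-axis of Figure 2. By the number of characteristic words, the poems of the nine period genres fall into two groups, with Tang shi and Song shi as the dividing point between them (the pre-Tang group and the post-Song group). Further analysis (see Figure 3) indicates that Tang shi and Song shi are more similar to the post-Song group (Yuan shi, Ming shi, and Qing shi). Thus Tang shi and Song shi, together with Yuan shi, Ming shi, and Qing shi, form one group of poetry, namely new style poetry. The top ten characteristic words of shi jing are ancient Chinese characters with special meanings that rarely occur in the poems of the post-Qin periods. This suggests that shi jing tends to use ancient Chinese characters.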
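The test for characteristic words can be illustrated as follows. This is a sketch under stated assumptions, not the author's implementation: it assumes a one-sided exact binomial test in which the null probability is the character's relative frequency in all other genres combined, and a Bonferroni adjustment that multiplies each p-value by the number of characters tested. The function names and the toy counts are hypothetical.

```python
import math

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0  # a character never seen elsewhere cannot occur under the null
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)

def characteristic_words(target_counts, other_counts, alpha=0.01):
    """Characters whose Bonferroni-adjusted p-value is at most alpha."""
    n_target = sum(target_counts.values())
    n_other = sum(other_counts.values())
    m = len(target_counts)  # ASSUMPTION: one test per distinct character in the target genre
    result = {}
    for ch, k in target_counts.items():
        p0 = other_counts.get(ch, 0) / n_other  # null: same rate as in the other genres
        p_adj = min(1.0, m * binom_upper_tail(k, n_target, p0))
        if p_adj <= alpha:
            result[ch] = p_adj
    return result

# Toy counts for illustration only
target_counts = {"兮": 30, "风": 5, "云": 5}
other_counts = {"兮": 10, "风": 500, "云": 490}
result = characteristic_words(target_counts, other_counts)
print(sorted(result))
```

The exact tail sum is linear in the corpus size for each character, which is fine for a toy example. For corpora with tens of thousands of poems, a statistics library's binomial test would be the more practical choice.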