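Research on Constructing an Automatic Recognition Model for Ancient Chinese Place Names Based on a Pre-Qin Corpus
Cite this article: Huang Shuiqing, Wang Dongbo, He Lin. Research on Constructing an Automatic Recognition Model for Ancient Chinese Place Names Based on a Pre-Qin Corpus [J]. Library and Information Service (《图书情报工作》), 2015, 59(12): 135-140.
Authors: Huang Shuiqing  Wang Dongbo  He Lin
Affiliation: College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095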
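Abstract: [Purpose/significance] Against the broader trend of digital humanities research, this paper builds an automatic recognition model for ancient Chinese place names based on a pre-Qin ancient Chinese corpus and the conditional random field (CRF) model. [Method/process] The internal and external features of place names in the Zuo Commentary (《春秋左氏传》) are analyzed statistically, and the feature templates of the model are constructed from them. On training and test corpora totaling 187,901 words, the place-name recognition performance of the CRF model is compared with that of the maximum entropy model. The trained CRF model, with a harmonic mean (F-score) of 90.94%, is selected as the best and adopted as the model built in this paper, and it is then validated on the Guoyu (《国语》) corpus. [Result/conclusion] For automatic recognition of ancient Chinese place names, the CRF model outperforms the maximum entropy model, and a CRF recognition model built on manually annotated corpora achieves good recognition performance.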

Keywords: ancient Chinese place name  conditional random field  lexical feature  pre-Qin corpus
Received: 2015-05-23

Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus
Huang Shuiqing, Wang Dongbo, He Lin. Research on Constructing Automatic Recognition Model for Ancient Chinese Place Names Based on Pre-Qin Corpus [J]. Library and Information Service, 2015, 59(12): 135-140.
Authors:Huang Shuiqing  Wang Dongbo  He Lin
Institution:College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095
Abstract: [Purpose/significance] Against the trend of digital humanities research, an automatic recognition model for ancient Chinese place names is constructed based on a pre-Qin ancient Chinese corpus and the conditional random field (CRF) model. [Method/process] The internal and external characteristics of ancient Chinese place names in the Zuo Commentary are analyzed, and the feature template of the model is constructed. The CRF and maximum entropy models are compared on training and test corpora of 187,901 words. The trained CRF model, with an F-score of 91.52%, performs best and is adopted as the ancient Chinese place-name recognition model, which is then applied to recognize place names in Guo Yu. [Result/conclusion] The CRF model outperforms the maximum entropy model in recognizing ancient Chinese place names. An automatic recognition model based on CRF and trained on an annotated corpus performs well.
Keywords:ancient Chinese place name  conditional random field  lexical feature  pre-Qin corpus  
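
The F-score reported in the abstract is the harmonic mean of precision and recall. The following Python snippet is a minimal sketch of how such a score could be computed for place-name recognition, assuming exact-match, entity-level evaluation; the abstract does not state the paper's actual evaluation protocol. The function names and the toy spans are hypothetical and are not taken from the paper's data.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets of one place name


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans by exact match."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Toy spans, invented for illustration only.
gold = {(0, 1), (5, 6), (9, 10), (14, 15)}
predicted = {(0, 1), (5, 6), (9, 11)}  # the last span has a wrong boundary

p, r, f = evaluate(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F-score by trading away recall for precision or the reverse.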