Paper Discrimination Based on Word Co-occurrence Network and Support Vector Machine |
| |
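Cite as: | Sun Wenjun, Du Juan. Paper Discrimination Based on Word Co-occurrence Network and Support Vector Machine[J]. Xiandai Qingbao (现代情报), 2010, 30(7): 87-92. |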
| |
Authors: | Sun Wenjun, Du Juan |
| |
Affiliation: | School of Economics and Management, Harbin Institute of Technology, Harbin, Heilongjiang 150001 |
| |
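Abstract: | Words do not interact randomly within sentences; their interaction follows certain rules, which can be studied through language networks. The word co-occurrence network is one representation of the human language network: it defines a link by the adjacency of words within a sentence. This paper applies language network analysis to paper discrimination: each paper is represented as a word co-occurrence network, the network's characteristic parameters are computed and output as a vector that characterizes the paper, and a support vector machine then classifies the papers. The results show that the method discriminates well between high-level papers and papers produced by text generators, and that it also discriminates fairly clearly between papers from widely differing fields.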
|
Keywords: | word co-occurrence network; paper discrimination; language network analysis; small-world network |
Paper Discrimination Based on Word Co-occurrence Network and Support Vector Machine |
| |
Authors: | Sun Wenjun, Du Juan |
| |
Institution: | School of Management, Harbin Institute of Technology, Harbin 150001, China |
| |
Abstract: | Words in human language interact in sentences in non-random ways, in a subtle manner that can be described in terms of a network of word interactions. The word co-occurrence network is one form of the complex network of human language; it uses the co-occurrence of words in a sentence to define connections. This paper discriminates papers using language network analysis: each paper is represented by its word co-occurrence network, various parameters of the network are calculated and output as a vector, and support vector machines are then applied to discriminate the papers. The experimental results show that a classifier built by this method performs well at separating high-quality papers from inauthentic papers produced by text generators, and that it also discriminates clearly between papers from different fields. |
| |
Keywords: | language network analysis; word co-occurrence network; paper discrimination; small-world network |
This record is indexed in databases including VIP (维普) and Wanfang Data (万方数据). |
|
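
The abstract describes turning each paper into a word co-occurrence network and summarizing that network as a vector. The following is a minimal sketch of that step, not the authors' implementation. The abstract does not list which network parameters were used, so the four chosen here (node count, average degree, average clustering coefficient, average shortest path length) are assumptions suggested by the "small-world network" keyword. The sentence and word splitting rules and all function names are likewise hypothetical.

```python
import re
from collections import deque


def build_cooccurrence_network(text):
    """Return an undirected adjacency map linking words adjacent in a sentence."""
    adjacency = {}
    # Assumption: sentences end at ., ! or ?; words are runs of letters/apostrophes.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        for word in words:
            adjacency.setdefault(word, set())
        for left, right in zip(words, words[1:]):
            if left != right:  # skip self-loops from immediately repeated words
                adjacency[left].add(right)
                adjacency[right].add(left)
    return adjacency


def average_degree(adjacency):
    """Mean number of distinct neighbours per word."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adjacency.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adjacency[u]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


def average_path_length(adjacency):
    """Mean BFS distance over all reachable ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def feature_vector(text):
    """Summarize a text as [nodes, avg degree, avg clustering, avg path length]."""
    adjacency = build_cooccurrence_network(text)
    if not adjacency:
        raise ValueError("text contains no words")
    return [
        float(len(adjacency)),
        average_degree(adjacency),
        average_clustering(adjacency),
        average_path_length(adjacency),
    ]
```

Vectors produced this way, one per paper and each with a class label, could then be passed to any support vector machine implementation for the classification step. The abstract does not say which implementation or kernel the authors used.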