Study on Term Extraction on the Basis of Chinese Domain Texts



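Cite this article:	Gu Jun, Wang Hao. Study on Term Extraction on the Basis of Chinese Domain Texts[J]. New Technology of Library and Information Service, 2011(4).

Authors:	Gu Jun, Wang Hao

Institution:	Department of Information Management, Nanjing University; Baoshan Iron and Steel Company Ltd., Shanghai

Abstract:	Building on ICTCLAS dictionary-based word segmentation, the paper extracts candidate terms from Chinese patent texts with a string-frequency maximum matching algorithm, then uses the TF-IDF algorithm to weight the resulting feature items, and filters them to obtain the final concept terms. Finally, a sample of the data is used for experiments and the results are analyzed.
Keywords:	Ontology; Concept extraction; String-frequency maximum matching; TF-IDF; Chinese word segmentation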
Study on Term Extraction on the Basis of Chinese Domain Texts

Gu Jun, Wang Hao. Study on Term Extraction on the Basis of Chinese Domain Texts[J]. New Technology of Library and Information Service, 2011(4).

Authors:	Gu Jun, Wang Hao

Institution:	Gu Jun (1, 2), Wang Hao (1). (1) Department of Information Management, Nanjing University, Nanjing 210093, China; (2) Baoshan Iron and Steel Company Ltd., Shanghai 201900, China

Abstract:	Based on ICTCLAS dictionary segmentation, this paper proposes a method that extracts relevant concept terms from Chinese patent texts by maximum matching and frequency statistics, then computes the weights of the items with TF-IDF to obtain the final concept terms. Finally, it analyzes the results of extraction experiments on sample data.

Keywords:	Ontology; Concept extraction; Maximum matching and frequency statistics; TF-IDF; Chinese word segmentation
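
The pipeline the abstract describes (segment, extract frequent candidate strings, weight with TF-IDF, filter) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact matching rule or TF-IDF variant, so the sketch assumes that documents arrive already segmented (for example by ICTCLAS), that a candidate is a run of adjacent tokens occurring at least `min_freq` times and not contained in a longer run of equal frequency, and that the weight is term frequency times log(N/df). All function names, `min_freq`, `max_len`, `threshold`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(docs, max_len=4):
    """Per document, count every run of 1..max_len adjacent tokens."""
    per_doc = []
    for tokens in docs:
        counts = Counter()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
        per_doc.append(counts)
    return per_doc


def candidate_terms(per_doc, min_freq=2):
    """Keep frequent strings; drop one if a longer string that contains it
    is exactly as frequent (assumed reading of 'maximum matching')."""
    total = Counter()
    for counts in per_doc:
        total.update(counts)
    frequent = {g: c for g, c in total.items() if c >= min_freq}
    absorbed = set()
    for gram, freq in frequent.items():
        if len(gram) > 1:
            # Checking the two (n-1)-length substrings is enough: shorter
            # substrings are absorbed transitively through them.
            for sub in (gram[:-1], gram[1:]):
                if frequent.get(sub) == freq:
                    absorbed.add(sub)
    return set(frequent) - absorbed


def tfidf_weights(per_doc, terms):
    """Weight each candidate per document as tf * log(N / df)."""
    n_docs = len(per_doc)
    df = Counter(t for counts in per_doc for t in terms if counts[t] > 0)
    weights = []
    for counts in per_doc:
        # Document length = number of single tokens it contains.
        n_tokens = sum(c for g, c in counts.items() if len(g) == 1) or 1
        weights.append({
            t: counts[t] / n_tokens * math.log(n_docs / df[t])
            for t in terms if counts[t] > 0
        })
    return weights


def select_terms(weights, threshold):
    """Keep candidates whose weight reaches the threshold in any document."""
    return {t for doc in weights for t, w in doc.items() if w >= threshold}


# Toy, hand-segmented corpus (invented for illustration only).
docs = [
    ["冷轧", "钢板", "的", "制造", "方法", "冷轧", "钢板"],
    ["热轧", "钢板", "的", "制造", "方法"],
    ["冷轧", "钢板", "表面", "处理"],
]
per_doc = ngram_counts(docs)
candidates = candidate_terms(per_doc)
weights = tfidf_weights(per_doc, candidates)
final_terms = select_terms(weights, threshold=0.1)
print(sorted("".join(t) for t in final_terms))
```

On this toy corpus the candidate step yields three strings, including the single token 钢板 ("steel plate"), which appears in every document and therefore receives a TF-IDF weight of zero. With the illustrative threshold of 0.1, only 冷轧钢板 ("cold-rolled steel plate") survives filtering, which shows why the weighting step is needed after frequency-based extraction.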
This article is indexed in CNKI and other databases.
