
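Analysis of the Data Sparseness Problem in VSM Information Retrieval and Strategies for Avoiding It
Cite this article: Liang Shijin. Analysis of the data sparseness problem in VSM information retrieval and its avoidance strategies [J]. Library and Information Service (图书情报工作), 2013, 57(1): 142-146.
Author: Liang Shijin (梁士金)
Affiliation: Library and Information Center, City College of Dongguan University of Technology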
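Abstract: Taking matrix theory as its entry point, this paper recasts the vectors and sets commonly used in the classical vector space model (VSM) as matrices, and argues that similarity computation based on the vector inner product is equivalent to multiplication of the corresponding matrices. Drawing on the definitions of sparse matrices and data sparseness, it analyzes why data sparseness arises in VSM information retrieval. It also discusses an effect that data sparseness has on similarity computation in all three scenarios considered: it adds a portion of time complexity that contributes nothing meaningful. Finally, it proposes a three-level strategy for avoiding data sparseness: a text-level strategy, a text-set-level strategy and a matrix-level strategy.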
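The abstract's central equivalence can be written out explicitly. The derivation below is a sketch under notation that the abstract does not give and that is assumed here: $m$ documents and $n$ index terms, a document-term weight matrix $D \in \mathbb{R}^{m\times n}$ whose $i$-th row is the document vector $\mathbf{d}_i$, and a query vector $\mathbf{q} \in \mathbb{R}^{n}$.

```latex
\[
\operatorname{sim}(\mathbf{d}_i,\mathbf{q}) = \mathbf{d}_i\cdot\mathbf{q}
  = \sum_{k=1}^{n} d_{ik}\,q_k = (D\mathbf{q})_i,
\qquad i = 1,\dots,m.
\]
\[
Q = [\mathbf{q}_1,\dots,\mathbf{q}_p] \in \mathbb{R}^{n\times p}:\qquad
S = DQ,\qquad
S_{ij} = \sum_{k=1}^{n} d_{ik}\,q_{kj} = \operatorname{sim}(\mathbf{d}_i,\mathbf{q}_j).
\]
\[
\text{dense cost} = mnp \ \text{multiplications},\qquad
\text{contributing terms} = \sum_{i=1}^{m}\sum_{j=1}^{p}
  \bigl|\{\,k : d_{ik}\neq 0,\ q_{kj}\neq 0\,\}\bigr|.
\]
```

Every product with $d_{ik}=0$ or $q_{kj}=0$ adds zero to $S_{ij}$, so the gap between the two counts is one possible way to quantify the "meaningless" time complexity the abstract mentions. This quantification is an illustration, not a formula taken from the paper.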

Keywords: vector space model; information retrieval; data sparseness; avoidance strategy
Received: 2012-09-13

Data Sparseness Analysis and its Avoidance Strategies in the VSM Information Retrieval
Liang Shijin. Data Sparseness Analysis and its Avoidance Strategies in the VSM Information Retrieval [J]. Library and Information Service, 2013, 57(1): 142-146.
Authors: Liang Shijin
Institution: Library and Information Center, City College of Dongguan University of Technology, Dongguan 523106
Abstract: Taking matrix theory as its starting point, this paper recasts the vectors and sets involved in the vector space model in matrix form, and argues that similarity calculation based on the inner product of vectors is equivalent to the corresponding matrix multiplication. Drawing on the definitions of sparse matrix and data sparseness, it analyzes the causes of data sparseness in VSM information retrieval. It also discusses a consequence that data sparseness has for similarity calculation in all three circumstances considered: it adds a portion of meaningless time complexity. Finally, the paper gives a three-layer strategy for avoiding data sparseness: a text-level strategy, a text-set-level strategy and a matrix-level strategy.
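To make the "meaningless time complexity" concrete, here is a minimal Python sketch. It assumes a hypothetical toy term-document matrix and query (not data from the paper) and raw inner-product similarity. `dense_scores` computes the matrix-vector product row by row and counts multiplications that have a zero factor. `sparse_scores` stores only nonzero weights and multiplies only where both document and query are nonzero. The sparse variant is one generic illustration of a matrix-level idea, not the author's specific strategy. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data (not from the paper): 3 documents x 6 index terms.
# Rows are document vectors, columns are terms; most weights are zero.
EXAMPLE_MATRIX: List[List[float]] = [
    [1.0, 0.0, 0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 1.0],
]
# Hypothetical query vector over the same 6 terms.
EXAMPLE_QUERY: List[float] = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def dense_scores(matrix: List[List[float]],
                 query: List[float]) -> Tuple[List[float], int, int]:
    """Inner product of each document row with the query (i.e. matrix x vector).

    Returns (scores, total multiplications, multiplications with a zero factor).
    """
    scores: List[float] = []
    total = 0
    wasted = 0
    for row in matrix:
        s = 0.0
        for d, q in zip(row, query):
            total += 1
            if d == 0 or q == 0:
                wasted += 1  # this product adds nothing to the score
            s += d * q
        scores.append(s)
    return scores, total, wasted


def to_sparse_rows(matrix: List[List[float]]) -> List[Dict[int, float]]:
    """Keep only the nonzero entries of each row as {term_index: weight}."""
    return [{j: v for j, v in enumerate(row) if v != 0} for row in matrix]


def sparse_scores(rows: List[Dict[int, float]],
                  query: List[float]) -> Tuple[List[float], int]:
    """Same scores, multiplying only where document and query are both nonzero.

    Returns (scores, multiplications actually performed).
    """
    q_nz = {j: v for j, v in enumerate(query) if v != 0}
    scores: List[float] = []
    used = 0
    for row in rows:
        # Walk the smaller nonzero set and probe the larger one.
        small, large = (row, q_nz) if len(row) <= len(q_nz) else (q_nz, row)
        s = 0.0
        for j, v in small.items():
            if j in large:
                used += 1
                s += v * large[j]
        scores.append(s)
    return scores, used


if __name__ == "__main__":
    dense, total, wasted = dense_scores(EXAMPLE_MATRIX, EXAMPLE_QUERY)
    print("dense:", dense, "multiplications:", total, "zero-factor:", wasted)
    sparse, used = sparse_scores(to_sparse_rows(EXAMPLE_MATRIX), EXAMPLE_QUERY)
    print("sparse:", sparse, "multiplications:", used)
```

On this toy input the dense loop performs 18 multiplications, 16 of which have a zero factor, while the sparse loop performs 2 and returns the same scores, `[3.0, 0.0, 0.0]`. Per-row dictionaries are used only for readability; production systems typically rely on compressed sparse formats or an inverted index for the same purpose.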
Keywords: vector space model; information retrieval; data sparseness; avoidance strategy
This article is indexed in Wanfang Data and other databases.