20 similar documents found (search time: 234 ms)
1.
2.
3.
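Weighting Theory and Techniques in Information Retrieval: An Analysis Based on the VSM Model (total citations: 1; self-citations: 0; citations by others: 1)
This paper analyzes the theoretical basis of weighting techniques in information retrieval. It examines how the local and global statistical distribution properties of terms are used in term weighting, and how asymmetric distributions affect weighting performance. Building on the basic principles of term weighting, it proposes a formal description and computational model of term weighting, and uses this model to analyze weighting techniques based on the vector space model and their optimization strategies. Weighting must solve two key problems: describing document content and discriminating between documents. To address them, the paper argues that document weights should be computed from both the local and the global distribution information of feature terms, while eliminating the effects of asymmetric distribution problems such as document length and the lack of semantic information.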
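As an illustration of combining local and global distribution information while correcting for document length, the sketch below uses the common log-tf × idf scheme with cosine normalization. It is a standard VSM baseline, not the formal weighting model proposed in the paper; the toy documents are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Local weight (1 + ln tf) x global weight (ln N/df), then cosine-normalized."""
    n = len(docs)
    # Global distribution: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # local distribution within this document
        w = {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items()}
        # Cosine normalization offsets the advantage long documents would otherwise have
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({t: v / norm for t, v in w.items()} if norm > 0 else w)
    return vectors

# Toy collection (assumed for illustration)
docs = [
    "weighting retrieval model retrieval".split(),
    "vector space model".split(),
    "term weighting scheme".split(),
]
for vec in tfidf_vectors(docs):
    print({t: round(v, 3) for t, v in vec.items()})
```

A term that is frequent in one document but rare in the collection ("retrieval" in the first document) receives the highest weight, and every vector has unit length regardless of document size.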
4.
5.
6.
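A MeSH-Weighted Ranking Method for Knowledge Discovery from Disjoint Literatures (total citations: 2; self-citations: 1; citations by others: 1)
After analyzing existing methods for ranking the intermediate set in knowledge discovery from disjoint literatures, this article draws on co-occurrence theory and focuses on topical relatedness to propose a method that ranks B terms by shared-MeSH density weighting. Using one of Swanson's early discoveries as a test case, it compares shared-MeSH density weighting with inverse document frequency weighting, examining the range of B terms retained after ranking and filtering and the occurrence of target-linking terms and target-linking pairs; these serve as the basis for evaluating each method's effect on B. The results show that shared-MeSH weighting significantly improves the quality of B and thereby the efficiency of discovery.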
7.
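Since PageRank was proposed, it has attracted wide attention in academia. After an overview of the PageRank algorithm, this paper reviews and evaluates weighted PageRank algorithms from three perspectives: Topic-Related PageRank, time-weighted PageRank, and weighted PageRank in scientific research and academic networks.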
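The feature shared by weighted PageRank variants is that a node distributes its score to its successors in proportion to edge weights (which might encode topic similarity, recency, or citation strength) rather than uniformly. A minimal power-iteration sketch, assuming positive edge weights and uniform redistribution from nodes without out-links; it is not any specific algorithm from the review:

```python
def weighted_pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """edges: dict mapping node -> dict(successor -> positive edge weight)."""
    nodes = set(edges)
    for succ in edges.values():
        nodes.update(succ)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            out = edges.get(v, {})
            total = sum(out.values())
            if total == 0:
                dangling += rank[v]  # no out-links: spread this score uniformly
                continue
            for u, w in out.items():
                # Share passed to u is proportional to the weight of edge v -> u
                new[u] += damping * rank[v] * w / total
        for v in nodes:
            new[v] += damping * dangling / n
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical weighted graph: A links to B three times as strongly as to C
graph = {"A": {"B": 3.0, "C": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
print(weighted_pagerank(graph))
```

Setting all weights to 1 recovers ordinary PageRank; the variants in the review differ mainly in how the weights are defined.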
8.
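Automatic subject-term weighting is a major problem in implementing modern information retrieval theory on computers. Research on such methods actively promotes, and has practical significance for, the application of modern information retrieval models to the automatic retrieval of Chinese scientific and technical information.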
9.
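An Analysis of the Relationship Between Document Information Use and Borrowing Time (total citations: 1; self-citations: 0; citations by others: 1)
Responding to Zheng Liqin's critique of the article "A Preliminary Study of Document Information Utilization Rate", this paper further analyzes the relationship between borrowing time and the use of document information, and redefines and improves both the concept and the formula of the document information utilization rate.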
10.
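Research on multi-objective optimization has produced many results on optimal solutions and introduced numerous solution concepts [5]. For ordinary single-objective optimization problems, there are also generic results on the uniqueness and stability of solutions. For the stability of solutions to multi-objective problems, Yu gave a generic stability result, and Xiang gave stability results for weighted solutions when the weight factors vary and when both the weight factors and the objective functions vary. Because the weighting method and weighted solutions play a prominent role in practice, studying the stability of weighted solutions is important. Building on Xiang's work and drawing on some computational techniques, this paper studies the stability of weighted solutions of multi-objective optimization problems when the weight factors, the objective functions, and the constraint set all vary. Finally, a computer simulation of a real-world example shows that how to choose a satisfactory solution from among the stable weighted solutions depends on the practical situation.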
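To make "stability of a weighted solution" concrete, the sketch below takes an assumed toy bi-objective problem, minimizing f1(x) = x² and f2(x) = (x − 2)² over [0, 2], and solves the weighted-sum problem w·f1 + (1 − w)·f2 by grid search. Analytically the weighted solution is x* = 2(1 − w), so a small change in the weight factor produces only a small change in the solution. This illustrates the idea only; the paper's setting, in which objectives and constraint sets also vary, is more general.

```python
def weighted_solution(w, grid_size=20001):
    """Minimize the weighted sum w*f1 + (1-w)*f2 over a grid on [0, 2]."""
    f1 = lambda x: x * x            # first objective (toy, assumed)
    f2 = lambda x: (x - 2.0) ** 2   # second objective (toy, assumed)
    xs = [2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return min(xs, key=lambda x: w * f1(x) + (1.0 - w) * f2(x))

# Perturbing the weight factor slightly moves the weighted solution slightly
for w in (0.30, 0.31):
    print(w, weighted_solution(w))
```

Here continuity follows from strict convexity of the weighted objective; in non-convex problems the weighted solution can jump, which is why genericity results are needed.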
11.
12.
Information Processing & Management, 2005, 41(5): 1065-1080
Traditional index weighting approaches for information retrieval from texts depend on term-frequency-based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they are limited in their ability to extract indexes that precisely represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially for the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce document search time in a real environment.
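A minimal sketch of indexing by concepts as well as terms: terms are grouped into concept clusters and their frequencies are aggregated into normalized concept weights. The cluster contents and the normalization are hypothetical; the abstract does not give the paper's formula for semantic importance degrees.

```python
from collections import Counter

# Hypothetical concept clusters: each concept groups semantically related terms
CONCEPT_CLUSTERS = {
    "finance": {"stock", "market", "shares", "investor"},
    "energy": {"oil", "crude", "barrel", "opec"},
}

def concept_vector(tokens, clusters=CONCEPT_CLUSTERS):
    """Aggregate term frequencies into concept weights that sum to 1."""
    tf = Counter(tokens)
    weights = {c: sum(tf[t] for t in terms) for c, terms in clusters.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items() if w > 0} if total else {}

doc = "oil prices lifted the stock market as crude rose and investor shares rallied".split()
print(concept_vector(doc))
```

The resulting index has one dimension per concept instead of one per distinct term, which is the source of the dimension reduction the abstract reports.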
13.
Accurate term discrimination in information retrieval is essential for identifying important terms in specific documents. In addition to the widely known inverse document frequency (IDF) method, alternative approaches such as the residual inverse document frequency (RIDF) scheme have been introduced for term discrimination. However, the performance of existing methods is not consistently convincing. We propose a new collection frequency weighting scheme derived from the negative binomial distribution model of term occurrences. Factorial experiments were performed to examine potential interaction effects between collection frequency weighting methods and term frequency weighting methods, using mean average precision and normalized discounted cumulative gain as performance measures. The results indicate that our proposed term discrimination method offers a significant gain in accuracy compared with the IDF and RIDF schemes. This finding is reinforced by the fact that the results show no interaction effects among factors.
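The two baselines named above can be sketched directly. RIDF is the observed IDF minus the IDF that a Poisson model predicts from the term's collection frequency, so it rewards terms that cluster in fewer documents than chance would suggest. The negative binomial weight is not shown because the abstract does not give its formula; the document counts below are assumed.

```python
import math

def idf(df, n_docs):
    """Inverse document frequency in bits."""
    return -math.log2(df / n_docs)

def ridf(df, cf, n_docs):
    """Residual IDF: observed IDF minus the IDF predicted by a Poisson model."""
    lam = cf / n_docs                       # expected occurrences per document
    expected_idf = -math.log2(1.0 - math.exp(-lam))
    return idf(df, n_docs) - expected_idf

N = 1000
# Same collection frequency (100), different spread across documents (assumed)
print("scattered:", idf(95, N), ridf(95, 100, N))     # close to Poisson, RIDF near 0
print("concentrated:", idf(20, N), ridf(20, 100, N))  # bursty, RIDF clearly positive
```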
14.
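A Feasible Method for Determining the Fuzzy Weights of Electronic Resource Evaluation Indicators (total citations: 10; self-citations: 1; citations by others: 10)
This paper applies fuzzy relations and fuzzy transformation from fuzzy set theory to process an expert-group questionnaire survey, and the resulting data, on the importance of electronic resource evaluation indicators. Fuzzy weights are determined by centroid-based defuzzification, which makes full use of the judgment information provided by the experts and thus improves the credibility of the final evaluation results.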
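A minimal sketch of centroid-based defuzzification for indicator weights, assuming each expert's rating is a triangular fuzzy number (low, mode, high) and ratings are aggregated by a component-wise mean. Both choices are assumptions; the paper's fuzzy-relation and fuzzy-transformation steps may differ. The indicator names and ratings are hypothetical.

```python
def aggregate(ratings):
    """Component-wise mean of triangular fuzzy numbers (one simple aggregation choice)."""
    k = len(ratings)
    return tuple(sum(r[i] for r in ratings) / k for i in range(3))

def centroid(tfn):
    """Centroid of a triangular fuzzy number (a, b, c) is (a + b + c) / 3."""
    a, b, c = tfn
    return (a + b + c) / 3.0

def fuzzy_weights(indicator_ratings):
    """Defuzzify each indicator's aggregated rating, then normalize to weights."""
    crisp = {name: centroid(aggregate(r)) for name, r in indicator_ratings.items()}
    total = sum(crisp.values())
    return {name: v / total for name, v in crisp.items()}

# Hypothetical ratings from two experts for three indicators
ratings = {
    "content": [(0.7, 0.9, 1.0), (0.5, 0.7, 0.9)],
    "usability": [(0.3, 0.5, 0.7), (0.5, 0.7, 0.9)],
    "price": [(0.1, 0.3, 0.5), (0.0, 0.1, 0.3)],
}
print(fuzzy_weights(ratings))
```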
15.
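A Text Classification Method Based on the Fuzzy Vector Space (total citations: 1; self-citations: 0; citations by others: 1)
For automatic text classification, this paper proposes a method based on a fuzzy vector space model and a radial basis function (RBF) network. The network consists of an input layer, a hidden layer, and an output layer: the input layer receives the samples to be classified, the hidden layer extracts the pattern features implicit in those samples, and the output layer presents the classification result. During feature extraction, the method takes into account the positions of feature terms within a document to construct fuzzy feature vectors, bringing automatic classification closer to manual classification. Its effectiveness is verified on documents from the China Journal Net full-text database.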
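The three-layer structure described above can be sketched as an RBF network with Gaussian hidden units and a linear output layer fitted by least squares. The fixed class-prototype centers, the width `sigma`, and the toy "fuzzy feature vectors" (term weights in [0, 1], imagined as already boosted for terms in prominent positions) are all assumptions; the abstract does not say how the paper chooses centers and widths.

```python
import numpy as np

def rbf_features(X, centers, sigma):
    """Hidden layer: Gaussian response of each sample to each center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class RBFNetwork:
    def __init__(self, centers, sigma):
        self.centers = np.asarray(centers, dtype=float)
        self.sigma = sigma

    def fit(self, X, y, n_classes):
        H = rbf_features(np.asarray(X, dtype=float), self.centers, self.sigma)
        T = np.eye(n_classes)[y]  # one-hot targets for the output layer
        # Output layer: linear weights fitted by least squares
        self.W, *_ = np.linalg.lstsq(H, T, rcond=None)
        return self

    def predict(self, X):
        H = rbf_features(np.asarray(X, dtype=float), self.centers, self.sigma)
        return np.argmax(H @ self.W, axis=1)  # class with the largest output

# Hypothetical fuzzy feature vectors over three terms, two classes
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.7], [0.0, 0.8, 0.9]]
y = [0, 0, 1, 1]
centers = [[0.85, 0.15, 0.05], [0.05, 0.85, 0.8]]  # assumed class prototypes
net = RBFNetwork(centers, sigma=0.5).fit(X, y, n_classes=2)
print(net.predict([[0.7, 0.3, 0.2]]))
```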
16.
17.
In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme that integrates the local and global distribution of terms as well as document frequency, document positions, and term length. The weighting scheme allows a larger, randomly chosen portion of the retrieved documents to be used as relevance feedback, removing the concern that very few relevant documents appear among the top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method achieves significant improvements over standard baselines and consistently outperforms related methods.
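For reference, the standard MMR criterion greedily picks the document that maximizes λ·sim(d, q) − (1 − λ)·max sim(d, s) over already selected documents s. The sketch below uses plain term-weight dictionaries and cosine similarity; the paper's own term weighting scheme is not reproduced, and the vectors and λ values are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_rerank(query, docs, lam=0.7, k=None):
    """Maximal marginal relevance: trade query relevance against redundancy."""
    k = len(docs) if k is None else k
    remaining = list(docs)  # docs: list of (doc_id, term-weight dict)
    selected = []
    while remaining and len(selected) < k:
        def score(item):
            _, vec = item
            redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
            return lam * cosine(vec, query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_id for doc_id, _ in selected]

q = {"weight": 1.0, "retrieval": 1.0}
docs = [
    ("d1", {"weight": 1.0, "retrieval": 1.0}),
    ("d2", {"weight": 1.0, "retrieval": 1.0}),   # duplicate of d1
    ("d3", {"retrieval": 1.0, "chinese": 1.0}),  # less relevant but novel
]
print(mmr_rerank(q, docs, lam=0.3))  # diversity-oriented
print(mmr_rerank(q, docs, lam=1.0))  # pure relevance
```

With a small λ the duplicate d2 is pushed below the novel d3; with λ = 1 the ranking is by relevance alone.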
18.
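Traditional information retrieval methods ignore the influence of document structure on the retrieval process. This paper proposes an improved structure-based information retrieval method. It first filters the document collection using first-class feature fields, then performs matching and ranking using second-class feature fields; it uses the AHP method to dynamically determine the importance weight of each feature field; and finally it combines the fields into an overall similarity value by computing vector inner products. Experimental results show that the method improves retrieval precision and makes the ranking of results more reasonable.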
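A minimal sketch of the weighting and combination steps, assuming AHP field weights are taken as the normalized principal eigenvector of a pairwise comparison matrix and the overall score is a weighted sum of per-field inner products. The field names, comparison values, and vectors are hypothetical, and the first-stage filtering by first-class fields is not shown.

```python
import numpy as np

def ahp_weights(pairwise):
    """Priority vector: normalized principal eigenvector of the comparison matrix."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    return principal / principal.sum()  # dividing by the sum also fixes the sign

def field_similarity(query_fields, doc_fields, weights):
    """Overall similarity: weighted sum of per-field inner products."""
    return sum(w * float(np.dot(q, d)) for w, q, d in zip(weights, query_fields, doc_fields))

# Hypothetical, perfectly consistent judgments for title, abstract, body
w_true = [0.5, 0.3, 0.2]
pairwise = [[wi / wj for wj in w_true] for wi in w_true]
weights = ahp_weights(pairwise)

query_fields = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
doc_fields = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(weights, field_similarity(query_fields, doc_fields, weights))
```

With real, slightly inconsistent expert judgments the eigenvector still yields usable weights, and AHP's consistency ratio is normally checked before accepting them.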
19.
This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. The method is based mainly on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the document that contains them. The weight is based on how widely the word is spread throughout the document, not only on its rate of occurrence. The candidate index words are then sorted in descending order of weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples, for which we obtained an average recall of 46% and an average precision of 64%.
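The abstract says the weight reflects how widely a word is spread through the document but gives no formula. As a purely hypothetical illustration, the sketch below multiplies a stem's frequency by the fraction of equal-sized document segments that contain it; the segment count and the already-stemmed tokens (transliterated placeholders) are assumptions, and morphological analysis is not shown.

```python
def spread_weight(word, tokens, n_segments=4):
    """Hypothetical weight: frequency x fraction of segments containing the word."""
    if not tokens:
        return 0.0
    size = max(1, -(-len(tokens) // n_segments))  # ceiling division
    segments = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    freq = tokens.count(word)
    covered = sum(1 for seg in segments if word in seg)
    return freq * covered / len(segments)

# Stemmed tokens (assumed): "ktb" is clustered, "drs" is spread out
tokens = ["ktb", "ktb", "qr", "ilm", "nas", "drs", "bayt", "drs"]
ranked = sorted(set(tokens), key=lambda w: spread_weight(w, tokens), reverse=True)
print(ranked[:2])
```

Both stems occur twice, but the spread-out one ranks higher, which is the behavior the abstract describes.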
20.
Information Processing & Management, 2001, 37(1): 39-51
Variable bit-block compression (VBC) signatures are extended for document ranking. Two different extensions were evaluated: the weighted VBC (WVBC) scheme and the aggregate VBC (AVBC) scheme. For both, analytical bounds on the additional storage needed for the term frequencies were derived. The upper and lower bounds of WVBC signatures were better than the corresponding bounds for AVBC signatures. In general, these bounds are functions of the word size (in bits) of the term frequencies; therefore, term frequencies were scaled to reduce the word size. Experiments showed that the additional storage cost is closer to the lower bound than to the upper bound for both WVBC and AVBC signatures. In addition, WVBC signatures outperformed AVBC signatures in both storage and retrieval speed. Logarithmic scaling was significantly better than linear scaling when the agreement of document rankings with the unscaled case was measured using the Kendall rank-order correlation. If 75% ranking performance is acceptable, the additional storage for the term frequencies is only 3.4% of all the indexed documents.
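The comparison of linear and logarithmic scaling can be illustrated in isolation from the signature structure. The sketch below quantizes term frequencies into a given number of bits both ways and measures agreement with the unscaled ranking using Kendall's tau-a. The scaling formulas, the 2-bit word size, and the term frequencies (treated as document scores for a single-term query) are assumptions, and VBC signatures themselves are not modeled.

```python
import math

def scale_linear(tf, max_tf, bits):
    levels = (1 << bits) - 1
    return round(tf / max_tf * levels)

def scale_log(tf, max_tf, bits):
    levels = (1 << bits) - 1
    return round(math.log1p(tf) / math.log1p(max_tf) * levels)

def kendall_tau(xs, ys):
    """Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

tfs = [1, 2, 3, 5, 8, 40, 100]  # assumed term frequencies = document scores
lin = [scale_linear(t, max(tfs), bits=2) for t in tfs]
log = [scale_log(t, max(tfs), bits=2) for t in tfs]
print(lin, kendall_tau(tfs, lin))
print(log, kendall_tau(tfs, log))
```

Linear scaling collapses the many small frequencies into the same level, creating ties that lower the correlation; logarithmic scaling keeps more of them apart at the same word size.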