20 similar documents found.
1.
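This paper first analyzes the DHT-based exact-match search methods currently used in P2P networks and, building on them, proposes a keyword-based information search method that supports keyword-based semantic queries. Simulation experiments show that, compared with existing algorithms, the method achieves a better hit rate and higher recall.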
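The abstract does not spell out how the keyword method works. A common way to get keyword search from a DHT that only supports exact-match lookups is to publish an inverted index into the DHT: each keyword is hashed to the node that owns it, that node stores the IDs of documents containing the keyword, and a multi-keyword query intersects the per-keyword results. The minimal sketch below assumes this design; the class name, the SHA-1 ring, and the 16-bit identifier space are illustrative choices, not details from the paper.

```python
import hashlib
from bisect import bisect_left


class KeywordDHT:
    """Hypothetical sketch: an inverted index published on top of a DHT ring."""

    def __init__(self, node_names, bits=16):
        self.bits = bits
        # Place every node on the identifier ring, sorted by hashed ID.
        self.ring = sorted((self._hash(n), n) for n in node_names)
        self.ids = [h for h, _ in self.ring]
        self.store = {n: {} for n in node_names}  # node -> keyword -> set of doc IDs

    def _hash(self, key):
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % (2 ** self.bits)

    def owner(self, keyword):
        # Successor rule: first node whose ID >= hash(keyword), wrapping around.
        i = bisect_left(self.ids, self._hash(keyword)) % len(self.ids)
        return self.ring[i][1]

    def publish(self, doc_id, keywords):
        # One exact-match "put" per keyword of the document.
        for kw in keywords:
            self.store[self.owner(kw)].setdefault(kw, set()).add(doc_id)

    def search(self, keywords):
        # One exact-match "get" per keyword, then a conjunctive intersection.
        results = [self.store[self.owner(kw)].get(kw, set()) for kw in keywords]
        return set.intersection(*results) if results else set()


dht = KeywordDHT(["n1", "n2", "n3", "n4"])
dht.publish("d1", ["p2p", "search", "semantic"])
dht.publish("d2", ["p2p", "dht"])
print(dht.search(["p2p", "search"]))  # {'d1'}
```

Each keyword costs one DHT lookup, so a query with k keywords needs k lookups plus a local intersection; semantic matching would have to be layered on top, for example by expanding the keyword list before lookup.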
2.
3.
4.
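In content-based unstructured P2P search systems, two main problems directly affect query effectiveness and search cost: the complexity of computing text similarity in a high-dimensional semantic space, and the large number of redundant messages generated by broadcast (flooding) algorithms. This paper proposes a P2P search model that uses the set difference degree to cluster peers by content, improving query efficiency and reducing redundant messages. The model defines text similarity in terms of the set difference degree, which keeps the cost of computing text similarity linear in time and thus effectively reduces query time; it also clusters nodes by content according to the set difference degree between them, which both lowers query time and cuts redundant messages. Simulation experiments show that the content-based search model built on the set difference degree not only achieves high recall but also reduces search cost and query time to roughly 40% and 30%, respectively, of those of Gnutella.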
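The abstract does not give the formula for the set difference degree. Assuming it is the size of the symmetric difference divided by the size of the union (that is, one minus the Jaccard coefficient), the sketch below shows why the computation is linear: with hash sets, both operations take time proportional to the number of terms. The greedy clustering routine and its 0.6 threshold are likewise hypothetical illustrations of "clustering nodes by set difference degree".

```python
def difference_degree(a, b):
    """Assumed definition: |A symmetric-difference B| / |A union B|.

    0.0 means identical term sets, 1.0 means disjoint ones. Python sets are
    hash-based, so this runs in O(|A| + |B|) time on average."""
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


def text_similarity(text_a, text_b):
    # Represent each text as a set of lowercase terms; similarity = 1 - difference.
    return 1.0 - difference_degree(set(text_a.lower().split()),
                                   set(text_b.lower().split()))


def cluster_peers(peer_terms, threshold=0.6):
    """Hypothetical greedy clustering: a peer joins the first cluster whose seed
    differs from it by less than `threshold`, otherwise it seeds a new cluster."""
    clusters = []  # list of (seed term set, member peer names)
    for peer, terms in peer_terms.items():
        for seed, members in clusters:
            if difference_degree(terms, seed) < threshold:
                members.append(peer)
                break
        else:
            clusters.append((terms, [peer]))
    return [members for _, members in clusters]


peers = {
    "p1": {"music", "mp3", "rock"},
    "p2": {"music", "mp3", "jazz"},
    "p3": {"linux", "kernel"},
}
print(cluster_peers(peers))  # [['p1', 'p2'], ['p3']]
```

Routing a query only to the cluster whose seed is closest to the query's term set, rather than flooding every neighbor, is what would reduce redundant messages in a model of this kind.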
5.
6.
7.
Ontology-Based Web Information Retrieval (total citations: 3; self-citations: 1; citations by others: 3)
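The rapid growth and diversification of Web information make effective retrieval difficult. Current retrieval tools offer only keyword-based search and ignore the semantic content carried by the keywords themselves. To address these problems, this paper proposes an ontology-based Web information retrieval method that compensates for the shortcomings of mechanical keyword matching, improves retrieval performance, and makes Web information retrieval more semantic.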
8.
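Addressing the characteristics of XML document retrieval, this paper proposes a semantic approximate retrieval model based on the XSEarch engine. It designs a method that uses WordNet to expand query terms semantically, improves the answer-ranking model of the XSEarch engine, and presents a system architecture that supports the approximate retrieval model.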
9.
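Resource location in structured P2P networks relies on distributed hash table (DHT) algorithms, which locate and discover resources by exact keys. This paper introduces several DHT-based resource location algorithms (CAN, Chord, and Pastry), analyzes how each constructs its overlay and routes lookups, and finally identifies the problems that structured P2P networks face.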
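Of the three algorithms, Chord is the simplest to sketch. The code below is a minimal, static version of Chord's scheme (no node joins, departures, or stabilization): identifiers live on a ring of size 2^m, a key belongs to its successor node, node n's k-th finger points at successor(n + 2^k), and a lookup repeatedly forwards to the closest preceding finger, which with high probability takes O(log N) hops on a ring of N nodes. The class name and the 6-bit ring are illustrative.

```python
from bisect import bisect_left


class ChordRing:
    """Static Chord sketch: m-bit identifier ring, finger tables, greedy routing."""

    def __init__(self, node_ids, m=6):
        self.m = m
        self.size = 2 ** m
        self.nodes = sorted({n % self.size for n in node_ids})
        # finger[n][k] = successor((n + 2**k) mod 2**m); finger[n][0] is n's successor.
        self.finger = {
            n: [self.successor(n + 2 ** k) for k in range(m)] for n in self.nodes
        }

    def successor(self, key):
        # First node clockwise from key (inclusive), wrapping past zero.
        i = bisect_left(self.nodes, key % self.size)
        return self.nodes[i % len(self.nodes)]

    @staticmethod
    def _between(x, a, b):
        # True if x lies strictly inside the clockwise interval (a, b).
        if a < b:
            return a < x < b
        return x > a or x < b  # interval wraps past zero (a == b: all but a)

    def lookup(self, start, key):
        """Route from node `start`; return (node responsible for key, hop count)."""
        key %= self.size
        n, hops = start, 0
        succ = self.finger[n][0]
        while not (self._between(key, n, succ) or key == succ):  # key not in (n, succ]
            nxt = succ  # defensive fallback
            for f in reversed(self.finger[n]):  # closest preceding finger
                if self._between(f, n, key):
                    nxt = f
                    break
            n, hops = nxt, hops + 1
            succ = self.finger[n][0]
        return succ, hops


ring = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(ring.lookup(8, 54))  # (56, 2)
```

Here the query for key 54 travels 8 → 42 → 51, and node 51's successor, 56, owns the key. CAN and Pastry replace this ring with a d-dimensional coordinate space and prefix-based routing, respectively.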
10.
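Because traditional retrieval models are limited to keyword matching at the syntactic level, this paper uses a domain ontology to organize knowledge and proposes a domain-ontology-based semantic retrieval model, together with the model's query semantic expansion algorithm and similarity computation algorithm.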
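The abstract names a query semantic expansion algorithm and a similarity algorithm without defining them. The sketch below illustrates one common shape for both, under stated assumptions: a tiny hypothetical ontology (`NARROWER`, `SYNONYMS`), expansion weights that decay by a factor per level of narrower concepts, and cosine similarity between the weighted query and a document's term frequencies. None of these choices come from the paper.

```python
import math
from collections import Counter

# Hypothetical toy domain ontology: concept -> narrower concepts, and synonyms.
NARROWER = {"vehicle": ["car", "truck"], "car": ["sedan"]}
SYNONYMS = {"car": ["automobile"]}


def expand_query(terms, decay=0.5, max_depth=2):
    """Weight original terms 1.0 and concepts d levels narrower decay**d;
    synonyms inherit the weight of the concept they belong to."""
    weights = {}

    def add(term, w):
        weights[term] = max(weights.get(term, 0.0), w)
        for syn in SYNONYMS.get(term, []):
            weights[syn] = max(weights.get(syn, 0.0), w)

    for t in terms:
        frontier = [t]
        for depth in range(max_depth + 1):
            next_frontier = []
            for concept in frontier:
                add(concept, decay ** depth)
                next_frontier.extend(NARROWER.get(concept, []))
            frontier = next_frontier
    return weights


def cosine(query_weights, doc_terms):
    # Cosine between the weighted query and raw term frequencies of the document.
    tf = Counter(doc_terms)
    dot = sum(w * tf[t] for t, w in query_weights.items())  # missing terms count 0
    nq = math.sqrt(sum(w * w for w in query_weights.values()))
    nd = math.sqrt(sum(c * c for c in tf.values()))
    return dot / (nq * nd) if nq and nd else 0.0


print(expand_query(["vehicle"]))
# {'vehicle': 1.0, 'car': 0.5, 'automobile': 0.5, 'truck': 0.5, 'sedan': 0.25}
```

A document that mentions only "automobile" scores zero against the bare query "car" under pure keyword matching, but a positive score against the expanded query, which is the gap the abstract describes.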
11.
Recent developments have shown that entity-based models that rely on information from a knowledge graph can improve document retrieval performance. However, because relatedness between entities in a knowledge graph is not transitive, using semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and show empirically that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a probabilistic graphical model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with state-of-the-art keyword-based and entity-based retrieval methods. We demonstrate that the proposed retrieval model outperforms all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the …@… and …@… metrics. We also show that the proposed method is most effective on difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique in the context of ad hoc retrieval using a basic query expansion method, and show that it yields more effective retrieval for all expansion weights and different numbers of expansion entities.
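The paper's own method is a graphical model that the abstract does not specify. As a simpler stand-in that shows the general idea of selecting expansion entities from pseudo-relevant documents, the sketch below uses a relevance-model-style estimate, weighting each entity by the sum over top-ranked documents of P(e|d) times the document's normalized retrieval score, and then interpolates the selected entities into the query. Function names, the entity-list input format, and `alpha` are all hypothetical.

```python
from collections import Counter, defaultdict


def select_expansion_entities(feedback_docs, doc_scores, n=3):
    """Relevance-model-style entity selection (a stand-in, not the paper's model).

    feedback_docs: list of entity lists, one per top-ranked document
    doc_scores:    retrieval scores of those documents, used as P(q|d) weights
    Returns the n (entity, weight) pairs with the highest estimated P(e|R)."""
    total = sum(doc_scores) or 1.0
    weights = defaultdict(float)
    for entities, score in zip(feedback_docs, doc_scores):
        if not entities:
            continue
        for e, c in Counter(entities).items():
            # P(e|d) * P(q|d), with P(q|d) approximated by the normalized score.
            weights[e] += (c / len(entities)) * (score / total)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def build_expanded_query(query_terms, entities, alpha=0.7):
    """Give original terms total weight alpha and selected entities 1 - alpha."""
    q = {t: alpha / len(query_terms) for t in query_terms}
    z = sum(w for _, w in entities) or 1.0
    for e, w in entities:
        q[e] = q.get(e, 0.0) + (1 - alpha) * w / z
    return q


docs = [["Barack_Obama", "White_House", "Barack_Obama"],
        ["White_House", "Washington"],
        ["Football"]]
selected = select_expansion_entities(docs, [0.5, 0.3, 0.2], n=2)
print([e for e, _ in selected])  # ['Barack_Obama', 'White_House']
```

Because the weights depend on how strongly an entity is tied to documents that scored well for the query, rather than on its graph relatedness to query entities, this style of selection is one way to limit the topic drift the abstract attributes to non-transitive relatedness.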
12.
This paper presents a novel IR-style keyword search model for Semantic Web data retrieval that differs from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. The answer is also minimal, in that no proper subgraph of it is an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently, and we propose a dedicated ranking strategy so that answers can be ordered appropriately. Experimental results on real datasets show that our model outperforms existing candidate solutions in both effectiveness and efficiency.
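The abstract does not describe its approximation algorithm. A common heuristic in graph keyword search is to try each node as a root, connect it along shortest paths to the nearest node matching each keyword, keep the cheapest such tree, and then prune leaves that match no keyword, so the result is connected, covers every keyword, and has no removable leaf. The sketch below implements that heuristic on an unweighted graph; it is an illustration under these assumptions, not the paper's algorithm, and its input format is hypothetical.

```python
from collections import deque


def bfs(graph, root):
    """Unweighted shortest paths from root: returns (dist, parent) maps."""
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v], parent[v] = dist[u] + 1, u
                queue.append(v)
    return dist, parent


def keyword_answer(graph, labels, keywords):
    """Hypothetical approximation of a minimal connected answer subgraph.

    graph:  node -> list of neighbours (undirected; every node is a key)
    labels: node -> set of keywords the node matches
    Returns (nodes, edges), or None if some keyword is unreachable."""
    wanted = set(keywords)
    best = None
    for root in graph:
        dist, parent = bfs(graph, root)
        targets, cost = [], 0
        for kw in keywords:
            hits = [n for n in dist if kw in labels.get(n, set())]
            if not hits:
                break
            nearest = min(hits, key=lambda n: dist[n])
            targets.append(nearest)
            cost += dist[nearest]
        else:  # every keyword was reached from this root
            if best is None or cost < best[0]:
                best = (cost, targets, parent)
    if best is None:
        return None
    _, targets, parent = best
    # The union of root-to-match shortest paths forms a tree.
    nodes, edges = set(targets), set()
    for t in targets:
        while parent[t] is not None:
            edges.add(frozenset((t, parent[t])))
            t = parent[t]
            nodes.add(t)
    # Prune leaves matching no keyword; removing them keeps coverage and connectivity.
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            incident = [e for e in edges if n in e]
            if len(incident) == 1 and not (labels.get(n, set()) & wanted):
                nodes.discard(n)
                edges.discard(incident[0])
                changed = True
    return nodes, edges


graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b", "e"], "e": ["d"]}
labels = {"a": {"xml"}, "c": {"search"}, "e": {"search"}}
print(keyword_answer(graph, labels, ["xml", "search"])[0])  # nodes a, b, c
```

Trying every root costs one BFS per node, which is acceptable for a sketch; practical systems expand backward from keyword nodes instead, so they touch only the neighborhood of the matches.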
13.
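The Internet has become the world's richest data source, with diverse and constantly changing data types, and retrieving the information users need from it quickly and accurately is a pressing problem. Traditional search engines search syntactically and lack semantic information, so they cannot accurately express either the user's query needs or the semantics of the documents being searched; as a result, precision and recall are low and the search scope is limited. This paper studies existing semantic retrieval methods, analyzes their problems, and on that basis proposes a domain-based semantic search engine model. Combining Semantic Web technologies, the model uses a domain ontology metadata model to normalize user queries semantically and, following the domain ontology schema, extracts knowledge from documents and converts it to RDF, thereby accurately expressing both the semantics of the user's query and the semantics of the documents being queried. This can greatly improve retrieval accuracy and efficiency. The paper describes the model's architecture, basic functions, and working principles in detail.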
14.
Information Processing & Management, 2022, 59(1): 102746
The importance of query performance prediction has been widely acknowledged in the literature, especially for query expansion, query refinement, and interpolating different retrieval approaches. This paper proposes a novel semantics-based query performance prediction approach based on estimating semantic similarities between queries and documents. We introduce three post-retrieval predictors: (1) semantic distinction, based on the semantic similarity of a query to the top-ranked documents compared with the whole collection; (2) semantic query drift, based on estimating the non-query-related aspects of the retrieved documents with semantic measures; and (3) semantic cohesion, based on how semantically cohesive the retrieved documents are. We assume that queries and documents are modeled as sets of entities from a knowledge graph, e.g., DBpedia concepts, instead of as bags of words. Under this assumption, the semantic similarity between two texts is measured by the relatedness between their entities, which is learned from the contextual information represented in the knowledge graph. We empirically illustrate these predictors' effectiveness, especially when term-based measures fail to capture query performance prediction hypotheses correctly. We report the performance of the proposed predictors, and of their interpolation, on three standard collections: ClueWeb09-B, ClueWeb12-B, and Robust04. We show that the proposed predictors are effective across datasets in terms of the Pearson and Kendall correlation coefficients between the predicted performance and the average precision measured by relevance judgments.
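The abstract describes the three predictors only in words. The sketch below is one plausible way to compute them, assuming each query and document has already been mapped to a dense vector (for instance, by averaging embeddings of its linked knowledge-graph entities); the exact formulas in the paper may differ, and the function names are hypothetical. Distinction compares the query's mean similarity to the top-ranked documents with its mean similarity to the whole collection, query drift averages how dissimilar the top documents are from the query, and cohesion averages pairwise similarity among the top documents.

```python
import math
from itertools import combinations


def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs) if xs else 0.0


def semantic_predictors(q, top_docs, collection):
    """Hypothetical post-retrieval predictors over entity-based vectors (assumed given)."""
    sims = [cos(q, d) for d in top_docs]
    return {
        # How much closer the top documents are to the query than the collection is.
        "distinction": mean(sims) - mean(cos(q, d) for d in collection),
        # Average non-query-related content in the top documents.
        "query_drift": mean(1.0 - s for s in sims),
        # How tightly the top documents agree with one another.
        "cohesion": mean(cos(a, b) for a, b in combinations(top_docs, 2)),
    }


q = [1.0, 0.0]
top = [[1.0, 0.0], [1.0, 0.0]]
collection = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(semantic_predictors(q, top, collection))
# {'distinction': 0.5, 'query_drift': 0.0, 'cohesion': 1.0}
```

Higher distinction and cohesion, and lower drift, would be read as signs that retrieval succeeded; evaluating such predictors means correlating them (e.g., Pearson or Kendall) with the actual average precision per query.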
15.
16.
17.
Information Processing & Management, 2020, 57(6): 102342
Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of the terms used to rank feedback documents, which can leave a semantic gap between query representation and document representation. In this work, we propose a PRF framework that combines relevance matching and semantic matching to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which exact-term matching information and the semantic similarity between query and document representations are computed with bidirectional encoder representations from transformers (BERT); this mechanism uses semantic information to narrow the textual semantic gap and improve the quality of feedback documents. Our PRF framework then processes the first-round results using probability-based and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of mean average precision (MAP) and precision at rank 10 (P@10), and they also show that combining relevance matching and semantic matching improves the quality of feedback documents more than using either relevance matching or semantic matching alone.
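The two-stage idea above can be sketched without a neural model. In the code below, `semantic` holds precomputed query-document similarity scores standing in for BERT outputs, first-round documents are reranked by a min-max-normalized interpolation of lexical and semantic scores, and the top-k reranked documents feed a plain Rocchio expansion (positive feedback only). Rocchio is a swapped-in example of PRF, not the probability- or language-model-based methods the paper uses; `lam`, `alpha`, `beta`, and all function names are hypothetical.

```python
from collections import Counter


def minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 1.0 for d, s in scores.items()}


def rerank(lexical, semantic, lam=0.5):
    """Fuse normalized lexical (e.g. BM25) and semantic (e.g. BERT) scores."""
    lex, sem = minmax(lexical), minmax(semantic)
    fused = {d: lam * lex[d] + (1 - lam) * sem.get(d, 0.0) for d in lex}
    return sorted(fused, key=lambda d: (-fused[d], d))


def rocchio_expand(query_terms, docs, ranking, k=2, n_terms=3, alpha=1.0, beta=0.75):
    """Rocchio without negatives: q' = alpha*q + beta*centroid(top-k feedback docs)."""
    weights = Counter({t: alpha for t in query_terms})
    for d in ranking[:k]:
        tf = Counter(docs[d])
        length = sum(tf.values())
        for t, c in tf.items():
            weights[t] += beta * c / (length * k)  # length-normalized tf, averaged over k
    expansion = [t for t, _ in weights.most_common() if t not in query_terms][:n_terms]
    return list(query_terms) + expansion


lexical = {"d1": 10.0, "d2": 8.0, "d3": 2.0}
semantic = {"d1": 0.1, "d2": 0.9, "d3": 0.2}
docs = {"d1": ["bert", "ranking", "ranking", "ranking"],
        "d2": ["feedback", "bert", "query"],
        "d3": ["football"]}
ranking = rerank(lexical, semantic)  # ['d2', 'd1', 'd3']
print(rocchio_expand(["query", "expansion"], docs, ranking, n_terms=2))
# ['query', 'expansion', 'ranking', 'bert']
```

The reranking step changes which documents count as feedback: by lexical score alone d1 would rank first, but its low semantic score lets d2 take the top slot before expansion terms are drawn.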
18.
To achieve high performance, previous work on FAQ retrieval relied on high-level knowledge bases or handcrafted rules. However, constructing these knowledge bases and rules is time- and effort-consuming, and it must be redone whenever the application domain changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users' query logs as its knowledge source. At indexing time, the proposed system efficiently clusters users' query logs using classification techniques based on latent semantic analysis. At retrieval time, the system smooths FAQs using the query-log clusters. In our experiments, the proposed system outperformed conventional information retrieval systems in FAQ retrieval. Based on various experiments, we found that the proposed system can alleviate critical lexical disagreement problems in short-document retrieval. In addition, we believe the proposed system is more practical and reliable than previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources.
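The smoothing step can be illustrated with a language-model sketch. It assumes the LSA-based clustering has already attached a cluster of logged user queries to each FAQ (that step is not shown), and mixes three unigram models: the FAQ's own text, the attached query-log cluster, and a collection model that keeps unseen words from getting zero probability. The linear mixture and the `lam`/`mu` weights are illustrative assumptions; the paper's exact smoothing formula is not given in the abstract.

```python
import math
from collections import Counter


def lm(tokens):
    """Maximum-likelihood unigram model; an empty token list gives an empty model."""
    tf = Counter(tokens)
    n = sum(tf.values())
    return {w: c / n for w, c in tf.items()}


def score(query, faq_tokens, cluster_queries, collection_lm, lam=0.4, mu=0.1):
    """log P(query | smoothed FAQ model); requires mu > 0 and lam + mu <= 1.

    The FAQ model is mixed with its attached query-log cluster (weight lam) so
    that users' own wordings count as evidence for the FAQ."""
    p_faq = lm(faq_tokens)
    p_clu = lm([w for q in cluster_queries for w in q])
    total = 0.0
    for w in query:
        p = ((1 - lam - mu) * p_faq.get(w, 0.0)
             + lam * p_clu.get(w, 0.0)
             + mu * collection_lm.get(w, 1e-6))  # small floor for unseen words
        total += math.log(p)
    return total


faq = "how do i reset my password".split()
cluster = [["forgot", "password"], ["change", "login", "password"]]
collection = lm(faq + [w for q in cluster for w in q] + ["weather", "today"])
query = ["forgot", "password"]
print(score(query, faq, cluster, collection) > score(query, faq, cluster, collection, lam=0.0))  # True
```

The word "forgot" never appears in the FAQ text, so without the cluster it receives only the small collection probability; the query log supplies the missing lexical link, which is the lexical-disagreement problem the abstract mentions for short documents.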
19.
Information Processing & Management, 2022, 59(1): 102734
Existing pseudo-relevance feedback (PRF) methods often split the original query into individual terms and select expansion terms based on term frequency, proximity, position, and similar features. This process may lose some of the contextual semantic information in the original query. In this work, based on the classic Rocchio model, we propose a probabilistic framework that incorporates sentence-level semantics into PRF via Bidirectional Encoder Representations from Transformers (BERT). First, we obtain the importance of terms at the term level. Then, we use BERT to interactively encode the query and each sentence in the feedback documents to obtain a semantic similarity score between the sentence and the query. Next, the semantic scores of different sentences are summed to give each term's score at the sentence level. Finally, we balance the term-level and sentence-level weights with adjustment factors and combine the terms with the top-k scores to form a new query for the next round of processing. We apply this method to three Rocchio-based models (Rocchio, PRoc2, and KRoc). A series of experiments is conducted on six official TREC data sets, and various evaluation metrics indicate that the improved models achieve significant improvements over the corresponding baseline models. Our proposed models provide a promising, feasible, and robust way to incorporate sentence-level semantics into PRF. A case study comparison and analysis shows that the expansion terms obtained from the proposed models are more semantically consistent with the query.
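The four steps above can be sketched as follows, assuming the query-sentence similarities have already been computed (by a BERT cross-encoder in the paper; here they are just given numbers). Each term's sentence-level score is the sum of the similarities of the feedback sentences that contain it, both score sets are max-normalized, and a single hypothetical factor `gamma` balances them before taking the top k. The normalization and the single balancing factor are assumptions; the paper's adjustment factors may be defined differently.

```python
from collections import defaultdict


def normalize(d):
    top = max(d.values(), default=0.0)
    return {t: v / top for t, v in d.items()} if top > 0 else dict(d)


def sentence_aware_terms(term_scores, sentences, sentence_sims, k=3, gamma=0.5, exclude=()):
    """Combine term-level and sentence-level evidence for expansion terms.

    term_scores:   term -> term-level weight (e.g. a Rocchio tf-idf weight)
    sentences:     tokenized sentences from the feedback documents
    sentence_sims: query-sentence semantic similarity for each sentence
                   (assumed precomputed, e.g. by a BERT cross-encoder)
    exclude:       terms to skip, typically the original query terms"""
    sent_scores = defaultdict(float)
    for tokens, sim in zip(sentences, sentence_sims):
        for t in set(tokens):  # a sentence contributes once per distinct term
            sent_scores[t] += sim
    ts, ss = normalize(term_scores), normalize(sent_scores)
    combined = {t: (1 - gamma) * ts.get(t, 0.0) + gamma * ss.get(t, 0.0)
                for t in set(ts) | set(ss) if t not in exclude}
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]


term_scores = {"java": 3.0, "coffee": 2.0, "island": 1.0}
sentences = [["java", "programming", "language"],
             ["java", "coffee", "beans"],
             ["programming", "language", "compiler"]]
sims = [0.9, 0.1, 0.8]  # query "java programming": sentence 2 is off-topic
print(sentence_aware_terms(term_scores, sentences, sims, exclude={"java"}))
# ['language', 'programming', 'coffee']
```

With `gamma=0` the top expansion terms are purely frequency-driven ("coffee", "island"), whereas the sentence-level evidence promotes terms from sentences that are semantically close to the query.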
20.
Term mismatch is a critical problem in information retrieval, and several techniques, such as query expansion, cluster-based retrieval, and dimensionality reduction, have been developed to resolve it. Of these techniques, this paper empirically studies query expansion and cluster-based retrieval. We examine the effect of parsimony in query expansion and the effect of the choice of clustering algorithm in cluster-based retrieval. In addition, we compare query expansion with cluster-based retrieval and evaluate their combinations for retrieval performance through experiments on seven NTCIR and TREC test collections.