Similar Documents
 20 similar documents found (search time: 46 ms)
1.
农志明  李自力 《现代情报》2012,32(10):66-71
Based on citation and co-citation relationships in the literature, this paper uses the CiteSpace II information visualization software to conduct a bibliometric analysis of publications on technology maturity research. It traces the origin of current technology maturity research, maps research hotspots and frontier areas, and, drawing on the knowledge base of key node documents and the software's built-in timeline view, produces a co-citation network map of technology maturity research. The paper analyzes the field's main research areas, methods, and trends, and uses the visualized map relationships to describe the development history and current state of technology maturity research, offering a new mode for studying the topic.

2.
This study proposes a novel extended co-citation search technique, which is graph-based document retrieval on a co-citation network containing citation context information. The proposed search expands the scope of the target documents by repetitively spreading the relationship of co-citation in order to obtain relevant documents that are not identified by traditional co-citation searches. Specifically, this search technique is a combination of (a) applying a graph-based algorithm to compute the similarity score on a complicated network, and (b) incorporating co-citation contexts into the process of calculating similarity scores to reduce the negative effects of an increasing number of irrelevant documents. To evaluate the search performance of the proposed search, 10 proposed methods (five representative graph-based algorithms applied to co-citation networks weighted with/without contexts) are compared with two kinds of baselines (a traditional co-citation search with/without contexts) in information retrieval experiments based on two test collections (biomedicine and computational linguistics articles). The experiment results showed that the normalized discounted cumulative gain (nDCG) scores of the proposed methods using co-citation contexts tended to be higher than those of the baselines. In addition, the combination of the random walk with restart (RWR) algorithm and the network weighted with contexts achieved the best search performance among the 10 proposed methods. Thus, it is clarified that the combination of graph-based algorithms and co-citation contexts is effective in improving the performance of co-citation search techniques, and that sole use of a graph-based algorithm is not enough to enhance search performance over the baselines.
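The best-performing combination above pairs random walk with restart with a context-weighted co-citation network. Below is a minimal sketch of RWR on a small weighted co-citation graph; the plain co-citation counts used as weights here would, in the paper's setting, be context-weighted. An illustration, not the authors' implementation:

```python
import numpy as np

def random_walk_with_restart(W, seed, alpha=0.15, tol=1e-8, max_iter=1000):
    """RWR: stationary scores of a walker that restarts at `seed` with prob. alpha.

    W: (n, n) nonnegative weight matrix of the co-citation network.
    """
    col_sums = W.sum(axis=0)
    # Column-normalize into transition probabilities; guard isolated nodes.
    P = np.divide(W, col_sums, out=np.zeros_like(W), where=col_sums > 0)
    n = W.shape[0]
    r = np.zeros(n); r[seed] = 1.0          # restart vector
    p = r.copy()
    for _ in range(max_iter):
        p_next = (1 - alpha) * P @ p + alpha * r
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy co-citation network: weights = number of co-citing papers (the paper
# would replace these with context-weighted counts).
W = np.array([[0, 3, 1, 0],
              [3, 0, 2, 1],
              [1, 2, 0, 4],
              [0, 1, 4, 0]], dtype=float)
scores = random_walk_with_restart(W, seed=0)
print(np.argsort(-scores))  # documents ranked by relatedness to document 0
```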

3.
4.
The importance of query performance prediction has been widely acknowledged in the literature, especially for query expansion, refinement, and interpolating different retrieval approaches. This paper proposes a novel semantics-based query performance prediction approach based on estimating semantic similarities between queries and documents. We introduce three post-retrieval predictors, namely (1) semantic distinction, (2) semantic query drift, and (3) semantic cohesion, based respectively on (1) the semantic similarity of a query to the top-ranked documents compared to the whole collection, (2) the estimation of non-query-related aspects of the retrieved documents using semantic measures, and (3) the semantic cohesion of the retrieved documents. We assume that queries and documents are modeled as sets of entities from a knowledge graph, e.g., DBpedia concepts, instead of bags of words. With this assumption, semantic similarities between two texts are measured based on the relatedness between entities, which is learned from the contextual information represented in the knowledge graph. We empirically illustrate these predictors' effectiveness, especially when term-based measures fail to quantify query performance prediction hypotheses correctly. We report our findings on the proposed predictors' performance and their interpolation on three standard collections, namely ClueWeb09-B, ClueWeb12-B, and Robust04. We show that the proposed predictors are effective across different datasets in terms of Pearson and Kendall correlation coefficients between the predicted performance and the average precision measured by relevance judgments.
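As an illustration of the first predictor idea (semantic distinction), the sketch below scores a query by how much closer its entity embeddings are to the top-ranked documents than to the collection at large. The synthetic embeddings and the aggregation (mean of best cosine matches) are assumptions for the sketch, not the paper's exact estimator:

```python
import numpy as np

def sim(q_ents, d_ents):
    """Mean over query entities of their best cosine match among document entities."""
    A = q_ents / np.linalg.norm(q_ents, axis=1, keepdims=True)
    B = d_ents / np.linalg.norm(d_ents, axis=1, keepdims=True)
    return float((A @ B.T).max(axis=1).mean())

def semantic_distinction(q_ents, top_docs, collection_docs):
    """Higher when the top-ranked documents are semantically closer to the
    query than the collection average -- one of the three predictor ideas."""
    top = np.mean([sim(q_ents, d) for d in top_docs])
    coll = np.mean([sim(q_ents, d) for d in collection_docs])
    return top - coll

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 50))                                    # 3 query entities
top = [rng.normal(size=(5, 50)) + q.mean(0) for _ in range(10)] # biased toward query
coll = [rng.normal(size=(5, 50)) for _ in range(100)]
print(semantic_distinction(q, top, coll))  # positive => predicted to perform well
```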

5.
孙海生 《现代情报》2019,39(4):134-142
[Purpose/Significance] Few studies have compared bibliographic coupling and co-citation relationships. This paper compares how the two relationships establish links between documents, examines the correlation between coupling/co-citation strength and document similarity, and analyzes which applications coupling analysis and co-citation analysis are each better suited for. [Method/Process] Based on complex network theory, bibliographic coupling and co-citation networks are constructed and their topological properties are compared empirically. QAP correlation analysis is used to study the relationships between coupling, co-citation, and document content similarity. [Result/Conclusion] The topological analysis shows that coupling establishes more widespread and more stable links between documents, which favors retrieving the majority of documents that have few citations, while co-citation establishes tighter links among highly cited documents, which favors retrieving and identifying a field's core documents. The QAP analysis shows that coupling strength correlates more strongly with document similarity, so coupling strength is more reliable when clustering documents to study research topics.
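Both relations in the study derive from the same citation matrix, which makes the contrast easy to state in code. A toy sketch (not the study's code) of how bibliographic coupling strength and co-citation strength are computed from a 0/1 citation matrix:

```python
import numpy as np

# C[i, j] = 1 if document i cites document j (toy citation matrix).
C = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]])

coupling   = C @ C.T    # (i, j): shared references  -> bibliographic coupling strength
cocitation = C.T @ C    # (i, j): shared citing docs -> co-citation strength
np.fill_diagonal(coupling, 0)
np.fill_diagonal(cocitation, 0)
print(coupling)
print(cocitation)
```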

6.
Using co-occurrence analysis and citation analysis on the CNKI database, this study retrieves the 2004-2012 literature on RFID applications in libraries and, with the CiteSpace II analysis software, draws knowledge maps of the hot topics, productive authors, productive institutions, productive journals, highly cited journals, and citations in the field of RFID adoption by Chinese libraries. These maps are compared and analyzed, followed by a discussion and outlook. The work helps clarify the current state, frontiers, and development trends of RFID adoption in Chinese libraries, identify existing problems, trace the field's development, and put forward reasonable recommendations.

7.
Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of terms used to sort feedback documents, thus possibly leading to a semantic gap between query representation and document representation. In this work, a PRF framework that combines relevance matching and semantic matching is proposed to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which the information of the exact terms and the semantic similarity between the query and document representations are calculated by bidirectional encoder representations from transformers (BERT); this mechanism reduces the text semantic gap by using the semantic information and improves the quality of feedback documents. Then, our proposed PRF framework is constructed to process the results of the first round of retrieval by using probability-based PRF methods and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of the mean average precision (MAP) and precision P at position 10 (P@10), and the results also highlight that using the combined relevance matching and semantic matching method is more effective than using relevance matching or semantic matching alone in terms of improving the quality of feedback documents.
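A schematic of the reranking step described above: interpolate the first-round lexical score with a semantic cosine between query and document vectors before selecting feedback documents. The paper derives these vectors from BERT; the sketch stands them in with precomputed embeddings and assumes both score types are normalized to comparable ranges:

```python
import numpy as np

def rerank(first_pass, q_vec, doc_vecs, lam=0.6):
    """Interpolate lexical and semantic evidence before selecting PRF documents.

    first_pass: list of (doc_id, lexical_score) from round-one retrieval,
                with scores assumed min-max normalized to [0, 1].
    doc_vecs:   dict doc_id -> dense embedding (BERT representations in the
                paper; any dense vectors work for the sketch).
    """
    q = q_vec / np.linalg.norm(q_vec)
    rescored = []
    for doc_id, lex in first_pass:
        d = doc_vecs[doc_id]
        sem = float(q @ (d / np.linalg.norm(d)))      # semantic match score
        rescored.append((doc_id, lam * lex + (1 - lam) * sem))
    return sorted(rescored, key=lambda x: -x[1])

rng = np.random.default_rng(1)
docs = {i: rng.normal(size=32) for i in range(5)}
first_pass = [(0, 0.9), (1, 0.8), (2, 0.7), (3, 0.6), (4, 0.5)]
top_k = rerank(first_pass, rng.normal(size=32), docs)[:3]  # feedback docs for PRF
print(top_k)
```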

8.
A co-citation cluster analysis of a three-year (1975–1977) cumulation of the Social Sciences Citation Index is described, and clusters of information science documents contained in this database are identified using a journal subset concentration measure. The internal structure of the information science clusters is analyzed in terms of co-citations among clusters, and external linkages to fields outside information science are explored. It is shown that clusters identified by the journal concentration method also cohere in a natural way through cluster co-citation. Conclusions are drawn regarding the relationship of information science to the social sciences, and suggestions are made on how these data might be used in planning an agenda for research in the field.

9.
International research collaboration (IRC) has been increasingly important as an emerging area of innovation studies. This study reviews the intellectual base, main research trajectories and intellectual communities of the IRC research domain over the period 1957–2015. It integrates qualitative review and three quantitative analyses including co-citation network analysis, main path analysis and bibliographic coupling analysis. The results show that the IRC research has gone through three phases, namely, “emergence” (1957–1991), “fermentation” (1992–2005) and “take-off” (2006–2015) phases. The co-citation network analysis confirms that the IRC research field has been developed under the influence of two pioneering studies related to bibliometrics research. The main research trajectories in IRC studies over the three development phases and over the whole period are identified based on the main path analysis, which shows that co-authorship analysis is the main research method in IRC studies. A bibliographic coupling analysis suggests that the whole IRC research domain can be classified into five distinct intellectual areas: drivers of IRC, IRC patterns, IRC effects, IRC networks and IRC measurement. Seven topics for future research are also identified.
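Main path analysis, used above to extract research trajectories, is typically computed by weighting each edge of the citation DAG with its search path count (SPC) and then greedily following the heaviest edges. A self-contained sketch under that standard formulation (the study's exact variant is an assumption):

```python
from collections import defaultdict

# Toy citation DAG: edge u -> v means "u is cited by v" (knowledge flows u -> v).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("C", "E")]
succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v); pred[v].append(u)
nodes = {n for e in edges for n in e}

order, indeg = [], {n: len(pred[n]) for n in nodes}   # topological order (Kahn)
queue = [n for n in nodes if indeg[n] == 0]
while queue:
    n = queue.pop()
    order.append(n)
    for m in succ[n]:
        indeg[m] -= 1
        if indeg[m] == 0:
            queue.append(m)

n_minus = {n: 1 if not pred[n] else 0 for n in nodes}  # #paths from sources to n
for n in order:
    for m in succ[n]:
        n_minus[m] += n_minus[n]
n_plus = {n: 1 if not succ[n] else 0 for n in nodes}   # #paths from n to sinks
for n in reversed(order):
    for m in succ[n]:
        n_plus[n] += n_plus[m]

spc = {(u, v): n_minus[u] * n_plus[v] for u, v in edges}  # search path counts
cur = max((n for n in nodes if not pred[n]),
          key=lambda s: max((spc[(s, v)] for v in succ[s]), default=0))
path = []
while succ[cur]:                       # greedily follow the highest-SPC edge
    nxt = max(succ[cur], key=lambda v: spc[(cur, v)])
    path.append((cur, nxt)); cur = nxt
print(spc, path)
```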

10.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where documents relevant to a given query might not be retrieved simply because different terminology is used to describe the same concepts. As such, semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not lead to improved retrieval performance over keyword-based search, its consideration enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but both approaches face limitations such as increased query processing time. In this paper, we propose to use neural embedding-based representations of terms, semantic entities, semantic types, and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including ClueWeb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices, relaxing the requirement to explicitly observe the posting-list key in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, reducing the index size and query processing time, and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves with our proposed method, retrieval effectiveness also retains competitive performance compared to the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
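One concrete way to realize such an embedding-based inverted index is an IVF-style layout: quantize all vectors into cluster "terms", keep a posting list per cluster, and probe only the nearest clusters at query time. A generic sketch of that idea, not the paper's exact structure; the cluster count and probe depth are arbitrary assumptions:

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 64))   # embeddings of terms, entities, types,
                                         # and documents in one shared space

# "Vocabulary" of the semantic index = cluster centroids; posting lists map
# a centroid id to the documents quantized to it.
km = KMeans(n_clusters=32, n_init=10, random_state=0).fit(doc_vecs)
postings = defaultdict(list)
for doc_id, key in enumerate(km.labels_):
    postings[key].append(doc_id)

def search(q_vec, n_probe=3, top_k=5):
    """Probe the n_probe nearest centroids, then rank only their postings."""
    dists = np.linalg.norm(km.cluster_centers_ - q_vec, axis=1)
    candidates = [d for key in np.argsort(dists)[:n_probe] for d in postings[key]]
    scores = doc_vecs[candidates] @ q_vec
    return [candidates[i] for i in np.argsort(-scores)[:top_k]]

print(search(rng.normal(size=64)))
```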

11.
A visualized network of leading scholars and core documents in science and technology policy research
栾春娟  侯海燕 《科学学研究》2008,26(6):1164-1167
Citation-based bibliometric methods are often used to identify the leading scholars and core documents of a research field. Taking all citation data from Research Policy, an authoritative international journal of science and technology policy studies, as the sample, this study identifies the leading scholars and core documents of international science and technology policy research through author co-citation analysis and document co-citation analysis. On this basis, information visualization techniques are used to draw a visualized network of these leading scholars and core documents, providing an important reference for science and technology policy researchers.

12.
Co-citation analysis: an effective method for studying the structure and characteristics of a discipline and its literature
赵党志 《情报杂志》1993,12(2):36-42
Using cluster analysis and multidimensional scaling, a document co-citation analysis of the 1987 agricultural science literature was carried out. Twenty-eight articles were selected for analysis according to their citation counts and co-citation occurrences; multidimensional scaling projected these 28 documents from the high-dimensional citation space onto a two-dimensional scatter plot, and cluster analysis grouped the points into clusters. Based on the information these clusters provide, the structure and characteristics of agricultural science and its literature are analyzed and discussed.
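The pipeline described, co-citation counts converted to dissimilarities, projected to two dimensions by multidimensional scaling, then grouped by clustering, can be sketched as follows with synthetic data standing in for the 1987 co-citation counts; the dissimilarity transform and cluster count are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
n = 28                                       # 28 selected articles, as in the study
cocite = rng.integers(0, 20, size=(n, n))    # toy co-citation counts
cocite = np.triu(cocite, 1); cocite += cocite.T

# Convert co-citation strength to a dissimilarity: frequently co-cited
# documents should land close together on the two-dimensional map.
dissim = 1.0 / (1.0 + cocite)
np.fill_diagonal(dissim, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)   # 2-D document map
labels = AgglomerativeClustering(n_clusters=4).fit_predict(coords)
for c in range(4):
    print(f"cluster {c}:", np.where(labels == c)[0])
```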

13.
Transfer learning utilizes labeled data available from some related domain (source domain) to achieve effective knowledge transfer to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships existing among the documents. In this paper, we propose a novel cross-domain document classification approach called Link-Bridged Topic model (LBT). LBT consists of two key steps. First, LBT utilizes an auxiliary link network to discover the direct or indirect co-citation relationships among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationships are leveraged to bridge the gap across different domains. Second, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on the assumption that the documents of the source and target domains share some common topics from the point of view of both content information and link structure. By mapping the data of both domains into the latent topic spaces, LBT encodes the knowledge about domain commonality and difference as shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as with content and link statistics. The shared topics then act as the bridge that facilitates knowledge transfer from the source to the target domain. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.

14.
This paper analyzes the bibliographic references made by all papers published by ACM in 2006. Both an automatic classification of all references and a human classification of a random sample of them showed that around 40% of the references are to conference proceedings papers, around 30% are to journal papers, and around 8% are to books. Among the other document types, standards and RFCs account for 3% of the references, technical and other reports for 4%, and other Web references for 3%. Among the documents cited at least 10 times by the 2006 ACM papers, 41% are conference papers, 37% are books, and 16% are journal papers.
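The automatic classification mentioned above could be approximated with simple pattern heuristics over raw reference strings. A toy sketch of such a rule-based classifier; the patterns and categories are assumptions for illustration, not ACM's actual classifier:

```python
import re

# Heuristic patterns for the reference types counted in the study,
# checked in order; first match wins.
RULES = [
    ("conference", re.compile(r"proceedings|\bproc\.|conference|workshop|symposium", re.I)),
    ("journal",    re.compile(r"\bjournal\b|\btransactions\b|\bvol\.\s*\d+", re.I)),
    ("rfc",        re.compile(r"\bRFC\s*\d+", re.I)),
    ("report",     re.compile(r"technical report|tech\.\s*rep\.", re.I)),
    ("web",        re.compile(r"https?://", re.I)),
]

def classify(ref):
    for label, pattern in RULES:
        if pattern.search(ref):
            return label
    return "book/other"                      # fallback bucket

refs = [
    "J. Smith. Fast joins. In Proceedings of SIGMOD, 2005.",
    "A. Doe. Query models. Journal of the ACM, vol. 12, 1999.",
    "R. Fielding et al. HTTP/1.1. RFC 2616, 1999.",
    "See https://example.org/spec for details.",
]
counts = {}
for r in refs:
    counts[classify(r)] = counts.get(classify(r), 0) + 1
print(counts)
```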

15.
The goal of this paper is to present a visual mapping of intellectual structure in two dimensions and to identify the subfields of the technology acceptance model through co-citation analysis. All citation documents are drawn from the ISI Web of Knowledge database between 1989 and 2006. Using a sequence of statistical analyses including factor analysis, multidimensional scaling, and cluster analysis, we identified three main trends: task-related systems, e-commerce systems, and hedonic systems. The findings yield implications for both academic and practical concerns.

16.
This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.

17.
In this paper, we present a well-defined general matrix framework for modelling Information Retrieval (IR). In this framework, collections, documents and queries correspond to matrix spaces. Retrieval aspects, such as content, structure and semantics, are expressed by matrices defined in these spaces and by matrix operations applied on them. The dualities of these spaces are identified through the application of frequency-based operations on the proposed matrices and through the investigation of the meaning of their eigenvectors. This allows term weighting concepts used for content-based retrieval, such as term frequency and inverse document frequency, to translate directly to concepts for structure-based retrieval. In addition, concepts such as PageRank, authorities and hubs, determined by exploiting the structural relationships between linked documents, can be defined with respect to the semantic relationships between terms. Moreover, this mathematical framework can be used to express classical and alternative evaluation measures, involving, for instance, the structure of documents, and to further explain and relate IR models and theory. The high level of reusability and abstraction of the framework leads to a logical layer for IR that makes system design and construction significantly more efficient, and thus, better and increasingly personalised systems can be built at lower costs.
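Two of the dualities mentioned above in miniature: term weighting (tf-idf) expressed as matrix operations on a term-document matrix, and PageRank obtained as the dominant eigenvector of a damped link matrix via power iteration. A schematic reading of the framework, not its formal definition:

```python
import numpy as np

# Term-document frequency matrix: rows = terms, columns = documents.
TD = np.array([[3, 0, 1],
               [1, 2, 0],
               [0, 1, 4]], dtype=float)

# tf-idf as elementwise matrix operations in the document space.
n_docs = TD.shape[1]
df = (TD > 0).sum(axis=1)                # document frequency per term
idf = np.log(n_docs / df)
tfidf = TD * idf[:, None]                # elementwise term weighting

# PageRank: dominant eigenvector of the damped transition matrix,
# found by power iteration. L[i, j] = 1 if document j links to document i.
L = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
P = L / np.maximum(L.sum(axis=0), 1)     # column-stochastic link matrix
d, n = 0.85, L.shape[0]
r = np.full(n, 1 / n)
for _ in range(100):
    r = d * P @ r + (1 - d) / n          # power iteration step
print(tfidf, r / r.sum(), sep="\n")
```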

18.
19.
Based on 24 representative journals in library and information science in China, this paper applies journal co-citation analysis to 1999-2009 co-citation data retrieved from CNKI, including cluster analysis, multidimensional scaling, and factor analysis, and derives the relationships and structure of these journals and their standing within library and information science.

20.
This paper proposes a method to improve the retrieval performance of the vector space model (VSM) in part by utilizing user-supplied information about documents that are relevant to the query in question. In addition to the user's relevance feedback, information such as original document similarities is incorporated into the retrieval model, which is built using a sequence of linear transformations. High-dimensional and sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested on two test collections, the Medline collection and the Cranfield collection. In order to train the model, multiple partitions were created for each collection. Improvements in average precision, averaged over all partitions and compared with the latent semantic indexing (LSI) model, are 20.57% (Medline) and 22.23% (Cranfield) on the training data sets, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system for the long term, so that it can be used later.
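The SVD step at the core of the method, shown here as plain latent semantic indexing: factor the term-document matrix, keep the top k dimensions, and fold a query into the latent space for cosine ranking. The paper applies relevance-feedback-derived linear transformations before this step, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(0.3, size=(500, 60)).astype(float)  # toy term-document matrix

k = 20
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T              # truncated, rank-k factors
docs = Vk                                           # document coords in latent space

def fold_in(q):
    """Map a term-space query into the latent space: q_hat = q^T U_k S_k^{-1}."""
    return (q @ Uk) / sk

q = np.zeros(500); q[[3, 40, 100]] = 1.0            # query mentions terms 3, 40, 100
qh = fold_in(q)
cos = docs @ qh / (np.linalg.norm(docs, axis=1) * np.linalg.norm(qh) + 1e-12)
print(np.argsort(-cos)[:5])                         # top-5 documents by cosine
```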
