Similar literature
 20 similar documents found (search time: 281 ms)
1.
Patent prior art search is a type of search in the patent domain that looks for documents describing earlier work related to a patent application. Its goal is to check whether the idea in the application is novel. Vocabulary mismatch is one of the main problems of patent retrieval and results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address vocabulary mismatch. We propose a method for query model estimation that utilizes the citation links in a pseudo relevance feedback set. We first build a topic-dependent citation graph, starting from the initially retrieved set of feedback documents and using their citation links to expand the set. We identify the important documents in this graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights, and we use these terms to expand the original query. We use the CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves recall over a state-of-the-art baseline that uses the link-based structure of the citation graph but not the term distribution of the cited documents.

2.
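[Purpose/Significance] Citing and cited documents often resemble each other in some way; revealing the mechanism behind this phenomenon helps deepen understanding of the nature of citation. [Method/Process] Using an exponential random graph model, we conduct an empirical analysis of the library and information science field to reveal how document similarity affects citation relationships. [Results/Conclusions] The empirical study finds a significant tendency toward similarity between citing and cited documents at the network-structure, institution, and journal levels. Specifically, citation relationships tend to be embedded in transitive triangular structures; documents from the same institution or the same journal are more likely to cite one another; and documents from countries with a dominant position in the discipline are more likely to cite one another. The results show that social proximity is an important mechanism in the formation of citation behavior and reflect the social nature of citation preferences.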

3.
Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or support interactions between users and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of classification algorithms. We have experimented with three link-based bibliometric measures, co-citation, bibliographic coupling and Amsler, on three different document collections: a digital library of computer science papers, a web directory and an on-line encyclopedia. Results show that both hyperlink and citation information can be used to learn reliable and effective classifiers based on a kNN classifier. In one of the test collections used, we obtained improvements of up to 69.8% in macro-averaged F1 over the traditional text-based kNN classifier, considered as the baseline in our experiments. We also present alternative ways of combining bibliometric-based classifiers with text-based classifiers. Finally, we conducted studies to analyze the situations in which the bibliometric-based classifiers failed and show that in such cases it is hard to reach consensus regarding the correct classes, even for human judges.
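The three link-based measures named in this abstract can be illustrated on a toy citation graph. The following is a minimal sketch, not the authors' implementation: the function names, the toy graph, and in particular the Jaccard-style normalization used for the Amsler measure are assumptions made for illustration, and the abstract does not say how the measures were normalized.

```python
from typing import Dict, Set

def invert(cites: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each document to the set of documents that cite it."""
    cited_by: Dict[str, Set[str]] = {}
    for src, targets in cites.items():
        for dst in targets:
            cited_by.setdefault(dst, set()).add(src)
    return cited_by

def co_citation(a: str, b: str, cited_by: Dict[str, Set[str]]) -> int:
    """Number of documents that cite both a and b (raw count)."""
    return len(cited_by.get(a, set()) & cited_by.get(b, set()))

def bibliographic_coupling(a: str, b: str, cites: Dict[str, Set[str]]) -> int:
    """Number of documents that both a and b cite (raw count)."""
    return len(cites.get(a, set()) & cites.get(b, set()))

def amsler(a: str, b: str, cites: Dict[str, Set[str]],
           cited_by: Dict[str, Set[str]]) -> float:
    """Overlap of the combined citing/cited neighbourhoods of a and b.

    Assumption: normalized by the size of the union (Jaccard style).
    """
    na = cites.get(a, set()) | cited_by.get(a, set())
    nb = cites.get(b, set()) | cited_by.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Hypothetical toy graph: document -> documents it cites.
cites = {"A": {"X", "Y"}, "B": {"Y", "Z"}, "C": {"A", "B"}, "D": {"A"}}
cited_by = invert(cites)

cc = co_citation("A", "B", cited_by)        # shared citing documents
bc = bibliographic_coupling("A", "B", cites)  # shared references
am = amsler("A", "B", cites, cited_by)      # combined neighbourhood overlap
print(cc, bc, am)
```

In a kNN setting such as the one the abstract describes, scores like these would stand in for the text similarity between a test document and the labelled training documents.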

4.
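Introduces the concepts of citable documents and non-citable documents (NCDs). A statistical analysis of the citation characteristics of NCDs confirms that NCDs can in fact be cited and may even reach very high citation counts. A bibliometric analysis examines the contribution of NCDs to journal impact factors.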

5.
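Current application of performance evaluation methods for international academic journal databases   Total citations: 1, self-citations: 0, citations by others: 1
Performance evaluation methods applied to international academic journal databases over the past 10 years are grouped into nine types: counting published research output, analyzing use by specific readers, citation analysis, analyzing correlation with other operational processes, librarian evaluation, reader questionnaire surveys, access and full-text download counts, average cost per article used, and collection structure evaluation. Analysis of application cases shows that all nine methods are basically feasible and offer practical guidance. When evaluating journal databases, the methods should be applied in combination, and the drawbacks of the cost-per-article statistic deserve particular attention.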

6.
The standard impact factor allows one to compare scientific journals only within particular scientific subjects. To overcome this limitation, another citation indicator, the thematically weighted impact factor (TWIF), is proposed. This indicator allows one to compare journals of various subjects and takes into account that a journal may belong to several subjects. Calculating the indicator requires information on the thematic headings of a journal and the value of its standard impact factor. The TWIF, calculated according to the citation index of Journal Citation Reports, is investigated in this article.

7.
A new link-based document ranking framework is devised. At its heart is a content- and time-sensitive random literature explorer designed to model the behaviour of readers of scientific documents more accurately. In particular, our ranking framework dynamically adjusts its random walk parameters according to both the content and the age of encountered documents, thus incorporating the diversity of topics and how they evolve over time into the score of a scientific publication. Our random walk framework results in a ranking of scientific documents that is shown to be more effective in facilitating literature exploration than PageRank, measured against a proxy gold standard based on papers' potential usefulness in facilitating later research. One of its strengths lies in its practical value in reliably retrieving promising papers and placing them at the top of its ranking.

8.
The rise of software as a research object is mirrored by increasing interest in quantitative studies of scientific software. However, inconsistent citation practices have led most existing studies of this type to base their analysis of software impact on software name mentions, as identified in full-text publications. Despite its limitations, citation data exists in much greater quantities and covers a broader array of scientific fields than full-text data, and thus can support investigations with much wider scope. This paper aims to analyze the extent to which citation data can be used to reconstruct the impact of software. Specifically, we identify the variety of citable objects related to the lme4 R package and examine how the package's impact is dispersed across these objects. Our results shed light on a little-discussed challenge of using citation data to measure software impact: even within the category of formal citation, the same software object might be cited in different forms. We consider the implications of this challenge and propose a method to reconstruct the impact of lme4 through its citations nonetheless.

9.
This paper gives the results of a scientometric analysis of foreign publications by Kazakh authors indexed in the SCOPUS database in 1991–2008. The publication activity amounts to 3883 documents with a total of 10 132 citations. The average share of Kazakh publications in the total worldwide flow is 0.017%. The citation rate of publications was found to have grown significantly since the 1996–2000 period. Most articles were written in English and published in periodicals. The main themes of the publications are physics and chemistry. The leading foreign partners of Kazakhstan in the scientific sphere were determined; Kazakh-Russian scientific cooperation is developing most fruitfully.

10.
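A research framework for scientific knowledge diffusion   Total citations: 2, self-citations: 1, citations by others: 1
Through a review of the literature on scientific knowledge diffusion, this paper builds a research framework for the topic and reviews in detail the objects of study, the representation of diffusion relationships, measurement indicators, and diffusion models. The objects of diffusion include journals, disciplines, and researchers, and the diffusion process is mainly represented by document citation and co-authorship relationships. In empirical work, citation analysis and citation network analysis based on document citation relationships are the mainstream methods for studying scientific knowledge diffusion. Measurement indicators can be divided by granularity into three levels: article, journal, and discipline. Existing research on diffusion models is mainly qualitative, with little quantitative analysis; a common approach is to borrow mature models from other disciplines.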

11.
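[Purpose/Significance] To determine the strengths, weaknesses, and suitable use cases of the various methods for extracting key documents from citation relationships, so that users can quickly identify the important documents of a field and grasp its overall picture. [Method/Process] Based on citation relationships, and from three perspectives (citation counts, citation networks, and co-citation networks), we examine methods for identifying a field's key documents with software such as HistCite and CiteSpace, and compare the methods by validating them on data from the same source. [Results/Conclusions] The citation-count method is better suited to extracting key documents in terms of which documents in a field have had a major influence on the scientific progress of the literature as a whole; the resulting set of key documents is highly dispersed. The citation-network method is better suited to extracting the key documents in a field's development from the perspective of its research dynamics; the resulting set is clearly concentrated. The co-citation-network method is better suited to extracting key documents from the perspective of a field's research foundations; the resulting set is fairly concentrated, and the method can uncover a large number of key documents missed in the original data collection.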

12.
The level of consensus in science has traditionally been measured by a number of different methods. The variety is important, as each method measures different aspects of science and consensus. Citation analytical studies have previously measured the level of consensus using the scientific journal as their unit of analysis. To produce a more fine-grained citation analysis, one needs to study consensus formation at an even more detailed level, i.e. the scientific document or article. To do so, we have developed a new technique that measures consensus by aggregated bibliographic couplings (ABC) between documents. The advantages of the ABC technique are demonstrated in a study of two selected disciplines in which the levels of consensus are measured using the proposed technique.

13.
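Ding Wenyao, Li Jian, Han Yi. Library and Information Service (《图书情报工作》), 2019, 63(22): 118-128
[Purpose/Significance] Exploring the characteristics and patterns of scientific data citation in journal papers not only helps describe how a discipline uses scientific data but also reveals the data citation patterns in the expression of scholarly results. [Method/Process] Taking as a sample the papers published in the first issues of 2017 and 2018 of six Chinese library and information science journals, and drawing on the citation elements of the national standard 《信息技术科学数据引用》 (Information Technology: Scientific Data Citation), we used content analysis to code the scientific data citation behavior of the sample papers along nine dimensions, and applied statistical methods to describe the characteristics of scientific data citation in library and information science journal papers and to explore the associations between features in different dimensions. [Results/Conclusions] Library and information science journal papers widely cite statistical and compiled scientific data from China and abroad, and cite a large amount of individual research data from journal papers. The way scientific data citations are marked corresponds to some extent to the type of scientific data, but the diverse marking styles lack uniformity. Second-hand citation is fairly prominent, and its extent is related to the type of data creator.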

14.
The cost effectiveness and quality of full-text journals are analyzed for four prominent online aggregated journal packages: EBSCOhost Academic Search FullTEXT, UMI Proquest Direct Periodicals Research II, IAC's Expanded Academic ASAP, and H.W. Wilson's OmniFile. Price data from EBSCO's Librarians' Handbook are used to assess the total and average value of social sciences journals in each package. Quality of social sciences journal coverage is compared based on citation impact factors as recorded in Journal Citation Reports, Social Sciences Edition.

15.
Automatic detection of source code plagiarism is an important research field both for the commercial software industry and within the research community. Existing methods of plagiarism detection primarily involve exhaustive pairwise document comparison, which does not scale well for large software collections. To achieve scalability, we approach the problem from an information retrieval (IR) perspective. We retrieve a ranked list of candidate documents in response to a pseudo-query representation constructed from each source code document in the collection. The challenge in source code document retrieval is that the standard bag-of-words (BoW) representation model for such documents is likely to result in many false positives being retrieved, because of the use of identical programming-language-specific constructs and keywords. To address this problem, we make use of an abstract syntax tree (AST) representation of the source code documents. While the IR approach is efficient, it is essentially unsupervised in nature. To further improve its effectiveness, we apply a supervised classifier (pre-trained with features extracted from sample plagiarized source code pairs) on the top-ranked retrieved documents. We report experiments on the SOCO-2014 dataset comprising 12K Java source files with almost 1M lines of code. Our experiments confirm that the AST-based approach produces significantly better retrieval effectiveness than a standard BoW representation, i.e., the AST-based approach is able to identify a higher number of plagiarized source code documents at top ranks in response to a query source code document. The supervised classifier is shown to effectively filter and thus further improve the ranked list of retrieved candidate plagiarized documents.
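The contrast this abstract draws between a bag-of-words and an AST-based representation can be made concrete with a small sketch. It is only an illustration under stated assumptions: the paper works on Java files, whereas this example uses Python's standard `ast` module on two tiny Python snippets, and the helper names (`token_bag`, `ast_bag`, `cosine`) and the choice of cosine similarity over node-type counts are hypothetical, not the authors' feature set.

```python
import ast
import math
import re
from collections import Counter

def token_bag(source: str) -> Counter:
    """Bag of lexical tokens: identifiers/keywords plus single symbols."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", source))

def ast_bag(source: str) -> Counter:
    """Bag of AST node type names, which ignores identifier spellings."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical pair: the second snippet is the first with identifiers renamed.
original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def acc(items):\n    r = 0\n    for i in items:\n        r += i\n    return r\n"

token_sim = cosine(token_bag(original), token_bag(renamed))
ast_sim = cosine(ast_bag(original), ast_bag(renamed))
print(f"token similarity: {token_sim:.3f}, AST similarity: {ast_sim:.3f}")
```

Renaming identifiers lowers the token-level similarity but leaves the node-type counts unchanged, which is one reason a structural representation can be more robust to this kind of disguise. The opposite failure described in the abstract, where shared keywords inflate BoW scores between unrelated files, is not shown here.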

16.
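To reveal how the demand for archives differs across fields of historical research, and to analyze how strongly different types of archives support historical research, this paper draws on the archive citation records in history papers published in the journal Historical Research (《历史研究》) from 2013 to 2017. Using citation analysis from the perspectives of both the citing papers and the cited archives, it examines the distribution of Chinese historians' archive use by subject, period, type, medium, region, and institution. On this basis it proposes measures for archival institutions that build collections and provide services for historians, covering the compilation of archival documents, the construction of full-text archive databases, and the balancing of demand for archive use.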

17.
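[Purpose/Significance] Digital humanities, an emerging interdisciplinary research field, has drawn wide attention from many disciplines. This paper examines the knowledge diffusion of digital humanities within library, information, and archival science, as a reference for the development of both. [Method/Process] With the Web of Science Core Collection as the data source, we analyze two data sets, the target literature on digital humanities in library, information, and archival science and the literature from that discipline citing the target literature, along two dimensions: journal diffusion and topic diffusion. Journal diffusion is analyzed with two indicators: the two-way journal communication theory borrowed from communication studies, and an improved journal citation coefficient that removes the influence of time. Topic diffusion is analyzed with keyword-based word clouds and keyword-based clustering. [Results/Conclusions] Thirteen journals have a two-way diffusion value greater than 1, and JOURNAL OF INFORMETRICS shows the best two-way diffusion performance. The journals ranking highest by journal citation coefficient within the discipline include JOURNAL OF INFORMETRICS, SCIENTOMETRICS, and JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY. The journal citation coefficients all fluctuate, but the top 10 journals mostly peaked in 2006, 2012, and 2016, indicating greater influence in those years. Four topics (the state of digital humanities research, the integration of digital humanities with libraries, the application of digital humanities in teaching, and incentive measures for digital humanities) have reached scale in both the target and the citing literature and can be regarded as the key research areas of digital humanities in library, information, and archival science.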

18.
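Academic impact of scientific papers with high Altmetrics scores   Total citations: 9, self-citations: 0, citations by others: 9
A "fairness test" method is introduced to remove the effect of the time window on citation counts. Papers with high Altmetrics scores serve as the sample, and for each one the papers immediately preceding and following it in the same issue of the same journal serve as references. Altmetric.com and Web of Science were used to obtain the Altmetric scores, underlying data values, and citation counts of 273 sample and reference papers. Comparative analysis shows that Altmetrics and citation counts reflect different directions of reader attention; that among the underlying data sources, mass media have the most pronounced effect on the Altmetric score; and that papers with high Altmetrics scores also have relatively high academic impact. As an early indicator, a high Altmetrics score can to some extent be regarded as a signal that an article will become highly cited.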

19.
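Discusses how to integrate, in a digital library's literature retrieval platform, clustering of search results together with associated recommendations of related documents, related authors and research institutions, and related terms, so as to help users improve both precision and recall. The clustering and recommendation results are also visualized graphically to further improve user satisfaction.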

20.
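This paper introduces an automatic document indexing and retrieval technique, called the "semantic vector space model", that is based on syntactic analysis and semantic case structures. In this model, natural-language documents and search queries are both represented as semantic matrices. By computing the similarity of semantic matrices, the retrieval system can predict the relevance between a document and a given query and thereby retrieve relevant documents. Preliminary experiments show that when documents and queries are long, especially when an original document is used as the query sample, this technique improves on Cornell University's SMART system in recall, precision, and the effectiveness of relevance ranking.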
