Similar Literature
20 similar documents found.
1.
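Building on a review of current author name disambiguation methods, and targeting the characteristics of Chinese bibliographic record data, this work recasts author name disambiguation as a classification problem over the papers of authors who share the same name. It proposes a rule- and similarity-based framework model for name disambiguation and gives detailed algorithmic descriptions of its splitting rules and merging rules. Experiments on datasets of same-name authors from three disciplines show that the model effectively improves the accuracy of author name disambiguation.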
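
The abstract above names splitting rules and merging rules without spelling them out. As a rough illustration only, the sketch below assumes one hypothetical splitting rule (papers with different affiliation strings start in separate clusters) and one hypothetical merging rule (the pair of clusters whose co-author sets have the highest Jaccard similarity is merged, repeatedly, while that similarity is at or above a threshold). The record fields `affiliation` and `coauthors`, the 0.3 threshold, and all function names are assumptions, not the authors' rules.

```python
from itertools import combinations

def split_by_affiliation(papers):
    """Hypothetical splitting rule: one initial cluster per distinct affiliation string."""
    clusters = {}
    for paper in papers:
        clusters.setdefault(paper["affiliation"], []).append(paper)
    return list(clusters.values())

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def coauthor_set(cluster):
    """All co-author names appearing on any paper in the cluster."""
    return set().union(*(paper["coauthors"] for paper in cluster))

def merge_similar(clusters, threshold=0.3):
    """Hypothetical merging rule: repeatedly merge the most similar pair of
    clusters (by co-author Jaccard) while that similarity is >= threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            s = jaccard(coauthor_set(clusters[i]), coauthor_set(clusters[j]))
            if s > best:
                best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

def disambiguate(papers, threshold=0.3):
    return merge_similar(split_by_affiliation(papers), threshold)

papers = [
    {"id": 1, "affiliation": "Univ A", "coauthors": {"Li Ming", "Wang Fang"}},
    {"id": 2, "affiliation": "Univ A, Dept X", "coauthors": {"Li Ming"}},
    {"id": 3, "affiliation": "Univ B", "coauthors": {"Zhao Lei"}},
]
groups = disambiguate(papers)
print([[p["id"] for p in g] for g in groups])  # [[1, 2], [3]]
```

Here papers 1 and 2 start in different clusters because their affiliation strings differ, then merge because they share a co-author; the splitting step alone would over-split, which is why a rule-based split is paired with a similarity-based merge.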

2.
Research Progress in Author Name Disambiguation
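Author name ambiguity lowers the accuracy of literature retrieval and web search, degrades the quality of collected bibliographic data, and obstructs analysis and evaluation at the level of individual authors. Researchers in China and abroad have proposed a range of methods to address the problem, including manual identification, correction of database fields, and machine-learning-based disambiguation. This article summarizes the problems facing author name disambiguation, analyzes the characteristics and shortcomings of current methods, and points out future directions for the field, particularly for disambiguating the names of Chinese authors.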

3.
Web person search is one of the most common activities of Internet users. Recently, a large body of work has been reported on applying various NLP techniques to person name disambiguation in large web document collections, with the main focus on English and a few other major languages. This article reports on knowledge-poor methods for tackling person name matching and lemmatization in Polish, a highly inflectional language with a complex declension paradigm for person names. These methods mainly apply well-established string distance metrics, some new variants thereof, automatically acquired simple suffix-based lemmatization patterns, and combinations of these techniques. We also carried out initial experiments with techniques that use the context in which person names appear. Results of numerous experiments are presented. An evaluation on a data set extracted from a corpus of online news articles revealed that lemmatization accuracy above 90% seems difficult to achieve, whereas combining string distance metrics with suffix-based patterns yields 97.6–99% accuracy on the name matching task. Interestingly, integrating some basic techniques that try to exploit the local context in which the names appear yielded no significant additional gain. Although our explorations focused on Polish, we believe the work presented in this article offers practical guidelines for tackling the same problem in other highly inflectional languages with similar phenomena.
Marcin Sydow
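
The abstract combines string distance metrics with suffix-based lemmatization patterns. A minimal sketch of that combination: strip one inflectional suffix using a small pattern list, then compare the results by normalized Levenshtein distance. The pattern list below is hand-written for illustration (the article acquires its patterns automatically), and the minimum stem length, the 0.2 distance cutoff, and all function names are assumptions.

```python
# Illustrative suffix -> replacement patterns for Polish surname inflection.
# Hand-written assumptions, longest suffix first; not the article's learned patterns.
SUFFIX_PATTERNS = [
    ("skiego", "ski"),  # genitive/accusative: Kowalskiego -> Kowalski
    ("skiemu", "ski"),  # dative: Kowalskiemu -> Kowalski
    ("skim", "ski"),    # instrumental/locative: Kowalskim -> Kowalski
    ("owi", ""),        # dative: Nowakowi -> Nowak
    ("iem", ""),        # instrumental: Nowakiem -> Nowak
    ("a", ""),          # genitive/accusative: Nowaka -> Nowak
]
MIN_STEM = 3  # do not strip a suffix if fewer than 3 characters would remain

def lemmatize(name: str) -> str:
    """Apply the first matching suffix pattern to the lowercased name."""
    word = name.lower()
    for suffix, replacement in SUFFIX_PATTERNS:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)] + replacement
    return word

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def names_match(x: str, y: str, max_ratio: float = 0.2) -> bool:
    """Lemmatize both names, then accept if normalized edit distance <= max_ratio."""
    a, b = lemmatize(x), lemmatize(y)
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= max_ratio

print(names_match("Kowalskiego", "Kowalski"))  # True
print(names_match("Nowakowi", "Nowaka"))       # True
print(names_match("Nowak", "Kowalski"))        # False
```

Lemmatizing before measuring distance lets two differently inflected forms of one surname collapse to the same string, while the distance cutoff still tolerates small spelling variation.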

4.
Over the past decade, policy has brought a large number of overseas elites back to China. However, name ambiguity has hindered investigation of the relationship between their mobility and their research performance. Using the ORCID website and applying causal inference strategies, we investigated the academic performance, in the Web of Science database, of 2,489 China-connected scientists in terms of their job mobility: 1,388 scientists who moved to China (the treatment group) and 1,101 scientists who could have moved to China (the control group). The results show, first, that scientists who moved to China follow a new growth pattern in which both their productivity and their rate of serving as corresponding author grew faster than before; however, they contributed fewer papers to the four top journals Nature, Science, Cell, and PNAS. Second, the scientists' research performance is affected by when they moved to China, the countries they moved from, and the disciplines of their publications. Last, China now maintains symmetrical inflow and outflow patterns with most countries, especially developed countries in Europe and North America, with only a few exceptions (e.g., Pakistan).

5.
付媛 (Fu Yuan), 朱礼军 (Zhu Lijun), 韩红旗 (Han Hongqi). 《情报工程》 (Intelligence Engineering), 2016, 2(1): 53–58.
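To address the challenge that increasingly severe person-name ambiguity poses to improving the recall and precision of search engines, and to offer guidance for research on name disambiguation methods, this article summarizes the current state of research and its main results. It first introduces the purpose and significance of studying name disambiguation. It then reviews the progress of existing name disambiguation methods in China and abroad, chiefly feature-based, machine-learning-based, social-network-based, and web-knowledge-resource-based approaches. Finally, it analyzes the characteristics and shortcomings of each approach and summarizes the open problems and future research directions in name disambiguation.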

6.
The paper first introduces the basic problems of author bibliographic coupling, including the relationship between author bibliographic coupling and document bibliographic coupling, and three methods for calculating author coupling strength: the simple method, the minimum method, and the combined method. Next, a small sample of authors in Chinese library and information science (LIS) is chosen for a comparative analysis of the three author coupling strength algorithms, using data from the Chinese Social Sciences Citation Index (CSSCI). The result shows that the minimum method is the most appropriate for calculating author coupling strength. A large sample of authors is then chosen to analyze the intellectual structure of Chinese LIS. The result shows that author bibliographic coupling analysis (ABCA) is better at discovering the intellectual structure of a discipline. Compared with author cocitation analysis (ACA), ABCA also has the advantage that it not only discovers the intellectual structure of a discipline more comprehensively and concretely but also reflects the discipline's research frontier. Finally, some practical problems that arose during this research are discussed.
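
The abstract names three ways to compute author coupling strength but does not define them. The sketch below assumes common readings of two of them: the simple method as the number of distinct references both authors cite, and the minimum method as the sum, over shared references, of the smaller number of each author's papers citing that reference. The combined method is omitted because the abstract gives no definition. Representing each author as a list of reference lists (one per paper) is likewise an assumption.

```python
from collections import Counter

def reference_counts(papers):
    """For one author, count how many of their papers cite each reference.
    `papers` is a list of reference lists, one per paper (assumed format)."""
    counts = Counter()
    for refs in papers:
        counts.update(set(refs))  # a reference counts at most once per citing paper
    return counts

def coupling_simple(a_papers, b_papers):
    """Assumed simple method: number of distinct references cited by both authors."""
    return len(reference_counts(a_papers).keys() & reference_counts(b_papers).keys())

def coupling_minimum(a_papers, b_papers):
    """Assumed minimum method: sum over shared references of the smaller count."""
    shared = reference_counts(a_papers) & reference_counts(b_papers)  # element-wise min
    return sum(shared.values())

author_a = [["r1", "r2"], ["r1", "r3"]]
author_b = [["r1"], ["r1", "r2"], ["r4"]]
print(coupling_simple(author_a, author_b))   # 2 (r1 and r2)
print(coupling_minimum(author_a, author_b))  # 3 (min(2, 2) + min(1, 1))
```

`Counter`'s `&` operator keeps the element-wise minimum of two counts, which is exactly the minimum rule as assumed here; under that reading, a reference cited heavily by one author but only once by the other contributes just 1.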

7.
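Based on an in-depth analysis of the characteristics of article-level metadata in NSTL (the National Science and Technology Library), and combined with a fuzzy matching algorithm, this work proposes a set of person-name disambiguation rules suited to NSTL's existing data, together with a name disambiguation algorithm built on that rule set. Experiments on a real dataset show that the algorithm performs well on precision, recall, and related metrics, achieving good disambiguation results.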

8.
宋春燕 (Song Chunyan), 王菊香 (Wang Juxiang). 《编辑学报》 (Acta Editologica), 2012, 24(3): 249.
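Because errors in the reference entries of papers in scientific and technical journals are relatively common, this article proposes methods for checking and proofreading references. During copyediting, editors should give proper attention to reviewing and editing reference entries. At the proofreading stage, they can check references by reading them through, cross-checking them against the text, memorizing information commonly used in the field, and questioning entries that conflict with common sense.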

9.
A Review of Tag Disambiguation Methods in Web Collaborative Tagging
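Taking tags in web collaborative tagging as the research object, this article surveys tag disambiguation methods and divides them into five categories: disambiguation based on data mining methods, disambiguation based on statistical analysis methods, disambiguation using related knowledge organization tools, disambiguation by introducing control mechanisms, and disambiguation by developing visualization components. It then compares the differences and connections among these five categories in five respects: degree of user participation, timing of disambiguation, nature of disambiguation, state of experimentation and application, and development prospects.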

10.
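Research in China on mapping between Chinese and foreign classification schemes has focused mostly on automatic computer mapping methods, with little concrete study of the complex semantic relationships among the classes of these schemes. This article manually mapped a total of 4,639 classes in the natural sciences domain of the DDC and the Chinese Library Classification (《中图法》) and tallied the evidence used for each match. It concludes that, within the natural sciences, the overall distribution of matching evidence is consistent across disciplines such as mathematics, physics, chemistry, astronomy, and geography, which provides a way to check the accuracy of automatic computer matching. The experiment shows that class captions, notes, subject headings, and class relationships, as the main evidence, account for 63% of the mapped classes; matching rules account for 5.14%; and matching based on bibliographic records accounts for 31.87%. The article therefore recommends that automatic computer matching consider bibliographic-record matching in addition to the information in the classes themselves.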

11.
This paper treats document–document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally: text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches with bibliographic coupling. Forty-three articles published in the journal Information Retrieval are used as test documents. We investigate how well the approaches agree with a ground-truth subject classification of the test documents when the complete linkage method is used, under two types of similarity: first-order and second-order. The results show that automatic grouping of articles can approximate the classification very well. One text-only method and one combination method, both under second-order similarities, yield cluster solutions that agree with the classification to a large extent.
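
To make the first-order versus second-order distinction concrete: first-order similarity compares documents directly (here by cosine over term-count vectors), while second-order similarity compares each document's row of first-order similarities, so two documents count as similar when they resemble the same other documents. The sketch also implements complete linkage, where the similarity between two clusters is the minimum pairwise similarity of their members. The toy vectors, the choice of cosine, and the function names are assumptions; the abstract does not specify the paper's text or coupling representations.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(vectors):
    """Pairwise cosine similarities: the first-order similarity matrix."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def second_order(first):
    """Second-order similarity: compare documents by their rows of first-order similarities."""
    return similarity_matrix(first)

def complete_linkage(sim, k):
    """Agglomerative clustering down to k clusters; the similarity of two clusters
    is the minimum similarity between any of their members (complete linkage)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = min(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))
    return clusters

# Toy term-count vectors: documents 0 and 1 share vocabulary, as do documents 2 and 3.
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
first = similarity_matrix(docs)
second = second_order(first)
print(complete_linkage(first, 2))   # [[0, 1], [2, 3]]
print(complete_linkage(second, 2))  # [[0, 1], [2, 3]]
```

On this toy data both similarity types give the same grouping; the second-order values are higher (about 0.98 versus 0.8 for the paired documents) because each pair resembles the rest of the collection in the same way.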

14.
A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely used example that aims to improve on simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions made through a collaborative team. To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast it with other measures.
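
The Social h-index redistributes the ordinary h-index score, but the abstract does not give the redistribution formula, so it is not sketched here. For reference, a minimal sketch of the standard h-index it builds on: the largest h such that h of an author's papers each have at least h citations. The function name is illustrative.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here while rank increases
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([]))                # 0
```

Note that the result depends only on one author's citation list, which is the "authors in isolation" limitation the abstract describes.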

16.
吴丹 (Wu Dan). 《图书情报工作》 (Library and Information Service), 2009, 53(13): 120-81.
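Ambiguity in query translation is a key factor affecting the results of cross-language information retrieval, so disambiguation of query translation has become a research hotspot in information retrieval. Based on a survey of existing research and applications, this article analyzes four classes of automatic disambiguation methods in detail: structuring the query, using linguistic analysis to aid disambiguation, disambiguating with machine-readable language resources, and disambiguating through human–computer interaction, with the aim of identifying effective disambiguation methods for query translation in cross-language information retrieval.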

18.
Although it is not yet known for certain what will replace MARC, bibliographic data will eventually need to be transformed to move into a linked data environment. This article discusses why the National Library of Medicine chose adding Uniform Resource Identifiers (URIs) for Medical Subject Headings (MeSH) as its starting point, and details the process by which these URIs were added to the MeSH MARC authority records, the legacy bibliographic records, and the records for newly cataloged items. The article outlines the enhancement methods available, the decisions made, and the rationale for the selected method.

19.
The citation records of 26 physicists are analyzed to determine the modified g index g_m, which takes multiple coauthorship into account by counting publications fractionally. The results are compared with the original g index as well as with the h index and the corresponding modified h index h_m. Although the correlations between these indices are relatively strong, the ordering of the datasets differs significantly in detail depending on whether they are ranked by the original or by the modified indices.
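
For the indices compared in this entry, a sketch assuming the usual definitions: the g index is the largest g such that the g most-cited papers together have at least g² citations, and the fractionalized variants replace the integer rank with an effective rank r_eff, the running sum of 1/(number of authors) over papers sorted by citations. Under that assumption, h_m is the largest r_eff at which a paper still has at least r_eff citations, and g_m the largest r_eff at which the cumulative citations reach at least r_eff². The (citations, authors) tuple format and function names are assumptions, and g is capped at the number of papers (some definitions let it exceed that).

```python
def g_index(citations):
    """Largest g such that the g most-cited papers have at least g**2 citations in total.
    Capped at the number of papers (some definitions allow g to exceed it)."""
    total, g = 0, 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g

def effective_ranks(papers):
    """papers: list of (citations, n_authors) tuples (assumed format).
    Sort by citations, descending, and return (citations, r_eff) pairs, where
    r_eff is the running sum of 1/n_authors (fractional counting)."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    r_eff, out = 0.0, []
    for count, n_authors in ranked:
        r_eff += 1.0 / n_authors
        out.append((count, r_eff))
    return out

def h_m(papers):
    """Largest effective rank r_eff whose paper has at least r_eff citations."""
    best = 0.0
    for count, r_eff in effective_ranks(papers):
        if count >= r_eff:
            best = r_eff
    return best

def g_m(papers):
    """Largest effective rank r_eff whose cumulative citations are at least r_eff**2."""
    total, best = 0, 0.0
    for count, r_eff in effective_ranks(papers):
        total += count
        if total >= r_eff * r_eff:
            best = r_eff
    return best

papers = [(10, 1), (8, 2), (5, 1), (4, 4), (3, 2)]  # (citations, authors)
print(g_index([c for c, _ in papers]))  # 5
print(h_m(papers), g_m(papers))         # 2.75 3.25
```

Python's `sorted` is stable, so papers with equal citation counts keep their input order; since that order changes r_eff, tie handling is one place where implementations of h_m and g_m can diverge.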

20.
This paper presents a method for assessing the quality of similarity functions. The scenario considered is approximate data matching, in which it must be determined whether two data instances represent the same real-world object. The method is based on semi-automatic estimation of optimal threshold values. We propose two methods for performing this estimation: an algorithm based on a reward function, and a statistical method. Experiments were carried out to validate the proposed techniques, and the results show that both threshold-estimation methods produce similar results. Their output was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set.
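
A minimal sketch of reward-based threshold estimation, under assumed details: given labeled pairs of (similarity score, is-same-object), each candidate threshold earns +1 for every correct match/non-match decision and -1 for every wrong one, and the highest-scoring threshold is kept. The reward values and function names are hypothetical; the abstract does not describe the paper's actual reward function, its statistical alternative, or how discernability is computed.

```python
def total_reward(pairs, threshold, gain=1.0, penalty=1.0):
    """Hypothetical reward: +gain for each correct match/non-match decision,
    -penalty for each wrong one. `pairs` holds (similarity, is_same_object)."""
    reward = 0.0
    for score, is_match in pairs:
        predicted_match = score >= threshold
        reward += gain if predicted_match == is_match else -penalty
    return reward

def best_threshold(pairs, gain=1.0, penalty=1.0):
    """Try every observed score as a threshold and keep the highest-reward one.
    Candidates are scanned in ascending order, and max() returns the first
    maximum it meets, so ties go to the lowest threshold."""
    candidates = sorted({score for score, _ in pairs})
    return max(candidates, key=lambda t: total_reward(pairs, t, gain, penalty))

pairs = [(0.95, True), (0.9, True), (0.8, True), (0.7, False),
         (0.6, True), (0.4, False), (0.2, False)]
t = best_threshold(pairs)
print(t, total_reward(pairs, t))  # 0.6 5.0
```

In this toy data, thresholds 0.6 and 0.8 both misclassify one pair and tie at a reward of 5.0; raising `penalty` relative to `gain` is one way such a scheme could be tuned toward fewer false matches or fewer missed matches.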

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  ICP filing: 京ICP备09084417号