Found 20 similar documents; search took 31 ms.
1.
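Building on a survey of existing author name disambiguation methods and targeting the characteristics of Chinese bibliographic record data, this paper recasts name disambiguation as the problem of classifying the publications of same-named authors. It proposes a rule- and similarity-based disambiguation framework, describes the algorithms for its splitting and merging rules in detail, and runs experiments on same-name author datasets from three disciplines. The experimental results show that the model effectively improves the accuracy of author name disambiguation.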
2.
3.
On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages (Total citations: 1; self-citations: 1; citations by others: 0)
Web person search is one of the most common activities of Internet users. Recently, a vast amount of work on applying various
NLP techniques for person name disambiguation in large web document collections has been reported, where the main focus was
on English and few other major languages. This article reports on knowledge-poor methods for tackling person name matching
and lemmatization in Polish, a highly inflectional language with a complex person name declension paradigm. These methods apply
mainly well-established string distance metrics, some new variants thereof, automatically acquired simple suffix-based lemmatization
patterns and some combinations of the aforementioned techniques. Furthermore, we also carried out some initial experiments
on deploying techniques that utilize the context in which person names appear. Results of numerous experiments are presented.
The evaluation carried out on a data set extracted from a corpus of on-line news articles revealed that achieving lemmatization
accuracy figures greater than 90% seems to be difficult, whereas combining string distance metrics with suffix-based patterns
results in 97.6–99% accuracy for the name matching task. Interestingly, no significant additional gain could be achieved through
integrating some basic techniques, which try to exploit the local context the names appear in. Although our explorations were
focused on Polish, we believe that the work presented in this article constitutes practical guidelines for tackling the same
problem for other highly inflectional languages with similar phenomena.
Marcin Sydow
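The combination of string distance metrics with suffix-based patterns described in the abstract above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `SUFFIX_PATTERNS` table is a tiny hand-written sample (the article acquires its patterns automatically), the 0.85 threshold is arbitrary, and normalized Levenshtein similarity stands in for the several metrics the article evaluates. It handles single-token names only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical suffix-stripping patterns, for illustration only;
# not a complete or linguistically validated set.
SUFFIX_PATTERNS = ["owi", "iem", "a"]


def candidate_lemmas(token: str) -> set:
    """Return the token plus every form obtained by stripping a known suffix."""
    forms = {token}
    for suffix in SUFFIX_PATTERNS:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            forms.add(token[:-len(suffix)])
    return forms


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Two names match if any pair of their candidate lemmas is similar enough."""
    a, b = a.lower(), b.lower()
    return any(similarity(x, y) >= threshold
               for x in candidate_lemmas(a)
               for y in candidate_lemmas(b))
```

With these assumed patterns, `names_match("Nowakowi", "Nowak")` succeeds through the stripped form, while `names_match("Nowak", "Kowal")` fails on string similarity alone.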
4.
《Journal of Informetrics》2020,14(2):101037
A large number of overseas elites were brought back to China by the policy in the past decade. However, the difficulty of name disambiguation has hindered investigations of the relationship between their mobility and research performance. By taking advantage of the ORCID website and applying causal inference strategies, we investigated the academic performance of 2489 China-connected scientists in the Web of Science database in terms of their job mobility: 1388 scientists who moved to China (the treatment group) and 1101 scientists with a possibility of moving to China (the control group). The results show, first, that scientists moving to China have a new growth pattern in which both their productivity and their rate of being corresponding authors grew more rapidly than before; however, they made fewer contributions to the four top journals, Nature, Science, Cell, and PNAS. Second, the research performance of the scientists is affected by the time of their move to China, the countries from which they moved, and the disciplines of their publications. Last, China now maintains symmetrical inflow-outflow patterns with most countries, especially developed countries in Europe and North America, with only a few exceptions (e.g., Pakistan).
5.
6.
Ruimin Ma 《Journal of Informetrics》2012,6(4):532-542
The paper first introduces the basic problems of author bibliographic coupling, including the relationship between author bibliographic coupling and document bibliographic coupling, as well as the three methods of calculating author coupling strength: the simple method, the minimum method, and the combined method. Next, I choose a small sample of authors in Chinese library and information science (LIS) as research objects for a comparative analysis of the three author coupling strength algorithms (the data source is the Chinese Social Sciences Citation Index (CSSCI)). The result shows that the minimum method is the most appropriate one for calculating author coupling strength. Then a large sample of authors is chosen to analyze the intellectual structure of Chinese LIS. The result shows that author bibliographic coupling analysis (ABCA) can discover the intellectual structure of a discipline better. It is also found that, compared with author cocitation analysis (ACA), ABCA not only discovers the intellectual structure of a discipline more comprehensively and concretely but also reflects the research frontier of the discipline. Finally, some practical problems that arose during this research are discussed.
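The abstract above names three ways of computing author coupling strength but does not define them. The sketch below shows one common reading of two of them, stated as an assumption rather than the paper's definition: the simple method counts the distinct references both authors cite, and the minimum method sums, over shared references, the smaller of the two authors' citing-paper counts. The combined method is omitted because the abstract gives no basis for it. Function names and the input layout (one list of reference IDs per paper) are hypothetical.

```python
from collections import Counter


def reference_counts(papers):
    """Count how many of an author's papers cite each reference."""
    counts = Counter()
    for refs in papers:
        counts.update(set(refs))  # a reference counts once per citing paper
    return counts


def simple_strength(papers_a, papers_b):
    """Assumed 'simple method': number of distinct references cited by both authors."""
    a, b = reference_counts(papers_a), reference_counts(papers_b)
    return len(a.keys() & b.keys())


def minimum_strength(papers_a, papers_b):
    """Assumed 'minimum method': sum of min(count_a, count_b) over shared references."""
    a, b = reference_counts(papers_a), reference_counts(papers_b)
    return sum(min(a[r], b[r]) for r in a.keys() & b.keys())


# Two small hypothetical oeuvres, each paper given as its reference list.
author_a = [["r1", "r2"], ["r1", "r3"]]
author_b = [["r1"], ["r1", "r2"], ["r4"]]
print(simple_strength(author_a, author_b), minimum_strength(author_a, author_b))
```

Under this reading the two measures differ whenever a shared reference is cited repeatedly by both authors, which is the kind of difference the paper's comparison is concerned with.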
7.
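Based on an in-depth analysis of the characteristics of NSTL article-level metadata and combined with a fuzzy matching algorithm, this paper proposes a set of person name disambiguation rules suited to existing NSTL data and presents a name disambiguation algorithm based on that rule set. In experiments on a real dataset, the algorithm performs well on precision, recall, and other metrics, and achieves good disambiguation results.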
8.
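In view of the high rate of reference citation errors in scientific and technical journal articles, this paper proposes methods for checking and proofreading references: during copyediting, pay close attention to reviewing and editing reference entries; during proofreading, use techniques such as reading through the whole text, cross-checking earlier and later passages, memorizing information commonly used in the field, and questioning entries on the basis of common knowledge.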
9.
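A review of tag disambiguation methods based on web collaborative tagging (Total citations: 1; self-citations: 0; citations by others: 1)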
窦玉萌 《现代图书情报技术》2010,26(3):27-32
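Taking tags in web collaborative tagging as the research object, this paper surveys tag disambiguation methods and divides them into five categories: disambiguation based on data mining, disambiguation based on statistical analysis, disambiguation using related knowledge organization tools, disambiguation by introducing control mechanisms, and disambiguation by developing visualization components. It compares the differences and connections among these five categories in five respects: degree of user participation, timing of disambiguation, nature of disambiguation, state of experiments and applications, and development prospects.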
10.
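In research on mapping between Chinese and foreign classification schemes, domestic work has focused mostly on methods for automatic computer mapping and lacks concrete study of the complex semantic relationships among vocabulary classes. Based on a direct manual mapping of 4639 classes in total in the natural sciences domain of DDC and the Chinese Library Classification (《中图法》), this paper tallies the matching evidence and concludes that the overall distribution of matching evidence is consistent across disciplines such as mathematics, physics, chemistry, astronomy, and geography, which provides a way to check the accuracy of automatic computer matching. The experiment shows that class names, notes, subject headings, and class relationships, as the main basis for judgment, account for 63% of the mapped classes; matching based on matching rules accounts for 5.14% of the mappings; and matching based on bibliographic records accounts for 31.87%. The paper therefore recommends that automatic computer matching consider bibliographic-record matching in addition to the classes' own information.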
11.
This paper treats document–document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally. We compare text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches and bibliographic coupling. Forty-three articles, published in the journal Information Retrieval, are used as test documents. We investigate how well the approaches agree with a ground truth subject classification of the test documents, when the complete linkage method is used, and under two types of similarities, first-order and second-order. The results show that it is possible to achieve a very good approximation of the classification by means of automatic grouping of articles. One text-only method and one combination method, under second-order similarities in both cases, give rise to cluster solutions that to a large extent agree with the classification.
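The distinction between first-order and second-order similarities in the abstract above can be sketched as follows. This assumes a common interpretation that the abstract itself does not spell out: first-order similarity compares two documents' feature vectors directly (cosine is used here), while second-order similarity compares their rows in the first-order similarity matrix, so two documents are close when they resemble the same other documents. The toy vectors and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def first_order(vectors):
    """Matrix of direct similarities between document feature vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def second_order(vectors):
    """Matrix of similarities between rows of the first-order matrix."""
    profiles = first_order(vectors)
    return [[cosine(p, q) for q in profiles] for p in profiles]


# Three toy documents over four features (terms or cited references).
docs = [[1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]
fo = first_order(docs)
so = second_order(docs)
```

In this toy case documents 0 and 2 share no feature, so their first-order similarity is zero, yet their second-order similarity is positive because both resemble document 1.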
12.
13.
14.
A variety of bibliometric measures have been proposed to quantify the impact of researchers and their work. The h-index is a notable and widely used example which aims to improve over simple metrics such as raw counts of papers or citations. However, a limitation of this measure is that it considers authors in isolation and does not account for contributions through a collaborative team. To address this, we propose a natural variant that we dub the Social h-index. The idea is to redistribute the h-index score to reflect an individual's impact on the research community. In addition to describing this new measure, we provide examples, discuss its properties, and contrast with other measures.
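For reference, the standard h-index that the abstract above takes as its starting point can be computed as in the sketch below. Only the baseline measure is shown; the abstract does not give the redistribution rule behind the Social h-index, so no attempt is made to reproduce it. The function name is illustrative.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no later rank can qualify
    return h
```

The measure depends only on an author's own citation counts, which is the "authors in isolation" limitation the abstract points to.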
15.
16.
17.
18.
Diane L. Boehr 《Cataloging & classification quarterly》2018,56(2-3):262-272
Although it is not yet known for certain what will replace MARC, eventually bibliographic data will need to be transformed to move into a linked data environment. This article discusses why the National Library of Medicine chose to add Uniform Resource Identifiers for Medical Subject Headings as our starting point and details the process by which they were added to the MeSH MARC authority records, the legacy bibliographic records, and the records for newly cataloged items. The article outlines the various enhancement methods available, decisions made, and the rationale for the selected method.
19.
Michael Schreiber 《Journal of Informetrics》2010,4(4):636-643
The citation records of 26 physicists are analyzed in order to determine the modified g index gm which takes multiple coauthorship into account by fractionalized counting of the publications. The results are compared with the original g index as well as with the h index and the respective modified h index hm. Although the correlations between these indices are relatively strong, the arrangement of the datasets is significantly different in detail depending on whether they are put into order according to the values of either the original or the modified indices.
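The sketch below shows the original g index and one plausible reading of "fractionalized counting of the publications". The second function is an assumption, not the paper's confirmed definition: each paper advances an effective rank by 1/(number of authors) instead of 1, and the index is the largest effective rank whose square does not exceed the cumulative citations. Function names and the `(citations, n_authors)` input layout are hypothetical.

```python
def g_index(citations):
    """Largest g such that the g most-cited papers have at least g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


def fractional_g_index(papers):
    """Assumed fractional variant; papers is a list of (citations, n_authors) pairs."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    total, effective_rank, g_m = 0, 0.0, 0.0
    for count, n_authors in ranked:
        total += count
        effective_rank += 1.0 / n_authors  # a coauthored paper counts as a fraction
        if total >= effective_rank * effective_rank:
            g_m = effective_rank
    return g_m
```

When every paper has a single author the effective rank equals the ordinary rank, so the two functions agree; coauthored papers change the ranks and can therefore reorder authors, in line with the differences the abstract reports.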
20.
《Journal of Informetrics》2007,1(1):35-46
This paper presents a method for assessing the quality of similarity functions. The scenario taken into account is that of approximate data matching, in which it is necessary to determine whether two data instances represent the same real world object. Our method is based on the semi-automatic estimation of optimal threshold values. We propose two methods for performing such estimation. The first method is an algorithm based on a reward function, and the second is a statistical method. Experiments were carried out to validate the techniques proposed. The results show that both methods for threshold estimation produce similar results. The output of such methods was used to design a grading function for similarity functions. This grading function, called discernability, was used to compare a number of similarity functions applied to an experimental data set.
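The idea of estimating a threshold with a reward function can be illustrated with a deliberately simple stand-in. The abstract does not specify the paper's reward function, so the one used here is an assumption: +1 for every labelled pair the threshold classifies correctly and -1 for every pair it gets wrong, with candidate thresholds taken from the observed scores. The function name and sample data are hypothetical.

```python
def best_threshold(scored_pairs):
    """Pick the threshold with the highest reward.

    scored_pairs: list of (similarity_score, is_true_match) tuples.
    A pair is predicted to match when its score is >= the threshold.
    Returns (threshold, reward); ties keep the lowest threshold.
    """
    candidates = sorted({score for score, _ in scored_pairs})
    best_t, best_reward = None, float("-inf")
    for t in candidates:
        reward = sum(1 if (score >= t) == is_match else -1
                     for score, is_match in scored_pairs)
        if reward > best_reward:
            best_t, best_reward = t, reward
    return best_t, best_reward


# Hypothetical labelled pairs scored by some similarity function.
sample = [(0.95, True), (0.9, True), (0.7, False),
          (0.6, True), (0.4, False), (0.2, False)]
print(best_threshold(sample))
```

A similarity function whose best achievable reward is higher separates matches from non-matches more cleanly, which is the intuition behind grading similarity functions by how well a single threshold can work.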