Similar literature
 Found 20 similar documents; search took 31 ms.
1.
This paper studies the problem of morphological variation of query terms in the document database in Swedish full-text retrieval. The Swedish CLEF 2003 test collection was used to study how combinations of indexing strategies and query terms affect retrieval effectiveness. Four of the seven tested combinations involved indexing strategies that used normalization, a form of conflation. All four employed compound splitting, both during indexing and in the query phase. SWETWOL, a morphological analyzer for the Swedish language, was used for normalization and compound splitting. A fifth combination used stemming, while a sixth attempted to group related terms by right-hand truncation of query terms. The truncation was performed by a search expert. These six combinations were compared with each other and with a baseline combination, in which no attempt was made to counteract the morphological variation of query terms in the document database. The truncation combination, the four normalization-based combinations, and the stemming combination all outperformed the baseline. Truncation performed best. The main conclusion of the paper is that truncation, normalization, and stemming enhanced retrieval effectiveness in comparison with the baseline. Further, normalization and stemming were not far below truncation.

2.
There have been ample suggestions in the literature that terms added to documents from Flickr and Wikipedia can complement traditional methods of indexing and controlled vocabularies. At the same time, adding new metadata to existing metadata objects may not always add value to those objects. The potential added value of using user-contributed ("social") terms from Flickr and the English Wikipedia in image indexing is compared with using two expert-created controlled vocabularies (the Thesaurus for Graphic Materials and the Library of Congress Subject Headings) without those social terms. Experiments confirmed that the social terms did add value, relative to terms from the controlled vocabularies. The median rating for the usefulness of social terms was significantly higher than the baseline rating, but was lower than the ratings for the terms from the Thesaurus for Graphic Materials and the Library of Congress Subject Headings. Furthermore, complementing the controlled vocabulary terms with social terms more than doubled the average coverage of participants' terms for a photograph. The relationships between user demographics and users' perceptions of the value of terms were also investigated, as well as the relationships between user demographics and indexing quality, as measured by the number of terms participants assigned to a photograph. Participants with more tagging and indexing experience assigned a greater number of tags than did other participants.

3.
Subject indexing and classification of law resources is a complex issue due to several factors: specialized meanings of legal terms, meanings across different branches of law, terms in legal systems from diverse countries, and terms in different languages. These issues led to the development of a classification and subject indexing system that helps answer the major challenges of indexing and classifying law resources in the Research Institute Library at the National Autonomous University of Mexico. Adopting its own classification required interdisciplinary work between law and information organization specialists; constant updating by legal specialists and others beyond the Legal Research Institute; and the sharing of this classification system with other institutions. This classification system is now used by important institutions that specialize in law, such as the network of Libraries of the Supreme Court of Justice of the Nation of Mexico. The purpose of this article is to show why and how this law classification and subject system was developed and is continuously being updated by librarians and law scholars so that it meets their specific needs.

4.
BACKGROUND: EUROETHICS is a database covering European literature on ethics in medicine. It is produced within Eurethnet, a European information network on ethics in medicine and biotechnology. OBJECTIVES: The aim of EUROETHICS is to disseminate information on European bioethical literature that may otherwise be difficult to find. METHODS: A collaboration model for pooling data from different centres was developed. The policy was to accomplish data uniformity, while still allowing for local differences in terms of software, indexing practices and resources. Records contributed to the database follow common standards in terms of data fields and indexing terms. The indexing terms derive from two thesauri, Thesaurus Ethics in the Life Sciences (TELS) and Medical Subject Headings (MeSH). Combining elements from search tools developed previously, the developers sought to find a technical solution optimized for this data model. An approach relying on a thesaurus database that is loaded along with the bibliographic database is described. RESULTS AND CONCLUSIONS: The present case study offers examples of possible approaches to several tasks often encountered in database development, such as merging data from diverse sources, getting the most out of indexing terms used in a database, and handling more than one thesaurus in the same system.

5.
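An author-evaluation survey was conducted on cited documents in the Chinese Science Citation Database; it shows that citation index terms reflect fairly well the subjects of the documents they index.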

6.
An improved cosine vector measure text retrieval model   Total citations: 2, self-citations: 1, citations by others: 1
付永贵 《图书情报工作》2011,55(19):115-119
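To address users' differing requirements for index terms, an improved cosine vector measure method (ICVMM) text retrieval model is proposed. The model divides index terms into main index terms and feature index terms, and modifies the initial weights of the feature index terms of documents and queries according to the relevance probability values of feature index terms in the set of documents relevant to the query. A worked example compares the retrieval efficiency of the traditional cosine vector measure method (TCVMM) text retrieval model with that of the ICVMM model, and indicates that the ICVMM results are closer to users' needs.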
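
As a rough illustration of the cosine measure this abstract builds on, the sketch below scores documents against a query and lets selected term weights be scaled before scoring. The function names, the `boost` mapping, and the sample weights are assumptions made for illustration. The abstract does not give the ICVMM weight-update rule, so this is a generic reweighting, not the author's model.

```python
import math

def cosine(query, doc):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    shared = set(query) & set(doc)
    dot = sum(query[t] * doc[t] for t in shared)
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

def reweight(vector, boost):
    """Scale selected term weights; `boost` maps term -> multiplier (hypothetical)."""
    return {t: w * boost.get(t, 1.0) for t, w in vector.items()}

# Hypothetical query and documents with unit initial weights.
query = {"retrieval": 1.0, "cosine": 1.0}
doc_a = {"retrieval": 1.0, "index": 1.0}
doc_b = {"cosine": 1.0, "index": 1.0}

# With equal weights, both documents receive the same score.
base_a, base_b = cosine(query, doc_a), cosine(query, doc_b)

# Raising the weight of the term "cosine" in the query favours doc_b.
boosted = reweight(query, {"cosine": 2.0})
new_a, new_b = cosine(boosted, doc_a), cosine(boosted, doc_b)
```

Keeping the reweighting step separate from the similarity function means the same cosine scoring can be reused whatever rule produces the adjusted weights.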

7.
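An analysis of general-term indexing in Chinese journal literature   Total citations: 1, self-citations: 0, citations by others: 1
General factors are one of the components of a document's subject and serve to subdivide the main subject factor. General terms are broad terms that have no independent retrieval value within a specialized field. In indexing Chinese journal literature, the use of general terms has an important effect on the indexing result. The article discusses the general rules of general-term indexing and, taking documents in 《中国期刊网》 as examples, carries out sample statistics and case analysis, summarizes the errors found in general-term indexing and their causes, and offers several suggestions for improving general-term indexing of journal literature.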

8.
Vocabulary incompatibilities arise when the terms used to index a document collection are largely unknown, or at least not well-known to the users who eventually search the collection. No matter how comprehensive or well-structured the indexing vocabulary, it is of little use if it is not used effectively in query formulation. This paper demonstrates that techniques for mapping user queries into the controlled indexing vocabulary have the potential to radically improve document retrieval performance. We also show how the use of controlled indexing vocabulary can be employed to achieve performance gains for collection selection. Finally, we demonstrate the potential benefit of combining these two techniques in an interactive retrieval environment. Given a user query, our evaluation approach simulates the human user's choice of terms for query augmentation given a list of controlled vocabulary terms suggested by a system. This strategy lets us evaluate interactive strategies without the need for human subjects.

9.
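A survey and statistical study of the subject-expression capability of indexing sources in Chinese web pages   Total citations: 22, self-citations: 1, citations by others: 21
Through manual free indexing, manual scoring, and word-frequency counts on 300 randomly collected Chinese economics web pages, followed by analysis of the resulting data, the study derives the relationship between a web page's subject content and 12 indexing sources such as the page title and the article heading, analyzes the subject-expression capability of different parts of a Chinese web page, and designs appropriate weights for them in weighted indexing, in order to provide data for the development of automatic indexing and artificial-intelligence search engines.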

10.
Introduction: Locating reports of trials from journals not indexed in the major databases is difficult for systematic reviewers, yet doing so may help improve the reliability of reviews. Objectives: To identify and make available reports of controlled trials from the Australasian Medical Index (AMI), and to measure the quality of indexing of trials in AMI. Methods: Using a highly sensitive search strategy consisting of methodology indexing and free-text terms, records from AMI were read for reports of controlled trials. Trials meeting the criteria were submitted for inclusion in The Cochrane Controlled Trials Register (CCTR) and assessed for the quality of their indexing. Results: 3621 records were downloaded, of which 512 were identified as reports of controlled trials (317 RCTs; 195 CCTs) and submitted to CCTR. The precision of methodology indexing terms was 60%, but sensitivity was just 18%. The quality of indexing of trials was generally poor, with only 50 tagged with the RCT/CCT publication type term. 453 reports (88%) were not previously available in CCTR. Conclusions: The large proportion of trials found to be unique to the AMI database increases the pool of studies available to systematic reviewers, and helps ensure CCTR remains the most comprehensive source of trials.
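
The precision and sensitivity figures in this abstract follow the standard definitions, which the sketch below spells out. The counts are hypothetical, chosen only so that the ratios come out at 60% and 18%; the abstract does not report the underlying true-positive, false-positive, or false-negative counts.

```python
def precision(true_pos, false_pos):
    """Share of retrieved records that really are controlled trials."""
    return true_pos / (true_pos + false_pos)

def sensitivity(true_pos, false_neg):
    """Share of all controlled trials that the search terms retrieved."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts (not taken from the study).
tp, fp, fn = 90, 60, 410

p = precision(tp, fp)    # 90 / 150
s = sensitivity(tp, fn)  # 90 / 500
```

High precision combined with low sensitivity, as here, means that records carrying the indexing term are mostly trials, but most trials do not carry the term.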

11.
Indexing consistency in MEDLINE   Total citations: 3, self-citations: 0, citations by others: 3
The quality of indexing of periodicals in a bibliographic data base cannot be measured directly, as there is no one "correct" way to index an item. However, consistency can be used to measure the reliability of indexing. To measure consistency in MEDLINE, 760 twice-indexed articles from 42 periodical issues were identified in the data base, and their indexing compared. Consistency, expressed as a percentage, was measured using Hooper's equation. Overall, checktags had the highest consistency. Medical Subject Headings (MeSH) and subheadings were applied more consistently to central concepts than to peripheral points. When subheadings were added to a main heading, consistency was lowered. "Floating" subheadings were more consistent than were attached subheadings. Indexing consistency was not affected by journal indexing priority, language, or length of the article. Terms from MeSH Tree Structure categories A, B, and D appeared more often than expected in the high-consistency articles; whereas terms from categories E, F, H, and N appeared more often than expected in the low-consistency articles. MEDLINE, with its excellent controlled vocabulary, exemplary quality control, and highly trained indexers, probably represents the state of the art in manually indexed data bases.
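
Hooper's measure is commonly stated as A / (A + M + N), where A is the number of terms both indexers assigned and M and N are the numbers of terms assigned by only one of them. The sketch below is a minimal version under the assumption that each indexing of an article can be treated as a set of terms; the function name and the sample terms are hypothetical, and the abstract does not say how the study handled subheadings or checktags in the calculation.

```python
def hooper_consistency(terms_1, terms_2):
    """Hooper's consistency, as a percentage, between two indexings of one item."""
    a, b = set(terms_1), set(terms_2)
    agreed = len(a & b)   # A: terms assigned by both indexers
    only_1 = len(a - b)   # M: terms assigned only by the first indexer
    only_2 = len(b - a)   # N: terms assigned only by the second indexer
    total = agreed + only_1 + only_2
    if total == 0:
        raise ValueError("both term sets are empty")
    return 100.0 * agreed / total

# Hypothetical pair of indexings for the same article.
first = ["Humans", "Neoplasms", "Adult"]
second = ["Humans", "Neoplasms", "Aged", "Female"]
score = hooper_consistency(first, second)  # 2 agreed out of 5 distinct terms
```

On sets, this is the same quantity as the Jaccard index, scaled to a percentage.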

12.
宋芸芳 《图书馆建设》2012,(3):52-54,57
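Coordinate indexing selects from a thesaurus two or more terms that stand in a formal logical relationship and combines them, under specific rules, into a string of indexing terms, in order to meet the need for multi-level, multi-access retrieval of documents. Concept coordination is the key step in document indexing. According to the logical relationship among the subject terms involved, concept coordination falls into three basic types: intersecting coordination, qualifying coordination, and linking coordination. In practical coordinate indexing, catalogers should avoid muddled composition of search terms caused by unfamiliarity with a new thesaurus, avoid overly broad, missed, or wrong indexing caused by errors in converting subject concepts, and avoid off-topic indexing caused by not following the specificity rule, so as to reduce coordinate indexing errors.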

13.
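Constructing a 《中图法》 (Chinese Library Classification) knowledge base for automatic classification of Chinese information   Total citations: 4, self-citations: 0, citations by others: 4
Chinese bibliographic databases contain large numbers of manually indexed records in which class numbers correspond to keywords (or subject terms). By processing these data, and taking the 《中图法》 class hierarchy as the backbone for organizing the terms of each subject field, a 《中图法》 knowledge base reflecting the correspondence between class numbers and term concepts is built, to be used for automatic indexing and automatic classification of information. Building the knowledge base faces several difficulties: integrating heterogeneous data; screening the one-to-many and many-to-many relationships between class numbers and subject terms or term strings in the raw data; and checking whether indexing term strings agree with the term strings in the knowledge base. 2 figures. 8 references.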

14.
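For Chinese academic literature, a new automatic indexing method is proposed. Based on the citation relationships between documents, the method uses the indexing terms of cited documents together with a modified genetic algorithm to perform automatic indexing, avoiding the limitations of automatic indexing that relies on internal text features such as a document's body and title. Experiments on a large-scale real test set of Chinese academic literature verify the method's effectiveness.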

15.
Scientific repositories create a new environment for studying traditional information science issues. The interaction between indexing terms provided by users and controlled vocabularies continues to be an area of debate and study. This article reports and analyzes findings from a study that mapped the relationships between free-text keywords and controlled vocabulary terms used in the sciences. Based on this study's findings, recommendations are made about which vocabularies may be better to use in scientific data repositories.

16.
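Based on a random sample survey, subject indexing in Chinese and US Cataloging in Publication is compared in five respects (indexing depth, retrieval depth, indexing terms, number of heading levels, and indexing language); the strengths and weaknesses of each are pointed out, and suggestions for improvement are made.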

17.
Pattern indexing is an attempt at combining standardized and free indexing. In contrast to prevailing indexing methods, notably precoordinated ones, pattern indexing also takes into consideration the terminological and information retrieval habits in certain disciplines of science. It is based on patterns consisting of subject categories reflecting the conceptual and methodological framework of a given discipline. These categories provide structured sets of standardized subject headings. To allow for flexibility and adequacy, these headings may be complemented by free indexing terms. Pattern indexing is intended to mend opaque catalog structures and terminological uncertainties of topical subject headings in common precoordinated indexing practice. Pattern indexing is discussed in the context of literary scholarship.

18.
[…] ISA […]   Total citations: 2, self-citations: 1, citations by others: 1
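A six-year statistical analysis was made of the Controlled Indexing Vocabulary (《受控标引词表》) of the US Information Science Abstracts (ISA), of ISA's subject indexing depth and the indexing frequency of its indexing terms, and of the free terms ISA uses. On this basis, the strengths, weaknesses, and problems of ISA subject indexing are identified, and five suggestions for improving ISA's subject indexing tools and three measures for raising subject indexing quality are put forward.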

19.
20.
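Computer processing of post-controlled vocabulary normalization   Total citations: 7, self-citations: 1, citations by others: 6
The paper analyzes the shortcomings of controlled subject indexing when building information retrieval systems: low indexing efficiency, a semantic network that cannot be extended, and inconsistent coordinate indexing. It proposes an indexing system based on post-controlled normalization. So that post-controlled normalization can be implemented conveniently by computer, it studies the use of similarity matching to find terms that are semantically related to some degree, and a method by which the computer semi-automatically establishes semantic relations such as use, used-for, broader, narrower, and related.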
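
One way to picture the similarity-matching step is the sketch below, which proposes candidate term pairs for a human to review and label. The abstract does not name the matching technique, so the character-bigram Dice coefficient used here is a stand-in; the function names, the threshold, and the sample terms are all hypothetical.

```python
def bigrams(term):
    """Set of adjacent character pairs in a lower-cased term."""
    s = term.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a, b):
    """Dice coefficient over character bigrams, between 0.0 and 1.0."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def candidate_pairs(terms, threshold=0.5):
    """Return (term, term, score) for pairs at or above the threshold."""
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = dice(a, b)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs

# Hypothetical free terms collected from indexing.
terms = ["indexing", "index", "retrieval", "thesaurus"]
pairs = candidate_pairs(terms)
```

The function only proposes pairs. Deciding whether a pair is a use/used-for, broader/narrower, or related relation is left to a reviewer, which matches the semi-automatic process the abstract describes.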

Copyright©北京勤云科技发展有限公司  京ICP备09084417号