Similar literature
 Found 20 similar documents; search took 453 ms
1.
Authors and searchers usually express the same things in many different ways, which causes problems in free text searching of text databases. Thus, a switching tool connecting the different names of one concept is needed. This study tests the effectiveness of a thesaurus as a search aid in free text searching of a full text database. A set of queries was searched against a large full text database of newspaper articles. The search-aid thesaurus constructed for the test contains the usual relationships of a thesaurus, namely equivalence, hierarchical, and associative relationships. Each query was searched in five distinct modes: basic search, synonym search, narrower term search, related term search, and the union of all previous searches. The basic searches contained only terms included in the original query statements. In the synonym searches, the terms of the basic search were extended by disjunction with the synonyms given by the search-aid thesaurus, without modifying the overall logic of the basic search. Likewise, the basic search was extended in turn with the narrower terms and with the related terms given by the search-aid thesaurus. The last search mode included the basic terms and all the terms used in the previous searches. The searches were analyzed in terms of relative recall and precision; relative recall was estimated by setting the recall of the union search to 100%. On average, relative recall was 47.2% in the basic search, compared with 100% in the union search; average precision decreased only from 62.5% in the basic search to 51.2% in the union search.
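
The search modes and the relative-recall estimate described in this abstract can be illustrated with a small sketch. It is not the study's software: the toy thesaurus, the documents, and the function names (`expand_query`, `search`, `relative_recall`) are all assumptions made for illustration. Each query term becomes an OR-group of itself plus its thesaurus entries, and the groups stay ANDed so the overall logic of the basic search is unchanged.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus (equivalence relationships only).
SYNONYMS: Dict[str, List[str]] = {
    "car": ["automobile"],
    "tax": ["levy", "duty"],
}


def expand_query(query: List[str], thesaurus: Dict[str, List[str]]) -> List[Set[str]]:
    """Turn each query term into an OR-group; the groups are ANDed together."""
    return [{term, *thesaurus.get(term, [])} for term in query]


def search(docs: Dict[int, Set[str]], groups: List[Set[str]]) -> Set[int]:
    """Return ids of documents that contain at least one term from every group."""
    return {
        doc_id
        for doc_id, terms in docs.items()
        if all(group & terms for group in groups)
    }


def relative_recall(found: Set[int], reference: Set[int], relevant: Set[int]) -> float:
    """Recall relative to the relevant documents retrieved by the reference search."""
    relevant_in_reference = reference & relevant
    return len(found & relevant_in_reference) / len(relevant_in_reference)


def precision(found: Set[int], relevant: Set[int]) -> float:
    """Share of retrieved documents that are relevant."""
    return len(found & relevant) / len(found)


# Hypothetical documents, each reduced to its set of index terms.
docs = {
    1: {"car", "tax"},
    2: {"automobile", "levy"},
    3: {"car", "duty"},
    4: {"car", "price"},
}
relevant = {1, 2, 3}  # assumed relevance judgements

basic = search(docs, expand_query(["car", "tax"], {}))
expanded = search(docs, expand_query(["car", "tax"], SYNONYMS))

print(basic)     # {1}
print(expanded)  # {1, 2, 3}
print(relative_recall(basic, expanded, relevant), precision(basic, relevant))
```

In this toy case the expanded search plays the role of the union search, so its relative recall is 100% by construction, which mirrors how the study fixes its baseline.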

2.
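By analyzing theoretical research on ontologies and formal concept analysis, this paper proposes an ontology construction method for the meteorological disaster domain that takes "document-term" as its core and formal concept analysis as its technical means. To address the lack of knowledge bases and subject thesauri in the meteorological disaster domain, it uses Chinese and English academic papers as the data source and describes and justifies in detail the extraction and analysis of hierarchical relations among domain terms. The process covers extraction and filtering of domain terms, construction of the document-term matrix, generation of the topic concept lattice, analysis of term hierarchical relations, and OWL description and visualization of the ontology. Finally, GATE Developer is used to verify the validity of the constructed ontology.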
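
As a rough illustration of how formal concept analysis turns a document-term matrix into a hierarchy, the sketch below enumerates the formal concepts of a tiny context by brute force. The documents, the terms, and the function name `formal_concepts` are invented for the example; the abstract does not say which lattice algorithm the authors used, and this exponential enumeration only suits toy inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: documents, intent: terms)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a small document-term context."""
    objects = list(context)
    all_terms = frozenset().union(*context.values())
    # Every intent is the intersection of the term sets of some group of documents;
    # the full term set is the intent of the empty group.
    intents = {all_terms}
    for size in range(1, len(objects) + 1):
        for group in combinations(objects, size):
            intents.add(frozenset.intersection(*(frozenset(context[o]) for o in group)))
    # The extent of an intent is the set of documents containing all of its terms.
    return [
        (frozenset(o for o in objects if intent <= context[o]), intent)
        for intent in intents
    ]


# Hypothetical document-term context for the meteorological disaster domain.
context = {
    "d1": {"typhoon", "rainfall"},
    "d2": {"typhoon", "storm surge"},
    "d3": {"typhoon", "rainfall", "flood"},
}

concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

A concept whose extent contains another concept's extent sits above it in the lattice, so here "typhoon" (all three documents) comes out broader than "rainfall" (d1 and d3), which in turn is broader than "flood" (d3 only).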

3.
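[Purpose/Significance] To extract accurate topic words from massive volumes of microblog data, in order to provide a valuable reference for public-opinion analysis by governments and enterprises. [Method/Process] After analyzing the characteristics and shortcomings of traditional microblog topic-word extraction methods, the paper proposes an extraction method based on semantic concepts and word co-occurrence. The method uses a text expansion strategy to expand microblogs from short texts into longer ones, uses a semantic dictionary to expand the words in the microblog text into semantic concepts, assigns word weights according to the structural features of microblog text, and then takes word co-occurrence into account to extract topic words. [Results/Conclusions] Experiments show that the proposed algorithm outperforms traditional methods and effectively improves the performance of microblog topic-word extraction. [Innovation/Limitations] Combining semantic concepts with word co-occurrence for microblog topic-word extraction is a new exploration. Because the word segmentation method used in the algorithm may segment some new Internet words inappropriately, keyword extraction accuracy may be slightly affected.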

4.
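[Purpose/Significance] Automatic extraction of concept hierarchical relations makes it possible to partition concept semantic levels quickly and at fine granularity on large data sets, providing a reference for the subsequent refined construction of domain ontologies. [Method/Process] First, on a term set composed of compound terms and keywords, terms are filtered by term frequency, document frequency, and semantic similarity to obtain the concept set for the academic paper evaluation domain. Second, concept co-occurrence relations and contextual semantic information are both considered: the former is expressed by a document-concept matrix and a concept co-occurrence matrix, the latter by word2vec word vectors, and the two are integrated through cosine similarity to obtain a concept similarity matrix. Finally, with the concepts of highest relatedness as cluster centers, spectral clustering is applied to the similarity matrix to obtain the concept hierarchy for the academic paper evaluation domain. [Results/Conclusions] Experiments verify that the proposed model achieves high accuracy and that the constructed domain concept hierarchy is reasonable. [Innovation/Limitations] The paper proposes a model for automatically extracting concept hierarchical relations based on word co-occurrence and word vectors. It achieves automatic extraction, but the method for determining cluster labels is fairly simple and can be explored further.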
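
The co-occurrence half of this pipeline can be sketched as follows, under stated assumptions: the document-concept matrix and the concept names are invented, and the abstract does not specify exactly how co-occurrence and word2vec evidence are combined, so only one plausible reading is shown (cosine similarity between rows of the concept co-occurrence matrix). The word-vector side and the spectral clustering step are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def cooccurrence(doc_concept: Matrix) -> Matrix:
    """Concept co-occurrence matrix C = D^T D for a document-concept matrix D."""
    n = len(doc_concept[0])
    return [
        [sum(row[i] * row[j] for row in doc_concept) for j in range(n)]
        for i in range(n)
    ]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors: Matrix) -> Matrix:
    """Pairwise cosine similarity between concept vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical data: rows are documents, columns are concepts.
concept_names = ["peer review", "citation count", "h-index"]
doc_concept = [
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
]

co = cooccurrence(doc_concept)
sim = similarity_matrix(co)
for name, row in zip(concept_names, sim):
    print(name, [round(x, 3) for x in row])
```

The same `cosine` function could be applied to word2vec vectors for each concept; a similarity matrix of this kind is the sort of input a spectral clustering routine expects.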

5.
Classaurus is a faceted hierarchic scheme of terms with vocabulary control features. It is a system of terms having separate hierarchic schedules of the Elementary Categories: Discipline, Entity, Property, and Action, together with their respective Species/Types, Parts, and Special Modifiers. There are also separate schedules for the Common Modifiers: Form, Time, Environment, and Place. Each of the terms in these hierarchic schedules is enriched with synonyms, quasi-synonyms, etc. The hierarchic schedules constituting the Systematic Part are supplemented by an alphabetical index of chain entries. Classaurus is used in the formulation of subject headings in general, and in particular subject headings according to the Postulate-based Permuted Subject Indexing (POPSI) language. The POPSI language itself provides guidelines for the construction of a classaurus. A set of programs has been developed to construct a classaurus, taking as input subject headings formulated according to the POPSI language and enriched with codes that denote the different Elementary Categories, their Species, Parts, and Special Modifiers, and the Common Modifiers of different kinds. The resulting classaurus has hierarchic schedules, but terms in an array are arranged only alphabetically. The hierarchic schedules constitute the Systematic Part of the classaurus. The system generates an alphabetic Index Part to the Systematic Part, in which each term is followed on its right-hand side by its successively broader terms, along with a code denoting the schedule to which the term belongs. To find the position of a term in the Systematic Part, the whole entry for the term in the Alphabetic Part is taken and the sequence of the terms in it is reversed. Using the code for the schedule in the entry, the appropriate hierarchic schedule is selected. The schedule is then searched using the broader terms successively as keys until the term in question is reached, at which point all the hierarchically related terms can be found, including synonyms, quasi-synonyms, etc. Both the Systematic Part and the Alphabetical Index Part are printed out for manual reference and also kept as direct access files for on-line access, on-the-spot updating, and building up of the classaurus while inputting new subject headings formulated for this purpose.
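
The lookup procedure described above (take the index entry, reverse it, select the schedule by its code, then descend using the broader terms as keys) can be sketched with nested dictionaries. The schedule code `"E"`, the terms, and the function name `locate` are hypothetical; the original programs and their file formats are not described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical Systematic Part: one nested schedule per code ("E" standing for Entity).
schedules: Dict[str, dict] = {
    "E": {"vehicle": {"bicycle": {}, "car": {"sports car": {}}}},
}

# Hypothetical Alphabetical Index Part: each term is followed by its successively
# broader terms, plus the code of the schedule it belongs to.
index: Dict[str, Tuple[List[str], str]] = {
    "bicycle": (["bicycle", "vehicle"], "E"),
    "car": (["car", "vehicle"], "E"),
    "sports car": (["sports car", "car", "vehicle"], "E"),
    "vehicle": (["vehicle"], "E"),
}


def locate(term: str) -> Tuple[List[str], dict]:
    """Return the path from the broadest term down to `term`, and its subtree."""
    entry, code = index[term]
    path = list(reversed(entry))   # broadest term first
    node = schedules[code]         # select the schedule named by the code
    for key in path:               # descend one level per broader term
        node = node[key]
    return path, node


path, narrower = locate("car")
print(path)            # ['vehicle', 'car']
print(list(narrower))  # ['sports car']
```

Keeping the chain of broader terms in the index entry means the lookup never has to scan a whole schedule; each step is a direct keyed access, which fits the direct access files mentioned in the abstract.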

6.
程惠兰  胡小华 《现代情报》2009,29(10):156-158
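Based on information discovery and information retrieval in scientific and technical literature searching, this paper discusses how the system functions of the VIP, CNKI, and Wanfang Data knowledge service platforms can be applied to project-oriented searching. For information discovery, the functions of the various database systems are combined to find synonyms of search concepts, classification numbers and subject categories relevant to the project, related terms for abstract search concepts, and research institutions and researchers relevant to the project. For information retrieval, a corresponding project search strategy is formulated according to the functional characteristics of each system: subject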

7.
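[Purpose/Significance] Extracting algorithm terms from massive paper metadata and constructing the innovation-evolution relations among them supports the effective management and use of algorithms, helping researchers improve research efficiency and adopt cutting-edge results. [Method/Process] First, using abstracts of GAN algorithm papers as the corpus, algorithm terms are annotated by combining manual annotation with rule-based extraction, and a BERT-BiLSTM-CRF model is used to extract algorithm terms automatically. Then, the trained model is applied to the metadata of references cited by LDA algorithm papers to extract algorithm terms, and, based on rule-based judgment and citation relations, the innovation-evolution path of the LDA algorithm is extracted from the cited content and constructed. [Results/Conclusions] In the algorithm-term experiment using GAN papers as the example, precision, recall, and F1 score reached 0.81, 0.63, and 0.71 respectively, and the relation extraction method was applied to construct the innovation-evolution path of the LDA algorithm. The method can advance work such as building algorithm evolution networks and algorithm retrieval and tracking, and it enriches research on innovation diffusion theory. [Innovation/Limitations] The work extends the application domain of named entity recognition and offers a useful approach to managing computer algorithms. The method for constructing innovation-evolution paths can be optimized in future work.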
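
The three reported scores can be checked against each other with the standard definition of F1 as the harmonic mean of precision and recall. The function name `f1_score` is just a label for this sketch; only the figures 0.81 and 0.63 come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.81, 0.63)
print(round(f1, 2))  # 0.71
```

The result, 2 × 0.81 × 0.63 / 1.44 ≈ 0.709, rounds to the reported 0.71, so the three figures are mutually consistent.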

8.
杨世明  王兵 《情报科学》1998,16(1):35-41
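A citation index uses cited references, which are very close to natural language, as its retrieval language, freeing it from the constraints of traditional retrieval tools that must rely on a specific retrieval-language "dictionary". A citation index has its own special structure and operating procedure: it groups related documents according to the main ideas of articles, and thereby achieves retrieval that crosses disciplines, time, and space, with high indexing depth. Combined with core journals, a citation index can, with fewer than 10% of journals, obtain the equivalent of 90% of the information in all journals. These capabilities are of great practical significance for developing countries that urgently need to build information systems suited to their national conditions.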

9.
10.
This study investigates the robust stability of retarded-type interval fractional order plants with an interval time delay. To this end, the characteristic quasi-polynomial is divided into two terms. The first term is simply the denominator interval polynomial of the open loop system, and the second term is the product of the interval delay term and the numerator of the open loop system, which is an interval polynomial. Each of these two terms of the characteristic quasi-polynomial generates its own value set in the complex plane for a given frequency. In this paper, based on these two value sets and using the zero exclusion principle, the robust stability of the closed loop system under a FOPID controller is analyzed. Finally, two numerical examples and an experimental verification are provided to demonstrate the effectiveness of the proposed method in the robust stabilization of fractional order plants with interval uncertainties and interval time delay.

11.
张一鸣  曾丽萍 《现代情报》2011,31(12):21-26
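In July 2010, our office accepted a large strategic intelligence soft-science project jointly assigned by the China International Nuclear Fusion Energy Program Execution Center of the Ministry of Science and Technology and China National Nuclear Corporation: research on technology forecasting and strategic pathways for the development of nuclear fusion energy research in China. From the project launch on 27 July 2010 to submission of the survey report on 26 July 2011, the work took one year. Through the efforts of the whole project team, and after the stages of literature collection and investigation, reading, digesting, and distilling the literature, outline determination, first-draft writing, content analysis and optimization, internal preliminary review, expert review, and revision, a 340-page survey report of 500,000 characters was formally completed on 26 July 2011. This is the first time since its establishment that our office has undertaken such a large national soft-science survey and research task. The work laid a good foundation for our office to better undertake related research projects in the future, train its team, carry out knowledge services such as comparative analysis with related disciplines in the same industry and with areas of international frontier interest, and provide information support for our institute's in-depth scientific research. This paper reviews and summarizes our office's organization and implementation of this strategic intelligence soft-science project.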

12.
Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.

13.
郑阳  莫建文 《大众科技》2012,14(4):20-23
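In scientific and technical literature, out-of-vocabulary words and other specialized terms vary widely and are hard to recognize in Chinese word segmentation, which lowers segmentation accuracy for specialized-domain texts. In light of practical conditions, the paper presents a Chinese word segmentation method based on specialized-term extraction. Using a large domain-specific corpus and methods based on mutual information and statistics, out-of-vocabulary words and other specialized terms in the text are extracted to build a specialized-term dictionary. This dictionary is combined with a general-purpose dictionary, and Chinese word segmentation is performed with the maximum matching method. Experiments show that the method extracts the relevant specialized terms fairly accurately and thereby improves segmentation precision, giving it practical value.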
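
The effect of merging a specialized-term dictionary into a general dictionary can be seen with a minimal maximum matching segmenter. This is a sketch under assumptions: the abstract names "maximum matching" without saying which variant, so the forward variant is used here, and the two tiny dictionaries and the function name `forward_max_match` are invented. The mutual-information step that would build the term dictionary is not shown.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy forward maximum matching; unknown characters become single tokens."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionaries: a general one and an extracted specialized-term one.
general = {"中文", "分词", "方法"}
domain_terms = {"互信息"}

text = "互信息分词方法"
print(forward_max_match(text, general))
# ['互', '信', '息', '分词', '方法']
print(forward_max_match(text, general | domain_terms))
# ['互信息', '分词', '方法']
```

Without the domain dictionary the term 互信息 (mutual information) is broken into single characters; with it, the term is kept whole, which is the accuracy gain the abstract describes.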

14.
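Research on automatic construction of concept maps based on word co-occurrence   Total citations: 1, self-citations: 0, citations by others: 1
The paper proposes a method for automatically constructing concept maps using word co-occurrence. First, terms are selected and the association strength between terms is computed to generate a relation matrix; next, the concept map is mined from the relation matrix; finally, visualization techniques are used to display the concept map dynamically. Experiments show that introducing the new mining algorithm and visualization techniques improves the results of automatic concept-map construction.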
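
A relation matrix of this kind can be sketched in a few lines. The abstract does not give its association-strength formula or mining algorithm, so this example assumes the Jaccard coefficient over document co-occurrence and a simple threshold to pick edges; the documents and the function names are illustrative only.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def relation_matrix(docs: List[List[str]]) -> Dict[Pair, float]:
    """Jaccard association strength for every pair of terms that co-occur."""
    doc_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for terms in docs:
        unique = sorted(set(terms))              # count each term once per document
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        (a, b): n / (doc_freq[a] + doc_freq[b] - n)
        for (a, b), n in pair_freq.items()
    }


def concept_map_edges(strengths: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Keep only the term pairs whose association strength reaches the threshold."""
    return sorted(pair for pair, s in strengths.items() if s >= threshold)


# Hypothetical documents, already reduced to selected terms.
docs = [
    ["ontology", "thesaurus"],
    ["ontology", "thesaurus", "retrieval"],
    ["retrieval", "index"],
]

strengths = relation_matrix(docs)
edges = concept_map_edges(strengths, threshold=0.5)
print(edges)  # [('index', 'retrieval'), ('ontology', 'thesaurus')]
```

The resulting edge list is the kind of structure a graph visualization tool could render as a concept map.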

15.
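[Purpose/Significance] To achieve automatic learning and extraction of domain concepts, addressing the first foundational task in automated domain ontology construction. [Method/Process] Building on unsupervised learning and an end-to-end recognition paradigm, the study first compares mainstream word embedding models and designs a method, based on Word2Vec and Skip-Gram, for automatically generating word embedding models of domain text features. Second, it constructs a BLSTM-CRF model with a self-attention mechanism for automatic domain concept extraction, taking IOB-format annotated text as input. Finally, it carries out experiments and evaluation using the resources and environment discipline as an example. [Results/Conclusions] The model can extract domain concepts automatically and is also reasonably robust in recognizing new domain concepts or terms. [Limitations] The model's accuracy has not yet peaked and needs further optimization.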
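
The IOB annotation format mentioned in this abstract marks each token as the Beginning of a concept, Inside one, or Outside any. The sketch below shows how a tagged sequence decodes back into concept strings. The tokens, the bare `B`/`I`/`O` tag set, and the function name `iob_to_spans` are assumptions for illustration; the paper's actual label scheme may carry type suffixes.

```python
from typing import List


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the token runs labelled B, I, I, ... into concept strings."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":                       # a new concept starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:         # continue the open concept
            current.append(token)
        else:                                # "O", or a stray "I" with nothing open
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical tagged sentence from the resources and environment domain.
tokens = ["soil", "erosion", "affects", "water", "quality"]
tags = ["B", "I", "O", "B", "I"]

print(iob_to_spans(tokens, tags))  # ['soil erosion', 'water quality']
```

A sequence labelling model such as a BLSTM-CRF predicts the tag column; a decoder like this one then turns its predictions into the extracted concept list.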

16.
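Research on an acceptance evaluation indicator system for key discipline construction at Shanghai local universities   Total citations: 7, self-citations: 0, citations by others: 7
To address shortcomings in dynamism and quantification in the evaluation indicator system used for the mid-term evaluation of the Shanghai Municipal Education Commission's third batch of key disciplines, an acceptance evaluation indicator system consisting of 35 qualitative and quantitative indicators was established, based on the requirements of key discipline construction and the principles for formulating evaluation indicator systems, and informed by interviews. The weight of each indicator was determined by expert consultation, and a method for calculating the composite evaluation score was defined. The method was applied in a retrospective validation using data from the mid-term evaluation of 26 key disciplines. The indicator system developed in this study eliminates, to some extent, the effect on evaluation results of differences in the disciplines' original levels. It also eliminates, to some extent, the effect of subjective factors on evaluation results.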
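
The abstract does not state how the composite evaluation score is calculated. One common form, shown here purely as an assumption, is a weighted sum of indicator scores with weights that sum to 1. The three indicator names, their weights, and the scores are invented; the real system has 35 indicators with weights set by expert consultation.

```python
import math
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of indicator scores; weights must cover all indicators and sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same indicators")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical indicators and weights, for illustration only.
weights = {"research output": 0.5, "faculty": 0.3, "funding": 0.2}
scores = {"research output": 80.0, "faculty": 90.0, "funding": 70.0}

total = composite_score(scores, weights)
print(round(total, 2))  # 81.0
```

Normalizing each indicator to a common scale before weighting is what lets qualitative and quantitative indicators be combined in one score.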

17.
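While comprehensively reviewing the innovation terms commonly used in the field of innovation management in recent years, this paper builds a conceptual framework for innovation, compares the features of the innovation terms and distinguishes their types from the perspective of dimensional characteristics, and explores related development trends and the practical application of each innovation type in China's innovation environment.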

18.
The implementation of relevance feedback explored here demonstrates the feasibility of query reformulation for boolean retrievals. Improvements to a term prevalence formula used in earlier research are presented, along with experimental results that confirm the crucial role of term weights in relevance feedback. In particular, the weighting formula presented here differs significantly from those used by researchers in associative retrieval environments by giving equal weight to a patron's judgements of nonrelevance. The use of negative feedback (NOTing terms in the negative prevalence range) failed to improve precision, a result that is also considered significant.

19.
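As required reading for building the foundational framework of education majors, 《教育概论》 (Introduction to Education) is essential. Professor 叶澜 (Ye Lan) offers distinctive and objective views on the definition of education, the elements of education, and the relationships between education, society, and human development. Through an overall general-specific-general structure, the book gives a penetrating account of the objectively grounded definition of "education", the concrete subject-object relations in education, and the relationship between education's two major functions, providing readers with a rigorous map of the foundational framework of pedagogy.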

20.
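This paper briefly introduces how to use MultiTes 2007 Pro and, by building a small information science thesaurus, discusses the software's functions and features, how information science subject terms were obtained, and the steps for creating a simple thesaurus. Finally, it briefly evaluates the strengths and weaknesses of MultiTes 2007 Pro.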
