20 similar documents found.
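Zhang Ning, Zhu Lijun. 《情报工程》, 2016, 2(1): 032-042
Automatic question answering systems have become a research hotspot in natural language processing in recent years, and question analysis, as the first stage of a question answering system, plays a key role in it. This paper briefly introduces the basics of Chinese question analysis, mainly developments in word segmentation, part-of-speech tagging, and syntactic parsing. It also gives focused coverage of research on question classification and question semantic analysis in Chinese question analysis. Finally, it identifies some difficult problems facing Chinese question analysis and offers a preliminary outlook on possible future research directions.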

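A Study of Digital Reference Needs Based on Content Analysis of Questions   Cited by: 3 (self-citations: 0, citations by others: 3)
Users' reference questions directly reflect their information needs and service expectations, and have become a routine object of analysis in libraries abroad. Taking users' reference questions as the direct object of analysis, this paper applies content analysis, using question words, subject topics, forms of need, reasons for asking, and response status as units of analysis, and analyzes the question data at three levels: syntactic, semantic, and pragmatic. It finds that digital reference questions mainly concern topics within users' occupational fields and are beginning to show characteristics of seeking knowledge services, but that answering efficiency differs markedly across question types. This supports the hypothesis that the service expectations reflected in users' questions are broadly consistent with the digital reference service efforts of public libraries, while the structure of specific content needs and form needs shows certain characteristics. The paper concludes with three inferences.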

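Research on Question Classification for an Ontology-Based Chinese Question Answering System   Cited by: 1 (self-citations: 0, citations by others: 1)
Question classification is the foundation of processing in a question answering system. At present, the vast majority of question answering systems restrict questions to types such as person, location, date, quantity, manner, works, and organization, which hinders the handling of a wider range of cases and of questions with deeper semantics. A complete, comprehensive, multi-level question classification model can be built on the idea of Ontology. 1 table. 8 figures. 6 references.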

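[Purpose/Significance] To identify events quickly and accurately from texts of emergent online public opinion. [Method/Process] A method for identifying emergencies in online public opinion that combines syntactic features and syntactic similarity is proposed. An event-oriented syntactic feature extraction method is proposed on the basis of syntactic features; event semantic annotation and this extraction method are used to build an event syntactic feature library; and emergencies are identified by computing the syntactic similarity between the text under test and the syntactic library. [Results/Conclusions] Taking the novel coronavirus pneumonia (COVID-19) epidemic as an example, the optimal similarity of the proposed method for this public opinion case is 0.93; at this similarity, 160 events and 30 non-events were identified from a new text, with an F1 value of 0.848. The evaluation demonstrates the effectiveness of the method's innovations in using syntactic similarity to identify events and in merging adjacent identical parts of speech.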

This paper studies the contribution of semantic and semantic–syntactic analysis to the effectiveness of solving applied text-processing tasks: question answering and extraction of definitions from scientific publications. Methods for solving these problems are presented which, in addition to morphological and syntactic structures, also use the semantic structures of texts. We carried out an experimental evaluation of these methods and a comparison of two approaches to syntactic and semantic analysis: separate and joint semantic–syntactic parsing.

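Guo Haihong, Li Jiao, Dai Tao. 《情报工程》, 2016, 2(6): 039-049
This paper aims to construct a classification method for Chinese health questions and, by manually classifying and annotating hypertension-related health questions, to analyze the public's hypertension-related health information needs while providing a corpus foundation for developing hypertension-related intelligent Chinese question answering systems. Based on clinical question classification and a hierarchical model of public health information-seeking scenarios, the study constructs a four-level topic classification method for Chinese health questions. Five annotators independently classified and annotated 2,000 samples randomly drawn from nearly 100,000 hypertension-related questions collected from a Chinese health website, in order to refine the classification method and test its reliability, build an annotated corpus, and analyze the public's hypertension-related health information needs. The inter-rater reliability of the five annotators' independent annotation at the fourth-level categories was kappa = 0.63, indicating that the classification results are reliable; the first-level categories achieved high agreement (kappa = 0.82), slightly better than comparable international studies. Questions in the six first-level categories of treatment, diagnosis, healthy lifestyle, clinical findings/condition management, epidemiology, and choice of healthcare provider accounted for 48.1%, 23.8%, 11.9%, 5.2%, 9.0%, and 1.9% of the sample, respectively. The health question classification method can be used to organize large collections of health questions and thereby improve retrieval efficiency; the annotated sample questions can serve as a corpus for research on automatic classification of hypertension-related health questions; and the resulting topic distribution of hypertension-related health questions can help guide the development of knowledge resources on health websites. In addition, the way the classification method was constructed, the corpus annotation workflow, and the inter-rater reliability measurement designed and used here can serve as a reference for user question classification and corpus construction in the open domain and in other restricted domains.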
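
The abstract above reports inter-rater reliability as a kappa value for five annotators but does not say which kappa statistic was used. As a minimal sketch, assuming Fleiss' kappa (a common choice when more than two raters label every item), the agreement computation might look like the following; the function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was labelled by the same number of raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][c] = number of raters who assigned category c to item i
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1.0:  # only one category was ever used
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)

# Hypothetical toy data: two questions, five annotators each
toy = [
    ["treatment", "treatment", "treatment", "treatment", "diagnosis"],
    ["diagnosis", "diagnosis", "diagnosis", "diagnosis", "treatment"],
]
print(fleiss_kappa(toy))
```

With a hierarchical scheme like the four-level one described, the same function could be run once per level by truncating each label to that level, which is one plausible way to obtain separate first-level and fourth-level values.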

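Traditional automatic keyword extraction usually builds features from non-semantic information such as the frequency and position of candidate terms, without considering the specific semantic role that a keyword plays in an academic document, i.e., its lexical function. From statistics on existing data, this paper finds that about 67.99% of author-assigned keywords are research-question or research-method terms. The paper therefore divides the lexical functions of keywords into three classes, "research question", "research method", and "other", and, on top of traditional term-frequency and position features, incorporates lexical function features in keyword extraction experiments on academic literature in computer science, following both classification and ranking approaches. The results show that keyword extraction improves markedly once lexical functions are incorporated. Relative to the baseline experiments, the accuracy (Acc) and F value of the binary classification model improve by 24.63% and 25.19% in relative terms, reaching 0.840 and 0.666; the MAP, NDCG@5, and P@5 of the ranking model improve by 168.32%, 189.50%, and 148.30% in relative terms, rising to 0.813, 0.828, and 0.447. This demonstrates that lexical function features of academic literature play an important role in automatic keyword extraction.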
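
The abstract above gives final scores together with relative improvements over a baseline, but not the baseline scores themselves. As a rough sketch, assuming the usual definition relative gain = (new - baseline) / baseline, the implied baselines can be back-calculated as below. The helper name is hypothetical, and the printed values are derived estimates, not figures reported by the authors.

```python
def implied_baseline(new_score, relative_gain):
    """Baseline implied by a reported score and its relative improvement.

    relative_gain is a fraction (0.2463 for 24.63%), assuming
    relative_gain = (new_score - baseline) / baseline.
    """
    return new_score / (1.0 + relative_gain)

# (reported score, reported relative improvement) taken from the abstract
reported = {
    "Acc": (0.840, 0.2463),
    "F": (0.666, 0.2519),
    "MAP": (0.813, 1.6832),
    "NDCG@5": (0.828, 1.8950),
    "P@5": (0.447, 1.4830),
}

for name, (score, gain) in reported.items():
    print(f"{name}: implied baseline ~ {implied_baseline(score, gain):.3f}")
```

Because the reported scores are rounded to three decimals, the back-calculated baselines are only approximate.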

As critical building blocks of scientific research, research questions and research methods are put forward to reveal the nature of a publication's scientific novelty. Although existing studies have examined scientific novelty from multiple combination-based views, the temporal and semantic complexity of research questions and methods remains to be fully addressed. To remedy this, we introduce a new approach to measuring the novelty of papers from the perspective of question-method combination. Specifically, we demonstrate a life-index novelty measurement based on the frequency and age of question terms and method terms. Furthermore, using deep learning and representation learning techniques, we propose a semantic novelty measurement algorithm based on the semantic similarity of terms. The effectiveness of our methods was evaluated through case studies and statistical analysis on a dataset of papers collected from the ACM Digital Library. Our work innovatively integrates the age, frequency, and semantics of research methods and research questions that characterize novelty in scientific publications.

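Design and Implementation of Question Analysis for a Chinese FrameNet Question Answering System   Cited by: 1 (self-citations: 0, citations by others: 1)
Using the principles of frame semantics, the paper constructs a semantic frame oriented to question analysis, the Q frame, and on this basis implements semantic analysis of questions. It proposes a design approach for question analysis from the perspective of semantic rules: determine the target words of different question types based on dependency syntax trees, use pattern matching to perform Q-frame-based semantic analysis of questions, complete frame semantic annotation of questions through mapping, and finally determine the question focus and question type.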

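Research on Question Processing in a Chinese FrameNet Question Answering System   Cited by: 1 (self-citations: 0, citations by others: 1)
Question processing is the first problem a question answering system must address. The Chinese FrameNet question answering system is based on the Chinese FrameNet ontology and takes the legal domain as its object of study, investigating question processing and exploring new question answering system design techniques to meet users' needs for accurate information retrieval. This paper uses dependency relations to represent the syntactic relations of a query question, matches the query question against the templates in a question template library, and finally determines the valence pattern of the query question, thereby achieving frame semantic annotation of query questions and laying the foundation for the subsequent design of a question-answering-based frame semantic retrieval system.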

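[Purpose/Significance] To study deeper applications that combine the semantic network of the Unified Medical Language System (UMLS) with social tagging systems. [Method/Process] The paper summarizes existing applications of the UMLS Semantic Network, analyzes the characteristics of UMLS semantic types and FrameNet semantic types, constructs semantic types suited to this study, and uses examples to work through an approach to mapping between social tagging systems and ontologies. [Results/Conclusions] It proposes using the constructed semantic types both as a classifier for categorizing tags and as a bridge for mapping between folksonomy systems and the FrameNet ontology. Moving beyond traditional statistics-based tag categorization methods can offer a new perspective on mapping between ontologies and folksonomy systems.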

Automatic question answering using the web: Beyond the Factoid   Cited by: 4 (self-citations: 0, citations by others: 4)
In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach to QA assumes no restrictions on the type of questions that are handled, and no assumption that the answers to be provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which we use to collect a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to train various statistical models employed by our QA system: a statistical chunker used to transform a natural language-posed question into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular fashion, by comparing the performance of baseline algorithms against our proposed algorithms for various modules in our QA system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety of complex, non-factoid questions.

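[Purpose/Significance] Online Q&A communities have become an important way for Internet users to obtain high-quality knowledge, and exploring answer quality in Chinese Q&A communities is important for knowledge dissemination. [Method/Process] Taking Zhihu ("知乎"), one of the largest Chinese Q&A communities, as the object of study, the paper uses data mining and machine learning methods, selecting three classification models (logistic regression, support vector machine, and random forest) for three-tier progressive training and testing. A feature system for answer quality is built along three dimensions: structural features, textual features, and users' social attributes. [Results/Conclusions] The experiments show that the performance of all three classifiers improves steadily as the feature system is enriched, and that random forest, as an ensemble classifier, achieves excellent classification performance with the full feature set. Analysis of feature combinations finds that random forests that include users' social attributes consistently outperform other models at the same tier, indicating the importance of social networks in evaluating answer quality. The conclusions indicate that answer quality can be evaluated from the two perspectives of the answer itself and the answer's author, and that the feature system and models built can predict answer quality fairly comprehensively.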

Usage of field-normalized citation scores is a bibliometric standard. Different methods for field-normalization are in use, but the choice of field-classification system also determines the resulting field-normalized citation scores. Using Web of Science data, we calculated field-normalized citation scores using the same formula but different field-classification systems to answer the question of whether the resulting scores are different or similar. Six field-classification systems were used: three based on citation relations, one on semantic similarity scores (i.e., a topical relatedness measure), one on journal sets, and one on intellectual classifications. Systems based on journal sets and intellectual classifications agree on at least the moderate level. Two out of the three sets based on citation relations also agree on at least the moderate level. Larger differences were observed for the third data set based on citation relations and semantic similarity scores. The main policy implication is that normalized citation impact scores or rankings based on them should not be compared without deeper knowledge of the classification systems that were used to derive these values or rankings.
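
The abstract above says the same formula was applied under different field-classification systems, without stating the formula. As an illustrative sketch only, assuming a simple mean-based normalization (a paper's citations divided by the mean citations of all papers assigned to the same field), the following shows how the same paper can receive different scores under two classification systems. All names, field labels, and citation counts here are hypothetical.

```python
from collections import defaultdict

def field_normalized_scores(citations, field_of):
    """Divide each paper's citation count by the mean count of its field.

    citations: dict paper_id -> citation count
    field_of:  dict paper_id -> field label under one classification system
    """
    totals = defaultdict(int)
    sizes = defaultdict(int)
    for pid, cites in citations.items():
        totals[field_of[pid]] += cites
        sizes[field_of[pid]] += 1
    scores = {}
    for pid, cites in citations.items():
        field = field_of[pid]
        mean = totals[field] / sizes[field]
        # A field whose papers are all uncited has no meaningful reference value
        scores[pid] = cites / mean if mean > 0 else 0.0
    return scores

# Hypothetical citation counts and two hypothetical classification systems
citations = {"p1": 10, "p2": 2, "p3": 6, "p4": 6}
system_a = {"p1": "X", "p2": "X", "p3": "Y", "p4": "Y"}
system_b = {"p1": "X", "p2": "Y", "p3": "X", "p4": "Y"}

scores_a = field_normalized_scores(citations, system_a)
scores_b = field_normalized_scores(citations, system_b)
print(scores_a["p1"], scores_b["p1"])
```

Paper `p1` is compared against a different reference set in each system, so its normalized score changes even though its citation count does not. A real analysis would typically also normalize by publication year and document type, which this sketch omits.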

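Comparison and Evaluation of Online Q&A Communities and Collaborative Reference Services   Cited by: 3 (self-citations: 1, citations by others: 2)
To improve library reference services in the Web 2.0 environment by drawing on the consultation service model of online Q&A communities, the paper uses comparative analysis and question-answering experiments to investigate the operating mechanisms, consultation content, and personalized services of online Q&A communities and collaborative reference services. It evaluates the quality and efficiency of answers to four types of questions (factual, list, definitional, and exploratory) in three fields: economics, literature, and library science. The experiments show that online communities return more replies and respond faster, while reference systems are better at answering factual questions. Collaborative reference services should borrow the diverse operating mechanisms and effective ways of answering questions found in online Q&A communities. The service philosophy of IPL2 deserves to be promoted in reference systems. 10 tables. 19 references.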


This proposed new classification scheme is based on two main elements: hierarchism and binary theory. Hence, it is called Universal Binary Classification (UBC). Some advantages of this classification are highlighted, including subject heading development, construction of a thesaurus, and the arrangement of all terms with meaningful features in tabular form, which can help researchers, through a semantic process, to find what they need. This classification scheme is fully consistent with the classification of knowledge, which is also based on hierarchism and the binary principle. Finally, a survey of randomly selected books in the McLennan Library of McGill University is presented to compare the codes of this new classification with the currently employed Library of Congress Classification (LCC) numbers in the discipline of Library and Information Sciences.

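Zhou Lei, Li Ying, Shi Chongde. 《情报工程》, 2016, 2(1): 114-122
Building on linguistic research on syntactic and semantic word formation, and drawing on research on the lexicon in terminology studies and cognitive linguistics, this paper examines the factors that influence word formation in scientific and technical vocabulary in light of the characteristics of that vocabulary. It proposes four processes that influence such word formation: the syntactic-semantic process, the cognitive process, the translation process, and the aesthetic process.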

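Automatically Extracting Semantic Information from Chinese Patent Texts Using Graph-Based Representation   Cited by: 1 (self-citations: 0, citations by others: 1)
Jiang Chuntao. 《图书情报工作》, 2015, 59(21): 115-122
[Purpose/Significance] The paper proposes using graph-structured representations to automatically mine semantic information from Chinese patent texts, in order to provide semantic support for content-based intelligent patent analysis. [Method/Process] Two graph-structured models are designed: (1) a keyword-based text graph model and (2) a text graph model based on dependency trees. The first is defined by computing similarity relations between keywords; the second is defined by the grammatical relations extracted from sentences. In the case study, a frequent subgraph mining algorithm is applied to the graph models, and text classifiers that use subgraphs as features are built to test the expressiveness and effectiveness of the models. [Results/Conclusions] The graph-model-based text classifiers were applied to patent text datasets from four different technical fields and compared with the test results of classic text classifiers: the former improve classification performance over the latter by 2.1%-10.5% while using markedly fewer features. It is therefore inferred that the semantic information extracted from patent texts using graph-structured representation combined with graph mining techniques is effective and helpful for further patent text analysis.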

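Ontology Representation for the Semantic Web   Cited by: 12 (self-citations: 0, citations by others: 12)
Ontologies will play a crucial role in the Semantic Web. By providing a shared and precisely defined source of terms, they extend syntactic interoperability to semantic interoperability. The Semantic Web will certainly be based on XML and RDF and will adopt a layered design approach. 2 figures. 8 references.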

Spoken language is encoded extremely rapidly and by exceedingly complex cognitive operations, yet it is amazingly free of errors. In recent years there has been debate on the question of how the speech‐production system guards itself against erroneous output. One explanation is that the system is sufficiently sophisticated and rule‐governed in its early message‐formulation stages so as generally to avoid constructing anomalous plans. The authors have argued elsewhere, however, for an explanation whereby anomalous and other error plans are formulated during early production stages but are vetoed and corrected (i.e., “edited”) during later encoding stages. We have yet to synthesize these arguments into a coherent encoding model, however, and that is our purpose here. An “Editing” model of speech production is presented, featuring prearticulatory evaluations of impending speech segments via feedback to a spreading‐activation lexicon which is susceptible to semantic, syntactic, phonological, and extralinguistic influences.
