Similar literature
Found 20 similar documents; search took 328 ms
1.
张宁  朱礼军 《情报工程》2016,2(1):032-042
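Automatic question answering systems have become a research focus in natural language processing in recent years. Question analysis, as the first stage of a question answering system, plays a key role in it. This paper briefly introduces the basics of Chinese question analysis, mainly developments in word segmentation, part-of-speech tagging, and syntactic parsing. It also gives focused coverage of research on question classification and question semantic analysis within Chinese question analysis. Finally, it raises some difficult problems facing Chinese question analysis and offers a preliminary outlook on possible future research directions.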

2.
Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for the management of multilingual collections. BRUJA works with three languages (English, Spanish and French). The BRUJA architecture is not formed with three monolingual QA systems but instead uses English as an interlingua to perform the usual QA tasks, such as question classification and answer extraction. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant documents from a multilingual collection. On the one hand, we have more documents to find answers from, but on the other hand, we are introducing noise into the system because of translations to the interlingua (English) and the CLIR module. The question is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results. We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual counterpart whenever it uses good translation resources and, especially, CLIR techniques that are state-of-the-art.

3.
Automatic question answering using the web: Beyond the Factoid   Total citations: 4 (self-citations: 0; citations by others: 4)
In this paper we describe and evaluate a Question Answering (QA) system that goes beyond answering factoid questions. Our approach to QA assumes no restrictions on the type of questions that are handled, and no assumption that the answers to be provided are factoids. We present an unsupervised approach for collecting question and answer pairs from FAQ pages, which we use to collect a corpus of 1 million question/answer pairs from FAQ pages available on the Web. This corpus is used to train various statistical models employed by our QA system: a statistical chunker used to transform a natural language-posed question into a phrase-based query to be submitted for exact match to an off-the-shelf search engine; an answer/question translation model, used to assess the likelihood that a proposed answer is indeed an answer to the posed question; and an answer language model, used to assess the likelihood that a proposed answer is a well-formed answer. We evaluate our QA system in a modular fashion, by comparing the performance of baseline algorithms against our proposed algorithms for various modules in our QA system. The evaluation shows that our system achieves reasonable performance in terms of answer accuracy for a large variety of complex, non-factoid questions.

4.
郭海红  李姣  代涛 《情报工程》2016,2(6):039-049
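This paper aims to construct a classification scheme for Chinese health questions and, by manually classifying and annotating hypertension-related health questions, to analyze the public's hypertension-related health information needs, while also providing a corpus foundation for developing intelligent Chinese question answering systems for hypertension. Building on clinical question taxonomies and a layered model of consumer health information-seeking scenarios, the study constructs a four-level topic classification scheme for Chinese health questions. Five annotators independently classified and annotated 2,000 sample questions drawn at random from nearly 100,000 hypertension-related questions collected from a Chinese health website, in order to refine the scheme and test its reliability, build an annotated corpus, and analyze the public's hypertension-related health information needs. Inter-rater reliability among the five annotators working independently with the scheme was kappa = 0.63 at the fourth-level categories, indicating that the classification results are reliable; the first-level categories reached high agreement (kappa = 0.82), slightly better than comparable international studies. Questions in the six first-level categories of treatment, diagnosis, healthy lifestyle, clinical findings/condition management, epidemiology, and choosing a healthcare provider accounted for 48.1%, 23.8%, 11.9%, 5.2%, 9.0%, and 1.9% of the sample, respectively. The resulting classification scheme can be used to organize large collections of health questions and so improve retrieval efficiency; the annotated sample questions can serve as a corpus for research on automatic classification of hypertension-related health questions; and the observed topic distribution can help guide the development of knowledge resources on health websites. In addition, the way the scheme was constructed, the corpus annotation workflow, and the method for measuring inter-rater reliability may serve as a reference for user question classification and corpus construction in the open domain and in other restricted domains.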
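The abstract reports kappa values for five annotators but does not say which kappa variant was used. As a rough illustration only, assuming Fleiss' kappa (a common choice when there are more than two raters), agreement could be computed as sketched below. The function name `fleiss_kappa` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Assumes every item was labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # counts[i][c] = number of raters who assigned category c to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the per-item agreement P_i
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1.0:
        # Only one category was ever used; treat as perfect agreement
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Toy data (hypothetical): 2 questions, 5 annotators each
toy = [
    ["treatment", "treatment", "treatment", "treatment", "diagnosis"],
    ["diagnosis", "diagnosis", "diagnosis", "diagnosis", "treatment"],
]
print(round(fleiss_kappa(toy), 3))
```

With real data, each inner list would hold the five annotators' category labels for one sampled question, computed separately for each level of the scheme.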

5.
Analysis of Statistical Question Classification for Fact-Based Questions   Total citations: 1 (self-citations: 0; citations by others: 1)
Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.

6.
Background: Question-answering systems (or QA Systems) stand as a new alternative for Information Retrieval Systems. Most users frequently need to retrieve specific information about a factual question rather than obtain a whole document. Objectives: The study evaluates the efficiency of QA systems as terminological sources for physicians, specialised translators and users in general. It assesses the performance of one open-domain QA system, START, and one restricted-domain QA system, MedQA. Method: The study collected two hundred definitional questions (What is…?), either general or specialised, from the health website WebMD. Sources used by the open-domain QA system, START, and the restricted-domain QA system, MedQA, were studied to retrieve answers, and later a range of evaluation measures (precision, Mean Reciprocal Rank, Total Reciprocal Rank, First Hit Success) were applied to mark the quality of answers. Results: It was established that both systems are useful in the retrieval of valid definitional healthcare information, with an acceptable degree of coherent and precise responses from both. The answers supplied by MedQA were more reliable than those of START in the sense that they came from specialised clinical or academic sources, most of them showing links to further research articles. Conclusions: Results obtained show the potential of this type of tool in the more general realm of information access, and the retrieval of health information. They may be considered a good, reliable and reasonably precise alternative in alleviating the information overload. Both QA systems can help professionals and users obtain healthcare information.
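The rank-based measures named in the Method section can be sketched as follows. This is a minimal illustration, assuming each question's answers have been judged correct or incorrect in rank order, and using the definitions common in QA evaluation (the abstract itself does not define them): reciprocal rank of the first correct answer, sum of reciprocal ranks of all correct answers, and whether the top answer is correct. All function names and the toy judgments are hypothetical.

```python
def reciprocal_rank(judgments):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, correct in enumerate(judgments, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def total_reciprocal_rank(judgments):
    """Sum of 1/rank over every correct answer in the ranked list."""
    return sum(1.0 / rank for rank, correct in enumerate(judgments, start=1) if correct)


def first_hit_success(judgments):
    """1 if the top-ranked answer is correct, else 0."""
    return 1 if judgments and judgments[0] else 0


def mean_reciprocal_rank(all_judgments):
    """Average reciprocal rank over a set of questions."""
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


# Toy judgments (hypothetical): one list per question, True = correct answer
toy = [
    [False, True, False],   # first correct answer at rank 2
    [True, False, True],    # correct answers at ranks 1 and 3
    [False, False, False],  # no correct answer returned
]
print(mean_reciprocal_rank(toy))
print([total_reciprocal_rank(j) for j in toy])
print([first_hit_success(j) for j in toy])
```

Mean Reciprocal Rank rewards only the first correct answer, whereas Total Reciprocal Rank also credits additional correct answers further down the list.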

7.
"Quality Assurance" (QA) is a practical tool for library management, a link in the library's relationship with administrative decision makers. Instituting or refining a QA program takes into account formal and informal methods. Described are one library's program and problem-solving model, the role of others in the library's program, types of recordkeeping, and integration of the library program with organizational QA. Positive results include participation in management-level activities and improvement in quality and delivery of services and products. Ten suggestions are made to begin a QA program.

8.
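Chinese syntactic parsing is a key step in Chinese language research and Chinese information processing, and also one of its difficulties; an assistant system for Chinese syntactic parsing can facilitate research and practice in this field. This paper outlines the architecture and basic functions of such an assistant system and analyzes in detail two key algorithms in its implementation, namely a bracket matching algorithm and a syntactic parsing algorithm. Preliminary experimental results show that the assistant system works well and meets its design goals.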
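The abstract does not describe how its bracket matching algorithm works. As a generic illustration only, assuming parse annotations are written as bracketed strings, a standard stack-based matcher might look like the sketch below. The function name `match_brackets` and the sample string are hypothetical and are not the authors' method.

```python
def match_brackets(text, open_ch="(", close_ch=")"):
    """Return (open_index, close_index) pairs for every matched bracket pair.

    Raises ValueError if a bracket has no partner.
    """
    stack = []   # positions of opening brackets not yet closed
    pairs = []
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            if not stack:
                raise ValueError(f"unmatched closing bracket at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched opening bracket at position {stack[-1]}")
    return pairs


# Hypothetical bracketed annotation of a short sentence
sample = "(S (NP 我) (VP (V 喜欢) (NP 苹果)))"
for start, end in match_brackets(sample):
    print(sample[start:end + 1])
```

Each matched pair delimits one constituent, so an annotation tool could use the pairs to highlight a subtree or to report exactly where a bracket is missing.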

9.
A quality assurance (QA) program provides not only a mechanism for establishing training and competency standards, but also a method for continuously monitoring current service practices to correct shortcomings. The typical QA cycle includes these basic steps: select subject for review, establish measurable standards, evaluate existing services using the standards, identify problems, implement solutions, and reevaluate services. The Claude Moore Health Sciences Library (CMHSL) developed a quality assurance program for online services designed to evaluate services against specific criteria identified by research studies as being important to customer satisfaction. These criteria include reliability, responsiveness, approachability, communication, and physical factors. The application of these criteria to the library's existing online services in the quality review process is discussed with specific examples of the problems identified in each service area, as well as the solutions implemented to correct deficiencies. The application of the QA cycle to an online services program serves as a model of possible interventions. The use of QA principles to enhance online service quality can be extended to other library service areas.

10.
There are different types of questions that can be used to determine whether students learned what they were taught in an information literacy (IL) session. This article summarizes best practices from the education literature for constructing short-answer, alternative-response, matching, multiple-choice, interpretative, and essay questions, and includes question examples from the author’s own experience delivering IL instruction.

11.
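To address the need to improve the overall performance of syntactic parsing, this paper proposes and builds a Chinese-English syntactic parsing system and verification platform, approached from two directions: visual editing of learned rules and dictionaries, and tree-style display and manipulation of parsing results. It describes in detail the platform's design rationale, implementation, and key techniques, and points out existing problems and ways to address them.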

12.
One of the highest priorities in today's hospitals is the provision of quality care to patients. The medical librarian has an increased responsibility to furnish quality information to the medical staff. Traditional methods of reference service continue to work well, but it is increasingly important for librarians to become more directly involved in hospital quality assurance (QA) activities. Occurrence screening is one system of QA where the librarian can make a difference.

13.
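A Study of Digital Reference Needs Based on Content Analysis of User Questions   Total citations: 3 (self-citations: 0; citations by others: 3)
Users' reference questions directly reflect their information needs and service expectations, and have become a routine object of analysis in the library community abroad. Taking users' reference questions as the direct object of analysis, this paper applies content analysis, using question words, subject topic, form of need, reason for asking, and response status as units of analysis, to examine the question data at three levels: syntactic, semantic, and pragmatic. It finds that digital reference questions mainly concern topics within users' professional fields and are beginning to show characteristics of seeking knowledge services, but that response efficiency differs markedly across question types. This supports the hypothesis that the service expectations reflected in user questions are broadly consistent with the efforts of public libraries' digital reference services, while the structure of users' specific content needs and format needs shows certain distinctive features. The paper concludes with three inferences.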

14.
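[Purpose/Significance] What is intelligence? What is a think tank? What are the differences and connections between them? These questions seem simple but are often perplexing. This paper attempts to distinguish the two and clarify their relationship in terms of basic concepts, basic objects of study, basic guiding theories, and basic methodological systems. [Method/Process] Drawing on an extensive review of historical literature and case studies of relevant institutions, the paper carries out a detailed summary and comparative analysis and examines in depth the commonalities and differences between the two. [Result/Conclusion] The review shows that, whether viewed in terms of basic concepts, objects of study, guiding theories, or methodological systems, the two are distinct "types of work" at opposite ends of the same decision-consulting process chain: although they differ greatly in values and focus, they depend on each other and transform into one another.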

15.
Summary

This study evaluates how well eight major search engines produced answers to twenty-one real reference questions and five made-up subject questions. The retrieval and relevancy-ranking abilities of search engines were measured by precision, duplicate, most-relevant-item score, and relevancy-ranking score. Search engines did not produce good results for the reference questions, but did well with the subject questions. T-tests found the two types of questions quite different in nature, so the best engines were identified by the type of questions. Open Text was the best in handling the reference questions, and InfoSeek was the best at answering subject questions.

16.
TELEGEN and BIOTECHNOLOGY, two new databases on the SDC ORBIT system, index the whole range of genetic engineering literature. TELEGEN covers business aspects of the industry, as well as technical literature; BIOTECHNOLOGY covers patents and technical literature only. Both files are PROXIMITY-searchable. File design and content, as well as search techniques specific to each file, are discussed in detail. The reader should, after completing this introduction, try one or more of the sample questions included to "learn by experience" the capabilities of each file.

17.
The study introduces the pedagogy of listening-based questioning training (LBJQ) for journalism students and journalists. The approach provides basic training and is somewhat different from the usual interview training as LBJQ focuses on training in basic skills: listening and reflective responding skills, as well as the ability to use different types of questions efficiently. The aim of the exercises is to help the journalist to get the most precise information possible from people with various social skills and styles. The article explains theoretical approaches and provides sample exercises for trainers and journalists.

18.
In attempting to move questionnaire design from art to science, researchers use different evaluation techniques to help determine how well questions are working. Techniques such as behavior coding, respondent debriefing, interviewer debriefing, cognitive interviewing, and nonresponse analysis all provide information to help the questionnaire designer assess whether respondents understand questions as intended and whether they are able to provide adequate answers to them. However, these techniques do not actually measure question reliability. It is assumed that questions that pass the screen of the questionnaire evaluation techniques described above are also more likely to produce data that are reliable and valid. In this paper, we use behavior coding data to predict test-retest reliability. Respondent behavior codes significantly predict such reliability whereas interviewer codes, at least in this survey, do not. We also report the results of sensitivity testing to determine what percentage of adequate respondent answers best predicts test-retest reliability.

19.
Free newspapers are a substantial segment of the U.S. newspaper industry, as well as an under-studied topic within media research. This study considers the economic health of free newspapers in the United States and whether they face a dire future given their heavy reliance on advertising, a source of revenue that has been in decline for newspapers. One question guiding this research is whether free newspapers face two options: continue producing free content by relying on advertising (in addition to other revenue sources), or abandon the advertising-based business model. Seven research questions address a number of issues, such as whether free newspapers are profitable, if decision-makers are considering changing their business model, whether they are seeking alternative sources of revenue, whether reader engagement is connected to the price, or a lack of one, of a newspaper, and whether decision-makers are optimistic or pessimistic about the future of their industry. A Web-based survey asked decision-makers at free newspapers in the United States to respond to questions related to the health and future of their newspaper or newspapers. This survey was complemented by in-depth interviews with publishers of four different types of free newspapers in Texas. The study concludes by suggesting free newspapers are not only viable but in many markets they are thriving. Sweeping generalizations (often seen in industry discourse) about the future of print newspapers can be misleading. This study contributes a reality check and calls for further research on the economics of print media in the digital era.

20.
The ability to correctly classify sentences that describe events is an important task for many natural language applications such as Question Answering (QA) and Text Summarisation. In this paper, we treat event detection as a sentence level text classification problem. Overall, we compare the performance of discriminative versus generative approaches to this task: namely, a Support Vector Machine (SVM) classifier versus a Language Modeling (LM) approach. We also investigate a rule-based method that uses handcrafted lists of ‘trigger’ terms derived from WordNet. Two datasets are used in our experiments to test each approach on six different event types, i.e., Die, Attack, Injure, Meet, Transport and Charge-Indict. Our experimental results show that the trained SVM classifier significantly outperforms the simple rule-based system and language modeling approach on both datasets: ACE (F1 66% vs. 45% and 38%, respectively) and IBC (F1 92% vs. 88% and 74%, respectively). A detailed error analysis framework for the task is also provided which separates errors into different types: semantic, inference, continuous and trigger-less.
