Similar literature
1.
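This paper presents a term extraction method based on the longest common substring (LCS) algorithm. Domain documents are segmented at punctuation marks; all longest common substrings of the resulting sentence fragments are extracted as the candidate term set; and rules such as stop-word filtering, screening against a domain lexicon, and screening of nested term substrings are applied to obtain the final term set. Experiments on term extraction in the preschool education domain confirm that the algorithm extracts Chinese domain terms effectively: average precision reaches 84.2%, and extraction of two-word terms of 4 to 6 characters works especially well, with precision close to 100%.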
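
The first two steps of the abstract above (splitting at punctuation, then taking common substrings of the fragments) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the punctuation set, and the `min_len` threshold are assumptions, and for simplicity it keeps only one longest common substring per fragment pair rather than all of them.

```python
import re
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest common substring of a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def candidate_terms(text: str, stop_words=frozenset(), min_len: int = 2) -> set:
    """Split text at punctuation and collect pairwise LCS strings as candidates.

    The punctuation set, stop_words, and min_len are hypothetical choices.
    """
    fragments = [f for f in re.split(r"[,。;:!?、,.;:!?\s]+", text) if f]
    candidates = set()
    for a, b in combinations(fragments, 2):
        lcs = longest_common_substring(a, b)
        if len(lcs) >= min_len and lcs not in stop_words:
            candidates.add(lcs)
    return candidates
```

The later filtering stages (domain lexicon lookup and nested-substring screening) are not shown, since the abstract does not specify their rules.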

2.
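This study investigates methods for extracting scientific and technical terms, beyond author keywords, from the text of scientific papers. Because of the indexing effect, using only a paper's keywords as candidate terms limits both the size and the quality of a term base, so terms need to be extracted from the paper text itself. Most existing term extraction methods emphasize termhood measures and neglect unithood measures. To address this, and building on the C-value algorithm, the study proposes Chinese term-formation rules for generating candidate terms and a unithood measure that gauges the internal cohesion of a term, enabling extraction of Chinese scientific and technical terms from paper text. The method is validated on term extraction in the information resource management domain; the experimental results show that it extracts domain terms effectively and with relatively high precision.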
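
For readers unfamiliar with the C-value measure that the abstract above builds on, a minimal sketch of the commonly cited form of the formula follows. It is not the paper's modified method: the proposed term-formation rules and unithood measure are not specified in the abstract and are not shown. The function names and the toy frequencies are assumptions. Candidates are tuples of words; a candidate nested in longer candidates has the mean frequency of those longer candidates subtracted from its own frequency.

```python
import math


def is_nested(short: tuple, long: tuple) -> bool:
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: dict) -> dict:
    """Score each candidate term (a tuple of words) with the basic C-value.

    freq maps candidate -> corpus frequency. In this basic form
    single-word candidates score 0 because log2(1) == 0.
    """
    scores = {}
    for a, f_a in freq.items():
        containers = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containers:
            # Discount occurrences explained by longer candidate terms.
            adjusted = f_a - sum(containers) / len(containers)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy frequencies (hypothetical, for illustration only).
toy_freq = {
    ("information", "resource", "management"): 4,
    ("resource", "management"): 10,
}
toy_scores = c_value(toy_freq)
```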

3.
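To make full use of knowledge organization in corporate patent strategy, and based on an analysis of patent documents and the syntactic characteristics of Chinese patent text, the authors design a semi-automatic system for building domain ontologies. It uses algorithms including maximum string-frequency matching, ant colony clustering, multi-level K-Means clustering, an improved association rule computation, and term relation extraction based on rules and CRFs. The system comprises modules for term extraction, taxonomic relation extraction, non-taxonomic relation extraction, and ontology formalization, and provides an initial implementation of semi-automatic ontology construction from structured data and unstructured text.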

4.
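Social network analysis is applied to the problem of multi-attribute association rule mining, offering a new perspective on this class of problem. First, the problem is introduced through the pairing of different beer brands with different colors of diapers, and it is noted that this class of problem also covers a broad range of evaluation and recommendation problems. Next, from the social network analysis perspective, a graph model and its equivalent matrix are built, and analysis of the graph and matrix leads to a method for mining multi-attribute association rules. To make the method easier to express programmatically, an "indicator vector" (指标向量) is introduced to give the existing methods a unified formulation, which helps with recursive implementation. Finally, the steps of the algorithm are given and the method is evaluated empirically on a data set of 100,000 evaluations. The results show that the social network analysis perspective makes abstract association rule mining visual and eases the introduction of matrix representations, so the resulting method has low algorithmic complexity, is intuitive and easy to grasp, and compares favorably with existing multi-attribute association rule mining algorithms.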

5.
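A new method for automatically extracting terms for a government affairs ontology is proposed. First, Chinese word segmentation and single-character merging are used to extract words from government texts as candidate terms; the candidates are then filtered using C-value computation and the TF-IDF algorithm, achieving automatic extraction of government-domain terms. Comparative experiments show that the method improves the precision of the extracted terms without reducing recall.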
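
The TF-IDF filtering step mentioned above can be illustrated with a small sketch. It assumes the documents are already segmented into words, uses the plain `tf * log(N / df)` weighting (one of several common variants), and takes a hypothetical `threshold` for filtering; none of these details come from the paper.

```python
import math
from collections import Counter


def tf_idf(docs: list) -> list:
    """Return one {term: tf-idf} dict per document.

    docs is a list of token lists (already word-segmented).
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def filter_candidates(docs: list, threshold: float) -> set:
    """Keep candidates whose best tf-idf in any document exceeds threshold."""
    kept = set()
    for doc_scores in tf_idf(docs):
        kept.update(t for t, s in doc_scores.items() if s > threshold)
    return kept


# Toy segmented documents (hypothetical).
toy_docs = [["政务", "公开", "政务"], ["公开", "服务"]]
toy_scores = tf_idf(toy_docs)
```

In this toy corpus "公开" appears in every document, so its idf is zero and it is filtered out at any positive threshold, which is the behavior one would want for words that are not domain-specific.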

6.
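A method for extracting patent technical terms   Total citations: 2 (self-citations: 0; citations by others: 2)
To address the lack of technical keywords in patents, and building on a study of the main term extraction methods, the C-value method is adopted with modified term-formation rules and a modified termhood formula. A PC-value is used to measure the termhood of a word, and a process model for extracting patent technical terms is proposed and used to extract technical terms from patents. The model has four stages: (1) word segmentation and part-of-speech tagging; (2) applying linguistic rules to obtain a list of possible terms; (3) computing termhood values to obtain a candidate term list; (4) evaluation and confirmation of terms by domain experts. Experimental results show that the method extracts Chinese patent technical terms well and outperforms the C-value method on long terms and on extraction precision.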

7.
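A method for extracting information from the Web   Total citations: 1 (self-citations: 0; citations by others: 1)
Han Lixin (韩立新), Xie Li (谢立). 《情报学报》, 2004, 23(1): 45-51
Because much of the information on the WWW is stored in HTML pages, extracting useful information from HTML documents is a pressing problem. This paper proposes a method for extracting information from HTML documents. It combines association rules, pattern matching, grammar rules, clustering, and related techniques, and thereby better addresses the low accuracy, poor generality, and heavy manual intervention of existing extraction methods.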

8.
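Term extraction is the foundation of domain ontology construction and determines the quality of the resulting ontology. Extracted terms need not only accurate phrase recognition but also high domain relevance. This paper explores a method for screening terms by domain relevance that does not rely on a background corpus. The work focuses on two aspects: first, extracting candidate terms (phrases) from a domain corpus by combining statistics and rules; second, proposing a multi-strategy term extraction method that scores candidates by their distribution, activity, and topicality, validated and analyzed through experiments. Validation experiments on a small aerospace corpus show that the method effectively improves the quality of domain term extraction without substantially increasing computational time complexity, with fairly satisfactory results.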

9.
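Starting from the practical needs of information analysis, term extraction, rare-term identification, and field comparison are studied on 5,405 patent records related to electric vehicles. The results show that key-phrase extraction is feasible, and that the average length of documents containing terms extracted by mutual information is closer to the collection's average document length. Terms from the abstract and First Claim fields differ somewhat but are equally important for classification or clustering. The rare-term identification algorithm can reveal correspondences between rare words and high-frequency words. The findings offer results and methods for patent text mining and patent information analysis, and supply reference terms needed for information analysis work.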

10.
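Hou Li (侯丽), Li Jiao (李姣), Hou Zhen (侯震), Chen Songjing (陈松景). 《图书情报工作》, 2015, 59(23): 115-123
[Purpose/Significance] To discover the health terms used by the public from Internet query data, providing a basis for mapping consumer health terms to professional medical terms and thereby improving knowledge organization and management in health knowledge service platforms. [Method/Process] A model combining rules with N-Gram is designed to identify new health terms. Public query data are collected and experiments conducted; over multiple rounds, the filtering corpus is gradually refined and, with manual review, the approach is iteratively optimized and validated. [Result/Conclusion] Rules are derived from questions posed by the public on the Internet and combined with statistical algorithms to extract new health-related words used by the public. The approach is reasonably general for identifying consumer health terms and can provide a data basis for mapping public terms to medical terms. The experiments show that rule-based preprocessing of public log data yields good preprocessed text for later steps, and that term identification combining N-Gram with various filtering rules identifies new words in short texts well.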
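
The N-Gram side of the approach described above can be sketched as counting character n-grams across short queries and keeping frequent ones that are not already in a known vocabulary. This is only an illustration under stated assumptions: the function names, the n-gram sizes, the `min_count` threshold, and the toy queries are hypothetical, and the paper's rule-based preprocessing and filtering rules are not reproduced.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list:
    """Return all contiguous character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def new_word_candidates(queries, known_words, n_values=(2, 3, 4), min_count=2):
    """Count character n-grams over all queries and keep frequent unknown ones.

    known_words stands in for an existing vocabulary or filter list.
    """
    counts = Counter()
    for query in queries:
        for n in n_values:
            counts.update(char_ngrams(query, n))
    return {
        gram: count
        for gram, count in counts.items()
        if count >= min_count and gram not in known_words
    }


# Toy queries (hypothetical).
toy_queries = ["高血压吃什么", "高血压怎么办"]
toy_candidates = new_word_candidates(toy_queries, known_words={"血压"})
```

Note that fragments such as "高血" survive this naive count alongside "高血压"; removing such partial strings is the kind of job the additional filtering rules and manual review mentioned in the abstract would handle.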

11.
ABSTRACT

The article examines the most important periodicals of ethnic minorities in Poland. After 1989, many ethnic groups (e.g., Germans and Romanies) were allowed to publish journals and newspapers for the first time since the end of World War II. The publications examined show the rich cultural life of the various ethnic groups as well as their current status in Poland. In addition to popular titles, some scholarly publications are also discussed.

12.
ABSTRACT

In the current political environment, raising support for millage renewals, bond campaigns, or even millage continuations for public libraries is affected by national politics and by a nationwide tendency toward tax aversion on principle. The lessons we can learn from the governing boards of small and rural public libraries are worth raising to greater national consciousness. Board governance, community consciousness, and facilities management are clear and logical tools that elected boards of public libraries can use to politick for support of libraries.

13.
The author answers a reference question on bibliographic sources for the Ukrainian periodical press 1840–1850. Helpful publications include bibliographies, guides, and library catalogs. These potentially make mention of revolutionary developments in Hungary (such as the Twelve Points paragraph of the Demands of the Hungarian Nation in March 1848, the subsequent April Laws, and Hungary's declaration of independence in April 1849), and elsewhere in the Hapsburg Empire.

14.
ABSTRACT

The history of the almanac in Croatia is reconstructed through primary research in bibliographic and archival sources. The almanac is a vehicle for knowledge communication in informal contexts, engaging both oral tradition and literary forms traceable to medieval literacy and ways of structuring knowledge. The history of the almanac in Croatia reflects the changing context of the book trade, literacy, and the evolution of language. Four main stages are identified: (1) the beginning of the annual almanac in the seventeenth century; astrological almanacs reflecting the sensibility of the Baroque period; (2) the Enlightenment's stimulation of almanac publishing in the spirit of contemporary secular reforms in agriculture and education; (3) nineteenth- and twentieth-century almanac trade, showing complex and overlapping networks for the production, distribution and appropriation of printed almanacs; (4) roughly the end of World War II, when the almanac slowly moved out of the role of a popular mass medium and into specialized niches represented by regional, diaspora, and religious almanacs.

15.
Some key questions for publishers in today’s market are: Could Amazon’s recent mergers and acquisitions strategy create a disruptive or paradigm-shifting business model in the publishing industry? Do its recent actions following mergers and acquisitions illustrate a predictable pattern of behaviour that publishers can strategize around? This paper will explore these questions and look at some of the possible reasons behind Amazon’s business practices and the possible consequences for publishers.

16.
ABSTRACT

German authorities are expecting more than 1 million refugees by the end of 2015. These people come to Germany to seek protection and assistance and to build a new life; therefore, it is important to welcome them and to assist them in their integration as soon as possible. This situation creates a versatile and well-suited opportunity for cultural and educational programs, including libraries, which can play a vital role in this integration process. The key to integration is knowledge of the German language, and the most important challenge now is to teach the necessary language skills to as many asylum-seekers as possible.

17.
Abstract

Many libraries use RSS to syndicate information about their collections to users. A survey of 65 academic libraries revealed their most common use for RSS is to disseminate information about library holdings, such as lists of new acquisitions. Even though typical RSS feeds are ill suited to the task of carrying rich bibliographic metadata, great potential exists for developing applications that can exploit metadata exposed to Web services via RSS. Using the MODS metadata format, entire catalog records can be seamlessly embedded in RSS 2.0 feeds. Existing tools, such as Library of Congress Java toolkits and XSLT stylesheets, can facilitate this process, while a new XSLT stylesheet may be used to create the RSS feeds complete with MODS records. As an example of the added functionality these MODS/RSS feeds can offer, records from a MODS-enriched RSS feed can be ingested into a non-RSS application such as Zotero. As more emerging library technologies use Web services architectures to handle data objects, the ability to syndicate catalog records will become more critical to providing innovative library Web services.
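
To make the idea of a MODS record embedded in an RSS 2.0 item concrete, here is a small sketch. The article describes doing this with XSLT stylesheets; the sketch below instead uses Python's standard `xml.etree.ElementTree` purely for illustration. The feed title, the `library.example.org` URL, the record fields, and the choice to place the `mods` element directly inside `item` are all assumptions, not details from the article.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS_NS)


def mods_tag(name: str) -> str:
    """Return a namespace-qualified MODS tag for ElementTree."""
    return f"{{{MODS_NS}}}{name}"


def build_feed(records: list) -> str:
    """Build an RSS 2.0 feed whose items each carry a minimal MODS record.

    records is a list of dicts with hypothetical keys "title" and "author".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "New acquisitions"
    ET.SubElement(channel, "link").text = "https://library.example.org/new"
    ET.SubElement(channel, "description").text = "Recently added titles"
    for rec in records:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec["title"]
        # Embedded bibliographic record in the MODS namespace.
        mods = ET.SubElement(item, mods_tag("mods"))
        title_info = ET.SubElement(mods, mods_tag("titleInfo"))
        ET.SubElement(title_info, mods_tag("title")).text = rec["title"]
        name = ET.SubElement(mods, mods_tag("name"))
        ET.SubElement(name, mods_tag("namePart")).text = rec["author"]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed([{"title": "Example Title", "author": "Example, Author"}])
```

A consuming application that understands MODS can read the namespaced elements, while ordinary RSS readers would typically just ignore them and show the item title.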

18.
This essay investigates key moments in the history of personal digital assistant (PDA) marketing to women. Analyzing promotional texts for three PDAs that received considerable press coverage from 1999 to 2001, this essay explores the cultural significance of the convergence of anxieties about women's place in the gendered division of labor with the computer industry's changing marketing imperatives. Drawing on an array of promotional texts, including news articles, press releases, promotional Web sites, and ads appearing in newspapers and magazines, this paper tells the story of how the computer industry aimed to sell smaller, faster computing devices to women while promising to mediate and thus reproduce women's overwork as paid and familial laborers. After experimenting with the PDA as a sexy fashionable gadget for working women, marketers approached women as mothers with “Audrey,” an Internet appliance designed for the kitchen.

19.
This study explores the attitudes and actions of bank managers toward communication spending during recession. Results of this study indicate that organizational leaders cut communication spending early during recession and also erroneously rely on the bank's financials to mitigate irrational customer withdrawals. This research is used to illustrate the need for public librarians to meet the needs of bank stakeholders when their institution ceases to communicate during crises, for academic librarians to educate students in undergraduate and graduate programs regarding heightened information needs during recession, and for corporate librarians to understand a potential avenue for competitive advantage during recession.

20.
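Mao Yanni (毛彦妮), Wang Feifei (王菲菲). 《图书情报工作》, 2012, 56(18): 93-98, 126
The paper notes that co-link analysis is widely used in commercial website evaluation, competitive intelligence analysis, business information mining, and related fields. Combining co-link analysis with social network analysis, it analyzes the competitive landscape, competitive relationships, and positions of the top 50 e-commerce websites by domestic market share, and makes an initial exploration of mining potential competitive relationships between companies, aiming to offer insight for competitive intelligence methodology and for the development of the domestic e-commerce market. The study finds a significant correlation between a node's degree centrality in the co-link network and the company's actual market share. Competitive relationships mostly occur between companies in different organizations with similar business types, whereas some companies offering complementary services have cooperative relationships. These findings may offer guidance for strategic planning in the future e-commerce market.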
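
The degree centrality referred to above can be computed from a co-link network in a few lines. The sketch below uses the standard normalized definition (number of neighbors divided by n - 1) on an undirected, unweighted graph; the site names and edges are hypothetical, and the paper's own network construction and any weighting by co-link counts are not reproduced.

```python
def degree_centrality(edges: list) -> dict:
    """Normalized degree centrality for an undirected graph.

    edges is a list of (site_a, site_b) pairs, one per pair of sites that
    are co-linked at least once. Self-loops are assumed to be absent.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Toy co-link network with hypothetical site labels.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
toy_centrality = degree_centrality(toy_edges)
```

In a study like the one described, these centrality values would then be compared with each company's market share, for example through a rank correlation.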
