
1.

李纲郑重《情报理论与实践》2008,31(3):471-476

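Statistical language models, as a tool for natural language processing, have been shown capable of handling large-scale real-world text. The SLM-IR model, formed by combining statistical language models with IR, is a major advance in research on information retrieval models. This paper introduces the basic models of statistical language modeling in information retrieval and related issues, focuses on the principles and models of the Lemur Toolkit and the title language model, and finally gives an overview of international developments and research progress in the field.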

2.

A Brief Analysis of Statistical Language Models

康筱彬《科技风》2015,(12)

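With the rapid development of informatization and the spread of intelligent technologies, demand for computer processing of natural language and text is growing, and language processing systems have developed rapidly. Current empiricist research in computational linguistics holds that the core of a natural language processing system is the statistical language model. As the name suggests, a statistical language model is a mathematical model that uses statistical methods to describe the inherent regularities of natural language. This paper introduces the definition and classification of statistical language models, as well as their mathematical principles.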

3.

Research Progress on Large Language Models Represented by ChatGPT

柯沛雷文强黄民烈《中国科学基金》2023,(5):714-723

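Large language models are among the most cutting-edge research directions in artificial intelligence today. The aim is to train general-purpose language models with very large numbers of parameters that can follow human instructions to complete different types of natural language processing tasks. As a representative large language model, ChatGPT, developed by OpenAI, has shown strong natural language generation ability in many fields and has attracted attention from all sectors worldwide. Starting from the history of language models, this paper describes researchers' recent efforts to scale up language models, then analyzes the paradigm shift brought by large language models, and outlines the development, technology, and applications of ChatGPT as a typical example. It then presents frontier advances in large language models in the post-ChatGPT era, and finally summarizes the current limitations of large language models and the challenges still to be solved, from the perspectives of evaluation and governance.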

4.

Research on Information Retrieval Based on Language Models

康恺《科技风》2010,(23)

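Language models are currently a hot topic in information retrieval research. This paper makes a series of corrections and simplifying improvements to the pioneering work of Ponte and Croft in this area, and on that basis gives a comprehensive comparative analysis of the two major frameworks for language-model-based information retrieval. Besides revealing the essence of the models theoretically, it verifies the effect of the simplifications and of smoothing methods through a series of experiments.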

5.

A Brief Introduction to Methods for Detecting Punctuation in Speech Transcripts

储琢佳《科教文汇》2015,(21)

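This paper reviews the concepts related to punctuation recognition in speech transcripts and the main recognition methods, namely those based on syntactic-semantic rules and those based on corpora and statistical language models, and introduces several typical punctuation recognition systems. Finally, it points out key problems that need further study.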

6.

Capability Analysis and Future Prospects of ChatGPT

武俊宏赵阳宗成庆《中国科学基金》2023,(5):735-742

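In recent years the natural language processing abilities of large language models have kept improving. Recently in particular, the "extensive knowledge" and strong conversational ability of the chat generative pre-trained model (ChatGPT) have become a topic of worldwide attention. What is the true level of ChatGPT's language understanding? How does its performance compare with that of task-specific models? Can it become a general model for the whole field of natural language processing and replace other models, or even solve all natural language processing problems completely? To answer these questions, this paper evaluates and analyzes the performance of ChatGPT on multiple natural language processing tasks. On this basis, we discuss the impact of ChatGPT on the field of natural language processing and offer an outlook on future development.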

7.

Trends in Information Retrieval Research in the Era of Large Language Models

赵鑫窦志成文继荣《中国科学基金》2023,(5):786-792

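Large language models, represented by ChatGPT, have brought a new wave of development in artificial intelligence and attracted wide public attention. Through key techniques such as pre-training on large-scale unlabeled data, instruction fine-tuning, and human alignment, large language models have learned rich world knowledge, have good text understanding and generation abilities, and can effectively solve a variety of complex tasks. This important technical advance brings new opportunities for the field of information retrieval. This paper discusses two aspects: how large language models can improve existing information retrieval architectures, and how existing retrieval techniques can improve large language models. It reviews feasible technical approaches to the related scientific problems, offers an outlook, and examines development trends in information retrieval in the era of large language models, with the aim of advancing research in the field.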

8.

Information Retrieval Models and Progress in Their Application to Cross-Language Information Retrieval

吴丹齐和庆《现代情报》2009,29(7):215-221

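An important theoretical question in the development of information retrieval is how to match queries with documents, which has given rise to different information retrieval models. Cross-language information retrieval is a branch of information retrieval research and has been a hot topic in recent years. This paper analyzes and reviews research progress on information retrieval models and progress in their application to cross-language information retrieval.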

9.

XML-Based Data Mining Modeling

潘有能《情报理论与实践》2007,30(2):259-261

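XML is a standard for organizing data. Applying XML to data mining modeling enables model exchange and data exchange between different data mining tools, or between data mining tools and other application systems. This paper discusses the Predictive Model Markup Language (PMML) and gives an example of its application in a specific data mining tool.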

10.

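Research on Visualization of Cross-Language Information Retrieval (total citations: 5; self-citations: 0; citations by others: 5)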

张会平周宁陈立孚《情报科学》2007,25(1):134-138

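Linguistic diversity limits people's freedom to use information and keeps the value of information from being fully realized. Cross-language information retrieval has therefore become a current research focus. This paper applies methods and techniques from information visualization to cross-language information retrieval, proposes a visualization model for cross-language information retrieval, and presents an example: a visualization system for Macau legal information.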

11.

A Simple Software-Based Computer Simulation

杨兴运康晓凤《科技广场》2009,(1)

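To support practical computer teaching, this paper uses C++ to design a simple software-based simulated computer and describes the model machine's system composition, instruction set, and implementation. The model machine can execute both machine-instruction programs and assembly-language programs.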

12.

Implications of the Monitor Model in Second Language Acquisition for English Teaching of Non-Commissioned Officer Cadets

王华军方蔚《大众科技》2011,(9):214-216

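This article briefly introduces the Monitor Model theory of second language acquisition and its five hypotheses, and discusses the theory's guiding significance for English teaching, especially English teaching for non-commissioned officer cadets.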

13.

Fast exact maximum likelihood estimation for mixture of language model

Yi Zhang Wei Xu 《Information processing & management》2008

Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language model of a given document (or document set), and then do retrieval or classification based on this model. A common language modeling approach assumes the data D is generated from a mixture of several language models. The core problem is to find the maximum likelihood estimation of one language model mixture, given the fixed mixture weights and the other language model mixture. The EM algorithm is usually used to find the solution.
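As a rough illustration of the EM baseline this abstract mentions (not the fast exact algorithm the paper itself proposes), the sketch below estimates one component of a two-component unigram mixture when the mixture weight and the other component are held fixed. The function name `em_mixture`, the weight `lam`, the `background` distribution, and the toy counts are all assumptions made for illustration.

```python
def em_mixture(counts, background, lam, iters=200):
    """EM estimate of theta in p(w) = lam * theta[w] + (1 - lam) * background[w].

    counts:     dict word -> observed count in the data D (hypothetical toy input)
    background: dict word -> probability under the fixed, known language model
    lam:        fixed mixture weight of the unknown model theta (0 < lam <= 1)
    """
    vocab = list(counts)
    total = sum(counts.values())
    # Start from the empirical word distribution of D.
    theta = {w: counts[w] / total for w in vocab}
    for _ in range(iters):
        # E-step: expected number of occurrences of each word drawn from theta.
        expected = {}
        for w in vocab:
            mix = lam * theta[w] + (1 - lam) * background[w]
            posterior = lam * theta[w] / mix if mix > 0 else 0.0
            expected[w] = counts[w] * posterior
        # M-step: renormalize the expected counts into a distribution.
        norm = sum(expected.values())
        theta = {w: expected[w] / norm for w in vocab}
    return theta


if __name__ == "__main__":
    toy_counts = {"the": 5, "model": 3, "retrieval": 2}
    toy_background = {"the": 0.8, "model": 0.1, "retrieval": 0.1}
    print(em_mixture(toy_counts, toy_background, lam=0.5))
```

On this toy input the estimate shifts probability mass away from "the", because the fixed background model already explains most of its occurrences. The iteration count is fixed for simplicity; a convergence check on the log-likelihood would be the more usual stopping rule.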

14.

A Survey of Topic Tracking Technology

王卫姣《人天科学研究》2013,(4):147-149

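The sheer volume of online media information makes it hard for people to grasp a topic fully in limited time, so important information is easily missed. Topic detection and tracking technology arose from this need. It can quickly and accurately extract the content people are interested in from huge collections of information. In recent years, topic detection and tracking has become a popular research direction in natural language processing: it organizes large amounts of information effectively, mines useful information from it, and presents all the details of an event or phenomenon and the relations among them in a concise and effective way. This paper surveys the research background, related concepts, evaluation methods, and related techniques of topic tracking, and summarizes the current techniques.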

15.

Measuring and mitigating language model biases in abusive language detection

《Information processing & management》2023,60(3):103277

Warning: This paper contains abusive samples that may cause discomfort to readers. Abusive language on social media reinforces prejudice against an individual or a specific group of people, which greatly hampers freedom of expression. With the rise of large-scale pre-trained language models, classification based on pre-trained language models has gradually become a paradigm for automatic abusive language detection. However, the effect of stereotypes inherent in language models on the detection of abusive language remains unknown, although this may further reinforce biases against minorities. To this end, in this paper, we use multiple metrics to measure the presence of bias in language models and analyze the impact of these inherent biases in automatic abusive language detection. On the basis of this quantitative analysis, we propose two different debiasing strategies, token debiasing and sentence debiasing, which are jointly applied to reduce the bias of language models in abusive language detection without degrading the classification performance. Specifically, for the token debiasing strategy, we reduce the discrimination of the language model against protected attribute terms of a certain group by random probability estimation. For the sentence debiasing strategy, we replace protected attribute terms and augment the original text by counterfactual augmentation to obtain debiased samples, and use the consistency regularization between the original data and the augmented samples to eliminate the bias at the sentence level of the language model. The experimental results confirm that our method can not only reduce the bias of the language model in the abusive language detection task, but also effectively improve the performance of abusive language detection.

16.

Speciesist language and nonhuman animal bias in English Masked Language Models

《Information processing & management》2022,59(5):103050

Warning: This paper contains examples of offensive language, including insulting or objectifying expressions. Various existing studies have analyzed what social biases are inherited by NLP models. These biases may directly or indirectly harm people; therefore, previous studies have focused only on human attributes. However, until recently no research on social biases in NLP regarding nonhumans existed. In this paper, we analyze biases to nonhuman animals, i.e. speciesist bias, inherent in English Masked Language Models such as BERT. We analyzed speciesist bias against 46 animal names using template-based and corpus-extracted sentences containing speciesist (or non-speciesist) language. We found that pre-trained masked language models tend to associate harmful words with nonhuman animals and have a bias toward using speciesist language for some nonhuman animal names. Our code for reproducing the experiments will be made available on GitHub.

17.

The Persistent Impact of Language on Global Operations

《普罗米修斯》2012,30(3):193-209

In studies of firms' internationalisation, language has tended to be bundled into 'cultural and psychic distance boxes'. In this article, an attempt is made to unbundle the impact of language through (a) an examination of the way in which language influences the pattern of foreign market expansion; and (b) an analysis of how a firm may try to cope with language diversity by adopting a common corporate language. We conclude that attempts to impose a common corporate language may hinder or alter information flows, knowledge transfer, and communication.

18.

Design of a Cross-Language Automatic Retrieval Method Based on Semantic Expansion

宁琳《现代情报》2014,34(1):155-158

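Cross-language retrieval is an important means of information retrieval. To improve the efficiency of cross-language retrieval, the paper adopts semantic expansion; by analyzing its design rationale and workflow, it builds a cross-language automatic retrieval model based on semantic expansion, describing in particular the design of semantic expansion, the knowledge base, and result clustering. It proposes a word segmentation method based on semantic-understanding segmentation and uses the Single-Pass algorithm for clustering. Experimental results show that the model effectively improves recall and precision in cross-language retrieval.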

19.

Language Transfer under Theoretical Models of Second Language Acquisition

何静令李伶俐《中国科技信息》2009,(14):49-50

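Research on language transfer is an important part of second language acquisition. Based on the Input Hypothesis of Krashen's second language acquisition theory and on markedness theory, this paper identifies the factors in language transfer and the relationship between markedness and language transfer.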

20.

Leveraging relevance cues for language modeling in speech recognition

Berlin Chen Kuan-Yu Chen 《Information processing & management》2013

Language modeling (LM), providing a principled mechanism to associate quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploring either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models both in terms of perplexity and recognition error rate reductions.