Similar literature
 20 similar documents found (search time: 15 ms)
1.
Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language model of a given document (or document set), and then perform retrieval or classification based on this model. A common language modeling approach assumes the data D is generated from a mixture of several language models. The core problem is to find the maximum likelihood estimate of one component language model, given fixed mixture weights and the other component language model. The EM algorithm is usually used to find the solution.
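The estimation problem described in this abstract can be illustrated with a minimal sketch. It assumes a two-component unigram mixture in which the background model and the mixture weight are known and fixed, and only the second component is estimated. The function name `estimate_component` and its parameters are hypothetical, and this is the textbook EM update for such a mixture, not necessarily the procedure used in the paper.

```python
def estimate_component(counts, background, weight, iterations=50):
    """Estimate an unknown unigram model by EM (illustrative sketch).

    counts:     dict word -> count of the word in the observed data D
    background: dict word -> probability under the known, fixed model
                (assumed to cover every word in counts)
    weight:     fixed mixture weight of the unknown model, 0 < weight <= 1
    """
    vocab = list(counts)
    total = sum(counts.values())
    # Start from relative frequencies; any strictly positive start works.
    theta = {w: counts[w] / total for w in vocab}

    for _ in range(iterations):
        # E-step: expected number of occurrences of each word that were
        # generated by the unknown model rather than the background.
        expected = {}
        for w in vocab:
            from_theta = weight * theta[w]
            from_background = (1.0 - weight) * background[w]
            denom = from_theta + from_background
            posterior = from_theta / denom if denom > 0 else 0.0
            expected[w] = counts[w] * posterior

        # M-step: renormalise the expected counts into a distribution.
        norm = sum(expected.values())
        theta = {w: expected[w] / norm for w in vocab}

    return theta
```

Calling `estimate_component({"the": 6, "model": 3, "retrieval": 1}, {"the": 0.8, "model": 0.1, "retrieval": 0.1}, 0.5)` moves probability mass away from "the", because the background model already explains most of its occurrences.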

2.
Warning: This paper contains examples of offensive language, including insulting or objectifying expressions. Various existing studies have analyzed what social biases are inherited by NLP models. These biases may directly or indirectly harm people, therefore previous studies have focused only on human attributes. However, until recently no research on social biases in NLP regarding nonhumans existed. In this paper, we analyze biases to nonhuman animals, i.e. speciesist bias, inherent in English Masked Language Models such as BERT. We analyzed speciesist bias against 46 animal names using template-based and corpus-extracted sentences containing speciesist (or non-speciesist) language. We found that pre-trained masked language models tend to associate harmful words with nonhuman animals and have a bias toward using speciesist language for some nonhuman animal names. Our code for reproducing the experiments will be made available on GitHub.

3.
Warning: This paper contains abusive samples that may cause discomfort to readers. Abusive language on social media reinforces prejudice against an individual or a specific group of people, which greatly hampers freedom of expression. With the rise of large-scale pre-trained language models, classification based on pre-trained language models has gradually become a paradigm for automatic abusive language detection. However, the effect of stereotypes inherent in language models on the detection of abusive language remains unknown, although this may further reinforce biases against minorities. To this end, in this paper, we use multiple metrics to measure the presence of bias in language models and analyze the impact of these inherent biases in automatic abusive language detection. On the basis of this quantitative analysis, we propose two different debiasing strategies, token debiasing and sentence debiasing, which are jointly applied to reduce the bias of language models in abusive language detection without degrading the classification performance. Specifically, for the token debiasing strategy, we reduce the discrimination of the language model against protected attribute terms of a certain group by random probability estimation. For the sentence debiasing strategy, we replace protected attribute terms and augment the original text by counterfactual augmentation to obtain debiased samples, and use the consistency regularization between the original data and the augmented samples to eliminate the bias at the sentence level of the language model. The experimental results confirm that our method can not only reduce the bias of the language model in the abusive language detection task, but also effectively improve the performance of abusive language detection.

4.
Language modeling (LM), providing a principled mechanism to assign quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploring either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models both in terms of perplexity and recognition error rate reductions.

5.

This article provides a framework by which rival firms' incentives for interconnection in unregulated telecommunications markets may be analyzed and argues that the widespread voluntary interconnection observed among Internet service providers (ISPs) is anomalous when compared with examples of other similar markets from U.S. industrial history. However, the fact that it is anomalous provides an opportunity to test common explanations and to explore new explanations for the remarkable connectivity observed among ISPs through a comparative analysis. The comparative analysis reveals that (1) network effects and competitive forces in telecommunications markets will not necessarily drive firms to interconnect their networks voluntarily, as there are other options available to them, and (2) government actions played an important role in shaping the interconnection behavior of competing firms in telecommunications markets. The article then explores some of the implications of these findings for telecommunications policy, and interconnection regulation in particular.

6.
《Prometheus》2012,30(1):23-40

Much of the Knowledge Management (KM) literature assumes that all relevant knowledge can be represented as information and 'managed'. But the meaning of information is always context-specific and open to subsequent reinterpretation. Moving over time or between contexts affords scope for new meanings to emerge. Making sense of information signals (speech, body language, tone-of-voice or whatever), and of the absence of such signals, involves dimensions of individual and collective tacit knowledge that are frequently misrepresented or ignored in mainstream KM. By relating power and knowledge to 'rules of the game', it is possible to consider how the contexts in which information is rendered meaningful are bounded, as well as crucially related in the stretch between macro-level processes and micro-level practices. In the knowledge debate, Japan stands as a counterfactual to Anglo-Saxon expectations about formal rules, liberal individualism and market-rational entrepreneurship. While seminal accounts of knowledge creation in Japanese companies impelled the West towards KM, there has been no corresponding KM-boom in Japan. Our interpretation of the processes by which Japanese and Anglo-Saxon practices are situated suggests that KM is limited by the separation of knowledge from power and information from meaning.

7.
芮蓉 《中国科技纵横》2014,(20):283-284
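A logo is a visual language symbol that conveys clear information through its distinctive visual effect. In today's rapidly developing economy, a company's image, culture, and business strategy are all closely tied to its logo: it is not merely a symbol or a graphic, but an art that embodies the meaning of the brand. This paper explores how to design logos that are memorable and reflect a modern design style.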

8.
With the explosion of multilingual content on the Web, particularly on social media platforms, identification of the languages present in a text is becoming an important task for various applications. While automatic language identification (ALI) in social media text is considered a non-trivial task due to the presence of slang words, misspellings, creative spellings and special elements such as hashtags, user mentions etc., ALI in a multilingual environment becomes an even more challenging task. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, conversational text alone often fails to identify the underlying languages present in the text. This paper proposes various methods of exploiting social conversational features for enhancing ALI performance. Although social conversational features for ALI have been explored previously using methods like probabilistic language modeling, these models often fail to address issues related to code-mixing, phonetic typing, out-of-vocabulary words etc., which are prevalent in a highly multilingual environment. This paper differs in the way the social conversational features are used to propose text refinement strategies that are suitable for ALI in a highly multilingual environment. The contributions of this paper therefore include the following. First, this paper analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, various methods of text refinement suitable for language identification are proposed. Third, the effects of the proposed refinement methods are investigated using various sentence-level language identification frameworks. From various experimental observations over three conversational datasets collected from the Facebook, YouTube and Twitter social media platforms, it is evident that our proposed method of ALI using social conversational features outperforms the baseline counterparts.

9.
《Prometheus》2012,30(1):75-91

In April 1997, Tasmania (Australia) adopted the reputedly successful New Brunswick (Canada) industrial strategy to build an information technology (IT) industry of significance. The strategy aims to overcome isolation in small regional economies and to shift structurally away from declining natural resource industries. Both plans reject neo-classical economics-based industry policy, opting instead for a strong state-based investment planning approach. An analytical framework is set out, using Adolph Lowe's 'Instrumental Analysis', to examine implementation of both IT strategies. Implications of this analysis are drawn for any attempts at developing IT regional plans and, more generally, as a guide for broad strategic-based national industrial strategies.

10.
《Research Policy》2021,50(10):104347
We compare individuals presently employed either at a university or at a firm from an R&D-intensive sector, and analyze which of their person-specific and employer-specific characteristics are related to their choice to leave their present employer to found their own startup. Our data set combines the population of Danish employees with their present employers. We focus on persons who hold at least a Bachelor's degree in engineering, sciences and health and track them over 2001-2012. We show that (i) there are overall few differences between the characteristics of university and corporate startup entrepreneurs, (ii) common factors associated with startup activity of both university and corporate employees are education, top management team membership, previous job mobility and being male, (iii) it is primarily human capital-related characteristics that are related to startup choice of university employees while (iv) the characteristics of the present workplace are the foremost factors of entrepreneurial activity by corporate employees.

11.
A range of interlocking liability issues has arisen in recent years in connection with state and local governmental handling of crisis situations. As such authorities strive to anticipate and cope with impending disaster, when advance action is an option, the result of their decisions often directly affects the safety of the citizenry and the degree to which property is protected. In particular, the question of government liability for failing to use available information in fulfillment of its disaster-related responsibilities is explored.

12.
Document filtering (DF) and document classification (DC) are often integrated to classify suitable documents into suitable categories. A popular way to achieve integrated DF and DC is to associate each category with a threshold. A document d may be classified into a category c only if its degree of acceptance (DOA) with respect to c is higher than the threshold of c. Therefore, tuning a proper threshold for each category is essential. A threshold that is too high (low) may mislead the classifier into rejecting (accepting) too many documents. Unfortunately, thresholding is often based on the classifier's DOA estimations, which cannot always be reliable, due to two common phenomena: (1) the DOA estimations made by the classifier cannot always be correct, and (2) not all documents may be classified without any controversy. Unreliable estimations are actually noise that may mislead the thresholding process. In this paper, we present an adaptive and parameter-free technique, AS4T, to sample reliable DOA estimations for thresholding. AS4T operates by adapting to the classifier's status, without needing to define any parameters. Experimental results show that, by helping to derive more proper thresholds, AS4T may guide various classifiers to achieve significantly better and more stable performance under different circumstances. The contributions are of practical significance for real-world integrated DF and DC.
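The per-category thresholding rule described in this abstract can be sketched as follows. This is only an illustration of the decision rule plus a naive baseline that picks the threshold maximising F1 on labelled DOA scores. It is not AS4T, whose sampling procedure the abstract does not specify, and the names `assign_categories` and `tune_threshold` are hypothetical.

```python
def assign_categories(doa_scores, thresholds):
    """Return the categories whose DOA is higher than the category threshold.

    doa_scores: dict category -> DOA of one document for that category
    thresholds: dict category -> threshold of that category
    An empty list means the document is filtered out (rejected everywhere).
    """
    return [c for c, score in doa_scores.items() if score > thresholds[c]]


def tune_threshold(scored_examples):
    """Pick the threshold with the best F1 for one category (naive baseline).

    scored_examples: list of (doa, belongs) pairs, where belongs is True
    when the document really belongs to the category.
    """
    # Candidate thresholds: accept everything, or cut at each observed DOA.
    candidates = [float("-inf")] + sorted({doa for doa, _ in scored_examples})
    best_threshold, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for doa, y in scored_examples if doa > t and y)
        fp = sum(1 for doa, y in scored_examples if doa > t and not y)
        fn = sum(1 for doa, y in scored_examples if doa <= t and y)
        f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
        if f1 > best_f1:  # strict comparison keeps the lowest best threshold
            best_threshold, best_f1 = t, f1
    return best_threshold
```

Because the baseline trusts every DOA score equally, a few unreliable scores can shift the chosen threshold, which is the weakness the abstract attributes to conventional thresholding.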

13.
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes the availability of past retrieval results consisting of the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term’s evidential weight, as we propose in this paper, depends on the degree to which the mean frequency values for the relevant and non-relevant document distributions in the past are different. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental results using standard test collections show that the proposed term weighting scheme improves on conventional TF*IDF and language model based schemes. This indicates that evidential term weights bring in a new aspect of term importance and complement the collection statistics based on TF*IDF. We also show how the proposed term weighting scheme, based on the notion of evidential weights, is related to the well-known weighting schemes based on language modeling and probabilistic models.
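The two quantities this abstract contrasts can be illustrated with a short sketch. `tf_idf` uses one common TF*IDF variant (raw term frequency times the natural log of N/df). `mean_frequency_gap` is a hypothetical stand-in for the kind of signal the abstract describes, the difference between a term's mean frequency in past relevant and non-relevant documents. It is not the authors' evidential weight formula, which the abstract says also uses rankings and similarity values.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """TF*IDF of a term in one document (raw tf, idf = ln(N / df)).

    doc_tokens: list of tokens of the document being scored
    corpus:     list of token lists, one per document in the collection
    """
    tf = doc_tokens.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def mean_frequency_gap(term, relevant_docs, nonrelevant_docs):
    """Mean term frequency in relevant minus non-relevant past documents.

    Illustrative assumption only: a positive gap suggests the term occurred
    more often in documents that were judged relevant in past sessions.
    """
    def mean_tf(docs):
        return sum(doc.count(term) for doc in docs) / len(docs) if docs else 0.0

    return mean_tf(relevant_docs) - mean_tf(nonrelevant_docs)
```

TF*IDF depends only on collection statistics, whereas the gap depends on past relevance judgments, which is why the abstract describes the two sources of evidence as complementary.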

14.
15.
Recent interest has been expressed in the potential of information technology to create new kinds of monopolies. This paper looks at production and marketing factors in the information services industry which may increase concentration in the hands of fewer producers, potentially leading to monopoly formation. The research develops an economic model of topic-specific market concentration and delineates the factors which might cause monopolies to occur in the markets of information data base production firms. The model shows that market concentration rises with inelastic demand, reduced marginal costs and efficient technology, and increased data acquisition costs exacerbated by low rates of data obsolescence. These effects are empirically investigated in the DIALOG group of data bases. The results of the research have implications for corporate information systems and information systems in the public sector.

16.
Background: Ethylene plays an important role in the regulation of floral organ development in soybean, and 1-aminocyclopropane-1-carboxylate synthase (ACS) is a rate-limiting enzyme for ethylene biosynthesis. However, whether ACS also regulates floral organ differentiation in soybean remains unknown. To address this, we constructed an RNAi vector to inhibit ACS expression in cotyledonary nodes. Linear DNA cassettes of RNAi-ACS obtained by PCR were used to transform soybean cotyledonary nodes. Results: In total, 131 of 139 transiently transformed plants acquired herbicide resistance and displayed GUS activities in the new buds. In comparison to untransformed seedling controls, a greater number of flower buds were differentiated at the cotyledonary node; GM-ACS1 mRNA expression levels and ethylene emission in the transformed buds were reduced. Conclusion: These results indicate that the cotyledonary node transient transformation system may be suitable for stable transformation and that the inhibition of ACS expression may be an effective strategy for promoting floral organ differentiation in soybean.

17.
《Prometheus》2012,30(4):377-393

Silicon Valley in Northern California has, over the past 30 years, become a model for high technology development in many parts of the world. Associated with Silicon Valley is a common rhetoric and mythology that explains the origins of this area of high technology agglomeration and indeed the business and entrepreneurial attributes needed for success. Governments in many parts of the world (including Southeast Asia and Australia) have tried to emulate this growth through various industry and regional development mechanisms, in particular, the science or technology park. More recently, promoting developments in information technology has come to be seen as an integral feature of these parks' activities. In this paper, we argue that the modeling process used by governments to promote Silicon Valley-like regional development has tended to model the wrong things about Silicon Valley. The models have tended to be mechanical and have failed to reflect the nature of information and information industries. While we have not sought to develop a model for Silicon Valley in this paper, we address a number of issues that require attention on the part of anyone serious about this project. After discussing problems with previous attempts to model Silicon Valley and problems associated with the activity of modeling itself, we move to consider four issues that must be addressed in any real attempt to model Silicon Valley in Southeast Asia. The first is the role of the state and the problems that state involvement may create. The second concerns the contribution that universities can make to the project. The third is the role of firms, particularly Chinese firms. The fourth is the cultural context within which the 'model' will sit. Since technology parks are seen as a popular way of promoting high technology development by governments, the revised history suggested in this paper provides fresh thinking about modeling Silicon Valley in the Southeast Asian region.

18.
《Prometheus》2012,30(1):58-73

In spite of the advances in information and communication technologies, the implementation of teleworking is still behind early expectations. The slow adoption of teleworking may be explained by different organizational drivers that influence its implementation. This article reports the empirical findings of a survey conducted among a sample of Spanish companies to identify potential drivers and constraints based on top manager and institutional perspectives. The results indicate that the potential of teleworking is influenced by the manager's perception of teleworking benefits and barriers, the manager's tenure, the company's use of information and communication technologies, the company's degree of innovation, the proportion of salespeople, women and middle-aged employees in the workforce, and the company size. Top manager factors seem to have more influence on the decision to adopt teleworking, while institutional factors are more significant for its potential diffusion within the company.

19.
Conversational Recommendation Systems (CRSs) have recently started to leverage pretrained language models (LMs) such as BERT for their ability to semantically interpret a wide range of preference statement variations. However, pretrained LMs are prone to intrinsic biases in their training data, which may be exacerbated by biases embedded in domain-specific language data (e.g., user reviews) used to fine-tune LMs for CRSs. We study a simple LM-driven recommendation backbone (termed LMRec) of a CRS to investigate how unintended bias (i.e., bias due to language variations such as name references or indirect indicators of sexual orientation or location that should not affect recommendations) manifests in substantially shifted price and category distributions of restaurant recommendations. For example, offhand mention of names associated with the black community substantially lowers the price distribution of recommended restaurants, while offhand mentions of common male-associated names lead to an increase in recommended alcohol-serving establishments. While these results raise red flags regarding a range of previously undocumented unintended biases that can occur in LM-driven CRSs, there is fortunately a silver lining: we show that train-side masking and test-side neutralization of non-preferential entities nullify the observed biases without significantly impacting recommendation performance.

20.
《Prometheus》2012,30(4):437-452

This paper explores the complexity of public/private identities in the emerging global economies of gene sequence mapping and analysis. In so doing we seek to offer a less over-determined account of what it means to describe institutional actors as either 'public' or 'private'. Instead, these 'codes' can be seen to offer actors a means of mutual positioning that more usually conceals broader interdependencies within the world's bioinformatics networks.
