Similar Documents
20 similar documents found.
1.
2.
3.
4.
5.
It is widely acknowledged by bioethicists, at least in the western world, that respect for patient autonomy, non-maleficence, beneficence, and justice (also called equity) are four core ethical values in medicine. The ethics guidelines of key journals in laboratory medicine are not explicit about the first three of these values, and they seem to miss the value of justice even implicitly. Since health equity is one of the main objectives of public health policy across the world, we suggest that values of equity explicitly become part of the ethics guidelines of laboratory medicine journals. Biochemia Medica could show the way to other medical publishers by incorporating these core bioethical values into its Ethics Guidelines.

6.
7.
8.
9.
10.
11.
Text categorization pertains to the automatic learning of a text categorization model from a training set of preclassified documents, on the basis of their contents, and the subsequent assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all written in the same language) during both the learning of the text categorization model and category assignment (or prediction) for unclassified documents. However, with the globalization of business environments and advances in Internet technology, an organization or individual may generate and organize documents in one language into categories and subsequently archive documents in other languages into the existing categories, which necessitates cross-lingual text categorization (CLTC). Specifically, CLTC deals with learning a text categorization model from a set of training documents written in one language (e.g., L1) and then classifying new documents in a different language (e.g., L2). Motivated by the significance of this demand, this study designs a CLTC technique with two different category assignment methods: individual-based and cluster-based. Using monolingual text categorization as a performance reference, our empirical evaluation results demonstrate the cross-lingual capability of the proposed CLTC technique. Moreover, the classification accuracy achieved by the cluster-based category assignment method is statistically significantly higher than that attained by the individual-based method.
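A rough sketch of the two assignment strategies follows, in Python with scikit-learn. It assumes the L2 test documents have already been machine-translated into L1 so that a shared TF-IDF space applies; the toy data, and the use of a nearest-neighbor versus nearest-centroid classifier to stand in for individual- and cluster-based assignment, are our illustration rather than the authors' exact pipeline.

    # Hypothetical illustration: individual- vs. cluster-based category
    # assignment, assuming L2 documents were first machine-translated to L1.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

    train_docs = ["stock markets fell sharply", "bank shares rallied today",
                  "the team won the cup final", "the coach praised the striker"]
    train_labels = ["finance", "finance", "sports", "sports"]
    test_docs = ["markets rallied after the bank report"]  # translated L2 text

    vec = TfidfVectorizer()
    X_train = vec.fit_transform(train_docs)
    X_test = vec.transform(test_docs)

    # Individual-based: label a document by its nearest training documents.
    individual = KNeighborsClassifier(n_neighbors=1).fit(X_train, train_labels)

    # Cluster-based: summarize each category by a centroid and assign each
    # document to the closest category centroid.
    cluster = NearestCentroid().fit(X_train, train_labels)

    print(individual.predict(X_test), cluster.predict(X_test))  # both 'finance'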

12.
13.
14.
The documents retrieved by a web search are useful if the information they contain contributes to some task or information need. To measure search result utility, studies have typically focused on perceived usefulness rather than on actual information use. We investigate the actual usefulness of search results, as indicated by their use as sources in an extensive writing task, and the factors that make a writer successful at retrieving useful sources. Our data comprise 150 essays written by 12 writers whose querying, clicking, and writing activities were recorded. By tracking the authors' text reuse behavior, we quantify the search results' contribution to the task more accurately than before. We model the overall utility of the search results retrieved throughout the writing process using path analysis, and compare a binary utility model (Reuse Events) to one that quantifies the degree of utility (Reuse Amount). The Reuse Events model has greater explanatory power (63% vs. 48%); in both models, the number of clicks is by far the strongest predictor of useful results, with β-coefficients up to 0.7, while dwell time has a negative effect (β between −0.14 and −0.21). In conclusion, we propose a new measure of search result usefulness based on a source's contribution to an evolving text. Our findings are valid for tasks where text reuse is allowed, but they also have implications for the design of indicators of search result usefulness for general writing tasks.
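To make the modeling step concrete, here is a minimal sketch of estimating standardized path coefficients (the β values reported above) with ordinary least squares on synthetic data; the variable names and the data-generating process are our assumptions, not the study's data.

    # Hypothetical sketch: standardized betas for clicks and dwell time as
    # predictors of reuse, loosely mirroring the path-analysis setup above.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 150  # one observation per essay, as in the study's 150 essays
    clicks = rng.poisson(8, n).astype(float)
    dwell = rng.gamma(2.0, 20.0, n)
    # Synthetic outcome: reuse rises with clicks, falls slightly with dwell.
    reuse = (0.7 * clicks / clicks.std()
             - 0.2 * dwell / dwell.std()
             + rng.normal(0, 1, n))

    def z(x):
        return (x - x.mean()) / x.std()

    X = sm.add_constant(np.column_stack([z(clicks), z(dwell)]))
    fit = sm.OLS(z(reuse), X).fit()
    print(fit.params[1:])  # standardized betas: clicks positive, dwell negative
    print(fit.rsquared)    # explained variance, analogous to the 63% vs. 48%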

15.
GPS-enabled devices and the popularity of social media have created an unprecedented opportunity for researchers to collect, explore, and analyze text data with fine-grained spatial and temporal metadata. Text, time, and space are different domains, each with its own representation scales and methods. This poses the challenge of detecting relevant patterns that may arise only from the combination of text with spatio-temporal elements. In particular, spatio-temporal textual data representation has relied on feature embedding techniques, which can limit a model's expressiveness for representing certain patterns extracted from the sequence structure of textual data. To deal with these problems, we propose an Acceptor recurrent neural network model that jointly models spatio-temporal textual data. Our goal is to represent the mutual influence and relationships that can exist between written language and the time and place where it was produced. We represent space, time, and text as tuples and use pairs of elements to predict the third one, resulting in three predictive tasks that are trained simultaneously. We conduct experiments on two social media datasets and on a crime dataset, using Mean Reciprocal Rank (MRR) as the evaluation metric. Our experiments show that our model outperforms state-of-the-art methods, with improvements ranging from 5.5% to 24.7% for location and time prediction.
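Since Mean Reciprocal Rank is the metric used here, a short self-contained definition in code may help; the function and example are generic, not taken from the paper.

    # Mean Reciprocal Rank: for each query, score 1/rank of the first correct
    # answer in the predicted ranking, then average over queries.
    def mean_reciprocal_rank(ranked_lists, truths):
        total = 0.0
        for preds, truth in zip(ranked_lists, truths):
            for rank, item in enumerate(preds, start=1):
                if item == truth:
                    total += 1.0 / rank
                    break
        return total / len(ranked_lists)

    # Truth ranked 1st, 2nd, and absent: (1 + 0.5 + 0) / 3 = 0.5
    print(mean_reciprocal_rank([["a", "b"], ["b", "a"], ["c", "d"]],
                               ["a", "a", "x"]))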

16.
In this paper we focus on the problem of question ranking in community question answering (cQA) forums in Arabic. We address the task with machine learning algorithms using advanced Arabic text representations, obtained by applying tree kernels to constituency parse trees combined with textual similarities, including word embeddings. Our two main contributions are: (i) an Arabic language processing pipeline based on UIMA, covering everything from segmentation to constituency parsing, built on top of Farasa, a state-of-the-art Arabic language processing toolkit; and (ii) the application of long short-term memory (LSTM) neural networks to identify the best text fragments in questions to be used in our tree-kernel-based ranker. Our thorough experimentation on a recently released cQA dataset shows that the Arabic linguistic processing provided by Farasa produces strong results and that neural networks combined with tree kernels further boost the performance in terms of both efficiency and accuracy. Our approach also enables an implicit comparison between different processing pipelines, as our tests on the Farasa and Stanford parsers demonstrate.
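As a rough intuition for tree kernels, the sketch below counts identical subtrees shared by two constituency parses represented as nested tuples; real convolution tree kernels (and the Farasa parses used in the paper) are considerably richer, so treat this as illustration only.

    # Hypothetical, simplified tree 'kernel': count subtree pairs that match
    # exactly between two parse trees given as nested tuples.
    from itertools import product

    def subtrees(tree):
        if isinstance(tree, str):        # a leaf token
            return []
        found = [tree]
        for child in tree[1:]:
            found.extend(subtrees(child))
        return found

    def tree_kernel(t1, t2):
        return sum(1 for a, b in product(subtrees(t1), subtrees(t2)) if a == b)

    q1 = ("S", ("NP", "he"), ("VP", ("V", "runs")))
    q2 = ("S", ("NP", "she"), ("VP", ("V", "runs")))
    print(tree_kernel(q1, q2))  # 2: ('V','runs') and the VP above it match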

17.
In the present work we perform compressed pattern matching in binary Huffman encoded texts [Huffman, D. (1952). A method for the construction of minimum redundancy codes. Proc. of the IRE, 40, 1098–1101]. A modified Knuth–Morris–Pratt (KMP) algorithm is used to overcome the problem of false matches, i.e., occurrences of the encoded pattern in the encoded text that do not correspond to occurrences of the pattern itself in the original text. We propose a bitwise KMP algorithm that, because the alphabet is binary, can shift by one extra bit in the case of a mismatch. To avoid processing any bit of the encoded text more than once, a preprocessed table determines how far to back up when a mismatch is detected; it is defined so that the start of the encoded pattern is always aligned with the start of a codeword in the encoded text. We combine our KMP algorithm with two practical Huffman decoding schemes that handle more than a single bit per machine operation: skeleton trees, defined by Klein [Klein, S. T. (2000). Skeleton trees for efficient decoding of Huffman encoded texts. Information Retrieval, 3, 7–23], and numerical comparisons between special canonical values and portions of a sliding window, presented by Moffat and Turpin [Moffat, A., & Turpin, A. (1997). On the implementation of minimum redundancy prefix codes. IEEE Transactions on Communications, 45, 1200–1207]. Experiments show that our algorithms search considerably faster than the "decompress then search" method; files can therefore be kept in their compressed form, saving storage space. When compression gain is important, these algorithms are better than cgrep [Ferragina, P., Tommasi, A., & Manzini, G. (2004). C library to search over compressed texts, http://roquefort.di.unipi.it/~ferrax/CompressedSearch], which is only slightly faster than ours.
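The bitwise KMP idea can be sketched compactly; the version below searches a plain bit string and omits the paper's extra machinery for aligning matches with Huffman codeword boundaries, so it would still report false matches.

    # KMP over '0'/'1' strings. fail[i] is the length of the longest proper
    # border of pattern[:i+1]; on a mismatch we back up via fail instead of
    # re-reading text bits, so each bit is processed once.
    def kmp_failure(pattern):
        fail, k = [0] * len(pattern), 0
        for i in range(1, len(pattern)):
            while k > 0 and pattern[i] != pattern[k]:
                k = fail[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            fail[i] = k
        return fail

    def kmp_search(text, pattern):
        fail, k = kmp_failure(pattern), 0
        for i, bit in enumerate(text):
            while k > 0 and bit != pattern[k]:
                k = fail[k - 1]
            if bit == pattern[k]:
                k += 1
            if k == len(pattern):
                yield i - len(pattern) + 1
                k = fail[k - 1]

    print(list(kmp_search("0110101101", "1101")))  # [1, 6]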

18.
The absence of diacritics in text documents or search queries is a serious problem for Turkish information retrieval because it creates homographic ambiguity; the inappropriate handling of diacritics thus reduces retrieval performance in search engines. A straightforward solution to this problem is to normalize tokens by replacing diacritic characters with their American Standard Code for Information Interchange (ASCII) counterparts. However, this so-called ASCIIfication produces either synthetic words that are not legitimate Turkish words or legitimate words whose meanings are completely different from those of the original words. These invalid synthetic words cannot be processed by morphological analysis components (such as stemmers or lemmatizers), which expect their input to be valid Turkish words. By contrast, synthetic words are not a problem when no stemmer, or only a simple first-n-characters stemmer, is used in the text analysis pipeline. This difference motivates the notion of the diacritic sensitivity of stemmers. In this study, we propose and evaluate an alternative solution based on deASCIIfication, which restores accented letters in query terms or text documents. Our risk-sensitive evaluation results show that the diacritics restoration approach yields more effective and robust results than normalizing tokens to remove diacritics.
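For reference, the ASCIIfication step criticized above is a simple character fold, sketched here; deASCIIfication, the restoration direction the paper evaluates, is the much harder inverse problem and is not shown.

    # Fold Turkish diacritic letters to ASCII. This is lossy: distinct words
    # can collapse to the same ASCII form, creating homographic ambiguity.
    ASCII_FOLD = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

    def asciify(text):
        return text.translate(ASCII_FOLD)

    # 'şiş' (skewer) folds to 'sis' (fog): a legitimate word with a
    # completely different meaning, exactly the ambiguity described above.
    print(asciify("şiş"))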

19.
This paper presents an empirical analysis of the determinants of research cooperation between firms and public research organisations (PROs) for a sample of innovating small and medium-sized enterprises (SMEs). The econometric analysis is based on the results of the KNOW survey, carried out in seven EU countries during 2000. In contrast to earlier work, which reports only the perceived importance of PRO research, our data record the number of collaborative research and development (R&D) projects between firms and PROs. This allows us to study the determinants of firm collaboration with PROs in terms of both the propensity of a firm to undertake R&D projects with a university (do they cooperate or not) and the extent of this collaboration (the number of R&D projects). Two questions are addressed: which firms cooperate with PROs, and which firm characteristics explain the number of R&D projects with PROs? The results of our analysis point to two major phenomena. First, the propensity to forge an agreement with an academic partner depends on the 'absolute size' of the industrial partner. Second, the openness of firms to the external environment, as measured by their willingness to search, screen, and signal, significantly affects the development of R&D projects with PROs. Our findings suggest that acquiring knowledge through the screening of publications and involvement in public policies positively affects the probability of signing an agreement with a PRO, but not the number of R&D projects developed. In fact, firms that outsource research and development, and that patent to protect innovation and to signal competencies, show higher levels of collaboration.
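A minimal sketch of the implied two-part estimation, a probit for the propensity to cooperate and a count model for the number of joint projects, is given below on synthetic data; the variables and coefficients are invented for illustration and do not reproduce the KNOW survey analysis.

    # Hypothetical two-part model: Probit (cooperate or not) plus Poisson
    # (number of R&D projects among cooperating firms), on synthetic data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    log_size = rng.normal(3, 1, n)        # 'absolute size' of the firm
    openness = rng.uniform(0, 1, n)       # willingness to search/screen/signal
    X = sm.add_constant(np.column_stack([log_size, openness]))

    cooperates = (X @ [-2.0, 0.5, 1.0] + rng.normal(0, 1, n) > 0).astype(int)
    projects = rng.poisson(np.exp(X @ [-2.0, 0.3, 0.8])) * cooperates

    propensity = sm.Probit(cooperates, X).fit(disp=0)    # do they cooperate?
    extent = sm.Poisson(projects[cooperates == 1],
                        X[cooperates == 1]).fit(disp=0)  # how many projects?
    print(propensity.params, extent.params)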

20.
Sentiment analysis concerns the study of opinions expressed in a text. Given the huge number of reviews, sentiment analysis plays a basic role in extracting significant information and the overall sentiment orientation of reviews. In this paper, we present a deep-learning-based method, called RNSA, to classify opinions expressed in reviews. To the best of our knowledge, a deep-learning-based method with a unified feature set representing word embeddings, sentiment knowledge, sentiment shifter rules, and statistical and linguistic knowledge has not been thoroughly studied for sentiment analysis. RNSA employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and to overcome several flaws of traditional methods, in which word order and word context are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules, and multiple strategies to overcome the following drawbacks: words with similar semantic context but opposite sentiment polarity; contextual polarity; sentence types; the word coverage limit of an individual lexicon; and word sense variations. To verify the effectiveness of our work, we conduct sentence-level sentiment classification on large-scale review datasets and obtain encouraging results. Experimental results show that (1) feature vectors built from (a) statistical, linguistic, and sentiment knowledge, (b) sentiment shifter rules, and (c) word embeddings can improve the classification accuracy of sentence-level sentiment analysis; (2) a method that learns from this unified feature set performs significantly better than one that learns from a feature subset; and (3) our neural model yields superior performance improvements in comparison with other well-known approaches in the literature.
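For orientation, a minimal LSTM sentence classifier in Keras is sketched below; it captures only the RNN/LSTM backbone of the approach, not RNSA's unified feature set of lexicons, shifter rules, and linguistic features, and the hyperparameters are our guesses.

    # Bare-bones LSTM sentiment classifier: embeddings -> LSTM -> sigmoid.
    import tensorflow as tf

    vocab_size, embed_dim, max_len = 20000, 100, 60

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        tf.keras.layers.LSTM(128),                       # order-aware encoding
        tf.keras.layers.Dense(1, activation="sigmoid"),  # positive vs. negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    # model.fit(X_train, y_train, ...) once reviews are tokenized and padded.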
