Similar literature
Found 20 similar articles (search time: 78 ms)
1.
This article introduces a nonparametric methodology that combines the strengths of binary regression and latent variable formulations while overcoming their disadvantages. The mathematical results are implemented through a novel Bayesian hierarchical estimation methodology called the Latent Adaptive Hierarchical Expectation Maximization Like algorithm. Requiring minimal assumptions, it extends extant methodologies, and in simulation studies it gives better prediction and inference performance for asymmetric data generating processes. A new classification statistic, called the Adjusted Receiver Operating Curve Statistic, is also introduced. Using it, we demonstrate better overall model fit, inference and prediction performance for the proposed methodology than for existing methods widely used in the sciences. In addition, the methodology can be used to perform model diagnostics for any model specification, which extends existing work on categorical model diagnostics broadly across the sciences. Furthermore, the mathematical results highlight important new findings regarding the interplay of statistical significance and scientific significance. Finally, the methodology is applied to identifying highly cited papers in the social sciences in a joint estimation framework. The results indicate that the methodology outperforms widely used existing artificial intelligence and machine learning models with very few Monte Carlo iterations. In the scientometric application, it finds Journal Impact Factor to be more important than Keyword Popularity parameters for explaining citation outcomes in select social science fields. It further finds that the percentage change in Published Popularity may also help to explain citation outcomes in the field. The findings appear to be new to the scientometric field.

2.
This paper reports on the underlying IR problems encountered when indexing and searching with the Bulgarian language. For this language we propose a general light stemmer and demonstrate that it can be quite effective, producing significantly better MAP (around +34%) than an approach that does not apply stemming. We implement the GL2 model derived from the Divergence from Randomness paradigm and find its retrieval effectiveness better than that of other probabilistic, vector-space and language models. The resulting MAP is found to be about 50% better than the classical tf-idf approach. Moreover, increasing the query size enhances the MAP by around 10% (from T to TD). In order to compare the retrieval effectiveness of our suggested stopword list and the light stemmer developed for the Bulgarian language, we conduct a set of experiments on another stopword list and also a more complex and aggressive stemmer. Results tend to indicate that there is no statistically significant difference between these variants and our suggested approach. This paper also evaluates other indexing strategies such as 4-gram indexing and indexing based on the automatic decompounding of compound words. Finally, we analyze certain queries to discover why we obtained poor results when indexing Bulgarian documents using the suggested word-based approach.
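The abstract above compares systems by MAP (mean average precision). As a reminder of what that figure measures, here is a minimal sketch of the standard definition; the function names and the document identifiers are hypothetical, and nothing here reproduces the paper's stemmer, the GL2 model, or its test collection.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list for one query."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document found.
            precision_sum += hits / rank
    # Relevant documents never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical two-query example (identifiers are made up).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(mean_average_precision(runs))
```

A relative gain such as "+34%" is then the ratio of two such MAP values minus one, computed over the same query set.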

3.
A common strategy for implementing CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated into each language present in the multilingual collection so that an independent monolingual retrieval process can be run per language. This approach therefore divides the documents according to language, yielding as many collections as there are languages. After searching these corpora and obtaining a result list per language, we must merge the lists in order to provide a single list of retrieved articles. In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on re-indexing the retrieved documents according to the query vocabulary, and it performs noticeably better than traditional methods. The proposed method requires query vocabulary alignment: given a word in a query, we must know its translation or translations in the other languages. Because this is not always possible, we have investigated a mixed model, which is applied to queries with only partial word-level alignment. The results show that even in this scenario, 2-step RSV performs better than traditional merging methods.

4.
In the information retrieval process, functions that rank documents according to their estimated relevance to a query typically regard query terms as being independent. However, it is often the joint presence of query terms that is of interest to the user, which is overlooked when matching independent terms. One feature that can be used to express the relatedness of co-occurring terms is their proximity in text. In past research, models that are trained on the proximity information in a collection have performed better than models that are not estimated on data. We analyzed how co-occurring query terms can be used to estimate the relevance of documents based on their distance in text, and used this to extend a unigram ranking function with a proximity model that accumulates the scores of all occurring term combinations. This proximity model is more practical than existing models, since it does not require any co-occurrence statistics, it obviates the need to tune additional parameters, and it has a retrieval speed close to that of competing models. We show that this approach is more robust than existing models on both Web and newswire corpora, and that on average it performs as well as or better than existing proximity models across collections.

5.
This paper describes and evaluates different retrieval strategies that are useful for search operations on document collections written in various European languages, namely French, Italian, Spanish and German. We also suggest and evaluate different query translation schemes based on freely available translation resources. In order to cross language barriers, we propose a combined query translation approach that has resulted in interesting retrieval effectiveness. Finally, we suggest a collection merging strategy based on logistic regression that tends to perform better than other merging approaches.

6.
Open government data (OGD) has attracted widespread attention and has been widely implemented on a global scale. As it is promoted further, OGD performance has become a hot topic that merits in-depth exploration. This research focuses on the influencing factors and generation mechanisms of OGD performance. Based on resource-based theory and institutional theory, this paper constructs a model from the multiple dimensions of internal resources and external pressures. Using the 122 cities in China that have built OGD platforms, the study then adopts a mixed-methods approach that combines regression analysis and qualitative comparative analysis (QCA). The regression results show that organizational arrangement, law and policy, and horizontal pressure have direct positive effects on OGD performance. On this basis, the paper uses QCA to explore the configuration paths that generate OGD performance for cities in different geographic regions and at different administrative ranks. The QCA results provide different configuration paths to better OGD performance, which both verifies the conclusions drawn from the regression analysis and provides alternative paths for governments with different characteristics. This paper enriches the studies on OGD performance and provides more targeted paths and references for the implementation of OGD.

7.
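Analysis of statistical methods easily misused in medical journals   Cited by: 1 (self-citations: 0; citations by others: 1)
This paper analyzes several statistical methods that are easily misused in medical journals, including the misuse of repeated-measures analysis of variance and of nonparametric tests, and analyzes the differences between multiple linear regression and logistic regression, so that editors can tell whether the statistical methods in a manuscript are correct and handle them appropriately, thereby improving the scientific quality of papers and the quality of journals.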

8.
Citations are increasingly used for research evaluations. It is therefore important to identify factors affecting citation scores that are unrelated to scholarly quality or usefulness so that these can be taken into account. Regression is the most powerful statistical technique to identify these factors and hence it is important to identify the best regression strategy for citation data. Citation counts tend to follow a discrete lognormal distribution and, in the absence of alternatives, have been investigated with negative binomial regression. Using simulated discrete lognormal data (continuous lognormal data rounded to the nearest integer), this article shows that a better strategy is to add one to the citations, take their log and then use the general linear (ordinary least squares) model for regression (e.g., multiple linear regression, ANOVA), or to use the generalised linear model without the log. Reasonable results can also be obtained if all the zero citations are discarded, the log is taken of the remaining citation counts and then the general linear model is used, or if the generalised linear model is used with the continuous lognormal distribution. Similar approaches are recommended for altmetric data, if they prove to be lognormally distributed.
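The first strategy named in the abstract above (add one to the citation counts, take the log, then fit ordinary least squares) can be sketched on simulated data roughly as follows. This is an illustration under stated assumptions, not the article's experiment: the binary factor, the lognormal parameters, the sample sizes, and the helper names are all hypothetical, and only a single predictor is fitted.

```python
import math
import random


def simulate_citations(n, mu, sigma, rng):
    """Discrete lognormal: continuous lognormal draws rounded to integers."""
    return [round(rng.lognormvariate(mu, sigma)) for _ in range(n)]


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical binary factor: group 1 has a larger log-scale mean (assumption).
group0 = simulate_citations(2000, mu=1.0, sigma=1.0, rng=rng)
group1 = simulate_citations(2000, mu=1.5, sigma=1.0, rng=rng)

x = [0] * len(group0) + [1] * len(group1)
# Add one before taking the log so that zero citation counts are defined.
y = [math.log(1 + c) for c in group0 + group1]

slope, intercept = ols_slope_intercept(x, y)
print(f"estimated group effect on log(1 + citations): {slope:.3f}")
```

With a 0/1 predictor the fitted slope is simply the difference between the two groups' mean log(1 + citations), which is why the general linear model covers both regression and ANOVA-style comparisons here.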

9.
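This paper proposes a Chinese automatic text summarization model based on the Basic Elements method (BESM). The model draws mainly on the idea of Basic Elements: compared with purely word-based automatic summarization models, it treats semantic information as part of the assessment of sentence importance, realizing the combination of semantic information and statistical methods proposed in the Basic Elements approach. A comparison with an ordinary method on worked examples highlights the advantages of the Basic Elements method and the feasibility of the BESM model.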

10.
王若佳  李培 《图书情报工作》2016,60(18):122-132
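[Purpose/Significance] To analyze the correlation between domestic Internet search data and influenza epidemics in China, explore the feasibility of using search data to support epidemic surveillance, and provide a reference for search engines and disease control centers. [Method/Process] By analyzing the correlation between Baidu Chinese search-term volumes and influenza activity in China, suitable search keywords are selected; simple linear regression, multiple linear regression, principal component regression, and artificial neural network models are built and compared to select the best model; historical influenza surveillance information released by official sources is then introduced to optimize the model. [Result/Conclusion] The multiple linear regression and artificial neural network models have better goodness of fit, with multiple linear regression being the more accurate of the two; principal component regression can in theory reduce collinearity among variables, but in practice both its fit and its surveillance performance are worse than those of the multiple regression model; historical data and search data carry partly complementary information, and using both together gives the best surveillance performance.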

11.
12.
Public organizations often face numerous barriers when it comes to adopting and using social media to communicate and engage with the broader public. This research aims to better understand how barriers to social media adoption can be tackled by zooming in on one specific type of organization: the police. Our research answers the following question: to what effect do police forces manage barriers to the adoption of social media with social media policies? First, by systematically reviewing previous studies using a typology of barriers to ICT adoption, this study identifies the types of barriers that the police often face. Second, by qualitatively analyzing two frontrunner cases, the United Kingdom and the Netherlands, this study analyzes how social media policies address and can help overcome these barriers. The empirical analysis indicates that a combination of exploration and exploitation is needed to address both structural and cultural barriers to social media adoption. We argue that this fits an approach of the ‘perpetual beta’: ongoing technological innovation requires organizational capacity to continuously adapt to socio-technical change.

13.
The challenges posed by the increased prevalence of Alzheimer's dementia (AD) demonstrate a need to understand this social issue better. A case study is used to critique Wilson's (Applied Nursing Research, 2(1), 40–44, 1989) model of family caregiving for a person with AD. Although the case fits the model in some ways, these data suggest that the model may ignore nontraditional family situations. Suggestions for future research involve modifying the model or creating multiple caregiver models.

14.
张静  廖芹  楼宏青 《图书馆论坛》2002,22(6):98-100
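The overall service level of a university library directly affects the university's teaching and research. Properly evaluating the operational benefits of a university library and managing its strengths, so as to raise the overall service level, is a problem that university libraries face. Building on traditional benefit evaluation for university libraries, this paper establishes a neural network model for benefit evaluation that accommodates both qualitative and quantitative indicators. Practical testing shows that this evaluation method is more effective than traditional methods.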

15.
A researcher's Q denotes their ability in scientific research as a real number. Because they have been present in the academic environment only briefly, junior researchers have unstable Q values. This article presents a model that uses data from junior researchers’ first years of publication to predict their stable Q values. We tested a deep model and a linear regression model and compared their accuracies. We obtained reliable results showing that, when only data from the first five years of publication are used, the values predicted by both models are better than the Q values computed with the Q model itself. Lastly, we note that both approaches are robust to the inflation of citation bias.

16.
Given the explosive growth of digital collections and the digital transformation of services in libraries, the need for management education in library and information science programs has grown. However, I contend in this article that confusion abounds, like the “Tower of Babel”, over how LIS programs can accomplish this goal. A one-size-fits-all approach does not work. Instead, management theory and principles need to be LIS-centric in accord with present-day needs and conceptions. Management education also needs to link scientific rigor to practice in order to meet the needs and expectations of LIS students. Rather than delivering management education that speaks the language of business, the content needs to resonate with the conditions for managing in the LIS context and to be supportive of the values of the LIS field. The article provides examples from research on management and innovation in libraries and other information-rich contexts. It also illustrates the types of issues that management education in LIS programs should address to deepen our understanding of the essential principles needed to manage information and knowledge.

17.
Is more always better? We address this question in the context of bibliometric indices that aim to assess the scientific impact of individual researchers by counting their number of highly cited publications. We propose a simple model in which the number of citations of a publication depends not only on the scientific impact of the publication but also on other ‘random’ factors. Our model indicates that more need not always be better. It turns out that the most influential researchers may have a systematically lower performance, in terms of highly cited publications, than some of their less influential colleagues. The model also suggests an improved way of counting highly cited publications.

18.
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed.
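The two-factor regression described in the abstract above, and the "explains about 60% of the variation" statement (an R² value), can be illustrated with a small sketch. Everything below is an assumption made for illustration: the data are synthetic, the coefficients used to generate them are invented, a plain linear form is assumed, and the helper names are hypothetical. It does not use the CLEF 2003 data, BLEU, or WAMU.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """rows: list of predictor tuples. Returns [intercept, b1, b2, ...]."""
    design = [[1.0, *r] for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coef):
    """Share of the variance in y explained by the fitted model."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Synthetic per-query values (assumption): monolingual score and translation quality.
rows = [(rng.random(), rng.random()) for _ in range(200)]
# Invented generating model: clir = 0.05 + 0.6 * mono + 0.3 * quality + noise.
y = [0.05 + 0.6 * mono + 0.3 * q + rng.gauss(0, 0.1) for mono, q in rows]

coef = fit_ols(rows, y)
r2 = r_squared(rows, y, coef)
print("coefficients:", [round(c, 3) for c in coef], "R^2:", round(r2, 3))
```

Comparing two translation-quality metrics, as the study does for BLEU and WAMU, would amount to fitting this model once per metric and comparing the resulting R² values.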

19.
Research articles are being shared in increasing numbers on multiple online platforms. Although the scholarly impact of these articles has been widely studied, the online interest they attract, measured by how long they continue to be shared online, remains unclear. Knowing how long a research article is mentioned online could be valuable information for researchers. In this paper, we analyzed multiple social media platforms on which users share and/or discuss scholarly articles. We built three clusters of papers, based on the number of yearly online mentions, with publication dates ranging from 1920 to 2016. Using the online social media metrics for each of these three clusters, we built machine learning models to predict the long-term online interest in research articles. We addressed the prediction task with two different approaches: regression and classification. For the regression approach, the Multi-Layer Perceptron model performed best, and for the classification approach, the tree-based models performed better than other models. We found that old articles are most evident in the contexts of economics and industry (i.e., patents). In contrast, recently published articles are most evident in research platforms (i.e., Mendeley) followed by social media platforms (i.e., Twitter).

20.
This article advocates the development of a sustainable, structured, and open data model for periodical research. Considering the recent digital turn in periodical studies, it argues that a data model embracing Linked Open Data practices will not only facilitate collaboration among periodical scholars across language boundaries but also contribute to a better understanding of what periodicals are and how the relationships among periodicals may evolve over time. By way of illustration, the article presents the data model developed in the context of an ongoing five-year research project on women editors and their periodicals in early-eighteenth- to early-twentieth-century Europe.
