Similar literature
Found 20 similar articles (search time: 721 ms)
1.
It is shown that the maximum likelihood estimator of the widely used omega coefficient for reliability of multicomponent measuring instruments converges almost surely to the population reliability coefficient for normal congeneric measures with uncorrelated errors as sample size increases indefinitely. This strong consistency implies convergence in probability (consistency) as well as in distribution for the omega estimator. Strong consistency is also demonstrated for the maximal reliability estimator associated with the optimal linear combination of the instrument components. The findings of this note add (i) to the recommendation to use the omega estimator in empirical research in the general normality case, (ii) to the critical literature on the popular coefficient alpha in that case, and (iii) to the literature on the properties of the optimal linear combination of observed measures and the maximal reliability estimator.
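
As an illustrative sketch (not taken from the note itself), the two population quantities mentioned above can be written out for a congeneric model X_i = λ_i·T + E_i, assuming Var(T) = 1 and uncorrelated errors with variances θ_i. All function names and numbers are hypothetical. Unit weights give omega; weights proportional to λ_i/θ_i give the maximal reliability.

```python
def composite_reliability(weights, loadings, error_vars):
    """Reliability of the weighted sum of congeneric measures.

    Assumes Var(T) = 1 and uncorrelated errors (illustrative assumptions).
    """
    true_var = sum(w * l for w, l in zip(weights, loadings)) ** 2
    err_var = sum(w * w * e for w, e in zip(weights, error_vars))
    return true_var / (true_var + err_var)


def optimal_weights(loadings, error_vars):
    """Weights proportional to loading / error variance."""
    return [l / e for l, e in zip(loadings, error_vars)]


def maximal_reliability(loadings, error_vars):
    """Closed form S / (1 + S), with S = sum of loading^2 / error variance."""
    s = sum(l * l / e for l, e in zip(loadings, error_vars))
    return s / (1.0 + s)


if __name__ == "__main__":
    lam = [0.9, 0.7, 0.5, 0.3]        # hypothetical loadings
    theta = [0.19, 0.51, 0.75, 0.91]  # hypothetical error variances
    omega = composite_reliability([1.0] * len(lam), lam, theta)
    print("omega (unit weights):", round(omega, 4))
    print("maximal reliability:", round(maximal_reliability(lam, theta), 4))
```

With unequal loadings the optimally weighted composite is more reliable than the plain sum score; with equal loadings and equal error variances the two coincide.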

2.
It is shown that in general the popular coefficient alpha estimator for reliability of multi-component measuring instruments converges almost surely to a quantity that is not equal to the population reliability coefficient. This convergence with probability 1 is a stronger statement than convergence in probability (consistency) and convergence in distribution for the alpha estimator, which have been studied in the past. In the special case of congeneric measures with uncorrelated errors and equal loadings on the common true score, the alpha estimator converges almost surely to the population reliability coefficient that equals population alpha, which implies also its consistency as a reliability estimator. When the loadings are unequal but sufficiently high and similar, the alpha estimator converges almost surely to population alpha that is essentially indistinguishable from the population reliability coefficient, which implies alpha’s approximate consistency in that case. For the general case, the results entail that the alpha estimator is not a consistent estimator of reliability. The findings add to the critical literature on coefficient alpha in the general case, as well as to the justification of its use as a dependable measuring instrument reliability estimator in special cases and settings resulting under appropriate restrictive conditions, and are illustrated using a numerical example.
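
The gap between population alpha and population reliability can be made concrete with a small sketch. It is not the paper's numerical example; it assumes a congeneric model with Var(T) = 1 and uncorrelated errors, and the function names and values are hypothetical.

```python
def population_omega(loadings, error_vars):
    """Reliability of the unit-weighted sum score under the assumed model."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_vars))


def population_alpha(loadings, error_vars):
    """Coefficient alpha computed from the model-implied variances."""
    k = len(loadings)
    item_var_sum = sum(l * l + e for l, e in zip(loadings, error_vars))
    total_var = sum(loadings) ** 2 + sum(error_vars)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


if __name__ == "__main__":
    # Equal loadings: alpha coincides with reliability.
    equal = ([0.7] * 4, [0.51] * 4)
    # Unequal loadings: alpha falls below reliability.
    unequal = ([0.9, 0.7, 0.5, 0.3], [0.19, 0.51, 0.75, 0.91])
    for lam, theta in (equal, unequal):
        print(population_alpha(lam, theta), population_omega(lam, theta))
```

Under these assumptions alpha matches the reliability coefficient only when the loadings are equal, and is smaller otherwise.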

3.
In the lead article, Davenport, Davison, Liou, & Love demonstrate the relationship among homogeneity, internal consistency, and coefficient alpha, and also distinguish among them. These distinctions are important because too often coefficient alpha (a reliability coefficient) is interpreted as an index of homogeneity or internal consistency. We argue that factor analysis should be conducted before calculating internal consistency estimates of reliability. If factor analysis indicates the assumptions underlying coefficient alpha are met, then it can be reported as a reliability coefficient. However, to the extent that items are multidimensional, alternative internal consistency reliability coefficients should be computed based on the parameter estimates of the factor model. Assuming a bifactor model shows good fit, and the measure was designed to assess a single construct, omega hierarchical (the proportion of variance of the total scores due to the general factor) should be presented. Omega (the proportion of variance of the total scores due to all factors) also should be reported in that it represents a more traditional view of reliability, although it is computed within a factor analytic framework. By presenting both these coefficients and potentially other omega coefficients, the reliability results are less likely to be misinterpreted.
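
The two coefficients defined above can be sketched from bifactor loadings. This is a minimal illustration, assuming orthogonal factors with unit variance and a sum score with unit weights; the function name and the loading values are hypothetical and not from the article.

```python
def omega_coefficients(general, groups, uniquenesses):
    """Return (omega_hierarchical, omega_total) for the unit-weighted sum.

    general: loadings on the general factor, one per item.
    groups: one list of loadings per group factor (0.0 where an item
            does not load on that factor).
    uniquenesses: unique variances, one per item.
    Assumes orthogonal, unit-variance factors (illustrative assumption).
    """
    general_var = sum(general) ** 2
    group_var = sum(sum(factor) ** 2 for factor in groups)
    total_var = general_var + group_var + sum(uniquenesses)
    omega_h = general_var / total_var
    omega_t = (general_var + group_var) / total_var
    return omega_h, omega_t


if __name__ == "__main__":
    general = [0.6] * 6
    groups = [
        [0.4, 0.4, 0.4, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.4, 0.4, 0.4],
    ]
    uniquenesses = [1.0 - 0.6 ** 2 - 0.4 ** 2] * 6
    print(omega_coefficients(general, groups, uniquenesses))
```

Omega hierarchical is never larger than omega total, and the difference reflects the share of total-score variance carried by the group factors.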

4.
In repeated measure studies with unidimensional scales, measurement invariance, and specificity stability over time, the specificity variance in each instrument component can be identified. This article describes for that setting an improved point and interval estimation procedure for the maximal reliability coefficient associated with a given set of homogeneous measures. The method is developed within the framework of latent variable modeling and can also be readily used in longitudinal studies for improved point and interval estimation of individual measure reliability and scale reliability at each assessment occasion. The procedure is based on empirically testable conditions and is illustrated with an example.

5.
A latent variable modeling method for studying maximal reliability of unidimensional multicomponent measuring instruments with correlated errors is outlined. In the presence of correlation between two residual terms, the procedure allows one to obtain point and interval estimates of the reliability of the linear combination of the scale components that possesses the highest possible reliability coefficient. The approach is readily applicable with popular latent variable modeling software and also provides an alternative scoring rule to the widely used overall sum score for homogeneous psychometric scales. The discussed method is illustrated with a numerical example.

6.
Scales are important tools for obtaining quantitative measures of theoretical constructs. Once a set of measures to be used in a scale is selected, reliability is commonly examined in order to assess their measurement quality. To date, Cronbach’s coefficient alpha is the most commonly reported index of measurement quality for assessing scale reliability. In this paper, an asymptotic distribution of the natural estimator of coefficient alpha is derived. A new interval estimate and a statistical test on the significance of the sample estimate of the coefficient are also presented. The proposed approach is compared to four popular methods commonly used to compute confidence intervals (CI) for alpha using a Monte Carlo simulation study. An R function for implementing the proposed CI approach is also provided.

7.
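Whether a test paper is constructed scientifically, reasonably, and validly is one of the basic prerequisites for an examination to serve its function properly. Computing correlation coefficients is a common method of test-paper quality analysis, used mainly to analyze a paper's discrimination, reliability, and validity. Such quantitative analysis is an important basis for offering well-founded suggestions and opinions on item writing.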
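
Two common correlation-based indices of this kind can be sketched as follows. The abstract does not say which indices the authors computed, so the choices here (item-rest correlation for discrimination, odd-even split-half reliability with the Spearman-Brown correction) are assumptions, and the function names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_discrimination(scores, item):
    """Correlation between one item and the sum of the remaining items."""
    item_scores = [row[item] for row in scores]
    rest_scores = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_scores)


def split_half_reliability(scores):
    """Odd-even split-half correlation, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r = pearson(odd, even)
    return 2 * r / (1 + r)


if __name__ == "__main__":
    # Rows are examinees, columns are items (made-up 0/1 scores).
    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(item_discrimination(scores, 0))
    print(split_half_reliability(scores))
```

Validity would be examined the same way, by correlating total scores with an external criterion, which is omitted here because it needs criterion data.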

8.
In this study, the authors aimed to examine 8 of the different methods that have been proposed for computing confidence intervals around alpha, to determine which of these, if any, is the most accurate and precise. Monte Carlo methods were used to simulate samples under known and controlled population conditions wherein the underlying item distribution is nonnormal and when the items’ responses are those of rating scales rather than dichotomous items. Overall, one can conclude that, despite concerns expressed over the use of Fisher's method for coefficient alpha, in general, it actually outperformed the other methods. Larger sample sizes and larger coefficient alphas also resulted in better band coverage, whereas a smaller number of items resulted in poorer band coverage.

9.
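A random-fuzzy probability model is applied to analyze the reliability of the initial static strength of steel pressure vessels during pressure testing and normal operation. For steel pressure vessels designed according to Chinese standards, reliability coefficients of the initial static strength during pressure testing and normal operation are obtained, providing a preliminary solution to a basic problem in applying reliability theory and methods to the engineering design of pressure vessels.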

10.
I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and unidimensionality, however defined, belong to the realm of validity, viz. the issue of what the test measures. Internal consistency and unidimensionality may play a role in the construction of tests when the theory of the attribute for which the test is constructed implies that the items be internally consistent or unidimensional. I also offer examples of attributes that do not imply internal consistency or unidimensionality, thus limiting these concepts' usefulness in practical applications.

11.
The survey investigated the problems of social desirability (SD), non‐response bias (NRB) and reliability in the Minnesota Multiphasic Personality Inventory – Revised (MMPI‐2) self‐report inventory administered to Brunei student teachers. Bruneians scored higher on all the validity scales than the normative US sample, thereby threatening the internal validity of the study. Of the three validity scales that assess various forms of SD, only the F scale was reliable and its mean score was in the clinical range. In addition, seven of the ten clinical scales had poor reliability. Although Brunei males scored much higher on the K scale than females, both mean scores were below the critical region. Protocols for two respondents with many missing values indicated that the study’s external validity was vulnerable to NRB effects. Altogether, SD, NRB and low reliability had the potential to undermine the overall validity of the MMPI‐2 and call for caution about using it ‘as is’ in Brunei.

12.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediate learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.

13.
This meta-analysis synthesizes the last two decades of experimental and quasi-experimental research on reading instruction across academic contexts (e.g., social studies, science, mathematics, English language arts) for English learners (ELs) in grades 4 through 8, to determine (a) the overall effectiveness of reading instruction for upper elementary and middle school students who are ELs and (b) how the magnitude of the effect varies based on student, instructional, and study characteristics. The analysis included a total of 11 studies with 46 individual effect sizes and yielded a mean effect size of g = 0.35 across all (i.e., standardized and unstandardized) reading measures, g = 0.01 across standardized reading measures, and g = 0.43 across unstandardized reading measures. For all reading, unstandardized reading, all vocabulary, and unstandardized vocabulary measures, results suggest that higher quality studies tended to have smaller effects, and these effects were even more evident for unstandardized measures (i.e., one unit increase in study quality was associated with decreased effects: g = 0.21, g = 0.30, g = 0.24, g = 0.30, respectively). For all comprehension measures, effects were larger for instruction that included both vocabulary and comprehension (g = 0.39) than for instruction that focused on vocabulary alone (g = 0.08). Results suggest the benefit of developing and refining high-impact approaches to reading instruction for ELs that can be delivered across content areas and grades.

14.
Wording effect refers to the systematic method variance caused by positive and negative item wordings on a self-report measure. This Monte Carlo simulation study investigated the impact of ignoring wording effect on the reliability and validity estimates of a self-report measure. Four factors were considered in the simulation design: (a) the number of positively and negatively worded items, (b) the loadings on the trait and the wording effect factors, (c) sample size, and (d) the magnitude of population validity coefficient. The findings suggest that the unidimensional model that ignores the negative wording effect would underestimate the composite reliability and criterion-related validity, but overestimate the homogeneity coefficient. The magnitude of relative bias of the composite reliability was generally small and acceptable, whereas the relative bias for the homogeneity coefficient and criterion-related validity coefficient was negatively correlated with the strength of the general trait factor.

15.
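For a nationwide test, regular evaluation is essential. The key to evaluating a language test and studying its validity is the study of reliability, or consistency. This study uses parallel forms of the TEM4 to carry out reliability statistics and difference analysis. It not only examines the consistency between the parallel tests but also, where differences exist, locates the tests or items that differ. Locating them in this way will benefit future test construction, pretesting, and form assembly.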

16.
This article studies the difference between the criterion validity coefficient of the widely used overall scale score for a unidimensional multicomponent measuring instrument and the maximal criterion validity coefficient that is achievable with a linear combination of its components. A necessary and sufficient condition of their identity is presented in the case of measurement errors being uncorrelated among themselves and with a used criterion. An upper bound of the difference in these validity coefficients is provided, indicating that it cannot exceed the discrepancy between the maximal reliability and composite reliability indexes. A readily applicable latent variable modeling procedure is discussed that can be used for point and interval estimation of the difference between the maximal and scale criterion validity coefficients. The outlined method is illustrated with a numerical example.

17.
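Generalizability theory has been widely applied in areas such as criterion-referenced tests, nonstandardized tests, evaluation of teaching, and personnel assessment. Using a random crossed design for the operations research examination items of a university, this study examines the sources of scoring error and the dependability of the items. The study shows that the difference between the score an examinee obtains on the items and the examinee's true level of mastery of the course arises from the examinee's own level of knowledge, the difficulty of the items, and the interaction between examinees and items. Variance component estimates and item dependability values under different item types are computed, providing a tool for evaluating the discrimination and stability of test items.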
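
The three error sources named above correspond to the variance components of a one-facet persons × items crossed design. The sketch below estimates them from a score matrix using the usual expected-mean-square equations. It is an assumption that this simple design approximates the study's analysis (the study also distinguishes item types), and the function names and data are hypothetical.

```python
def variance_components(scores):
    """Estimate (var_person, var_item, var_residual) for a p x i design.

    scores: list of rows, one row per person, one column per item.
    Negative estimates are truncated at zero.
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pi = ss_total - ss_p - ss_i  # interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    return var_p, var_i, ms_pi


def g_coefficient(var_p, var_pi, n_items):
    """Generalizability coefficient for relative decisions with n_items items."""
    return var_p / (var_p + var_pi / n_items)


if __name__ == "__main__":
    scores = [[2, 3, 4], [4, 4, 6], [5, 7, 7], [8, 8, 9]]  # made-up data
    var_p, var_i, var_pi = variance_components(scores)
    print(var_p, var_i, var_pi)
    print(g_coefficient(var_p, var_pi, n_items=3))
```

Calling `g_coefficient` with a larger `n_items` shows how dependability would be expected to change if the examination were lengthened, which is the usual decision-study use of these components.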

18.
Scores on essay‐based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple‐choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores from two of the large volume admissions tests, the GRE General Test (GRE) and the Test of English as a Foreign Language Internet‐based test (TOEFL iBT), based on retesting with a parallel form, is comparable to the reliability of the multiple‐choice Verbal or Reading scores from those tests. Furthermore, and even more important, the writing scores from both tests are as effective as the multiple‐choice scores in predicting academic success and could contribute to fairer admissions decisions.

19.
A method is developed for testing a priori multiple regression models. The method allows one to specify in advance as many unstandardized or standardized coefficients as one wants to and allows the remaining slopes to be free to vary. The comparative strength and predictive power of competing models are assessed by the absolute and proportional difference in R² and by an accompanying F test. Computer techniques for the procedures are given in the Appendix.

20.
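This assessment system was developed to evaluate the effects of the authors' institution's comprehensive education reform and its promotion of quality-oriented education. It covers six aspects of students (creative qualities, moral level, mental health, thinking ability, sense of responsibility, and intellectual sentiment), each forming one of six subscales. It aims to standardize the assessment system so that it serves as an objective yardstick of students' psychological quality. It is used to identify and compare differences in psychological quality between students in comprehensive programs and students in regular programs, and to verify the effects of the reform.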
