Similar articles
1.
In this study, we compared 12 statistical strategies proposed for selecting loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. The major focus was on evaluating the effects of the selection strategies on equating function accuracy. Selection strategies' influence on the estimation of cumulative test score distributions was also assessed. The results of this simulation study differentiate the selection strategies and define the situations where their use has the most important implications for equating function accuracy. The recommended strategy for estimating test score distributions and for equating is AIC minimization.
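As a rough illustration of the AIC minimization recommended in this abstract, the sketch below scores candidate models with AIC = -2 log L + 2k and picks the smallest value. It is a minimal sketch, not the authors' procedure: the function names (`aic`, `select_by_aic`) and the log-likelihood values are hypothetical and chosen only for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_by_aic(candidates):
    """Return the label of the candidate with the smallest AIC.

    candidates maps a label to a (log_likelihood, n_params) pair.
    """
    return min(candidates, key=lambda label: aic(*candidates[label]))


# Hypothetical fits of loglinear smoothing models of increasing polynomial degree.
# These numbers are made up for illustration; they are not results from the study.
fits = {
    "degree 2": (-1520.4, 2),
    "degree 3": (-1512.1, 3),
    "degree 4": (-1511.8, 4),
}
best = select_by_aic(fits)
```

With these made-up values, the gain in log-likelihood from degree 3 to degree 4 is too small to offset the extra parameter, so the degree 3 model has the lowest AIC.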

2.
Information fit indexes such as Akaike Information Criterion, Consistent Akaike Information Criterion, Bayesian Information Criterion, and the expected cross validation index can be valuable in assessing the relative fit of structural equation models that differ regarding restrictiveness. In cases in which models without mean restrictions (i.e., saturated mean structure) are compared to models with restricted (i.e., modeled) means, one should take account of the presence of means, even if the model is saturated with respect to the means. The failure to do this can result in an incorrect rank order of models in terms of the information fit indexes. We demonstrate this point by an analysis of measurement invariance in a multigroup confirmatory factor model.

3.
Smoothing is designed to yield smoother equating results that can reduce random equating error without introducing very much systematic error. The main objective of this study is to propose a new statistic and to compare its performance to the performance of the Akaike information criterion and likelihood ratio chi-square difference statistics in selecting the smoothing parameter for polynomial loglinear equating under the random groups design. These model selection statistics were compared for four sample sizes (500, 1,000, 2,000, and 3,000) and eight simulated equating conditions, including both conditions where equating is not needed and conditions where equating is needed. The results suggest that all model selection statistics tend to improve the equating accuracy by reducing the total equating error. The new statistic tended to have less overall error than the other two methods.

4.
This research focuses on the problem of model selection between the latent change score (LCS) model and the autoregressive cross-lagged (ARCL) model when the goal is to infer the longitudinal relationship between variables. We conducted a large-scale simulation study to (a) investigate the conditions under which these models return statistically (and substantively) different results concerning the presence of bivariate longitudinal relationships, and (b) ascertain the relative performance of an array of model selection procedures when such different results arise. The simulation results show that the primary sources of differences in parameter estimates across models are model parameters related to the slope factor scores in the LCS model (specifically, the correlation between the intercept factor and the slope factor scores) as well as the size of the data (specifically, the number of time points and sample size). Among several model selection procedures, correct selection rates were higher when using model fit indexes (i.e., comparative fit index, root mean square error of approximation) than when using a likelihood ratio test or any of several information criteria (i.e., Akaike’s information criterion, Bayesian information criterion, consistent AIC, and sample-size-adjusted BIC).

5.
In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed‐score equating.

6.
The goal of this study was the development of a procedure to predict the equating error associated with the long-term equating method of Tate (2003) for mixed-format tests. An expression for the determination of the error of an equating based on multiple links using the error for the component links was derived and illustrated with simulated data. Expressions relating the equating error for single equating links to relevant factors like the equating design and the history of the examinee population ability distribution were determined based on computer simulation. Use of the resulting procedure for the selection of a long-term equating design was illustrated.

7.
Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used information criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for three types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and growth mixture models (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.

8.
Standard procedures for equating tests, including those based on item response theory (IRT), require item responses from large numbers of examinees. Such data may not be forthcoming for reasons theoretical, political, or practical. Information about items' operating characteristics may be available from other sources, however, such as content and format specifications, expert opinion, or psychological theories about the skills and strategies required to solve them. This article shows how, in the IRT framework, collateral information about items can be exploited to augment or even replace examinee responses when linking or equating new tests to established scales. The procedures are illustrated with data from the Pre-Professional Skills Test (PPST).

9.
《Educational Assessment》2013,18(1):99-110
The purpose of this article is to describe some of the measurement issues encountered in the equating of performance assessments designed for use in making teacher certification decisions. As some teacher certification programs move from sole reliance on multiple-choice items to inclusion of complex performance tasks, difficult measurement issues related to equating may arise. A variety of analytic and judgmental strategies are described in this article that may provide solutions for addressing these equating issues. Analytic strategies are based on examinee data and involve the modification of existing equating procedures, such as linear and equipercentile methods, that have been used successfully in the past with test forms composed of multiple-choice items. Judgmental strategies for equating involve the use of expert judgments to determine the equivalence of scores obtained from alternate forms of an assessment instrument.
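For readers unfamiliar with the linear method named in this abstract, the sketch below shows textbook linear equating, which maps a form X score onto the form Y scale by matching means and standard deviations: y = mean_Y + (sd_Y / sd_X) * (x - mean_X). It assumes both forms were taken by comparable groups; the function name `linear_equate` and the toy scores are hypothetical, and the article's modified procedures for performance assessments are not reproduced here.

```python
import statistics


def linear_equate(x, scores_x, scores_y):
    """Map score x on form X to the form Y scale by matching mean and SD.

    Assumes scores_x and scores_y come from comparable groups of examinees.
    """
    mean_x = statistics.mean(scores_x)
    mean_y = statistics.mean(scores_y)
    sd_x = statistics.pstdev(scores_x)
    sd_y = statistics.pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("form X scores have zero variance")
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Toy data for illustration only: form Y scores are 1.5 times form X scores.
form_x = [10, 20, 30]
form_y = [15, 30, 45]
equated_mean = linear_equate(20, form_x, form_y)
equated_top = linear_equate(30, form_x, form_y)
```

In this toy case the form X mean of 20 maps to the form Y mean of 30, and a score one standard deviation above the mean on X maps to one standard deviation above the mean on Y.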

10.
Little research has examined factors influencing statistical power to detect the correct number of latent classes using latent profile analysis (LPA). This simulation study examined power related to interclass distance between latent classes given true number of classes, sample size, and number of indicators. Seven model selection methods were evaluated. None had adequate power to select the correct number of classes with a small (Cohen's d = .2) or medium (d = .5) degree of separation. With a very large degree of separation (d = 1.5), the Lo–Mendell–Rubin test (LMR), adjusted LMR, bootstrap likelihood ratio test, Bayesian Information Criterion (BIC), and sample-size-adjusted BIC were good at selecting the correct number of classes. However, with a large degree of separation (d = .8), power depended on number of indicators and sample size. Akaike's Information Criterion and entropy poorly selected the correct number of classes, regardless of degree of separation, number of indicators, or sample size.

11.
We investigate the current bandwidth selection methods in kernel equating and propose a method based on Silverman's rule of thumb for selecting the bandwidth parameters. In kernel equating, the bandwidth parameters have previously been obtained by minimizing a penalty function. This minimization process has been criticized by practitioners for being too complex and for not offering sufficient smoothing in certain cases. In addition, the bandwidth parameters have been treated as constants in the derivation of the standard error of equating even when they were selected by considering the observed data. Here, the bandwidth selection is simplified, and modified standard errors of equating (SEEs) that reflect the bandwidth selection method are derived. The method is illustrated with real data examples and simulated data.
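For context, the generic form of Silverman's rule of thumb for Gaussian kernel density estimation is h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5). The sketch below implements only that generic rule; the abstract says the proposed method is based on it, and the kernel equating adaptation itself is not reproduced here. The function name `silverman_bandwidth` and the example scores are hypothetical.

```python
import statistics


def silverman_bandwidth(scores):
    """Generic Silverman rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    sd = statistics.stdev(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation alone when the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


# Example scores for illustration only: the integers 1 through 100.
scores = list(range(1, 101))
h = silverman_bandwidth(scores)
```

Because the rule depends only on sample size and spread, it needs no iterative minimization, which is the simplification the abstract points to.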

12.
Educational measurement specialists undertaking test equating in applied settings have been plagued by the absence of a logically or mathematically compelling rationale for their test equating efforts. Classical test theory and other test theories based on the assumption of identically distributed true scores are tautological in terms of test equating. The present study examined (by means of a Monte Carlo procedure) the effects of four parameters on the accuracy of test equating under a relaxed definition of test form equivalence. The four parameters studied were sample size, test form length, test form reliability, and the correlation between the true scores of the test forms to be equated. Significant interactions involving sample size and the other parameters indicated that smaller samples of observations yielded disproportionately larger errors in test equating for fixed values of the test form parameters. In terms of main effects, sample size emerged as most important in controlling equating error. Taken together, the results suggest that when test equating is carried out on larger samples of observations, errors of equating will tend to be relatively small even though the test forms are not strictly parallel. For arbitrarily small samples, however, errors of equating will tend to be larger regardless of how equivalent the test forms are.

13.
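Since 2007, the Chinese Language and English Language subjects of the Hong Kong Certificate of Education Examination (HKCEE) have used standards-referenced reporting to grade candidates' results. A Rasch model with structural parameters was adopted in the score processing. This paper introduces the model and some of its main properties, derives the solving equations for joint maximum likelihood estimation, and reports the main results of applying the model to standards-referenced grading in the HKCEE.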

14.
This study investigated the performance of fit indexes in selecting a covariance structure for longitudinal data. Data were simulated to follow a compound symmetry, first-order autoregressive, first-order moving average, or random-coefficients covariance structure. We examined the ability of the likelihood ratio test (LRT), root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker–Lewis Index (TLI) to reject misspecified models with varying degrees of misspecification. With a sample size of 20, RMSEA, CFI, and TLI are high in both Type I and Type II error rates, whereas LRT has a high Type II error rate. With a sample size of 100, these indexes generally have satisfactory performance, but CFI and TLI are affected by a confounding effect of their baseline model. Akaike's Information Criterion (AIC) and Bayesian Information Criterion (BIC) have high success rates in identifying the true model when sample size is 100. A comparison with the mixed model approach indicates that separately modeling the means and covariance structures in structural equation modeling dramatically improves the success rate of AIC and BIC.

15.
《Applied Measurement in Education》2013,26(3):245-254
A procedure for checking the score equivalence of nearly identical editions of a test is described. This procedure is used early in the score equating process to help determine whether it is necessary to conduct separate equating analyses (using a variety of equating methods) for the two nearly identical versions of the test. The procedure employs the standard error of equating and utilizes graphical representation of score conversion deviation from the identity function in standard error units. Two illustrations of the procedure involving Scholastic Aptitude Test (SAT) data are presented. Advice about what to do if statistical equivalence does not obtain is given in the discussion section. Alternative strategies for assessing score equivalence are also discussed.

16.
The Non-Equivalent-groups Anchor Test (NEAT) design has been in wide use since at least the early 1940s. It involves two populations of test takers, P and Q, and makes use of an anchor test to link them. Two linking methods used for NEAT designs are those (a) based on chain equating and (b) that use the anchor test to post-stratify the distributions of the two operational test scores to a common population (i.e., Tucker equating and frequency estimation). We show that, under different sets of assumptions, both methods are observed score equating methods and we give conditions under which the methods give identical results. In addition, we develop analogues of the Dorans and Holland (2000) RMSD measures of population invariance of equating methods for the NEAT design for both chain and post-stratification equating methods.

17.
Combinations of five methods of equating test forms and two methods of selecting samples of students for equating were compared for accuracy. The two sampling methods were representative sampling from the population and matching samples on the anchor test score. The equating methods were the Tucker, Levine equally reliable, chained equipercentile, frequency estimation, and item response theory (IRT) 3PL methods. The tests were the Verbal and Mathematical sections of the Scholastic Aptitude Test. The criteria for accuracy were measures of agreement with an equivalent-groups equating based on more than 115,000 students taking each form. Much of the inaccuracy in the equatings could be attributed to overall bias. The results for all equating methods in the matched samples were similar to those for the Tucker and frequency estimation methods in the representative samples; these equatings made too small an adjustment for the difference in the difficulty of the test forms. In the representative samples, the chained equipercentile method showed a much smaller bias. The IRT (3PL) and Levine methods tended to agree with each other and were inconsistent in the direction of their bias.

18.
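A Review of Research on Test Equating
This study reviews the literature on test equating from five aspects: research history, definition of concepts, data collection designs, equating models and methods, and equating error together with the criteria for evaluating different equating methods. The aim is to provide a theoretical foundation for future research on equating.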

19.
While agreeing with van der Linden (this issue) that test equating needs better theoretical underpinnings, my comments criticize several aspects of his article. His examples are, for the most part, worthless; he does not use well‐established terminology correctly; his view of 100 years of attempts to give a theoretical basis for equating is unreasonably dismissive; he exhibits no understanding of the role of the synthetic population for anchor test equating for the nonequivalent groups with anchor test design; he is obtuse regarding the condition of symmetry, requiring it of the estimand but not of the estimator; and his proposal for a foundational basis for all test equating, the “true equating transformation,” allows a different equating function for every examinee, which is way past what equating actually does or hopes to achieve. Most importantly, he appears to think that criticism of others is more important than improved insight that moves a field forward based on the work of many other theorists whose contributions have improved the practice of equating.

20.
Methods of test equating and scaling have been studied in the statistically advanced literature of educational and psychological measurement, and applied extensively where mass testing and selection procedures are required; for example, selection for tertiary study. Usually the procedures themselves require large data sets, and they are applied in order to ensure that students are not advantaged or disadvantaged because of the arbitrary features of the origin and units of measurement of the various tests. However, equating may be necessary where the data sets are relatively small. This paper provides an example of selection into a professional faculty within a university, following study from a range of subjects available in various faculties at the first-year level, and shows the need and a procedure for equating the grades in the different subjects. The procedure itself is based on a psychometric model studied extensively in the psychometric literature, and the purpose of this paper is to introduce the rationale for its application from first principles for a broader professional education audience and to illustrate its application with a practical example.
