期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Detecting Local Item Dependence in Polytomous Adaptive Data

Jessica L. Mislevy André A. Rupp Jeffrey R. Harring 《Journal of Educational Measurement》2012,49(2):127-147

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health‐outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of local item dependence have been studied both for polytomous items in fixed‐form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen's Q₃ Statistic and Pearson's Statistic X², in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient‐Reported Outcomes Measurement Information System (PROMIS). 相似文献

2.

A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

Tie Liang Craig S. Wells 《教育实用测度》2015,28(2):115-129

Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three IRT models (three- and two-parameter logistic model, and generalized partial credit model) used in a mixed-format test. The statistical properties of the proposed fit statistic were examined and compared to S-X² and PARSCALE’s G². Overall, RISE (Root Integrated Square Error) outperformed the other two fit statistics under the studied conditions in that the Type I error rate was not inflated and the power was acceptable. A further advantage of the nonparametric approach is that it provides a convenient graphical inspection of the misfit. 相似文献

3.

An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

Tie Liang Craig S. Wells Ronald K. Hambleton 《Journal of Educational Measurement》2014,51(1):1-17

As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting model misfit. The purposes of this study were to extend the use of RISE to more general and comprehensive applications by manipulating a variety of factors (e.g., test length, sample size, IRT models, ability distribution). The results from the simulation study demonstrated that RISE outperformed G² and S‐X² in that it controlled Type I error rates and provided adequate power under the studied conditions. In the empirical study, RISE detected reasonable numbers of misfitting items compared to G² and S‐X², and RISE gave a much clearer picture of the location and magnitude of misfit for each misfitting item. In addition, there was no practical consequence to classification before and after replacement of misfitting items detected by three fit statistics. 相似文献

4.

A Comparison of Item Fit Statistics for Mixed IRT Models

Kyong Hee Chon Won-Chan Lee Stephen B. Dunbar 《Journal of Educational Measurement》2010,47(3):318-338

In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G² , Orlando and Thissen's S − X² and S − G² , and Stone's χ²* and G²* . To investigate the relative performance of the fit statistics at the item level, we conducted two simulation studies: Type I error and power studies. We evaluated the performance of the item fit indices for various conditions of test length, sample size, and IRT models. Among the competing measures, the summed score-based indices S − X² and S − G² were found to be the sensible and efficient choice for assessing model fit for mixed format data. These indices performed well, particularly with short tests. The pseudo-observed score indices, χ²* and G²* , showed inflated Type I error rates in some simulation conditions. Consistent with the findings of current literature, the PARSCALE's G² index was rarely useful, although it provided reasonable results for long tests. 相似文献

5.

Assessing Fit of Unidimensional Graded Response Models Using Bayesian Methods

Xiaowen Zhu Clement A. Stone 《Journal of Educational Measurement》2011,48(1):81-97

The posterior predictive model checking method is a flexible Bayesian model‐checking tool and has recently been used to assess fit of dichotomous IRT models. This paper extended previous research to polytomous IRT models. A simulation study was conducted to explore the performance of posterior predictive model checking in evaluating different aspects of fit for unidimensional graded response models. A variety of discrepancy measures (test‐level, item‐level, and pair‐wise measures) that reflected different threats to applications of graded IRT models to performance assessments were considered. Results showed that posterior predictive model checking exhibited adequate power in detecting different aspects of misfit for graded IRT models when appropriate discrepancy measures were used. Pair‐wise measures were found more powerful in detecting violations of the unidimensionality and local independence assumptions. 相似文献

6.

An Empirical Examination of the IRT Information of Polytomously Scored Reading Items Under the Generalized Partial Credit Model

John R. Donoghue 《Journal of Educational Measurement》1994,31(4):295-311

Using Muraki's (1992) generalized partial credit IRT model, polytomous items (responses to which can be scored as ordered categories) from the 1991 field test of the NAEP Reading Assessment were calibrated simultaneously with multiple-choice and short open-ended items. Expected information of each type of item was computed. On average, four-category polytomous items yielded 2.1 to 3.1 times as much IRT information as dichotomous items. These results provide limited support for the ad hoc rule of weighting k-category polytomous items the same as k - 1 dichotomous items for computing total scores. Polytomous items provided the most information about examinees of moderately high proficiency; the information function peaked at 1.0 to 1.5, and the population distribution mean was 0. When scored dichotomously, information in polytomous items sharply decreased, but they still provided more expected information than did the other response formats. For reference, a derivation of the information function for the generalized partial credit model is included. 相似文献

7.

An NCME Instructional Module on Polytomous Item Response Theory Models

Randall David Penfield 《Educational Measurement》2014,33(1):36-48

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understating of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models. 相似文献

8.

Asymptotic Standard Errors of Observed‐Score Equating With Polytomous IRT Models

Bjrn Andersson 《Journal of Educational Measurement》2016,53(4):459-477

In observed‐score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response theory (IRT) model. The parameters from such a model can be utilized to derive the score probabilities for the tests and these score probabilities may then be used in observed‐score equating. In this study, the asymptotic standard errors of observed‐score equating using score probability vectors from polytomous IRT models are derived using the delta method. The results are applied to the equivalent groups design and the nonequivalent groups design with either chain equating or poststratification equating within the framework of kernel equating. The derivations are presented in a general form and specific formulas for the graded response model and the generalized partial credit model are provided. The asymptotic standard errors are accurate under several simulation conditions relating to sample size, distributional misspecification and, for the nonequivalent groups design, anchor test length. 相似文献

9.

Stepwise Analysis of Differential Item Functioning Based on Multiple-Group Partial Credit Model

Eiji Muraki 《Journal of Educational Measurement》1999,36(3):217-232

Bock, Muraki, and Pfeiffenberger (1988) proposed a dichotomous item response theory (IRT) model for the detection of differential item functioning (DIF), and they estimated the IRT parameters and the means and standard deviations of the multiple latent trait distributions. This IRT DIF detection method is extended to the partial credit model (Masters, 1982; Muraki, 1993) and presented as one of the multiple-group IRT models. Uniform and non-uniform DIF items and heterogeneous latent trait distributions were used to generate polytomous responses of multiple groups. The DIF method was applied to this simulated data using a stepwise procedure. The standardized DIF measures for slope and item location parameters successfully detected the non-uniform and uniform DIF items as well as recovered the means and standard deviations of the latent trait distributions.This stepwise DIF analysis based on the multiple-group partial credit model was then applied to the National Assessment of Educational Progress (NAEP) writing trend data. 相似文献

10.

A More Flexible Bayesian Multilevel Bifactor Item Response Theory Model

Ken A. Fujimoto 《Journal of Educational Measurement》2020,57(2):255-285

Multilevel bifactor item response theory (IRT) models are commonly used to account for features of the data that are related to the sampling and measurement processes used to gather those data. These models conventionally make assumptions about the portions of the data structure that represent these features. Unfortunately, when data violate these models' assumptions but these models are used anyway, incorrect conclusions about the cluster effects could be made and potentially relevant dimensions could go undetected. To address the limitations of these conventional models, a more flexible multilevel bifactor IRT model that does not make these assumptions is presented, and this model is based on the generalized partial credit model. Details of a simulation study demonstrating this model outperforming competing models and showing the consequences of using conventional multilevel bifactor IRT models to analyze data that violate these models' assumptions are reported. Additionally, the model's usefulness is illustrated through the analysis of the Program for International Student Assessment data related to interest in science. 相似文献

11.

Comparing the Fit of Item Response Theory and Factor Analysis Models

Alberto Maydeu-Olivares Li Cai Adolfo Hernández 《Structural equation modeling》2013,20(3):333-356

Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model. 相似文献

12.

Modeling Partial Knowledge on Multiple‐Choice Items Using Elimination Testing

Qian Wu Tinne De Laet Rianne Janssen 《Journal of Educational Measurement》2019,56(2):391-414

Single‐best answers to multiple‐choice items are commonly dichotomized into correct and incorrect responses, and modeled using either a dichotomous item response theory (IRT) model or a polytomous one if differences among all response options are to be retained. The current study presents an alternative IRT‐based modeling approach to multiple‐choice items administered with the procedure of elimination testing, which asks test‐takers to eliminate all the response options they consider to be incorrect. The partial credit model is derived for the obtained responses. By extracting more information pertaining to test‐takers’ partial knowledge on the items, the proposed approach has the advantage of providing more accurate estimation of the latent ability. In addition, it may shed some light on the possible answering processes of test‐takers on the items. As an illustration, the proposed approach is applied to a classroom examination of an undergraduate course in engineering science. 相似文献

13.

IRT Model Misspecification and Measurement of Growth in Vertical Scaling

Daniel M. Bolt Sien Deng Sora Lee 《Journal of Educational Measurement》2014,51(2):141-162

Functional form misfit is frequently a concern in item response theory (IRT), although the practical implications of misfit are often difficult to evaluate. In this article, we illustrate how seemingly negligible amounts of functional form misfit, when systematic, can be associated with significant distortions of the score metric in vertical scaling contexts. Our analysis uses two‐ and three‐parameter versions of Samejima's logistic positive exponent model (LPE) as a data generating model. Consistent with prior work, we find LPEs generally provide a better comparative fit to real item response data than traditional IRT models (2PL, 3PL). Further, our simulation results illustrate how 2PL‐ or 3PL‐based vertical scaling in the presence of LPE‐induced misspecification leads to an artificial growth deceleration across grades, consistent with that commonly seen in vertical scaling studies. The results raise further concerns about the use of standard IRT models in measuring growth, even apart from the frequently cited concerns of construct shift/multidimensionality across grades. 相似文献

14.

A Comparison Between Linear IRT Observed‐Score Equating and Levine Observed‐Score Equating Under the Generalized Kernel Equating Framework

Haiwen Chen 《Journal of Educational Measurement》2012,49(3):269-284

In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed‐score equating. 相似文献

15.

An NCME Instructional Module on Item‐Fit Statistics for Item Response Theory Models

下载免费PDF全文

Allison J. Ames Randall D. Penfield 《Educational Measurement》2015,34(3):39-48

Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model‐data fit of IRT models, the relative advantages of each approach, and the software available to implement each method. 相似文献

16.

Logistic Regression and Its Use in Detecting Differential Item Functioning in Polytomous Items

Ann W. French Timothy R. Miller 《Journal of Educational Measurement》1996,33(3):315-332

A computer simulation study was conducted to determine the feasibility of using logistic regression procedures to detect differential item functioning (DIF) in polytomous items. One item in a simulated test of 25 items contained DIF; parameters' for that item were varied to create three conditions of nonuniform DIF and one of uniform DIF. Item scores were generated using a generalized partial credit model, and the data were recoded into multiple dichotomies in order to use logistic regression procedures. Results indicate that logistic regression is powerful in detecting most forms of DIF; however, it required large amounts of data manipulation, and interpretation of the results was sometimes difficult. Some logistic regression procedures may be useful in the post hoc analysis of DlF for polytomous items. 相似文献

17.

Performance of the generalized S-X² item fit index for the graded response model

Taehoon Kang Troy T. Chen 《Asia Pacific Education Review》2011,12(1):89-96

The utility of Orlando and Thissen’s (2000, 2003) S-X² fit index was extended to the model-fit analysis of the graded response model (GRM). The performance of a modified S-X² in assessing item-fit of the GRM was investigated in light of empirical Type I error rates and power with a simulation study having various conditions typically encountered in applied testing situations. The results show that the Type I error rates were controlled adequately around the nominal alpha by S-X². The power of the S-X² statistic was much lower when the source of misfit was multidimensionality than when it was due to discrepancy from the true GRM curves. Once the data size increased sufficiently, however, appropriate power was obtained regardless of the source of the item-misfit. In summary, the generalized S-X² appears to be a promising index for investigating item fit for polytomous items in educational and psychological assessments. 相似文献

18.

Asymptotic Standard Errors for Item Response Theory True Score Equating of Polytomous Items

下载免费PDF全文

Cheow Cher Wong 《Journal of Educational Measurement》2015,52(1):106-120

Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like the bootstrap method, to obtain standard errors of equated scores. Formulas are introduced to obtain the derivatives for computing the asymptotic standard errors. The approach was validated using mean‐mean, mean‐sigma, random‐groups, or concurrent calibration equating of simulated samples, for tests modeled using the generalized partial credit model or the graded response model. 相似文献

19.

An Instructional Module on Mokken Scale Analysis

下载免费PDF全文

Stefanie A. Wind 《Educational Measurement》2017,36(2):50-66

Mokken scale analysis (MSA) is a probabilistic‐nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic‐nonparametric framework in which to explore measurement quality, with an emphasis on its application in the context of educational assessment. The module describes both dichotomous and polytomous formulations of the MSA model. Examples of the application of MSA to educational assessment are provided using data from a multiple‐choice physical science assessment and a rater‐mediated writing assessment. 相似文献

20.

三参数等级反应模型及其信息函数的应用

陈青丁树良《考试研究》2009,(2):77-84

项目反应理论（Item Response Theory,IRT）是现代教育心理测量领域中最有影响的一种测量理论,它的一个明确目标是扩展模型的种类以至于能够处理实际测试中任何形式的反应数据。在已有的各种模型研究中,对于多级评分项目,只考虑到项目区分度和难度。但在实际测验中,此类项目还可能存在猜测度。本研究基于Samejima等级反应模型,将项目猜测度融合到多级评分模型中,提出了三参数等级反应模型（Three-parameter Graded Response Model,3PL-GRM）。由于忽略多级反应项目的猜测度会使得该项目的信息量虚假升高,本研究还进一步将3PL—GRM的信息函数应用到试卷质量分析中。相似文献