Similar Articles
1.
Most existing indices of classification accuracy for attribute patterns cannot be used when response data are unavailable in diagnostic testing. To address this issue, this article proposes new indices that predict the correct classification rate of a diagnostic test before the test is administered under the deterministic inputs, noisy “and” gate (DINA) model. The new indices include an item‐level expected classification accuracy (ECA) for attributes and a test‐level ECA for attributes and attribute patterns; both are calculated solely from the known item parameters and the Q‐matrix. Theoretical analysis showed that the item‐level ECA can be regarded as a measure of the correct classification rate of attributes contributed by an item. The article also illustrates how to use the item‐level ECA for attributes to estimate the correct classification rate of attribute patterns at the test level. Simulation results showed that two test‐level ECA indices, ECA_I_W (an index based on the independence assumption and the weighted sum of the item‐level ECAs) and ECA_C_M (an index based on a Gaussian copula function that incorporates the dependence structure of the attribute classification events and the simple average of the item‐level ECAs), accurately predicted the correct classification rates of attribute patterns.
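The ECA indices are computed from the item parameters and the Q‐matrix through the DINA item response function, P(X = 1 | α) = (1 - s)^η g^(1 - η), where η = 1 only if the examinee masters every attribute the item requires. A minimal Python sketch of that function follows; it is not the article's ECA computation, and the function names are hypothetical.

```python
def dina_eta(alpha, q):
    """Ideal response eta: 1 if alpha covers every attribute required by q, else 0."""
    return int(all(a >= qk for a, qk in zip(alpha, q)))


def dina_prob(alpha, q, guess, slip):
    """DINA success probability: (1 - slip)^eta * guess^(1 - eta)."""
    eta = dina_eta(alpha, q)
    return (1 - slip) ** eta * guess ** (1 - eta)


# Hypothetical item requiring attributes 1 and 2 (of 3), with g = 0.2 and s = 0.1
q_item = (1, 1, 0)
print(dina_prob((1, 1, 0), q_item, guess=0.2, slip=0.1))  # masters both: 1 - s
print(dina_prob((1, 0, 1), q_item, guess=0.2, slip=0.1))  # lacks attribute 2: g
```

Only two success probabilities exist per item, 1 - s and g, which is why item‐level accuracy can be computed from item parameters alone once a distribution of attribute patterns is assumed.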

2.
As with any psychometric model, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models are useful. For inferences from CDMs to be valid, it is crucial to ascertain the fit of the model to the data. Using a simulation study, this study investigated the sensitivity of various absolute and relative fit statistics under different CDM settings. The investigation covered various types of model–data misfit arising from misspecification of the Q‐matrix, the CDM, or both. Six fit statistics were considered: –2 log likelihood (–2LL), Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and residuals based on the proportion correct of individual items (p), the correlations of item pairs (r), and the log‐odds ratios of item pairs (l). An empirical example involving real data was used to illustrate how the different fit statistics can be used together to identify different types of misspecification. With these statistics and the saturated model serving as the basis, relative and absolute fit evaluation can be integrated to detect misspecification efficiently.
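AIC and BIC are both penalized versions of -2LL: AIC = -2LL + 2p and BIC = -2LL + p ln N, where p is the number of free parameters and N the sample size; smaller values indicate better relative fit. A small Python helper, with hypothetical numbers in the usage lines, might look like:

```python
import math


def information_criteria(neg2ll, n_params, n_obs):
    """Return (AIC, BIC) given -2 log-likelihood, free-parameter count, and sample size."""
    aic = neg2ll + 2 * n_params
    bic = neg2ll + n_params * math.log(n_obs)
    return aic, bic


# Hypothetical comparison of two fitted CDMs on the same 1,000 examinees
reduced = information_criteria(neg2ll=25_400.0, n_params=60, n_obs=1000)
saturated = information_criteria(neg2ll=25_300.0, n_params=120, n_obs=1000)
print("reduced   AIC=%.1f BIC=%.1f" % reduced)
print("saturated AIC=%.1f BIC=%.1f" % saturated)
```

Because the BIC penalty grows with ln N, it favors the more parsimonious model more strongly than AIC once N is 8 or larger (ln N > 2).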

3.
The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in the case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate model misfit. The equality of both matrices can be tested with the so‐called information matrix test as a general test of misspecification. This test can be adapted to item response models to evaluate the fit of single items and the fit of the whole scale. The performance of different versions of the test is compared in a simulation study with existing tests of model fit, among them the test of Orlando and Thissen, the score test of local independence due to Glas and Suarez‐Falcon, and the limited‐information approach of Maydeu‐Olivares and Joe. In general, the different versions of the information matrix test adhere to the nominal Type I error rate and have high power for detecting misspecified item characteristic curves. Additionally, some versions of the test can be used to detect violations of the local independence assumption.
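The identity the test relies on can be seen in a toy model unrelated to IRT. Suppose data are fit by a normal model with mean μ and variance fixed at 1 (a hypothetical misspecification whenever the true variance differs). Each observation contributes a score (x - μ) and a Hessian of -1, so the outer‐product‐of‐scores (OPG) estimate of the information is Σ(x - μ̂)², whereas the negative Hessian estimate is simply n. The sketch below computes both; they agree in expectation only when the assumed variance is correct. It omits the standardization and chi‐square reference distribution that an actual information matrix test would use.

```python
def information_estimates(data, assumed_var=1.0):
    """OPG and negative-Hessian information for the mean of N(mu, assumed_var)."""
    n = len(data)
    mu_hat = sum(data) / n  # maximum likelihood estimate of the mean
    # Outer product of per-observation scores: score_i = (x_i - mu) / assumed_var
    opg = sum(((x - mu_hat) / assumed_var) ** 2 for x in data)
    # Negative summed Hessian: each observation contributes 1 / assumed_var
    neg_hessian = n / assumed_var
    return opg, neg_hessian


well_specified = [-1.0, 1.0, -1.0, 1.0]  # spread consistent with variance 1
misspecified = [-2.0, 2.0, -2.0, 2.0]    # spread consistent with variance 4
print(information_estimates(well_specified))  # (4.0, 4.0)
print(information_estimates(misspecified))    # (16.0, 4.0)
```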

4.
Most model fit analyses in cognitive diagnosis assume that a Q‐matrix is correct once it has been constructed, without verifying its appropriateness. Consequently, any model misfit attributable to the Q‐matrix cannot be identified or remedied. To address this concern, this paper proposes an empirically based method for validating a Q‐matrix used in conjunction with the DINA model. The proposed method can be combined with other considerations, such as substantive information about the items or expert knowledge about the domain, to produce a more integrative framework for Q‐matrix validation. The paper presents the theoretical foundation of the proposed method, develops an algorithm for its practical implementation, and provides real and simulated data applications to examine its viability. Relevant issues regarding the implementation of the method are discussed.

5.
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and to illustrate how the method can be used to support cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components of a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature on reading comprehension as well as research related to SAT Critical Reading. The cognitive models were then validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model‐data fit between the expected and observed response patterns produced by two random samples of 2,000 examinees who took the items. The model that provided the best model‐data fit was then used to calculate attribute probabilities for 15 examinees to illustrate our diagnostic testing procedure.

6.
Compared with unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM‐based data with a linear attribute structure. The study uses a procedure that makes the IRM and CDM frameworks comparable and investigates how estimation accuracy is affected by test diagnosticity and by the match between the true and fitted models. The study shows that (a) comparable results can be obtained when highly diagnostic IRM data are retrofitted with CDMs, and vice versa; (b) retrofitting CDMs to IRM‐based data can, under some conditions, result in considerable examinee misclassification; and (c) model fit indices provide limited indication of the accuracy of item parameter estimation and attribute classification.
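Under a linear attribute structure (A1 → A2 → ... → AK), an attribute can be mastered only if its predecessor is, so only K + 1 of the 2^K attribute patterns are permissible, which is what makes the CDM comparable to ordered levels of a unidimensional trait. A short Python sketch that filters patterns by prerequisite pairs (the function name and the pair encoding are illustrative assumptions):

```python
from itertools import product


def permissible_patterns(n_attrs, prereqs):
    """All 0/1 attribute patterns consistent with the prerequisite relations.

    prereqs: (required, dependent) index pairs; (0, 1) means A1 must be
    mastered before A2 can be mastered.
    """
    return [
        alpha
        for alpha in product((0, 1), repeat=n_attrs)
        if all(alpha[req] >= alpha[dep] for req, dep in prereqs)
    ]


# Linear hierarchy A1 -> A2 -> A3 -> A4
linear = [(k, k + 1) for k in range(3)]
for pattern in permissible_patterns(4, linear):
    print(pattern)
```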

7.
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern‐level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang; however, indices at the attribute level have not yet been constructed. This study puts forward a simple approach to estimating the indices at both the attribute and the pattern level from a single test administration. The article details how upper and lower bounds for attribute‐level accuracy can be derived from the error variance of the attribute mastery probability estimate. In addition, an alternative approach to estimating the attribute‐level indices, based on Cui's pattern‐level indices, is proposed. A comparative analysis of simulation results indicates that the new indices are well suited to evaluating test‐retest consistency and the correct classification rate.
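One common way to estimate attribute‐level accuracy from a single administration, assumed here and not necessarily the estimator proposed in this article, is to average over examinees the posterior probability that each examinee's classification is correct. With a 0.5 mastery threshold, that probability is max(p, 1 - p), where p is the posterior probability of mastery:

```python
def attribute_accuracy(mastery_probs, threshold=0.5):
    """Expected correct classification rate for one attribute.

    mastery_probs: posterior P(alpha_k = 1 | responses), one value per examinee.
    An examinee is classified as a master when p >= threshold; the expected
    probability that this classification is correct is p for masters and
    1 - p for non-masters.
    """
    expected_correct = [p if p >= threshold else 1 - p for p in mastery_probs]
    return sum(expected_correct) / len(expected_correct)


# Hypothetical posterior mastery probabilities for four examinees
print(attribute_accuracy([0.9, 0.2, 0.5, 1.0]))
```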

8.
In some tests, examinees are required to choose a fixed number of items to answer from a given set of items. This practice poses a challenge to standard item response models, because more capable examinees may gain an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice effect of examinee‐selected items. The results of a series of simulation studies showed that (1) the parameters of the new models were recovered well, (2) the parameter estimates were almost unbiased when the new models were fit to data simulated from standard item response models, (3) failing to consider the choice effect yielded shrunken parameter estimates for examinee‐selected items, and (4) even when the missingness mechanism in examinee‐selected items did not follow the item response functions specified in the new models, the new models still fit better than standard item response models. An empirical example from a college entrance examination supported the use of the new models: in general, the higher the examinee's ability, the better his or her choice of items.

9.
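Recent studies on the DINA model have shown that sample size, the prior distribution, the choice between empirical Bayes and fully Bayesian estimation, sample representativeness, differential item functioning, and Q‐matrix misspecification may all bias the estimates of DINA item parameters. Using Monte Carlo simulation, this study examined the types of combined changes in the DINA item parameters (the guessing and slip parameters) and the magnitude of their bias, estimating knowledge states by conditional maximum likelihood. When the item parameter estimates deviated only slightly from their true values, the precision of knowledge‐state estimation was little affected; when the item parameters deviated substantially from their true values, however, attribute mastery was clearly overestimated or underestimated, especially for three of the combination types. The results have implications for equating diagnostic tests: if the item parameters of the anchor items on two tests show a large deviation (0.1), the need for equating should be considered.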
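To make the estimation step concrete: with item parameters treated as known, conditional maximum likelihood classification under DINA evaluates the likelihood of an examinee's responses for every attribute pattern and keeps the maximizer. The Python sketch below is a hypothetical illustration with a tiny Q‐matrix; rerunning it with perturbed guess and slip values is one way to see how parameter bias can shift the estimated knowledge state.

```python
import math
from itertools import product


def dina_prob(alpha, q, guess, slip):
    """P(correct) under DINA: 1 - slip for masters of all required attributes, else guess."""
    masters_all = all(a >= qk for a, qk in zip(alpha, q))
    return 1 - slip if masters_all else guess


def mle_knowledge_state(responses, Q, guess, slip):
    """Attribute pattern maximizing the response likelihood, item parameters held fixed."""
    n_attrs = len(Q[0])
    best_alpha, best_ll = None, -math.inf
    for alpha in product((0, 1), repeat=n_attrs):
        ll = 0.0
        for x, q, g, s in zip(responses, Q, guess, slip):
            p = dina_prob(alpha, q, g, s)
            ll += math.log(p) if x == 1 else math.log(1 - p)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha


# Hypothetical 3-item, 2-attribute test
Q = [(1, 0), (0, 1), (1, 1)]
guess = [0.2, 0.2, 0.2]
slip = [0.1, 0.1, 0.1]
print(mle_knowledge_state((1, 0, 0), Q, guess, slip))
print(mle_knowledge_state((1, 1, 1), Q, guess, slip))
```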

10.
Performance assessments, scenario‐based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this violation on examinee ability estimates has been comparatively neglected. Such item dependencies are known to cause low‐ability examinees' scores to be overestimated and high‐ability examinees' scores to be underestimated. However, the impact of these biases on examinee classification decisions has been little examined. In addition, because the influence of these dependencies varies along the underlying ability continuum, whether the location of the cut‐point matters for correct classification has remained unanswered. This simulation study demonstrates that both the strength of item dependencies and the location of an examination system’s cut‐points influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.
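Sensitivity and specificity here are ordinary confusion‐matrix rates. Treating "truly at or above the cut‐point" as the positive class (an assumption; the article may define it differently), sensitivity is the share of true passers classified as passing and specificity the share of true non‐passers classified as failing; a false positive is then a low‐ability examinee whose overestimated score crosses the cut. A minimal Python sketch with hypothetical ability values:

```python
def sensitivity_specificity(true_theta, est_theta, cut):
    """Classification accuracy of pass/fail decisions at a single cut-point."""
    tp = fn = tn = fp = 0
    for t, e in zip(true_theta, est_theta):
        truly_pass, decided_pass = t >= cut, e >= cut
        if truly_pass and decided_pass:
            tp += 1
        elif truly_pass:
            fn += 1  # false negative: should pass, classified as failing
        elif decided_pass:
            fp += 1  # false positive: should fail, classified as passing
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical true abilities and biased estimates (low scores pulled up, high pulled down)
true_theta = [-1.5, -0.6, -0.2, 0.3, 0.8, 1.6]
est_theta = [-1.1, -0.3, 0.1, 0.2, 0.6, 1.2]
print(sensitivity_specificity(true_theta, est_theta, cut=0.0))
```

Moving `cut` toward the region where the bias is largest is a direct way to see why the cut‐point location matters.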

11.
Orlando and Thissen's S‐X² item fit index has performed better than traditional item fit statistics such as Yen's Q1 and McKinley and Mills's G² for dichotomous item response theory (IRT) models. This study extends S‐X² to polytomous IRT models, including the generalized partial credit model, the partial credit model, and the rating scale model. The performance of the generalized S‐X² in assessing item fit was studied in terms of empirical Type I error rates and power and compared with that of G². The results suggest that the generalized S‐X² is promising for polytomous items in educational and psychological testing programs.
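In its Pearson form, S‐X² compares, within each summed‐score group k, the observed proportion of responses in each item category c with the model‐implied proportion: S‐X² = Σ_k Σ_c N_k (O_kc - E_kc)² / E_kc. For a dichotomous item this reduces to Σ_k N_k (O_k - E_k)² / [E_k(1 - E_k)]. The sketch below assumes the expected proportions have already been obtained (in practice via the Lord-Wingersky recursion) and omits the collapsing of sparse groups and the degrees‐of‐freedom calculation:

```python
def s_x2(group_sizes, observed, expected):
    """Pearson-type S-X2 statistic for one item.

    group_sizes[k]: number of examinees in summed-score group k
    observed[k][c]: observed proportion responding in category c within group k
    expected[k][c]: model-implied proportion (assumed precomputed)
    """
    stat = 0.0
    for n_k, obs_k, exp_k in zip(group_sizes, observed, expected):
        for o, e in zip(obs_k, exp_k):
            stat += n_k * (o - e) ** 2 / e
    return stat


# Hypothetical dichotomous item with two score groups; categories are (incorrect, correct)
sizes = [100, 80]
obs = [[0.6, 0.4], [0.3, 0.7]]
exp = [[0.5, 0.5], [0.25, 0.75]]
print(round(s_x2(sizes, obs, exp), 3))
```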

12.
In equating, when common items are internal and scoring is based on the number of correct items, some pairs of total scores (X) and common‐item scores (V) can never be observed in the bivariate distribution of X and V; these pairs are called structural zeros. This simulation study compares equating results across different approaches to handling structural zeros. The study considers four approaches: the no‐smoothing, unique‐common, total‐common, and adjusted total‐common approaches. The study led to four main findings: (1) the total‐common approach generally had the worst results; (2) for relatively small effect sizes, the unique‐common approach generally had the smallest overall error; (3) for relatively large effect sizes, the adjusted total‐common approach generally had the smallest overall error; and (4) if the sole interest is in reducing bias, the adjusted total‐common approach was generally preferable. These results suggest that, when common items are internal and log‐linear bivariate presmoothing is performed, structural zeros should be maintained, even at some loss of the moment preservation property.
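Structural zeros arise from simple arithmetic: with an internal anchor, X = U + V, where U is the score on the unique items, so a pair (x, v) is possible only if 0 ≤ x - v ≤ the number of unique items. A short Python sketch that lists the impossible cells (function and parameter names are illustrative):

```python
def structural_zeros(n_total, n_common):
    """(x, v) cells that can never be observed when the common items are internal."""
    n_unique = n_total - n_common
    zeros = []
    for x in range(n_total + 1):          # total score X
        for v in range(n_common + 1):     # common-item score V
            u = x - v                     # implied score on the unique items
            if u < 0 or u > n_unique:
                zeros.append((x, v))
    return zeros


# Hypothetical 4-item test containing 2 internal common items
print(structural_zeros(n_total=4, n_common=2))
```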

13.
In automated test assembly (ATA), mixed‐integer programming is used to select test items from an item bank so as to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the selected items into the actual test form. Three cases are discussed: (i) computerized test forms in which the items are presented on screen one at a time and only their optimal order has to be determined; (ii) paper forms in which the items need to be ordered and paginated and the typical goal is to minimize paper use; and (iii) published test forms with the same requirements but a more sophisticated layout (e.g., double‐column print). For each case, a menu of possible test‐form specifications is identified, and it is shown how they can be modeled as linear constraints using 0–1 decision variables. The methodology is demonstrated with two empirical examples.
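The core idea, one 0-1 variable per item with the specifications written as linear constraints, can be illustrated on a toy bank. The sketch below replaces the mixed‐integer programming solver with brute‐force enumeration, which is feasible only for a handful of items, and uses a hypothetical per‐item "space" value as a stand‐in for layout constraints such as page length:

```python
from itertools import product


def assemble(info, space, test_length, max_space):
    """Select items maximizing total information subject to linear 0-1 constraints.

    x[i] = 1 if item i is selected. Constraints:
      sum(x) == test_length                 (form length)
      sum(x[i] * space[i]) <= max_space     (hypothetical layout budget)
    """
    best_x, best_info = None, float("-inf")
    for x in product((0, 1), repeat=len(info)):
        if sum(x) != test_length:
            continue
        if sum(xi * s for xi, s in zip(x, space)) > max_space:
            continue
        total = sum(xi * a for xi, a in zip(x, info))
        if total > best_info:
            best_x, best_info = x, total
    return best_x, best_info


# Hypothetical four-item bank: information at a target ability and space needed per item
print(assemble(info=[0.5, 0.9, 0.7, 0.3], space=[100, 300, 150, 50], test_length=2, max_space=300))
```

Because every constraint and the objective are linear in x, the same model could be handed unchanged to an integer programming solver for realistic bank sizes.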

14.
A Monte Carlo simulation study was conducted to investigate the effects of sample size, estimation method, and model specification on structural equation modeling (SEM) fit indexes. Based on a balanced experimental design, samples were generated from a prespecified population covariance matrix and fitted to structural equation models with different degrees of model misspecification. Ten SEM fit indexes were studied. Two primary conclusions were suggested: (a) some fit indexes appear to be noncomparable in terms of the information they provide about model fit for misspecified models, and (b) estimation method strongly influenced almost all the fit indexes examined, especially for misspecified models. These two issues do not seem to have drawn enough attention from SEM practitioners. Future research should study not only different models with respect to model complexity but also a wider range of model specification conditions, including correctly specified models and models misspecified to varying degrees.

15.
This study evaluated the classification accuracy of a second‐grade oral reading fluency curriculum‐based measure (R‐CBM) in predicting third‐grade state test performance. It also compared the long‐term classification accuracy of local and publisher‐recommended R‐CBM cut scores. Participants were 266 students who were divided into a calibration sample (n = 170) and two cross‐validation samples (n = 46 and n = 50). Using calibration sample data, local fall, winter, and spring R‐CBM cut scores for predicting students’ state test performance were developed with three methods: discriminant analysis (DA), logistic regression (LR), and receiver operating characteristic (ROC) curve analysis. The classification accuracy of local and publisher‐recommended cut scores was evaluated across subsamples. Only DA and ROC produced cut scores that maintained adequate sensitivity (≥ .70) across cohorts; however, LR and publisher‐recommended cut scores had higher levels of specificity and overall correct classification. Implications for developing local cut scores are discussed.
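A sensitivity‐constrained ROC search can be sketched in a few lines. Here a student is flagged as at risk when the R‐CBM score falls below the cut, the positive outcome is failing the state test, and the selection rule (an assumption, not necessarily the study's) picks the cut with the highest specificity among those reaching sensitivity ≥ .70. Both outcome groups must be present in the data:

```python
def choose_cut(scores, failed, min_sensitivity=0.70):
    """Return (cut, sensitivity, specificity) for the chosen R-CBM cut score.

    A student is flagged as at risk when score < cut. failed[i] is True if
    student i did not pass the state test (the positive outcome). Among cuts
    with sensitivity >= min_sensitivity, the highest specificity wins.
    """
    n_pos = sum(failed)
    n_neg = len(failed) - n_pos
    best = None
    for cut in sorted(set(scores)) + [max(scores) + 1]:
        flagged = [s < cut for s in scores]
        tp = sum(f and y for f, y in zip(flagged, failed))
        tn = sum(not f and not y for f, y in zip(flagged, failed))
        sens, spec = tp / n_pos, tn / n_neg
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (cut, sens, spec)
    return best


# Hypothetical fall R-CBM scores (words correct per minute) and state test outcomes
scores = [40, 55, 60, 70, 80, 90, 95, 100]
failed = [True, True, False, True, False, False, False, False]
print(choose_cut(scores, failed))
```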

16.
The purpose of the current study was to examine the validity and diagnostic accuracy of the Intervention Selection Profile—Social Skills (ISP‐SS), a brief social skills assessment tool intended for use with students in need of Tier 2 intervention. Participants included 160 elementary and middle school students who had been identified through universal screening as at risk for behavioral concerns. Teacher participants (n = 71) rated each of these students using both the ISP‐SS and the Social Skills Improvement System—Rating Scales (SSiS‐RS), with the latter serving as the criterion measure in the validity and diagnostic accuracy analyses. Confirmatory factor analysis supported the structural validity of the ISP‐SS, indicating that ISP‐SS items broadly conformed to a single “Social Skills” factor. Follow‐up analyses suggested that ISP‐SS broad scale scores demonstrated adequate internal consistency reliability, with a hierarchical omega coefficient of 0.86. Correlational analyses supported the concurrent validity of ISP‐SS items, finding each ISP‐SS item to be moderately or highly related to its corresponding SSiS‐RS subscale. Finally, analyses indicated that three of the seven ISP‐SS items demonstrated sufficient diagnostic accuracy; however, the findings suggest that additional revisions are needed if the ISP‐SS is to be appropriate for use in schools. Implications for practice and future research are discussed.
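For a single‐factor model with standardized loadings λ_i and uncorrelated residual variances θ_i = 1 - λ_i², coefficient omega is (Σλ_i)² / [(Σλ_i)² + Σθ_i]; with one general factor and no group factors, omega hierarchical reduces to this same quantity. A Python sketch with hypothetical loadings:

```python
def omega_single_factor(loadings):
    """Coefficient omega from standardized loadings of a one-factor model."""
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)  # residual variances
    return common / (common + unique)


# Hypothetical standardized loadings for seven items
print(round(omega_single_factor([0.72, 0.65, 0.80, 0.58, 0.70, 0.68, 0.75]), 3))
```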

17.
In previous research (Hu & Bentler, 1998, 1999), two conclusions were drawn: the standardized root mean square residual (SRMR) was the most sensitive to misspecified factor covariances, and a group of other fit indexes were the most sensitive to misspecified factor loadings. Based on these findings, a two‐index strategy (that is, SRMR coupled with another index) was proposed for model fit assessment to detect potential misspecification in both the structural and the measurement model parameters. Based on the reasoning and empirical work presented in this article, we conclude that SRMR is not necessarily the most sensitive to misspecified factor covariances (structural model misspecification), that the group of indexes (TLI, BL89, RNI, CFI, Gamma hat, Mc, or RMSEA) is not necessarily more sensitive to misspecified factor loadings (measurement model misspecification), and that the rationale for the two‐index presentation strategy appears to be of questionable validity.
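SRMR summarizes the standardized differences between the sample covariance matrix S and the model‐implied matrix Σ over the p(p + 1)/2 nonredundant elements. The sketch below uses the common definition that scales each residual by √(s_ii s_jj); some programs standardize slightly differently, so values may not match a given package exactly:

```python
import math


def srmr(sample_cov, implied_cov):
    """Standardized root mean square residual over the lower triangle (incl. diagonal)."""
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            scale = math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            total += ((sample_cov[i][j] - implied_cov[i][j]) / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


# Hypothetical 3-variable example in which only one covariance is misfit
S = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
Sigma = [[1.0, 0.3, 0.4], [0.3, 1.0, 0.3], [0.4, 0.3, 1.0]]
print(round(srmr(S, Sigma), 4))
```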

18.
A Spanish translation of the Sixteen Personality Factors test was administered to 524 freshmen at Venezuela’s Central University. The responses were scored using the original scoring key provided by the publisher of the test. (1) The test results were submitted to an item analysis program (FORTAP). The item analysis showed that only four factors (C, H, F, Q4) had acceptable reliability (internal consistency = +.50). (2) The acceptable factors (C, H, F, Q4), and the items within these factors that could be improved, were analyzed from two points of view: verbal content and statistical indices (biserial correlations, X50, and Beta). (3) A Reciprocal Averages Program was used to explore the possibility of improving factor reliability by changing the weights assigned to the response alternatives.

These results indicate that a grammatically correct translation of the Sixteen Personality Factors test from English to Spanish was not sufficient to obtain acceptable reliability indexes. Item analyses were useful in detecting faulty items.
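The abstract does not say which internal‐consistency coefficient FORTAP reports; as an assumption, coefficient alpha (equal to KR‐20 for 0/1 items) is sketched below as one standard choice:

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores[i][j] is examinee i's score on item j."""
    k = len(item_scores[0])

    def variance(values):
        # Population variance; the n versus n - 1 choice cancels in the ratio below.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses of four examinees to three items
data = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
print(cronbach_alpha(data))
```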

19.
Using a sample of 348 middle school students, we gathered evidence regarding the internal consistency of scores, the internal factor structure, and the convergent validity of inferences from a self‐report questionnaire called the Self‐Regulation Strategy Inventory–Self Report. Confirmatory factor analysis revealed that the fit indexes for a hierarchical model (composite, three factors) and a single‐level, three‐factor model were highly similar but mixed. Respecifying the hierarchical model based on conceptual overlap among items led to a substantial improvement in overall model fit, as indicated by the root mean square error of approximation, chi‐square/df, and the comparative fit index. Correlational analyses also provided strong convergent validity evidence, as the three subscales exhibited statistically significant relations with four motivation beliefs (i.e., self‐efficacy, perceived instrumentality, task interest, perceived responsibility) and two distinct markers of regulation‐related behaviors (i.e., teacher ratings, office discipline referrals).
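The fit indexes named here have standard closed forms: RMSEA = √[max(χ² - df, 0) / (df(N - 1))] and CFI = 1 - max(χ²_M - df_M, 0) / max(χ²_B - df_B, χ²_M - df_M, 0), where B is the baseline (independence) model. Some programs use N instead of N - 1 in RMSEA. A Python sketch with hypothetical chi‐square values:

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1 - d_model / d_base


# Hypothetical chi-square results for a model and its baseline
chi2_m, df_m, chi2_b, df_b, n = 150.0, 50, 1050.0, 66, 300
print("chi2/df =", chi2_m / df_m)
print("RMSEA   =", round(rmsea(chi2_m, df_m, n), 3))
print("CFI     =", round(cfi(chi2_m, df_m, chi2_b, df_b), 3))
```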

20.
This study examined the performance of four correlation‐based fit indexes (marginal and conditional pseudo R²s; average and conditional concordance correlations) in detecting misspecification of the mean structures in growth curve models. Their performance was also compared with that of four traditional SEM fit indexes. We found that the marginal pseudo R² and the average concordance correlation were able to detect misspecification in the marginal mean structure (average change trajectory). The conditional pseudo R² and the conditional concordance correlation could detect misspecification when it occurred in the conditional mean structure (individual change trajectories) or in both mean structures. Compared with the SEM fit indexes, the correlation‐based fit indexes were more robust to sample size but less robust to data properties such as the magnitude of the population mean and measurement error. Theoretical and practical implications of the results and directions for future research are discussed.
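Lin's concordance correlation, on which the concordance‐based indexes build, is ρ_c = 2s_xy / (s_x² + s_y² + (x̄ - ȳ)²). Unlike Pearson's r, it penalizes a shift in means, which is why it can flag a misspecified mean structure. How the article aggregates it over persons and occasions is not reproduced in this minimal sketch:

```python
def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient (1/n moment estimators)."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in observed) / n
    syy = sum((y - my) ** 2 for y in predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical observed means vs. model-implied means shifted up by one unit
observed = [1.0, 2.0, 3.0]
implied = [2.0, 3.0, 4.0]
print(concordance_correlation(observed, implied))  # below 1 even though Pearson r = 1
```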


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417