Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern‐level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet been constructed. This study puts forward a simple approach to estimating the indices at both the attribute and the pattern level from a single test administration. It details how the upper and lower bounds for the attribute‐level accuracy can be derived from the error variance of the attribute mastery probability estimate. In addition, based on Cui's pattern‐level indices, an alternative approach to estimating the attribute‐level indices is also proposed. Comparative analysis of simulation results indicates that the new indices are well suited to evaluating test‐retest consistency and correct classification rate.

2.
Using a complex simulation study, we investigated parameter recovery, classification accuracy, and performance of two item‐fit statistics for correct and misspecified diagnostic classification models within a log‐linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3 vs. 5), and items (25 vs. 50) as well as different attribute correlations (.50 vs. .80) and marginal attribute difficulties (equal vs. different). We investigated misspecifications of interaction effect parameters under correct Q‐matrix specification and two types of Q‐matrix misspecification. While the misspecification of interaction effects had little impact on classification accuracy, invalid Q‐matrix specifications led to notably decreased classification accuracy. Two proposed item‐fit indexes were more sensitive to overspecification of Q‐matrix entries for items than to underspecification. Information‐based fit indexes AIC and BIC were sensitive to both over‐ and underspecification.

3.
We report a multidimensional test that examines middle grades teachers’ understanding of fraction arithmetic, especially multiplication and division. The test is based on four attributes identified through an analysis of the extensive mathematics education research literature on teachers’ and students’ reasoning in this content area. We administered the test to a national sample of 990 in‐service middle grades teachers and analyzed the item responses using the log‐linear cognitive diagnosis model. We report the diagnostic quality of the test at the item level, mastery classifications for teachers, and attribute relationships. Our results demonstrate that, when a test is grounded in research on cognition and is designed to be multidimensional from the outset, it is possible to use diagnostic classification models to detect distinct patterns of attribute mastery.

4.
In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing approach for CDMs. In CD-MST-PH, multiple testlets can be constructed based on separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or the on-the-fly approach. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of a simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to the conventional test without adaptive stages.

5.
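The BP (backpropagation) neural network is one of the most widely used artificial neural network models and performs well in classification and recognition, so researchers have applied it in cognitive diagnostic assessment to classify examinees. Through a simulation study, this paper examines how five factors (number of attributes, attribute hierarchy, test length, item quality, and test sample size) affect the classification accuracy of the BP neural network in cognitive diagnosis. The results show that: 1) the classification accuracy of BP-neural-network-based cognitive diagnosis does not depend on the test sample size; 2) item quality and test length have a significant positive effect on the diagnostic accuracy of the BP neural network; 3) the number of attributes has a negative effect on its classification accuracy; 4) item quality affects, to some extent, the classification accuracy of the BP diagnostic method across different attribute hierarchy structures.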

6.
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM‐based data with a linear attribute structure. The study utilizes a procedure to make the IRM and CDM frameworks comparable and investigates how estimation accuracy is affected by test diagnosticity and the match between the true and fitted models. The study shows that (a) comparable results can be obtained when highly diagnostic IRM data are retrofitted with a CDM, and vice versa; (b) retrofitting CDMs to IRM‐based data can, in some conditions, result in considerable examinee misclassification; and (c) model fit indices provide only a limited indication of the accuracy of item parameter estimation and attribute classification.

7.
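This paper investigates how different factors affect the correct classification rate of the deterministic input, noisy "or" gate (DINO) model. Simulation experiments show that: the DINO model is better suited to discrete attribute hierarchy structures; a reasonable increase in test length helps improve diagnostic accuracy; the DINO model is not sensitive to the attribute hierarchy structure; and increasing the number of cognitive attributes lowers the diagnostic accuracy of the DINO model, so in practice it is recommended to keep the number of cognitive attributes below six.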
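
For readers unfamiliar with the DINO model named in this abstract, the following is a minimal sketch of its standard item response function, not code from the paper. Under DINO, an examinee who has mastered at least one attribute required by an item answers correctly with probability 1 − slip, and otherwise with probability guess. The function name `dino_prob` and the parameter names `slip` and `guess` are illustrative assumptions.

```python
import numpy as np

def dino_prob(alpha, q, slip, guess):
    """Probability of a correct response under the DINO model (sketch).

    alpha : 0/1 vector of the examinee's attribute mastery (hypothetical input)
    q     : 0/1 vector marking which attributes the item requires (Q-matrix row)
    slip, guess : item slipping and guessing parameters
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    # "Or" gate: 1 if the examinee masters at least one required attribute.
    omega = int(np.any((alpha == 1) & (q == 1)))
    return (1.0 - slip) if omega else guess
```

For example, with `q = [1, 1, 0]`, an examinee with `alpha = [0, 1, 0]` passes the "or" gate, while one with `alpha = [0, 0, 1]` does not and can only succeed by guessing.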

8.
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA and CC nonparametrically by replacing the role of the parametric IRT model in Lee's classification indices with a modified version of Ramsay's kernel‐smoothed item response functions. The performance of the nonparametric CA and CC indices is tested in simulation studies in various conditions with different generating IRT models, test lengths, and ability distributions. The nonparametric approach to CA often outperforms Lee's method and Livingston and Lewis's method, showing robustness to nonnormality in the simulated ability. The nonparametric CC index performs similarly to Lee's method and outperforms Livingston and Lewis's method when the ability distributions are nonnormal.

9.
This article introduces procedures for the computation of, and asymptotic statistical inference for, classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by cognitive diagnostic assessments. For tests with known or previously calibrated item parameters, the sampling distributions of the two new indices are shown to be asymptotically normal. To illustrate the computation of the new indices, we apply them to real diagnostic data from a fraction subtraction test (Tatsuoka). We also use simulated data to evaluate their performance and distributional properties.

10.
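Reliability and validity are key indicators of the quality of a measurement instrument, and research on reliability and validity in educational cognitive diagnostic tests has attracted researchers' attention in recent years. Reliability coefficients for diagnostic tests derive mainly from the α‐coefficient‐based attribute reliability coefficient, the empirical attribute reliability coefficient, the tetrachoric correlation coefficient, simulated test‐retest consistency, and classification consistency indices; validity coefficients mainly include the simulated correct classification rate, classification accuracy, and theoretical construct validity. Research on the reliability and validity of educational cognitive diagnostic tests is relatively new; it still has certain shortcomings, lacks comprehensive comparative studies, and, above all, lacks a systematic evaluation framework.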

11.
The development of cognitive diagnostic‐computerized adaptive testing (CD‐CAT) has provided a new perspective for gaining information about examinees' mastery of a set of cognitive attributes. This study proposes a new item selection method within the framework of dual‐objective CD‐CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen‐Shannon (JS) divergence, a symmetrized version of the Kullback‐Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage.
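
The JS divergence mentioned in this abstract can be illustrated with a short sketch. It shows only the textbook two-distribution, equal-weight form for discrete distributions; the item selection index built on it is not given in the abstract and is not reproduced here. The function names are illustrative.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture is positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike the KL divergence, this quantity is symmetric in its arguments and bounded: with natural logarithms it is 0 for identical distributions and log 2 for distributions with disjoint support.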

12.
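This paper proposes an improved KNN algorithm based on multi‐attribute classification that can effectively improve on the classification accuracy of both the traditional Euclidean KNN algorithm and the information‐entropy‐based improved KNN algorithm. First, attributes are categorized according to the ratio of the number of distinct values of a single attribute to the number of samples that attribute contains, yielding discrete attributes, which are handled by the information‐entropy‐based KNN algorithm, and continuous attributes, which are handled by the traditional Euclidean KNN similarity; the two kinds of attributes are then processed separately. Second, the results of the two kinds of processing are summed in proportion to give the distance between samples. Finally, the k samples closest to the sample under test are selected to determine the class of its decision attribute.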
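
The three steps described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the split threshold, the entropy-based measure, or the mixing proportion, so `ratio_threshold` and `weight` are hypothetical parameters and a plain mismatch fraction stands in for the entropy-based treatment of discrete attributes.

```python
import numpy as np
from collections import Counter

def split_attributes(X, ratio_threshold=0.1):
    """Mark a column as discrete when (distinct values / samples) is small.

    ratio_threshold is a hypothetical cutoff; the abstract gives no value.
    """
    n = X.shape[0]
    ratios = np.array([len(np.unique(X[:, j])) / n for j in range(X.shape[1])])
    return ratios <= ratio_threshold  # True = discrete, False = continuous

def mixed_distance(a, b, discrete, weight=0.5):
    """Proportional sum of a discrete-part and a continuous-part distance."""
    # Assumption: mismatch fraction replaces the entropy-based measure.
    d_disc = np.mean(a[discrete] != b[discrete]) if discrete.any() else 0.0
    # Continuous part: ordinary Euclidean distance.
    cont = ~discrete
    d_cont = np.sqrt(np.sum((a[cont] - b[cont]) ** 2)) if cont.any() else 0.0
    return weight * d_disc + (1.0 - weight) * d_cont

def knn_predict(X_train, y_train, x, k=3, ratio_threshold=0.1, weight=0.5):
    """Majority vote among the k training samples closest to x."""
    discrete = split_attributes(X_train, ratio_threshold)
    dists = np.array([mixed_distance(row, x, discrete, weight) for row in X_train])
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```

In practice the continuous columns would usually be rescaled before the two partial distances are combined, since otherwise the Euclidean part can dominate the sum.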

13.
Cognitive diagnosis models (CDMs) have been developed to evaluate the mastery status of individuals with respect to a set of defined attributes or skills that are measured through testing. When individuals are repeatedly administered a cognitive diagnosis test, a new class of multilevel CDMs is required to assess the changes in their attributes and simultaneously estimate the model parameters from the different measurements. In this study, the generalized deterministic input, noisy “and” gate (G‐DINA) model, the most general CDM, was extended to a multilevel higher order CDM by embedding a multilevel structure into higher order latent traits. A series of simulations based on diverse factors was conducted to assess the quality of the parameter estimation. The results demonstrate that the model parameters can be recovered fairly well and attribute mastery can be precisely estimated if the sample size is large and the test is sufficiently long. The range of the location parameters had opposing effects on the recovery of the item and person parameters. Ignoring the multilevel structure in the data by fitting a single‐level G‐DINA model decreased the attribute classification accuracy and the precision of latent trait estimation. The number of measurement occasions had a substantial impact on latent trait estimation. Satisfactory model and person parameter recoveries could be achieved even when assumptions of the measurement invariance of the model parameters over time were violated. A longitudinal basic ability assessment is outlined to demonstrate the application of the new models.

14.
The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees' cognitive strengths and weaknesses. Hence, the AHM can be used for cognitive diagnostic assessment. The purpose of this study is to introduce and evaluate a new concept for assessing attribute reliability using the ratio of true score variance to observed score variance on items that probe specific cognitive attributes. This reliability procedure is evaluated and illustrated using both simulated data and student response data from a sample of algebra items taken from the March 2005 administration of the SAT. The reliability of diagnostic scores and the implications for practice are also discussed.
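
The variance ratio described in this abstract can be written out in classical test theory notation. The symbols below are assumptions introduced for illustration (the abstract gives no notation): for attribute k, T_k is the true score and X_k the observed score on the items that probe that attribute, with X_k = T_k + E_k and error E_k uncorrelated with T_k.

```latex
\rho_k \;=\; \frac{\sigma^2_{T_k}}{\sigma^2_{X_k}}
       \;=\; \frac{\sigma^2_{T_k}}{\sigma^2_{T_k} + \sigma^2_{E_k}},
\qquad 0 \le \rho_k \le 1 .
```

A value near 1 would indicate that most of the observed score variance on the attribute's items reflects true differences among examinees rather than measurement error.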

15.
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low‐scoring students based on patterns of mistakes, and a reading efficiency index. Instead of two response types for each multiple‐choice item, correct and incorrect, each item has three response types: correct and two incorrect response types. Prior results on reliability, convergent and discriminant validity, and predictive utility of mistake subscores are briefly described. The three‐response‐type structure of items required rethinking the item response theory (IRT) modeling. IRT‐modeling results are presented, and implications for formative assessments and instructional use are discussed.

16.
Consider test data, a specified set of dichotomous skills measured by the test, and an IRT cognitive diagnosis model (ICDM). Statistical estimation of the data set using the ICDM can provide examinee estimates of mastery for these skills, referred to generally as attributes. With such detailed information about each examinee, future instruction can be tailored specifically for each student, often referred to as formative assessment. However, use of such cognitive diagnosis models to estimate skills in classrooms can require computationally intensive and complicated statistical estimation algorithms, which can diminish the breadth of applications of attribute level diagnosis. We explore the use of sum-scores (each attribute measured by a sum-score) combined with estimated model-based sum-score mastery/nonmastery cutoffs as an easy-to-use and intuitive method to estimate attribute mastery in classrooms and other settings where simple skills diagnostic approaches are desirable. Using a simulation study of skills diagnosis test settings and assuming a test consisting of a model-based calibrated set of items, correct classification rates (CCRs) are compared among four model-based approaches for estimating attribute mastery, namely using full model-based estimation and three different methods of computing sum-scores (simple sum-scores, complex sum-scores, and weighted complex sum-scores) combined with model-based mastery sum-score cutoffs. In summary, the results suggest that model-based sum-scores and mastery cutoffs can be used to estimate examinee attribute mastery with only moderate reductions in CCRs in comparison with the full model-based estimation approach. Certain topics are mentioned that are currently being investigated, especially applications in classroom and textbook settings.
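
The sum-score idea in this abstract can be sketched in a few lines. The sketch covers only a simple sum-score (the number of correct responses on the items that measure each attribute) compared against a cutoff; it assumes the cutoffs are supplied from elsewhere, since the model-based procedure for deriving them is not described in the abstract. The function and argument names are illustrative.

```python
import numpy as np

def sum_score_mastery(responses, q_matrix, cutoffs):
    """Classify attribute mastery from simple sum-scores (sketch).

    responses : examinees x items array of 0/1 item scores
    q_matrix  : items x attributes array, 1 if the item measures the attribute
    cutoffs   : one mastery cutoff per attribute (assumed given, not estimated here)
    """
    responses = np.asarray(responses)
    q_matrix = np.asarray(q_matrix)
    cutoffs = np.asarray(cutoffs)
    # Each entry counts correct answers on the items tied to that attribute.
    sum_scores = responses @ q_matrix
    # Mastery (1) when the sum-score reaches the attribute's cutoff.
    return (sum_scores >= cutoffs).astype(int)
```

With three items, where the first two measure attribute 1 and the third measures attribute 2, and cutoffs of 2 and 1, an examinee who answers only the first two items correctly is classified as a master of attribute 1 only.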

17.
In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons were also made between the IRT approach and two non-IRT approaches including the Livingston-Lewis and compound multinomial procedures. Results for various IRT model combinations were not substantially different. The estimated classification consistency and accuracy indices for the non-IRT procedures were almost always lower than those for the IRT procedures.

18.
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and illustrate how the method can be used to promote cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components from a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature in reading comprehension as well as research related to SAT Critical Reading. Then, the cognitive models were validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model‐data fit between the expected and observed response patterns produced from two random samples of 2,000 examinees who wrote the items. The model that provided the best data‐model fit was then used to calculate attribute probabilities for 15 examinees to illustrate our diagnostic testing procedure.

19.
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch‐based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the administered items and target item locations determined from provisional ability estimates at the start of each item. Several new indices based on this framework are introduced and compared to previously suggested measures of adaptation using simulated and real test data. Results from the simulation indicate that some previously suggested indices are not as sensitive to changes in item pool size and the use of constraints as the new indices and may not work as well under different item selection rules. The simulation study and real data example also illustrate the utility of using the new indices to measure adaptation at both a group and individual level. Discussion is provided on how one may use several of the indices to measure adaptation of Rasch‐based CATs in practice.

20.
The initial years as an early career academic (ECA) are challenging times as those new to the academy attempt to balance the three aspects of their role: teaching, research and service, while also coming to terms with both overt and hidden expectations. Formal mentoring arrangements for ECAs are threatened by competing demands on time. Additionally, they may not fully support the needs of ECAs as they can be more closely aligned to university needs than those of the ECA. The purpose of this paper is to open conversations about ECAs finding ways to develop agency. We use a reflective inquiry approach to identify and respond to the ideological and hegemonic influences on the experiences of ECAs. We also promote self-sustaining peer support and informal mentoring from more senior staff as complementary forms of professional learning.
