Similar Documents
20 similar documents found.
1.
APPLICATION OF COMPUTERIZED ADAPTIVE TESTING TO EDUCATIONAL PROBLEMS
Three applications of computerized adaptive testing (CAT) to help solve problems encountered in educational settings are described and discussed. Each of these applications makes use of item response theory to select test questions from an item pool to estimate a student's achievement level and its precision. These estimates may then be used in conjunction with certain testing strategies to facilitate certain educational decisions. The three applications considered are (a) adaptive mastery testing for determining whether or not a student has mastered a particular content area, (b) adaptive grading for assigning grades to students, and (c) adaptive self-referenced testing for estimating change in a student's achievement level. Differences between currently used classroom procedures and these CAT procedures are discussed. For the adaptive mastery testing procedure, evidence from a series of studies comparing conventional and adaptive testing procedures is presented showing that the adaptive procedure results in more accurate mastery classifications than do conventional mastery tests, while using fewer test questions.

2.
Many computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small volume certification programs that wish to make the transition to computerized adaptive testing (CAT). The one-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the three-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test (SPRT). For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrated that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items.

3.
When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.

4.
The development of cognitive diagnostic-computerized adaptive testing (CD-CAT) has provided a new perspective for gaining information about examinees' mastery on a set of cognitive attributes. This study proposes a new item selection method within the framework of dual-objective CD-CAT that simultaneously addresses examinees' attribute mastery status and overall test performance. The new procedure is based on the Jensen-Shannon (JS) divergence, a symmetrized version of the Kullback-Leibler divergence. We show that the JS divergence resolves the noncomparability problem of the dual information index and has close relationships with Shannon entropy, mutual information, and Fisher information. The performance of the JS divergence is evaluated in simulation studies in comparison with the methods available in the literature. Results suggest that the JS divergence achieves parallel or more precise recovery of latent trait variables compared to the existing methods and maintains practical advantages in computation and item pool usage.
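As a point of reference for the index this abstract builds on, here is a minimal sketch (not the authors' implementation) of the Jensen-Shannon divergence between two discrete distributions, such as posterior distributions over attribute-mastery patterns; the distributions p and q below are made-up illustrative values.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized, bounded version of KL."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Illustrative posterior distributions over four attribute-mastery patterns.
p = [0.10, 0.20, 0.30, 0.40]
q = [0.40, 0.30, 0.20, 0.10]
print(js_divergence(p, q))  # symmetric: js_divergence(q, p) gives the same value
```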

5.
樊军 《考试研究》2012,(4):61-67
The sequential probability ratio test (SPRT) mode of computerized adaptive testing is a testing mode suited to small-scale settings such as classroom teaching, in which ordinary teachers use network technology to assess students' language learning. Its basic principle is to estimate, as testing proceeds, the probabilities of the examinee answering correctly and incorrectly, and to compare these against the two competing hypotheses of "mastery" and "non-mastery" to reach a decision. On the one hand, this compensates for the limited applicability of IRT-based testing modes; on the other, it helps teachers better assess students' language ability.
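A rough illustration of the decision rule described above (not the author's system, and with assumed parameter values): Wald's sequential probability ratio test applied to a running stream of scored responses, comparing a mastery hypothesis (correct-response probability p1) against a non-mastery hypothesis (p0).

```python
import math

def sprt_mastery(responses, p0=0.5, p1=0.9, alpha=0.05, beta=0.05):
    """Wald SPRT for a mastery decision.

    responses   : iterable of 0/1 item scores, in the order administered
    p0, p1      : correct-response probabilities under non-mastery / mastery
    alpha, beta : tolerated error rates, which define the decision boundaries
    Returns (decision, number of items used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above -> declare mastery
    lower = math.log(beta / (1 - alpha))   # cross below -> declare non-mastery
    log_lr = 0.0
    for i, x in enumerate(responses, start=1):
        log_lr += math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
        if log_lr >= upper:
            return "mastery", i
        if log_lr <= lower:
            return "non-mastery", i
    return "undecided", len(responses)

print(sprt_mastery([1, 1, 0, 1, 1, 1, 1, 1, 1]))  # -> ('mastery', 9) with these settings
```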

6.
This paper introduces computerized adaptive testing, which combines knowledge space theory from cognitive science, item response theory from psychological and educational measurement, and computer technology, and describes its role in improving the efficiency and precision of testing.

7.
Computerized testing has created new challenges for the production and administration of test forms. Many testing organizations engaged in or considering computerized testing may find themselves changing from well-established procedures for handcrafting a small number of paper-and-pencil test forms to procedures for mass producing many computerized test forms. This paper describes an integrated approach to test development and administration called computer-adaptive sequential testing, or CAST. CAST is a structured approach to test construction which incorporates adaptive testing methods with automated test assembly to allow test developers to maintain a greater degree of control over the production, quality assurance, and administration of different types of computerized tests. CAST retains much of the efficiency of traditional computer adaptive testing (CAT) and can be modified for computer mastery testing (CMT) applications. The CAST framework is described in detail and several applications are demonstrated using a medical licensure example.

8.
III. Estimating θ in CAT. (1) MLE (maximum likelihood estimation). Suppose an examinee with ability level θ responds to n items X_1, X_2, …, X_n. The estimate of θ is obtained by maximizing the likelihood function given in Equation (8); let θ̂_n denote the resulting estimate. Clearly θ̂_n is also the maximum likelihood estimate for Equation (9). Under suitable conditions, θ̂_n is known to be asymptotically normal with mean θ and variance approximately I_n^{-1}(θ̂_n). Most current CAT designs update the estimate of θ recursively after each new item is answered and select the next item by maximizing information.
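A numerical counterpart to the excerpt above, offered only as a sketch: Newton-Raphson maximization of the log-likelihood under an assumed 2PL model with made-up item parameters, with the variance of θ̂_n approximated by the reciprocal of the test information at the estimate.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function P(X = 1 | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, items, theta=0.0, n_iter=20):
    """Newton-Raphson MLE of theta for scored responses to calibrated items.

    responses : list of 0/1 scores
    items     : list of (a, b) discrimination/difficulty pairs
    Returns (theta_hat, variance), with variance ~ 1 / test information.
    """
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            grad += a * (x - p)          # derivative of the log-likelihood
            info += a * a * p * (1 - p)  # Fisher information contribution
        theta += grad / info
    return theta, 1.0 / info

# Made-up item parameters and a mixed response pattern (so the MLE is finite).
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.4), (1.0, 1.0)]
responses = [1, 1, 0, 0]
theta_hat, var = mle_theta(responses, items)
print(theta_hat, var)
```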

9.
A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen's Q3 statistic and Pearson's X2 statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).
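For orientation, a minimal sketch of Yen's Q3 statistic for a single item pair, computed from residuals (observed minus model-expected scores) that are assumed to have already been obtained from some fitted IRT model; the residual values are invented for illustration.

```python
import math

def q3_statistic(resid_j, resid_k):
    """Yen's Q3 for one item pair: Pearson correlation between the two
    items' residuals (observed score minus model-expected score),
    computed across examinees."""
    n = len(resid_j)
    mean_j = sum(resid_j) / n
    mean_k = sum(resid_k) / n
    cov = sum((a - mean_j) * (b - mean_k) for a, b in zip(resid_j, resid_k))
    var_j = sum((a - mean_j) ** 2 for a in resid_j)
    var_k = sum((b - mean_k) ** 2 for b in resid_k)
    return cov / math.sqrt(var_j * var_k)

# Illustrative residuals for two items, one value per examinee (made-up).
resid_item_j = [0.8, -0.4, 0.3, -0.9, 0.5, -0.2]
resid_item_k = [0.6, -0.5, 0.4, -0.7, 0.2, -0.1]
print(q3_statistic(resid_item_j, resid_item_k))  # values near 0 suggest local independence
```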

10.
The distinction between quantitative and qualitative differences in mastery is essential when monitoring student progress and is crucial for instructional interventions to deal with learning difficulties. Mixture item response theory (IRT) models can provide a convenient way to make the distinction between quantitative and qualitative differences in mastery. Using latent groups in these models, rather than manifest groupings such as gender or grade, is especially informative for giving a substantive interpretation to the qualitative differences. In the current study, mixture IRT modeling is applied to the mastery of two crucial rules in vowel duration spelling in Dutch by pupils in the four final grades of primary school. Results indicate that differences in mastery of the spelling rules are not strictly quantitative. Three latent groups of pupils can be distinguished that show qualitative differences in the mastery of one of the crucial spelling rules involved.

11.
With the development of multimedia computing and network technology, computerized adaptive testing (CAT), a technique that combines computer technology with item response theory (IRT), has attracted growing attention. This paper introduces the basic theory of IRT and, on that basis, studies an implementation model for a CAT system and the key techniques for building a CAT system with JSP.

12.
Noting the desirability of the current shift toward mastery testing and criterion-referenced test procedures, an evaluation model is presented which should be useful and practical for such purposes. This model is based on the assumptions that the learning of fundamental skills can be considered all or none, that each item response on a single skill test represents an unbiased sample of the examinee's true mastery status, that measurement error occurring on the test (as estimated from the average interitem correlation) can be of only one type (α or β) for each examinee, and that through practical and theoretical considerations of evaluation error costs and item error characteristics, an optimal mastery criterion can be calculated. Each of these assumptions is discussed and the resultant mastery criteria algorithm is described along with an example from the IPI math program.
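The abstract does not give the algorithm itself; as a hedged sketch of the kind of calculation it alludes to (not the IPI procedure), the snippet below chooses a cut score on a short test by minimizing expected misclassification cost under a simple binomial model, with all probabilities, base rates, and error costs assumed for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def optimal_cutoff(n_items, p_master, p_nonmaster, prior_master,
                   cost_alpha, cost_beta):
    """Cut score (declare mastery if X >= cut) minimizing the expected cost
    of false non-mastery (alpha) and false mastery (beta) decisions.
    All inputs are assumed values, not estimates from real data."""
    best = None
    for c in range(n_items + 1):
        alpha = binom_cdf(c - 1, n_items, p_master) if c > 0 else 0.0
        beta = 1.0 - (binom_cdf(c - 1, n_items, p_nonmaster) if c > 0 else 0.0)
        cost = prior_master * cost_alpha * alpha + (1 - prior_master) * cost_beta * beta
        if best is None or cost < best[1]:
            best = (c, cost)
    return best  # (cut score, expected cost)

print(optimal_cutoff(n_items=10, p_master=0.9, p_nonmaster=0.5,
                     prior_master=0.6, cost_alpha=1.0, cost_beta=2.0))
```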

13.
Modern instructional theory and research suggest that the content of instruction should be closely linked with testing. The content of an instructional program should not focus solely on memorization of facts but should also include higher level thinking. Three uses of tests within any instructional program are: (1) practice on objectives, (2) feedback about mastery of those objectives, and (3) summative evaluation. The context-dependent item set is proposed as a useful tool for measuring many higher level objectives. A generic method for developing context-dependent test item sets is proposed, and several examples are provided. The procedure is useful for developing a larger number of test items that can be used for any of the three uses of tests. The procedure also seems to apply to a wide variety of subject matter.

14.
This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for students with disabilities who have typically taken alternate assessments based on modified achievement standards (AA-MAS). A simulation study indicated that the abilities of AA-MAS students can be underestimated or overestimated by the mixed-item CAT, depending on students' location on the underlying ability scale. These findings held across grade levels and test lengths. The mixed-item CAT appeared to function well for non-AA-MAS students.

15.
According to item response theory (IRT), examinee ability estimation is independent of the particular set of test items administered from a calibrated pool. Although the most popular application of this feature of IRT is computerized adaptive (CA) testing, a recently proposed alternative is self-adapted (SA) testing, in which examinees choose the difficulty level of each of their test items. This study compared examinee performance under SA and CA tests, finding that examinees taking the SA test (a) obtained significantly higher ability scores and (b) reported significantly lower posttest state anxiety. The results of this study suggest that SA testing is a desirable format for computer-based testing.

16.
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range.

17.
Design of an IRT-Based Online Adaptive Testing System for College English Vocabulary
How to measure learners' vocabulary size scientifically and effectively, and how to measure their depth of vocabulary knowledge, are questions of great current interest to language researchers. Based on the correlation between vocabulary breadth and depth, this article proposes conducting a depth test on the basis of a breadth test, applies the testing methods and design ideas of item response theory to a practical testing system, and ultimately designs and implements an IRT-based online adaptive vocabulary testing system for College English Test Band 4 and Band 6.

18.
Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues with on-demand testing is that some of the methods used for maintaining the comparability of standards over time in conventional testing are no longer available and the development of new methods is required.

Purpose: This paper proposes an item response theory (IRT) framework for implementing on-demand testing and maintaining the comparability of standards over time for general qualifications, including GCSEs and GCE A levels, in the UK and discusses procedures for its practical implementation.

Sources of evidence: Sources of evidence include literature from the fields of on-demand testing, the design of computer-based assessment, the development of IRT, and the application of IRT in educational measurement.

Main argument: On-demand testing presents many advantages over conventional testing. In view of the nature of general qualifications, including the use of multiple components and multiple question types, the advances made in item response modelling over the past 30 years, and the availability of complex IRT analysis software systems, coupled with increasing IRT expertise in awarding organisations, IRT models could be used to implement on-demand testing in high stakes examinations in the UK. The proposed framework represents a coherent and complete approach to maintaining standards in on-demand testing. The procedures for implementing the framework discussed in the paper could be adapted by people to suit their own needs and circumstances.

Conclusions: The use of IRT to implement on-demand testing could prove to be one of the viable approaches to maintaining standards over time or between test sessions for UK general qualifications.

19.
It has been suggested that the primary purpose for criterion-referenced testing in objective-based instructional programs is to classify examinees into mastery states or categories on the objectives included in the test. We have proposed that the reliability of the criterion-referenced test scores be defined in terms of the consistency of the decision-making process across repeated administrations of the test. Specifically, reliability is defined as a measure of agreement over and above that which can be expected by chance between the decisions made about examinee mastery states in repeated test administrations for each objective measured by the criterion-referenced test.
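The "agreement over and above chance" index described here has the form of Cohen's kappa applied to mastery decisions from two administrations; below is a minimal sketch with made-up classification counts.

```python
def decision_consistency_kappa(table):
    """Kappa for a 2x2 table of mastery decisions on two administrations.

    table[i][j] = number of examinees classified i on administration 1 and
    j on administration 2, with index 0 = non-master and 1 = master.
    """
    n = sum(sum(row) for row in table)
    p_observed = (table[0][0] + table[1][1]) / n
    p_row = [sum(row) / n for row in table]
    p_col = [sum(table[i][j] for i in range(2)) / n for j in range(2)]
    p_chance = p_row[0] * p_col[0] + p_row[1] * p_col[1]
    return (p_observed - p_chance) / (1 - p_chance)

# Illustrative counts: 40 consistent non-masters, 45 consistent masters,
# 15 examinees classified inconsistently across the two administrations.
print(decision_consistency_kappa([[40, 8], [7, 45]]))
```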

20.
The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large-scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer-copying detection. Therefore, in computerized adaptive testing (CAT) these methods lose power because the actual test varies across examinees. This article addresses that problem by introducing a new approach that works in two stages: in Stage 1, test centers with an unusual distribution of a person-fit statistic are identified via Kullback–Leibler divergence; in Stage 2, examinees from identified test centers are analyzed further using the person-fit statistic, where the critical value is computed without data from the identified test centers. The approach is extremely flexible. One can employ any existing person-fit statistic. The approach can be applied to all major testing programs: paper-and-pencil testing (P&P), computer-based testing (CBT), multiple-stage testing (MST), and CAT. Also, the definition of test center is not limited by the geographic location (room, class, college) and can be extended to support various relations between examinees (from the same undergraduate college, from the same test-prep center, from the same group at a social network). The suggested approach was found to be effective in CAT for detecting groups of examinees with item pre-knowledge, meaning those with access (possibly unknown to us) to one or more subsets of items prior to the exam.
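A loose sketch of Stage 1 of the approach described above (binning, smoothing, and the threshold are assumptions for illustration, not the authors' specification): estimate the distribution of a person-fit statistic within each test center and overall, and flag centers whose within-center distribution diverges most, in Kullback-Leibler terms, from the pooled distribution.

```python
import math

def histogram(values, edges):
    """Smoothed relative-frequency histogram over the given bin edges."""
    counts = [1e-6] * (len(edges) - 1)   # small constant avoids log(0)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1] or (i == len(edges) - 2 and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def flag_centers(person_fit, threshold, edges=(-4, -2, -1, 0, 1, 2, 4)):
    """person_fit: dict mapping center id -> list of person-fit values.
    Returns centers whose KL divergence from the pooled distribution
    exceeds the (assumed, pre-calibrated) threshold."""
    pooled = histogram([v for vals in person_fit.values() for v in vals], edges)
    flagged = {}
    for center, vals in person_fit.items():
        p = histogram(vals, edges)
        kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, pooled) if pi > 0)
        if kl > threshold:
            flagged[center] = kl
    return flagged

# Toy data: centers A and C look typical, center B has unusually low fit values.
data = {
    "center_A": [-0.2, 0.1, 0.3, -0.5],
    "center_B": [-3.1, -2.6, -3.4, -2.9],
    "center_C": [0.4, -0.1, 0.6, -0.3],
}
print(flag_centers(data, threshold=0.5))  # only center_B exceeds the threshold here
```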

