Similar literature
 20 similar articles found (search time: 156 ms)
1.
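Computerized adaptive testing (CAT) is widely applied in both theory and practice. Many current CAT studies fall into two research paradigms: CAT studies based on actual test responses and CAT studies based on simulated response data. The steps of a CAT simulation study include model selection, item bank simulation, choice of starting point, item selection strategy, and test termination strategy. The main trends in CAT simulation research are: item selection and termination strategies remain the focus of CAT research; simulation designs increasingly match real testing conditions; study designs are multi-factorial; and simulation results are evaluated comprehensively from multiple perspectives.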
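The simulation steps named in this abstract (model, item bank, starting point, item selection, termination) can be sketched as follows. This is a minimal illustration and not the procedure of the cited study. It assumes a Rasch model, a normally distributed item bank, a starting estimate of 0, closest-difficulty selection, EAP scoring on a fixed grid with a standard normal prior, and a stop rule that combines a maximum length with a posterior SD target. All names (`simulate_cat`, `se_target`, `GRID`) are hypothetical.

```python
import math
import random

# Quadrature grid for the ability scale (assumed range: -4 to 4).
GRID = [-4.0 + 0.1 * k for k in range(81)]

def p_rasch(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap(responses):
    """Posterior mean and SD of ability under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for b, u in responses:
            p = p_rasch(t, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, max_items, se_target, rng):
    """Simulate one examinee; returns (estimate, posterior SD, responses)."""
    theta_hat, se = 0.0, float("inf")  # starting point (assumed)
    remaining = list(bank)
    responses = []
    # Termination: stop at max_items or once the posterior SD reaches se_target.
    while remaining and len(responses) < max_items and se > se_target:
        # Item selection: unused item whose difficulty is closest to theta_hat.
        b = min(remaining, key=lambda d: abs(d - theta_hat))
        remaining.remove(b)
        # Simulated response drawn from the model at the true ability.
        u = 1 if rng.random() < p_rasch(true_theta, b) else 0
        responses.append((b, u))
        theta_hat, se = eap(responses)
    return theta_hat, se, responses

rng = random.Random(1)
bank = [rng.gauss(0.0, 1.0) for _ in range(300)]  # simulated item bank
theta_hat, se, responses = simulate_cat(0.5, bank, 40, 0.4, rng)
print(len(responses), round(theta_hat, 2), round(se, 2))
```

A multi-factor design of the kind the abstract mentions would repeat `simulate_cat` over many simulated examinees and over several levels of bank size, selection rule, and stop rule.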

2.
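Research on item selection strategies in computerized adaptive testing abroad   Total citations: 2, self-citations: 0, citations by others: 2
1. Limitations of traditional item selection strategies. The item selection strategy is a very important component of computerized adaptive testing (CAT) because it directly affects test efficiency. Two strategies are in common use: the maximum information function strategy and the weighted deviation model (WDM).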
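As a hedged illustration of the maximum information strategy named above, the sketch below picks the unused item with the largest Fisher information at the current ability estimate. It assumes a two-parameter logistic (2PL) model; the item parameters and the function names are hypothetical and are not taken from the article. The weighted deviation model is not shown.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_max_information(theta_hat, bank, administered):
    """Index of the unused item with the largest information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))

# Hypothetical (a, b) pairs for a four-item bank.
bank = [(0.8, -1.0), (1.5, 0.0), (2.0, 0.1), (1.2, 1.5)]
first = select_max_information(0.0, bank, administered=set())
second = select_max_information(0.0, bank, administered={first})
print(first, second)
```

In this toy bank the two most discriminating items are chosen first. The rule favours highly discriminating items near the current estimate, which is one reason exposure control is often discussed alongside it.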

3.
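In computerized adaptive testing (CAT), the a-stratified method (a-STR) is a distinctive and widely used item selection strategy that effectively controls item exposure rates and thereby improves test security. The dynamic a-stratified method (DAS) was proposed to address several shortcomings of a-STR, such as a fixed number of strata that must be set manually before the test and the absence of an expiration time. DAS itself, however, does not perform well in test efficiency or exposure control. For dichotomously (0-1) scored CAT, this study therefore introduces an exposure control factor (ecf) and a maximum information stratification strategy (MIS) to form a composite item selection strategy intended to improve DAS. Computer simulation results show that the new strategy outperforms DAS in both test efficiency and exposure control, meeting the aim of the study.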
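For orientation, the basic a-stratified idea can be sketched as below: items are sorted by discrimination and split into strata, low-discrimination strata are used early in the test, and within the active stratum the item whose difficulty is closest to the current ability estimate is chosen. This sketch covers only plain a-STR with a fixed number of strata. It does not implement DAS, the exposure control factor, or MIS from the abstract, and all names and parameter values are hypothetical.

```python
def build_strata(bank, n_strata):
    """Split item indices into n_strata groups ordered by increasing a."""
    order = sorted(range(len(bank)), key=lambda i: bank[i][0])
    n = len(order)
    return [order[k * n // n_strata:(k + 1) * n // n_strata]
            for k in range(n_strata)]

def select_a_stratified(theta_hat, bank, strata, stage, administered):
    """Within the stratum for this stage, pick the unused item whose
    difficulty b is closest to the current ability estimate."""
    candidates = [i for i in strata[stage] if i not in administered]
    return min(candidates, key=lambda i: abs(bank[i][1] - theta_hat))

# Hypothetical (a, b) pairs for a six-item bank.
bank = [(1.8, 0.0), (0.6, 0.4), (1.1, -0.2), (0.7, -1.0), (1.2, 0.9), (2.0, 0.5)]
strata = build_strata(bank, 3)
early = select_a_stratified(0.0, bank, strata, stage=0, administered=set())
late = select_a_stratified(0.0, bank, strata, stage=2, administered={early})
print(strata, early, late)
```

Because early items come from the low-discrimination stratum, highly discriminating items are held back until the ability estimate is more stable, which spreads usage more evenly across the bank.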

4.
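A simulation study of the robustness of the Rasch model in computerized adaptive testing   Total citations: 1, self-citations: 0, citations by others: 1
Using simulated data, this study estimated ability in a computerized adaptive test (Computer Adaptive Test, CAT) under both the Rasch and Birnbaum models, and examined the robustness of the Rasch model in CAT by comparing the two on root mean square error (RMSE), average deviation (AD), and correlation with ability. The results show that the Rasch model still estimates examinees' ability levels fairly accurately when item discriminations are unequal, indicating strong robustness.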
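The evaluation criteria named in this abstract can be computed as in the sketch below. The abstract does not define AD precisely; the code assumes it is the mean signed difference between estimated and true ability, which is an assumption. The numbers are placeholder values for illustration, not data from the study.

```python
import math

def rmse(estimates, truths):
    """Root mean square error of the ability estimates."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def average_deviation(estimates, truths):
    """Mean signed difference (assumed definition of AD)."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Placeholder true abilities and estimates (not results from the study).
true_theta = [-1.0, 0.0, 1.0, 2.0]
est_theta = [-0.8, 0.1, 0.7, 2.2]
print(rmse(est_theta, true_theta),
      average_deviation(est_theta, true_theta),
      pearson(est_theta, true_theta))
```

In a robustness study of this kind, the same three statistics would be computed once for the estimates from each model and then compared.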

5.
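This study applied the Bayesian IRT Guessing family of models of Caojing et al. to analyze the guessing behavior of second-year junior high school students on a Chinese vocabulary test, used the DIC3 index to evaluate model fit, and compared the parameter estimates with those of the two-parameter logistic model. The findings were: (1) the guessing models fit better than the two-parameter logistic model; (2) the threshold guessing model (IRT-TG) fit the second-year junior high school test data best, with about 3.5% of students showing TG-type guessing behavior; (3) the presence of guessers markedly affects their own ability estimates and the item difficulty estimates, but has little effect on the ability estimates of non-guessers or on the discrimination parameter estimates.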

6.
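This article introduces and compares several commonly used ability estimation methods in computerized adaptive testing (CAT), including MLE, WLE, MAP, and EAP, covering their development as well as their principles and properties. It briefly summarizes and evaluates how these methods evolved and how they behave, and closes with an outlook on future trends in ability estimation for CAT.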
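To make the contrast between two of these estimators concrete, the sketch below implements MLE by damped Newton-Raphson and EAP by grid quadrature with a standard normal prior. It assumes a 2PL model; WLE and MAP are not shown, and the item parameters and function names are hypothetical rather than taken from the article.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle(items, responses, iters=50):
    """Newton-Raphson MLE. Returns None for all-correct or all-incorrect
    patterns, for which the likelihood has no finite maximum."""
    if sum(responses) in (0, len(responses)):
        return None
    theta = 0.0
    for _ in range(iters):
        grad = sum(a * (u - p2pl(theta, a, b))
                   for (a, b), u in zip(items, responses))
        info = sum(a * a * p2pl(theta, a, b) * (1.0 - p2pl(theta, a, b))
                   for a, b in items)
        step = grad / info
        theta += max(-1.0, min(1.0, step))  # damp large steps
        if abs(step) < 1e-8:
            break
    return theta

def eap(items, responses):
    """Posterior mean of ability under a standard normal prior."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for (a, b), u in zip(items, responses):
            p = p2pl(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical (a, b) pairs
pattern = [1, 1, 0]
theta_mle = mle(items, pattern)
theta_eap = eap(items, pattern)
print(theta_mle, theta_eap, mle(items, [1, 1, 1]), eap(items, [1, 1, 1]))
```

The example shows two familiar properties: the EAP estimate is pulled toward the prior mean relative to the MLE, and EAP still returns a finite value for an all-correct pattern where the MLE does not exist. This matters early in a CAT, when few responses are available.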

7.
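Computerized adaptive testing (CAT) is a newer type of test in which an item bank is built using item response theory, the computer automatically selects items according to the examinee's ability level, and an estimate of the examinee's ability is produced at the end. Its aim is to evaluate ability from the difficulty of the items the examinee answers correctly.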

8.
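Testlet items are common in passage-based language tests. Because they may violate the local independence assumption, applying traditional item response theory leads to a series of errors. After discussing three improved models (the polytomous model, the testlet model, and the bifactor model), this article uses the testlet model and the independent model to examine and analyze items from a Chinese proficiency test. The results show that overall dependence among the testlet items in the test is low; the testlet model suits the passage listening and passage reading items; the independent and testlet models give similar item difficulty estimates but clearly different discrimination estimates; and the two models agree closely on individual ability estimates but differ considerably in the standard errors of those estimates.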

9.
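Computerized adaptive testing (CAT) is a newer type of test built on item response theory, in which the computer automatically selects items according to the examinee's ability level and thereby estimates the examinee's ability. The item presented to the examinee is determined by how well the examinee performed on the previous item. The conditions for implementing CAT should cover the following five components.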

10.
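Computerized adaptive examination is the product of combining item response theory with computer technology. Based on item response theory, this paper discusses in some depth the design of the key modules of an adaptive examination system, including ability estimation, item selection strategy, and termination rules, and proposes a model framework for implementing the system on J2EE.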

11.
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch‐based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the administered items and target item locations determined from provisional ability estimates at the start of each item. Several new indices based on this framework are introduced and compared to previously suggested measures of adaptation using simulated and real test data. Results from the simulation indicate that some previously suggested indices are not as sensitive to changes in item pool size and the use of constraints as the new indices and may not work as well under different item selection rules. The simulation study and real data example also illustrate the utility of using the new indices to measure adaptation at both a group and individual level. Discussion is provided on how one may use several of the indices to measure adaptation of Rasch‐based CATs in practice.

12.
This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's information and directional information are described, and the test information for Maximum Likelihood (ML), Maximum A Posteriori (MAP), and Expected A Posteriori (EAP) trait score estimates is distinguished. Expected item/test information indexes at various levels are proposed and plotted to provide diagnostic information on items and tests. The expected test information indexes for EAP scores may be difficult to compute due to a typical test's vast number of item response patterns. The relationships of item/test information with discrimination parameters of statements, standard error, and reliability estimates of trait score estimates are discussed and demonstrated using real data. Practical suggestions for checking the various expected item/test information indexes and plots are provided.

13.
This study focused on the effects of administration mode (computer-adaptive test [CAT] versus self-adaptive test [SAT]), item-by-item answer feedback (present versus absent), and test anxiety on results obtained from computerized vocabulary tests. Examinees were assigned at random to four testing conditions (CAT with feedback, CAT without feedback, SAT with feedback, SAT without feedback). Examinees completed the Test Anxiety Inventory (Spielberger, 1980) before taking their assigned computerized tests. Results showed that the CATs were more reliable and took less time to complete than the SATs. Administration time for both the CATs and SATs was shorter when feedback was provided than when it was not, and this difference was most pronounced for examinees at medium to high levels of test anxiety. These results replicate prior findings regarding the precision and administrative efficiency of CATs and SATs but point to new possible benefits of including answer feedback on such tests.

14.
Previous simulation studies of computerized adaptive tests (CATs) have revealed that the validity and precision of proficiency estimates can be maintained when review opportunities are limited to items within successive blocks. Our purpose in this study was to evaluate the effectiveness of CATs with such restricted review options in a live testing setting. Vocabulary CATs were compared under four conditions: (a) no item review allowed, (b) review allowed only within successive 5-item blocks, (c) review allowed only within successive 10-item blocks, and (d) review allowed only after answering all 40 items. Results revealed no trustworthy differences among conditions in vocabulary proficiency estimates, measurement error, or testing time. Within each review condition, ability estimates and number correct scores increased slightly after review, more answers were changed from wrong to right than from right to wrong, most examinees who changed answers improved proficiency estimates by doing so, and nearly all examinees indicated that they had an adequate opportunity to review their previous answers. These results suggest that restricting review opportunities on CATs may provide a viable way to satisfy examinee desires, maintain validity and measurement precision, and keep testing time at acceptable levels.

15.
Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT.

16.
Computer adaptive testing, big data and algorithmic approaches to education   Total citations: 1, self-citations: 0, citations by others: 1
This article critically considers the promise of computer adaptive testing (CAT) and digital data to provide better and quicker data that will improve the quality, efficiency and effectiveness of schooling. In particular, it uses the case of the Australian NAPLAN test that will become an online, adaptive test from 2016. The article argues that CATs are specific examples of technological ensembles which are producing, and working through, new subjectivities. In particular, CATs leverage opportunities for big data and algorithmic approaches to education that are symptomatic of what Deleuze saw as the shift from disciplinary to control institutions and societies.

17.
Although extensive research exists on the use of curriculum‐based measures for progress monitoring, little is known about using computer adaptive tests (CATs) for progress‐monitoring purposes. The purpose of this study was to evaluate the impact of the frequency of data collection on individual and group growth estimates using a CAT. Data were available for 278 fourth‐ and fifth‐grade students. Growth estimates were obtained when five, three, and two data collections were available across 18 weeks. Data were analyzed by grade to evaluate any observed differences in growth. Further, root mean square error values were obtained to evaluate differences in individual student growth estimates across data collection schedules. Group‐level estimates of growth did not differ across data collection schedules; however, growth estimates for individual students varied across the different schedules of data collection. Implications for using CATs to monitor student progress at the individual or group level are discussed.

18.
The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process.

19.
Recent simulation studies indicate that there are occasions when examinees can use judgments of relative item difficulty to obtain positively biased proficiency estimates on computerized adaptive tests (CATs) that permit item review and answer change. Our purpose in the study reported here was to evaluate examinees' success in using these strategies while taking CATs in a live testing setting. We taught examinees two item difficulty judgment strategies designed to increase proficiency estimates. Examinees who were taught each strategy and examinees who were taught neither strategy were assigned at random to complete vocabulary CATs under conditions in which review was allowed after completing all items and when review was allowed only within successive blocks of items. We found that proficiency estimate changes following review were significantly higher in the regular review conditions than in the strategy conditions. Failure to obtain systematically higher scores in the strategy conditions was due in large part to errors examinees made in judging the relative difficulty of CAT items.

20.
The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (rmax). Therefore, item exposure control methods which implement a specification of rmax (e.g., Sympson & Hetter, 1985) provide the most direct control at both the item and test levels.
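The dependence of average overlap on the mean and variance of the exposure rates can be illustrated with a simplified calculation. The sketch below is not the article's derivation. It assumes a fixed test length L, a pool of n items whose exposure rates r_i sum to L, and two examinees whose tests are treated as independent draws, so that the expected share of common items is sum(r_i^2) / L. Rewriting the sum of squares in terms of the population variance of the rates gives (n / L) * var(r) + L / n. The exposure rates and function names are hypothetical.

```python
def overlap_from_rates(rates, test_length):
    """Expected share of common items for two independent examinees:
    sum of squared exposure rates divided by the test length."""
    return sum(r * r for r in rates) / test_length

def overlap_from_moments(rates, test_length):
    """Same quantity written with the population variance of the rates.
    Valid when the rates sum to test_length (fixed-length test)."""
    n = len(rates)
    mean = sum(rates) / n
    var = sum((r - mean) ** 2 for r in rates) / n
    return (n / test_length) * var + test_length / n

# Hypothetical exposure rates for a 10-item pool and a 4-item test.
skewed = [1.0, 0.8, 0.6, 0.4, 0.4, 0.2, 0.2, 0.2, 0.1, 0.1]
uniform = [0.4] * 10
print(overlap_from_rates(skewed, 4), overlap_from_moments(skewed, 4))
print(overlap_from_rates(uniform, 4))
```

With uniform exposure the variance term vanishes and the overlap falls to its floor of L / n. Capping the maximum exposure rate limits how large the variance term can become, which is consistent with the abstract's point about rmax.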
