Similar Documents
20 similar documents retrieved.
1.
Computerized adaptive testing (CAT) simulation is one of the principal methods of CAT research. Evaluating CAT simulation results covers three main areas: analysis of examinee ability estimation and ability classification, analysis of item usage in the item bank, and analysis of the CAT response process. The analyses follow two broad modes, holistic analysis and fine-grained analysis. This study surveys the evaluation indices for CAT simulation results from the angles of parameter-recovery performance, measurement accuracy, item bank security, item bank utilization, classification efficiency and accuracy, and the degree to which multiple test-assembly constraints are satisfied. Which evaluation angles and indices to adopt should be determined by the goals of the CAT study and the requirements of the testing context.
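To make two of these index families concrete, here is a minimal sketch (not from the paper) of item-bank usage statistics and mastery-classification accuracy. All inputs are hypothetical: `theta`/`theta_hat` as simulated versus estimated abilities, `administered` as an examinee-by-item administration matrix, and the cut score.

```python
import numpy as np

def bank_usage(admin, max_rate=0.20):
    """Item exposure rates plus shares of unused and over-exposed items."""
    exposure = admin.mean(axis=0)   # proportion of examinees who saw each item
    return {
        "unused_pct": float(np.mean(exposure == 0)),
        "overexposed_pct": float(np.mean(exposure > max_rate)),
        "max_exposure": float(exposure.max()),
    }

def classification_accuracy(true_theta, est_theta, cut=0.0):
    """Agreement between true and estimated mastery decisions at a cut score."""
    return float(np.mean((true_theta >= cut) == (est_theta >= cut)))

rng = np.random.default_rng(0)
theta = rng.normal(size=1000)
theta_hat = theta + rng.normal(scale=0.3, size=1000)   # toy CAT estimates
administered = rng.random((1000, 300)) < 0.1           # toy 300-item bank
print(bank_usage(administered))
print(classification_accuracy(theta, theta_hat))
```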

2.
This study took students in grades 7, 8, 10, and 11 in Liaocheng as examinees and their completed essays as research samples. Using a random two-facet crossed design from generalizability theory, it investigated how many rating criteria and how many raters essay scoring should use. The results show that adding raters or rating criteria reduces measurement error and raises test reliability, but as the number of raters or criteria keeps growing, the marginal reduction in error and gain in reliability become very small. The study offers a scientific basis for setting an appropriate number of rating criteria and raters when scoring college entrance examination (gaokao) essays.
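As an illustration of the underlying machinery, the sketch below estimates variance components for a simpler one-facet persons x raters crossed design and shows the diminishing returns from adding raters; the paper's two-facet design adds a criteria facet in the same fashion. All names and data here are hypothetical, not the study's.

```python
import numpy as np

def g_study_pxr(X):
    """Variance components for a persons x raters design (one score per cell)."""
    n_p, n_r = X.shape
    gm = X.mean()
    pm, rm = X.mean(axis=1), X.mean(axis=0)
    ms_p = n_r * np.sum((pm - gm) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((rm - gm) ** 2) / (n_r - 1)
    resid = X - pm[:, None] - rm[None, :] + gm
    ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))
    var_pr_e = ms_pr                          # interaction + error
    var_p = max((ms_p - ms_pr) / n_r, 0.0)    # true person variance
    var_r = max((ms_r - ms_pr) / n_p, 0.0)    # rater severity variance
    return var_p, var_r, var_pr_e

rng = np.random.default_rng(1)
scores = rng.normal(70, 8, size=(200, 1)) + rng.normal(0, 4, size=(200, 6))
var_p, var_r, var_pr_e = g_study_pxr(scores)
for n in (1, 2, 4, 8, 16):                    # D study: vary the rater count
    g = var_p / (var_p + var_pr_e / n)        # relative G coefficient
    print(n, round(g, 3))                     # gains flatten as n grows
```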

3.
In English teaching, reading instruction is of central importance. Through effective English reading, students broaden their knowledge and enlarge their vocabulary; they can then write with accurate language and rich content, and communicate without being let down by a limited vocabulary. The importance of reading ability is also reflected in the senior high school entrance examination (zhongkao): in recent years the weight of reading comprehension items has risen year by year, the number of passages has grown from two or three to four, and the number of items keeps increasing.

4.
Because of test-security concerns, flawed test assembly, and similar problems, the forms of some tests cannot share, or simply were not built with, anchor items. Comparing the scores of examinees who took different forms then requires test equating. Unlike anchor-item designs, which equate forms through the items they share, tests without anchor items must be equated with special methods. Three approaches are currently in use. The first works through specific items in the forms, in effect constructing common "anchor items": examples are the randomly equivalent tests method and methods that use prior information about the items. The second constructs equivalent groups of examinees, i.e., the randomly equivalent groups method. The third equates through the cognitive attributes the items measure, usually operating within a cognitive diagnostic model, the rule space model.
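As a concrete instance of the randomly-equivalent-groups idea, here is a minimal linear-equating sketch: if the two groups can be treated as random samples from one population, form X scores are placed on the form Y scale by matching means and standard deviations. The function name and inputs are hypothetical.

```python
import numpy as np

def linear_equate(x_scores, y_scores, x):
    """Place a form-X raw score x on the form-Y scale (random groups design)."""
    mx, sx = np.mean(x_scores), np.std(x_scores, ddof=1)
    my, sy = np.mean(y_scores), np.std(y_scores, ddof=1)
    return my + (sy / sx) * (x - mx)

rng = np.random.default_rng(2)
form_x = rng.normal(48, 10, 5000)   # toy raw scores, group taking form X
form_y = rng.normal(52, 9, 5000)    # toy raw scores, group taking form Y
print(linear_equate(form_x, form_y, 60.0))   # form-Y equivalent of an X score of 60
```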

5.
Problem: Equal numbers of moles of two oxides are dissolved in 100 mL of sulfuric acid, after which 1.00 mol/L NaOH solution is added dropwise. When the added volume of NaOH reaches V1 = 50 mL, a precipitate begins to appear, and its amount grows as more NaOH is added. When the volume reaches V2 = 650 mL, the amount of precipitate reaches its maximum; on further addition of NaOH the amount of precipitate gradually decreases. When V3 ≥ 750 mL, the amount of precipitate no longer changes.
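The excerpt ends before naming the oxides or stating the question, but the titration profile (a precipitate that partly redissolves in excess base and then stays constant) points to one amphoteric hydroxide such as Al(OH)3 alongside a non-amphoteric one such as Fe(OH)3. Under that assumption, the mole bookkeeping runs as sketched below; treat it as a plausible worked solution, not the source's official answer.

```latex
% Sketch of the mole balance, assuming the oxides are Fe2O3 and Al2O3.
\begin{align*}
\text{Excess acid } (0 \to V_1):\quad
  & n(\mathrm{NaOH}) = 0.050\ \mathrm{mol}
    \;\Rightarrow\; n(\mathrm{H_2SO_4})_{\mathrm{excess}} = 0.025\ \mathrm{mol} \\
\text{Precipitation } (V_1 \to V_2):\quad
  & 3\,n(\mathrm{Fe^{3+}}) + 3\,n(\mathrm{Al^{3+}}) = 0.600\ \mathrm{mol} \\
\text{Redissolution } (V_2 \to V_3):\quad
  & \mathrm{Al(OH)_3} + \mathrm{OH^-} \to \mathrm{AlO_2^-} + 2\,\mathrm{H_2O},\quad
    n(\mathrm{Al^{3+}}) = 0.100\ \mathrm{mol} \\
\Rightarrow\quad
  & n(\mathrm{Fe^{3+}}) = 0.100\ \mathrm{mol},\qquad
    n(\mathrm{Fe_2O_3}) = n(\mathrm{Al_2O_3}) = 0.050\ \mathrm{mol} \\
\Rightarrow\quad
  & c(\mathrm{H_2SO_4})
    = \frac{\tfrac{1}{2}(0.600)\ \mathrm{mol} + 0.025\ \mathrm{mol}}{0.100\ \mathrm{L}}
    = 3.25\ \mathrm{mol/L}
\end{align*}
```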

6.
A survey of recent zhongkao papers from around the country shows that item writers keep strengthening the assessment of communicative language in single-choice items. Item stems are designed to be contextualized and colloquial, shifting these items from purely grammatical choices to choices of meaning constrained by the language situation, which undoubtedly raises their difficulty. To answer such items well, students need not only the requisite language knowledge but also certain test-taking techniques. Drawing on typical items from recent examinations, this article classifies and briefly analyzes the techniques for answering this item type.

7.
In recent years, to improve teaching quality and examination reliability, many universities have boldly reformed traditional examination methods and built item banks with broad topical coverage, large numbers of varied items, and novel item types, achieving a breakthrough in their examination systems. Abroad, item banks are also called item banks or item libraries. In educational measurement the term properly means this: grounded in item banking theory for a given kind of educational measurement, pretested items are analyzed on multiple performance indices through a mathematical model, and items meeting quality standards are selected; when the items are numerous enough to represent the entire content of the subject, have been indexed, and are paired with a test-generation system, optimal test assembly becomes achievable.
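The "performance indices" step usually starts from classical item statistics. Below is a minimal sketch computing two of them, item difficulty (proportion correct) and discrimination (corrected item-total point-biserial), for a hypothetical 0/1 response matrix; the names and data are illustrative only.

```python
import numpy as np

def item_analysis(resp):
    """Difficulty and discrimination for each item of a 0/1 response matrix."""
    total = resp.sum(axis=1)
    out = []
    for j in range(resp.shape[1]):
        p = resp[:, j].mean()                        # difficulty: proportion correct
        rest = total - resp[:, j]                    # total score excluding item j
        r_pb = np.corrcoef(resp[:, j], rest)[0, 1]   # corrected point-biserial
        out.append((p, r_pb))
    return out

rng = np.random.default_rng(3)
responses = (rng.random((500, 10)) < 0.6).astype(int)   # toy responses
for j, (p, r) in enumerate(item_analysis(responses)):
    print(f"item {j}: p={p:.2f}, r_pb={r:.2f}")
```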

8.
Based on the two-parameter logistic (2PL) model of item response theory, this study examined how examinees' ability distributions and sample sizes affect equating under the common-item nonequivalent groups design. Equating used the item characteristic curve methods under separate calibration: the Stocking-Lord method and the Haebara method. The equating results were evaluated in three ways: the standard error of equated scores, the standard error of the equating coefficients, and the stability of the common-item parameters. The results show that the closer the examinees' ability distributions and the larger the sample size, the smaller the equating error, and that the Stocking-Lord method produces more stable equating results than the Haebara method.
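For orientation, here is a minimal sketch of the Stocking-Lord criterion: find the slope A and intercept B of the ability-scale transformation that make the common items' test characteristic curves agree. The parameter arrays, quadrature grid, and 1.7 scaling constant are assumptions of the sketch, not details taken from the study.

```python
import numpy as np
from scipy.optimize import minimize

def icc_2pl(theta, a, b):
    """2PL item characteristic curves on a grid of abilities."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta[:, None] - b)))

def stocking_lord(a_new, b_new, a_old, b_old):
    """A, B minimizing the squared gap between common-item TCCs."""
    theta = np.linspace(-4, 4, 41)                  # quadrature grid
    tcc_old = icc_2pl(theta, a_old, b_old).sum(axis=1)
    def loss(x):
        A, B = x
        tcc_new = icc_2pl(theta, a_new / A, A * b_new + B).sum(axis=1)
        return np.sum((tcc_old - tcc_new) ** 2)
    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x

a_o, b_o = np.array([1.0, 1.2, 0.8]), np.array([-0.5, 0.0, 0.7])
A_true, B_true = 1.1, 0.3                            # toy scale difference
# new-form parameters expressed on their own (shifted) scale:
print(stocking_lord(a_o * A_true, (b_o - B_true) / A_true, a_o, b_o))
```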

9.
A Simulation Study of the Robustness of the Rasch Model in Computerized Adaptive Testing   (Total citations: 1; self-citations: 0; citations by others: 1)
Using simulated data, this study estimated examinee ability in computerized adaptive testing (CAT) with two models, the Rasch model and the Birnbaum (two-parameter) model, and examined the robustness of the Rasch model in CAT by comparing the two models' root mean square error (RMSE), average deviation (AD), and the correlation between ability estimates. The results show that even when item discriminations are unequal, the Rasch model still estimates examinee ability fairly accurately, demonstrating strong robustness.
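The three comparison statistics are easy to state in code. Below is a minimal sketch, with AD read as mean absolute deviation (the abstract does not define it precisely); the inputs are hypothetical arrays of true and estimated abilities.

```python
import numpy as np

def comparison_stats(theta_true, theta_hat):
    """RMSE, average deviation (mean absolute deviation here), and correlation."""
    d = theta_hat - theta_true
    rmse = float(np.sqrt(np.mean(d ** 2)))
    ad = float(np.mean(np.abs(d)))
    r = float(np.corrcoef(theta_true, theta_hat)[0, 1])
    return rmse, ad, r
```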

10.
As zhongkao English item writing continues to innovate, multiple-choice items are written ever closer to students' real lives, with contextualized, true-to-life stems that assess language knowledge and language use in fairly authentic contexts. This avoids testing language knowledge in isolation and also increases the items' flexibility and difficulty. Using typical items, this article "decodes" common techniques in writing zhongkao multiple-choice items and offers test-taking strategies to help students succeed in the 2010 examination.

11.
Administering tests under time constraints may result in poorly estimated item parameters, particularly for items at the end of the test (Douglas, Kim, Habing, & Gao, 1998; Oshima, 1994). Bolt, Cohen, and Wollack (2002) developed an item response theory mixture model to identify a latent group of examinees for whom a test is overly speeded, and found that item parameter estimates for end-of-test items in the nonspeeded group were similar to estimates for those same items when administered earlier in the test. In this study, we used the Bolt et al. (2002) method to study the effect of removing speeded examinees on the stability of a score scale over an 11-year period. Results indicated that using only the nonspeeded examinees for equating and estimating item parameters provided a more unidimensional scale, smaller effects of item parameter drift (including fewer drifting items), and less scale drift (i.e., bias) and variability (i.e., root mean squared errors) when compared to the total group of examinees.

12.
When tests are administered under fixed time constraints, test performances can be affected by speededness. Among other consequences, speededness can result in inaccurate parameter estimates in item response theory (IRT) models, especially for items located near the end of tests (Oshima, 1994). This article presents an IRT strategy for reducing contamination in item difficulty estimates due to speededness. Ordinal constraints are applied to a mixture Rasch model (Rost, 1990) so as to distinguish two latent classes of examinees: (a) a "speeded" class, composed of examinees who had insufficient time to adequately answer end-of-test items, and (b) a "nonspeeded" class, composed of examinees who had sufficient time to answer all items. The parameter estimates obtained for end-of-test items in the nonspeeded class are shown to more accurately approximate their difficulties when the items are administered at earlier locations on a different form of the test. A mixture model can also be used to estimate the class memberships of individual examinees. In this way, it can be determined whether membership in the speeded class is associated with other student characteristics. Results are reported for gender and ethnicity.
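To illustrate the classification side of such a mixture model, here is a minimal sketch of the posterior probability that an examinee belongs to the speeded class, treating the class-specific Rasch difficulties, the examinee's ability, and the class proportion as already estimated. The full method estimates all of these jointly under ordinal constraints, which this sketch does not attempt; all names are hypothetical.

```python
import numpy as np

def rasch_loglik(resp, theta, b):
    """Rasch log-likelihood of a 0/1 response vector at ability theta."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return float(np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p)))

def p_speeded(resp, theta, b_speeded, b_nonspeeded, pi_speeded):
    """Posterior probability of membership in the speeded latent class."""
    num = pi_speeded * np.exp(rasch_loglik(resp, theta, b_speeded))
    den = num + (1 - pi_speeded) * np.exp(rasch_loglik(resp, theta, b_nonspeeded))
    return num / den
```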

13.
In the presence of test speededness, the parameters of item response theory models can be poorly estimated because of conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures—a two-parameter logistic (2PL) model, a one-dimensional mixture model, a two-step strategy (a combination of the one-dimensional mixture and the 2PL), a two-dimensional mixture model, and a hybrid model—by examining how sample size, percentage of speeded examinees, percentage of missing responses, and the way of scoring missing responses (incorrect vs. omitted) affect item parameter estimation in speeded tests. For nonspeeded items, all five procedures showed similar results in recovering item parameters. For speeded items, the one-dimensional mixture model, the two-step strategy, and the two-dimensional mixture model provided largely similar results and performed better than the 2PL model and the hybrid model in calibrating slope parameters. However, those three procedures performed similarly to the hybrid model in estimating intercept parameters. As expected, the 2PL model did not appear to be as accurate as the other models in recovering item parameters, especially when there were large numbers of examinees showing speededness and a high percentage of missing responses scored as incorrect. A real-data analysis further described the similarities and differences between the five procedures.

14.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both GRE and SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched ability White examinees for each of the four item types and for both the GRE and SAT tests! The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

15.
Practical use of the matrix sampling (i.e., item sampling) technique requires the assumption that an examinee's response to an item is independent of the context in which the item occurs. This assumption was tested experimentally by comparing the responses of examinees to a population of items with the responses of examinees to item samples. Matrix sampling mean and variance estimates for verbal, quantitative, and attitude tests were used as dependent variables to test for differences between the "context" and "out-of-context" groups. The estimates obtained from both treatment groups were also compared with actual population values. No significant differences were found between treatments on matrix sample parameter estimates for any of the three types of tests.
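The basic estimator behind matrix sampling is simple to sketch: with items assigned to examinees at random, the mean total-test score can be estimated from the overall proportion correct on whatever items each examinee saw. The sketch below covers the mean estimate only (the study also examined variance estimates, which need extra terms); all names and data are illustrative.

```python
import numpy as np

def matrix_sample_mean(resp, n_items_total):
    """Estimate the mean full-test score from partially administered 0/1 data.
    resp: examinee x item matrix with np.nan where an item was not given."""
    return n_items_total * np.nanmean(resp)

rng = np.random.default_rng(4)
full = (rng.random((1000, 60)) < 0.65).astype(float)   # toy full-data matrix
sampled = full.copy()
sampled[rng.random(full.shape) < 0.8] = np.nan         # each examinee sees ~20% of items
print(full.sum(axis=1).mean(), matrix_sample_mean(sampled, 60))   # close estimates
```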

16.
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice effect of examinee-selected items. The results of a series of simulation studies showed: (1) the parameters of the new models were recovered well; (2) the parameter estimates were almost unbiased when the new models were fit to data simulated from standard item response models; (3) failing to consider the choice effect yielded shrunken parameter estimates for examinee-selected items; and (4) even when the missingness mechanism in examinee-selected items did not follow the item response functions specified in the new models, the new models still yielded a better fit than did standard item response models. An empirical example of a college entrance examination supported the use of the new models: in general, the higher the examinee's ability, the better his or her choice of items.

17.
Two item selection algorithms were compared in simulated linear and adaptive tests of cognitive ability. One algorithm selected items that maximally differentiated between examinees. The other used item response theory (IRT) to select items having maximum information for each examinee. Normally distributed populations of 1,000 cases were simulated, using test lengths of 4, 5, 6, and 7 items. Overall, adaptive tests based on maximum information provided the most information over the widest range of ability values and, in general, differentiated among examinees slightly better than the other tests. Although the maximum differentiation technique may be adequate in some circumstances, adaptive tests based on maximum information are clearly superior.
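As an illustration of the second algorithm, here is a minimal sketch of maximum-information item selection under a 2PL model: compute each remaining item's Fisher information at the current ability estimate and administer the most informative one. The parameterization (no 1.7 constant) and all names are assumptions of the sketch.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of each 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, used):
    """Index of the unused item with maximum information at theta_hat."""
    info = fisher_info_2pl(theta_hat, a, b)
    info[used] = -np.inf                      # mask items already administered
    return int(np.argmax(info))

a = np.array([0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 0.1])
used = np.zeros(4, dtype=bool)
print(select_next_item(0.2, a, b, used))      # picks the steep item located near 0.2
```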

18.
The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using standard item response theory models, which assume ignorable missing data, can yield biased parameter estimates so that examinees taking different sets of items to answer cannot be compared. To solve this fundamental problem, in this study the researchers utilized the specific objectivity of Rasch models by adopting the conditional maximum likelihood estimation (CMLE) and pairwise estimation (PE) methods to analyze ESI data, and conducted a series of simulations to demonstrate the advantages of the CMLE and PE methods over traditional estimation methods in recovering item parameters in ESI data. An empirical data set obtained from an experiment on the ESI design was analyzed to illustrate the implications and applications of the proposed approach to ESI data.
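For a flavor of the PE idea, the sketch below uses Choppin-style pairwise comparisons: for any two items answered by the same examinee, the difficulty difference can be estimated from the log-odds of the discordant response patterns, so only the items an examinee actually selected enter the estimate. This is a simplified illustration under assumed inputs, not the paper's exact algorithm for ESI data.

```python
import numpy as np

def pairwise_difficulties(resp):
    """Rasch difficulties from pairwise comparisons.
    resp: examinee x item matrix of 1/0 with np.nan for unselected items."""
    n_items = resp.shape[1]
    D = np.zeros((n_items, n_items))           # D[i, j] estimates b_j - b_i
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            both = ~np.isnan(resp[:, i]) & ~np.isnan(resp[:, j])
            n_ij = np.sum((resp[both, i] == 1) & (resp[both, j] == 0))
            n_ji = np.sum((resp[both, i] == 0) & (resp[both, j] == 1))
            D[i, j] = np.log((n_ij + 0.5) / (n_ji + 0.5))   # smoothed log-odds
    b = D.mean(axis=0)                         # averaging centers the difficulties
    return b - b.mean()
```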

19.
The graded response model can be used to describe test-taking behavior when item responses are classified into ordered categories. In this study, parameter recovery in the graded response model was investigated using the MULTILOG computer program under default conditions. Based on items having five response categories, 36 simulated data sets were generated that varied on true θ distribution, true item discrimination distribution, and calibration sample size. The findings suggest, first, the correlations between the true and estimated parameters were consistently greater than 0.85 with sample sizes of at least 500. Second, the root mean square error differences between true and estimated parameters were comparable with results from binary data parameter recovery studies. Of special note was the finding that the calibration sample size had little influence on the recovery of the true ability parameter but did influence item-parameter recovery. Therefore, it appeared that item-parameter estimation error, due to small calibration samples, did not result in poor person-parameter estimation. It was concluded that at least 500 examinees are needed to achieve an adequate calibration under the graded model.
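For reference, the graded response model calibrated here has a compact form: category probabilities are differences of adjacent cumulative 2PL curves. A minimal sketch for one five-category item follows; the parameter values are made up.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded-response-model item.
    b: ordered between-category thresholds (4 thresholds -> 5 categories)."""
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))   # P(X >= k), k = 1..4
    cum = np.concatenate(([1.0], cum, [0.0]))                  # add P(X >= 0), P(X >= 5)
    return cum[:-1] - cum[1:]                                  # P(X = k) by differencing

probs = grm_category_probs(0.5, a=1.2, b=[-1.5, -0.5, 0.4, 1.3])
print(probs, probs.sum())   # five probabilities summing to 1
```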
