Similar Documents
20 similar documents retrieved.
1.
Computerized adaptive testing (CAT) simulation is one of the principal methods of CAT research. Evaluating CAT simulation results covers three main areas: analysis of examinee ability estimation and ability classification, analysis of item usage in the item bank, and analysis of the CAT response process. The analyses follow two broad modes, holistic analysis and fine-grained analysis. This study surveys the evaluation indices for CAT simulation results from the angles of parameter-recovery performance, measurement accuracy, item bank security, item bank utilization, classification efficiency and accuracy, and the degree to which multiple test-assembly constraints are satisfied. Which evaluation angles and indices to adopt should be determined by the goals of the CAT study and the requirements of the testing context.
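To make two of these index families concrete, here is a minimal sketch (not from the paper) of item-bank usage statistics and mastery-classification accuracy. All inputs are hypothetical: `theta`/`theta_hat` as simulated versus estimated abilities, `administered` as an examinee-by-item administration matrix, and the cut score.

```python
import numpy as np

def bank_usage(admin, max_rate=0.20):
    """Item exposure rates plus shares of unused and over-exposed items."""
    exposure = admin.mean(axis=0)   # proportion of examinees who saw each item
    return {
        "unused_pct": float(np.mean(exposure == 0)),
        "overexposed_pct": float(np.mean(exposure > max_rate)),
        "max_exposure": float(exposure.max()),
    }

def classification_accuracy(true_theta, est_theta, cut=0.0):
    """Agreement between true and estimated mastery decisions at a cut score."""
    return float(np.mean((true_theta >= cut) == (est_theta >= cut)))

rng = np.random.default_rng(0)
theta = rng.normal(size=1000)
theta_hat = theta + rng.normal(scale=0.3, size=1000)   # toy CAT estimates
administered = rng.random((1000, 300)) < 0.1           # toy 300-item bank
print(bank_usage(administered))
print(classification_accuracy(theta, theta_hat))
```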

2.
This study took students in grades 7, 8, 10, and 11 in Liaocheng as examinees and their completed essays as research samples. Using a random two-facet crossed design from generalizability theory, it investigated how many rating criteria and how many raters essay scoring should use. The results show that adding raters or rating criteria reduces measurement error and raises test reliability, but as the number of raters or criteria keeps growing, the marginal reduction in error and gain in reliability become very small. The study offers a scientific basis for setting an appropriate number of rating criteria and raters when scoring college entrance examination (gaokao) essays.
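As an illustration of the underlying machinery, the sketch below estimates variance components for a simpler one-facet persons x raters crossed design and shows the diminishing returns from adding raters; the paper's two-facet design adds a criteria facet in the same fashion. All names and data here are hypothetical, not the study's.

```python
import numpy as np

def g_study_pxr(X):
    """Variance components for a persons x raters design (one score per cell)."""
    n_p, n_r = X.shape
    gm = X.mean()
    pm, rm = X.mean(axis=1), X.mean(axis=0)
    ms_p = n_r * np.sum((pm - gm) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((rm - gm) ** 2) / (n_r - 1)
    resid = X - pm[:, None] - rm[None, :] + gm
    ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))
    var_pr_e = ms_pr                          # interaction + error
    var_p = max((ms_p - ms_pr) / n_r, 0.0)    # true person variance
    var_r = max((ms_r - ms_pr) / n_p, 0.0)    # rater severity variance
    return var_p, var_r, var_pr_e

rng = np.random.default_rng(1)
scores = rng.normal(70, 8, size=(200, 1)) + rng.normal(0, 4, size=(200, 6))
var_p, var_r, var_pr_e = g_study_pxr(scores)
for n in (1, 2, 4, 8, 16):                    # D study: vary the rater count
    g = var_p / (var_p + var_pr_e / n)        # relative G coefficient
    print(n, round(g, 3))                     # gains flatten as n grows
```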

3.
In English teaching, reading instruction is of central importance. Through effective English reading, students broaden their knowledge and enlarge their vocabulary; they can then write with accurate language and rich content, and communicate without being let down by a limited vocabulary. The importance of reading ability is also reflected in the senior high school entrance examination (zhongkao): in recent years the weight of reading comprehension items has risen year by year, the number of passages has grown from two or three to four, and the number of items keeps increasing.

4.
Because of test-security concerns, flawed test assembly, and similar problems, the forms of some tests cannot share, or simply were not built with, anchor items. Comparing the scores of examinees who took different forms then requires test equating. Unlike anchor-item designs, which equate forms through the items they share, tests without anchor items must be equated with special methods. Three approaches are currently in use. The first works through specific items in the forms, in effect constructing common "anchor items": examples are the randomly equivalent tests method and methods that use prior information about the items. The second constructs equivalent groups of examinees, i.e., the randomly equivalent groups method. The third equates through the cognitive attributes the items measure, usually operating within a cognitive diagnostic model, the rule space model.
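As a concrete instance of the randomly-equivalent-groups idea, here is a minimal linear-equating sketch: if the two groups can be treated as random samples from one population, form X scores are placed on the form Y scale by matching means and standard deviations. The function name and inputs are hypothetical.

```python
import numpy as np

def linear_equate(x_scores, y_scores, x):
    """Place a form-X raw score x on the form-Y scale (random groups design)."""
    mx, sx = np.mean(x_scores), np.std(x_scores, ddof=1)
    my, sy = np.mean(y_scores), np.std(y_scores, ddof=1)
    return my + (sy / sx) * (x - mx)

rng = np.random.default_rng(2)
form_x = rng.normal(48, 10, 5000)   # toy raw scores, group taking form X
form_y = rng.normal(52, 9, 5000)    # toy raw scores, group taking form Y
print(linear_equate(form_x, form_y, 60.0))   # form-Y equivalent of an X score of 60
```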

5.
Problem: Equal numbers of moles of two oxides are dissolved in 100 mL of sulfuric acid, after which 1.00 mol/L NaOH solution is added dropwise. When the added volume of NaOH reaches V1 = 50 mL, a precipitate begins to appear, and its amount grows as more NaOH is added. When the volume reaches V2 = 650 mL, the amount of precipitate reaches its maximum; on further addition of NaOH the amount of precipitate gradually decreases. When V3 ≥ 750 mL, the amount of precipitate no longer changes.
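The excerpt ends before naming the oxides or stating the question, but the titration profile (a precipitate that partly redissolves in excess base and then stays constant) points to one amphoteric hydroxide such as Al(OH)3 alongside a non-amphoteric one such as Fe(OH)3. Under that assumption, the mole bookkeeping runs as sketched below; treat it as a plausible worked solution, not the source's official answer.

```latex
% Sketch of the mole balance, assuming the oxides are Fe2O3 and Al2O3.
\begin{align*}
\text{Excess acid } (0 \to V_1):\quad
  & n(\mathrm{NaOH}) = 0.050\ \mathrm{mol}
    \;\Rightarrow\; n(\mathrm{H_2SO_4})_{\mathrm{excess}} = 0.025\ \mathrm{mol} \\
\text{Precipitation } (V_1 \to V_2):\quad
  & 3\,n(\mathrm{Fe^{3+}}) + 3\,n(\mathrm{Al^{3+}}) = 0.600\ \mathrm{mol} \\
\text{Redissolution } (V_2 \to V_3):\quad
  & \mathrm{Al(OH)_3} + \mathrm{OH^-} \to \mathrm{AlO_2^-} + 2\,\mathrm{H_2O},\quad
    n(\mathrm{Al^{3+}}) = 0.100\ \mathrm{mol} \\
\Rightarrow\quad
  & n(\mathrm{Fe^{3+}}) = 0.100\ \mathrm{mol},\qquad
    n(\mathrm{Fe_2O_3}) = n(\mathrm{Al_2O_3}) = 0.050\ \mathrm{mol} \\
\Rightarrow\quad
  & c(\mathrm{H_2SO_4})
    = \frac{\tfrac{1}{2}(0.600)\ \mathrm{mol} + 0.025\ \mathrm{mol}}{0.100\ \mathrm{L}}
    = 3.25\ \mathrm{mol/L}
\end{align*}
```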

6.
A survey of recent zhongkao papers from around the country shows that item writers keep strengthening the assessment of communicative language in single-choice items. Item stems are designed to be contextualized and colloquial, shifting these items from purely grammatical choices to choices of meaning constrained by the language situation, which undoubtedly raises their difficulty. To answer such items well, students need not only the requisite language knowledge but also certain test-taking techniques. Drawing on typical items from recent examinations, this article classifies and briefly analyzes the techniques for answering this item type.

7.
In recent years, to improve teaching quality and examination reliability, many universities have boldly reformed traditional examination methods and built item banks with broad topical coverage, large numbers of varied items, and novel item types, achieving a breakthrough in their examination systems. Abroad, item banks are also called item banks or item libraries. In educational measurement the term properly means this: grounded in item banking theory for a given kind of educational measurement, pretested items are analyzed on multiple performance indices through a mathematical model, and items meeting quality standards are selected; when the items are numerous enough to represent the entire content of the subject, have been indexed, and are paired with a test-generation system, optimal test assembly becomes achievable.
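The "performance indices" step usually starts from classical item statistics. Below is a minimal sketch computing two of them, item difficulty (proportion correct) and discrimination (corrected item-total point-biserial), for a hypothetical 0/1 response matrix; the names and data are illustrative only.

```python
import numpy as np

def item_analysis(resp):
    """Difficulty and discrimination for each item of a 0/1 response matrix."""
    total = resp.sum(axis=1)
    out = []
    for j in range(resp.shape[1]):
        p = resp[:, j].mean()                        # difficulty: proportion correct
        rest = total - resp[:, j]                    # total score excluding item j
        r_pb = np.corrcoef(resp[:, j], rest)[0, 1]   # corrected point-biserial
        out.append((p, r_pb))
    return out

rng = np.random.default_rng(3)
responses = (rng.random((500, 10)) < 0.6).astype(int)   # toy responses
for j, (p, r) in enumerate(item_analysis(responses)):
    print(f"item {j}: p={p:.2f}, r_pb={r:.2f}")
```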

8.
Based on the two-parameter logistic (2PL) model of item response theory, this study examined how examinees' ability distributions and sample sizes affect equating under the common-item nonequivalent groups design. Equating used the item characteristic curve methods under separate calibration: the Stocking-Lord method and the Haebara method. The equating results were evaluated in three ways: the standard error of equated scores, the standard error of the equating coefficients, and the stability of the common-item parameters. The results show that the closer the examinees' ability distributions and the larger the sample size, the smaller the equating error, and that the Stocking-Lord method produces more stable equating results than the Haebara method.
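For orientation, here is a minimal sketch of the Stocking-Lord criterion: find the slope A and intercept B of the ability-scale transformation that make the common items' test characteristic curves agree. The parameter arrays, quadrature grid, and 1.7 scaling constant are assumptions of the sketch, not details taken from the study.

```python
import numpy as np
from scipy.optimize import minimize

def icc_2pl(theta, a, b):
    """2PL item characteristic curves on a grid of abilities."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta[:, None] - b)))

def stocking_lord(a_new, b_new, a_old, b_old):
    """A, B minimizing the squared gap between common-item TCCs."""
    theta = np.linspace(-4, 4, 41)                  # quadrature grid
    tcc_old = icc_2pl(theta, a_old, b_old).sum(axis=1)
    def loss(x):
        A, B = x
        tcc_new = icc_2pl(theta, a_new / A, A * b_new + B).sum(axis=1)
        return np.sum((tcc_old - tcc_new) ** 2)
    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x

a_o, b_o = np.array([1.0, 1.2, 0.8]), np.array([-0.5, 0.0, 0.7])
A_true, B_true = 1.1, 0.3                            # toy scale difference
# new-form parameters expressed on their own (shifted) scale:
print(stocking_lord(a_o * A_true, (b_o - B_true) / A_true, a_o, b_o))
```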

9.
A Simulation Study of the Robustness of the Rasch Model in Computerized Adaptive Testing   (Total citations: 1; self-citations: 0; citations by others: 1)
Using simulated data, this study estimated examinee ability in computerized adaptive testing (CAT) with two models, the Rasch model and the Birnbaum (two-parameter) model, and examined the robustness of the Rasch model in CAT by comparing the two models' root mean square error (RMSE), average deviation (AD), and the correlation between ability estimates. The results show that even when item discriminations are unequal, the Rasch model still estimates examinee ability fairly accurately, demonstrating strong robustness.
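The three comparison statistics are easy to state in code. Below is a minimal sketch, with AD read as mean absolute deviation (the abstract does not define it precisely); the inputs are hypothetical arrays of true and estimated abilities.

```python
import numpy as np

def comparison_stats(theta_true, theta_hat):
    """RMSE, average deviation (mean absolute deviation here), and correlation."""
    d = theta_hat - theta_true
    rmse = float(np.sqrt(np.mean(d ** 2)))
    ad = float(np.mean(np.abs(d)))
    r = float(np.corrcoef(theta_true, theta_hat)[0, 1])
    return rmse, ad, r
```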

10.
As zhongkao English item writing continues to innovate, multiple-choice items are written ever closer to students' real lives, with contextualized, true-to-life stems that assess language knowledge and language use in fairly authentic contexts. This avoids testing language knowledge in isolation and also increases the items' flexibility and difficulty. Using typical items, this article "decodes" common techniques in writing zhongkao multiple-choice items and offers test-taking strategies to help students succeed in the 2010 examination.

11.
Administering tests under time constraints may result in poorly estimated item parameters, particularly for items at the end of the test (Douglas, Kim, Habing, & Gao, 1998; Oshima, 1994). Bolt, Cohen, and Wollack (2002) developed an item response theory mixture model to identify a latent group of examinees for whom a test is overly speeded, and found that item parameter estimates for end-of-test items in the nonspeeded group were similar to estimates for those same items when administered earlier in the test. In this study, we used the Bolt et al. (2002) method to study the effect of removing speeded examinees on the stability of a score scale over an 11-year period. Results indicated that using only the nonspeeded examinees for equating and estimating item parameters provided a more unidimensional scale, smaller effects of item parameter drift (including fewer drifting items), and less scale drift (i.e., bias) and variability (i.e., root mean squared errors) when compared to the total group of examinees.

12.
When tests are administered under fixed time constraints, test performances can be affected by speededness. Among other consequences, speededness can result in inaccurate parameter estimates in item response theory (IRT) models, especially for items located near the end of tests (Oshima, 1994). This article presents an IRT strategy for reducing contamination in item difficulty estimates due to speededness. Ordinal constraints are applied to a mixture Rasch model (Rost, 1990) so as to distinguish two latent classes of examinees: (a) a "speeded" class, composed of examinees who had insufficient time to adequately answer end-of-test items, and (b) a "nonspeeded" class, composed of examinees who had sufficient time to answer all items. The parameter estimates obtained for end-of-test items in the nonspeeded class are shown to more accurately approximate their difficulties when the items are administered at earlier locations on a different form of the test. A mixture model can also be used to estimate the class memberships of individual examinees. In this way, it can be determined whether membership in the speeded class is associated with other student characteristics. Results are reported for gender and ethnicity.
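To illustrate the classification side of such a mixture model, here is a minimal sketch of the posterior probability that an examinee belongs to the speeded class, treating the class-specific Rasch difficulties, the examinee's ability, and the class proportion as already estimated. The full method estimates all of these jointly under ordinal constraints, which this sketch does not attempt; all names are hypothetical.

```python
import numpy as np

def rasch_loglik(resp, theta, b):
    """Rasch log-likelihood of a 0/1 response vector at ability theta."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return float(np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p)))

def p_speeded(resp, theta, b_speeded, b_nonspeeded, pi_speeded):
    """Posterior probability of membership in the speeded latent class."""
    num = pi_speeded * np.exp(rasch_loglik(resp, theta, b_speeded))
    den = num + (1 - pi_speeded) * np.exp(rasch_loglik(resp, theta, b_nonspeeded))
    return num / den
```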

13.
In the presence of test speededness, the parameters of item response theory models can be poorly estimated because of conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures—a two-parameter logistic (2PL) model, a one-dimensional mixture model, a two-step strategy (a combination of the one-dimensional mixture and the 2PL), a two-dimensional mixture model, and a hybrid model—by examining how sample size, percentage of speeded examinees, percentage of missing responses, and the way of scoring missing responses (incorrect vs. omitted) affect item parameter estimation in speeded tests. For nonspeeded items, all five procedures showed similar results in recovering item parameters. For speeded items, the one-dimensional mixture model, the two-step strategy, and the two-dimensional mixture model provided largely similar results and performed better than the 2PL model and the hybrid model in calibrating slope parameters. However, those three procedures performed similarly to the hybrid model in estimating intercept parameters. As expected, the 2PL model did not appear to be as accurate as the other models in recovering item parameters, especially when there were large numbers of examinees showing speededness and a high percentage of missing responses scored as incorrect. A real-data analysis further described the similarities and differences between the five procedures.

14.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both GRE and SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched ability White examinees for each of the four item types and for both the GRE and SAT tests! The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

15.
Practical use of the matrix sampling (i.e., item sampling) technique requires the assumption that an examinee's response to an item is independent of the context in which the item occurs. This assumption was tested experimentally by comparing the responses of examinees to a population of items with the responses of examinees to item samples. Matrix sampling mean and variance estimates for verbal, quantitative, and attitude tests were used as dependent variables to test for differences between the "context" and "out-of-context" groups. The estimates obtained from both treatment groups were also compared with actual population values. No significant differences were found between treatments on matrix sample parameter estimates for any of the three types of tests.
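The basic estimator behind matrix sampling is simple to sketch: with items assigned to examinees at random, the mean total-test score can be estimated from the overall proportion correct on whatever items each examinee saw. The sketch below covers the mean estimate only (the study also examined variance estimates, which need extra terms); all names and data are illustrative.

```python
import numpy as np

def matrix_sample_mean(resp, n_items_total):
    """Estimate the mean full-test score from partially administered 0/1 data.
    resp: examinee x item matrix with np.nan where an item was not given."""
    return n_items_total * np.nanmean(resp)

rng = np.random.default_rng(4)
full = (rng.random((1000, 60)) < 0.65).astype(float)   # toy full-data matrix
sampled = full.copy()
sampled[rng.random(full.shape) < 0.8] = np.nan         # each examinee sees ~20% of items
print(full.sum(axis=1).mean(), matrix_sample_mean(sampled, 60))   # close estimates
```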

16.
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice effect of examinee-selected items. The results of a series of simulation studies showed: (1) the parameters of the new models were recovered well; (2) the parameter estimates were almost unbiased when the new models were fit to data simulated from standard item response models; (3) failing to consider the choice effect yielded shrunken parameter estimates for examinee-selected items; and (4) even when the missingness mechanism in examinee-selected items did not follow the item response functions specified in the new models, the new models still yielded a better fit than did standard item response models. An empirical example of a college entrance examination supported the use of the new models: in general, the higher the examinee's ability, the better his or her choice of items.

17.
Two item selection algorithms were compared in simulated linear and adaptive tests of cognitive ability. One algorithm selected items that maximally differentiated between examinees. The other used item response theory (IRT) to select items having maximum information for each examinee. Normally distributed populations of 1,000 cases were simulated, using test lengths of 4, 5, 6, and 7 items. Overall, adaptive tests based on maximum information provided the most information over the widest range of ability values and, in general, differentiated among examinees slightly better than the other tests. Although the maximum differentiation technique may be adequate in some circumstances, adaptive tests based on maximum information are clearly superior.
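As an illustration of the second algorithm, here is a minimal sketch of maximum-information item selection under a 2PL model: compute each remaining item's Fisher information at the current ability estimate and administer the most informative one. The parameterization (no 1.7 constant) and all names are assumptions of the sketch.

```python
import numpy as np

def fisher_info_2pl(theta, a, b):
    """Fisher information of each 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, used):
    """Index of the unused item with maximum information at theta_hat."""
    info = fisher_info_2pl(theta_hat, a, b)
    info[used] = -np.inf                      # mask items already administered
    return int(np.argmax(info))

a = np.array([0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 0.1])
used = np.zeros(4, dtype=bool)
print(select_next_item(0.2, a, b, used))      # picks the steep item located near 0.2
```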

18.
The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using standard item response theory models, which assume ignorable missing data, can yield biased parameter estimates so that examinees taking different sets of items to answer cannot be compared. To solve this fundamental problem, in this study the researchers utilized the specific objectivity of Rasch models by adopting the conditional maximum likelihood estimation (CMLE) and pairwise estimation (PE) methods to analyze ESI data, and conducted a series of simulations to demonstrate the advantages of the CMLE and PE methods over traditional estimation methods in recovering item parameters in ESI data. An empirical data set obtained from an experiment on the ESI design was analyzed to illustrate the implications and applications of the proposed approach to ESI data.
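For a flavor of the PE idea, the sketch below uses Choppin-style pairwise comparisons: for any two items answered by the same examinee, the difficulty difference can be estimated from the log-odds of the discordant response patterns, so only the items an examinee actually selected enter the estimate. This is a simplified illustration under assumed inputs, not the paper's exact algorithm for ESI data.

```python
import numpy as np

def pairwise_difficulties(resp):
    """Rasch difficulties from pairwise comparisons.
    resp: examinee x item matrix of 1/0 with np.nan for unselected items."""
    n_items = resp.shape[1]
    D = np.zeros((n_items, n_items))           # D[i, j] estimates b_j - b_i
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            both = ~np.isnan(resp[:, i]) & ~np.isnan(resp[:, j])
            n_ij = np.sum((resp[both, i] == 1) & (resp[both, j] == 0))
            n_ji = np.sum((resp[both, i] == 0) & (resp[both, j] == 1))
            D[i, j] = np.log((n_ij + 0.5) / (n_ji + 0.5))   # smoothed log-odds
    b = D.mean(axis=0)                         # averaging centers the difficulties
    return b - b.mean()
```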

19.
The graded response model can be used to describe test-taking behavior when item responses are classified into ordered categories. In this study, parameter recovery in the graded response model was investigated using the MULTILOG computer program under default conditions. Based on items having five response categories, 36 simulated data sets were generated that varied on true θ distribution, true item discrimination distribution, and calibration sample size. The findings suggest, first, the correlations between the true and estimated parameters were consistently greater than 0.85 with sample sizes of at least 500. Second, the root mean square error differences between true and estimated parameters were comparable with results from binary data parameter recovery studies. Of special note was the finding that the calibration sample size had little influence on the recovery of the true ability parameter but did influence item-parameter recovery. Therefore, it appeared that item-parameter estimation error, due to small calibration samples, did not result in poor person-parameter estimation. It was concluded that at least 500 examinees are needed to achieve an adequate calibration under the graded model.
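For reference, the graded response model calibrated here has a compact form: category probabilities are differences of adjacent cumulative 2PL curves. A minimal sketch for one five-category item follows; the parameter values are made up.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded-response-model item.
    b: ordered between-category thresholds (4 thresholds -> 5 categories)."""
    cum = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))   # P(X >= k), k = 1..4
    cum = np.concatenate(([1.0], cum, [0.0]))                  # add P(X >= 0), P(X >= 5)
    return cum[:-1] - cum[1:]                                  # P(X = k) by differencing

probs = grm_category_probs(0.5, a=1.2, b=[-1.5, -0.5, 0.4, 1.3])
print(probs, probs.sum())   # five probabilities summing to 1
```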
