1.

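曾灵秀, 李然 《四川教育学院学报》2006,22(Z2):59-60

Computerized adaptive testing is a new form of testing that offers advantages traditional tests cannot match. The authors introduce its theoretical basis and the stages of the testing process, and briefly describe how it has been applied in practice.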

2.

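一帆 《教育测量与评价(理论版)》2014,(3):24-24

Item banks are a product of modern computer technology and the foundation of computerized testing. Neither computer-based automated test assembly nor computerized adaptive testing can do without one. Item banks have done much to make psychological and educational testing more scientific and modern. Building an item bank under item response theory generally involves overall bank design, item writing and parameter design, design of the bank generation system, and design of scoring and score interpretation.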

3.

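An Equivalence Study of Computerized Linear Tests and Adaptive Tests

李心钰, 陆宏 《现代教育技术》2022,(1):85-93

Computer-based testing has become widespread, but different computer-based test formats may yield biased results when measuring the same task, which makes educational measurement and evaluation unfair. Drawing on item response theory, this article examines whether computerized linear tests and computerized adaptive tests are equivalent in test efficiency, in the statistical characteristics of their results, and in their effects on examinees' individual psychological traits. It reports an empirical study based on the "Modern Educational Technology" course for pre-service teachers. The results show that examinee scores on the two tests are comparable. The computerized adaptive test has higher test efficiency and reliability, but the presence or absence of immediate feedback has a larger effect on examinees' test anxiety. The computerized linear test has more reasonable content validity, and the presence or absence of immediate feedback has a smaller effect on test anxiety. The study provides a scientific analysis of whether the choice of test format in teaching evaluation is fair and reasonable, and offers a theoretical reference for test administrators choosing a format suited to the testing scenario.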

4.

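Development of a Web-Based Adaptive Chemistry Testing System

李广洲, 丁金芳 《中国考试》2006,(8):46-50

The web-based adaptive chemistry test is a testing system built on item response theory with full consideration of the characteristics of computerized adaptive testing. This article describes three main aspects of the system's development: building the back-end database, creating the front-end interface, and writing the CGI application. It also reports on the system's initial operation on the network, summarizes its advantages, and discusses the significance of the research.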

5.

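Research Paradigms and Characteristics of Simulation Methods for Computerized Adaptive Testing

《中国考试》2016,(1)

Computerized adaptive testing (CAT) is widely used in theory and practice. Much current CAT research falls into two paradigms: CAT research based on real response data and CAT research based on simulated response data. The steps of a CAT simulation study include model selection, item bank simulation, choice of starting point, item selection strategy, and test termination strategy. The main trends in CAT simulation research are as follows: item selection and termination strategies remain the focus of CAT research; simulation designs increasingly reflect actual testing conditions; studies adopt multi-factor designs; and simulation results are evaluated comprehensively from multiple angles.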
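
The simulation steps listed in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: a two-parameter logistic (2PL) model, a randomly generated bank of 200 items, a starting ability of 0, maximum-information item selection, grid-based EAP ability estimation, and a stopping rule of 20 items or a posterior standard deviation of 0.3.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, grid):
    """EAP ability estimate with a standard normal prior, evaluated on a grid.

    responses is a list of ((a, b), u) pairs; returns (estimate, posterior SD).
    """
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for (a, b), u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(true_theta, bank, rng, max_items=20, se_target=0.3):
    """Simulate one examinee: select, administer, re-estimate, check stopping rule."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    theta_hat, se = 0.0, float("inf")  # starting point: prior mean
    used, responses = set(), []
    while len(responses) < max_items and se > se_target:
        # Item selection: most informative unused item at the current estimate.
        idx = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: item_information(theta_hat, *bank[i]))
        used.add(idx)
        a, b = bank[idx]
        # Response simulation from the true ability.
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append(((a, b), u))
        theta_hat, se = eap_estimate(responses, grid)
    return theta_hat, se, sorted(used)


rng = random.Random(0)
# Item bank simulation: (a, b) pairs drawn from hypothetical ranges.
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]
theta_hat, se, administered = simulate_cat(1.0, bank, rng)
print(f"estimate={theta_hat:.2f}, posterior SD={se:.2f}, items used={len(administered)}")
```

Repeating `simulate_cat` over many simulated examinees and varying one factor at a time (selection rule, stopping rule, bank size) gives the kind of multi-factor design the abstract mentions.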

6.

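Some New Trends in the Development of Ability Measurement

漆书青 《江西师范大学学报(哲学社会科学版)》2005,38(5):107-109

Driven by cognitive psychology, the exploration of modern measurement models, and information technology, ability measurement in the 21st century shows new trends such as continuous test revision, computerized adaptive testing, intelligent item generation, and dynamic assessment integrated with teaching. These trends open a new phase in both the technical innovation of ability measurement and its influence on education and social life.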

7.

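Computerized Adaptive Testing Has Broad Prospects

陆衣言 《金陵科技学院学报(社会科学版)》1998,(4)

This article introduces computerized adaptive testing, which combines knowledge space theory from cognitive science, item response theory from psychological and educational measurement, and computer technology, and discusses its role in improving the efficiency and precision of testing.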

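8.

Design and Algorithm Improvement of Computerized Adaptive Multistage Testing Against the Background of "Double Reduction"

杨志明, 夏胜俊 《教育测量与评价(理论版)》2021,(11):3-9

Computerized adaptive multistage testing is an effective means of precisely reducing student burden, because it automatically steers students toward items matched to their ability level and so saves the considerable time and effort otherwise wasted on items that are too hard or too easy. However, some current computer-based testing systems in China lack solid support from modern assessment technology, and some item banks have serious deficiencies in the labeling and coding of knowledge content and ability dimensions, in the estimation and equating of item parameters, and in score computation and use. This article briefly analyzes the basic modes, operating procedures, conditions of use, and main advantages of adaptive testing. It then discusses in detail the design of a computerized adaptive multistage testing system, along with scoring methods based on the one-parameter logistic model (using the total test score) and the two-parameter logistic model (using the response pattern). It thereby offers a new approach, from the perspective of testing science, to raising the quality of computerized adaptive testing, helping teachers teach according to students' aptitude, and reducing students' homework and examination burden.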
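
Scoring from the total score under the one-parameter logistic (Rasch) model can be sketched as follows. This is an illustrative sketch and not the authors' algorithm: it relies only on the standard property that, under the Rasch model, the maximum likelihood ability estimate is the value of θ at which the expected total score equals the observed total score. The function names, the bisection bounds, and the example difficulties are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def theta_from_total(total, difficulties, lo=-6.0, hi=6.0, tol=1e-8):
    """Solve sum_i P_i(theta) = total by bisection (expected score is increasing in theta)."""
    n = len(difficulties)
    if not 0 < total < n:
        raise ValueError("the ML estimate is not finite for zero or perfect scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(rasch_p(mid, b) for b in difficulties)
        if expected < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical difficulties of the items one examinee actually received.
taken = [-1.0, 0.0, 1.0]
print(theta_from_total(1, taken), theta_from_total(2, taken))
```

In a multistage test, examinees routed to different modules answer different items, so `difficulties` should list only the items that the examinee was actually given; the same total score then maps to different ability estimates on easier and harder routes.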

9.

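A Maximum-Information Dynamic Stratification Method Incorporating an Exposure Factor

《中国考试》2013,(2)

In computerized adaptive testing (CAT), the a-stratified method (a-STR) is a distinctive and widely used item selection strategy that effectively controls item exposure rates and thereby improves test security. The dynamic a-stratified method (DAS) was proposed to address shortcomings of a-STR, such as the number of strata being fixed, having to be set manually before the test, and having no expiry time. However, DAS itself does not perform well in test efficiency or exposure control. For dichotomously (0-1) scored CAT, this study therefore introduces an exposure factor (ecf) and a maximum-information stratification strategy (MIS) to form a composite item selection strategy intended to improve on DAS. Computer simulation results show that the new strategy outperforms DAS in both test efficiency and exposure control, achieving the aim of the study.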
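
For orientation, the basic a-STR idea that this abstract builds on can be sketched as below. This shows only plain a-stratification (sort the bank by discrimination a, use low-a strata early, and pick the item whose difficulty b is closest to the current ability estimate within the active stratum). It is not DAS and not the composite strategy proposed in the article, whose details the abstract does not give. The bank values and function names are hypothetical.

```python
def stratify_by_a(bank, n_strata):
    """Split item indices into n_strata groups of near-equal size, ordered by ascending a."""
    order = sorted(range(len(bank)), key=lambda i: bank[i][0])
    n = len(order)
    return [order[n * k // n_strata: n * (k + 1) // n_strata] for k in range(n_strata)]


def select_item(bank, strata, stage, theta_hat, used):
    """Within the stratum for this stage, pick the unused item with b closest to theta_hat."""
    candidates = [i for i in strata[stage] if i not in used]
    return min(candidates, key=lambda i: abs(bank[i][1] - theta_hat))


# Hypothetical bank of (a, b) pairs.
bank = [(0.5, -1.0), (1.8, 0.2), (0.7, 0.9), (1.2, -0.3),
        (0.6, 0.1), (1.5, 1.0), (1.0, 0.0), (2.0, -0.5)]
strata = stratify_by_a(bank, n_strata=4)
first = select_item(bank, strata, stage=0, theta_hat=0.0, used=set())
print(strata, first)
```

Saving high-a items for later stages, when the ability estimate is more stable, is what spreads exposure across the bank compared with always choosing the most informative item.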

10.

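Analysis Modes and Evaluation Indices for CAT Simulation Results

《中国考试》2016,(12)

Simulation is one of the main methods of computerized adaptive testing (CAT) research. The evaluation and analysis of CAT simulation results covers three main areas: examinee ability estimation and ability classification, item usage in the item bank, and the CAT response process. Analyses of simulation results fall into two modes: overall analysis and fine-grained analysis. This study surveys the evaluation indices for CAT simulation results from the perspectives of recovery performance in the simulation, measurement accuracy, item bank security, item bank utilization, classification efficiency and accuracy, and the degree to which multiple test constraints are satisfied. The evaluation perspectives and indices should be chosen according to the goals of the CAT study and the requirements of the testing context.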
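
A few of the indices named in this abstract have common textbook forms, sketched here under stated assumptions: ability recovery is summarized by bias and root mean square error (RMSE), item bank security by per-item exposure rates, and item bank utilization by the share of items used at least once. The article may define its indices differently; the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def recovery_stats(true_thetas, estimates):
    """Return (bias, RMSE) of ability estimates against the simulated true abilities."""
    n = len(true_thetas)
    errors = [e - t for t, e in zip(true_thetas, estimates)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return bias, rmse


def pool_usage(administered_lists, pool_size):
    """Return (exposure rate per item, fraction of the pool used at least once)."""
    counts = Counter(i for items in administered_lists for i in items)
    n_examinees = len(administered_lists)
    exposure = [counts.get(i, 0) / n_examinees for i in range(pool_size)]
    utilization = sum(1 for rate in exposure if rate > 0) / pool_size
    return exposure, utilization


# Hypothetical results for two simulated examinees and a four-item pool.
bias, rmse = recovery_stats([0.0, 1.0], [0.5, 0.5])
exposure, utilization = pool_usage([[0, 1], [1, 2]], pool_size=4)
print(bias, rmse, exposure, utilization)
```

A high maximum exposure rate together with low utilization usually signals that a selection rule is over-using a small set of items.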

11.

Evaluating Comparability in Computerized Adaptive Testing: Issues, Criteria and an Example

Tianyou Wang, Michael J. Kolen 《Journal of Educational Measurement》2001,38(1):19-49

When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.

12.

Computerized adaptive testing in instructional settings (total citations: 3; self-citations: 0; citations by others: 3)

R. Edwin Welch, Theodore W. Frick 《Educational technology research and development : ETR & D》1993,41(3):47-62

Item response theory (IRT) has most often been used in research on computerized adaptive testing (CAT). Depending on the model used, IRT requires between 200 and 1,000 examinees for estimating item parameters. Thus, it is not practical for instructional designers to develop their own CAT based on the IRT model. Frick improved Wald's sequential probability ratio test (SPRT) by combining it with normative expert systems reasoning, referred to as an EXSPRT-based CAT. While previous studies were based on re-enactments from historical test data, the present study is the first to examine how well these adaptive methods function in a real-time testing situation. Results indicate that the EXSPRT-I significantly reduced test lengths and was highly accurate in predicting mastery. EXSPRT is apparently a viable and practical alternative to IRT for assessing mastery of instructional objectives.
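
Wald's SPRT, which this abstract takes as its starting point, can be sketched for a mastery decision as follows. This is the plain SPRT with a single success probability per hypothesis, not the EXSPRT variant studied in the paper, whose item-level rules the abstract does not spell out. The probabilities for masters and nonmasters and the error rates are hypothetical values.

```python
import math


def sprt_mastery(responses, p_master=0.85, p_nonmaster=0.60, alpha=0.05, beta=0.05):
    """Classify an examinee from a sequence of 0/1 responses using Wald's SPRT.

    alpha: tolerated rate of declaring mastery for a true nonmaster.
    beta:  tolerated rate of declaring nonmastery for a true master.
    Returns (decision, number of items used).
    """
    upper = math.log((1.0 - beta) / alpha)   # cross this: decide "master"
    lower = math.log(beta / (1.0 - alpha))   # cross this: decide "nonmaster"
    llr = 0.0  # running log-likelihood ratio, master versus nonmaster
    for n, u in enumerate(responses, start=1):
        if u:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1.0 - p_master) / (1.0 - p_nonmaster))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "nonmaster", n
    return "undecided", len(responses)


print(sprt_mastery([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]))
```

Because the test stops as soon as either threshold is crossed, clear masters and clear nonmasters need few items, which is the source of the shorter test lengths the abstract reports for this family of methods.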

13.

Local Dependence in an Operational CAT: Diagnosis and Implications

Mary Pommerich, Daniel O. Segall 《Journal of Educational Measurement》2008,45(3):201-223

The accuracy of CAT scores can be negatively affected by local dependence if the CAT utilizes parameters that are misspecified due to the presence of local dependence and/or fails to control for local dependence in responses during the administration stage. This article evaluates the existence and effect of local dependence in a test of Mathematics Knowledge. Diagnostic tools were first used to evaluate the existence of local dependence in items that were calibrated under a 3PL model. A simulation study was then used to evaluate the effect of local dependence on the precision of examinee CAT scores when the 3PL model was used for selection and scoring. The diagnostic evaluation showed strong evidence for local dependence. The simulation suggested that local dependence in parameters had a minimal effect on CAT score precision, while local dependence in responses had a substantial effect on score precision, depending on the degree of local dependence present.

14.

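Development and Prospects of Computerized Adaptive Testing (CAT) (Continued)

张华华, 程莹 《考试研究》2005,(2)

III. Estimation of $\theta$ in CAT. (1) MLE (maximum likelihood estimation). Suppose an examinee with ability level $\theta$ responds to $n$ items $X_1, X_2, \ldots, X_n$. An estimate of $\theta$ can be obtained by maximizing the likelihood function shown in Equation (8). Let $\hat{\theta}_n$ denote the resulting estimate of $\theta$. Clearly, $\hat{\theta}_n$ is also the maximum likelihood estimate for Equation (9). Under certain conditions, $\hat{\theta}_n$ is known to be asymptotically normal with mean $\theta$ and variance approximately $I_n^{-1}(\hat{\theta}_n)$. Most current CAT designs recursively update the estimate of $\theta$ after the examinee answers each new item and select the next item by the maximum information method.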

15.

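Research on an Examination System Based on Mobile Agents

向广利 《培训与研究》2006,23(2):23-26

This article briefly introduces the characteristics of mobile agents and analyzes the shortcomings of traditional network-based examination systems. It uses mobile agents to design a new network examination system, gives detailed designs of several key agents, and points out issues that a mobile-agent-based network examination system needs to address.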

16.

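Personal Credit Scoring Combining Data Augmentation and Machine Learning Algorithms

陆健健, 江开忠 《教育技术导刊》2009,19(8):40-43

To improve the predictive accuracy of personal credit scoring models, and inspired by data augmentation in computer vision, this study proposes a personal credit scoring model that combines data augmentation with machine learning algorithms. The model first augments the raw personal credit data and then trains a binary personal credit scoring classifier using machine learning classification algorithms. Finally, using public personal credit datasets, credit scoring models are built both with and without data augmentation. The models are compared on six performance measures: accuracy, precision, recall, F1 score, AUC, and the ROC curve. The results show that, relative to the model based on machine learning alone, the model combining data augmentation with machine learning achieves some improvement in classification performance, with classification accuracy 5% higher on average.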

17.

The affordances of AI-enabled automatic scoring applications on learners’ continuous learning intention: An empirical study in China

Shixuan Fu, Huimin Gu, Bo Yang 《British journal of educational technology : journal of the Council for Educational Technology》2020,51(5):1674-1692

Traditional educational giants and natural language processing companies have launched several artificial intelligence (AI)-enabled digital learning applications to facilitate language learning. One typical application of AI in digital language education is the automatic scoring application that provides feedback on pronunciation repeat outcomes. This research is motivated by the usage of automatic scoring-empowered digital learning tools by language learners, and set out to uncover the influencing mechanisms of AI-enabled automatic scoring application affordances on learners’ continuous learning intention. Specifically, based on affordance theory, we found several automatic scoring application affordances through in-depth interviews. Considering the current lack of investigations on the mechanisms underlying automatic scoring application and its implications for learners’ learning behaviors, we built a model to examine the role of automatic scoring application affordances on cognitive/emotional engagement and following continuous learning intention. We further examined the moderation role of in-job learners and student learners on the above relationships. The model was tested using a survey of 260 Chinese foreign language learners who used AI-empowered learning tools to facilitate their language learning practices. This study explores why learners continuously use AI-enabled automatic scoring applications by identifying the affordances that differentiate it from traditional educational technologies. Practitioners could take the identified affordances into account when designing AI-enabled language learning applications.

18.

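Selecting and Optimizing the Calibration Set for a Computer-Based Intelligent Assisted Scoring System (total citations: 2; self-citations: 0; citations by others: 2)

何屹松, 孙媛媛, 张凯, 付瑞吉 《中国考试》2020,(1):30-36

In research on computer-based intelligent scoring, the choice of calibration samples is critical to building the scoring model. Based on a comparison of human and machine scoring across different calibration sets, this study proposes a calibration set selection method of "random sampling by experts + intelligent selection of sample papers + supplementary sampling by clustered score bands." The method improves the scoring model's ability to model every score band, fits the normal distribution of examinee scores in examinations such as the Gaokao, and broadens the model's ability to learn jointly from expert scores and rater scores. As a result, the computer-based intelligent assisted scoring system can use deep learning to understand and apply the scoring criteria more comprehensively.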

19.

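A Simulation Study of the Robustness of the Rasch Model in Computerized Adaptive Testing (total citations: 1; self-citations: 0; citations by others: 1)

邓远平, 蔡艳, 罗照盛 《考试研究》2006,(3)

Using simulated data, this study estimates ability in computerized adaptive testing (CAT) under both the Rasch and Birnbaum models. It examines the robustness of the Rasch model in CAT by comparing the two models in terms of root mean square error (RMSE), average deviation (AD), and the correlation of ability estimates. The results show that the Rasch model still estimates examinee ability fairly accurately when item discriminations are unequal, and is therefore highly robust.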

20.

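A Study of Scoring Criteria for Constructed-Response Items (total citations: 1; self-citations: 0; citations by others: 1)

周群 《考试研究》2007,(1)

Taking the scoring criteria for the essay question in the 2006 Shanghai Gaokao politics examination as an example, this article examines how to judge the quality of scoring criteria for constructed-response (subjective) items from three angles: whether each scoring criterion is relatively independent; whether examinees' overall argumentation ability can be inferred from the results on the several criteria; and whether the grade levels of each criterion are reasonably divided. Factor analysis shows that the four scoring criteria are unidimensional, with a single factor interpretable as examinees' overall argumentation ability. Correlation analysis shows that all four criteria are relatively independent and contribute independently to inferring that ability. Analysis with the Rasch rating scale model shows that the grade divisions are largely reasonable, although some levels provide insufficient information. On this basis, the article offers several suggestions for improving the scoring criteria.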