1.
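Theory and Application of Computerized Adaptive Testing  Total citations: 1, self-citations: 0, citations by others: 1
Computerized adaptive testing is an entirely new form of testing with advantages that traditional tests cannot match. The author introduces the theoretical foundations of computerized adaptive testing and the stages of its administration, and briefly describes how it has been applied in practice.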
2.
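一帆 (Yi Fan), 《教育测量与评价(理论版)》 [Educational Measurement and Evaluation (Theory Edition)], 2014, (3): 24-24
Item banks are a product of modern computer technology and the foundation of computerized testing. Neither computer-based intelligent test assembly nor computerized adaptive testing can do without an item bank. Item banks have done much to make psychological and educational testing more scientific and more modern. Building an item bank under item response theory can generally be divided into the overall design of the bank, item writing and parameter design, design of the bank's generation system, and design of scoring and interpretation.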
3.
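Computer-based testing has become widespread, but different computer-based test formats may yield biased results when measuring the same task, which leads to unfairness in educational measurement and evaluation. Drawing on item response theory, this article examines whether computerized linear testing and computerized adaptive testing are equivalent in test efficiency, in the statistical characteristics of test results, and in their effects on examinees' individual psychological traits, and reports an empirical study based on the "Modern Educational Technology" course for pre-service teachers. The results show that examinees' scores on the two tests are comparable. Computerized adaptive testing offers higher test efficiency and test reliability, but the presence or absence of immediate feedback has a larger effect on examinees' test anxiety. Computerized linear testing has more reasonable content validity, and the presence or absence of immediate feedback has a smaller effect on test anxiety. The study offers a scientific analysis of whether the choice of test format in teaching evaluation is fair and reasonable, and gives test administrators a theoretical reference for choosing a format suited to the testing scenario.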
4.
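The web-based adaptive chemistry test is a testing system developed on the basis of item response theory, with full consideration of the characteristics of computerized adaptive testing. This paper describes three main aspects of the system's development: building the back-end database, creating the front-end interface, and writing the CGI application. It also reports on the system's initial operation on the network, summarizes the system's advantages, and points out the significance of the research.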
5.
6.
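漆书青 (Qi Shuqing), 《江西师范大学学报(哲学社会科学版)》 [Journal of Jiangxi Normal University (Philosophy and Social Sciences Edition)], 2005, 38(5): 107-109
Driven by cognitive psychology, the exploration of modern measurement models, and information technology, ability measurement in the 21st century shows several new trends: continuous test revision, computerized adaptive testing, intelligent item generation, and dynamic assessment integrated with instruction. As a result, innovation in ability measurement technology, and its influence on education and social life, has entered a new phase.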
7.
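陆衣言 (Lu Yiyan), 《金陵科技学院学报(社会科学版)》 [Journal of Jinling Institute of Technology (Social Sciences Edition)], 1998, (4)
This article introduces computerized adaptive testing, which combines knowledge space theory from cognitive science, item response theory from psychological and educational measurement, and computer technology, and describes its role in improving the efficiency and precision of testing.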
8.
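Computerized adaptive multistage testing is an effective means of targeted burden reduction, because it automatically steers students, as far as possible, toward items matched to their ability level, saving the considerable time and effort otherwise wasted on items that are too hard or too easy. However, some computer-based testing systems currently used in China lack strong support from modern assessment techniques, and some item banks have substantial defects in the labeling and coding of knowledge content and ability dimensions, in the estimation and equating of item parameters, and in scoring algorithms and score use. This paper briefly analyzes the basic models, operating procedures, conditions of use, and main advantages of adaptive testing. It then discusses in detail the design of a computerized adaptive multistage testing system, together with two scoring methods: one based on total test scores under the one-parameter logistic model, and one based on response patterns under the two-parameter logistic model. From the perspective of testing science, it offers a new approach to raising the level of computerized adaptive testing and thereby helping teachers teach according to students' aptitudes and reducing students' homework and examination burdens.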
9.
10.
11.
When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.
12.
Computerized adaptive testing in instructional settings  Total citations: 3, self-citations: 0, citations by others: 3
R. Edwin Welch Theodore W. Frick 《Educational technology research and development : ETR & D》1993,41(3):47-62
Item response theory (IRT) has most often been used in research on computerized adaptive testing (CAT). Depending on the model
used, IRT requires between 200 and 1,000 examinees for estimating item parameters. Thus, it is not practical for instructional
designers to develop their own CAT based on the IRT model. Frick improved Wald's sequential probability ratio test (SPRT)
by combining it with normative expert systems reasoning, referred to as an EXSPRT-based CAT. While previous studies were based
on re-enactments from historical test data, the present study is the first to examine how well these adaptive methods function
in a real-time testing situation. Results indicate that the EXSPRT-I significantly reduced test lengths and was highly accurate
in predicting mastery. EXSPRT is apparently a viable and practical alternative to IRT for assessing mastery of instructional
objectives.
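The abstract above builds on Wald's sequential probability ratio test (SPRT). As a rough illustration of how a sequential mastery decision can shorten a test, here is a minimal sketch of the plain SPRT only, not of Frick's EXSPRT, whose expert-system rules are not described here. The function name `sprt_mastery`, the assumed probabilities of a correct answer for masters and nonmasters, and the error rates are all hypothetical choices made for this example.

```python
import math

def sprt_mastery(responses, p_master=0.85, p_nonmaster=0.40, alpha=0.05, beta=0.05):
    """Plain Wald SPRT over 0/1 item responses.

    p_master / p_nonmaster: assumed probability of a correct answer
    for a master and a nonmaster (illustrative values only).
    alpha / beta: tolerated false-master and false-nonmaster error rates.
    Returns (decision, number_of_items_used).
    """
    upper = math.log((1 - beta) / alpha)   # at or above: decide "master"
    lower = math.log(beta / (1 - alpha))   # at or below: decide "nonmaster"
    llr = 0.0                              # running log-likelihood ratio
    for n, x in enumerate(responses, start=1):
        if x:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1 - p_master) / (1 - p_nonmaster))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "nonmaster", n
    return "undecided", len(responses)     # ran out of items before a decision

print(sprt_mastery([1, 1, 1, 1, 1, 1]))
print(sprt_mastery([1, 0]))
```

Because the test stops as soon as either threshold is crossed, examinees who answer consistently need only a few items, which is the source of the shorter test lengths sequential methods aim for.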
13.
The accuracy of CAT scores can be negatively affected by local dependence if the CAT utilizes parameters that are misspecified due to the presence of local dependence and/or fails to control for local dependence in responses during the administration stage. This article evaluates the existence and effect of local dependence in a test of Mathematics Knowledge. Diagnostic tools were first used to evaluate the existence of local dependence in items that were calibrated under a 3PL model. A simulation study was then used to evaluate the effect of local dependence on the precision of examinee CAT scores when the 3PL model was used for selection and scoring. The diagnostic evaluation showed strong evidence for local dependence. The simulation suggested that local dependence in parameters had a minimal effect on CAT score precision, while local dependence in responses had a substantial effect on score precision, depending on the degree of local dependence present.
14.
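III. Estimation of $\theta$ in CAT. (1) MLE (maximum likelihood estimation). Suppose an examinee with ability level $\theta$ responds to $n$ items, giving responses $X_1, X_2, \ldots, X_n$. An estimate of $\theta$ can be obtained by maximizing the likelihood function shown in Eq. (8). Let $\hat{\theta}_n$ denote the resulting estimate of $\theta$. Clearly, $\hat{\theta}_n$ is also the maximum likelihood estimate for Eq. (9). It is known that, under certain conditions, $\hat{\theta}_n$ is asymptotically normal with mean $\theta$ and variance approximately $I_n^{-1}(\hat{\theta}_n)$. Most current CAT designs work recursively: after the examinee answers each new item, the estimate of $\theta$ is updated, and the next item is selected by the maximum information method.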
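The recursive update-and-select loop described above can be sketched as follows. Eqs. (8) and (9) are not reproduced in this excerpt, so the sketch assumes a two-parameter logistic (2PL) model with known item parameters. The function names, the Newton-Raphson solver, and the clamp to [-4, 4] (needed because the MLE diverges for all-correct or all-incorrect response patterns) are illustrative assumptions, not the authors' procedure.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def mle_theta(responses, items, theta=0.0, iters=50, bound=4.0):
    """Newton-Raphson MLE of theta; responses are 0/1, items are (a, b) pairs."""
    for _ in range(iters):
        # First derivative of the log-likelihood and the test information.
        score = sum(a * (x - p_correct(theta, a, b))
                    for x, (a, b) in zip(responses, items))
        info = sum(item_information(theta, a, b) for a, b in items)
        step = score / info
        theta = max(-bound, min(bound, theta + step))  # keep the estimate finite
        if abs(step) < 1e-8:
            break
    return theta

def standard_error(theta, items):
    """Asymptotic SE: square root of the inverse test information."""
    return 1.0 / math.sqrt(sum(item_information(theta, a, b) for a, b in items))

def next_item(theta, pool, used):
    """Maximum-information selection among items not yet administered."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

pool = [(1.0, -2.0), (1.0, 0.1), (1.0, 2.0)]   # hypothetical (a, b) parameters
print(next_item(0.0, pool, used=set()))         # item with difficulty nearest theta
```

In a full CAT, `mle_theta` would be called again after each response and its result passed to `next_item`, until `standard_error` falls below a chosen threshold or a maximum test length is reached.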
15.
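This paper briefly introduces the characteristics of mobile agents and analyzes the shortcomings of traditional online examination systems. It designs a new online examination system using mobile agents, gives detailed designs for several key agents, and points out issues that a mobile-agent-based online examination system needs to address.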
16.
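To improve the predictive accuracy of personal credit scoring models, and inspired by data augmentation in computer vision, this paper proposes a personal credit scoring model that combines data augmentation with machine learning algorithms. The model first applies data augmentation to the original personal credit data and then trains a binary personal credit scoring model with a machine learning classification algorithm. Finally, using a public personal credit dataset, it builds credit scoring models both with and without data augmentation. The two are compared on six performance measures: accuracy, precision, recall, F1 score, AUC, and the ROC curve. The results show that, relative to a credit scoring model based on machine learning alone, the model combining data augmentation with machine learning improves classification performance to some extent, with classification accuracy 5% higher on average.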
17.
Shixuan Fu Huimin Gu Bo Yang 《British journal of educational technology : journal of the Council for Educational Technology》2020,51(5):1674-1692
Traditional educational giants and natural language processing companies have launched several artificial intelligence (AI)-enabled digital learning applications to facilitate language learning. One typical application of AI in digital language education is the automatic scoring application that provides feedback on pronunciation repeat outcomes. This research is motivated by the usage of automatic scoring-empowered digital learning tools by language learners, and sets out to uncover the influencing mechanisms of AI-enabled automatic scoring application affordances on learners’ continuous learning intention. Specifically, based on affordance theory, we found several automatic scoring application affordances through in-depth interviews. Considering the current lack of investigations on the mechanisms underlying automatic scoring applications and their implications for learners’ learning behaviors, we built a model to examine the role of automatic scoring application affordances on cognitive/emotional engagement and subsequent continuous learning intention. We further examined the moderating role of in-job learners and student learners on the above relationships. The model was tested using a survey of 260 Chinese foreign language learners who used AI-empowered learning tools to facilitate their language learning practices. This study explores why learners continuously use AI-enabled automatic scoring applications by identifying the affordances that differentiate them from traditional educational technologies. Practitioners could take the identified affordances into account when designing AI-enabled language learning applications.
18.
19.
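A Simulation Study of the Robustness of the Rasch Model in Computerized Adaptive Testing  Total citations: 1, self-citations: 0, citations by others: 1
This study uses simulated data to estimate ability in computerized adaptive testing (CAT) under both the Rasch and Birnbaum models. By comparing the two models' root mean square error (RMSE), average deviation (AD), and ability correlations, it examines the robustness of the Rasch model in CAT. The results show that the Rasch model still estimates examinees' ability fairly accurately when item discriminations are unequal, and is therefore highly robust.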
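For readers who want to see how such comparison indices are typically computed in a simulation, here is a minimal sketch. The abstract does not define AD, so this example assumes it is the mean signed difference between estimated and true ability. Some studies use the mean absolute difference instead. All function names and sample values are hypothetical.

```python
import math

def rmse(estimates, truths):
    """Root mean square error between estimated and true abilities."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

def average_deviation(estimates, truths):
    """Assumed definition: mean signed difference (estimate minus truth)."""
    n = len(truths)
    return sum(e - t for e, t in zip(estimates, truths)) / n

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

true_theta = [0.0, 0.0, 1.0]    # simulated "true" abilities (made up)
est_theta = [0.5, -0.5, 1.0]    # estimates from some model (made up)
print(rmse(est_theta, true_theta))
print(average_deviation(est_theta, true_theta))
print(pearson(est_theta, true_theta))
```

Computing the same three indices for each model's estimates against the simulated true abilities gives a direct, like-for-like comparison of estimation accuracy.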