
1.

NCME 2007 Presidential Address: The Concordance Table: An Invitation to Misuse Test Scores

Daniel R. Eignor 《Educational Measurement》2008,27(4):30-33

This article discusses a particular type of concordance table and the potential for test score misuse that may result from employing such a table. The concordance that is discussed is typically created between scores on different, nonequatable versions of a test that share the same or close to the same test title. These concordance tables often appear in the context of relating scores on computerized adaptive and paper‐and‐pencil versions of the same test. When such a table is presented in a complete point‐by‐point fashion, relating each reported score on the scale of the new version of the test to a reported score on the scale of the old version of the test, test score users will typically treat the table as if it represented an equating of scores between the two versions, and directly replace scores on the new version of the test by scores on the old version. This clearly represents a misuse of the test scores. Suggestions for avoiding this misuse of test scores from concordance tables are provided.
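The abstract does not say how such tables are computed. Concordances between nonequatable versions are commonly produced by equipercentile matching, pairing each new-version score with the old-version score that has the same percentile rank. The following is a minimal sketch under stated assumptions: integer scores, an equivalent-groups (or single-group) sample, no presmoothing, and a nearest-percentile-rank lookup in place of the interpolation operational programs usually use. The function names are hypothetical.

```python
def percentile_ranks(scores, max_score):
    """Percentile rank (mid-point convention) of each integer score 0..max_score."""
    n = len(scores)
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    ranks, below = [], 0
    for c in counts:
        # Proportion below the score plus half the proportion at the score.
        ranks.append((below + c / 2) / n)
        below += c
    return ranks

def concordance_table(new_scores, old_scores, new_max, old_max):
    """Map every new-version score to the old-version score with the
    closest percentile rank (a crude, unsmoothed equipercentile match)."""
    pr_new = percentile_ranks(new_scores, new_max)
    pr_old = percentile_ranks(old_scores, old_max)
    table = {}
    for x, p in enumerate(pr_new):
        table[x] = min(range(old_max + 1), key=lambda y: abs(pr_old[y] - p))
    return table
```

The output is a complete point-by-point table that looks exactly like an equating table, which is why, as the article argues, users tend to substitute one score for the other even when the two versions are not equatable.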

2.

IRT Vertical Equating Analysis of a Cross-Grade Cognitive Diagnostic Test of Primary School Students' Mathematical Competence

Wang Xinyu (王欣瑜) 《China Examinations》(中国考试) 2021,(2)

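This study used a common-item anchor-test design, estimated the parameters of each grade's cognitive diagnostic mathematics competence test and of the examinees with the IRT two-parameter model in the R package ltm, and used the equateIRT package to carry out equating transformations of the test parameters across grades. The results show that, after equating, both item difficulty and students' mathematical competence increase progressively with grade, and the developmental differences in mathematical competence among students of different schools, ethnic groups, and genders are consistent with the theoretical hypotheses. The study demonstrates the feasibility of using IRT vertical equating to construct a vertical scale of primary school students' mathematical competence across grades, and provides necessary empirical evidence for designing systematic remedial instruction programs and building item banks for adaptive testing.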
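The abstract names the 2PL model and the equateIRT package but not the linking method. Purely as an illustration, the sketch below shows the 2PL response function and mean-sigma linking, one standard way to place one grade's common-item estimates onto another grade's scale. It is plain Python, not the ltm or equateIRT API; the function names and the "source"/"target" labels are hypothetical.

```python
import math
from statistics import mean, pstdev

def p_2pl(theta, a, b):
    """2PL probability of a correct response (logistic form, no D = 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mean_sigma_constants(b_source, b_target):
    """Mean-sigma linking constants A, B from the common (anchor) items'
    difficulty estimates on the source scale and on the target scale."""
    A = pstdev(b_target) / pstdev(b_source)
    B = mean(b_target) - A * mean(b_source)
    return A, B

def to_target_scale(items, A, B):
    """Re-express (a, b) item parameters from the source scale on the target scale."""
    return [(a / A, A * b + B) for a, b in items]

def theta_to_target_scale(theta, A, B):
    """Re-express an ability estimate from the source scale on the target scale."""
    return A * theta + B
```

Because the transformation maps theta to A*theta + B, a to a/A, and b to A*b + B, the product a(theta - b) is unchanged, so response probabilities are invariant; that invariance is what lets grade-specific calibrations share one vertical scale.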

3.

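Three Methods for Stabilizing the Difficulty of the Gaokao English Examination under the "Multiple Administrations per Year" Policy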

Yang Zhiming (杨志明) 《Educational Measurement and Evaluation (Theory Edition)》(教育测量与评价(理论版)) 2019,(3):15-18

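Piloting "multiple administrations per year" for the English section of the gaokao (college entrance examination) is a remarkable step forward, but difficulty fluctuations across administrations often cause serious problems when raw scores are used directly for admission decisions. This article discusses three methods for stabilizing test difficulty: the standard practice of the international testing industry, an expert-judgment method that borrows ideas from standard setting, and a pilot-testing method with small representative samples that uses validity evidence in reverse. The author hopes these methods will give front-line testing practitioners more options.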

4.

A Gaokao Reform Proposal That Calibrates High School Grades against a Unified Examination

Xie Xiaoqing (谢小庆) 《Examinations Research》(考试研究) 2006,(3)

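This article proposes a college entrance examination reform plan suited to China's national conditions. Its main features are that in-school high school grades serve as the primary basis for university admission, a national or provincial unified examination serves as the calibration reference for those grades, and unified examination scores are completely decoupled from individual candidates.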
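The abstract does not give a calibration formula. One common way to calibrate internal grades against an external test is linear moderation: each school's grades are rescaled so their mean and standard deviation match that school's results on the unified examination. The sketch below assumes that scheme (it is not confirmed as the author's method). It uses only the school-level exam mean and SD, which is one way the decoupling from individual candidates could work; the function name is hypothetical.

```python
from statistics import mean, pstdev

def calibrate_school_grades(grades, exam_scores):
    """Hypothetical linear calibration for one school: rescale its internal
    grades so their mean and SD match the school's unified-exam scores.
    Only the exam's school-level mean and SD are used, not any individual score."""
    mg, sg = mean(grades), pstdev(grades)
    me, se = mean(exam_scores), pstdev(exam_scores)
    if sg == 0:
        # All students share one grade: give everyone the exam mean.
        return [me] * len(grades)
    return [me + se * (g - mg) / sg for g in grades]
```

Within a school, the rank order of students is preserved; what changes is the school's overall leniency or severity in grading, which is brought into line with its performance on the common examination.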

5.

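"Equalizing" Life and Death versus "Transforming" Life and Death: The Aesthetic Stances and Connotations of Zhuangzi and Ge Hong Seen through the Question of Life and Death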

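Yang Miao (阳淼), Tian Xiaoying (田晓膺) 《Journal of Southwest China Normal University (Humanities and Social Sciences Edition)》(西南师范大学学报(人文社会科学版)) 2009,35(1):87-91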

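"Equalizing" life and death (齐生死) and "transforming" life and death (化生死) mark one of the important divergences between Zhuangzi of philosophical Daoism and Ge Hong of religious Daoism in their treatment of life and death. "Equalizing" follows the principle that "the Dao models itself on what is natural" (道法自然), while "transforming" leads people toward longevity (长生); the two differ yet also continue one another. Behind their different conceptions of life and death, the characters 齐 and 化 also reveal differences in aesthetic stance and aesthetic connotation. These differences bring out some characteristics of Daoist religious aesthetics as a form of religious aesthetics.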

6.

The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large‐Scale Assessments

Husein Taherbhai, Daeryong Seo 《Educational Measurement》2013,32(1):2-14

Calibration and equating are a quintessential necessity for most large‐scale educational assessments. However, in some instances no consideration is given to the equating process in terms of its context and substantive realization, or to the methods used in its execution. In the authors' view, equating is not merely an exhibition of statistical methodology but also a reflection of the thought process undertaken in its execution. For example, the literature contains hardly any discussion of the ideological differences underlying the selection of an equating method. Furthermore, there is little evidence of modeling cohort growth by identifying and using the drift of construct‐relevant linking items under the common-item nonequivalent groups equating design. In this article, the authors philosophically justify the use of Huynh's statistical method for identifying construct‐relevant outliers in the linking pool. The article also dispels the perception that including construct‐relevant outliers in the linking item pool makes the scale unstable, and concludes that practitioners can benefit from understanding the rationale behind the choice of equating method and from using linking items to model cohort growth.

7.

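Test Equating Is Essential for Developing the Evaluative Function of the Zhongkao (Senior High School Entrance Examination)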

Yang Yue (杨悦) 《Education Science》(教育科学) 2010,26(1)

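The zhongkao is a large-scale, influential, high-stakes examination in each region. Only a scientific and well-developed examination evaluation system allows the zhongkao to serve regional junior secondary teaching fully and in many respects, and equating is an indispensable step in building such a system. IRT equating consists of estimating item parameters, transforming the IRT scale, and producing a score conversion table.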
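The abstract lists the steps but not the method for the final one. As an illustration, a score conversion table is often produced by IRT true-score equating: for each number-correct score on form X, find the ability at which form X's test characteristic curve equals that score, then read off form Y's expected score at the same ability. A minimal sketch, assuming 2PL items whose parameters are already on a common scale (i.e., after the scale-transformation step), number-correct scoring, and bisection as the root finder; the function names are hypothetical, and the abstract does not say whether true-score or observed-score equating is used.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def theta_for_score(x, items, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the theta where tcc(theta) == x (tcc increases when every a > 0)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tcc(mid, items) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conversion_table(items_x, items_y):
    """IRT true-score equating: map each number-correct score on form X to form Y."""
    n_x = len(items_x)
    # Scores 0 and n_x have no finite theta solution; map them by convention.
    table = {0: 0.0, n_x: float(len(items_y))}
    for x in range(1, n_x):
        table[x] = tcc(theta_for_score(x, items_x), items_y)
    return table
```

If form Y is harder than form X, each converted score comes out below the original, which is the adjustment a conversion table is meant to make before scores from different administrations are compared.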

8.

Measurement, Sampling, and Equating Errors in Large-Scale Assessments

Margaret Wu 《Educational Measurement》2010,29(4):15-27

In large-scale assessments, such as state-wide testing programs, national sample-based assessments, and international comparative studies, measuring and reporting student achievement involves many steps, and each step is a potential source of inaccuracy. It is therefore of interest to identify the sources and magnitudes of the errors in the measurement process that may threaten the validity of the final results, so that assessment designers can improve assessment quality by focusing on the areas that pose the greatest threats. This paper discusses the relative magnitudes of three main sources of error (measurement error, sampling error, and equating error) with reference to the objectives of assessment programs. A number of examples from large-scale assessments illustrate these errors and their impact on the results. The paper concludes with a number of recommendations for improving the accuracy of large-scale assessment results.
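The abstract does not give formulas for combining the three error sources. Under the simplifying assumption that measurement, sampling, and equating errors are independent, their variances add, and each component's share of the total variance shows where improvement effort would pay off most. This is a generic sketch, not necessarily the paper's own formulation; the function names are hypothetical.

```python
import math

def total_standard_error(se_measurement, se_sampling, se_equating):
    """Combine error components assuming independence (variances add)."""
    return math.sqrt(se_measurement**2 + se_sampling**2 + se_equating**2)

def variance_shares(**components):
    """Fraction of total error variance contributed by each named component."""
    total = sum(se**2 for se in components.values())
    if total == 0:
        return {name: 0.0 for name in components}
    return {name: se**2 / total for name, se in components.items()}
```

For example, with standard errors of 3, 4, and 0 for measurement, sampling, and equating, the combined standard error is 5 and sampling contributes 64% of the variance, so enlarging the sample would reduce the total error more than lengthening the test.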

9.

A New Framework for Test Score Equating

Xie Xiaoqing (谢小庆) 《Examinations Research》(考试研究) 2008,(2):4-17

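Equating test scores is not only an important step in ensuring test reliability and fairness but also a core step in building item banks and implementing computerized adaptive testing. Educational Measurement, compiled under the joint sponsorship of the American Council on Education (ACE) and the National Council on Measurement in Education (NCME), has been called the "bible" of the educational measurement field. Its fourth edition, published in 2006, proposed a new framework for test score equating. This article introduces that framework and discusses equating issues in light of the author's many years of practical experience with score equating.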

10.

On-demand testing and maintaining standards for general qualifications in the UK using item response theory: possibilities and challenges

Qingping He 《Educational research; a review for teachers and all concerned with progress in education》2013,55(1):89-112

Background: Although on-demand testing is being increasingly used in many areas of assessment, it has not been adopted in high stakes examinations like the General Certificate of Secondary Education (GCSE) and General Certificate of Education Advanced level (GCE A level) offered by awarding organisations (AOs) in the UK. One of the major issues with on-demand testing is that some of the methods used for maintaining the comparability of standards over time in conventional testing are no longer available, so new methods must be developed.

Purpose: This paper proposes an item response theory (IRT) framework for implementing on-demand testing and maintaining the comparability of standards over time for general qualifications, including GCSEs and GCE A levels, in the UK and discusses procedures for its practical implementation.

Sources of evidence: Sources of evidence include literature from the fields of on-demand testing, the design of computer-based assessment, the development of IRT, and the application of IRT in educational measurement.

Main argument: On-demand testing presents many advantages over conventional testing. In view of the nature of general qualifications, including the use of multiple components and multiple question types, the advances made in item response modelling over the past 30 years, and the availability of complex IRT analysis software systems, coupled with increasing IRT expertise in awarding organisations, IRT models could be used to implement on-demand testing in high stakes examinations in the UK. The proposed framework represents a coherent and complete approach to maintaining standards in on-demand testing. The procedures for implementing the framework discussed in the paper could be adapted by practitioners to suit their own needs and circumstances.

Conclusions: The use of IRT to implement on-demand testing could prove to be one of the viable approaches to maintaining standards over time or between test sessions for UK general qualifications.