
1.

李雪《牡丹江教育学院学报》2010,(6):159-161

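To safeguard the quality of language test items and strengthen item-bank construction, this paper applies classical test theory, using Gitest Ⅲ to run an item analysis on the reading section of a college entrance examination (Gaokao) paper. The results show that the difficulty and discrimination of the reading items are fairly satisfactory, but the distribution of difficulty is not. The author recommends pilot-testing papers assembled from the item bank before use, so as to improve the difficulty distribution and the quality of some items' options, thereby raising the reliability and validity of the test.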
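As a rough illustration of the kind of item analysis described in this abstract, the sketch below computes the classical test theory difficulty index, that is, the proportion of examinees who answer each item correctly. It is a minimal sketch and not the Gitest Ⅲ procedure; the function name `item_difficulty` and the score matrix are hypothetical.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Return the proportion of correct answers per item (CTT difficulty).

    `responses` is a matrix with one row per examinee and one column per
    item, scored 1 (correct) or 0 (incorrect). A higher value means an
    easier item.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[j] for row in responses) / n_examinees
        for j in range(n_items)
    ]


# Hypothetical scored data: 4 examinees x 3 items (illustration only).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

p_values = item_difficulty(scores)
print(p_values)  # [0.75, 0.75, 0.25]
```

Listing the values for a whole paper gives a quick view of the difficulty distribution, which is the property this abstract reports as unsatisfactory.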

2.

Implications of the PISA Scientific Literacy Assessment Framework

《中学生物教学》2017,(21)

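In examining how the PISA test sets, and maps between, its scientific literacy assessment framework and specific items, the article introduces in turn the framework's scientific competency dimensions, its classification of scientific knowledge, item contexts and cognitive-level demands, and the criteria for evaluating difficulty levels. It then offers suggestions for assessing students' scientific inquiry and rational thinking literacy in China.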

3.

An Analysis of the Current State of Phonetics Testing in the Practical Chinese Proficiency Certification Test (C.TEST)

聂丹《考试研究》2009,(4):79-89

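Drawing on item data from past administrations of the Practical Chinese Proficiency Certification Test (C.TEST), this paper analyzes the C.TEST phonetics section in terms of paper quality, item-type performance, and types of testing point. The results show that the phonetics items discriminate well overall but are generally easy; read-and-discriminate items are markedly harder than listen-and-discriminate items and also discriminate better; and items targeting different types of testing point differ little in mean discrimination but differ somewhat in difficulty. The author argues that the most pressing task for the C.TEST phonetics section is to raise overall item difficulty, especially that of listen-and-discriminate items and of items testing initials and initial-final combinations.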

4.

Item Discrimination as Seen in Question 22 of the 2009 Zhejiang Gaokao Mathematics Paper (Science Track)

周德生, 王红卫《中学教研》2009,(8):38-40

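The Gaokao is a selective examination, so its items must discriminate among candidates to help institutions at different tiers select students. Discrimination is the degree to which an item distinguishes among candidates with different levels of knowledge and ability. If an item is answered correctly (scored high) by higher-ability candidates and incorrectly (scored low) by lower-ability candidates, its discriminating power is strong. An item's discrimination index reflects how strong this discriminating power is. A value of 0.3 or above is generally considered acceptable; items below 0.3 discriminate poorly.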
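The definition above can be made concrete with one common way of computing a discrimination index: the difference between the proportions answering correctly in the highest-scoring and lowest-scoring groups. The abstract does not say which formula the authors use, so the upper/lower 27% split, the function name `discrimination_index`, and the data below are assumptions for illustration only.

```python
from typing import List

ACCEPTABLE_THRESHOLD = 0.3  # cut-off quoted in the abstract


def discrimination_index(
    responses: List[List[int]], item: int, fraction: float = 0.27
) -> float:
    """Upper-lower group discrimination index D = p_upper - p_lower.

    Examinees are ranked by total score; the top and bottom `fraction`
    (27% is a conventional choice, assumed here) form the two groups.
    Ties at the group boundary are broken by input order.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Hypothetical scored data: 10 examinees x 4 items (illustration only).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

for j in range(4):
    d = discrimination_index(scores, j)
    verdict = "acceptable" if d >= ACCEPTABLE_THRESHOLD else "weak"
    print(f"item {j}: D = {d:.2f} ({verdict})")
```

Other indices, such as the point-biserial correlation between item score and total score, are also in common use and may give different values for the same item.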

5.

Development and Dimensionality Evaluation of a PISA-Style Chinese Reading Test

曹亦薇, 顾秋艳《考试研究》2010,(4):80-92

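PISA focuses on students' lifelong development, and its approach to test design has profoundly changed educational assessment in many countries. Building on the theory and framework of the PISA reading assessment, this study developed a PISA-style Chinese reading test comprising three reading passages and 18 items. The test's difficulty, discrimination, reliability, and validity were examined, and its dimensionality was evaluated with a full-information bifactor model. The results show that the test is of moderate difficulty, discriminates well, and has broadly acceptable reliability and validity. It also largely meets PISA's requirements for the ability structure of a reading test, adequately measuring students' general reading comprehension as well as the three sub-dimensions of information retrieval, text interpretation, and reflection and evaluation.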

6.

The Effect of Scoring Methods for Constructed-Response Reading Items in the Gaokao Chinese Test on Item Parameter Analysis

温红博, 李峰《考试研究》2020,(1):65-73

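To address the limitations of the current scoring method for constructed-response reading items in the Gaokao Chinese test, this study proposes a classification-based scoring method grounded in SOLO theory and a scoring method based on the construction-integration (CI) model of the reading process. Authentic responses from 1,019 students to three constructed-response reading items were scored with all three methods, and the items were analyzed psychometrically using item response theory. Compared with the original scoring method, the SOLO and CI methods yielded higher inter-item correlations, better model fit, higher item discrimination, more reasonable difficulty thresholds and step sizes for item scores, and greater item information, and the CI method clearly outperformed the SOLO method. The study supports the potential advantages of the CI scoring method for constructed-response reading items in the Gaokao Chinese test.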

7.

Predicting the Difficulty of C.TEST Reading Comprehension Items with Artificial Neural Networks

付佩宣《暨南大学华文学院学报》2014,(4):71-78

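The Practical Chinese Proficiency Certification Test (C.TEST) assesses the ability of non-native speakers of Chinese to use the language in everyday social life and work in international settings. Because C.TEST items are released publicly and the item bank is small, it is difficult to obtain item difficulty parameters through field testing on a sample of the target population, the approach that standardized tests normally use. Artificial neural networks, a product of modern artificial intelligence research, have proved highly effective in prediction tasks. This paper takes reading comprehension items from the C.TEST (Levels A to D) as material and uses an artificial neural network to predict their difficulty; the network's predicted difficulty values correlate significantly with the difficulty values observed in actual administrations. This result indicates that using artificial neural network models to predict item parameters such as difficulty in language tests is feasible.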

8.

A Study of the Theory-Based Validity and Item Discrimination of TEM-4 Reading Comprehension

丛进《烟台师范学院学报(哲学社会科学版)》2008,(3):81-84

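Using quantitative and qualitative methods, this study examines the theory-based validity and item discrimination of the reading comprehension section of the Test for English Majors, Band 4 (TEM-4). The results show that the section has fairly satisfactory theory-based validity, but its item discrimination needs further improvement. The study also provides some empirical support for the decision in the new test syllabus to drop the speed-reading section.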

9.

A Validity Analysis of the Reading Section of PRETCO Level A

王丽《考试周刊》2011,(54):9-10

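Taking the reading comprehension sections of the Practical English Test for Colleges (PRETCO), Level A, from 2005 to 2010 as test material, the author analyzes the content validity of sampled texts in terms of topic and genre, the reading skills and reading speed assessed, and readability. The results show that the reading section of PRETCO Level A has fairly high content validity but has some problems with reading skills and difficulty, for which the author offers suggestions for improvement.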

10.

Designing Difficulty Appropriately to Strengthen the Local Discrimination Function of Test Items

邹丽华《大连教育学院学报》2014,(1):20-22

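Examining the relationship between item discrimination and difficulty in senior high school entrance examination (Zhongkao) physics items from two perspectives, classical test theory and Zhongkao practice, the study finds that controlling item difficulty alone is not enough to control discrimination. By studying the relationship between item score rates and total physics scores, it explores differences in local discrimination among items of equal or similar difficulty and discrimination, and investigates methods for designing difficulty and local discrimination in Zhongkao items, together with their application.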

11.

The Item Writer: A Factor Affecting the Validity of Reading Comprehension Tests

李雪, 曾用强《考试研究》2012,(4):49-60

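Applying item response theory, this study analyzes how the writers of multiple-choice items affect the validity of reading comprehension tests, from the two perspectives of examinee reading ability and item difficulty. In the experimental design, two groups of examinees took the same reading proficiency test, and the ability estimates derived from it showed no significant difference between the groups. The two groups then each took items written by one of two different item writers. Although all the items were based on the same reading passages and their difficulty values did not differ significantly, the groups' performance differed significantly. Under the Rasch model, examinee performance is determined jointly by examinee ability and item difficulty. It can therefore be inferred that the items written by different item writers affected examinee performance, and in turn the validity of using multiple-choice items to test reading comprehension.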
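The claim that performance depends jointly on ability and difficulty can be illustrated with the standard dichotomous Rasch model, in which the probability of a correct answer depends only on the gap between the two. This is a minimal sketch of the textbook formula, not the authors' estimation procedure; the function name and the example values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    P(correct) = 1 / (1 + exp(-(ability - difficulty))), with ability and
    difficulty expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values on the logit scale (illustration only).
for ability in (-1.0, 0.0, 1.0):
    for difficulty in (-0.5, 0.5):
        p = rasch_probability(ability, difficulty)
        print(f"ability={ability:+.1f} difficulty={difficulty:+.1f} P={p:.3f}")
```

Under this model, two groups with equal ability facing items of equal difficulty should perform alike, which is why a significant performance gap points to some other influence, here the item writer.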

12.

Applying Implicational Scaling to the Study of Fairness in the HSK Reading Comprehension Test

柴省三《考试研究》2012,(5):54-62

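Whether the passages selected for a reading comprehension test are fair in content to examinee subgroups with different disciplinary backgrounds is important evidence of test validity and an important part of test validation. With examinees majoring in Chinese language and literature as the target group and examinees majoring in economics and in biomedicine as the two reference groups, this study combines criterion measurement with implicational scaling analysis to test whether passage difficulty in the HSK (Advanced) reading comprehension test is fair to the three groups. The results show that, although the two reference groups each have their own disciplinary advantages, the difficulty ordering of the six reading passages for them is identical to that for the target group. Although the target group has no disciplinary advantage beyond knowledge of Chinese, the passages selected for the HSK make no specialized demands beyond language knowledge itself, so the test is highly fair to examinees from all three backgrounds.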

13.

A Brief Analysis of Test Method Effects in Reading Comprehension Tests

冯悦《广东技术师范学院学报》2007,(6):89-93

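This paper investigates whether different test methods, multiple choice and information transfer, produce a test method effect in reading comprehension tests. In addition to analyzing students' test scores, the study analyzes item difficulty values, which were computed using item response theory (IRT). The results show that test method does affect item difficulty and examinee performance; in terms of item difficulty, information transfer is harder than multiple choice.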

14.

Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the difficulty levels of assessment items

Gerard van de Watering, Janine van der Rijt 《Educational Research Review》2006,1(2):133-147

In today's higher education, high quality assessments play an important role. Little is known, however, about the degree to which assessments are correctly aimed at the students’ levels of competence in relation to the defined learning goals. This article reviews previous research into teachers’ and students’ perceptions of item difficulty. It focuses on the item difficulty of assessments and students’ and teachers’ abilities to estimate item difficulty correctly. The review indicates that teachers tend to overestimate the difficulty of easy items and underestimate the difficulty of difficult items. Students seem to be better estimators of item difficulty. The accuracy of the estimates can be improved by: the information the estimators or teachers have about the target group and their earlier assessment results; defining the target group before the estimation process; the possibility of having discussions about the defined target group of students and their corresponding standards during the estimation process; and by the amount of training in item construction and estimating. In the subsequent study, the ability and accuracy of teachers and students to estimate the difficulty levels of assessment items was examined. In higher education, results show that teachers are able to estimate the difficulty levels correctly for only a small proportion of the assessment items. They overestimate the difficulty level of most of the assessment items. Students, on the other hand, underestimate their own performances. In addition, the relationships between the students’ perceptions of the difficulty levels of the assessment items and their performances on the assessments were investigated. Results provide evidence that the students who performed best on the assessments underestimated their performances the most. Several explanations are discussed and suggestions for additional research are offered.

15.

Natural Reading Summarization Training in Intermediate and Advanced Chinese Reading Instruction

郭凌云《重庆师专学报》2012,(5):142-146

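Training in natural reading summarization in intermediate and advanced Chinese reading instruction can broaden students' reading horizons, improve their ability to summarize and analyze texts, compensate for what classroom reading activities lack, help resolve the bottleneck in intermediate and advanced Chinese learning, strengthen international students' reading cognition, and serve the overall goals of Chinese language teaching. During training, teachers should guide and train students in the following respects: first, offer students some choices and references for their reading; second, cultivate students' awareness of reading comprehensively; third, encourage students to read material related to classroom texts; fourth, regulate students' reading according to their individual cognitive characteristics; fifth, train students in strategies for summarizing what they read; and sixth, give weight to presenting, summarizing, and evaluating students' reading outcomes.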

16.

The Relationship Between Item Parameters and Item Fit

Hamzeh Dodeen 《Journal of Educational Measurement》2004,41(3):261-270

The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data. Nine tests were simulated using 1,000 persons, 50 items, three levels of item discrimination, three levels of item difficulty, and three levels of guessing. The item fit was estimated using two fit statistics: the likelihood ratio statistic (X²_B), and the standardized residuals (SRs). All the item parameters were simulated to be normally distributed. Results showed that the levels of item discrimination and guessing affected the item-fit values. As the level of item discrimination or guessing increased, item-fit values increased and more items misfit the model. The level of item difficulty did not affect the item-fit statistic.
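A simulation of this general kind can be sketched as follows. The abstract names discrimination, difficulty, and guessing parameters but not the response model, so the three-parameter logistic (3PL) model is assumed here; the parameter means, standard deviations, clipping bounds, and function names are all hypothetical, and no fit statistic is computed.

```python
import math
import random
from typing import List, Tuple


def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """P(correct) under the 3PL model: c + (1 - c) / (1 + exp(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(
    n_persons: int, n_items: int, seed: int = 0
) -> Tuple[List[List[int]], List[Tuple[float, float, float]]]:
    """Simulate a dichotomous response matrix and return it with the item parameters."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    items = []
    for _ in range(n_items):
        # Normally distributed parameters; the values are illustrative
        # assumptions, clipped to keep them in a plausible range.
        a = max(0.2, rng.gauss(1.0, 0.2))              # discrimination
        b = rng.gauss(0.0, 1.0)                        # difficulty
        c = min(0.35, max(0.0, rng.gauss(0.2, 0.05)))  # guessing
        items.append((a, b, c))
    data = [
        [
            1 if rng.random() < three_pl_probability(theta, a, b, c) else 0
            for (a, b, c) in items
        ]
        for theta in abilities
    ]
    return data, items


# Same sample size and test length as the study: 1,000 persons, 50 items.
data, item_params = simulate_responses(1000, 50)
print(len(data), len(data[0]))
```

Repeating the run with different means for `a`, `b`, and `c` would mirror the study's three levels of each parameter.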

17.

Effects of Item Wording on Sex Bias

Joyce R. McLarty, A. Candace Noble, Renee M. Huntley 《Journal of Educational Measurement》1989,26(3):285-293

This study examined the effects of gender-related item-wording changes on the performance of male and female examinees. Mathematics word problems and English language items were created in neuter, male, and female versions. Items were administered to randomly equivalent samples of about 300 high school juniors and seniors. Loglinear analysis was used to assess the impact of item gender and its interaction with examinee sex on the difficulty and discrimination of each item in each context. No items were found to have sex bias in either context. Mathematics items did not have different difficulty or discrimination in the three gender versions. Neither mathematics nor English items had different discrimination levels in the three gender-related versions. Some English items, however, were found to have different difficulty levels in the three gender-related versions. These difficulty differences were not systematic: none of the three gender versions appeared consistently more or less difficult than the others.

18.

The Relationship of Content Characteristics of GRE Analytical Reasoning Items to Their Difficulties and Discriminations

Clark L. Chalifour, Donald E. Powers 《Journal of Educational Measurement》1989,26(2):120-132

In actual test development practice, the number of test items that must be developed and pretested is typically greater, and sometimes much greater, than the number that is eventually judged suitable for use in operational test forms. This has proven to be especially true for one item type, analytical reasoning, that currently forms the bulk of the analytical ability measure of the GRE General Test. This study involved coding the content characteristics of some 1,400 GRE analytical reasoning items. These characteristics were correlated with indices of item difficulty and discrimination. Several item characteristics were predictive of the difficulty of analytical reasoning items. Generally, these same variables also predicted item discrimination, but to a lesser degree. The results suggest several content characteristics that could be considered in extending the current specifications for analytical reasoning items. The use of these item features may also contribute to greater efficiency in developing such items. Finally, the influence of these various characteristics also provides a better understanding of the construct validity of the analytical reasoning item type.