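Compared with selection examinations, standards-based academic proficiency tests place greater emphasis on assessing the learning achievement students have attained, and thereby serve to monitor teaching quality. This requires academic proficiency test items to align closely with the curriculum standards. Starting from the basic principles of educational statistics and drawing on a statistical analysis of the 2015 Shanghai General Senior High School Academic Proficiency Test in physics, this study takes a quantitative approach and attempts to build a new, simpler, and more reasonable model for describing the alignment of academic proficiency test items. It then discusses how this alignment model can be put to use.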

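The senior high school information technology academic proficiency test (qualifying level) is a criterion-referenced test. Item writers must therefore control difficulty in line with the test objectives and requirements: by accurately estimating item difficulty they control the difficulty of the whole paper, so that test results are consistent with the test objectives. Difficulty-control techniques in item writing include estimating item difficulty and controlling paper difficulty. The paper explores methods for estimating item difficulty in three steps: identifying the main objective factors that affect difficulty, designing a simple and practical method for calculating item difficulty, and establishing a reference model for item difficulty estimation. It then uses worked examples to explore techniques for controlling paper difficulty.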
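The abstract does not give its calculation method. A common way to roll item-level estimates up to a paper-level figure in classical test theory is a marks-weighted mean of item difficulty indices p, where p is the expected mean score divided by full marks, so a higher p means an easier item. The sketch below follows that convention. The `Item` class, `difficulty_gap` helper, target value, and tolerance are all hypothetical illustrations, not the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    full_marks: float
    # Predicted difficulty index p: expected mean score / full marks.
    # In classical test theory p runs from 0 to 1, and a HIGHER p means an EASIER item.
    predicted_p: float


def paper_difficulty(items: list[Item]) -> float:
    """Marks-weighted mean of item p values, i.e. expected paper mean / paper total."""
    total = sum(it.full_marks for it in items)
    if total <= 0:
        raise ValueError("the paper must have a positive total mark")
    return sum(it.full_marks * it.predicted_p for it in items) / total


def difficulty_gap(items: list[Item], target: float, tolerance: float = 0.02) -> float:
    """Signed gap (target - predicted) if it exceeds the tolerance, else 0.0.

    A positive gap means the paper is predicted to be harder than intended
    (p too low); a negative gap means it is predicted to be too easy.
    """
    gap = target - paper_difficulty(items)
    return gap if abs(gap) > tolerance else 0.0


if __name__ == "__main__":
    # Hypothetical four-item paper; all values are illustrative only.
    paper = [
        Item("Q1", 2, 0.95),
        Item("Q2", 2, 0.90),
        Item("Q3", 4, 0.80),
        Item("Q4", 12, 0.70),
    ]
    print(round(paper_difficulty(paper), 3))  # 0.765
    print(round(difficulty_gap(paper, target=0.85), 3))  # 0.085: predicted too hard
```

Weighting by full marks makes the paper figure equal to the expected total score divided by the paper total. A single 12-mark item therefore moves the paper difficulty far more than a 2-mark item.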

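Existing studies of the alignment between physics academic proficiency tests and content standards do not yet cover the tests of every province in China. In 2017 Sichuan Province will formally implement its new college entrance examination reform plan, and the State Council has stated that academic proficiency test results will be included in the comprehensive evaluation system for college admissions. Taking Sichuan as a case, this paper uses the SEC alignment method to analyze quantitatively the 2013-2015 Sichuan physics academic proficiency tests against the Physics Curriculum Standards for General Senior High Schools (Experimental), treating the science and humanities tracks separately. The analysis shows that the tests are not aligned with the standards to a statistically significant degree in either cognitive level or content topics. Improving the alignment between academic proficiency tests and curriculum standards is an effective way of building and refining the academic proficiency test system, and the paper closes with several reflections on the characteristics of academic proficiency test items.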
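The SEC method summarizes alignment with Porter's alignment index. Both the test and the standard are coded into the same content-topic by cognitive-level matrix, each matrix is converted to cell proportions x_i and y_i, and the index is P = 1 - Σ|x_i - y_i| / 2. The minimal sketch below computes P only. Deciding whether P is statistically significant requires a separate critical value, for example one estimated by simulation, and that step is not shown. The example matrices are hypothetical.

```python
def porter_alignment_index(test: list[list[float]], standard: list[list[float]]) -> float:
    """Porter's alignment index P = 1 - sum(|x_i - y_i|) / 2.

    `test` and `standard` are content-topic x cognitive-level matrices of the
    same shape (e.g. item counts or score points per cell). Each matrix is
    converted to cell proportions before comparison, so P ranges from 0
    (no overlap) to 1 (identical distributions).
    """
    if len(test) != len(standard) or any(len(r) != len(s) for r, s in zip(test, standard)):
        raise ValueError("matrices must have the same shape")
    x = [v for row in test for v in row]
    y = [v for row in standard for v in row]
    sx, sy = sum(x), sum(y)
    if sx <= 0 or sy <= 0:
        raise ValueError("each matrix needs a positive total")
    return 1 - sum(abs(a / sx - b / sy) for a, b in zip(x, y)) / 2


if __name__ == "__main__":
    # Hypothetical 2 topics x 2 cognitive levels.
    test_counts = [[2, 0], [1, 1]]
    standard_counts = [[1, 1], [1, 1]]
    print(porter_alignment_index(test_counts, standard_counts))  # 0.75
```

Because both matrices are normalized first, the test and the standard can be coded in different units, such as item counts for one and emphasis ratings for the other.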

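To examine the alignment between senior high school physics academic proficiency test items (qualifying level) and the curriculum standards, this study uses the SEC model to analyze the degree of alignment in terms of knowledge content and cognitive level. It also examines how the items attempt to assess core competencies. The results show that the 2022 qualifying-level physics tests of three provinces are not aligned with the curriculum standards to a statistically significant degree. The items make positive attempts to assess core competencies in physics, but some aspects still need improvement. Finally, the study makes three recommendations: assess competencies comprehensively according to the content requirements of the curriculum standards; set item difficulty and item types sensibly according to the academic quality requirements; and achieve valid assessment of scientific inquiry ability grounded in the nature of physics.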

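This paper describes the ability structure and knowledge structure of the life science paper of the 2011 Shanghai General Senior High School Academic Proficiency Test and analyzes some of its items. It explains the "criterion-referenced" philosophy of the academic proficiency test and, on that basis, offers targeted suggestions on several issues in future teaching.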

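Item difficulty is usually obtained by actually testing examinees, but such pretesting has certain limitations. Subjective difficulty estimation does not depend on examinees: subject-matter experts predict item difficulty from experience. For this reason it is widely used in practice, for example in the zhongkao and gaokao (the senior high school and college entrance examinations). In research and practice, researchers have continually refined subjective estimation and proposed different estimation methods. This paper introduces the traditional subjective judgment method and the paired-comparison method of difficulty estimation. Its aim is to give a more systematic understanding of subjective difficulty estimation and to promote its use in testing practice.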
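Paired-comparison judgments ("which of these two items is harder?") must be turned into a difficulty scale. The abstract does not say which scaling model the paper uses. One common choice is the Bradley-Terry model, which can be fitted with a simple minorize-maximize (MM) iteration. The sketch below assumes that model. `wins[i][j]` is a hypothetical tally of how often experts judged item i harder than item j, and the output is a relative scale (log-strengths centred at 0), not a predicted p value.

```python
import math


def bradley_terry(wins: list[list[int]], max_iter: int = 1000, tol: float = 1e-10) -> list[float]:
    """Fit a Bradley-Terry model to paired-comparison judgments with the MM algorithm.

    wins[i][j] = number of times experts judged item i HARDER than item j.
    Returns log-strengths with mean 0; a larger value means the item was judged harder.
    Finite estimates exist only if, for every split of the items into two groups,
    some item in each group was judged harder than some item in the other.
    This sketch only checks the simplest failure (an item with no wins).
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            w_i = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j]) for j in range(n) if j != i)
            if w_i == 0 or denom == 0:
                raise ValueError(f"item {i} has no wins or no comparisons; no finite estimate")
            new.append(w_i / denom)
        # Rescale so log-strengths average 0 (the model is identified only up to scale).
        shift = sum(math.log(v) for v in new) / n
        new = [v / math.exp(shift) for v in new]
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return [math.log(v) for v in p]


if __name__ == "__main__":
    # Hypothetical tallies: each pair of three items judged 5 times.
    wins = [
        [0, 4, 4],
        [1, 0, 4],
        [1, 1, 0],
    ]
    print([round(s, 3) for s in bradley_terry(wins)])
```

The scale is relative only. To report it in the same units as empirical difficulty, a few anchor items with known p values would be needed.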

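News report: On February 15, 2015, the Shanghai Municipal Education Commission published the "Implementation Measures for the Shanghai General Senior High School Academic Proficiency Test" (draft for comment) and the "Implementation Measures for the Comprehensive Quality Evaluation of Shanghai General Senior High School Students" (draft for comment) on its government website (www.shmec.gov.cn) to solicit public comment. In September 2014, the State Council issued the "Implementation Opinions on Deepening the Reform of the Examination and Enrollment System", which selected Shanghai and Zhejiang to pilot comprehensive reform of the college entrance examination. To meet the needs of the city's deepening senior high school curriculum reform and its comprehensive reform of college examinations and admissions, the Shanghai Municipal Education Commission drafted the "Shanghai General Senior High School Academic Proficiency Test Implementation…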

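Difficulty determines the distribution of test scores and directly affects a test's evaluation and selection functions, so it receives considerable attention. Difficulty is usually obtained by statistical analysis of test data after the administration. By that point the passing score has already been set and can no longer be adjusted. Predicted difficulty, by contrast, is obtained at the item-writing stage: item writers make reasoned judgments from the item content by constructing a standard norm. Because results are currently reported as raw scores, predicted difficulty is especially important for keeping the test stable and fair. Unlike empirical difficulty, predicted difficulty can be controlled and adjusted. Item writing is therefore not only a process of writing items but also a process of designing predicted difficulty.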
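Predicted difficulty only proves its worth when it is checked against empirical difficulty after the administration. The minimal sketch below makes that comparison with the classical index p (mean observed score / full marks) and reports each item's deviation plus the mean absolute error. The score matrix, the predicted values, and the 0.10 flagging threshold are all hypothetical choices, not taken from the paper.

```python
def empirical_p(scores: list[float], full_marks: float) -> float:
    """Classical difficulty index: mean observed score / full marks (higher = easier)."""
    return sum(scores) / (len(scores) * full_marks)


def compare_difficulty(
    predicted: list[float],
    score_matrix: list[list[float]],
    full_marks: list[float],
    threshold: float = 0.10,
) -> tuple[list[tuple[int, float, float, bool]], float]:
    """Compare predicted p values with post-test empirical p values.

    score_matrix has one row per examinee and one column per item.
    Returns (rows, mean_abs_error); each row is
    (item_index, predicted_p, empirical_p, flagged), where flagged means the
    absolute deviation exceeds the (hypothetical) threshold.
    """
    rows = []
    for j, p_pred in enumerate(predicted):
        column = [examinee[j] for examinee in score_matrix]
        p_emp = empirical_p(column, full_marks[j])
        rows.append((j, p_pred, p_emp, abs(p_emp - p_pred) > threshold))
    mae = sum(abs(r[2] - r[1]) for r in rows) / len(rows)
    return rows, mae


if __name__ == "__main__":
    # Hypothetical data: 4 examinees, a 1-mark item and a 4-mark item.
    predicted = [0.90, 0.55]
    scores = [[1, 2], [1, 0], [0, 4], [1, 2]]
    rows, mae = compare_difficulty(predicted, scores, full_marks=[1, 4])
    for row in rows:
        print(row)
    print(round(mae, 3))
```

Items that are flagged repeatedly point to systematic misjudgment by item writers, which can feed back into the next round of difficulty estimation.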

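On October 28, 2009, after six years of preparation and several revisions, the "Implementation Rules for the Shanghai General Senior High School Academic Proficiency Test (Trial)" were officially published. Starting in 2010, Shanghai will introduce the "academic proficiency test" across all general senior high schools.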

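The junior high school academic proficiency test in ideology and morality is an important part of the teaching process and matters greatly for improving classroom teaching in the subject and raising the overall quality of education. As this year's junior high school academic proficiency test approaches, teachers are paying increasing attention to what last year's test reveals about item-writing principles, content scope, difficulty, and item formats, so that ninth-grade exam preparation can be more targeted and effective. With this in mind, the author takes the 2011 Heze City junior high school academic proficiency test in ideology and morality as an example and discusses several issues that deserve attention in current exam preparation, for the benefit of readers. I. Basic knowledge and basic skills must be strengthened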


Understanding what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses the degree to which 15-year-old students have acquired the knowledge and skills essential for full participation in society. Our research question is which characteristics of PISA science items could influence an item's proficiency level. The study combines an a priori item analysis with a statistical analysis. Results show that, of the PISA science item characteristics identified in our a priori analysis, only cognitive complexity and item format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by whether the item depends on the information provided in the unit and/or item introduction, nor by the competence assessed. We conclude that in PISA it appears possible to anticipate a high proficiency level, that is, low student scores, for items displaying high cognitive complexity. For items of middle or low cognitive complexity, however, the cognitive complexity level is not sufficient to predict item difficulty, and other characteristics play a crucial role. We discuss anticipating difficulties in assessment from a broader perspective.

Zhu Yu, Examinations Research (《考试研究》), 2008, (2): 107-115
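The senior high school academic proficiency test is a new type of educational examination that emerged alongside the new curriculum reform. It assesses secondary school students' mastery of knowledge, subject literacy, and learning ability. It should become one of the important evaluation indicators for college admissions, and it can also provide useful information to employers. When assembling the test, the first things to check are whether the items cover the content required by the curriculum standards reasonably comprehensively and whether the items span a sufficiently wide range of difficulty. As one of the indicators for college admissions, passing the academic proficiency test or reaching a certain grade should become a key prerequisite for admission. Which grade a candidate needs in order to be admitted ultimately depends on the requirements of the admitting institution and program.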

Standard item response theory (IRT) models fit to examination responses ignore the fact that sets of items (testlets) are often matched with a single common stimulus (e.g., a reading comprehension passage). In this setting, it is unlikely that all items given to an examinee are conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstating precision may lead to inaccurate inferences. It may also end examinations prematurely when the stopping rule is based on the estimated standard error of examinee proficiency (e.g., in an adaptive test). The standard three-parameter IRT model was modified to include an additional random effect for items nested within the same testlet (Wainer, Bradlow, & Du, 2000). This parameter, γ, characterizes the amount of local dependence in a testlet.
We fit 86 TOEFL testlets (50 reading comprehension and 36 listening comprehension) with the new model and obtained a value for the variance of γ for each testlet. We compared the standard parameters (discrimination (a), difficulty (b), and guessing (c)) with those obtained through traditional modeling. We found that difficulties were well estimated either way, but estimates of both a and c were biased when conditional independence was incorrectly assumed. Of greater import, we found that test information was substantially overestimated when conditional independence was incorrectly assumed.
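The testlet model above keeps the 3PL form but shifts each item's location by an examinee-by-testlet effect: P(correct) = c + (1 - c) · logistic(a(θ - b - γ)). Setting γ = 0 recovers the standard 3PL model. The sketch below implements that response function and shows why a shared γ breaks conditional independence: at a fixed θ, responses to two items in the same testlet become positively correlated. The item parameters, γ standard deviation, and helper names are hypothetical. This is a simulation illustration, not the estimation procedure used in the study.

```python
import math
import random


def logistic(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def p_correct(theta: float, a: float, b: float, c: float, gamma: float = 0.0) -> float:
    """3PL testlet model: P = c + (1 - c) * logistic(a * (theta - b - gamma))."""
    return c + (1.0 - c) * logistic(a * (theta - b - gamma))


def simulate_testlet(theta, items, gamma_sd, rng):
    """Simulate one examinee's 0/1 responses to one testlet.

    items is a list of (a, b, c). One gamma ~ N(0, gamma_sd^2) is shared by all
    items in the testlet, which induces local dependence given theta.
    """
    gamma = rng.gauss(0.0, gamma_sd)
    return [int(rng.random() < p_correct(theta, a, b, c, gamma)) for a, b, c in items]


def response_covariance(theta, items, gamma_sd, n_reps=5000, seed=1):
    """Sample covariance of the first two items' responses at a FIXED theta."""
    rng = random.Random(seed)
    pairs = [simulate_testlet(theta, items, gamma_sd, rng)[:2] for _ in range(n_reps)]
    m1 = sum(p[0] for p in pairs) / n_reps
    m2 = sum(p[1] for p in pairs) / n_reps
    return sum((p[0] - m1) * (p[1] - m2) for p in pairs) / (n_reps - 1)


if __name__ == "__main__":
    # Hypothetical three-item testlet: (a, b, c) per item.
    items = [(1.2, 0.0, 0.2), (1.0, 0.5, 0.2), (0.8, -0.5, 0.2)]
    print(round(response_covariance(0.0, items, gamma_sd=0.0), 4))  # near 0
    print(round(response_covariance(0.0, items, gamma_sd=1.5), 4))  # clearly positive
```

A model that ignores γ treats this covariance as extra independent evidence about θ. That is the mechanism behind the overestimated test information reported above.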

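The general senior high school academic proficiency test developed rapidly along with the new senior high school curriculum reform. Since Heilongjiang Province introduced the test, it has achieved notable results, but many problems remain. To resolve these difficulties, it is worth trying to build a provincial item bank for the general senior high school academic proficiency test, on the basis of rigorous and standardized item writing, so as to advance the province's academic proficiency test work more effectively.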

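An analysis and evaluation of the 2003 Guangdong Province senior high school graduation examination in physics identifies four features of the paper. It emphasizes basic knowledge and abilities, connects closely with physical phenomena in everyday life, stresses the testing of experimental skills, and links physics knowledge closely with high technology. For future physics item writing, the paper recommends fewer items that rely on rote memorization, a moderate reduction in item difficulty, a moderate increase in open-ended items, and close alignment with the new syllabus.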

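Using the SOLO taxonomy, this study compares the ability structure of the 2020 chemistry items of the general senior high school academic proficiency (graded) examination in the four provinces and municipalities that implemented the new college entrance examination that year (Beijing, Tianjin, Shandong, and Hainan), in order to identify how the items were constructed. The study finds that the 2020 chemistry items of the four provinces and municipalities cover every SOLO structural level. Multistructural and relational items make up a relatively high proportion, reflecting the selective function of the test. The SOLO levels are distributed evenly across knowledge modules overall, and the content is comprehensive and emphasizes basic knowledge, reflecting the foundational character of the test. To further optimize chemistry teaching and promote students' all-round development, secondary school chemistry teachers can take the following steps. First, align with the academic quality standards and plan the depth and breadth of teaching as a whole. Second, focus on solving real-world problems and strengthen the development of subject literacy and key abilities. Third, make greater use of teaching contexts and foster innovation and transfer ability. Finally, optimize lesson design and develop students' higher-order thinking.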

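The technical indicators for modeling the structure of a zhongkao (senior high school entrance examination) chemistry paper include the paper's structural pattern, content elements, ability elements, item-type elements, difficulty elements, score elements, and time-limit elements. These indicators serve as the standard for writing, reviewing, evaluating, and monitoring zhongkao chemistry items, and as the basis for quality control of both items and papers. When expressed as a multidimensional specification table for zhongkao chemistry item writing, these indicators can be used directly to guide item writing.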

Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no published study manipulated proficiency variance ratio (VR). Data were generated with the three-parameter logistic (3PL) IRT model. Proficiency VRs were 1, 2, 3, and 4. The present study suggests inflation may be greater, and may affect all highly discriminating items (low, moderate, and high difficulty), when IRT proficiency distributions of reference and focal groups differ also in variances. Inflation was greatest on the 21-item test (vs. 41) and 2,000 total sample size (vs. 1,000). Previous studies had not systematically examined sample size ratio. Sample size ratio of 1:1 produced greater TIE inflation than 3:1, but primarily for total sample size of 2,000.
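The statistic under study works on one 2×2 table (group × correct/incorrect) per matched score level k. It compares the observed count of correct reference-group responses, ΣA_k, with its expectation, ΣE(A_k) = Σ n_Rk m_1k / T_k, scaled by the hypergeometric variance. The sketch below computes the continuity-corrected MH chi-square, its 1-df p-value, the common odds ratio α_MH, and the ETS delta-scale value MH D-DIF = -2.35 ln α_MH. It assumes examinees have already been matched into strata, for example on total score. The counts in the example are hypothetical.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics for one studied item.

    strata: iterable of (A, B, C, D) counts at each matched score level:
        A = reference correct, B = reference incorrect,
        C = focal correct,     D = focal incorrect.
    Returns (chi2_mh, p_value, alpha_mh, mh_d_dif).
    """
    sum_a = sum_ea = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t < 2:
            continue  # variance undefined; the stratum carries no information
        n_ref, n_foc = a + b, c + d
        m_correct, m_wrong = a + c, b + d
        sum_a += a
        sum_ea += n_ref * m_correct / t
        sum_var += n_ref * n_foc * m_correct * m_wrong / (t * t * (t - 1))
        num += a * d / t
        den += b * c / t
    if sum_var == 0 or num == 0 or den == 0:
        raise ValueError("degenerate tables: statistic undefined")
    # Continuity-corrected MH chi-square, 1 degree of freedom.
    chi2 = max(0.0, abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    alpha = num / den  # common odds ratio; > 1 favors the reference group
    mh_d_dif = -2.35 * math.log(alpha)  # ETS delta scale
    return chi2, p_value, alpha, mh_d_dif


if __name__ == "__main__":
    # Hypothetical single stratum with a large group difference.
    print(mantel_haenszel([(30, 10, 10, 30)]))
```

Running such a function over many simulated no-DIF datasets and counting p < .05 gives the empirical Type-I error rate examined in the study.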

Constituting a metacognitive strategy, system competence or systems thinking can only assume its assigned key function as a basic concept for the school subject of geography in Germany after a theoretical and empirical foundation has been established. A measurement instrument is required which is suitable both for supporting students and for the evaluation of methodical‐didactic measures. Such a tool is theoretically anchored in an empirically validated geography‐didactic and cognition‐psychological competence model, providing a differentiated representation of both the internal structure of a competency and the proficiency levels. The starting point of this foundation was the development of a normative‐theoretically derived model of geographic system competence. Its empirical validation was performed in different phases aimed at operationalising the competence model by means of test problems. In order to analyse the factor structure of the theoretical model, various item response models were estimated. To verify the model of proficiency levels, the item difficulties expected from the competence model were related to the empirical item difficulties and used to predict them by ordinary least squares regression. The two‐dimensional competence model, with the dimensions ‘system organisation and behaviour’ and ‘system‐adequate intention to act’, exhibits a better fit on the model fit criteria than the one‐dimensional and three‐dimensional models. The correlations between the expected and empirical item difficulties are positive: items that should be more difficult according to the competence model are actually shown to be more difficult. These findings support the reliability and validity of this new measurement instrument for diagnosing and promoting geographical system competence. The next step is to implement it in practice.
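The proficiency-level check described above comes down to two numbers: the correlation between expected and empirical item difficulties, and an OLS line predicting empirical difficulty from the expected level. The sketch below computes both from scratch. The expected levels and empirical difficulties in the example are hypothetical placeholders, not the study's data. In the study, empirical difficulties come from the fitted item response models.

```python
import math


def _centered_sums(x: list[float], y: list[float]) -> tuple[float, float, float, float, float]:
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between expected and empirical difficulties."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Simple OLS y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    mx, my, sxy, sxx, syy = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical: expected proficiency level per item vs. empirical difficulty (logits).
    expected = [1, 1, 2, 2, 3, 3, 4, 4]
    empirical = [-1.4, -0.9, -0.6, -0.1, 0.2, 0.6, 0.9, 1.5]
    print(round(pearson_r(expected, empirical), 3))
    print(tuple(round(v, 3) for v in ols_fit(expected, empirical)))
```

A positive slope with a reasonably high r² is the pattern the abstract reports: items placed at higher levels by the model turn out to be harder empirically.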

