Similar Literature
20 similar documents retrieved.
1.
Research on the Senior High School Academic Proficiency Examination (II): Evaluating Examination Quality
周群 《考试研究》2012,(6):20-28
The quality of the items and forms of an academic proficiency examination directly affects the validity and reliability of its evaluative and diagnostic results. Taking Shanghai's senior high school academic proficiency examination in Ideology and Politics as an example, this paper quantitatively evaluates examination quality from four angles: differential item and testlet functioning, item score-total score correlations, discrimination index analysis, and classification consistency and accuracy, thereby illustrating methods for evaluating the quality of academic proficiency examinations.
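Two of the four quantities named in the abstract can be sketched directly. Below is a minimal illustration in Python of corrected item-total correlation and a classic upper-lower (27%) discrimination index; the simulated 0/1 data and the 27% grouping convention are assumptions for illustration, not the paper's Shanghai operational data.

```python
# A minimal sketch: corrected item-total correlation and the upper-lower
# discrimination index on a simulated 0/1 response matrix (illustrative data).
import numpy as np

def item_total_correlation(responses):
    """Each item's correlation with the total score of the remaining items."""
    total = responses.sum(axis=1)
    return np.array([np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
                     for j in range(responses.shape[1])])

def discrimination_index(responses, frac=0.27):
    """Upper-group minus lower-group proportion correct, per item."""
    order = np.argsort(responses.sum(axis=1))
    k = max(1, round(frac * len(order)))
    return responses[order[-k:]].mean(axis=0) - responses[order[:k]].mean(axis=0)

rng = np.random.default_rng(0)
ability, difficulty = rng.normal(size=(200, 1)), rng.normal(size=(1, 10))
resp = (rng.random((200, 10)) < 1 / (1 + np.exp(difficulty - ability))).astype(int)
print(item_total_correlation(resp).round(2))
print(discrimination_index(resp).round(2))
```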

2.
Accuracy and consistency are the basic requirements for the quality of classification indexing. Through a discussion of accuracy and consistency in book classification indexing, this paper examines the forms that inaccurate and inconsistent indexing takes, analyzes the underlying causes, and recommends effective measures for improving the quality of classification indexing work.

3.
In recent years, items requiring case analysis (classification discussion) have become a focus of the senior high school entrance examination in mathematics. On recent examination papers, open-ended and exploratory items often serve as the culminating problems, and the great majority of candidates lose points on them. Much of that loss stems from a lack of rigor in logical reasoning and a lack of thoroughness in analyzing a problem from all sides. Training in case-based thinking is therefore essential for candidates who want to avoid losing too many points on such items, and case analysis is likewise a major difficulty in teaching. To cope with these items, which occupy an ever larger share of examination papers, teachers are actively studying how to teach case analysis in mathematics and how to develop students' case-based thinking. Accordingly, this paper focuses on how to establish the idea of case analysis in junior high school mathematics learning.

4.
Equations and systems of equations, together with functions, are important topics in junior high school mathematics; items combining the two are both a frequent focus and a recognized difficulty of examinations. Solving them requires strong comprehension, the ability to collect and process information, and the ability to apply methods such as case analysis and the integration of algebraic and geometric reasoning to practical problems. The ways functions are combined with equations and systems in junior high school mathematics fall into the following categories.

5.
Research on the SOLO Taxonomy Evaluation Method and Its Application
刘艳 《宜春学院学报》2008,30(Z1):158-160
SOLO taxonomy theory classifies observed learning outcomes into five levels: prestructural, unistructural, multistructural, relational, and extended abstract. If these five levels are assigned different grade scores, the quality of a student's response can be quantified, and the resulting scores can serve as the basis for summative evaluation; in this way, the SOLO method can be used to assess the level of students' thinking. This paper explores a general method for constructing test items with the SOLO taxonomy, designs test items for relevant mathematical content, and tests and analyzes them in practice. Based on the results, the characteristics of SOLO taxonomy evaluation and its remaining problems are identified.
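As a trivial illustration of the quantification step the abstract describes, one might assign the five SOLO levels ordinal scores and aggregate them; the 0-4 mapping below is an assumption, since the paper does not fix particular values.

```python
# A trivial sketch of quantifying SOLO levels as ordinal scores (0-4 mapping
# is assumed, not specified by the paper).
SOLO_SCORES = {"prestructural": 0, "unistructural": 1, "multistructural": 2,
               "relational": 3, "extended_abstract": 4}

def summative_score(rated_levels):
    """Mean SOLO score over the levels a rater assigned to one student."""
    return sum(SOLO_SCORES[lvl] for lvl in rated_levels) / len(rated_levels)

print(summative_score(["unistructural", "relational", "multistructural"]))  # 2.0
```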

6.
Functions and inequalities are important topics in junior high school mathematics, and items combining them play a prominent role on the senior high school entrance examination. Such items test not only the ideas of integrating algebraic and geometric reasoning, functional thinking, and case analysis, but also reading comprehension, the ability to collect and process information, and the ability to apply mathematical knowledge to practical problems. Below, several examples of the use of inequalities with functions are selected for readers to study and draw on.

7.
The alignment between the general senior high school academic examination and the basic teaching requirements is a topical issue in curriculum evaluation in Shanghai. Based on Webb's alignment analysis model, this study quantitatively examined the items from one administration of the 2018 Shanghai senior high school Information Technology academic examination against the 《上海市高中信息科技学科教学基本要求》 (Shanghai Basic Teaching Requirements for Senior High School Information Technology) along four dimensions: categorical concurrence, depth-of-knowledge consistency, range-of-knowledge correspondence, and balance of representation. The study offers a reference approach for analyzing the quality of Information Technology academic examination items.
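For readers unfamiliar with Webb's model, the sketch below computes the four statistics for a single standard using commonly cited thresholds (at least 6 items, 50%, 50%, balance index 0.7); the thresholds and the data layout are assumptions, not the Shanghai study's exact procedure.

```python
# A sketch of Webb-style alignment statistics for one standard. Thresholds
# follow common Webb guidelines and are assumptions here.
from collections import Counter

def webb_alignment(items, objectives):
    """items: list of (objective_id, item_dok); objectives: {id: dok}."""
    hits = Counter(obj for obj, _ in items)
    total = sum(hits.values())
    categorical = total >= 6                              # categorical concurrence
    dok_ok = sum(d >= objectives[o] for o, d in items) / len(items) >= 0.5
    range_ok = len(hits) / len(objectives) >= 0.5         # range of knowledge
    balance = 1 - sum(abs(1 / len(hits) - h / total) for h in hits.values()) / 2
    return categorical, dok_ok, range_ok, balance >= 0.7

objectives = {"O1": 2, "O2": 3, "O3": 2}
items = [("O1", 2), ("O1", 3), ("O2", 3), ("O2", 2), ("O1", 2), ("O3", 2), ("O3", 3)]
print(webb_alignment(items, objectives))   # (True, True, True, True)
```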

8.
In recent years, comprehensive items combining functions with triangles have appeared frequently on senior high school entrance examinations as a test of students' ability. This paper surveys the common types of function problems involving special triangles and similar triangles, focusing on how the idea of case analysis yields their solution strategies.

9.
Efficient (simplified) computation is a very important topic in the "Numbers and Algebra" strand of primary school mathematics. Mastering it allows students to simplify flexibly those whole-number mixed-operation exercises that admit simplification, and it lays the groundwork for later simplification of mixed operations with decimals and fractions, so how to teach this content well deserves careful study. This paper analyzes the characteristics and classification of simplification exercises, how to judge whether an exercise can be simplified, and several special types of simplification, with the ultimate goal of improving students' ability to compute efficiently.

10.
《考试周刊》2019,(77):38-39
Test items are an important means of evaluating the quality and effectiveness of primary school Chinese-language teaching, but whether their content and form are reasonable must be judged by applying a dedicated evaluation tool, on the basis of which the items can then be revised and improved. The SOLO taxonomy is consistent with a layered evaluation of learning quality, so using it to evaluate competency-oriented primary school Chinese test items has real practical value. This approach evaluates open-ended items more scientifically and comprehensively, brings the quality of students' thinking into the evaluation, and fully reflects the student-centered direction in which item evaluation is developing.

11.
Background: A recent article published in Educational Research on the reliability of results in National Curriculum testing in England (Newton, The reliability of results from national curriculum testing in England, Educational Research 51, no. 2: 181–212, 2009) suggested that: (1) classification accuracy can be calculated from classification consistency; and (2) classification accuracy on a single test administration is higher than classification consistency across two tests.

Purpose: This article shows that it is not possible to calculate classification accuracy from classification consistency. It then shows that, given reasonable assumptions about the distribution of measurement error, the expected classification accuracy on a single test administration is higher than the expected classification consistency across two tests only in the case of a pass–fail test, but not necessarily for tests that classify test-takers into more than two categories.

Main argument and conclusion: Classification accuracy is defined in terms of a ‘true score’ specified in a psychometric model. Three things must be known or hypothesised in order to derive a value for classification accuracy: (1) a psychometric model relating observed scores to true scores; (2) the location of the cut-scores on the score scale; and (3) the distribution of true scores in the group of test-takers.
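A minimal sketch of how those three ingredients combine, assuming a normal true-score distribution and normally distributed measurement error (both chosen purely for illustration): the error model and cut-scores give conditional category probabilities, accuracy averages the probability of landing in the true category, and consistency averages the probability that two parallel administrations agree.

```python
# A minimal sketch combining the three ingredients: a normal error model
# (psychometric model), two cut-scores (three categories), and an assumed
# N(0, 1) true-score distribution. All numeric values are illustrative.
import numpy as np
from scipy import stats

cuts = [-0.5, 0.5]                       # cut-scores on the score scale
sem = 0.4                                # standard error of measurement
true_scores = np.random.default_rng(1).normal(0.0, 1.0, 50_000)

edges = np.array([-np.inf, *cuts, np.inf])
cdfs = stats.norm.cdf(edges[:, None], loc=true_scores, scale=sem)   # (4, N)
probs = np.diff(cdfs, axis=0).T          # P(observed category k | true score)

true_cat = np.digitize(true_scores, cuts)
accuracy = probs[np.arange(len(true_scores)), true_cat].mean()
consistency = (probs**2).sum(axis=1).mean()   # two parallel forms agree
print(f"accuracy={accuracy:.3f}, consistency={consistency:.3f}")
```

Re-running with a single cut versus two lets one check the article's claim about when expected accuracy exceeds expected consistency.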

12.
This article presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a 4-parameter beta model. The conditional distribution of scores on an alternate form, given the true score, is estimated from a binomial distribution based on the estimated effective test length. Agreement between classifications on alternate forms is estimated by assuming conditional independence, given the true score. Evaluation of the method showed estimates to be within 1 percentage point of the actual values in most cases. Estimates of decision accuracy and decision consistency statistics were only slightly affected by changes in specified minimum and maximum possible scores.
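The sketch below follows the outline of that procedure on simulated proportion-correct scores, with two stated simplifications: scores are already scaled to [0, 1] (so the minimum/maximum adjustment drops out), and the true-score distribution is fitted as a two-parameter rather than four-parameter beta by the method of moments. The reliability value would come from the data (e.g., coefficient alpha); here it is assumed.

```python
# A simplified sketch of the single-form (Livingston-Lewis style) procedure:
# reliability -> effective test length under the binomial error model,
# method-of-moments beta fit to true scores, then accuracy and consistency
# assuming conditional independence. All inputs are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
scores = rng.binomial(40, rng.beta(7, 3, 5000)) / 40   # proportion-correct scores
rel, cut = 0.85, 0.6                                   # assumed reliability, cut

mu, var = scores.mean(), scores.var()
# Effective test length implied by the binomial error model:
n_eff = max(1, round((mu * (1 - mu) - rel * var) / (var * (1 - rel))))

tv = rel * var                            # true-score variance
common = mu * (1 - mu) / tv - 1           # method-of-moments beta fit
a, b = mu * common, (1 - mu) * common

p = np.linspace(0.001, 0.999, 999)        # grid over true proportion scores
w = stats.beta.pdf(p, a, b)
w /= w.sum()
k = int(np.ceil(cut * n_eff))             # minimum passing raw score
pass_prob = 1 - stats.binom.cdf(k - 1, n_eff, p)       # alternate-form pass prob

true_pass = p >= cut
accuracy = np.sum(w * np.where(true_pass, pass_prob, 1 - pass_prob))
consistency = np.sum(w * (pass_prob**2 + (1 - pass_prob)**2))   # cond. indep.
print(n_eff, round(accuracy, 3), round(consistency, 3))
```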

13.
With the shift in conceptions of educational measurement at home and abroad, the purely relative (norm-referenced) information provided by traditional norm-referenced tests no longer meets the needs of test users and examinees, and the social value of the criterion-referenced test (CRT) is receiving growing attention. In a CRT that makes classification decisions about examinees' mastery, determining an appropriate test length and passing score is a key factor affecting classification error. Building on a review of the current state, principles, and uses of CRT research, this paper introduces the theory and procedure of the binomial probability model for deciding CRT test length and, taking error control as the guiding principle, studies the application of the binomial model to decisions about the length and passing score of a comprehensive criterion-referenced language test.
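A minimal sketch of the binomial-model length decision: find the shortest test for which examinees just above and just below the mastery boundary are misclassified with probability under a tolerance. The mastery levels (0.80 for masters, 0.60 for non-masters), the 70% cut, and the 10% tolerance are illustrative assumptions, not the paper's values.

```python
# A minimal sketch of the binomial-model test length decision under an
# error-control criterion. All numeric values are illustrative.
from scipy import stats

def misclassification(n, cut_frac, p_master, p_non):
    """False-negative rate for masters, false-positive rate for non-masters."""
    cut = int(round(cut_frac * n))                       # minimum passing score
    false_neg = stats.binom.cdf(cut - 1, n, p_master)    # master falls below cut
    false_pos = 1 - stats.binom.cdf(cut - 1, n, p_non)   # non-master reaches cut
    return false_neg, false_pos

for n in range(5, 101, 5):
    fn, fp = misclassification(n, 0.70, 0.80, 0.60)
    if max(fn, fp) < 0.10:
        print(f"shortest acceptable length: {n} items (FN={fn:.3f}, FP={fp:.3f})")
        break
```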

14.
A common suggestion made in the psychometric literature for fixed‐length classification tests is that one should design tests so that they have maximum information at the cut score. Designing tests in this way is believed to maximize the classification accuracy and consistency of the assessment. This article uses simulated examples to illustrate that one can obtain higher classification accuracy and consistency by designing tests that have maximum test information at locations other than at the cut score. We show that the location where one should maximize the test information is dependent on the length of the test, the mean of the ability distribution in comparison to the cut score, and, to a lesser degree, whether or not one wants to optimize classification accuracy or consistency. Analyses also suggested that the differences in classification performance between designing tests optimally versus maximizing information at the cut score tended to be greatest when tests were short and the mean of the ability distribution was further away from the cut score. Larger differences were also found in the simulated examples that used the 3PL model compared to the examples that used the Rasch model.
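The comparison is easy to set up under the Rasch model, where item information is P(1 - P) and test information is its sum over items. The sketch below contrasts a 20-item test peaked at an assumed cut score with one peaked elsewhere; the cut score and difficulty values are illustrative, not the article's simulation design.

```python
# A minimal sketch of test information under the Rasch model, comparing
# difficulty placements relative to an assumed cut score.
import numpy as np

def rasch_test_info(theta, difficulties):
    """Test information: sum of P(1-P) over items at each theta."""
    p = 1 / (1 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (p * (1 - p)).sum(axis=1)

theta = np.linspace(-3, 3, 121)
cut = 1.0
designs = {"peaked at cut": np.full(20, cut),
           "peaked off cut": np.full(20, 0.25)}

for name, diffs in designs.items():
    info = rasch_test_info(theta, diffs)
    at_cut = rasch_test_info(np.array([cut]), diffs)[0]
    print(f"{name}: info at cut = {at_cut:.2f}, "
          f"peak at theta = {theta[info.argmax()]:.2f}")
```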

15.
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because simulations can represent best-case scenarios for methodology, it is important to verify results with a real data application. This study administered a domain of elementary algebra (EA) items created from operational test forms. An IRT-based group-level domain score was estimated from responses to a subset of taken items (comprised of EA items from a single operational form) and compared to the actual observed domain score. Domain item parameters were calibrated both using item responses from the special study and from national operational administrations of the items. The accuracy of the domain score estimates was evaluated within schools and across school sizes for each set of parameters. The IRT-based domain score estimates typically were closer to the actual domain score than observed performance on the EA items from the single form. Previously simulated findings for the IRT-based domain score estimation procedure were supported by the results of the real data application.
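One plausible reading of the IRT-based group estimate is the group's expected proportion correct over the full item domain, evaluated at each examinee's ability estimate from the taken form. The 2PL model and every value below are assumptions for illustration; the study's operational calibration is not reproduced.

```python
# A minimal sketch of an IRT-based group domain score: expected proportion
# correct over all domain items at each examinee's ability estimate.
import numpy as np

rng = np.random.default_rng(3)
theta_hat = rng.normal(0, 1, 500)        # ability estimates from the taken form
a = rng.lognormal(0, 0.3, 80)            # discriminations for the full domain
b = rng.normal(0, 1, 80)                 # difficulties for the full domain

p = 1 / (1 + np.exp(-a * (theta_hat[:, None] - b)))   # (examinee, item) probs
print(round(p.mean(), 3))                # estimated group-level domain score
```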

16.
Studies investigating invariance have often been limited to measurement or prediction invariance. Selection invariance, wherein the use of test scores for classification results in equivalent classification accuracy between groups, has received comparatively little attention in the psychometric literature. Previous research suggests that some form of selection bias (lack of selection invariance) will exist in most testing contexts, where classification decisions are made, even when meeting the conditions of measurement invariance. We define this conflict between measurement and selection invariance as the invariance paradox. Previous research has found test reliability to be an important factor in minimizing selection bias. This study demonstrates that the location of maximum test information may be a more important factor than overall test reliability in minimizing decision errors between groups.

17.
In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5.  This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
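The 0.5 threshold follows from a one-line mean-squared-error comparison. Under the classical model X = T + E with reliability ρ (a simplification of the abstract's universe-score framing):

```latex
% Classical model X = T + E, reliability \rho = \sigma_T^2 / \sigma_X^2.
% Estimating T by the observed score, versus assigning everyone the mean:
\operatorname{MSE}(\hat{T} = X)     = \sigma_E^2 = (1 - \rho)\,\sigma_X^2, \qquad
\operatorname{MSE}(\hat{T} = \mu_T) = \sigma_T^2 = \rho\,\sigma_X^2 .
% The test-based estimate is more accurate exactly when
(1 - \rho)\,\sigma_X^2 < \rho\,\sigma_X^2 \iff \rho > 0.5 .
```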

18.
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and classification consistency and accuracy under three item response theory (IRT) frameworks: unidimensional IRT (UIRT), simple structure multidimensional IRT (SS-MIRT), and bifactor multidimensional IRT (BF-MIRT) models. Illustrative examples are presented using data from three mixed-format exams with various levels of format effects. In general, the two MIRT models produced similar results, while the UIRT model resulted in consistently lower estimates of reliability and classification consistency/accuracy indices compared to the MIRT models.

19.
In this article, procedures are described for estimating single-administration classification consistency and accuracy indices for complex assessments using item response theory (IRT). This IRT approach was applied to real test data comprising dichotomous and polytomous items. Several different IRT model combinations were considered. Comparisons were also made between the IRT approach and two non-IRT approaches including the Livingston-Lewis and compound multinomial procedures. Results for various IRT model combinations were not substantially different. The estimated classification consistency and accuracy indices for the non-IRT procedures were almost always lower than those for the IRT procedures.
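The workhorse of such single-administration IRT estimates for summed-score tests is the Lord-Wingersky recursion, which gives the conditional score distribution given ability. The sketch below handles only dichotomous 2PL items (the article also covers polytomous models); the item parameters, cut score, and N(0, 1) ability weights are illustrative assumptions.

```python
# A sketch of single-administration IRT classification indices for a summed
# score: the Lord-Wingersky recursion gives P(score = s | theta); a cut score
# then yields conditional accuracy/consistency, averaged over ability.
import numpy as np

def summed_score_dist(theta, a, b):
    """Lord-Wingersky recursion via convolution, dichotomous 2PL items."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    dist = np.array([1.0])
    for pj in p:
        dist = np.convolve(dist, [1 - pj, pj])   # fold in one more item
    return dist

a, b = np.full(30, 1.2), np.linspace(-2, 2, 30)
cut = 18                                         # pass if summed score >= 18
nodes = np.linspace(-4, 4, 81)                   # quadrature over ability
weights = np.exp(-nodes**2 / 2)
weights /= weights.sum()

accuracy = consistency = 0.0
for th, w in zip(nodes, weights):
    dist = summed_score_dist(th, a, b)
    p_pass = dist[cut:].sum()
    true_pass = (1 / (1 + np.exp(-a * (th - b)))).sum() >= cut   # true-score rule
    accuracy += w * (p_pass if true_pass else 1 - p_pass)
    consistency += w * (p_pass**2 + (1 - p_pass)**2)
print(round(accuracy, 3), round(consistency, 3))
```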

20.
To address the limitations of current methods for scoring the constructed-response reading items on the college entrance examination in Chinese, this study proposes a SOLO-taxonomy scoring method and a construction-integration (CI) model scoring method grounded in the cognitive processes of reading. Authentic responses from 1,019 students to three constructed-response reading items were scored under all three methods, and the items were analyzed psychometrically using item response theory. Compared with the original scoring method, the SOLO and CI methods produced higher correlations among the items, better model fit, higher item discrimination, more reasonable difficulty thresholds and step parameters for item scores, and greater item information, with the CI method clearly outperforming the SOLO method. The findings support the potential advantage of the CI scoring method for the constructed-response reading items on the college entrance examination in Chinese.

