Similar literature
 20 similar documents found; search took 234 ms
1.
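Building on test-length theory, this paper examines the theoretical foundations, conditions of applicability, and operational procedures of the binomial model, the indifference-zone model, and the item response theory model, which researchers abroad commonly use in studying the length of criterion-referenced tests, with the aim of providing a reference for domestic research on criterion-referenced test length.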
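The abstract does not describe the procedures themselves, so the following is only a minimal sketch of the general binomial idea, under stated assumptions: an examinee with true domain proportion `true_proportion` answers each of `n_items` items correctly and independently, and the test is lengthened until both a clear master and a clear non-master are misclassified with probability at most `max_error`. The function names, the parameters, and the search limit `max_items` are hypothetical and do not come from the paper.

```python
from math import ceil, comb


def pass_probability(n_items, cut_score, true_proportion):
    """P(X >= cut_score) for X ~ Binomial(n_items, true_proportion)."""
    p = true_proportion
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(cut_score, n_items + 1)
    )


def shortest_test(cut_proportion, master_level, nonmaster_level,
                  max_error, max_items=200):
    """Smallest n for which both error rates are at most max_error.

    Returns None if no length up to max_items qualifies (assumed limit).
    """
    for n in range(1, max_items + 1):
        # Round before ceil to avoid floating-point artifacts such as 7.000000001.
        cut = ceil(round(cut_proportion * n, 9))
        false_negative = 1 - pass_probability(n, cut, master_level)
        false_positive = pass_probability(n, cut, nonmaster_level)
        if false_negative <= max_error and false_positive <= max_error:
            return n
    return None
```

The gap between `master_level` and `nonmaster_level` plays the role of a region in which classification errors are tolerated; narrowing that gap or lowering `max_error` increases the required length.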

2.
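罗莲 (Luo Lian), 《中国考试》, 2007, (6): 18-22
Drawing on the Standards for Educational and Psychological Testing, compiled by the American Educational Research Association and other organizations, this paper discusses the meaning, use, and relationship of the terms "norm-referenced" and "criterion-referenced". The Standards hold that scores from the same test can be given both norm-referenced and criterion-referenced interpretations. The two are distinguished by how scores are interpreted; they are not two different kinds of test. The earlier binary division of tests into "norm-referenced tests" and "criterion-referenced tests" is therefore inappropriate.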

3.
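Setting the Passing Standard for Professional Qualification Examinations   Total citations: 2, self-citations: 0, citations by others: 2
1 Statement of the problem. Determining the passing standard (also called the cut-off standard or passing score) is an extremely important theoretical and technical problem in criterion-referenced testing. Professional qualification examinations are typical criterion-referenced tests: their goal is to estimate examinees' level of knowledge and skill effectively and compare it with the passing standard, so as to judge and decide on examinees' qualification to practice. Technically, the passing standard is also the basis and precondition for later statistical analysis; without a scientific and effective passing standard, the various technical and policy problems of professional qualification examinations are difficult to resolve reasonably.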

4.
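Drawing on research into criterion-referenced tests, and taking the theory of criterion-referenced test construction as its basis and item response theory as its guide, this paper presents the process of constructing a criterion-referenced test, using an operating systems examination subject as the example.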

5.
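A Review of the Evolution of the Conceptual Framework of "Test Linking"   Total citations: 1, self-citations: 0, citations by others: 1
程乾 (Cheng Qian), 《考试研究》, 2013, (2): 71-79
Test linking is an important area of research in psychological and educational measurement. It uses statistical methods to express the scores of one test in the score units of another, or to place the scores of two tests on a common score scale. Although test linking has a long research history, different scholars classify it differently. Some classification terms are identical, yet their definitions differ widely, which has caused great confusion among researchers and practitioners. It is therefore necessary to trace the conceptual framework of linking and its changes from a historical perspective, so that test linking can be better understood and applied.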

6.
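Starting from the score systems of several major criterion-referenced tests at home and abroad, this paper examines their similarities and differences, to provide a reference for future criterion-referenced test score systems.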

7.
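Test Norms   Total citations: 1, self-citations: 0, citations by others: 1
In educational measurement, to interpret, evaluate, and use test scores correctly, we must rely on some frame of reference. A norm is one such frame of reference.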

8.
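A Study Testing the Validity of the Angoff Method   Total citations: 2, self-citations: 0, citations by others: 2
The Angoff method is a commonly used method for setting cut scores in criterion-referenced tests. This study proposes concepts and computational methods such as the quality coefficient and the corrected Angoff cut score, investigates how to test the validity of the Angoff method, and examines the correlation between judges' personal characteristics and the quality coefficient. The study confirms that the cut score obtained by the Angoff method is valid.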
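For orientation, the basic Angoff computation can be sketched as follows. This is only the textbook form of the method: each judge estimates, per item, the probability that a minimally competent examinee answers correctly, and the cut score is the sum over items of the judges' mean estimates. It does not implement the quality coefficient or the corrected cut score proposed in the study, whose formulas the abstract does not give. The function name and the data layout are assumptions made for illustration.

```python
def angoff_cut_score(ratings):
    """Basic Angoff cut score on the raw-score scale.

    ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly (hypothetical layout).
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    # Average the judges' estimates for each item, then sum across items.
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges
        for i in range(n_items)
    ]
    return sum(item_means)
```

Dividing the result by the number of items gives the cut score as a proportion correct.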

9.
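In criterion-referenced testing, research on methods for determining the passing score has always received particular attention; it has not only theoretical significance but also practical guiding value. Depending on the type of information relied on, the passing score can be determined from the following three perspectives.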

10.
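Test length is one of the important factors affecting the reliability and validity of language tests. Using the fixed-facet s×(i:p) nested design of Generalizability Theory (GT) and the Law of Diminishing Marginal Utility, this paper reports an empirical study of the test length of China's Hanyu Shuiping Kaoshi (HSK [Intermediate]). The results show that the 130-item HSK [Intermediate] test has quite high reliability, with a generalizability coefficient (Eρ²) of 0.8890. Even when the number of items is reduced to 120 or 110, the generalizability coefficient still reaches 0.8856 and 0.8816 (decreases of 0.38% and 0.83%, respectively). Such a reduction in test length not only clearly lowers development costs but also improves testing efficiency; it fully meets the relatively high error-control requirements of standardized examinations and ensures that test results and score interpretations have relatively high reliability and validity.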
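The percentage decreases quoted in this abstract are relative changes against the 130-item coefficient, which the short check below reproduces. The three coefficients are the ones reported in the abstract; the function name is hypothetical.

```python
def relative_drop_percent(baseline, shortened):
    """Relative decrease of a coefficient, as a percentage of the baseline."""
    return (baseline - shortened) / baseline * 100


full_length = 0.8890  # generalizability coefficient reported for 130 items
for n_items, coefficient in [(120, 0.8856), (110, 0.8816)]:
    drop = relative_drop_percent(full_length, coefficient)
    print(f"{n_items} items: {drop:.2f}% lower")
```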

11.
A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. This coefficient is based on deviations of scores from the criterion score, rather than from the mean. The coefficient is shown to have several of the important properties of the conventional norm-referenced reliability coefficient, including its interpretation as a ratio of variances and as a correlation between parallel forms, its relationship to test length, its estimation from a single form of a test, and its use in correcting for attenuation due to measurement error. Norm-referenced measurement is considered as a special case of criterion-referenced measurement.
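The abstract does not state the formula. A coefficient of this kind is commonly written as K² = (ρσ² + (μ − C)²) / (σ² + (μ − C)²), where ρ is the conventional reliability, σ² the observed-score variance, μ the mean, and C the criterion score. The sketch below assumes that form; the function and parameter names are hypothetical.

```python
def criterion_referenced_reliability(reliability, score_variance,
                                     mean_score, criterion_score):
    """K^2 = (rho * var + (mean - C)^2) / (var + (mean - C)^2).

    Assumed form of the coefficient; see the hedged note above.
    """
    squared_shift = (mean_score - criterion_score) ** 2
    return (reliability * score_variance + squared_shift) / (
        score_variance + squared_shift
    )
```

When the criterion score equals the mean, the shift term vanishes and the value reduces to the conventional reliability, which matches the abstract's remark that norm-referenced measurement is a special case. For example, with reliability 0.6, variance 25, mean 50, and criterion 60, the value is 115/125 = 0.92.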

12.
In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
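One way to see where the 0.5 threshold comes from is the following sketch in classical test theory notation, which is an assumption here since the abstract is phrased in terms of universe scores. It takes X = T + E with uncorrelated T and E, reliability ρ = σ_T² / σ_X², and ignores sampling error in the mean μ.

```latex
\begin{align*}
\text{Error of using the observed score: } & E\big[(X - T)^2\big] = \sigma_E^2 = (1 - \rho)\,\sigma_X^2 \\
\text{Error of assigning the mean to everyone: } & E\big[(\mu - T)^2\big] = \sigma_T^2 = \rho\,\sigma_X^2 \\
\text{Observed score is more accurate iff: } & (1 - \rho)\,\sigma_X^2 < \rho\,\sigma_X^2 \iff \rho > \tfrac{1}{2}
\end{align*}
```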

13.
This article focuses on the practical use of Bloom's Taxonomy of Educational Objectives. The current status of analyzing and classifying test items and behavioral objectives was examined in this study. Specifically, the purpose of this study was to analyze and classify the ISIS minicourse performance objectives and criterion-referenced test items according to Bloom's cognitive Taxonomy in order to determine the levels of cognition at which the ISIS instructional materials are directed. The performance objectives and test items of thirty-three ISIS minicourses and criterion-referenced tests were collected and classified. Four research questions were posed in the study. The findings indicate that ISIS minicourse test items and performance objectives are written primarily at the Knowledge and Comprehension levels. The ISIS instructional materials reflect low percentages of upper cognitive level test items and performance objectives. Based upon the use of a chi-square analysis, twenty-four of the ISIS minicourses and tests demonstrate a positive congruence between their performance objectives and criterion-referenced test items. Nine ISIS minicourses were found to demonstrate a negative relationship between their performance objectives and test items. Implications and recommendations based on the findings of the study are provided.

14.
The propositions advanced and defended in this article are: (1) that it is more urgent for educators to reach agreement on their general purposes and goals than to specify in detail the outcomes they seek; (2) that insistence on detailed statements of educational objectives is questionable; (3) that teachers should be more concerned with developing a pupil's cognitive resources than with changing his behavior; (4) that criterion-referenced measures should supplement, not supplant, norm-referenced measures; and (5) that conventional test statistics are appropriate for criterion-referenced tests if they are based on appropriate test responses.

15.
Due to variation in test difficulty, the use of pre-fixed cut-off scores in criterion-referenced standard setting methods may lead to variation in grades and pass rates. This paper aims to empirically investigate the strength of this relationship. To this end we examine a dataset of over 500 observations from an institution of higher education in the Netherlands over the period 2008–2013. We measure variation in test difficulty by using students' perceptions of the validity of the examination and by recording personnel changes in the primary instructor. The latter measure is based on the considerable variation in teachers' ability to assess test difficulty that is found in the literature. Other explanatory variables are course evaluations, instructor evaluations and self-reported study time. Variation in student quality is controlled for by measuring course results in deviation from the cohort average. We take a panel approach in estimating the effect of the explanatory variables on the variability in grades and pass rates. Our findings indicate that exam validity and instructor change are significantly related to variation in test results. The latter finding supports the hypothesis that instructors' difficulty in assessing test difficulty may introduce subjectivity in criterion-referenced standard setting methods.

16.
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity estimation consistent with this philosophy are suggested. Also, to improve the decision-making accuracy of criterion-referenced tests, a Bayesian procedure for estimating true mastery scores has been proposed. This Bayesian procedure uses information about other members of a student's group (collateral information), but the resulting estimation is still criterion referenced rather than norm referenced in that the student is compared to a standard rather than to other students. In theory, the Bayesian procedure increases the "effective length" of the test by improving the reliability, the validity, and more importantly, the decision-making accuracy of the criterion-referenced test scores.

17.
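At present, most English examinations at Chinese universities use multiple-choice items; from nationwide unified examinations down to the final examination of a single course, multiple-choice items can account for as much as 85% of the test. Examinations of this form are also labeled "standardized tests" and "objective items" and are considered "easy to mark", which has led people to form a non-objective, incomplete understanding of such items. Practice has shown that the shortcomings and negative effects of such items are real and, in some respects, quite serious. A fair evaluation of multiple-choice items is therefore of great importance for the proper orientation of college English examinations and for the training of English-language talent.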

18.
Several meanings of the term multiple measures exist. One of these is the use of assessments from different sources, such as an external test, along with a state-developed test. The use of multiple sources is increasing, especially due to increased federal Title I requirements for state accountability programs and associated increases in the amount and costs of mandated testing. Several issues seem pertinent for states considering combining assessments from internal sources (usually criterion-referenced tests) and external sources (usually norm-referenced tests) into their accountability programs. These are explored from the standpoint of the impact of federally required decision making for schools based on test data. Other possible uses are mentioned briefly.

19.
The continuous testing framework, where both successful and unsuccessful examinees have to demonstrate continued proficiency at frequent prespecified intervals, is a framework that is used in noncognitive assessment and is gaining in popularity in cognitive assessment. Despite the rigorous advantages of this framework, this paper demonstrates that there is significant inflation in false negatives as both passers and failers continually take a test, especially for examinees closer to the passing score. Several passing policies are investigated to control the inflation of false negatives while maintaining low false-positive rates for fixed-length tests. Lastly, recommendations are made for testing professionals who wish to utilize the rigorous nature of the continuous testing framework while also avoiding the inflation of qualified examinees failing.

20.
Criterion-reference as opposed to norm-reference applies to the scores from a test and not to the content or format of a test, hence it is proper to refer to criterion-referenced scores or measures and not to criterion-referenced tests. This concept of criterion-referenced measures is applicable to formative evaluation generally or whenever the objective is mastery of subject matter rather than discrimination among students.
