Similar documents
1.
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008b) suggested reporting an augmented subscore that is a linear combination of a subscore and the total score. Sinharay and Haberman (2008) and Sinharay (2010) showed that augmented subscores often lead to more accurate diagnostic information than subscores. To be reported operationally, augmented subscores should be comparable across the different forms of a test. One way to achieve comparability is to equate them. We suggest several methods for equating augmented subscores. Results from several operational and simulated data sets show that the error in equating augmented subscores appears to be small in most practical situations.

2.
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article, I first provide a rich collection of results regarding when subscores were found to have added value for several operational data sets. Following that, I provide results from a detailed simulation study that examines what properties subscores should possess in order to have added value. The results indicate that subscores have to satisfy strict standards of reliability and correlation to have added value. A weighted average of the subscore and the total score was found to have added value more often.
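Haberman's criterion, as summarized above, compares how well the observed subscore and the total score each predict the true subscore, via the proportional reduction in mean squared error (PRMSE). The following is a minimal Python sketch of that comparison under standard classical test theory assumptions (uncorrelated errors, subscore items included in the total). It estimates subscore reliability with Cronbach's alpha, which is one common choice and an assumption here; the function names and the simulated data are hypothetical, not taken from the article.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def haberman_prmse(items: np.ndarray, sub_idx) -> tuple[float, float]:
    """Return (PRMSE_s, PRMSE_x) for the subscore built from columns sub_idx.

    PRMSE_s: proportional reduction in MSE when the observed subscore predicts
             the true subscore (equal to the subscore reliability).
    PRMSE_x: squared correlation between the total score and the true subscore.
    """
    sub_items = items[:, sub_idx]
    s = sub_items.sum(axis=1)            # observed subscore
    x = items.sum(axis=1)                # observed total score
    var_s = s.var(ddof=1)
    var_x = x.var(ddof=1)
    cov_sx = np.cov(s, x, ddof=1)[0, 1]

    rel_s = cronbach_alpha(sub_items)    # assumption: alpha as reliability estimate
    var_tau_s = rel_s * var_s            # estimated true-subscore variance
    # With uncorrelated errors: cov(tau_s, x) = var(tau_s) + cov(s, x) - var(s)
    cov_tau_x = var_tau_s + cov_sx - var_s
    prmse_s = rel_s
    prmse_x = cov_tau_x**2 / (var_tau_s * var_x)
    return prmse_s, prmse_x

# Hypothetical simulation: two correlated subscales of 10 items each.
rng = np.random.default_rng(0)
n = 2000
common = rng.normal(size=n)
theta_a = 0.8 * common + 0.6 * rng.normal(size=n)
theta_b = 0.8 * common + 0.6 * rng.normal(size=n)
items = np.column_stack(
    [theta_a + rng.normal(size=n) for _ in range(10)]
    + [theta_b + rng.normal(size=n) for _ in range(10)]
)
prmse_s, prmse_x = haberman_prmse(items, list(range(10)))
print(f"PRMSE_s={prmse_s:.3f}  PRMSE_x={prmse_x:.3f}  added value: {prmse_s > prmse_x}")
```

A subscore is deemed to have added value when `prmse_s > prmse_x`; lowering the number of subscore items or raising the correlation between the subscales in the simulation makes that condition harder to meet, in line with the "strict standards" noted above.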

3.
4.
The value‐added method of Haberman is arguably one of the most popular methods to evaluate the quality of subscores. The method is based on classical test theory and deems a subscore to be of added value if the subscore predicts the corresponding true subscore better than the total score does. Sinharay provided an interpretation of the added value of subscores in terms of scores and subscores on parallel forms. This article extends the results of Sinharay and considers the prediction of a subscore on a parallel form from both the subscore and the total raw score on the original form. The resulting predictor is essentially the augmented subscore suggested by Haberman. The proportional reduction in mean squared error of the resulting predictor is interpreted as a squared multiple correlation coefficient. The practical usefulness of the derived results is demonstrated using an operational data set.
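The augmented subscore discussed above can be written as the best linear predictor of the true subscore from the observed subscore s and the total score x. The Python sketch below assumes classical test theory with uncorrelated errors and a subscore that is part of the total; the function names and the summary statistics in the usage lines are hypothetical. Its PRMSE is a squared multiple correlation and, because s alone and x alone are special cases of the linear combination, it cannot fall below either of their PRMSEs.

```python
import numpy as np

def augmented_subscore_weights(var_s, var_x, cov_sx, rel_s):
    """Weights of the best linear predictor of tau_s from (s, x) under CTT.

    Returns (beta_s, beta_x, prmse_aug).
    """
    var_tau = rel_s * var_s
    cov_tau_s = var_tau                      # cov(tau_s, s) = var(tau_s)
    cov_tau_x = var_tau + cov_sx - var_s     # cov(tau_s, x)
    sigma = np.array([[var_s, cov_sx], [cov_sx, var_x]])
    c = np.array([cov_tau_s, cov_tau_x])
    beta = np.linalg.solve(sigma, c)         # beta = Sigma^{-1} c
    prmse_aug = float(c @ beta) / var_tau    # squared multiple correlation with tau_s
    return float(beta[0]), float(beta[1]), prmse_aug

def augmented_score(s, x, mean_s, mean_x, beta_s, beta_x):
    """Augmented subscore; the true subscore has the same mean as s."""
    return mean_s + beta_s * (s - mean_s) + beta_x * (x - mean_x)

# Hypothetical summary statistics (not from any study listed here).
beta_s, beta_x, prmse_aug = augmented_subscore_weights(
    var_s=110.0, var_x=348.0, cov_sx=174.0, rel_s=100 / 110
)
print(beta_s, beta_x, prmse_aug)
print(augmented_score(s=25.0, x=60.0, mean_s=20.0, mean_x=50.0,
                      beta_s=beta_s, beta_x=beta_x))
```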

5.
Recent research has proposed a criterion to evaluate the reportability of subscores. This criterion is a value‐added ratio (VAR), where values greater than 1 suggest that the true subscore is better approximated by the observed subscore than by the total score. This research extends the existing literature by quantifying statistical significance and effect size for VAR, to provide practical guidelines for subscore interpretation and reporting. Findings indicate that a VAR ≥ 1.1 is the minimum requirement for a subscore to contribute meaningfully to a user's score interpretation; subscores with .9 < VAR < 1.1 are redundant with the total score, and subscores with VAR ≤ .9 would be misleading to report. Additionally, we discuss what to do when subscores do not add value yet must be reported, as well as when VAR ≥ 1.1 may be undesirable.
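The guideline above can be expressed as a small decision rule. This Python sketch assumes the usual definition of VAR as the ratio of the observed subscore's PRMSE to the total score's PRMSE; the function names and the example PRMSE values are hypothetical, and only the cut-offs come from the abstract.

```python
def value_added_ratio(prmse_s: float, prmse_x: float) -> float:
    """VAR, assumed here to be PRMSE(observed subscore) / PRMSE(total score)."""
    return prmse_s / prmse_x

def classify_var(var: float) -> str:
    """Apply the cut-offs reported in the abstract."""
    if var >= 1.1:
        return "adds value"
    if var > 0.9:
        return "redundant with total score"
    return "misleading to report"

# Hypothetical PRMSE values for three subscores.
for prmse_s, prmse_x in [(0.909, 0.773), (0.80, 0.78), (0.60, 0.75)]:
    var = value_added_ratio(prmse_s, prmse_x)
    print(f"VAR={var:.2f}: {classify_var(var)}")
```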

6.
The study examined two approaches for equating subscores: (1) equating subscores using internal common items as the anchor, and (2) equating subscores using equated and scaled total scores as the anchor. Since equated total scores are comparable across the new and old forms, they can be used as an anchor to equate the subscores. Both chained linear and chained equipercentile methods were used. Data from two tests were used to conduct the study, and results showed that when more internal common items were available (i.e., 10–12 items), using common items to equate the subscores is preferable. However, when the number of common items is very small (i.e., five to six items), using total scaled scores to equate the subscores is preferable. For both tests, not equating (i.e., using raw subscores) is not reasonable because it resulted in a considerable amount of bias.
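For readers unfamiliar with chained linear equating, the following Python sketch shows the two linear links it chains together in a nonequivalent-groups anchor design. It is an illustration under stated assumptions, not the study's code: population standard deviations are used, the function name and toy data are hypothetical, and the anchor V can stand for either the internal common-item score or the scaled total score, matching the two approaches described above.

```python
import numpy as np

def chained_linear_equate(x, x_p, v_p, y_q, v_q):
    """Map new-form scores x onto the old-form scale by chaining X -> V -> Y.

    x_p, v_p: new-form and anchor scores for the group taking the new form (P)
    y_q, v_q: old-form and anchor scores for the group taking the old form (Q)
    """
    x, x_p, v_p, y_q, v_q = (np.asarray(a, dtype=float) for a in (x, x_p, v_p, y_q, v_q))
    # Link 1: linear function from new form X to anchor V, estimated in group P.
    v_equiv = v_p.mean() + v_p.std() / x_p.std() * (x - x_p.mean())
    # Link 2: linear function from anchor V to old form Y, estimated in group Q.
    return y_q.mean() + y_q.std() / v_q.std() * (v_equiv - v_q.mean())

# Hypothetical toy data for a single subscore.
rng = np.random.default_rng(1)
v_p = rng.normal(5.0, 2.0, size=500)            # anchor scores, new-form group
x_p = v_p + rng.normal(0.0, 1.0, size=500)      # new-form subscores
v_q = rng.normal(5.5, 2.0, size=500)            # anchor scores, old-form group
y_q = v_q + 1.0 + rng.normal(0.0, 1.0, size=500)  # old-form subscores
print(chained_linear_equate([4, 6, 8], x_p, v_p, y_q, v_q))
```

Chained equipercentile equating follows the same two-link structure but replaces each linear function with an equipercentile mapping.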

7.
Abstract

This paper confronts the social‐psychological problem of the relation between values and persons in everyday life. For this purpose, simple operational definitions, as used in psychometric work, are not adequate. There must be a clear conception of the person, as an individual, in a social setting. A model with seven components that meets these requirements is described. This model is used to illustrate what ‘having values’ might mean. Three applications are briefly outlined.

8.
The psychometric properties of the Infant–Toddler Environment Rating Scale-Revised Edition (ITERS-R) were examined using 153 classrooms from child-care centers where resources were tied to center performance. An exploratory factor analysis revealed that the scale measures one global aspect of quality. To decrease redundancy, subsets of items were selected randomly and by experts who rated items according to ease of administration and importance to quality. The shorter subsets demonstrated good discriminant validity, adequate to good psychometric properties, and high associations with the full ITERS-R score. They also showed associations with staff education and staff-to-child ratio similar to those of the full instrument. The best assessment of quality was provided by the shortened subset that included items assessing both structural and process features of quality. Multilevel analyses indicated that classrooms from the same provider score more similarly on the ITERS-R than classrooms from different providers. The implications for using the ITERS-R in high-stakes contexts are discussed.

9.
Brennan noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman suggested a method based on classical test theory (CTT) to determine whether subscores have added value over the total score. One way to interpret the method is that a subscore has added value only if it is in better agreement than the total score with the corresponding subscore on a parallel form. The focus of this article is on classification of the examinees into “pass” and “fail” (or master and nonmaster) categories based on subscores. A new CTT‐based method is suggested to assess whether classification based on a subscore agrees better than classification based on the total score with classification based on the corresponding subscore on a parallel form. The method can be considered an assessment of the added value of subscores with respect to classification. The suggested method is applied to data from several operational tests. Except at extreme cutscores, the added value of subscores with respect to classification is found to be very similar to their added value in Haberman's value‐added analysis.

10.
Four types of study relating to the sensitivity of cloze to intersentential constraint are reviewed: (1) factor analytical and correlational studies; (2) studies which varied the length of available context; (3) studies which varied the mode of presentation; (4) studies which varied the quality of coherent context. Many of the investigations failed to provide convincing evidence that cloze is sensitive to intersentential constraint. However, this could be explained in part by weaknesses in the studies themselves. The factor analytical and correlational studies often failed to employ an adequate measure of global comprehension. While other studies demonstrated the importance of immediate context they did not specifically address the question of intersentential constraint. Quality of context needs to be considered separately from length of context. The quality of context studies provided conflicting results. Again, some of these investigations have serious weaknesses which tend to prejudice the results. The most reliable recent studies suggest that cloze is sensitive to intersentential constraint. Finally, tentative recommendations are made for establishing testing conditions which are most likely to encourage the use of intersentential constraint.

11.
Most growth models implicitly assume that test scores have been vertically scaled. What may not be widely appreciated are the different choices that must be made when creating a vertical score scale. In this paper empirical patterns of growth in student achievement are compared as a function of different approaches to creating a vertical scale. Longitudinal item‐level data from a standardized reading test are analyzed for two cohorts of students between Grades 3 and 6 and Grades 4 and 7 for the entire state of Colorado from 2003 to 2006. Eight different vertical scales were established on the basis of choices made for three key variables: Item Response Theory modeling approach, linking approach, and ability estimation approach. It is shown that interpretations of empirical growth patterns appear to depend upon the extent to which a vertical scale has been effectively “stretched” or “compressed” by the psychometric decisions made to establish it. While all of the vertical scales considered show patterns of decelerating growth across grade levels, there is little evidence of scale shrinkage.

12.
Brennan (2012) noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman (2008) suggested a method based on classical test theory (CTT) to determine whether subscores have added value over the total score. According to this method, a subscore has added value if the corresponding true subscore is predicted better by the subscore than by the total score. In this note, parallel‐forms scores are considered. It is proved that another way to interpret the method of Haberman is that a subscore has added value if it is in better agreement than the total score with the corresponding subscore on a parallel form. The suggested interpretation promises to make the method of Haberman more accessible because several practitioners find the concept of parallel forms more acceptable or easier to understand than that of a true score. Results are shown for data from two operational tests.
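Under standard CTT assumptions, one short way to see this equivalence is sketched below, taking squared correlation as the measure of agreement (an assumption for illustration; the note itself may formalize agreement differently). Here s is the observed subscore, τ_s its true score, x the total score, s′ = τ_s + e′ the subscore on a parallel form (same error variance as s, error uncorrelated with s and x), and ρ_s = σ²(τ_s)/σ²(s) the subscore reliability, so that PRMSE_s = ρ_s and PRMSE_x = cov²(x, τ_s)/(σ²(x) σ²(τ_s)).

```latex
\begin{aligned}
\operatorname{corr}^2(s, s') &= \left(\frac{\sigma^2_{\tau_s}}{\sigma^2_s}\right)^2
  = \rho_s \cdot \rho_s = \rho_s \cdot \mathrm{PRMSE}_s, \\
\operatorname{corr}^2(x, s') &= \frac{\operatorname{cov}^2(x, \tau_s)}{\sigma^2_x \, \sigma^2_s}
  = \frac{\sigma^2_{\tau_s}}{\sigma^2_s} \cdot \frac{\operatorname{cov}^2(x, \tau_s)}{\sigma^2_x \, \sigma^2_{\tau_s}}
  = \rho_s \cdot \mathrm{PRMSE}_x .
\end{aligned}
```

Because ρ_s > 0, the subscore agrees better with its parallel-form counterpart than the total score does exactly when PRMSE_s > PRMSE_x, which is Haberman's criterion.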

13.
A scale was developed to assess primary school Teachers’ Self-Efficacy on Education for Sustainable Development (TSESESD). It includes four domains of competences: values and ethics, systems thinking, emotions and feelings, and actions. The scale development is consistent with key principles of educational and social psychology research. Nine hundred twenty-four (924) primary education student teachers and 88 in-service primary teachers participated in the study. Findings demonstrated that TSESESD has good psychometric properties, strong validity and reliability scores, adequate internal consistency (Cronbach α = 0.97), and satisfactory mean inter-correlation of items within domains (M = 0.78). TSESESD is considered a reliable instrument for teacher preparation programs aiming to develop primary school teachers’ self-efficacy in ESD.
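The two internal-consistency statistics reported above can be computed from an item-response matrix as in the following Python sketch. The function names and the five-respondent, three-item response matrix are hypothetical; for a per-domain mean inter-item correlation like the one reported, the second function would be applied separately to each domain's columns and the results averaged.

```python
import numpy as np

def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (respondents x items) matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def mean_inter_item_correlation(items) -> float:
    """Average of the off-diagonal Pearson correlations between items."""
    r = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    off_diag = r[~np.eye(r.shape[0], dtype=bool)]
    return float(off_diag.mean())

# Hypothetical Likert responses: 5 respondents x 3 items of one domain.
responses = np.array([
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
])
alpha = cronbach_alpha(responses)
mean_r = mean_inter_item_correlation(responses)
print(f"alpha={alpha:.3f}, mean inter-item r={mean_r:.3f}")
```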

14.
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation methods explored in the current study include augmentation based on classical test theory and multidimensional item response theory (MIRT). The study shows that there is no estimation method that is optimal according to both criteria. Augmented subscores show the most improvement in reliability compared to observed subscores but are the least distinct.

15.
Existing research indicates that emotions are integral components of teachers’ jobs and lives, but knowledge regarding functional relations between teachers’ emotions, their antecedents and their effects on teachers, teaching and students is still quite scarce. One possible reason for this knowledge gap is the lack of adequate operationalisation of the teacher-emotion construct. Thus, the aim of this research was to develop a psychometrically grounded and contextually specific multidimensional self-report instrument aimed at assessing the specific emotions teachers experience in relation to their work and profession. Based on the contemporary component definition of emotion, and using a mixed-method approach (qualitative and quantitative), through a series of five empirical studies (N1 = 25, N2 = 300, N3 = 315, N4 = 391 and N5 = 1314), the Teacher Emotion Questionnaire (TEQ) has been developed. The instrument contains scales assessing emotions of joy, pride, love, fatigue, anger and hopelessness. All scales have adequate psychometric characteristics and are theoretically meaningfully related to the criterion variables examined. The added value of the TEQ scales over the more general measures of affect is also demonstrated.

16.
The Assessment Experience Questionnaire has been widely used to measure conditions of learning from assessment. It is one of three methods used in the ‘Transforming the Experience of Students through Assessment’ research process, originally funded by the Higher Education Academy to explore programme assessment patterns, and now used extensively in universities in the United Kingdom. Given the growth of assessment and feedback research over the last decade, the Assessment Experience Questionnaire is ripe for revision. Critics have queried its theoretical and statistical robustness. This study investigated the psychometric properties of the Assessment Experience Questionnaire, as the first step in the process of strengthening the instrument. Specifically, we examined the validity of the questionnaire with a sample of final year undergraduate students from eight UK universities (n = 633). Results were mixed, confirming that the questionnaire has some value, but indicating that not all sub-scales possess adequate psychometric properties to underpin confident conclusions. As a result, we have embarked on a process of making conceptual modifications to the Assessment Experience Questionnaire, both to update the theoretical constructs, and to ensure stronger overall validity.

17.
Understanding attitudes toward science and measuring them remain two major challenges for science teaching. This article reviews the concept of attitudes toward science and their measurement. It subsequently analyzes the psychometric properties of the Test of Science-Related Attitudes (TOSRA), such as its construct validity, its discriminant and concurrent validity, and its reliability. The evidence presented suggests that TOSRA, in its Spanish-adapted version, has adequate construct validity regarding its theoretical referents, as well as good indexes of reliability. In addition, it assesses the attitudes toward science of secondary school students in Santiago de Chile (n = 664) and analyzes sex as a differentiating factor in such attitudes. The analysis by sex revealed a gender difference of low practical relevance. The results are contrasted with those obtained in English-speaking countries. This TOSRA sample showed good psychometric parameters for measuring and evaluating attitudes toward science, which can be used in classrooms of Spanish-speaking countries or with immigrant populations with limited English proficiency.

18.
19.
Will subscores provide information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second question, three subscore estimation methods (i.e., subscore estimated from the observed subscore, from the total score, or from a combination of both the subscore and total score) were compared. Analyses were conducted using data from six licensure tests. Results indicated that reporting subscores at the examinee level may not be necessary, as they did not provide much additional information over what is provided by the total score. However, at the institutional level (for institution size ≥ 30), reporting subscores may not be harmful, although they may be redundant because the subscores were predicted equally well by the observed subscores and the total scores. Finally, results indicated that estimating the subscore using a combination of observed subscore and total score resulted in the highest reliability.

20.
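Now that examination syllabi will no longer be issued, writing items according to curriculum standards will become the norm for the gaokao (national college entrance examination), the zhongkao (senior high school entrance examination), and other large-scale educational examinations. The article first discusses the item-writing challenges posed by abolishing examination syllabi from four perspectives: knowledge, ability, function, and quality. It then offers concrete recommendations for item writing in large-scale educational examinations without syllabi in four areas: operational interpretation of curriculum standards, adaptation of modern educational measurement models to the Chinese context, building and training item-writing teams, and item bank construction.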
