Similar literature
1.
The matched pair technique for writing and scoring true-false items was designed to compensate for the acquiescence response set of primary grade children. The claim that this technique increases reliability to an appreciable extent over traditional true-false scoring was investigated by comparing alpha internal consistency coefficients computed for the matched pair true-false, traditional true-false, and three other scoring schemes. Both the total sample coefficients and individual classroom coefficients were computed from the standardization sample of a primary grade economics achievement test (Primary Test of Economic Understanding). Classroom reliability coefficients computed from the matched pair scores were found to be higher than those from scores computed by the other methods. Total sample coefficients obtained from four of the five methods were nearly equal. Evidence of the effects of each scoring technique on concurrent validity is also presented. Contrary to expectations, the correlations of traditional and matched pair scores with Iowa Test of Basic Skills (ITBS) subtests (when adjusted for differing reliabilities) were approximately equal.

2.
This article is a pedagogical piece on coefficient alpha (α) and its uses. The classical approach to test reliability is explained. Test‐retest, alternative‐forms, and internal‐consistency methods of approximating test reliability are described, equations are derived for each method, and α is shown to be a lower‐bound internal‐consistency approximation to test reliability. Emphasis is placed on the effects of violations of model assumptions on reliability estimation. The classical models are conceptualized as structural equation models and are displayed in path diagrams. Special emphasis is placed on the failure of α to meet certain basic criteria as an index of test homogeneity.
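
As a rough illustration of the coefficient discussed in this abstract, the sketch below computes sample coefficient alpha from a respondents-by-items score table using the usual formula α = k/(k − 1) · (1 − Σ item variances / total-score variance). The function name `cronbach_alpha` and the toy data are assumptions made for illustration; they are not taken from the article.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Sample coefficient alpha for a list of rows (one row per respondent,
    one column per item). Hypothetical helper, for illustration only."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column).
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (row-sum) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (assumed): three items that rank respondents identically.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
# Toy data (assumed): two items with zero covariance.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

With perfectly consistent items the total-score variance is as large as it can be relative to the item variances, so alpha reaches its maximum; with uncorrelated items the two quantities coincide and alpha drops to zero.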

3.
It is shown that the maximum likelihood estimator of the widely used omega coefficient for reliability of multicomponent measuring instruments converges almost surely to the population reliability coefficient for normal congeneric measures with uncorrelated errors as sample size increases indefinitely. This strong consistency implies convergence in probability (consistency) as well as in distribution for the omega estimator. Strong consistency is also demonstrated for the maximal reliability estimator associated with the optimal linear combination of the instrument components. The findings of this note add (i) to the recommendation to use the omega estimator in empirical research in the general normality case, (ii) to the critical literature on the popular coefficient alpha, and (iii) to the literature on the properties of the optimal linear combination of observed measures and the maximal reliability estimator.

4.
This study of the reliability and validity of scales from the Child's Report of Parental Behavior (CRPBI) presents data on the utility of aggregating the ratings of multiple observers. Subjects were 680 individuals from 170 families. The participants in each family were a college freshman student, the mother, the father, and 1 sibling. The results revealed moderate internal consistency (M = .71) for all rater types on the 18 subscales of the CRPBI, but low interrater agreement (M = .30). The same factor structure was observed across the 4 rater types; however, aggregation within raters across salient scales to form estimated factor scores did not improve rater convergence appreciably (M = .36). Aggregation of factor scores across 2 raters yielded much higher convergence (M = .51), and the 4-rater aggregates yielded impressive generalizability coefficients (M = .69). These and other analyses suggested that the responses of each family member contained a small proportion of true variance and a substantial proportion of factor-specific systematic error. The latter can be greatly reduced by aggregating scores across multiple raters.
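
One standard way to reason about why aggregating raters helps is the Spearman-Brown formula, which projects the reliability of an average of k parallel ratings from the reliability of a single rating. The sketch below is an illustration under that assumption only; the abstract does not say the authors computed their coefficients this way, and the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the mean of k parallel ratings, given the
    reliability of one rating. Hypothetical helper, for illustration only."""
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Using the abstract's mean single-rater convergence of .36 as the
# (assumed) single-rating reliability, project 2- and 4-rater aggregates.
for raters in (1, 2, 4):
    print(raters, round(spearman_brown(0.36, raters), 2))
```

Under this assumption the 4-rater projection comes out close to the .69 reported for the 4-rater aggregates, which is consistent with the abstract's conclusion that rater-specific error shrinks as ratings are pooled.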

5.
A group of 46 full-term and 54 high-risk preterm (less than 1,500 grams birthweight) infants were tested at 6, 7, and/or 8 months of age (corrected age for preterms) on a battery of problems assessing visual recognition memory and tactual-visual cross-modal transfer. At all 3 ages, scores obtained on aggregates of 6-11 problems in the battery significantly predicted 3-year Stanford-Binet IQ: correlations ranged from r = .37 to r = .63, and clustered between r = .50 and r = .60. When aggregates from 2 or 3 ages were used as predictors, multiple correlations were as high as R = .60 and R = .70. Cutoffs for predicting children at risk for mental retardation (IQ less than 70) or cognitive delay (IQ less than 85) showed reasonable sensitivity and specificity, although low scores were poor at detecting IQs less than 70. The internal consistency of composites, indexed by alpha coefficients, was unexpectedly low, primarily because the problems shared little variance. However, stability coefficients between assessments as much as 1 and 2 months apart were moderate in magnitude, ranging from r = .30 to r = .50. Considering the high degree of predictive validity, the stability figures appear to be better estimates of reliability for these measures than are indices of internal consistency. The relations reported here were similar for both full-terms and preterms.

6.
This study presents the construction of an instrument to measure personal conceptions of intelligence, based on Dweck's research, along with some exploratory evidence. The instrument is aimed at adolescents, has more items than the original, and incorporates new aspects, such as the importance of effort and ability in relation to personal conceptions of intelligence. A factor analysis revealed two distinct factors, one static and one dynamic, that together explain 31.7% of the total variance. The scales showed internal consistency, with alpha coefficients between .74 and .80. Test-retest reliability (over a one-month interval) was better for the static scale than for the dynamic one, as were the results of an external validity study (correlations with grade point average). Exploratory differential studies showed differences in personal conceptions of intelligence related to school grade (5th to 11th): scores increased from the 5th to the 11th grade, showing that older students were less “static” (more “dynamic”). Differences were also related to socio-economic status (high vs. low): higher-SES subjects appeared less “static” (more “dynamic”) than lower-SES subjects.

7.
Reliability of Scores From Teacher-Made Tests
Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students. The level of consistency of a set of scores can be estimated by using the methods of internal analysis to compute a reliability coefficient. This coefficient, which can range between 0.0 and +1.0, usually has values around 0.50 for teacher-made tests and around 0.90 for commercially prepared standardized tests. Its magnitude can be affected by such factors as test length, test-item difficulty and discrimination, time limits, and certain characteristics of the group: extent of their testwiseness, level of student motivation, and homogeneity in the ability measured by the test.

8.
The purpose of the current study was to examine the validity and diagnostic accuracy of the Intervention Selection Profile—Social Skills (ISP‐SS), a brief social skills assessment tool intended for use with students in need of Tier 2 intervention. Participants included 160 elementary and middle school students who had been identified through universal screening as at risk for behavioral concerns. Teacher participants (n = 71) rated each of these students using both the ISP‐SS and the Social Skills Improvement System—Rating Scales (SSiS‐RS), with the latter measure serving as the criterion within validity and diagnostic accuracy analyses. Confirmatory factor analysis supported ISP‐SS structural validity, indicating ISP‐SS items broadly conformed to a single “Social Skills” factor. Follow‐up analyses suggested ISP‐SS broad scale scores demonstrated adequate internal consistency reliability, with a hierarchical omega coefficient of 0.86. Correlational analyses supported the concurrent validity of ISP‐SS items, finding each ISP‐SS item to be moderately or highly related to its corresponding SSiS‐RS subscale. Finally, analyses indicated that three of the seven ISP‐SS items demonstrated sufficient diagnostic accuracy; however, findings suggest additional revisions are needed if the ISP‐SS is to be appropriate for use in schools. Implications for practice and future research are discussed.

9.
OBJECTIVE: The goal was to develop a retrospective inventory of parental threatening behavior to facilitate a better understanding of such behavior's role in the etiology of psychological distress. METHOD: Inventory items were developed based on theory and 135 students' responses to a question eliciting examples of threatening parental behavior. Following item development, two additional student samples (n = 200 and n = 603) completed batteries of self-report measures. Responses were used to eliminate unstable or redundant items from the inventory and to examine the inventory's psychometric properties. RESULTS: Factor analysis of the inventory revealed three factors, accounting for 66.2% of variance; this factor structure is compatible with theory, and consistent across maternal behavior scores, paternal behavior scores, and combined maternal and paternal scores. Cronbach's coefficient alphas indicated acceptable internal consistency; Pearson correlation coefficients indicated acceptable 4-week test-retest reliability. Moderate intercorrelations with two retrospective measures of childhood experiences suggested construct validity. Regression analyses demonstrated the ability of the inventory to predict both anxious and depressive symptomatology and lifetime symptoms of anxiety and depressive disorder. Normative data on combined parent scores, maternal scores, and paternal scores are also presented. CONCLUSIONS: Initial psychometric testing of the Parent Threat Inventory (PTI) suggests it is a reliable and valid tool for investigating the developmental antecedents of adult psychological distress. Further research should focus on addressing two limitations: (1) lack of normative and psychometric data on men and women suffering from clinical disorders, and (2) lack of validation by parental reporting.

10.
This article uses definitions provided by Cronbach in his seminal paper for coefficient α to show the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency and dimensionality are then discussed as defined by Cronbach. Next, functional relationships are given that relate reliability, internal consistency, and dimensionality. The article ends with a demonstration of the utility of these concepts as defined. It is recommended that reliability, internal consistency, and dimensionality each be quantified with separate indices, but that their interrelatedness be recognized. High levels of unidimensionality and internal consistency are not necessary for reliability as measured by α nor, more importantly, for interpretability of test scores.

11.
Applied Measurement in Education, 2013, 26(3): 295-308
In some measurement settings internal consistency reliability of a measure must be based on a partition of the instrument into only 2 parts that cannot be further subdivided. Each of these 2 parts yields only a single score. If the functional lengths of the parts appear to be unequal or the parts are scored on different scales, the setting calls for a congeneric coefficient. It is shown that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the error variances of the parts. Without such an assumption, a range of reliability estimates is consistent with the part-test variances and covariance. But if the reliability of 1 part can be estimated independent of scores on the 2nd part, then a single-valued congeneric estimate of total score reliability is possible.

12.
Wording effect refers to the systematic method variance caused by positive and negative item wordings on a self-report measure. This Monte Carlo simulation study investigated the impact of ignoring wording effect on the reliability and validity estimates of a self-report measure. Four factors were considered in the simulation design: (a) the number of positively and negatively worded items, (b) the loadings on the trait and the wording effect factors, (c) sample size, and (d) the magnitude of population validity coefficient. The findings suggest that the unidimensional model that ignores the negative wording effect would underestimate the composite reliability and criterion-related validity, but overestimate the homogeneity coefficient. The magnitude of relative bias of the composite reliability was generally small and acceptable, whereas the relative bias for the homogeneity coefficient and criterion-related validity coefficient was negatively correlated with the strength of the general trait factor.

13.
The main points of Sijtsma and Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct and that Cronbach's alpha may be problematic. Neither of these assertions is at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors in the testing community mention these terms not only together, but sometimes as if they are synonymous. Moreover, Cronbach's coefficient alpha is very popular as an index of reliability. Thus, articles discussing alpha are not only appropriate, but necessary. Our concerns are the same as those that formed the genesis of the prior (2009) articles by these same authors, Sijtsma and Green and Yang. This rejoinder also makes comments about item parcels when tests are multidimensional and about factor analytic approaches to assessing reliability.

14.
Applied Measurement in Education, 2013, 26(3): 249-253
A test segment that lacks content validity with respect to a criterion may be deleted for that reason. At issue is the effect on reliability and validity as measured by the coefficients arising from classical test theory. Assuming that the predictor test has some reasonable degree of internal consistency, deleting a segment of meaningful size is certain to reduce reliability. However, Feldt (1997) showed that a concomitant rise in the validity coefficient may occur under certain limited conditions. The present research further characterizes the circumstances under which validity changes may occur as a result of deletion of a predictor test segment. Specifically, for a positive outcome, one seeks a relatively large correlation between the scores from the deleted segment and the remaining items coupled with a relatively low correlation between scores from the deleted segment and the criterion.

15.
This study evaluated the reliability and validity of a performance assessment designed to measure students' thinking and reasoning skills in mathematics. The QUASAR Cognitive Assessment Instrument (QCAI) was administered to over 1,700 sixth and seventh grade students of various ethnic backgrounds in six schools that are participating in the QUASAR project. The consistency of students' responses across tasks and the validity for inferences drawn from the scores on the assessment to the more broadly-defined construct domain were examined. The intertask consistency and the dimensionality of the assessment were assessed through the use of polychoric correlations and confirmatory factor analysis, and the generalizability of the derived scores was examined through the use of generalizability theory. The results from the confirmatory factor analysis indicate that a one-factor model fits the data for each of the four QCAI forms. The major findings from the generalizability studies (person × task and person × rater × task) indicate that, for each of the four forms, the person × task variance component accounts for the largest percentage of the total variability and the percentage of variance accounted for by the variance components that include the rater effect is negligible. The generalizability and dependability coefficients for the person × task decision studies (n_t = 9) range from .71 to .84. These results indicate that the use of nine tasks may not be adequate for generalizing to the larger domain of mathematics for individual student level scores. The QUASAR project, however, is interested in assessing mathematics achievement at the program level, not the student level; therefore, these coefficients are not alarmingly low.

16.
It is shown that in general the popular coefficient alpha estimator for reliability of multi-component measuring instruments converges almost surely to a quantity that is not equal to the population reliability coefficient. This convergence with probability 1 is a stronger statement than convergence in probability (consistency) and convergence in distribution for the alpha estimator, which have been studied in the past. In the special case of congeneric measures with uncorrelated errors and equal loadings on the common true score, the alpha estimator converges almost surely to the population reliability coefficient that equals population alpha, which implies also its consistency as a reliability estimator. When the loadings are unequal but sufficiently high and similar, the alpha estimator converges almost surely to population alpha that is essentially indistinguishable from the population reliability coefficient, which implies alpha’s approximate consistency then. For the general case, the results entail that the alpha estimator is not a consistent estimator of reliability. The findings add to the critical literature on coefficient alpha in the general case, as well as to the justification of its use as a dependable measuring instrument reliability estimator in special cases and settings resulting under appropriate restrictive conditions, and are illustrated using a numerical example.
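
The gap between population alpha and population reliability described in this abstract can be sketched for a one-factor (congeneric) model with uncorrelated errors and unit true-score variance. The loadings and error variances below are assumed values chosen for illustration; this is not the article's own numerical example, and the function names are hypothetical.

```python
def population_reliability(loadings, error_vars):
    """Reliability of the unit-weighted sum under a one-factor model with
    unit factor variance and uncorrelated errors (often called omega)."""
    common = sum(loadings) ** 2
    return common / (common + sum(error_vars))


def population_alpha(loadings, error_vars):
    """Population coefficient alpha implied by the same model."""
    k = len(loadings)
    total_var = sum(loadings) ** 2 + sum(error_vars)
    # Each item's variance is its squared loading plus its error variance.
    item_vars = sum(l * l + e for l, e in zip(loadings, error_vars))
    return k / (k - 1) * (1 - item_vars / total_var)


# Assumed parameter sets: equal loadings versus clearly unequal loadings.
equal = ([0.7, 0.7, 0.7, 0.7], [0.51, 0.51, 0.51, 0.51])
unequal = ([0.9, 0.7, 0.5, 0.3], [0.19, 0.51, 0.75, 0.91])

for name, (lam, theta) in (("equal", equal), ("unequal", unequal)):
    print(name, population_alpha(lam, theta), population_reliability(lam, theta))
```

With equal loadings the two quantities coincide, while with unequal loadings population alpha falls below the reliability coefficient, so an estimator that converges to alpha cannot be consistent for reliability in that case.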

17.
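This study explored the structure of psychological harmony among medical staff and developed a psychological harmony questionnaire for medical staff. The self-developed Medical Staff Psychological Harmony Questionnaire was administered to 1,035 medical staff, and exploratory and confirmatory factor analyses were conducted on the results to examine the questionnaire's structure, reliability, and validity. The questionnaire has a second-order, four-factor structure comprising interpersonal harmony, social harmony, self harmony, and environmental harmony, which explains 67.75% of the total variance; the fit between model and data met the criteria for good fit; Cronbach's α was 0.968 for the full questionnaire and ranged from 0.880 to 0.970 for the subscales. The Medical Staff Psychological Harmony Questionnaire has good psychometric properties and can be used in related research and practice.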

18.
OBJECTIVE: The aim was to construct and test the reliability (utility, internal consistency, interrater agreement) and the validity (internal validity, concurrent validity) of a scale for home visiting social nurses to identify risks of physical abuse and neglect in mothers with a newborn child. METHOD: A 71-item scale was constructed based on a literature review and focus group sessions with social nurses and paraprofessionals who had experience with underprivileged families. This scale was applied in a random sample of 40 home visiting social nurses, who collected data in a sample of 373 nonabusive and 18 abusive/neglectful mothers with a newborn child. RESULTS: Items with prevalence rates below 5% and items making no significant difference between maltreating and non-maltreating mothers were omitted. The final version contained 20 items. This scale showed high internal consistency (alpha = .92) and high interrater reliability (r = .97). Exploratory factor analysis yielded a three-factor solution: Isolation (8 items, explaining 62.17% of the common variance), Psychological complexity (6 items, 18.86%), and Communication problems (6 items, 8.41%). Scores on Communication problems and Isolation significantly predicted scores on a social deprivation scale, which significantly distinguished maltreating from non-maltreating mothers. Mothers scoring high on Communication problems or Isolation obtained higher scores for social deprivation than low-scoring mothers. CONCLUSIONS: Home visiting nurses can identify risks for physical abuse and neglect among mothers with a newborn infant by focusing on signs of social isolation, distorted communication and psychological problems.

19.
This study investigated the reliability and structural validity of Elementary Reading Attitude Survey ([ERAS]; McKenna and Kear, 1990) scores in 575 academically talented students attending an academic summer program. Results indicated that ERAS Academic and Recreational scores had satisfactory internal consistency coefficients, and that participants’ reading attitudes were near the top of the normative distribution of ERAS scores. Exploratory factor analysis of ERAS scores supported two factors measuring academic and recreational reading attitudes, and significantly higher scores on reading attitudes were found for girls at three grade levels, with medium to large effect sizes.

20.
Research Findings: Few rating scales measure social competence in very young Spanish or Catalan children. We aimed to analyze the psychometric characteristics of the California Preschool Social Competence Scale (CPSCS) when applied to a Spanish- and Catalan-speaking population. Children were rated by their respective teachers within 6 months following their 4th birthday in two population-based birth cohorts in Spain (N = 378). A confirmatory factor analysis (CFA) was used to compare the underlying structure of the Spanish–Catalan version with that of the original version. Cronbach's alpha coefficient was used to determine the internal consistency of each of the confirmed factors. Cohen's kappa formula was used to calculate the test–retest reliability in a small subset of children who were rated again one month later. Five correlated factors (Considerateness, Task Orientation, Extraversion, Verbal Facility, and Response to Unfamiliar) were optimally confirmed as a result of CFA. The first three factors had robust internal consistency. The kappa coefficient was satisfactory in 29 items out of 30. Children's cognitive abilities as assessed by the McCarthy Scales, children's gender, maternal social class and level of education were related to the social competence scores as indicators of criterion-related factors. Practice or Policy: The bilingual version of the CPSCS has good psychometric properties allowing it to be used in further studies in either Spanish or Catalan populations.
