
1.

Latent trait models for assessing multirater multiattribute agreement

Zheng Zhou Tao Xin 《Psychology in the schools》2007,44(5):515-525

The traditional kappa statistic in assessing interrater agreement is not adequate when multiple raters and multiple attributes are involved. In this article, latent trait models are proposed to assess the multirater multiattribute (MRMA) agreement. Data from the Third International Mathematics and Science Study (TIMSS) are used to illustrate the application of the latent trait models. Results showed that among four possible latent trait models, the correlated uniqueness model had the best fit to assess the MRMA agreement. Furthermore, in coding a set of different attributes, the coding accuracy within the same rater may differ across attributes. Likewise, when different raters rate the same attribute, the accuracy in rating varies among the raters. Thus, the latent models provide us with a more refined and accurate assessment of interrater agreement. The application of the latent trait models is important in school psychology research and intervention because accurate assessment of children's functioning is fundamental in designing effective intervention strategies. © 2007 Wiley Periodicals, Inc. Psychol Schs 44: 515–525, 2007.

2.

Closing the Year with Appreciation to Our Peer Reviewers

William C. Diehl 《The American journal of distance education》2013,27(4):221-222

3.

Evaluation of Procedure-Based Scoring for Hands-On Science Assessment

Gail P. Baxter Richard J. Shavelson Susan R. Goldman Jerry Pine 《Journal of Educational Measurement》1992,29(1):1-17

This article evaluates a procedure-based scoring system for a performance assessment (an observed paper towels investigation) and a notebook surrogate completed by fifth-grade students varying in hands-on science experience. Results suggested interrater reliability of scores for observed performance and notebooks was adequate (>.80) with the reliability of the former higher. In contrast, interrater agreement on procedures was higher for observed hands-on performance (.92) than for notebooks (.66). Moreover, for the notebooks, the reliability of scores and agreement on procedures varied by student experience, but this was not so for observed performance. Both the observed-performance and notebook measures correlated less with traditional ability than did a multiple-choice science achievement test. The correlation between the two performance assessments and the multiple-choice test was only moderate (mean = .46), suggesting that different aspects of science achievement have been measured. Finally, the correlation between the observed-performance scores and the notebook scores was .83, suggesting that notebooks may provide a reasonable, albeit less reliable, surrogate for the observed hands-on performance of students.

4.

Adjustment scales for children and adolescents and Native American Indians: Factorial validity generalization for Ojibwe youths

Gary L. Canivez 《Psychology in the schools》2006,43(6):685-694

Replication of the core syndrome factor structure of the Adjustment Scales for Children and Adolescents (ASCA; P.A. McDermott, N.C. Marston, & D.H. Stott, 1993) is reported for a sample of 183 Native American Indian (Ojibwe) children and adolescents from North Central Minnesota. The six ASCA core syndromes produced a two‐factor solution identical to that of the standardization data through principal axis analysis using multiple criteria for the number of factors to extract and retain. Varimax, direct oblimin, and promax rotations produced identical results and nearly identical factor‐structure coefficients. Coefficients of congruence resulted in an excellent match to the factorial results of the ASCA standardization sample and a large, independent sample. It was concluded that for these Ojibwe students, the ASCA measures two independent dimensions of psychopathology (i.e., Overactivity and Underactivity) that are similar to the conduct problems/externalizing and withdrawal/internalizing dimensions commonly found in the child psychopathology assessment literature. © 2006 Wiley Periodicals, Inc. Psychol Schs 43: 685–694, 2006.
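
The abstract above does not say which congruence index was computed. As a minimal sketch, assuming Tucker's coefficient of congruence (a common choice for comparing factor loadings across samples), the calculation looks like this. The function name and the loading values are hypothetical and are not taken from the study.

```python
import math


def congruence_coefficient(loadings_x, loadings_y):
    """Tucker's congruence coefficient between two loading vectors.

    Assumption: this is the index meant by "coefficients of congruence";
    the abstract itself does not name the formula.
    """
    if len(loadings_x) != len(loadings_y) or not loadings_x:
        raise ValueError("loading vectors must be non-empty and equal in length")
    cross = sum(x * y for x, y in zip(loadings_x, loadings_y))
    norm = math.sqrt(sum(x * x for x in loadings_x) * sum(y * y for y in loadings_y))
    if norm == 0:
        raise ValueError("a loading vector of all zeros has no defined congruence")
    return cross / norm


if __name__ == "__main__":
    # Hypothetical loadings of six syndromes on one factor in two samples.
    sample_a = [0.80, 0.75, 0.70, 0.10, 0.05, 0.15]
    sample_b = [0.78, 0.72, 0.74, 0.12, 0.08, 0.10]
    print(round(congruence_coefficient(sample_a, sample_b), 3))
```

Values near 1 indicate that the two loading patterns are nearly proportional, which is the sense in which a factor structure "replicates" across samples.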

5.

Replication of the Adjustment Scales for Children and Adolescents core syndrome factor structure

Gary L. Canivez 《Psychology in the schools》2004,41(2):191-199

Independent examination and replication of the core syndrome factor structure of the Adjustment Scales for Children and Adolescents (ASCA; McDermott, Marston, & Stott, 1993) is reported. A sample of 1,020 children was randomly selected from their classrooms and rated on the ASCA by their teachers. The six ASCA core syndromes produced a two‐factor solution through principal axis analysis using multiple criteria for the number of factors to extract and retain. Varimax, direct oblimin, and promax rotations produced identical results and nearly identical factor structure coefficients. It was concluded that the ASCA indeed measures two independent dimensions of psychopathology (Overactivity and Underactivity) that are similar to the conduct problems/externalizing and withdrawal/internalizing dimensions commonly found in the child psychopathology assessment literature (Cicchetti & Toth, 1991; Quay, 1986). © 2004 Wiley Periodicals, Inc. Psychol Schs 41: 191–199, 2004.

6.

Construct validity of the adjustment scales for children and adolescents and the preschool and kindergarten behavior scales: Convergent and divergent evidence

Gary L. Canivez Jaime D. Rains 《Psychology in the schools》2002,39(6):621-633

Construct validity (convergent and divergent) of the Adjustment Scales for Children and Adolescents (ASCA; McDermott, Marston, & Stott, 1993) and the Preschool and Kindergarten Behavior Scales (PKBS; Merrell, 1994a) is presented. Regular classroom teachers (n = 38) randomly selected 5‐ and 6‐year‐old children (N = 123) and rated them on the ASCA and PKBS in counterbalanced order. Convergent evidence of construct validity was observed for the PKBS Externalizing Problems scale and the ASCA Overactivity syndrome. Divergent evidence of construct validity was provided for the PKBS Externalizing Problems scale and ASCA Underactivity syndrome. Convergent and divergent evidence of construct validity for the PKBS Internalizing Problems scale and ASCA Overactivity and Underactivity syndromes was mixed. Results were identical to those of Canivez and Bordenkircher (2002). © 2002 Wiley Periodicals, Inc. Psychol Schs 39: 621–633, 2002.

7.

School Counseling in Belize: Poised for Great Development

Shirlene Smith-Augustine Miriam Wagner 《International journal for the advancement of counseling》2012,34(4):320-330

The majority of persons serving as school counselors in Belize do not have the formal training proposed by standard-setting bodies, such as the U.S. National Board for Certified Counselors (NBCC) and the American School Counselor Association (ASCA). However, those serving as counselors readily identify responsibilities that parallel those advocated by the ASCA National Model (ASCA 2005). This article identifies characteristics of Belize school counselors, and reviews current school counseling practices and the implications of these for the future of school counseling in that setting. Opportunities for standardization and professional development for school counselors, solidification of a professional identity, and barriers to educational attainment are also explored.

8.

SPSS macros for assessing the reliability and agreement of student evaluations of teaching

Donald D. Morley 《Assessment & Evaluation in Higher Education》2009,34(6):659-671

9.

Classroom Contexts as the Framework for Assessing Social–Emotional Adjustment: A National Study in Trinidad and Tobago

Paul A. McDermott Marley W. Watkins Anna Rhoad Drogalis Jessica L. Chao Frank C. Worrell Tracey E. Hall 《Psychology in the schools》2016,53(6):626-640

Contextually based assessments reveal the circumstances accompanying maladjustment (the when, where, and with whom) and supply clues to the motivations underpinning problem behaviors. The Adjustment Scales for Children and Adolescents (ASCA) is a teacher rating scale composed of indicators describing behavior in 24 classroom situational contexts. This study examines the Trinidad and Tobago national normative process for the ASCA contextual dimensions with a representative sample of elementary school children (N = 900). Exploratory and confirmatory factor analyses yielded the same three dimensions (peer context problems, teacher context problems, and learning context problems) observed in U.S. national samples. Dimensions were scaled using item response theory (IRT) and Bayesian scoring methods, with peer and learning context problems scores relating more strongly to clinical behavior disturbances and learning context problems showing stronger association with classroom learning styles. Implications for future research and practice are discussed.

10.

On the Superior Statistical Properties of Frequency Scales in Job Analyses

Ben Babcock Nicole M. Risk Adam E. Wyse 《Educational Measurement》2020,39(2):85-95

This study compared the statistical properties of four job analysis task survey response scale types: criticality, difficulty in learning, importance, and frequency. We used nine job analysis studies spanning two fields, medical imaging and allied health professionals, to compare the job analysis scales in terms of variability and interrater agreement. Results showed that frequency scales using absolute anchors had greater between-task variability and higher interrater agreement for all nine studies. This may have occurred due to what has been described by past research as self-presentation bias. In this case, an aggregate base percentage of respondents always responded that tasks in their domain are highly critical, highly important, and easy to learn. These results showed that frequency scales with absolute anchors yielded data with better statistical performance than other more subjective scales. These properties do not answer the question of whether a scale matches an exam's purpose, which is the most important consideration for job analyses. They do, however, suggest that, if statistics are a primary deciding factor, strong consideration should be given to using frequency scales with absolute anchors.

11.

Agreement Between Lay Participants and Professional Assessors: Support of a Group Assessment Procedure for Selection Purposes

Zipora Shechtman 《Journal of Personnel Evaluation in Education》1998,12(1):5-17

This study investigates agreement between professional assessors and laypersons (participants) in a group procedure that draws from assessment center principles designed to evaluate candidates to teacher-education programs. Earlier studies have established the validity of this assessment procedure and indicated high interrater agreement of professionals. Evidence that participants concur with professional evaluators will further increase our confidence in the process. The study was conducted in Israel and encompassed 159 applicants to two different educational programs. Results showed high correlations between professional and participant ratings, suggesting that the interactional process provides sufficient information for lay assessors to reach judgments that agree with expert evaluations. Nonetheless, the finding that professional ratings were significantly lower than peer and self-evaluations seems to imply that participant assessors can enhance, but by no means replace, professionals. The social and economic benefits of including lay participants in the assessment process are discussed.

12.

Comparison of interrater reliability on the Torrance Tests of Creative Thinking for gifted and nongifted students

Arlene Rosenthal Stephen T. Demers William Stilwell Sheila Graybeal Joseph Zins 《Psychology in the schools》1983,20(1):35-40

Despite their widespread use in identifying and evaluating programs for gifted and talented students, the Torrance Tests of Creative Thinking were standardized on samples that excluded gifted children. The interrater reliability of measures like the TTCT has been questioned repeatedly, yet studies with average students have demonstrated high interrater reliability. This study compares the interrater reliability of the TTCT for groups of gifted and nongifted elementary-school-aged students. Results indicated most interrater reliability coefficients exceeding .90 for both gifted and nongifted groups. However, multivariate analysis of variance indicated significant mean differences across the three self-trained raters for both gifted and nongifted groups. Consequently, use of a single scorer to evaluate TTCT protocols is recommended, especially where specific cutoff scores are used to select students.

13.

Interrater reliability in a California middle school English/language arts portfolio assessment program

Terry Underwood Sandra Murphy 《Assessing Writing》1998,5(2):201-230

This article examines statistical evidence for the reliability of a locally developed portfolio assessment system across three separate portfolio scoring sessions over the course of a year and finds that the English teachers at this California site were able to demonstrate “strong” levels of interrater agreement. Further, the statistical evidence reported indicates that levels of agreement improved with each scoring session without mandating a fixed task portfolio menu or resorting to piece-by-piece scoring procedures. The study argues that defensible local portfolio assessment systems can be developed which enhance, rather than diminish, teacher professionalism while still providing dependable data for external purposes.

14.

Cross‐informant agreement of children's social‐emotional skills: An investigation of ratings by teachers, parents, and students from a nationally representative sample

Frank M. Gresham Stephen N. Elliott Sarah Metallo Shelby Byrd Elizabeth Wilson Kaitlan Cassidy 《Psychology in the schools》2018,55(2):208-223

This study examines the agreement across informant pairs of teachers, parents, and students regarding the students’ social‐emotional learning (SEL) competencies. Two student subsamples representative of the Social Skills Improvement System (SSIS) SEL Edition Rating Forms national standardization sample were examined: first, 168 students (3rd to 12th grades) with ratings by three informants (a teacher, a parent, and the student him/herself) and a second group of 164 students who had ratings by two raters in a similar role—two parents or two teachers. To assess interrater agreement, two methods were employed: calculation of q correlations among pairs of raters and effect size indices to capture the extent to which rater pairs differed in their assessments of social‐emotional skills. The empirical results indicated that pairs of different types of informants exhibited greater than chance levels of agreement as indexed by significant interrater correlations; teacher–parent informants showed higher correlations than teacher–student or parent–student pairs across all SEL competency domains assessed, and pairs of similar informants exhibited significantly higher correlations than pairs of dissimilar informants. Study limitations are identified and future research needs outlined.

15.

Empirical classification of infant-mother relationships from interactive behavior and crying during reunion

J E Richters E Waters B E Vaughn 《Child development》1988,59(2):512-522

Multiple discriminant function analysis (MDFA) was conducted with data from 255 Strange Situations conducted and scored by Ainsworth and her colleagues. Cross-validated discriminant functions and classification weights were obtained, allowing attachment classifications (A, B, C) to be assigned directly from scores on interactive behavior and crying during reunion episodes. In the past, classification agreement within laboratories has often been used as a training criterion. Unfortunately, this does not insure that classification criteria agreed upon within a laboratory are comparable across laboratories, nor does it insure that agreed upon criteria will yield the same classifications that would have been assigned by the researchers who developed the scoring system. The present results enable researchers who have mastered the scoring systems for reunion behavior and crying to obtain attachment classifications directly from scores on these variables. Alternatively, this procedure may be used to guide the training of, and validate classification decisions by, local judges.

16.

Development and testing of a direct observation code training protocol for elementary aged students with attention deficit/hyperactivity disorder

Naomi J. Steiner Tahnee Sidhu Kirsten Rene Kathryn Tomasetti Elizabeth Frenette Robert T. Brennan 《Educational Assessment, Evaluation and Accountability》2013,25(4):281-302

Observational measures can add objective data to both research and clinical evaluations of children’s behavior in the classroom. However, they pose challenges for training and attaining high levels of interrater reliability between observers. The Behavioral Observation of Students in Schools (BOSS) is a commonly used school-based observation instrument that is well adapted to measure symptoms of attention deficit/hyperactivity disorder (ADHD) in the classroom setting. Reliable use of the BOSS for clinical or research purposes requires training to reach reliable standards (kappa ≥ 0.80). The current study conducted training observations in one suburban and one urban elementary school in the Greater Boston area. To enhance interrater reliability and reduce training time, supplemental guidelines, including 30 additional rules to follow, were developed over two consecutive school years. The complete protocol was then used for training in the third school year. To reach sufficient interrater reliability (kappa ≥ 0.80) during training, 45 training observations were required in the first year while, in the third year, only 17 observations were required. High interrater reliability was sustained after training across all three school years, accumulating a total of 1,001 post-training observations. It is estimated that clinicians or researchers following this proposed protocol, who are naive to the BOSS, will require approximately 30 training observations to reach proficient reliability. We believe this protocol will make the BOSS more accessible for clinical and research usage, and the procedures used to obtain high interrater reliability using the BOSS are broadly applicable to a variety of observational measures.
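
The kappa ≥ 0.80 training criterion mentioned above can be illustrated with a minimal sketch of Cohen's kappa for two observers. It assumes each observer assigns one categorical code per observation interval; the function name and the category labels are hypothetical and are not the BOSS coding scheme.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same intervals.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of intervals")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical codes for eight observation intervals.
    trainee = ["on", "on", "off", "on", "off", "on", "on", "off"]
    expert = ["on", "on", "off", "on", "on", "on", "on", "off"]
    kappa = cohens_kappa(trainee, expert)
    print(f"kappa = {kappa:.2f}, meets 0.80 criterion: {kappa >= 0.80}")
```

Because kappa corrects for chance agreement, two observers can agree on most intervals and still fall short of 0.80 when one category dominates the record.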

17.

A scale for home visiting nurses to identify risks of physical abuse and neglect among mothers with newborn infants

Grietens H Geeraert L Hellinckx W 《Child abuse & neglect》2004,28(3):321-337

OBJECTIVE: The aim was to construct and test the reliability (utility, internal consistency, interrater agreement) and the validity (internal validity, concurrent validity) of a scale for home visiting social nurses to identify risks of physical abuse and neglect in mothers with a newborn child. METHOD: A 71-item scale was constructed based on a literature review and focus group sessions with social nurses and paraprofessionals who had experience with underprivileged families. This scale was applied in a random sample of 40 home visiting social nurses, who collected data in a sample of 373 nonabusive and 18 abusive/neglectful mothers with a newborn child. RESULTS: Items with prevalence rates below 5% and items making no significant difference between maltreating and non-maltreating mothers were omitted. The final version contained 20 items. This scale showed high internal consistency (alpha = .92) and high interrater reliability (r = .97). Exploratory factor analysis yielded a three-factor solution: Isolation (8 items, explaining 62.17% of the common variance), Psychological complexity (6 items, 18.86%), and Communication problems (6 items, 8.41%). Scores on Communication problems and Isolation significantly predicted scores on a social deprivation scale, which significantly distinguished maltreating from non-maltreating mothers. Mothers scoring high on Communication problems or Isolation obtained higher scores for social deprivation than low-scoring mothers. CONCLUSIONS: Home visiting nurses can identify risks for physical abuse and neglect among mothers with a newborn infant by focusing on signs of social isolation, distorted communication and psychological problems.

18.

The Global Assessment of School Functioning (GASF): Criterion validity and interrater reliability

Arthur Maerlender Joseph Palamara Jonathan Lichtenstein 《Psychology in the schools》2020,57(6):990-998

19.

Agreement and Stability of Teacher Rating Scales for Assessing ADHD in Preschoolers

Sandra B. Loughran 《Early Childhood Education Journal》2003,30(4):247-253

This study investigated the agreement and stability of 3 teacher rating scales used to assess ADHD in preschool children: the ADHD Rating Scale, the Child Attention Profile (CAP), and the Conners' Teacher Rating Scale-28 (CTRS-28). A sample of suburban children (n = 60) was observed and rated by their teachers and assistant teachers at preschool level (Time 1) and 4 years later at the elementary school level (Time 2). Agreement among the rating scales and interrater agreement between teacher and assistant teacher ratings yielded noticeably stronger correlations at Time 2 than at Time 1. Over the 4-year interval of the study, there was a significant change in the number of children identified as potential ADHD risks. It is probable there were a high number of false-positive indications in the preschool ADHD screenings. It is also possible that immature behavior of preschool children may mimic ADHD behavior at the elementary school level.

20.

Assessing child-rearing behaviors: a comparison of ratings made by mother, father, child, and sibling on the CRPBI (total citations: 9; self-citations: 0; citations by others: 9)

J C Schwarz M L Barton-Henry T Pruzinsky 《Child development》1985,56(2):462-479

This study of the reliability and validity of scales from the Child's Report of Parental Behavior (CRPBI) presents data on the utility of aggregating the ratings of multiple observers. Subjects were 680 individuals from 170 families. The participants in each family were a college freshman student, the mother, the father, and 1 sibling. The results revealed moderate internal consistency (M = .71) for all rater types on the 18 subscales of the CRPBI, but low interrater agreement (M = .30). The same factor structure was observed across the 4 rater types; however, aggregation within raters across salient scales to form estimated factor scores did not improve rater convergence appreciably (M = .36). Aggregation of factor scores across 2 raters yielded much higher convergence (M = .51), and the 4-rater aggregates yielded impressive generalizability coefficients (M = .69). These and other analyses suggested that the responses of each family member contained a small proportion of true variance and a substantial proportion of factor-specific systematic error. The latter can be greatly reduced by aggregating scores across multiple raters.
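
The gain from aggregating raters can be illustrated with the Spearman-Brown formula. This is a minimal sketch under the assumption that raters are interchangeable and that the single-rater convergence is the reported mean of .36; the abstract does not state that the authors computed their coefficients this way, and the function name is hypothetical.

```python
def spearman_brown(mean_r, k):
    """Projected reliability of the mean of k raters.

    mean_r is the average correlation between single raters;
    the formula is k * r / (1 + (k - 1) * r).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1 + (k - 1) * mean_r)


if __name__ == "__main__":
    single_rater_r = 0.36  # mean single-rater convergence reported in the abstract
    for k in (1, 2, 4):
        print(k, round(spearman_brown(single_rater_r, k), 2))
```

Under these assumptions the projection for 4 raters is about .69, which is close to the mean generalizability coefficient reported above. The 2-rater projection (about .53) is on a different footing from the reported 2-rater convergence of .51, so the two numbers are not directly comparable.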