Similar Literature
1.
ADHD is one of the most common referrals to school psychologists and child mental health providers. Although a best-practice assessment of ADHD requires more than the use of rating scales, rating scales are one of the primary components in the assessment of ADHD. Therefore, the goal of this paper is to provide the reader with a critical and comparative evaluation of the five most commonly used, narrow‐band, published rating scales for the assessment of ADHD. Reviews were conducted in four main areas: content and use, standardization sample and norms, scores and interpretation, and psychometric properties. It was concluded that the rating scales with the strongest standardization samples and evidence for reliability and validity are the ADDES, the ADHD‐IV, and the CRS‐R. In choosing among these, prospective users may want to reflect on their goals for the assessment. The ACTeRS and the ADHDT are not recommended because their manuals lack crucial information and their evidence of reliability and validity is less well documented. Conclusions and recommendations for scale usage are discussed. © 2003 Wiley Periodicals, Inc. Psychol Schs 40: 341–361, 2003.

2.
Students’ attitude towards science (SAS) is often a subject of investigation in science education research, and rating-scale surveys are commonly used to study it. The present study illustrates how Rasch analysis can be used to provide psychometric information about SAS rating scales. The analyses were conducted on a 20-item SAS scale from an existing dataset of the Trends in International Mathematics and Science Study (TIMSS) 2011. Data from all eighth-grade participants from Hong Kong and Singapore (N = 9942) were retrieved for analysis. Additional insights from Rasch analysis that are not commonly available from conventional test and item analyses were discussed, such as invariant measurement of SAS, unidimensionality of the SAS construct, optimal use of the SAS rating categories, and the item difficulty hierarchy of the SAS scale. Recommendations on how the TIMSS items measuring SAS could be better designed were also discussed. The study further highlights the importance of using Rasch estimates in the parametric statistical tests (e.g., ANOVA, t-test) commonly used for group comparisons in science education research.

3.
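Using the many-facet Rasch model, this study analyzes the rating dimensions and the use of the rating scales for two oral-test tasks, passage reading aloud and free conversation, taken by a random sample of high school examinees in one province. The results show that the rating dimensions of both tasks are reasonably designed and reflect examinees' ability fairly accurately. However, the categories of the passage-reading scale are not equally spaced, and among the rating dimensions of the free-conversation task, "communication strategies" differs significantly from the other three dimensions. These findings are important for revising and improving the rating scales and their associated dimensions.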

4.
Drawing on multiple theoretical frameworks from cognitive and educational psychology, we present a writing task and scoring system for measuring students’ informative writing. Participants were 72 fifth- and sixth-grade students who wrote compositions describing real-world problems and how mathematics, science, and social studies information could be used to solve them. Of the 72 students, 69 were able to craft a cohesive response that demonstrated not only planning of the writing structure but also elaboration of relevant knowledge in one or more domains. Many-facet Rasch modeling (MFRM) techniques were used to examine the reliability and validity of scores from the writing rating scale. In addition, a comparison of fifth- and sixth-grade responses supported the validity of the scores, as did a correlational analysis with scores from an overall interest measure. Recommendations for improving writing scoring systems based on these findings are provided.

5.
This investigation examines the development of two scales that measure elaboration and behaviors associated with stewardship in children. The scales were developed using confirmatory factor analysis to investigate their construct validity, reliability, and other psychometric properties. Results suggest that a second-order factor model provides the best fit. This model produced (1) a stewardship elaboration scale measuring interest and cognitive engagement in stewardship issues and (2) a stewardship behavior scale measuring in-park, community, and home behaviors. These scales will be useful for evaluating environmental education programs focused on environmental and park stewardship. They may also help researchers assess whether environmental education leads participants to elaborate on persuasive messaging, thereby increasing the likelihood of behavioral intentions that lead to behavior change.

6.
Numerous researchers have proposed methods for evaluating the quality of rater‐mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many‐facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater‐mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater‐mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.
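Mokken scale analysis summarizes scalability with Loevinger's coefficient H, which compares the Guttman errors observed in the data with the number expected if the items were statistically independent. The sketch below is a simplified illustration that assumes dichotomous (0/1) items; the writing ratings in the abstract are polytomous, and MSA software generalizes the same idea to that case. The function name `loevinger_h` is hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level scalability coefficient H for dichotomous (0/1) items.

    data: list of respondent rows, each a list of 0/1 item scores.
    H = 1 - (observed Guttman errors) / (errors expected under independence).
    """
    n = len(data)
    k = len(data[0])
    # Item popularity: proportion of respondents scoring 1 on each item.
    p = [sum(row[i] for row in data) / n for i in range(k)]
    observed = 0.0
    expected = 0.0
    for i, j in combinations(range(k), 2):
        # Order the pair so that `easy` is the more popular item.
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        # Guttman error: failing the easier item while passing the harder one.
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        # Expected count of that pattern if the two items were independent.
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected
```

A perfect Guttman pattern gives H = 1, and independent items give H near 0; values of at least 0.3 are commonly cited (following Mokken) as the minimum for a usable scale. Items that everyone passes or fails add no expected errors, so a data set made only of such items would make the denominator zero.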

7.
The efficacy of the Measure of Understanding of Macroevolution (MUM) as a measurement tool has been a point of contention among scholars who need a valid measure of knowledge of macroevolution. We explored the structure and construct validity of the MUM using Rasch methods in the context of a general education biology course designed with an emphasis on macroevolution content. The Rasch model was used to quantify item- and test-level characteristics, including dimensionality, reliability, and fit to the Rasch model. Contrary to previous work, we found that the MUM provides a valid, reliable, and unidimensional scale for measuring knowledge of macroevolution in introductory non-science majors, and that its psychometric behavior does not change substantially over time. While all items provide productive measurement information, several depart substantially from ideal behavior, warranting a collective effort to improve them. Suggestions for improving the measurement characteristics of the MUM at the item and test levels are put forward and discussed.

8.
Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, testing the effects of varying levels of NOS knowledge on domain‐specific science understanding and belief requires instruments validated in accordance with the AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, as well as of a reduced item set, indicated that a two‐dimensional Rasch model fit significantly better than a one‐dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. Item quality analyses were used to identify items with unacceptable fit values. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and that an excessive number of items sat at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert‐type instruments in science education.

9.
The purpose of this study was to examine a technique for developing performance rating scales to measure achievement in courses whose objectives require complex behaviors not easily measured with paper-and-pencil achievement tests. A facet-factorial approach to rating scale construction was employed (i.e., the behavior was conceptualized as multidimensional, and items for the scales were selected using factor-analytic techniques) to construct scales measuring clarinet music performance. The three major results of the study were: 1) a thirty-item rating scale based on a six-factor structure of clarinet music performance; 2) high inter-judge reliability estimates for both the total score (above .90) and the scale scores (above .60); and 3) criterion-related validity coefficients greater than .80. The results suggest that the facet-factorial approach can be an effective technique for constructing rating scales to measure complex behavior such as music performance.

10.
Existing self-blame scales are not suited to school bullying scenarios, and most lack evidence of validity. This study used a self-developed scale to measure bullied victims’ tendency to self-blame and further examined whether victims and bully/victims exhibit different self-blaming tendencies under bullied and generalized scenarios. Participants were 1,320 students in grades five to nine. The research instrument was a self-constructed bullied-victim self-blame scale (BSS), and the results were analyzed using the Rasch rating scale model. The Rasch results showed strong evidence of BSS reliability and validity. Participants’ self-blaming tendency scores were positively correlated with depression (r = .31). In addition, participants’ self-blaming scores for relational bullying were higher than those for verbal and physical bullying. Under bullied scenarios, the self-blaming tendency of bully/victims was higher than that of victims, but no difference was found between the two groups under generalized scenarios. Participants’ tendency to self-blame was significantly higher under generalized scenarios than under bullied scenarios. The self-blaming tendencies of the various roles under different scenarios, and counselling strategies for victims’ self-blame, are discussed at the end of the study.

11.
Although Likert-type rating scales are used in a great number of early childhood studies, knowledge of how the number of response options affects the psychometric properties of scales used with children is limited. This study aims to contribute to this knowledge. A total of 1,092 second- and third-grade students from 11 schools differing in socioeconomic status completed 2-point, 3-point, and 4-point versions of the School Attachment Scale for Children and Adolescents, administered approximately three weeks apart. Results revealed that as the number of response options increased, the means tended to decrease and the distributions tended toward normality. For the 2-point version, most items fell below the cut-off point for discrimination indexes. Compared with the 2-point version, discrimination indexes increased significantly for the 3- and 4-point versions, where the items’ discrimination indexes were high. The reliability coefficient increased with the number of response options for all subdimensions of the scale. When validity estimates for the three subdimensions were examined across the three versions, the 3- and 4-point versions showed adequate validity, whereas the validity of the 2-point version was weak. Using 2-point Likert-type scales with children thus negatively affected the psychometric properties, and these properties improved as the number of response options increased.
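The abstract does not define its discrimination index; one common choice is the corrected item-total correlation, in which each item is correlated with the sum of the remaining items. The following sketch computes that statistic under this assumption. The function names are hypothetical, and the 0.30 flag is a frequently used rule of thumb rather than a value taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the total of the *other* items.

    responses: list of respondent rows, each a list of item scores.
    Excluding the item from the total avoids inflating the correlation.
    """
    k = len(responses[0])
    out = []
    for i in range(k):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        out.append(pearson(item, rest))
    return out


def flag_weak_items(responses, cutoff=0.30):
    """Indexes of items whose corrected item-total correlation is below cutoff."""
    return [i for i, r in enumerate(corrected_item_total(responses)) if r < cutoff]
```

Running the same function on the 2-, 3-, and 4-point versions of a scale would show whether coarser response formats push more items below the chosen cut-off.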

12.
A Study of Scoring Rubrics for Subjective Items   Times cited: 1 (self-citations: 0; citations by others: 1)
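Taking the scoring rubric for the essay question in the 2006 Shanghai college entrance examination in politics as an example, this paper examines three aspects of how to evaluate the quality of a scoring rubric for subjective items: whether each scoring component is relatively independent; whether examinees' overall argumentation ability can be inferred from the results on the several components; and whether the grade levels within each component are reasonably divided. Factor analysis showed that the four scoring components of this item are unidimensional, with a single factor interpretable as examinees' overall argumentation ability. Correlation analysis showed that the four components are each relatively independent and contribute independently to inferring examinees' overall argumentation ability. Rasch rating scale model analysis showed that the grade levels of each component are largely reasonable, although a few levels provide insufficient information. On this basis, several suggestions for improving the scoring rubric are proposed.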

13.
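In the administration of subjective tests, a variety of factors can lower the reliability and validity of the final test results, so identifying and analyzing the factors that affect test reliability and validity is especially important. This paper introduces the many-facet model based on item response theory: the background from which it emerged, its basic framework, its typical applications in educational assessment in China and abroad, and its limitations. It argues that the many-facet model, as a new assessment model, can identify the factors affecting test reliability and validity fairly comprehensively, particularly rater subjectivity effects, and can analyze them objectively. In recent years, the model has been applied increasingly widely in educational assessment in China and abroad.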

14.
A pilot study was conducted to evaluate and improve the rating procedure proposed for a research effort designed to assess the essay writing ability of college sophomores. Generalizability theory and the Many-Facet Rasch Model were each used to (a) estimate potential sources of error in the ratings, (b) obtain reliability estimates, and (c) make recommendations for improving the rating process. Variance due to Task (writing prompt) and to the Person-by-Task interaction was high, while the variance attributable to Raters and Occasion was low. Twenty-two percent of the variability in the ratings was unexplained. The common and unique features of generalizability theory and the Many-Facet Rasch Model are described, and the advantages and disadvantages of each are discussed.
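The variance-partitioning idea behind generalizability theory can be illustrated with the simplest crossed design, persons × raters with one rating per cell; the study itself also crossed Task and Occasion. The sketch below estimates variance components from two-way ANOVA mean squares and returns the relative G coefficient for the mean over raters. The function name and the truncation of negative variance estimates to zero are assumptions of this illustration.

```python
def g_study_p_by_r(scores):
    """Variance components and relative G coefficient for a crossed
    persons x raters design with one rating per cell.

    scores: list of rows (one per person), each a list of ratings (one per rater).
    Returns (var_person, var_rater, var_residual, g_relative).
    """
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Mean squares for persons, raters, and the residual (p x r interaction + error).
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates set to zero.
    var_pr_e = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    # Relative (norm-referenced) G coefficient for the mean over n_r raters.
    g_rel = var_p / (var_p + var_pr_e / n_r)
    return var_p, var_r, var_pr_e, g_rel
```

Rater variance does not enter the relative coefficient, which is why a low Rater component, as reported above, matters most for absolute (criterion-referenced) decisions; adding raters shrinks only the residual term divided by n_r.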

15.
The article examines theoretical issues associated with measurement in the human sciences and with ensuring that data from rating scale instruments are measures. It is argued that performing arithmetic operations and applying linear statistics to raw scores from rating scale instruments is less defensible than using measures. These theoretical matters are then illustrated by a report on the application of the Rasch Rating Scale Model in an investigation of elementary school classroom learning culture.
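To illustrate why raw scores and measures differ, the sketch below evaluates the Andrich rating scale model, in which the log-odds of responding in category k rather than k − 1 is θ − δ − τ_k, with the thresholds τ shared by all items. The expected raw score is an S-shaped, nonlinear function of the person measure θ, so equal raw-score differences do not correspond to equal differences in θ. The function names and threshold values are hypothetical.

```python
import math


def rsm_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: person measure (logits); delta: item location (logits);
    taus: thresholds tau_1..tau_m shared by all items.
    Returns [P(X=0), ..., P(X=m)].
    """
    # Cumulative exponents: category 0 has exponent 0 by convention.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Expected rating on one item: sum over k of k * P(X=k)."""
    return sum(k * p for k, p in enumerate(rsm_probs(theta, delta, taus)))
```

For example, with hypothetical thresholds [-1.0, 0.0, 1.0] and δ = 0, moving θ from 0 to 1 raises the expected rating by about 0.72 points, whereas moving θ from 3 to 4 raises it by only about 0.08. Averaging raw ratings therefore compresses differences near the ends of the scale, which is the argument for analyzing Rasch measures instead.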

16.
17.
A method for combining multiple scale responses from job or task surveys, based on a hierarchical ranking scheme, is presented. A rationale for placing the resulting ordinal information onto an interval scale of measurement using the Rasch Rating Scale Model is also provided. After a simple linear transformation, the item or task parameter estimates can be used to obtain item weights for constructing test blueprints. Prior weights can then be used to modify the item weights after data collection, based either on content-balancing requirements or on Bayesian prior content weights from SMEs (subject matter experts). Finally, a method is suggested for linking two or more surveys, again using the Rasch Rating Scale Model and the computer program Bigsteps, when it is desirable to shorten the typical job or task survey.
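The abstract does not spell out the linear transformation or how prior weights are applied, so the following is only a hypothetical sketch: task measures (assumed to be oriented so that larger means more important) are shifted to be positive, optionally multiplied by SME prior weights, and normalized to sum to 1. The multiplicative combination is an assumption loosely modeled on prior-times-likelihood reasoning, not the author's procedure, and the function name is illustrative.

```python
def blueprint_weights(estimates, floor=1.0, priors=None):
    """Hypothetical conversion of Rasch task estimates (logits) into
    blueprint weights that sum to 1.

    estimates: task measures, assumed oriented so larger = more important.
    floor: raw weight assigned to the least important task after the shift.
    priors: optional SME content weights, combined multiplicatively (assumption).
    """
    low = min(estimates)
    # Linear transformation: shift so every task receives a positive raw weight.
    raw = [b - low + floor for b in estimates]
    if priors is not None:
        raw = [r * p for r, p in zip(raw, priors)]
    total = sum(raw)
    # Normalize so the weights can be read as blueprint proportions.
    return [r / total for r in raw]
```

With estimates [-1.0, 0.0, 1.0] and the default floor, the weights are 1/6, 1/3, and 1/2; supplying priors [1, 1, 2] shifts them to 1/9, 2/9, and 6/9. The choice of floor controls how much weight the least important task retains.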

18.
This study established a Chinese scale for measuring high school students’ ocean literacy, including tests of its reliability, validity, and differential item functioning (DIF), to address the lack of DIF testing for existing scales. The construct validity and reliability were examined by analyzing the scale’s items with the Rasch model, and a gender DIF test was conducted to ensure the fairness of results when distinct groups are compared. The results indicated that the scale is unidimensional and possesses favorable internal consistency and construct validity. The gender DIF test indicated that several items were harder for either female or male students to answer correctly; however, after discussing these items individually, the experts and scholars suggested retaining them. The final Chinese version of the ocean literacy scale comprises 48 items that reflect high school students’ understanding of ocean literacy, which helps students understand the marine science topics they encounter in real life.
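The study ran its gender DIF test within the Rasch framework. As a simpler and widely used alternative, the Mantel-Haenszel procedure compares the odds of a correct answer for two groups matched on total score. The sketch below computes the common odds ratio and the ETS delta scale (Δ = −2.35 ln α); it is a swapped-in technique for illustration, not the analysis reported in the abstract, and the function name is hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    strata: list of (A, B, C, D) counts, one tuple per total-score level:
        A = reference group correct, B = reference group incorrect,
        C = focal group correct,     D = focal group incorrect.
    Returns (alpha_mh, delta_mh); delta_mh < 0 means the item favours
    the reference group.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    alpha = num / den
    # ETS delta metric: rescales the log odds ratio onto the delta scale.
    delta = -2.35 * math.log(alpha)
    return alpha, delta
```

Either gender can serve as the reference group; swapping the groups flips the sign of Δ. Under the commonly used ETS classification, |Δ| below 1 is treated as negligible DIF, which is one way to support the kind of retain-or-revise judgments the experts made here.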

19.
The authors report on the development of a brief dyslexia screening measure created by revising the 65-item Hong Kong Behaviour Checklist of Specific Learning Difficulties in Reading and Writing. Teachers’ ratings of 1,063 primary students aged 6–14 years on the checklist provided data for its psychometric evaluation using traditional measurement and Rasch measurement model analyses. Rasch scaling suggested that the revised 36-item checklist can be regarded as a unidimensional scale assessing global dyslexic dysfunction, and receiver operating characteristic (ROC) analysis suggested that a score of 18 could be an optimal cut-off score when the checklist is used as a dyslexia screening measure. The validity of the revised checklist was supported by its substantial and significant correlations with external measures of literacy and cognitive skills. Implications of the findings for the use of adaptive testing as an effective screening procedure are discussed.
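The abstract reports an optimal cut-off from ROC analysis but does not state the optimality criterion. A common choice is Youden's J (sensitivity + specificity − 1); the sketch below scans candidate cut-offs under that assumption, treating higher checklist scores as indicating more dyslexic signs and assuming an external diagnosis is available for each child. The function name is hypothetical.

```python
def youden_cutoff(scores, has_dyslexia):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1.

    scores: checklist totals (higher = more at-risk, an assumption).
    has_dyslexia: matching list of booleans from an external criterion.
    A child is flagged when score >= cutoff.
    Returns (best_cutoff, sensitivity, specificity).
    """
    positives = sum(has_dyslexia)
    negatives = len(has_dyslexia) - positives
    best = None
    # Every observed score is a candidate threshold on the ROC curve.
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, has_dyslexia) if y and s >= cut)
        tn = sum(1 for s, y in zip(scores, has_dyslexia) if not y and s < cut)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Strict ">" keeps the lowest cut-off among ties.
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]
```

On invented toy data, `youden_cutoff([3, 5, 7, 9, 11, 14], [False, False, False, True, True, True])` returns `(9, 1.0, 1.0)`. In screening, a criterion that weights sensitivity more heavily than J does may be preferred, since missed cases are usually costlier than false alarms.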

20.
Because the psychological assessment of high ability usually concentrates on intelligence testing, it is pertinent to discuss the validity of intelligence test batteries. The well‐known Wechsler scales are analyzed and evaluated. Analyses of some German editions based on psychometric models, especially the Rasch model, show that hardly a single subtest is scored fairly; that is, the scores obtained under current scoring rules do not correctly represent the true extent of testees’ abilities. Since many items of the analyzed editions correspond to items of the American edition (WISC‐R), the same shortcomings must be suspected for that test battery as well. In this light, the administration of these tests is no longer acceptable. However, it is shown that Wechsler's basic concept is worthwhile when accompanied by (modern) psychometric tools: a new (German) test battery, AID, is introduced, which in particular meets economic requirements when high ability is to be assessed.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号