Although a few studies report sizable score gains for examinees who repeat performance‐based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single‐take examinees and 4,030 repeat examinees who completed a 6‐hour clinical skills assessment required for physician licensure. Each examinee was rated in four skill domains: data gathering, communication‐interpersonal skills, spoken English proficiency, and documentation proficiency. Conditional standard errors of measurement computed for single‐take and multiple‐take examinees indicated that ratings were of comparable precision for the two groups within each of the four skill domains; however, conditional errors were larger for low‐scoring examinees regardless of retest status. In addition, multiple‐take examinees exhibited less score consistency across the skill domains on their first attempt, but their scores became more consistent on their second attempt. Further, the median correlation between scores on the four clinical skill domains and three external measures was .15 for multiple‐take examinees on their first attempt but increased to .27 on their second attempt, a value comparable to the median correlation of .26 for single‐take examinees. The findings support the validity of inferences based on scores from the second attempt.

The article presents the results of a study focusing on the family situation, education, and interpersonal relations of adults (26–35 years old) who displayed exceptional giftedness in their adolescence (16–19 years old). One group of those surveyed consisted of national academic award winners (n = 90). The control group consisted of 90 people with no outstanding academic achievement. The research found many differences between these two groups, both in family situation and in interpersonal relations. High achievers were raised in families of higher social and professional status, and 72.2% of them decided to continue their academic career after they had graduated from university. The national academic award winners showed higher scores in shyness and lower scores in sociability in interpersonal relations.

This study examines cognitive ability profiles of children with specific age-based normative weaknesses in reading comprehension and compares those profiles to the profiles of (a) children with at least average achievement in reading comprehension, reading decoding skills, and mathematics and (b) children with low achievement across the 3 achievement areas. When compared across 9 cognitive ability composite scores derived from Cattell–Horn–Carroll theory and measured by the Woodcock–Johnson III [Woodcock, McGrew, & Mather (2001). Woodcock–Johnson. Itasca, IL: Riverside], groups differed in overall level of performance. When individual abilities were considered, the poor comprehenders scored significantly lower than the average achievement group on all nine composite scores and significantly lower than the normative population on all composite scores except Processing Speed and Long-Term Retrieval. In contrast, the poor comprehenders scored significantly higher than the low achievement group on all composite scores except for Visual–Spatial Thinking and Phonemic Awareness. Although the poor comprehenders as a group scored lowest on composite scores measuring language- and knowledge-based abilities, review of the profiles of individual poor comprehenders revealed no consistent pattern of performance across cognitive ability composite scores.

Every year, thousands of college and university applicants with learning disabilities (LD) present scores from standardized examinations as part of the admissions process for postsecondary education. Many of these scores are from tests administered with nonstandard procedures due to the examinees' learning disabilities. Using a sample of college students with LD and a control sample, this study investigated the criterion validity and comparability of scores on the Miller Analogies Test when accommodations for the examinees with LD were in place. Scores for examinees with LD from test administrations with accommodations were similar to those of examinees without LD on standard administrations, but less well associated with grade point averages. The results of this study provide evidence that although scores for examinees with LD from nonstandard test administrations are comparable to scores for examinees without LD, they have less criterion validity and are less meaningful for their intended purpose.

Clear personality differences were found for a sample of academically talented students when compared to a general population of same-age students. On the Myers‐Briggs dimensions, the academically talented students differed significantly from the comparison group on all four dimensions. Specifically, the academically talented group expressed greater preferences for introversion, intuition, and thinking. Although there were more judging types in this group than in the comparison group, overall more academically talented students expressed a preference for a perceptive style. They also tended to be higher on achievement motivation and lower on interpersonal and social concerns. In particular, a cognitive style that emphasizes a thinking over a feeling mode appears to mediate gender differences in mathematics ability and achievement.

The purpose of this study was to assess the dimensionality of two forms of a large-scale standardized test separately for 3 ethnic groups of examinees and to investigate whether differences in their latent trait composites have any impact on unidimensional item response theory true-score equating functions. Specifically, separate equating functions for African American and Hispanic examinees were compared to those of a Caucasian group as well as the total test taker population. On both forms, a 2-dimensional model adequately accounted for the item responses of Caucasian and African American examinees, whereas a more complex model was required for the Hispanic subgroup. The differences between equating functions for the 3 ethnic groups and the total test taker population were small and tended to be located at the low end of the score scale.

Eighteen severely malnourished children (IM) who participated in a 3-year home-visiting program were compared with 2 comparison groups comprising 17 severely malnourished children (NIM) and 19 adequately nourished children (controls). On enrollment, all the groups were in the same hospital, and both malnourished groups had lower developmental levels than the controls. The IM group received intervention for 3 years after hospitalization, consisting of weekly or 2-weekly home visits with toy demonstrations. At 7, 8, 9, and 14 years after leaving the hospital, the 3 groups were compared on tests of school achievement and IQ. The NIM group showed no sign of reducing their deficits, and at the 14-year follow-up they had markedly lower scores than the controls on the WISC verbal and performance scales, the Wide Range Achievement Test (WRAT), and the Peabody Picture Vocabulary Test (PPVT). Throughout the follow-up, the IM group's scores were intermediate between those of the NIM group and the controls on every test. At the 14-year follow-up, their scores were significantly higher than those of the NIM group on the WISC verbal scale, and the difference approached significance on the WRAT. We conclude that psychosocial intervention should be an integral part of treatment for severely malnourished children.

The Woodcock Language Proficiency Battery (WLPB) was administered to 18 learning-disabled adolescents from a culturally variant population in rural southern Louisiana and to a matching group of normal achievers. ANOVA results revealed that group status (learning-disabled vs. normally achieving) had a significant effect on cluster scores (Oral Language, Reading, and Written Language), p < .001. Significant effects for cluster scores were not discovered; however, the significant interaction (p < .05) between subjects and cluster scores was determined to be the result of group differences on all cluster scores. Kolmogorov-Smirnov Z-test results indicated that all WLPB cluster scores for the learning-disabled group were significantly lower (p < .001) than the test's means for each of these clusters; for the achieving group, however, only the Oral Language Cluster mean was significantly lower than the WLPB mean, p < .001. Additional t-test investigation revealed that the learning-disabled group means were significantly lower than the achieving group means on all subtests except for Picture Vocabulary. A modified contrastive linguistic analysis did not uncover the existence of test bias for semantics when error responses were evaluated. The results of this study suggest that the WLPB may be a useful tool for culturally diverse students when interpretation is based on a community norm perspective.

In this study the relationship between perceived competence and performance was addressed, notably with respect to motor behaviour. In addition to Harter's (1985) perceived competence questionnaire, a motor skills test was administered to a group of 128 4th grade children of regular elementary schools. From this sample two groups were chosen. One group (N = 40) was involved in motoric remedial teaching classes; a group of classmates (N = 38) was selected as a matched comparison group from the remainder of the sample. Both groups were compared on each of the measuring instruments. The motoric remedial teaching group scored significantly lower than the group of classmates on perceived athletic competence and on motor skills performance. Further, a significant difference was observed between the two groups on the correlation coefficient between perceived athletic competence and motor skills performance. The results provide information which externally validates the notion of perceived competence; they also underline the relevance of the concept of perceived athletic competence in the field of motoric remedial teaching.

This study evaluated the effect of sound-symbol association training on visual and phonological memory in children with a history of dyslexia. Pretests of phonological and visual memory, a sound-symbol training procedure, and phonological and visual memory posttests were administered to children with dyslexia, to children whose dyslexia had been compensated through remedial training, and to age- and reading level-matched comparison groups. Deficits in visual and phonological memory and memory for sound-symbol associations were demonstrated in the dyslexia group. For children with dyslexia and children whose dyslexia had been remediated, the sound-symbol training scores were significantly associated with word and pseudoword reading scores and were significantly lower than those of the comparison groups. Children with dyslexia and children whose dyslexia had been compensated showed significantly less facilitation of phonological memory following the training than did typical readers. Skilled readers showed some reduction in accuracy of visual memory following the training, which may be the result of interference of verbalization with a predominantly visual task. A parallel decrease was not observed in the children with dyslexia, possibly because these children did not use the verbal cues. Children with dyslexia and children whose dyslexia had been compensated seemed to have difficulty encoding the novel sounds in memory. As a result, they derived less phonological memory advantage and less visual memory interference from the training than did typical readers. Children in the compensated dyslexia group scored lower on sound-symbol training than their age peers. In other respects, the scores of these children were equivalent to those of the typically reading comparison groups. Children in the compensated dyslexia group exhibited higher phonological rehearsal, iconic memory, and associative memory scores than children in the dyslexia group. Implications for the remediation of dyslexia are discussed.

The person-fit literature assumes that aberrant response patterns could be a sign of person mismeasurement, but this assumption has rarely, if ever, been empirically investigated before. We explore the validity of test responses and measures of 10-year-old examinees whose response patterns on a commercial standardized paper-and-pencil mathematics test were flagged as aberrant. Validity evidence was collected through postexamination reflective interviews with 31 of the 80 pupils flagged as aberrant and their teachers, and teacher assessment (TA) judgments for the whole examination cohort of 674 examinees. Analysis suggested that interview-adjusted scores were significantly better fitting than expected by chance, but only some adjustments suggest serious mismeasurement. In addition, disagreement between TA and test scores was significantly greater for aberrant examinees, and partially predicted the interview adjustments. We conclude that person misfit statistics when combined with TA might be a useful antidote to mismeasurement, and we discuss the implications for assessment research and practice.

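Analysis of SCL-90 Assessment Results for Students with Disabilities at Special-Education Secondary Vocational Schools   Total citations: 4 (self-citations: 2, citations by others: 4)
Using stratified random sampling, this study selected a total of 200 blind, physically disabled, and deaf students from special-education secondary vocational schools in Shandong Province and 200 non-disabled students from Jinan Civil Affairs School. The SCL-90 scale served as the research instrument and was administered under strict testing procedures. The results show that the overall mental health level of the students with disabilities was very significantly lower than that of the non-disabled students. Because their disabilities differ in cause, the three groups of students with disabilities also differed slightly in how psychological difficulties presented: physically disabled students scored significantly higher than deaf students on somatization and significantly higher than blind students on interpersonal sensitivity.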

Formula scoring is a procedure designed to reduce multiple-choice test score irregularities due to guessing. Typically, a formula score is obtained by subtracting a proportion of the number of wrong responses from the number correct. Examinees are instructed to omit items when their answers would be sheer guesses among all choices but otherwise to guess when unsure of an answer. Thus, formula scoring is not intended to discourage guessing when an examinee can rule out one or more of the options within a multiple-choice item. Examinees who, contrary to the instructions, do guess blindly among all choices are not penalized by formula scoring on the average; depending on luck, they may obtain better or worse scores than if they had refrained from this guessing. In contrast, examinees with partial information who refrain from answering tend to obtain lower formula scores than if they had guessed among the remaining choices. (Examinees with misinformation may be exceptions.) Formula scoring is viewed as inappropriate for most classroom testing but may be desirable for speeded tests and for difficult tests with low passing scores. Formula scores do not approximate scores from comparable fill-in-the-blank tests, nor can formula scoring preclude unrealistically high scores for examinees who are very lucky.
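
As a rough illustration of the scoring rule described in the abstract above, the sketch below assumes the conventional correction for guessing, in which the proportion subtracted is 1/(k - 1) for items with k options. The abstract itself does not state the proportion, and the function names are hypothetical.

```python
def formula_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Return R - W / (k - 1); omitted items neither add nor subtract.

    Assumption: every item has the same number of options k (k >= 2).
    """
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


def expected_gain_per_guess(num_options: int, num_eliminated: int = 0) -> float:
    """Expected formula-score change from guessing on a single item.

    `num_eliminated` is how many wrong options the examinee has correctly
    ruled out; 0 means a blind guess among all options.
    """
    remaining = num_options - num_eliminated
    if remaining < 1:
        raise ValueError("cannot eliminate every option")
    p_right = 1 / remaining
    # Gain 1 point with probability p_right, lose 1/(k - 1) otherwise.
    return p_right - (1 - p_right) / (num_options - 1)
```

Under this assumed rule, `expected_gain_per_guess(4)` is zero, which matches the statement that blind guessers are not penalized on average, while `expected_gain_per_guess(4, 1)` is positive (1/9), which matches the statement that examinees with partial information tend to lose by omitting.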

This study assessed the ability of history students to choose the essay topic on which they can get the highest score. A second, equally important question was whether the score on the chosen topic was more highly related to other indicators of proficiency in history than the score on the unchosen topic. Overall, for both U.S. and European history, scores were about one third of a standard deviation higher for the preferred topic than for the other topic. For U.S. history, about 32% of the students made the wrong choice; that is, 32% got a higher score on the other topic than on the preferred topic. In European history, 29% made the wrong choice. In the U.S. history sample, the preferred essay correlated .40 with an external criterion score, compared to .34 for the other essay; in the European history sample, the preferred essay correlated .52 with the external criterion, compared to .44 for the other topic.

Research Findings: The current study looks at the validity of a voluntary self-report Quality Rating and Improvement System (QRIS) and the characteristics of participating childcare centers. The self-reported quality indicators are compared to external ratings of quality (Early Childhood Environment Rating Scale-Revised [ECERS-R]) and correlated with variables such as size of center and number of state subsidy clients. ECERS-R scores were unrelated to capacity but significantly lower for centers with a large percentage of state-supported clients. Regarding self-reported quality, centers frequently under-reported their quality and what was claimed was not always externally validated, suggesting a self-report QRIS may not be an accurate assessment of quality. Additionally, no significant differences in quality were found between centers participating and those not participating in the self-report QRIS. Practice or Policy: Self-reported childcare quality was not accurate in this study. Although providers over-reported some quality, they frequently under-reported quality by claiming fewer indicators than external validators found. When centers are unmotivated to participate in a voluntary, self-report QRIS, when items reported are the easiest to report, and when existing quality indicators are unreported, a self-reported QRIS cannot validly reflect quality. Because providers over-reported and under-reported quality criteria, it is doubtful the system truly incentivizes desired quality changes.

The trustworthiness of low-stakes assessment results largely depends on examinee effort, which can be measured by the amount of time examinees devote to items using solution behavior (SB) indices. Because SB indices are calculated for each item, they can be used to understand how examinee motivation changes across items within a test. Latent class analysis (LCA) was used with the SB indices from three low-stakes assessments to explore patterns of solution behavior across items. Across tests, the favored models consisted of two classes, with Class 1 characterized by high and consistent solution behavior (>90% of examinees) and Class 2 by lower and less consistent solution behavior (<10% of examinees). Additional analyses provided supportive validity evidence for the two-class solution with notable differences between classes in self-reported effort, test scores, gender composition, and testing context. Although results were generally similar across the three assessments, striking differences were found in the nature of the solution behavior pattern for Class 2 and the ability of item characteristics to explain the pattern. The variability in the results suggests motivational changes across items may be unique to aspects of the testing situation (e.g., content of the assessment) for less motivated examinees.
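
The abstract above does not say how its SB indices were computed. The sketch below assumes the common time-threshold definition, in which an item response counts as solution behavior (1) when the response time meets or exceeds an item-specific threshold, and as rapid guessing (0) otherwise. The thresholds, data, and function names are hypothetical.

```python
from typing import List, Sequence


def solution_behavior(response_times: Sequence[float],
                      thresholds: Sequence[float]) -> List[int]:
    """Return one 0/1 SB index per item for a single examinee.

    Assumption: an item-level time threshold (in seconds) separates
    solution behavior from rapid guessing.
    """
    if len(response_times) != len(thresholds):
        raise ValueError("need exactly one threshold per item")
    return [1 if rt >= th else 0 for rt, th in zip(response_times, thresholds)]


def response_time_effort(sb_indices: Sequence[int]) -> float:
    """Proportion of items on which the examinee showed solution behavior."""
    if not sb_indices:
        raise ValueError("no items supplied")
    return sum(sb_indices) / len(sb_indices)
```

For example, response times of 12.0, 2.0, and 30.5 seconds against thresholds of 5.0, 5.0, and 10.0 seconds give the indices `[1, 0, 1]`. Item-level 0/1 vectors of this kind are the sort of input a latent class analysis would take.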

To investigate the influence of an innovative math fluency intervention, 36 middle‐school students were randomly assigned to either an experimental (the Detect, Practice, Repair [DPR]) or control condition (reading intervention). After covarying pretest scores, the DPR treatment produced a significantly higher (p = .016) adjusted mean (M) math score (M = 47.53, standard deviation [SD] = 3.26) for the intervention group when compared to the control group (M = 33.31, SD = 4.39). The intervention is described so that teachers and consulting school psychologists can implement the steps for individuals or groups (e.g., in a multitiered response to intervention model). © 2009 Wiley Periodicals, Inc.


The authors’ purpose was to explore the effects of a supplementary, guided, silent reading intervention with 80 struggling third-grade readers who were retained at grade level as a result of poor performance on the reading portion of a criterion-referenced state assessment. The students were distributed across 11 elementary schools in a large, urban school district in the state of Florida. A matched, quasi-experimental design was constructed using propensity scores for this study. Students in the guided, silent reading intervention, Reading Plus, evidenced significantly higher mean scores on the Florida Comprehensive Assessment Test criterion assessment measure of reading at posttest. The effect size favoring the guided, silent reading intervention group was large, 1 full standard deviation, when comparing the 2 groups’ mean posttest scores. As such, the results indicate a large advantage for providing struggling third-grade readers guided silent reading fluency practice in a computer-based practice environment. No significant difference was found between the treatment and control groups on the Stanford Achievement Test–10 (SAT-10) posttest scores, although posttest scores for the treatment group trended higher than those for the control group. After conducting a power analysis, it was determined that the sample size (n = 80) was too small to provide sufficient statistical power to detect a difference in third-grade students’ SAT-10 scores.

Sixty WISC-III protocols, administered by graduate students in training, were examined to obtain preliminary data on the frequency and types of administration and scoring errors that examiners commit. Results were compared with previous studies that have evaluated examiner errors on the Wechsler scales. In general, the present results were consistent with those of previous studies that have illustrated that a large number of scoring errors are committed by graduate students as well as by other professional groups. The majority of errors committed by participants in this study were general errors. That is, errors were not specific to a particular subtest. The five most frequent errors included failure to query, failure to record responses verbatim, reporting Full Scale IQ incorrectly, reporting Verbal IQ incorrectly, and adding individual subtest scores incorrectly. However, the traditionally difficult-to-score Verbal subtests were not as troublesome for examiners in this study as they were for examiners in previous studies. In addition, significant decreases in the mean number of errors per protocol and in the number of most frequently occurring errors per protocol were noted. © 1998 John Wiley & Sons, Inc.

This study compared the motor activity technique of learning, using physical education activities, with traditional ways of developing science concepts with fifth grade slow learning children. Two groups of ten children each were equated on the basis of pretest scores. Both groups were taught by the same classroom teacher. One group was taught through motor activity learning and the other by traditional procedures. Both groups were retested after a two-week teaching period, and again after a three-month extended interval. The difference in the posttest scores favored the motor activity learning group, p < .01 (t = 4.33, df = 9). The difference in the extended interval test also favored the same group, p < .001 (t = 6.37, df = 9). Using the differences in test scores as criteria for learning, the children in the motor activity learning group learned and retained significantly more than those in the traditional group.
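
The abstract above reports t statistics with 9 degrees of freedom for two groups of ten, which is what a matched-pairs t test on ten pairs would give. The abstract does not name the test, so the sketch below is an assumption. The function names are hypothetical, and the critical values are the standard two-tailed t-table entries for df = 9.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-tailed critical values of Student's t for df = 9 (standard t table).
CRITICAL_T_DF9 = {0.01: 3.250, 0.001: 4.781}


def paired_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for matched pairs: mean difference over its standard error."""
    if len(group_a) != len(group_b) or len(group_a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error, len(diffs) - 1


def significant_at(t_value: float, alpha: float) -> bool:
    """Check |t| against the tabled two-tailed critical value for df = 9."""
    return abs(t_value) > CRITICAL_T_DF9[alpha]
```

On these tabled values, the reported t = 4.33 exceeds the .01 critical value but not the .001 value, and t = 6.37 exceeds both, which is consistent with the significance levels stated in the abstract.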
