Similar articles
1.
The principle of aggregation states that the sum of a set of multiple measurements is a more stable and representative estimator than any single measurement. This greater representativeness arises because there is inevitably some error associated with measurement. By combining numerous exemplars, such errors of measurement are averaged out, leaving a clearer view of underlying relationships. The present study explored the effect of score aggregation over various time periods on correlations among a number of reliable measures frequently used in open-field testing. Twenty-six male rats were given four open-field tests (4 min in duration) at 48-h intervals. Ambulation, rearing, and defecation responses were measured on a minute-by-minute basis in the open-field tests. Correlation matrices were calculated among the three measures for unaggregated scores (1-min totals) and for scores aggregated over daily tests, and mean correlation coefficients were computed for all three pairwise comparisons of the three response variables. These mean correlations were then compared to those obtained when the open-field measures were aggregated over all 4 test days. The results showed that aggregation produced substantial increases in correlation-coefficient magnitude. The correlation between ambulation and rearing increased from a mean of .39 to a value of .81. Similar increases were observed when defecation scores were correlated with ambulation (−.17 to −.59) and rearing (−.16 to −.49). Thus aggregation is an important factor to be considered in the design of psychobiological correlational studies.
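
The aggregation effect described in this abstract can be illustrated with a small simulation. This is only a sketch under assumed conditions, not a reconstruction of the study: the function names, the single shared latent trait, the noise level, the sample size, and the choice of 16 measurements per subject are all hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_subjects=2000, n_measurements=16, noise_sd=2.0, seed=0):
    """Return (r for single measurements, r for summed measurements).

    Assumption: two observed variables share one latent trait and each
    measurement adds independent Gaussian error.
    """
    rng = random.Random(seed)
    single_x, single_y, total_x, total_y = [], [], [], []
    for _ in range(n_subjects):
        trait = rng.gauss(0, 1)  # hypothetical shared latent trait
        xs = [trait + rng.gauss(0, noise_sd) for _ in range(n_measurements)]
        ys = [trait + rng.gauss(0, noise_sd) for _ in range(n_measurements)]
        single_x.append(xs[0])      # unaggregated: first measurement only
        single_y.append(ys[0])
        total_x.append(sum(xs))     # aggregated: sum over all measurements
        total_y.append(sum(ys))
    return pearson(single_x, single_y), pearson(total_x, total_y)


if __name__ == "__main__":
    r_single, r_total = simulate()
    print(f"single measurement r = {r_single:.2f}")
    print(f"aggregated r = {r_total:.2f}")
```

Under these assumed parameters the population correlation is .20 for single measurements and .80 for sums of 16, because summing multiplies the shared-trait variance by 256 but the error variance only by 16. The sample values returned by `simulate` will vary around those figures.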

2.
The No Child Left Behind Act of 2001 mandated statewide accountability testing and focused the accountability conversation on reading. Therefore, the current study examined the relationship between curriculum‐based measurement for reading (R‐CBM) and state accountability test scores, potential grade differences in relationship magnitude, and differences in relationship magnitude among R‐CBM and Maze as they compare to state test scores. Data for 5,472 students in Grades 3, 5, 7, and 8 were correlated and resulted in corrected coefficients that ranged from .51 (eighth graders) to .71 (third graders) for R‐CBM and .49 (eighth graders) to .54 (seventh graders) for Maze. The coefficients between R‐CBM and state test scores were significantly larger for third and fifth graders than those for eighth graders. No significant differences in magnitude were found between the correlation coefficients for state test scores to R‐CBM and to Maze among seventh or eighth graders. Potential implications and suggestions for future research are included. © 2006 Wiley Periodicals, Inc. Psychol Schs 43: 527–535, 2006.  相似文献   

3.
Equivalent forms of a ten-item completion test were constructed. The same test items then were rewritten in matching format and in multiple-choice format, resulting in two forms (A and B) of each of three types of test. All tests were administered to 73 examinees, and parallel-forms reliability coefficients (correlation between scores on A and B) were calculated. These empirically obtained values were compared to the values of the reliability coefficient predicted from theoretically derived equations which indicate the influence of chance success due to guessing on test reliability. In accordance with theory it was found that the completion test was more reliable than the matching test and that the matching test was more reliable than the multiple-choice test. The empirically obtained reliability coefficients were very close to those predicted from the mathematically derived formulas.  相似文献   

4.
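Over 20 days, with random spot checks between 10:00 and 15:00 each day, left and right brachial artery blood pressure was measured in 1,000 university students at the authors' institution (500 men and 500 women) to test for differences between the two sides. Blood pressure was also measured in 500 residents (250 men and 250 women) aged 20 to 69 from the Xingshi residential community in Xuzhou, Jiangsu Province, to explore the relationship of blood pressure with age and sex. In the student group, the χ² test and the u test both gave p values greater than 0.05, and canonical correlation analysis gave a correlation coefficient of approximately 1, indicating no significant difference between left and right brachial artery blood pressure in university students. In the resident group, the data were fitted using MATLAB 7.0. The results showed that both diastolic and systolic pressure rose gradually with age, with systolic pressure rising more than diastolic pressure. Pulse pressure was higher overall in men than in women, and the difference between the sexes gradually narrowed after about age 45.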

5.
An essential question when computing test–retest and alternate forms reliability coefficients is how many days there should be between tests. This article uses data from reading and math computerized adaptive tests to explore how the number of days between tests impacts alternate forms reliability coefficients. Results suggest that the highest alternate forms reliability coefficients were obtained when the second test was administered at least 2 to 3 weeks after the first test. Even though reliability coefficients after this amount of time were often similar, results suggested a potential tradeoff in waiting longer to retest, as student ability tended to grow with time. These findings indicate that, if keeping student ability similar is a concern, the best time to retest is shortly after 3 weeks have passed since the first test. Additional analyses suggested that alternate forms reliability coefficients were lower when tests were shorter and that narrowing the first test ability distribution of examinees also impacted estimates. Results did not appear to be largely impacted by differences in first test average ability, student demographics, or whether the student took the test under standard or extended time. It is suggested that for math and reading tests, like the ones analyzed in this article, the optimal retest interval would be shortly after 3 weeks have passed since the first test.

6.
Since both intelligence and reading readiness tests are widely used to predict reading success in grade one, both types of tests were administered to all entering first-grade pupils in two schools. Performance on these measures was correlated with end-of-year teacher marks and scores on a standardized reading test. Although the predictive validity coefficients for the readiness test were larger than those for the intelligence test on both criteria, the differences were not statistically reliable. When IQ was added to reading readiness scores, the multiple correlation rose from .61 to .67. Adding father occupation, sex, and age failed to meaningfully increase the accuracy in prediction; the multiple R’s increased only to .68.

7.
Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals might have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where 45 females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS.  相似文献   

8.
In order to assess familial resemblance for measures of reading performance, data from 314 pairs of twins in which at least one member of each pair is reading-disabled [142 monozygotic (MZ) and 172 dizygotic (DZ) twin pairs], 273 matched control pairs (131 MZ and 142 DZ pairs), and their parents were subjected to both correlation and regression analyses. Results indicate that parent-offspring resemblance in families of reading-disabled probands does not differ substantially from that in families of controls. In general, the correlations and regressions for MZ twin pairs are greater than those for DZ twins; thus, individual differences in reading performance are due at least in part to heritable influences. As expected, regression coefficients are consistently larger than correlation coefficients for both parent-offspring and proband-cotwin comparisons in the reading-disabled sample, illustrating that regression analyses are more appropriate than correlations for assessing familial resemblance in selected samples.  相似文献   

9.
In this study, we examined the reliability and validity of two curriculum-based measures as indicators of performance in a content-area classroom. Participants were 58 students in a 7th-grade social studies class. CBM measures were student- and administrator-read vocabulary-matching probes. Criterion measures were knowledge pre- and post-tests, the social studies subtest of the Iowa Test of Basic Skills, and student grades. Results revealed moderate alternate-form reliability for both vocabulary-matching measures. Reliability of the measures was increased by combining scores across two testing sessions. Correlations between the predictor and criterion variables were moderate to moderately strong, with the exception of those between vocabulary-matching and student grades. Observed scores for students with LD were lower than for students without LD on both student- and administrator-read vocabulary-matching measures. Few differences in reliability and validity coefficients were found between the student- and administrator-read measures. Results are discussed in terms of the use of CBM as a system for monitoring performance and designing interventions for students with learning disabilities in content-area classrooms.  相似文献   
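
The gain in reliability from combining scores across sessions, as reported in this abstract, is what the classical Spearman-Brown formula predicts when the sessions behave as parallel measurements. The abstract does not say that this formula was used in the study, so the sketch below is an illustration under that assumption; the function name and the example reliability of .60 are hypothetical.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when k parallel measurements are combined.

    Assumption: the k measurements are parallel in the classical test
    theory sense (equal true-score variance, independent errors).
    """
    return k * reliability / (1.0 + (k - 1.0) * reliability)


if __name__ == "__main__":
    # Hypothetical single-session reliability, doubled to two sessions.
    print(spearman_brown(0.60, 2))
```

With a hypothetical single-session reliability of .60, combining two sessions gives 2(.60) / (1 + .60) = .75.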

10.
Using a sample of 908 eleventh grade science stream male and female students from similar socioeconomic area schools, variance-based psychometric properties of three paper-and-pencil tests of logical thinking (Longeot test, Lawson's test TOFR, and Tobin and Capie's test TOLT) are investigated. A sub-sample of 212 students took the three tests in randomly allocated different sequential orders of presentation, while 696 students took only two tests. Alpha coefficients for each test separately and for the three tests combined, concurrent validity coefficients, measures of item difficulty, item discrimination, item-criterion correlation, and 30-day stability coefficients are calculated. Considering the relative homogeneity of the sample, the reliability coefficients of the tests are judged satisfactory, but concurrent validity coefficients are quite low, which implies incongruence in decisions made on the basis of the three tests. The need to estimate various psychometric parameters of alternative tests of logical thinking over different grade populations is emphasized.

11.
A key predictor of young people’s future outcomes is their level of academic achievement whilst at school. In England this is most commonly measured by achievement in GCSEs. However, not all pupils will have taken the same set of GCSE examinations as, for example, they may make different subject choices. For this reason, GCSE performance is often aggregated into a simple measure such as ‘mean GCSE grade’ before being used in statistical models. This paper investigates the merits of using an alternative method, based upon the relatively new technique of Generalised Boosting Models, which does not require GCSE results in different subjects to be aggregated. The importance of this research is that by evaluating the predictive performance of such a method we can ascertain how much useful information is lost in the process of GCSE aggregation. The results show that traditional predictions based upon simple aggregated measures of GCSE attainment are fairly similar to those based upon the more complex approach. This provides some confidence that, for the majority of outcomes, only a small amount of predictive information will be lost through the use of aggregated measures of GCSE performance.

12.
This research report describes the development of dance attitude scales which were 'grounded' in the opinions of groups of 11-16-year-old pupils in six schools throughout England. Factor analyses involving an initial sample of 368 male and female secondary school pupils produced four embryonic scales which were then tested with a sample of 1,668 adolescents. Satisfactory internal reliability coefficients were achieved and scale intercorrelations provided evidence of the scales as distinct measures. Further analyses undertaken to test validity employing data from the larger sample indicated that two of the proposed scales, Ballet and Male Dancers, may be valid measures of attitude, although less confidence can be placed in the third and fourth scales. Suggestions are made for the development and application of the inventory.  相似文献   

13.
The validity of open-field test measures was assessed by simple and multiple regression analysis, taking the difference in experience of a novel situation as the external criterion. Ambulation in the first 1 min, rearing, latency to urination, and urination score were valid indexes of a new operational concept presented in this study, and they were also reliable. The weighting coefficients were obtained for the reduction of multiple measure scores to a single score that represents the concept.  相似文献   

14.
The reliability and validity of the WRAT were investigated with 191 Mexican-American children. Internal consistency reliability coefficients for the WRAT were found to be high and comparable to those reported in the WRAT manual. Correlations between the WRAT subtests and those of the MAT all were significant and suggested a moderate to high relationship between these two measures. It was concluded that the WRAT meets minimum requirements of reliability and validity with Mexican-American children.

15.
This report is a review of reliability data on the PPVT obtained from 32 research studies published between 1965 and 1974. Much of the research was done on Head Start children. Overall, the median of reliability coefficients reported here (0.72) has remained remarkably close to the original median of 0.77 found in standardizing the test. Unexpectedly, elapsed time between test and retest had only a slight effect on the reliability coefficients. However, as expected, the greater range in ages and ability levels of subjects, the higher were the reliabilities. For average children in the elementary grades, and for retarded people of all ages, PPVT scores remained relatively stable over time and there was close equivalence between alternate forms. Scores were least stable for preschool children, especially from minority groups. Black preschool girls were more variable in their performance on the PPVT than boys, and preschool girls generally were more responsive than boys to play periods conducted before testing was begun. A number of variables associated with examiners and setting affected the scores on the test. As expected, raw scores tended to yield slightly higher reliabilities than MA and considerably higher reliabilities than IQ scores.  相似文献   

16.
The relatedness of behavior elicited by reward reduction (successive negative contrast procedure) and behaviors produced by three animal models of anxiety (open-field emergence, elevated plus-maze, and context-shock fear conditioning) was examined by correlational and factor analytic procedures. Factor analysis (oblique rotation) indicated substantial independence among the tests: Trials 1 and 2 of the plus-maze loaded on two different factors unaccompanied by any other test; open-field emergence and context-shock fear loaded on the same factor; and negative contrast loaded on a fourth factor. However, negative contrast proved to be a dynamic process, with factor loadings changing across a 4-day postshift period—moving from an independent loading on the 1st postshift day to being clustered with context-shock fear and open-field emergence on the 2nd and 3rd postshift days to being clustered with just context-shock fear on the last postshift day. These latter data support a multistage theory of successive negative contrast.  相似文献   

17.
Intercorrelations among multiple true-false items were examined to determine to what extent each true-false option can be treated as independent. Results from 157 health science students and 170 medical students showed that correlations between true-false options associated with the same stem were from 2.6 to 7.0 times larger than those from different stems. This suggests that results from previous research indicating that each true-false option could be treated as an independent item cannot be generalized to other tests and examinee populations without supporting evidence. Four scoring methods were explored which varied chance success levels and scoring for partial knowledge. The results showed that scoring methods incorporating partial knowledge were more reliable and possessed greater concurrent and predictive validity than those minimizing chance success. Methods for computing reliability estimates were compared and suggestions were offered regarding practical use.

18.
19.
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this regard is a measure produced by dividing the standard error of measurement by the test's ‘reliability length’, the latter defined as the maximum possible score minus the most probable score obtainable by blind guessing alone. This, however, can be unsatisfactory with negative marking (formula scoring), as shown by data on 13 negatively marked true/false tests. In these the examinees displayed considerable misinformation, which correlated negatively with correct knowledge. Negative marking can improve test reliability by penalizing such misinformation as well as by discouraging guessing. Reliability measures can be based on idealized theoretical models instead of on test data. These do not reflect the qualities of the test items, but can be focused on specific test objectives (e.g. in relation to cut‐off scores) and can be expressed as easily communicated statements even before tests are written.  相似文献   
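
The measure described in this abstract, the standard error of measurement divided by the test's reliability length, can be sketched as follows. This is an illustration under stated assumptions rather than the author's procedure: the function names are hypothetical, the standard error of measurement is taken from the classical formula SD × sqrt(1 − reliability), scoring is one mark per item with no negative marking, and the most probable blind-guessing score is taken as the mode of a binomial distribution.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1.0 - reliability)


def most_probable_guessing_score(n_items, p_correct):
    """Mode of Binomial(n_items, p_correct), found by direct search.

    Assumption: every item is guessed blindly and independently with
    the same success probability p_correct.
    """
    def pmf(k):
        return (math.comb(n_items, k)
                * p_correct ** k
                * (1.0 - p_correct) ** (n_items - k))
    return max(range(n_items + 1), key=pmf)


def sem_per_reliability_length(score_sd, reliability, n_items, p_correct):
    """SEM divided by (maximum score - most probable guessing score)."""
    max_score = n_items  # assumed: one mark per item, no negative marking
    length = max_score - most_probable_guessing_score(n_items, p_correct)
    return standard_error_of_measurement(score_sd, reliability) / length


if __name__ == "__main__":
    # Hypothetical 50-item true/false test: SD = 6, reliability = .84.
    print(sem_per_reliability_length(6.0, 0.84, 50, 0.5))
```

For the hypothetical 50-item true/false test, the most probable guessing score is 25, so the reliability length is 25, the standard error of measurement is 6 × 0.4 = 2.4, and the ratio is about 0.096. Negative marking, which the abstract identifies as the problematic case, changes both the score range and the guessing distribution and is not covered by this sketch.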

20.
There has been a growing consensus among the educational measurement experts and psychometricians that test taker characteristics may unduly affect the performance on tests. This may lead to construct-irrelevant variance in the scores and thus render the test biased. Hence, it is incumbent on test developers and users alike to provide evidence that their tests are free of such bias. The present study exploited generalizability theory to examine the presence of gender differential performance on a high-stakes language proficiency test, the University of Tehran English Proficiency Test. An analysis of the performance of 2,343 examinees who had taken the test in 2009 indicated that the relative contributions of different facets to score variance were almost uniform across the gender groups. Further, there is no significant interaction between items and persons, indicating that the relative standings of the persons were uniform across all items. The lambda reliability coefficients were also uniformly high. All in all, the study provides evidence that the test is free of gender bias and enjoys a high level of dependability.  相似文献   
