Similar Articles
1.
The statistical conclusion validity of early intervention research was examined by conducting a post hoc power analysis of 484 statistical tests from 49 early intervention articles. Statistical power was determined using Cohen's (1977) criteria for small, medium, and large effect sizes. The analysis revealed that the median power to detect small, medium, and large effect sizes ranged from .08 to .46. Four percent of the early intervention studies had adequate power (.80 or greater) to detect medium intervention effects, and 18% had adequate power to detect large intervention effects. These power values suggest poor statistical conclusion validity in the analyzed research and should alert investigators to the possibility of Type II errors in the early intervention research literature. The authors argue that low statistical conclusion validity has practical consequences for program evaluation and cost-effectiveness determinations.
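To make the power figures concrete, the sketch below approximates the power of a two-sided, two-sample comparison of means at Cohen's conventional small (d = 0.2), medium (0.5), and large (0.8) effect sizes. It is a minimal sketch, not the authors' procedure: it assumes equal group sizes, equal variances, and the normal approximation to the noncentral t distribution, and the function name `approx_power` is hypothetical.

```python
from statistics import NormalDist

# Cohen's conventional standardized mean differences (d)
COHEN_D = {"small": 0.2, "medium": 0.5, "large": 0.8}


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of means.

    Uses the normal approximation to the noncentral t distribution and
    assumes equal group sizes and equal variances (illustrative only).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality d * sqrt(n1 * n2 / (n1 + n2)) with n1 = n2 = n
    delta = d * (n_per_group / 2) ** 0.5
    # Probability that the statistic falls in either rejection region
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)


if __name__ == "__main__":
    for label, d in COHEN_D.items():
        print(f"{label:>6}: power with n = 20 per group = {approx_power(d, 20):.2f}")
```

Under this approximation, 64 participants per group give roughly .80 power for a medium effect, which illustrates why studies with a few dozen participants per group tend to be underpowered for small and medium effects.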

2.

The APA Task Force on Statistical Inference recently recommended reporting effect sizes alongside the results of statistical significance tests. This article investigates effect size reporting in gifted education research and follows up on a similar investigation published by Plucker (1997). A content analysis of effect size reporting was conducted on articles published in the Journal for the Education of the Gifted, Roeper Review, and Gifted Child Quarterly from 1995–2000. The results were similar to the findings of Plucker (1997): no statistically significant difference in reporting was found across journals or across years, and a moderate difference was found between effect size reporting for univariate and for multivariate statistics. The benefits to gifted education research of understanding the relationship among sample size, effect size, and statistical power are discussed.

3.
The probability of rejecting H0 when it should be rejected (power) was calculated for each of the 66 applicable articles in Volumes 6 and 7 (1969, 1970) of the Journal of Research in Science Teaching. These power calculations used the effect size definitions and tables developed by Cohen (1969). The mean power of each article to detect small, medium, and large effect sizes was determined from its major statistical tests, and these mean powers were then compiled and analyzed. The powers calculated for the different effect sizes were disturbingly low (small, 0.22; medium, 0.71; large, 0.87), though generally not as low as those Cohen (1962) found in an analysis of another behavioral journal. Recommendations for improving confidence in science teaching research are provided; they center on substantial increases in sample sizes and on an understanding of power and its relation to α, effect size, and sample size.

4.
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: the signed weighted P-difference and the unsigned weighted P-difference. The performance of the effect size measures was investigated under various simulation conditions, including different sample sizes and DIF magnitudes. As another way of studying DIF, the χ² difference test was included to compare the results of statistical significance testing with those of practical significance (the effect size measures). The adequacy of existing effect size criteria used in unidimensional tests was also evaluated. Both effect size measures worked well in estimating true effect sizes, identifying DIF types, and classifying effect size categories. Finally, a real data analysis was conducted to support the simulation results.
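The weighted P-difference idea can be sketched in its unidimensional form, the starting point the authors extend: compare the focal and reference groups' item characteristic curves and average the gap over the focal group's ability distribution, keeping the sign for the signed index and taking absolute values for the unsigned one. This sketch assumes a two-parameter logistic (2PL) item model and a normal focal-group ability distribution evaluated on a grid; the function names and weighting details are illustrative assumptions, not the authors' multidimensional formulas.

```python
import math
from statistics import NormalDist


def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def weighted_p_difference(ref_item, focal_item, focal_mean=0.0, focal_sd=1.0,
                          grid_size=81):
    """Signed and unsigned P-differences weighted by focal ability density.

    ref_item and focal_item are (a, b) pairs for the same item estimated
    in each group (hypothetical helper, unidimensional simplification).
    """
    dist = NormalDist(focal_mean, focal_sd)
    lo, hi = focal_mean - 4 * focal_sd, focal_mean + 4 * focal_sd
    thetas = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    weights = [dist.pdf(t) for t in thetas]
    # Focal minus reference: negative values mean the item is harder for the focal group
    diffs = [p_2pl(t, *focal_item) - p_2pl(t, *ref_item) for t in thetas]
    total = sum(weights)
    signed = sum(w * d for w, d in zip(weights, diffs)) / total
    unsigned = sum(w * abs(d) for w, d in zip(weights, diffs)) / total
    return signed, unsigned
```

When the curves do not cross (uniform DIF), the signed value equals the unsigned value in magnitude; when they cross (nonuniform DIF), the signed value can be near zero while the unsigned value stays clearly positive, which is why both indices are useful.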

5.
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical procedures that counteract this problem by adjusting the p values of effect estimates upward. Although MTPs are increasingly used in impact evaluations in education and other areas, an important consequence of their use is a change in statistical power that can be substantial. Unfortunately, researchers frequently ignore the power implications of MTPs when designing studies. Consequently, in some cases sample sizes may be too small, and studies may be underpowered to detect effects as small as a desired size. In other cases, sample sizes may be larger than needed, or studies may be powered to detect smaller effects than anticipated. This paper presents methods for estimating statistical power under multiple definitions of power and presents empirical findings on how power is affected by the use of MTPs.
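Two standard MTPs illustrate both halves of the argument: the Holm step-down procedure adjusts p values upward, and testing each hypothesis at a Bonferroni-corrected level α/m lowers the power of every individual test. The sketch below is a minimal illustration under a normal approximation with equal group sizes; the function names are hypothetical, and it computes only per-test ("individual") power, one of several power definitions the paper considers.

```python
from statistics import NormalDist


def holm_adjust(p_values):
    """Holm step-down adjusted p values (each is >= the raw p value)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted p values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def power_per_test(d, n_per_group, alpha=0.05, m=1):
    """Approximate two-sided power of one test run at the Bonferroni level alpha / m."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / (2 * m))
    delta = d * (n_per_group / 2) ** 0.5
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)
```

With 64 participants per group and d = 0.5, individual power falls from roughly .81 for a single test to roughly .60 when five outcomes are each tested at the Bonferroni level, with no change in sample size.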

6.
Meta-analysis is a statistical method increasingly used to combine and compare the results of previous primary studies. However, because comprehensive guidelines for conducting meta-analyses are lacking, many meta-analytic studies have failed to consider important aspects such as statistical programs, power analysis, publication bias, model selection, tests of heterogeneity, and identification of heterogeneity. The current study therefore reviewed 84 meta-analyses conducted in Korea to examine whether these six aspects were properly applied. With regard to effect sizes, most of the meta-analyses obtained more than ten effect sizes, which appears to be an adequate number for representing an issue. However, many studies failed to consider the other aspects: power analysis, publication bias, model selection, tests of heterogeneity, and identification of heterogeneity.
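Two of the aspects listed above, pooling effect sizes and testing heterogeneity, can be illustrated with the usual fixed-effect computations: an inverse-variance weighted mean, Cochran's Q (compared against a χ² distribution with k - 1 degrees of freedom), and the I² index. This is a minimal sketch that assumes each study supplies an effect size and its sampling variance; the function name is hypothetical, and a random-effects model would add a between-study variance component not shown here.

```python
def fixed_effect_summary(effects, variances):
    """Inverse-variance pooled effect, Cochran's Q, and I^2 (in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2
```

For example, two studies with effects 0.2 and 0.4 and equal sampling variances of 0.01 give a pooled effect of 0.3, Q = 2 on 1 degree of freedom, and I² = 50%.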

7.
This paper and the accompanying tool are intended to complement existing power analysis tools by offering one based on the framework of Minimum Detectable Effect Size (MDES) formulae. The tool can be used to determine sample size requirements and to estimate minimum detectable effect sizes for a range of individual- and group-random assignment design studies and for common quasi-experimental design studies. The paper and tool cover the computation of minimum detectable effect sizes under the following study designs: individual random assignment designs, hierarchical random assignment designs (2-4 levels), block random assignment designs (2-4 levels), regression discontinuity designs (6 types), and short interrupted time-series designs. In each case, the discussion and tool consider the key factors associated with statistical power and minimum detectable effect sizes, including the level at which treatment occurs and the statistical models (e.g., fixed effect and random effect) used in the analysis. For one- and two-level random assignment designs, the tool also includes a module that estimates the minimum sample size a study requires to attain a user-defined minimum detectable effect size.
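For the simplest design covered, individual random assignment, the MDES is commonly written as MDES = M * sqrt((1 - R²) / (P(1 - P)n)), where n is the total sample size, P the proportion assigned to treatment, R² the share of outcome variance explained by covariates, and M a multiplier of about 2.8 for a two-tailed test at α = .05 with 80% power. The sketch below uses the large-sample (normal) multiplier rather than the t-based one, so it slightly understates the MDES for small samples; the function names are hypothetical and are not the accompanying tool's interface.

```python
import math
from statistics import NormalDist


def mdes_multiplier(alpha=0.05, power=0.80, two_tailed=True):
    """Large-sample multiplier M = z(1 - alpha/2) + z(power)."""
    z = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    return z.inv_cdf(1 - tail) + z.inv_cdf(power)


def mdes_individual(n, p_treat=0.5, r2=0.0, alpha=0.05, power=0.80):
    """MDES in standard-deviation units for individual random assignment."""
    m = mdes_multiplier(alpha, power)
    return m * math.sqrt((1 - r2) / (p_treat * (1 - p_treat) * n))


def min_sample_size(target_mdes, p_treat=0.5, r2=0.0, alpha=0.05, power=0.80):
    """Smallest total n whose MDES does not exceed target_mdes."""
    m = mdes_multiplier(alpha, power)
    # Invert the MDES formula for n and round up to a whole participant
    n = m ** 2 * (1 - r2) / (p_treat * (1 - p_treat) * target_mdes ** 2)
    return math.ceil(n)
```

With 400 participants split evenly and no covariates, the MDES is about 0.28 SD; reaching 0.20 SD requires about 785 participants, and covariates with R² > 0 reduce both figures.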

8.
We present statistical tests for departures from random expectation in spatial memory tasks. We consider two common protocols for spatial memory experiments. In the first, subjects are allowed to search a fixed number of sites. In the second, subjects are allowed to search until they achieve a fixed number of successes. Under either protocol, subjects may or may not revisit sites that have previously been searched or exploited. This yields four situations to consider: a fixed number of sites searched or a fixed number of successes, each with or without revisits. We derive analytical expressions for the probability mass functions, expectations, and variances associated with each type of null hypothesis. We present three statistical tests of these hypotheses: the Kolmogorov-Smirnov test, the ordinary sign test, and the Z test. We use our results to demonstrate a priori calculation of sample sizes and statistical power and to consider a mixed model of sampling with and without replacement.
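For the fixed-number-of-sites protocol, null distributions follow from standard sampling results if one assumes that a subject picks sites uniformly at random: without revisits, the number of successes among k searched sites (out of N sites, r of them baited) is hypergeometric; with revisits, and assuming each search independently hits a baited site with probability r/N, it is binomial. The sketch below encodes those assumptions only; the function names are hypothetical, the paper's own expressions may differ in detail, and the fixed-number-of-successes protocols are not covered.

```python
from math import comb


def pmf_without_revisits(k, n_sites, n_baited):
    """Hypergeometric null: k distinct sites drawn at random, no revisits."""
    total = comb(n_sites, k)
    # math.comb returns 0 when asked to choose more items than exist
    return [comb(n_baited, s) * comb(n_sites - n_baited, k - s) / total
            for s in range(k + 1)]


def pmf_with_revisits(k, n_sites, n_baited):
    """Binomial null (assumption): k independent searches, success prob r/N."""
    p = n_baited / n_sites
    return [comb(k, s) * p ** s * (1 - p) ** (k - s) for s in range(k + 1)]


def mean_and_variance(pmf):
    """Expectation and variance of a pmf indexed by the number of successes."""
    mean = sum(s * prob for s, prob in enumerate(pmf))
    var = sum((s - mean) ** 2 * prob for s, prob in enumerate(pmf))
    return mean, var
```

For N = 8 sites with r = 4 baited and k = 4 searches, both nulls have expectation 2, but the variance is 4/7 without revisits versus 1 with revisits, so the two protocols call for different critical values.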

9.
This paper presents the results of a simulation study comparing the performance of the Mann-Whitney U test, Student's t test, and the alternate (separate-variance) t test for two mutually independent random samples from normal distributions, with both one-tailed and two-tailed alternatives. The estimated probability of a Type I error was controlled (in the sense of being reasonably close to the attainable level) by all three tests when the variances were equal, regardless of the sample sizes. However, when both the variances and the sample sizes were unequal, it was controlled only by the alternate t test. With equal sample sizes, it was controlled by all three tests regardless of the variances. Where the Type I error rate was controlled, we also compared the power of these tests and found very little difference. This means that very little power will be lost if the Mann-Whitney U test is used instead of tests that assume normal distributions.
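The two statistics that differ from Student's pooled t test can be computed directly: the separate-variance (Welch) t statistic with Welch-Satterthwaite degrees of freedom, and the Mann-Whitney U count of pairwise wins. This sketch computes only the statistics; p values would come from the t distribution and the U null distribution (for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.mannwhitneyu`). The function names here are hypothetical.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Separate-variance (Welch) t statistic and Welch-Satterthwaite df."""
    # Squared standard errors of each sample mean, no pooling of variances
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def mann_whitney_u(x, y):
    """U for sample x: pairs with x_i > y_j, counting ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
```

Because U for x plus U for y always equals the number of pairs (len(x) * len(y)), either one can serve as the test statistic.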

10.
Statistical power was estimated for 3 randomization tests used with multiple-baseline designs. In the first test, participants were randomly assigned to baseline conditions; in the second, intervention points were randomly assigned; and in the third, the authors used both forms of random assignment. Power was studied for several series lengths (N = 10, 20, 30), several effect sizes (d = 0, 0.5, 1.0, 1.5, 2.0), and several levels of lag-1 autocorrelation among the errors (ρ1 = 0, .1, .2, .3, .4, and .5). Power was found to be similar among the 3 tests. Power was low for effect sizes of 0.5 and 1.0 but was often adequate (> .80) for effect sizes of 1.5 and 2.0.
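The logic of randomly assigning the intervention point can be shown for a single AB series: if the start of the intervention phase was drawn at random from a set of admissible points, the p value is the share of admissible points whose phase-mean difference is at least as large as the one observed. This is a simplified, one-sided, single-series sketch under that assumption, not the multiple-baseline tests the authors studied (which also randomize across participants or baselines); the function name is hypothetical.

```python
from statistics import mean


def start_point_randomization_test(series, actual_start, possible_starts):
    """One-sided p value for an increase, randomizing the intervention start."""
    def effect(start):
        # Intervention (B) phase mean minus baseline (A) phase mean
        return mean(series[start:]) - mean(series[:start])

    observed = effect(actual_start)
    # Count admissible start points that give a difference at least as large
    as_extreme = sum(1 for s in possible_starts if effect(s) >= observed)
    return as_extreme / len(possible_starts)
```

With four admissible start points the smallest attainable p value is 1/4, one reason short series with few admissible points limit power.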

11.
Previous studies have investigated various aspects of cyberbullying. Using meta-analytic methods, this study aimed primarily to identify the factors that predict individuals' perpetration of, and victimization in, cyberbullying. A meta-analysis of 77 studies containing 418 primary effect sizes was conducted to examine the relative magnitude of demographic, individual, and contextual predictors. Several study characteristics (sample age, sample gender, study location, publication status, and publication year) were further analyzed as moderators. The results report the average effect size of each predictor for both the cyberbully and the cybervictim groups. Several significant shared and unique predictors were identified as important factors for designing effective prevention and intervention programs. The implications of the findings for future research on cyberbullying interventions are discussed.

12.
There have been many studies of the comparability of computer-administered and paper-administered tests. Not surprisingly, given the variety of measurement and statistical sampling issues that can affect any one study, the results of such studies have not always been consistent. Moreover, the quality of computer-based test administration systems has changed considerably in recent years, as has students' computer experience. This study synthesizes the results of 81 studies performed between 1997 and 2007. The estimated effect size across all studies was very small (–.01 weighted, .00 unweighted). Meta-analytic methods were used to ascertain whether grade (elementary, middle, or high school) or subject (English Language Arts, Mathematics, Reading, Science, or Social Studies) had an impact on comparability. Grade appeared to have no effect on comparability. Subject did appear to affect comparability: computer administration appeared to provide a small advantage for English Language Arts and Social Studies tests (effect sizes of .11 and .15, respectively), and paper administration appeared to provide a small advantage for Mathematics tests (effect size of –.06).

13.
This paper develops a new mathematical model of an electrical power system in which the transient saliency effect of the synchronous machine is taken into account. The computational results show that the new model offers higher precision with less computational effort, so it is suitable for transient analysis and controller design in power systems.

14.
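Using Greig-Smith contiguous-grid quadrat sampling and block-size analysis, this paper analyzes the distribution patterns of, and correlations among, the main populations in the secondary shrubland of the limestone hills of Guilin, Guangxi. The results show that: (1) the populations of the four main species in this secondary shrubland, Bauhinia championii Benth., Rhamnus leptophylla Schneid., Rosa cymosa Tratt., and Zanthoxylum planispinum S. et Z., tend toward a clumped distribution within the shrub community, and the intensity of clumping in each population depends on the size of the sampling area; (2) among the main populations of this shrubland, populations with similar traits tend to be negatively correlated, whereas populations with dissimilar traits tend to be positively correlated or to show more complex correlations.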
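Block-size analysis in the Greig-Smith tradition can be sketched for a single transect of contiguous quadrats: quadrat counts are summed into blocks of size 1, 2, 4, and so on, and the mean square between pairs of adjacent blocks is computed at each size; a peak in the mean square suggests the scale of clumping. This is one common nested-ANOVA formulation, assumed here for illustration; grid versions alternate the merging direction, the interspecific (covariance) analysis is not shown, and the function name is hypothetical.

```python
def block_mean_squares(counts):
    """Greig-Smith style block-size analysis for 2**k contiguous quadrats.

    Returns {block_size: mean square}; peaks suggest the scale of clumping.
    """
    n = len(counts)
    if n < 2 or n & (n - 1):
        raise ValueError("number of quadrats must be a power of two (>= 2)")
    result = {}
    totals = list(counts)  # block totals at the current block size
    size = 1
    while size < n:
        # Pair adjacent blocks to form blocks of twice the size
        merged = [totals[i] + totals[i + 1] for i in range(0, len(totals), 2)]
        # Sum of squares between sub-blocks within each larger block
        ss = (sum(t * t for t in totals) / size
              - sum(t * t for t in merged) / (2 * size))
        # One degree of freedom per larger block (two sub-blocks each)
        result[size] = ss / len(merged)
        totals, size = merged, size * 2
    return result
```

A transect such as [1, 1, 0, 0, 1, 1, 0, 0] peaks at block size 2, matching patches two quadrats wide.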

15.
The authors investigated 2 issues concerning the power of latent growth modeling (LGM) in detecting linear growth: the effect of the number of repeated measurements on LGM's power to detect linear growth, and a comparison of LGM with some other approaches in terms of power for detecting linear growth. A Monte Carlo simulation design was used, with 3 crossed factors (growth magnitude, number of repeated measurements, and sample size) and 1,000 replications within each cell condition. The major findings were as follows: with 3 repeated measurements, a substantial proportion of samples failed to converge in structural equation modeling; the number of repeated measurements did not show any effect on the statistical power of LGM in detecting linear growth; and the LGM approach outperformed both the dependent t test and repeated-measures analysis of variance (ANOVA) in statistical power for detecting growth when the growth magnitude was small and the sample size was small to moderate. The multivariate repeated-measures ANOVA approach consistently underperformed the other tests.

16.
An entire elementary school system with 60% white and 40% black pupils was given several ability tests, group-administered by 12 white and 8 black examiners (Es). The tests measured verbal and nonverbal IQ, perceptual-motor cognitive development, “speed and persistence” under neutral and motivating instructions, listening-attention, and short-term rote memory for numbers. With the exception of the “speed and persistence” test, on which white Es yielded significantly and consistently higher mean scores than black Es for both white and black pupils across grades one to six, the results for the various cognitive ability tests showed that the race of the E did not produce large or consistent effects in the testing of white and black pupils.

17.
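Using perturbation theory, this paper discusses the effect of the finite size of the atomic nucleus on atomic energy levels, presents a specific analysis of mesic atoms, and concludes that the nuclear volume effect cannot be neglected in mesic atoms of heavier elements.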
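As a rough guide to why the effect matters for mesic atoms, the standard first-order perturbative estimate for a hydrogen-like s state is shown below. It assumes a nonrelativistic bound particle of (reduced) mass m and a uniformly charged spherical nucleus of radius R, for which the mean-square charge radius is 3R²/5; it is a textbook estimate given for orientation, not necessarily the paper's own derivation.

```latex
\Delta E_{ns} \approx \frac{2}{3n^{3}}\,(Z\alpha)^{4}\, m c^{2}
  \left(\frac{m c}{\hbar}\right)^{2} \langle r^{2}\rangle ,
\qquad
\langle r^{2}\rangle = \tfrac{3}{5}R^{2}
\;\Rightarrow\;
\Delta E_{1s} \approx \tfrac{2}{5}\,(Z\alpha)^{4}\, m c^{2}
  \left(\frac{m c R}{\hbar}\right)^{2}.
```

Because the shift scales as m³, replacing an electron by a pion (about 273 times heavier) enlarges it by roughly 2 × 10⁷ at fixed Z and R. The first-order estimate itself becomes unreliable once the meson's orbit approaches the nuclear radius, which happens for heavier elements.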

18.
The objective was to examine the impact of different types of accommodations on performance in content tests such as mathematics. The meta‐analysis included 14 U.S. studies that randomly assigned school‐aged English language learners (ELLs) to test accommodation versus control conditions or used repeated measures in counter‐balanced order. Individual effect sizes (Glass's d) were calculated for 50 groups of ELLs and 32 groups of non‐ELLs. Individual effect sizes for English language and native language accommodations were classified into groups according to type of accommodation and timing conditions. Means and standard errors were calculated for each category. The findings suggest that accommodations that require extra printed materials need generous time limits for both the accommodated and unaccommodated groups to ensure that they are effective, equivalent in scale to the original test, and therefore more valid owing to reduced construct‐irrelevant variance. Computer‐administered glossaries were effective even when time limits were restricted. Although the Plain English accommodation had very small average effect sizes, inspection of individual effect sizes suggests that it may be much more effective for ELLs at intermediate levels of English language proficiency. For Spanish‐speaking students with low proficiency in English, the Spanish test version had the highest individual effect size (+1.45).
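Glass's d scales the treatment-control mean difference by the control group's standard deviation alone, so the yardstick is not affected by any change in spread an accommodation might cause. The sketch below computes it, along with a per-category mean and standard error (the SD of the effect sizes divided by the square root of their count), one simple reading of the category summaries described above; the function names are hypothetical.

```python
from statistics import mean, stdev


def glass_delta(treatment, control):
    """Glass's effect size: mean difference scaled by the control-group SD."""
    return (mean(treatment) - mean(control)) / stdev(control)


def category_summary(effect_sizes):
    """Mean effect size and its standard error for one accommodation category."""
    return mean(effect_sizes), stdev(effect_sizes) / len(effect_sizes) ** 0.5
```

For instance, `glass_delta([5, 6, 7], [3, 4, 5])` returns 2.0, because the means differ by 2 and the control SD is 1.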

19.
A resampling study was conducted to compare the statistical bias and standard errors of nonequivalent-groups linear test equating in small samples of examinees. Sample sizes of 15, 25, 50, and 100 were examined. One thousand samples of each size were drawn with replacement from each of 5 archival data files from teacher subject area tests. For each test, data files from 2 parallel forms were used. Results suggest trivial levels of equating bias even with small samples, but substantial increases in standard errors as sample size decreases. Results were interpreted in terms of applications to testing situations in which small numbers of examinees are available.
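The resampling design can be sketched with the simplest linear equating function, which maps a form-X score onto the form-Y scale by matching means and standard deviations, l(x) = μY + (σY/σX)(x - μX), together with a bootstrap that redraws each group with replacement. This assumes a random-groups design for brevity; the nonequivalent-groups methods the study examined (which use anchor-test information, e.g. Tucker or Levine equating) are not shown, and the function names are hypothetical.

```python
import random
from statistics import mean, pstdev, stdev


def linear_equate(x_score, x_scores, y_scores):
    """Form-Y equivalent of x_score by matching means and (population) SDs."""
    slope = pstdev(y_scores) / pstdev(x_scores)
    return mean(y_scores) + slope * (x_score - mean(x_scores))


def bootstrap_se(x_score, x_scores, y_scores, reps=1000, seed=0):
    """Bootstrap standard error of the equated score at x_score."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Redraw each group with replacement, keeping the original sizes
        xs = rng.choices(x_scores, k=len(x_scores))
        ys = rng.choices(y_scores, k=len(y_scores))
        if pstdev(xs) == 0:
            continue  # skip degenerate resamples where every X score is equal
        estimates.append(linear_equate(x_score, xs, ys))
    return stdev(estimates)
```

Running `bootstrap_se` on progressively smaller score lists would typically show the standard error growing as the sample shrinks, the same pattern the study reports.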

20.
The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on‐demand testing. Building on existing residual analyses, the authors propose an item index that requires only moderate‐to‐small sample sizes to form data for time‐series analysis. Asymptotic results are presented to facilitate statistical significance tests. The authors show that the proposed index, combined with time‐series techniques, may be useful in detecting and predicting item drift. Most importantly, this index is related to a well‐known differential item functioning analysis, so a meaningful effect size can be proposed for item drift detection.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号