Similar Articles
1.
In carefully examining the thesis of a paper by Anderson, Nelson, and Edgington (1984) concerning the so-called Fool’s Type IIa error, one realizes that certain fundamental statistical tenets have been overlooked or ignored. The purpose of the present paper is to discount the notion of a Fool’s Type IIa error under the Neyman-Pearson philosophy of testing statistical hypotheses and to highlight the need for improved statistical education related to hypothesis testing. If the importance of Type I and Type II errors cannot be quantified, then the Neyman-Pearson approach to hypothesis testing is of no value, and hence a Fool’s Type IIa error is irrelevant. If statistical testing errors are important and can be quantified, then adjustment for the Fool’s Type IIa error region is equivalent to increasing the probability of making a Type I error.
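As a generic illustration of what "quantifying" the two error probabilities means under the Neyman-Pearson framework (this is not the paper's own Fool's Type IIa construction), the sketch below computes α and β for a one-sided z-test and shows that enlarging the rejection region with an extra tail can only raise the Type I error rate.

```python
from scipy.stats import norm

# One-sided z-test of H0: mu = 0 vs H1: mu = delta, known sigma, n observations.
delta, sigma, n, alpha = 0.5, 1.0, 25, 0.05
se = sigma / n ** 0.5

crit = norm.ppf(1 - alpha)            # reject H0 when z >= crit
beta = norm.cdf(crit - delta / se)    # P(fail to reject | H1 true)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {1 - beta:.3f}")

# Adding an extra rejection region in the lower tail (an "adjustment" of the kind
# the abstract warns about) adds its probability mass under H0 to the Type I error:
extra = norm.cdf(-crit)
print(f"total Type I error with the extra region: {alpha + extra:.3f}")
```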

2.
We focus on the problem of ignoring statistical independence. A binomial experiment is used to determine whether judges could match, based on looks alone, dogs to their owners. The experimental design introduces dependencies such that the probability of a given judge correctly matching a dog and an owner changes from trial to trial. We show how this dependence alters the probability of a successful match of dog to owner, and thus alters the expected number of successful matches and the variance of this quantity. Finally, we show that a false assumption of independence that results in incorrect probability calculations changes the probability of incorrectly rejecting the null hypothesis (i.e., the Type I error).
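The abstract does not spell out the matching design, so the sketch below assumes one common version: each judge pairs n dogs with n owners without replacement, so the number of correct matches is the number of fixed points of a random permutation and the trials within a judge are dependent. It compares the resulting mean, variance, and rejection rate with what a naive independence (binomial) model predicts.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n_dogs, n_reps, alpha = 6, 100_000, 0.05

# Dependent design: a judge pairs all n dogs with all n owners at once, so a
# correct match on one trial changes the odds on the remaining trials.
perms = np.array([rng.permutation(n_dogs) for _ in range(n_reps)])
correct = (perms == np.arange(n_dogs)).sum(axis=1)    # correct matches per judge
print("dependent design:   mean = %.3f, var = %.3f" % (correct.mean(), correct.var()))

# What a naive independence assumption, Binomial(n, 1/n), would predict.
print("independence model: mean = %.3f, var = %.3f"
      % (1.0, n_dogs * (1 / n_dogs) * (1 - 1 / n_dogs)))

# One-sided test that wrongly assumes independence: reject "matching is at chance"
# when the count exceeds the binomial critical value at nominal alpha = .05.
crit = binom.ppf(1 - alpha, n_dogs, 1 / n_dogs) + 1
size_assumed = 1 - binom.cdf(crit - 1, n_dogs, 1 / n_dogs)
size_actual = (correct >= crit).mean()
print("rejection probability the binomial model implies: %.4f" % size_assumed)
print("actual rejection probability under dependence:    %.4f" % size_actual)
```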

3.
Multivariate analysis of variance (MANOVA) is widely used in educational research to compare means on multiple dependent variables across groups. Researchers faced with the problem of missing data often use multiple imputation of values in place of the missing observations. This study compares the performance of two methods for combining p values in the context of a MANOVA with the typical default for dealing with missing data, listwise deletion. When data are missing at random, the new methods maintained the nominal Type I error rate and had power comparable to the complete data condition. When 40% of the data were missing completely at random, the Type I error rates for the new methods were inflated, but not at lower percentages of missingness.

4.
Failure to meet the sphericity assumption in repeated measurements analysis of variance can have serious consequences for both omnibus and specific comparison tests. It is shown that, in educational research journals, the relevance of this assumption has hardly been recognized. The risk of an inflated Type I error rate can be minimized by calculating separate error terms and by applying conservative tests. This paper illustrates how this is done. Some notes on the use of three mainframe computer packages are also provided. It is argued that, because these packages typically test specific comparisons as planned rather than post hoc comparisons, the outcomes should be interpreted conservatively.
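The abstract does not name a particular correction; a common conservative adjustment is the Greenhouse-Geisser epsilon, which deflates the degrees of freedom of the omnibus repeated measures F test. A minimal sketch, assuming a single within-subject factor with k levels, complete data, and fabricated observations:

```python
import numpy as np

def greenhouse_geisser_epsilon(data):
    """data: (n_subjects, k_levels) array of repeated measurements."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)                   # k x k sample covariance matrix
    # Double-center the covariance matrix before applying the epsilon formula.
    Sc = S - S.mean(axis=0, keepdims=True) - S.mean(axis=1, keepdims=True) + S.mean()
    return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))

rng = np.random.default_rng(1)
n, k = 30, 4
true_cov = np.array([[1.0, 0.8, 0.6, 0.4],
                     [0.8, 1.0, 0.8, 0.6],
                     [0.6, 0.8, 1.0, 0.8],
                     [0.4, 0.6, 0.8, 1.0]])          # a covariance pattern that violates sphericity
data = rng.normal(size=(n, k)) @ np.linalg.cholesky(true_cov).T

eps = greenhouse_geisser_epsilon(data)
print(f"epsilon = {eps:.3f}")
print(f"uncorrected df: ({k - 1}, {(k - 1) * (n - 1)})")
print(f"corrected  df: ({eps * (k - 1):.2f}, {eps * (k - 1) * (n - 1):.2f})")
```

The corrected degrees of freedom are then used to evaluate the usual repeated measures F statistic, which is what makes the test conservative when sphericity fails.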

5.
A study was conducted to determine if analysis of variance techniques are appropriate when the dependent variable has a dichotomous (zero-one) distribution. Several 1-, 2-, and 3-way analysis of variance configurations were investigated with regard to both the size of the Type I error and the power. The findings show the analysis of variance to be an appropriate statistical technique for analyzing dichotomous data in fixed effects models where cell frequencies are equal under the following conditions: (a) the proportion of responses in the smaller response category is equal to or greater than .2 and there are at least 20 degrees of freedom for error, or (b) the proportion of responses in the smaller response category is less than .2 and there are at least 40 degrees of freedom for error.
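A small Monte Carlo in the spirit of the study (the cell sizes and proportions below are illustrative, not the original design): a one-way fixed-effects ANOVA is run on Bernoulli outcomes and the empirical Type I error rate is checked as the smaller response proportion and the error degrees of freedom vary.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)

def empirical_type1(p, n_per_cell, k=3, reps=20_000, alpha=0.05):
    """One-way ANOVA on 0/1 data with k equal cells, every cell sharing proportion p."""
    rejections, valid = 0, 0
    for _ in range(reps):
        groups = [rng.binomial(1, p, size=n_per_cell) for _ in range(k)]
        pooled = np.concatenate(groups)
        if pooled.min() == pooled.max():      # all responses identical: F is undefined
            continue
        _, pval = f_oneway(*groups)
        valid += 1
        rejections += pval < alpha
    return rejections / valid

# The abstract's guidelines: p >= .2 needs at least 20 error df, p < .2 needs at least 40.
for p, n_per_cell in [(0.3, 8), (0.1, 8), (0.1, 15)]:
    df_error = 3 * (n_per_cell - 1)
    print(f"p = {p:.1f}, error df = {df_error:2d}: "
          f"empirical Type I error = {empirical_type1(p, n_per_cell):.4f}")
```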

6.
Experiments in captivity have provided evidence for social learning, but it remains challenging to demonstrate social learning in the wild. Recently, we developed network-based diffusion analysis (NBDA; 2009) as a new approach to inferring social learning from observational data. NBDA fits alternative models of asocial and social learning to the diffusion of a behavior through time, where the potential for social learning is related to a social network. Here, we investigate the performance of NBDA in relation to variation in group size, network heterogeneity, observer sampling errors, and duration of trait diffusion. We find that observation errors, when severe enough, can lead to increased Type I error rates in detecting social learning. However, elevated Type I error rates can be prevented by coding the observed times of trait acquisition into larger time units. Collectively, our results provide further guidance to applying NBDA and demonstrate that the method is more robust to sampling error than initially expected. Supplemental materials for this article may be downloaded from http://lb.psychonomic-journals.org/content/supplemental.

7.

Researchers conducting structural equation modeling analyses rarely, if ever, control for the inflated probability of Type I errors when evaluating the statistical significance of multiple parameters in a model. In this study, the Type I error control, power, and true model rates of familywise and false discovery rate controlling procedures were compared with rates when no multiplicity control was imposed. The results indicate that Type I error rates become severely inflated with no multiplicity control, but also that familywise error controlling procedures were extremely conservative and had very little power for detecting true relations. False discovery rate controlling procedures provided a compromise between no multiplicity control and strict familywise error control, and with large sample sizes they provided a high probability of making correct inferences regarding all the parameters in the model.
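The specific procedures compared in the study are not named in the abstract; the sketch below contrasts two standard representatives, Bonferroni (familywise) and Benjamini-Hochberg (false discovery rate), applied to a vector of parameter p values from a fitted model. The p values are made up for illustration.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Familywise error control: reject p_i if p_i <= alpha / m."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / len(pvals)

def benjamini_hochberg(pvals, alpha=0.05):
    """False discovery rate control: step-up procedure on the sorted p values."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()        # largest i with p_(i) <= alpha * i / m
        reject[order[: k + 1]] = True
    return reject

# Hypothetical p values for, say, ten free parameters in a structural model.
p = [0.001, 0.004, 0.012, 0.020, 0.031, 0.045, 0.120, 0.260, 0.440, 0.810]
print("Bonferroni rejects:        ", bonferroni(p).sum(), "parameters")
print("Benjamini-Hochberg rejects:", benjamini_hochberg(p).sum(), "parameters")
```

With these numbers Bonferroni rejects two parameters and Benjamini-Hochberg four, which is the kind of compromise between strictness and power that the abstract describes.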

8.
The authors sought to identify through Monte Carlo simulations those conditions for which analysis of covariance (ANCOVA) does not maintain adequate Type I error rates and power. The conditions that were manipulated included assumptions of normality and variance homogeneity, sample size, number of treatment groups, and strength of the covariate-dependent variable relationship. Alternative tests studied were Quade's procedure, Puri and Sen's solution, Burnett and Barr's rank difference scores, Conover and Iman's rank transformation test, Hettmansperger's procedure, and the Puri-Sen-Harwell-Serlin test. For balanced designs, the ANCOVA F test was robust and was often the most powerful test through all sample-size designs and distributional configurations. With unbalanced designs, with variance heterogeneity, and when the largest treatment-group variance was matched with the largest group sample size, the nonparametric alternatives generally outperformed the ANCOVA test. When sample size and variance ratio were inversely coupled, all tests became very liberal; no test maintained adequate control over Type I error.
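A compact Monte Carlo in the same spirit (not the authors' full design): the ANCOVA F test for the group effect is computed by comparing nested linear models, and the Conover-Iman rank transformation test simply reruns the same ANCOVA after ranking the outcome and the covariate. The second condition below inversely couples sample size and variance (largest variance in the smallest group), the pattern the abstract describes as making tests liberal.

```python
import numpy as np
from scipy.stats import rankdata, f as f_dist

rng = np.random.default_rng(3)

def ancova_p(y, x, g):
    """ANCOVA p value for the group effect via nested-model comparison (one covariate)."""
    n, k = len(y), len(np.unique(g))
    dummies = (g[:, None] == np.unique(g)[None, :-1]).astype(float)   # k-1 group dummies
    X_full = np.column_stack([np.ones(n), x, dummies])
    X_red = np.column_stack([np.ones(n), x])
    rss = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    df1, df2 = k - 1, n - k - 1
    F = ((rss(X_red) - rss(X_full)) / df1) / (rss(X_full) / df2)
    return f_dist.sf(F, df1, df2)

def type1(sizes, sds, reps=10_000, alpha=0.05, ranked=False):
    g = np.repeat(np.arange(len(sizes)), sizes)
    rej = 0
    for _ in range(reps):
        x = rng.normal(size=g.size)
        y = 0.5 * x + rng.normal(scale=np.repeat(sds, sizes))   # no true group effect
        if ranked:                                              # Conover-Iman rank transformation
            y, x = rankdata(y), rankdata(x)
        rej += ancova_p(y, x, g) < alpha
    return rej / reps

print(f"balanced, equal variances:   F = {type1([20, 20, 20], [1, 1, 1]):.3f}, "
      f"ranks = {type1([20, 20, 20], [1, 1, 1], ranked=True):.3f}")
print(f"small group, large variance: F = {type1([10, 20, 40], [3, 2, 1]):.3f}, "
      f"ranks = {type1([10, 20, 40], [3, 2, 1], ranked=True):.3f}")
```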

9.
This study explores higher education research in Asia. Drawing on scientometrics, the mapping of science, and social network analysis, this paper examines the publications of 38 specialised journals on higher education over the past three decades. The findings indicate a growing number of higher education research publications, but the proportion of Asian publications relative to total world publications in higher education research has remained static. The higher education research community in Asia is heavily concentrated in a few countries and universities, resting on a relatively small number of core scholars who publish research in the international specialised higher education journals. In response to increasing challenges in Asian higher education systems, the paper suggests that the higher education research community in Asia needs to expand and to include more regional and international collaborations.

10.
Under the generalizability‐theory (G‐theory) framework, the estimation precision of variance components (VCs) is of significant importance in that they serve as the foundation of estimating reliability. Zhang and Lin (2016) advanced the discussion of nonadditivity in data from a theoretical perspective and showed its adverse effects on the estimation precision of VCs. Contributing to this line of research, the current article directs the discussion of nonadditivity from a theoretical perspective to a practical application and highlights the importance of detecting nonadditivity in G‐theory applications. To this end, Tukey's test for nonadditivity is the only method to date that is appropriate for the typical single‐facet G‐theory design, in which a single observation is made per element within a facet. The current article evaluates the Type I and Type II error rates of Tukey's test. Results show that Tukey's test is satisfactory in controlling the rate of falsely detecting nonadditivity when the data are actually additive and that it is generally powerful in detecting nonadditivity when it exists. Finally, the article demonstrates an application of Tukey's test in detecting nonadditivity in a judgmental study of educational standards and shows how Tukey's test results can be used to correct imprecision in the estimated VC in the presence of nonadditivity.
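Tukey's single-degree-of-freedom test can be computed directly from a persons-by-raters (or persons-by-items) table with one observation per cell. A minimal sketch of the standard computation, with fabricated additive and nonadditive tables:

```python
import numpy as np
from scipy.stats import f as f_dist

def tukey_nonadditivity(y):
    """Tukey's 1-df test for nonadditivity in a two-way table with one observation per cell."""
    r, c = y.shape
    row_dev = y.mean(axis=1) - y.mean()
    col_dev = y.mean(axis=0) - y.mean()
    ss_nonadd = (row_dev @ y @ col_dev) ** 2 / (np.sum(row_dev ** 2) * np.sum(col_dev ** 2))
    resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()
    ss_resid = np.sum(resid ** 2)                  # interaction/residual sum of squares
    df_rem = (r - 1) * (c - 1) - 1
    F = ss_nonadd / ((ss_resid - ss_nonadd) / df_rem)
    return F, f_dist.sf(F, 1, df_rem)

rng = np.random.default_rng(4)
persons, raters = rng.normal(size=(12, 1)), rng.normal(size=(1, 6))
additive = persons + raters + rng.normal(scale=0.5, size=(12, 6))
nonadditive = additive + 1.5 * persons * raters    # a multiplicative term induces nonadditivity
for label, table in [("additive data", additive), ("nonadditive data", nonadditive)]:
    F, p = tukey_nonadditivity(table)
    print(f"{label}: F = {F:.2f}, p = {p:.4f}")
```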

11.
The purpose of this simulation study was to assess the performance of latent variable models that take into account the complex sampling mechanism that often underlies data used in educational, psychological, and other social science research. Analyses were conducted using the multiple indicator multiple cause (MIMIC) model, which is a flexible and effective tool for relating observed and latent variables. The data were simulated in a hierarchical framework (e.g., individuals nested in schools) so that a multilevel modeling approach would be appropriate. Analyses were conducted both accounting for and not accounting for the nested data to determine the impact of ignoring such multilevel data structures in full structural equation models. Results highlight how modeling outcomes differ when the analytic strategy is congruent with the data structure and when it is not. Type I error rates and power for the standard and multilevel methods were similar for within-cluster variables and for the multilevel model with between-cluster variables. However, Type I error rates were inflated for the standard approach when modeling between-cluster variables.
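The core phenomenon, inflated Type I error for between-cluster predictors when nesting is ignored, can be reproduced without a full MIMIC model. The sketch below is an OLS simplification (not the authors' multilevel SEM): it simulates students nested in schools, regresses the outcome on a school-level predictor that truly has no effect, and counts how often a single-level analysis rejects at the 5% level.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(5)
n_schools, n_per_school, icc, reps, alpha = 50, 20, 0.20, 5_000, 0.05

rejections = 0
for _ in range(reps):
    school_effect = rng.normal(scale=np.sqrt(icc), size=n_schools)
    y = (np.repeat(school_effect, n_per_school)
         + rng.normal(scale=np.sqrt(1 - icc), size=n_schools * n_per_school))
    # Between-cluster predictor, constant within a school and unrelated to y.
    x = np.repeat(rng.normal(size=n_schools), n_per_school)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    se = np.sqrt(resid @ resid / (len(y) - 2) / np.sum((x - x.mean()) ** 2))
    pval = 2 * t_dist.sf(abs(beta[1] / se), df=len(y) - 2)
    rejections += pval < alpha

print(f"single-level Type I error for a school-level predictor: {rejections / reps:.3f}")
```

With an intraclass correlation of .20 and 20 students per school, the single-level standard error ignores a design effect of roughly 1 + 19 × .20, so the empirical rejection rate lands far above the nominal .05.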

12.
This study examined the effect of sample size ratio and model misfit on the Type I error rates and power of the Difficulty Parameter Differences procedure using Winsteps. A unidimensional 30-item test with responses from 130,000 examinees was simulated and four independent variables were manipulated: sample size ratio (20/100/250/500/1000); model fit/misfit (1PL and 3PL with c = .15 models); impact (no difference/mean differences/variance differences/mean and variance differences); and percentage of items with uniform and nonuniform DIF (0%/10%/20%). In general, the results indicate the importance of ensuring model fit to achieve greater control of Type I error and adequate statistical power. The manipulated variables produced inflated Type I error rates, which were well controlled when a measure of DIF magnitude was applied. Sample size ratio also had an effect on the power of the procedure. The paper discusses the practical implications of these results.

13.
This article extends the Bonett (2003a) approach to testing the equality of alpha coefficients from two independent samples to the case of m ≥ 2 independent samples. The extended Fisher-Bonett test and its competitor, the Hakstian-Whalen (1976) test, are illustrated with numerical examples of both hypothesis testing and power calculation. Computer simulations are used to compare the performance of the two tests and the Feldt (1969) test (for m = 2) in terms of power and Type I error control. It is shown that the Fisher-Bonett test is just as effective as its competitors in controlling Type I error, is comparable to them in power, and is equally robust against heterogeneity of error variance.
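The abstract does not reproduce the test statistic. The sketch below follows one common construction built on Bonett's (2003a) transformation, ln(1 − α̂), whose approximate sampling variance is often taken as 2k/[(k − 1)(n − 2)] for a k-item scale and n respondents; the transformed values are combined into a weighted chi-square homogeneity statistic on m − 1 degrees of freedom. Treat both the variance formula and the combining rule as assumptions standing in for the exact Fisher-Bonett statistic, which may differ in detail.

```python
import numpy as np
from scipy.stats import chi2

def alpha_homogeneity_test(alphas, ns, k):
    """Approximate chi-square test that m independent coefficient alphas are equal.

    alphas : estimated alpha from each sample
    ns     : sample size of each sample
    k      : number of items (assumed common to all samples)
    """
    alphas, ns = np.asarray(alphas, float), np.asarray(ns, float)
    x = np.log(1 - alphas)                         # Bonett transformation
    var = 2 * k / ((k - 1) * (ns - 2))             # assumed approximate variance of x
    w = 1 / var
    x_bar = np.sum(w * x) / np.sum(w)
    Q = np.sum(w * (x - x_bar) ** 2)               # weighted homogeneity statistic
    df = len(alphas) - 1
    return Q, chi2.sf(Q, df)

Q, p = alpha_homogeneity_test(alphas=[0.82, 0.86, 0.78], ns=[120, 150, 90], k=10)
print(f"Q = {Q:.2f}, df = 2, p = {p:.4f}")
```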

14.
This study examined and compared various statistical methods for detecting individual differences in change. Considering 3 issues including test forms (specific vs. generalized), estimation procedures (constrained vs. unconstrained), and nonnormality, we evaluated 4 variance tests including the specific Wald variance test, the generalized Wald variance test, the specific likelihood ratio (LR) variance test, and the generalized LR variance test under both constrained and unconstrained estimation for both normal and nonnormal data. For the constrained estimation procedure, both the mixture distribution approach and the alpha correction approach were evaluated for their performance in dealing with the boundary problem. To deal with the nonnormality issue, we used the sandwich standard error (SE) estimator for the Wald tests and the Satorra–Bentler scaling correction for the LR tests. Simulation results revealed that testing a variance parameter and the associated covariances (generalized) had higher power than testing the variance solely (specific), unless the true covariances were zero. In addition, the variance tests under constrained estimation outperformed those under unconstrained estimation in terms of higher empirical power and better control of Type I error rates. Among all the studied tests, for both normal and nonnormal data, the robust generalized LR and Wald variance tests with the constrained estimation procedure were generally more powerful and had better Type I error rates for testing variance components than the other tests. Results from the comparisons between specific and generalized variance tests and between constrained and unconstrained estimation were discussed.

15.
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item judgments directly reflect empirical item difficulties, the variability in Angoff judgments over items would not add error to the cut score, but to the extent that the mean item judgments do not correspond to the empirical item difficulties, variability in mean judgments over items would add error to the cut score. In this article, we present two generalizability-theory–based analyses of the proportion of the item variance that contributes to error in the cut score. For one approach, variance components are estimated on the probability (or proportion-correct) scale of the Angoff judgments, and for the other, the judgments are transformed to the theta scale of an item response theory model before estimating the variance components. The two analyses yield somewhat different results but both indicate that it is not appropriate to simply ignore the item variance component in estimating the error variance.
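For a crossed judges-by-items design with one Angoff judgment per cell, the variance components on the probability scale can be estimated from the two-way ANOVA mean squares. A minimal sketch with fabricated judgments (the theta-scale analysis in the article would first transform each judgment through an IRT model, which is omitted here); the two printed standard errors bracket the extremes of charging all of the item component to cut-score error versus ignoring it entirely, which is the question the article takes up.

```python
import numpy as np

def variance_components(y):
    """Judge, item, and residual variance components for a judges-x-items table (one obs per cell)."""
    n_j, n_i = y.shape
    grand = y.mean()
    ms_j = n_i * np.sum((y.mean(axis=1) - grand) ** 2) / (n_j - 1)
    ms_i = n_j * np.sum((y.mean(axis=0) - grand) ** 2) / (n_i - 1)
    resid = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + grand
    ms_res = np.sum(resid ** 2) / ((n_j - 1) * (n_i - 1))
    return {"judge": (ms_j - ms_res) / n_i,
            "item": (ms_i - ms_res) / n_j,
            "residual": ms_res}

rng = np.random.default_rng(6)
n_judges, n_items = 15, 40
judgments = np.clip(0.6 + rng.normal(0, 0.05, (n_judges, 1))      # judge leniency
                        + rng.normal(0, 0.10, (1, n_items))       # item difficulty
                        + rng.normal(0, 0.08, (n_judges, n_items)), 0, 1)
vc = variance_components(judgments)
print({name: round(v, 4) for name, v in vc.items()})

se_with_items = np.sqrt(vc["judge"] / n_judges + vc["item"] / n_items
                        + vc["residual"] / (n_judges * n_items))
se_without_items = np.sqrt(vc["judge"] / n_judges + vc["residual"] / (n_judges * n_items))
print("cut-score SE counting all item variance as error:", round(se_with_items, 4))
print("cut-score SE ignoring the item component:        ", round(se_without_items, 4))
```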

16.
Starting from the Type-I and Type-II vertical density representations, this paper shows how the idea behind vertical density representation (VDR) parallels the construction of the Lebesgue integral. It then describes applications of VDR in random number generation, the construction of probability distributions, and goodness-of-fit testing for multivariate distributions. Vertical density representation is a special kind of variable transformation and can be used to explore the intrinsic properties of probability distributions.

17.
A paucity of research has compared estimation methods within a measurement invariance (MI) framework and determined whether research conclusions using normal-theory maximum likelihood (ML) generalize to the robust ML (MLR) and weighted least squares means and variance adjusted (WLSMV) estimators. Using ordered categorical data, this simulation study aimed to address these queries by investigating 342 conditions. When testing for metric and scalar invariance, Δχ2 results revealed that Type I error rates varied across estimators (ML, MLR, and WLSMV) with symmetric and asymmetric data. The Δχ2 power varied substantially based on the estimator selected, type of noninvariant indicator, number of noninvariant indicators, and sample size. Although some of the changes in approximate fit indexes (ΔAFI) are relatively sample size independent, researchers who use the ΔAFI with WLSMV should use caution, as these statistics do not perform well with misspecified models. As a supplemental analysis, our results evaluate and suggest cutoff values based on previous research.

18.
Analysis of variance is one of the most frequently used statistical analyses in the behavioral, educational, and social sciences, and special attention has been paid to the selection and use of an appropriate effect size measure of association in analysis of variance. This article presents the sample size procedures for precise interval estimation of eta-squared and partial eta-squared in fixed-effects analysis of variance designs. The desired precision of a confidence interval is assessed with respect to (a) the control of expected width and (b) the tolerance probability of interval width within a designated value. In addition, sample size calculations for standardized contrasts of treatment effects and corresponding partial strength of association effect sizes are also considered.
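One standard way to form the confidence interval that such sample size procedures target is to invert the noncentral F distribution: find the noncentrality parameters whose CDF at the observed F equals 1 − α/2 and α/2, then map them to partial eta-squared via η²p = λ/(λ + N). The sketch below assumes that convention and a one-way fixed-effects design; the example numbers are illustrative.

```python
from scipy.stats import ncf
from scipy.optimize import brentq

def partial_eta_squared_ci(F, df1, df2, conf=0.95):
    """Confidence interval for partial eta-squared by inverting the noncentral F CDF."""
    N = df1 + df2 + 1                      # total sample size in a one-way fixed-effects design
    lo_tail, hi_tail = (1 + conf) / 2, (1 - conf) / 2

    def solve(target):
        g = lambda lam: ncf.cdf(F, df1, df2, lam) - target
        if g(0) < 0:                       # even lambda = 0 puts too little mass below F
            return 0.0
        upper = 1.0
        while g(upper) > 0:                # expand the bracket until the sign changes
            upper *= 2
        return brentq(g, 0, upper)

    lam_lo, lam_hi = solve(lo_tail), solve(hi_tail)
    return lam_lo / (lam_lo + N), lam_hi / (lam_hi + N)

F_obs, df1, df2 = 4.2, 3, 76               # e.g., four groups of 20
print("observed partial eta-squared:", round(df1 * F_obs / (df1 * F_obs + df2), 3))
print("95% CI:", tuple(round(v, 3) for v in partial_eta_squared_ci(F_obs, df1, df2)))
```

Sample size planning then amounts to repeating this calculation over candidate N until the expected interval width (or the tolerance probability for the width) meets the designated value.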

19.
Analysis of variance and principal component analysis are important statistical methods in engineering computation, with wide applications in medical research, scientific experiments, and biological genetic sequencing. Using MATLAB, this paper gives a theoretical description and algorithm design for two of the main types of analysis, two-factor analysis of variance and principal component analysis, and thereby carries out two-dimensional data analysis of the sampled data.
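The paper works in MATLAB; the sketch below is a rough Python analogue of the two analyses it describes, with fabricated data: a two-factor ANOVA with interaction via the statsmodels formula interface, and a PCA computed from the singular value decomposition.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)

# Two-factor ANOVA with interaction on a fabricated 3 x 4 factorial design, 5 replicates per cell.
levels_a, levels_b, reps = 3, 4, 5
df = pd.DataFrame([(a, b, 2 + 0.5 * a - 0.3 * b + rng.normal())
                   for a in range(levels_a) for b in range(levels_b) for _ in range(reps)],
                  columns=["A", "B", "y"])
model = smf.ols("y ~ C(A) + C(B) + C(A):C(B)", data=df).fit()
print(anova_lm(model, typ=2))

# PCA of a fabricated multivariate sample via the SVD of the column-centered data matrix.
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)
scores = Xc @ Vt.T                          # principal component scores
print("proportion of variance explained:", np.round(explained, 3))
print("first observation's scores on PC1 and PC2:", np.round(scores[0, :2], 3))
```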

20.
China's publishing sector has achieved remarkable success over the past 30-plus years, but some publications have fallen into such traps as vulgarized content and infantilized expression. Based on an analysis of five major pitfalls in the content and form of publications, and of the serious harm they do to social information exchange, public cultural literacy, the accumulation of Chinese culture, and national cultural strength, the paper proposes remedies such as unifying understanding, tightening laws and regulations, strengthening quality inspection, and raising the quality of authors.

