Similar Articles
19 similar articles found.
1.
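Testing Outliers in Samples from the Type I Extreme-Value Distribution of Minima   Total citations: 1, self-citations: 1, citations by others: 0  
A new method is proposed for testing multiple outliers in samples from the Type I extreme-value distribution of minima. First, robust estimators of the population parameters are found; a test statistic is then built on these estimators, and both its exact probability density function and its approximate large-sample distribution are derived. Because the core component of the test statistic, the sample quantile, resists interference from outliers to some degree, the method tests effectively.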
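The abstract does not give the estimators, so the following is only a minimal sketch of one quantile-based approach, assuming that matching the sample quartiles to the theoretical quantile function x_p = μ + σ ln(−ln(1 − p)) is acceptable. The function names and the standardized "lower outlier score" are hypothetical illustrations, not the paper's test statistic.

```python
import math
from statistics import quantiles

def gumbel_min_quantile(p, mu, sigma):
    """p-quantile of F(x) = 1 - exp(-exp((x - mu) / sigma)), the Type I minimum distribution."""
    return mu + sigma * math.log(-math.log(1.0 - p))

def quartile_estimates(sample):
    """Estimate (mu, sigma) by matching sample quartiles to theoretical ones (assumed estimator)."""
    q1, _, q3 = quantiles(sample, n=4)
    z1 = math.log(-math.log(0.75))  # standardized 0.25-quantile
    z3 = math.log(-math.log(0.25))  # standardized 0.75-quantile
    sigma = (q3 - q1) / (z3 - z1)
    mu = q1 - sigma * z1
    return mu, sigma

def lower_outlier_scores(sample):
    """Hypothetical score: how many robust scale units each point lies below mu_hat."""
    mu, sigma = quartile_estimates(sample)
    return [(mu - x) / sigma for x in sample]
```

Because the quartiles are middle order statistics, a few gross values in either tail barely move μ̂ and σ̂, which is the robustness property the abstract relies on.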

2.
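Outlier Testing for the Two-Parameter Exponential Distribution   Total citations: 2, self-citations: 2, citations by others: 0  
Outlier testing for the two-parameter exponential distribution is discussed. For data that contain both abnormally large and abnormally small values, a new test statistic is constructed and its exact distribution is derived; an approximate large-sample distribution is also obtained.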
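The paper's statistic treats both ends at once; as a simpler point of comparison, here is a sketch of two classical spacing ratios that test the smallest and the largest observation separately (no multiple-testing adjustment). It relies on the standard fact that, for a two-parameter exponential sample, the normalized spacings D_i = (n − i + 1)(x_(i) − x_(i−1)), i = 2..n, are i.i.d. exponential, so each ratio D/ΣD has P(T > t) = (1 − t)^(n−2). The function name is hypothetical.

```python
def spacing_outlier_pvalues(sample):
    """Exact p-values for one low and one high outlier under a two-parameter exponential model."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    total = sum(v - x[0] for v in x)  # equals D_2 + ... + D_n
    t_low = (n - 1) * (x[1] - x[0]) / total  # D_2 / sum(D)
    t_high = (x[-1] - x[-2]) / total  # D_n / sum(D)
    # Each ratio is Beta(1, n - 2) under H0, so P(T > t) = (1 - t) ** (n - 2).
    return (1 - t_low) ** (n - 2), (1 - t_high) ** (n - 2)
```

Because both ratios share the denominator ΣD, a second large outlier inflates it and can mask the first; handling both ends jointly is the gap the paper addresses.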

3.
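The T-type test statistic proposed in reference [1] performs poorly when an exponential sample contains both abnormally small and abnormally large observations. To remedy this, a new test method based on the robust sample median is given, and the probability density function of the test statistic is derived.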
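Why the median helps can be seen from a standard fact: an exponential distribution with mean θ has median θ ln 2, so median / ln 2 estimates θ while ignoring how extreme the tails are. The sketch below contrasts it with the sample mean and forms two hypothetical ratios for the smallest and largest observations; these ratios are illustrations only, not the statistic whose density the paper derives.

```python
import math
from statistics import mean, median

def median_scale(sample):
    """Robust estimate of the exponential mean theta: the population median is theta * ln 2."""
    return median(sample) / math.log(2)

def extreme_ratios(sample):
    """Hypothetical ratios of the smallest and largest observations to the robust scale."""
    theta = median_scale(sample)
    return min(sample) / theta, max(sample) / theta
```

Adding a single value of 100 to the sample [0.5, 1.0, 1.5, 2.0, 2.5] moves the median-based estimate by under 20%, while the sample mean grows more than tenfold.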

4.
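Interval estimation of the parameter of a class of uniform distributions is discussed for incomplete sample data. A new pivotal quantity for constructing confidence intervals is built from the sample median, its probability density function is derived, and approximate confidence intervals for large samples are discussed. The method applies not only to incomplete data but also to samples that may contain outliers, and it is robust.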
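As a sketch of how a median-based pivot can work, assume the simplest case X ~ U(0, θ) with odd n (the paper's "class of uniform distributions" may be more general). The median M is the k-th order statistic with k = (n + 1)/2, and M/θ follows Beta(k, n − k + 1) whatever θ is, so θ ∈ [M / b_{1−α/2}, M / b_{α/2}] with probability 1 − α. M remains observable under right censoring as long as fewer than half of the largest values are missing. The beta quantile is found here by bisection on the binomial-sum form of the order-statistic CDF; all function names are hypothetical.

```python
import math

def order_stat_cdf(x, k, n):
    """P(U_(k) <= x) for the k-th order statistic of n i.i.d. Uniform(0, 1) draws."""
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))

def order_stat_quantile(p, k, n, tol=1e-12):
    """Invert the order-statistic CDF by bisection (it is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if order_stat_cdf(mid, k, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def median_ci_uniform(sample_median, n, alpha=0.05):
    """Exact (1 - alpha) CI for theta in U(0, theta) from the median of an odd-sized sample."""
    if n % 2 == 0:
        raise ValueError("sketch assumes odd n so the median is a single order statistic")
    k = (n + 1) // 2
    b_lo = order_stat_quantile(alpha / 2, k, n)
    b_hi = order_stat_quantile(1 - alpha / 2, k, n)
    return sample_median / b_hi, sample_median / b_lo
```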

5.
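Checking whether data fit a given model requires statistical diagnostics. This paper studies statistical diagnostics for generalized nonlinear models based on maximum Lq-likelihood estimation (MLqE), using three diagnostic statistics to test whether the data contain outliers. Simulation results show that for small samples the diagnostic statistics obtained with MLqE are larger than those obtained with maximum likelihood estimation (MLE), and that the difference shrinks as the sample size grows. MLqE therefore finds outliers in the data more easily than MLE.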
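The MLqE maximizes Σ L_q(f(x_i; θ)) with L_q(u) = (u^(1−q) − 1)/(1 − q), which reduces to the log-likelihood at q = 1. For a much simpler model than the paper's, a normal mean with known σ, the estimating equation becomes a weighted mean with weights f(x_i; μ)^(1−q), so low-density points are downweighted when q < 1. The sketch below solves it by fixed-point iteration (which finds a local solution) and compares residuals with the MLE; q = 0.8 and the function names are arbitrary assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mlq_location(xs, sigma=1.0, q=0.8, iters=200):
    """Fixed-point iteration for the MLqE of a normal mean with known sigma."""
    mu = sum(xs) / len(xs)  # start from the MLE (the q = 1 solution)
    for _ in range(iters):
        # Weights f^(1 - q): points far from mu get little say when q < 1.
        w = [normal_pdf(x, mu, sigma) ** (1 - q) for x in xs]
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return mu
```

With xs = [0.1, −0.3, 0.2, 0.0, −0.1, 0.4, −0.2, 8.0], the MLE is pulled to about 1.01, while the MLqE stays near the bulk of the data. The residual of 8.0 is therefore larger under MLqE, in line with the abstract's finding that MLqE-based diagnostics expose outliers more readily.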

6.
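The task of statistics is to extract information from data and to uncover the quantitative regularities within it; statistics is the science of data. Statistics has been added to the new senior high school curriculum, and teachers and students often find it hard to handle. Teaching can easily slide into tedious computation, such as how to draw charts and how to calculate the numerical characteristics of a sample, and thus drift away from the essence of statistics. Statistics asks what a set of data can tell us and what information we can extract from it; every statistical operation (drawing charts, computing numerical characteristics) exists to extract information from data. The textbook mines the population information contained in sample data from two angles, shape and number. Through preliminary organization such as summarizing, tabulating, and plotting, the frequency distribution of the sample is used to estimate the distribution of the population; this uses charts to show the information in the data visually. To grasp the regularities of the population quantitatively, sample data are also used to study the population's numerical characteristics, including measures of central tendency, dispersion, and correlation. Common measures of central tendency are the arithmetic mean, the median, and the mode. Pushing the point to a theoretical extreme, one could say that the history of mathematical statistics is the history of ever-deeper study, in breadth and in depth, of the arithmetic mean as a measure of central tendency. The mode, a relatively simple measure of central tendency, is omitted. This article mainly discusses instructional design for the arithmetic mean and the related median. [First paragraph]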

7.
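Liu Dun. Chuzhongsheng (Junior High School Students), 2009, (7): 16-19
The mean, median, and mode are statistics that describe the central tendency of data; the range and variance are statistics that describe its dispersion. We can use these statistics to analyze data from everyday life and make sound decisions based on the results.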
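The five statistics named above can be computed with Python's standard `statistics` module. This sketch assumes the population variance (dividing by n) that school textbooks typically use; the sample data are hypothetical.

```python
from statistics import mean, median, multimode, pvariance

def summarize(data):
    """Central tendency (mean, median, modes) and dispersion (range, variance) of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # every value that occurs most often
        "range": max(data) - min(data),
        "variance": pvariance(data),  # divides by n (assumed textbook convention)
    }
```

For the hypothetical data [2, 4, 4, 5, 7, 8], this gives mean 5, median 4.5, mode 4, range 6, and variance 4.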

8.
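The testing of multiple outliers in samples from the Type I extreme-value distribution of minima is discussed. Building on Dixon-type statistics, a new test statistic is proposed and its distribution is derived; the masking effect of the statistic is also discussed.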
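For reference, here is a sketch of the classical Dixon-type ratio for k suspected low outliers, r = (x_(k+1) − x_(1)) / (x_(n) − x_(1)) (Dixon's r10 for k = 1, r20 for k = 2), together with a Monte Carlo critical value under a standard minimum-Gumbel null. The ratio is location- and scale-invariant, so μ = 0, σ = 1 suffices. This is the classical statistic, not the new one in the paper; the simulation settings and function names are assumptions.

```python
import math
import random

def dixon_lower_ratio(sample, k=1):
    """Dixon-type ratio (x_(k+1) - x_(1)) / (x_(n) - x_(1)) for k suspected low outliers."""
    x = sorted(sample)
    return (x[k] - x[0]) / (x[-1] - x[0])

def std_gumbel_min(rng):
    """Draw from F(x) = 1 - exp(-exp(x)) by inversion: x = ln(-ln U), U in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return math.log(-math.log(u))

def mc_critical_value(n, k=1, alpha=0.05, reps=20000, seed=1):
    """Empirical (1 - alpha) quantile of the ratio under the Type I minimum null."""
    rng = random.Random(seed)
    stats = sorted(
        dixon_lower_ratio([std_gumbel_min(rng) for _ in range(n)], k) for _ in range(reps)
    )
    return stats[math.ceil((1 - alpha) * reps) - 1]
```

For the sample [−9.0, −8.8, 0.1, 0.3, 0.35, 0.5, 0.6, 0.8, 0.9, 1.0], the k = 1 ratio is only 0.02 because −8.8 sits next to −9.0 and hides it, while the k = 2 ratio of 0.91 exposes the pair. This is an instance of the masking effect.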

9.
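The testing of multiple outliers in samples from the Type I extreme-value distribution of minima is discussed. Building on Dixon-type statistics, a new test statistic is proposed and its distribution is derived; the masking effect of the statistic is also discussed.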

10.
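Using the everyday example of household water use, together with analogy and generalization from the particular to the general, students are guided to arrive at the definition of a percentile and at the steps for computing the p-th percentile of a set of data. Students process the data in Excel, go through the basic process of data analysis, experience the statistical idea of estimating a population from a sample, and develop data-analysis literacy.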
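The abstract does not list the steps. Assuming the procedure used in current Chinese senior high school textbooks, the calculation can be sketched as follows: sort the data, compute i = n × p%, and if i is not an integer take the j-th value, where j is the smallest integer greater than i; otherwise average the i-th and (i + 1)-th values. Excel's PERCENTILE.INC interpolates differently, so its results can differ from these steps. The function name and the data are hypothetical.

```python
import math
from fractions import Fraction

def textbook_percentile(data, p):
    """p-th percentile (0 < p < 100, p a whole percent) via the assumed textbook steps."""
    x = sorted(data)  # step 1: sort ascending
    i = Fraction(len(x) * p, 100)  # step 2: i = n * p%, kept exact to test integrality
    if i.denominator != 1:
        j = math.ceil(i)  # smallest integer greater than i
        return x[j - 1]  # the j-th value (1-based)
    i = int(i)
    return (x[i - 1] + x[i]) / 2  # mean of the i-th and (i+1)-th values
```

For the ten values 1 through 10, the 25th percentile is 3 (i = 2.5), the 50th is 5.5, and the 80th is 8.5.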

11.
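Testing Outliers in a Class of Linear Models   Total citations: 1, self-citations: 1, citations by others: 0  
A new method is given for testing outliers in data from the linear model Y = Xβ + e. Two test statistics are proposed, one for known and one for unknown error variance. The paper gives the specific testing steps and the rejection region, and proves the significance level of the test.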
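The paper's two statistics are not specified here. As a classical baseline for the same two cases, the sketch below computes residuals scaled by σ√(1 − h_ii) for a simple linear regression, using the given σ when it is known and s = √(SSE/(n − 2)) when it is not (internally studentized residuals). The single-predictor restriction and the function names are assumptions made to keep the example short.

```python
import math

def simple_ols(xs, ys):
    """Least-squares fit y = b0 + b1 x, with residuals and hat-matrix diagonal."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # leverages h_ii
    return b0, b1, resid, lev

def standardized_residuals(xs, ys, sigma=None):
    """e_i / (sigma * sqrt(1 - h_ii)); sigma estimated from SSE / (n - 2) if unknown."""
    n = len(xs)
    _, _, resid, lev = simple_ols(xs, ys)
    if sigma is None:
        sigma = math.sqrt(sum(e * e for e in resid) / (n - 2))
    return [e / (sigma * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
```

With known σ, the scaled residual is N(0, 1) under the model, so a rule such as |r_i| > 3 gives an approximate rejection region; with estimated σ, the reference distribution changes, which is why separate statistics are needed for the two cases.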

12.
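An outlier detection method based on random forests is proposed. Simulation experiments show that, compared with two distance-based outlier detection techniques, the method improves model accuracy more, is more robust, and substantially reduces computation time on large datasets.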
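The abstract does not describe how the forest is built. As a related, well-known random-tree technique (swapped in, not the paper's method), here is a simplified one-dimensional isolation forest: random splits isolate points, and points that are isolated after few splits get scores near 1. Simplifications: each tree is grown lazily along the path of the point being scored, and the whole sample is used instead of subsamples. All names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup on n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def path_length(x, data, rng, depth, limit):
    """Depth at which x is isolated by random splits, grown lazily along x's own path."""
    if depth >= limit or len(data) <= 1 or min(data) == max(data):
        return depth + c_factor(len(data))
    split = rng.uniform(min(data), max(data))
    same_side = [v for v in data if (v < split) == (x < split)]
    return path_length(x, same_side, rng, depth + 1, limit)

def isolation_scores(data, n_trees=200, seed=0):
    """Anomaly score 2 ** (-E[h] / c(n)); values near 1 suggest outliers."""
    rng = random.Random(seed)
    limit = math.ceil(math.log2(len(data)))
    cn = c_factor(len(data))
    scores = []
    for x in data:
        mean_h = sum(path_length(x, data, rng, 0, limit) for _ in range(n_trees)) / n_trees
        scores.append(2 ** (-mean_h / cn))
    return scores
```

Unlike distance-based methods, no pairwise distances are computed, which is one reason random-tree detectors can scale well to large datasets.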

13.
A method was proposed for detecting outliers and influential observations within the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1st Brain-OB and the 2nd Brain-OB in a cross of this population. The results also revealed a marked increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection power and false discovery rate (FDR) of QTL mapping in the presence and absence of outliers. The results suggested that even a small proportion of outliers could increase the FDR and hence decrease the detection power for QTL. The estimated standard errors for the position, additive, and additive × environment interaction effects of QTL increased drastically in the presence of outliers.

14.
A 2-stage robust procedure and an R package, rsem, were recently developed by Yuan and Zhang (2012) for structural equation modeling with nonnormal missing data. Several test statistics that have been used for complete-data analysis are employed to evaluate model fit in the 2-stage robust method. However, the properties of these statistics under robust procedures for incomplete nonnormal data have never been studied. This study systematically evaluates and compares 5 test statistics: a test statistic derived from normal-distribution-based maximum likelihood, a rescaled chi-square statistic, an adjusted chi-square statistic, a corrected residual-based asymptotically distribution-free chi-square statistic, and a residual-based F statistic. These statistics are evaluated under a linear growth curve model by varying 8 factors: population distribution, missing data mechanism, missing data rate, sample size, number of measurement occasions, covariance between the latent intercept and slope, variance of measurement errors, and downweighting rate of the 2-stage robust method. The performance of the test statistics varies, and the one derived from 2-stage normal-distribution-based maximum likelihood performs much worse than the other four. Application of the 2-stage robust method and of the test statistics is illustrated through a growth curve analysis of mathematical ability development, using data on the Peabody Individual Achievement Test mathematics assessment from the National Longitudinal Survey of Youth 1997 Cohort.

15.
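Addressing the limitations of traditional outlier detection algorithms, this paper uses the neighbor relations among data objects to propose an outlier detection algorithm that combines density and distance. The algorithm overcomes the inability of distance-based outlier detection to identify local outliers accurately, and it avoids misclassifying points as outliers when sparse and dense clusters lie close to each other. Experiments on synthetic and real datasets demonstrate the feasibility of the improved algorithm and show that it detects outlying objects more effectively.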
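The paper's combined algorithm is not detailed in the abstract. To show why density matters for local outliers, here is a sketch of the standard local outlier factor (LOF), a swapped-in density-based score rather than the paper's method. Assumptions: points are distinct, exactly k neighbors are used, and ties are broken by index.

```python
import math

def lof_scores(points, k=3):
    """Local outlier factor for each point; assumes distinct points, ties broken by index."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn.append(others[:k])
    kdist = [dist[i][knn[i][-1]] for i in range(n)]  # distance to the k-th neighbour
    lrd = []
    for i in range(n):
        # Reachability distance: max(k-distance of the neighbour, actual distance).
        reach = [max(kdist[o], dist[i][o]) for o in knn[i]]
        lrd.append(k / sum(reach))  # local reachability density
    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[o] for o in knn[i]) / (k * lrd[i]) for i in range(n)]
```

Take a tight cluster near the origin, a sparse square of points with corners at (5, 5) and (6, 6), and the point (0.6, 0.6). The isolated point has a smaller 3rd-nearest-neighbor distance (about 0.78) than the sparse-cluster corners (about 1.41), so a plain k-distance rule ranks it below them. LOF ranks it first and gives the sparse corners a score of 1.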

16.
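Statistical Inference for the Coefficient of Variation and Its Applications   Total citations: 4, self-citations: 0, citations by others: 4  
Wu Mei, Gu Saisai. Journal of Tongren University, 2010, 12(1): 139-141, 144
The coefficient of variation is an important measure of a population's dispersion. The delta method is used to derive the asymptotic distribution of the sample coefficient of variation, from which a confidence interval and a test statistic are constructed. Monte Carlo simulation gives the simulated coverage probability of the interval and the simulated power of the test. Finally, a set of real exam scores is analyzed.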
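A sketch of the delta-method construction, assuming a normal population (the abstract does not state the population model). In that case x̄ and s are asymptotically independent with Var(x̄) ≈ σ²/n and Var(s) ≈ σ²/(2n), so the delta method applied to ĉ = s/x̄ gives Var(ĉ) ≈ c²(1/2 + c²)/n. Function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def cv_delta_ci(data, alpha=0.05):
    """Wald-type CI for the coefficient of variation using the delta-method SE."""
    n = len(data)
    c_hat = stdev(data) / mean(data)
    se = abs(c_hat) * math.sqrt(0.5 + c_hat**2) / math.sqrt(n)  # normal population assumed
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return c_hat, (c_hat - z * se, c_hat + z * se)

def cv_z_test(data, c0):
    """Asymptotic z test of H0: c = c0 (c0 > 0), with the SE evaluated under H0."""
    n = len(data)
    c_hat = stdev(data) / mean(data)
    se0 = c0 * math.sqrt((0.5 + c0**2) / n)
    z = (c_hat - c0) / se0
    return z, 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
```

Repeating `cv_delta_ci` on many simulated normal samples and counting how often the interval covers the true c is one way to reproduce the kind of coverage check described in the abstract.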

17.
This article presents a method for estimating the accuracy and consistency of classifications based on test scores. The scores can be produced by any scoring method, including a weighted composite. The estimates use data from a single form. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a 4-parameter beta model. The conditional distribution of scores on an alternate form, given the true score, is estimated from a binomial distribution based on the estimated effective test length. Agreement between classifications on alternate forms is estimated by assuming conditional independence, given the true score. Evaluation of the method showed estimates to be within 1 percentage point of the actual values in most cases. Estimates of decision accuracy and decision consistency statistics were only slightly affected by changes in specified minimum and maximum possible scores.
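The binomial and conditional-independence steps can be sketched as follows. Simplifying assumptions: a single pass/fail cut on the effective-length scale, a discrete true-score distribution supplied directly instead of a fitted 4-parameter beta, and an effective length already rounded to an integer. The `effective_length` formula is obtained by equating binomial error variance to (1 − reliability) × observed variance; whether it matches the article's exact expression is an assumption. All names are hypothetical.

```python
from math import comb

def effective_length(mean, var, reliability, min_score, max_score):
    """Effective number of binary items implied by a binomial error model (assumed form)."""
    return ((mean - min_score) * (max_score - mean) - reliability * var) / (var * (1 - reliability))

def p_pass(tau, n, cut):
    """P(score >= cut) on a form of n binary items, given true proportion-correct tau."""
    return sum(comb(n, x) * tau**x * (1 - tau) ** (n - x) for x in range(cut, n + 1))

def decision_stats(true_scores, weights, n, cut):
    """Decision accuracy and consistency, averaging over a discrete true-score distribution."""
    total = sum(weights)
    accuracy = consistency = 0.0
    for tau, w in zip(true_scores, weights):
        p = p_pass(tau, n, cut)
        # Conditional independence: two forms agree if both pass or both fail.
        consistency += w * (p * p + (1 - p) * (1 - p))
        truly_passes = tau >= cut / n
        accuracy += w * (p if truly_passes else 1 - p)
    return accuracy / total, consistency / total
```

True scores far from the cut contribute consistency near 1, while a true score exactly at the cut contributes roughly 0.5, which is why decision consistency depends heavily on how much true-score mass lies near the cut score.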

18.
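Final examination scores at open universities are relatively likely to contain outliers, which complicates their statistical analysis. Robust statistical methods handle such outliers more soundly. Experiments show that when exam scores include extremely high scores, extremely low scores, or absences, robust statistics are less affected and therefore reflect the overall level of student learning more accurately and objectively.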
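The abstract does not name the robust statistics used. This is a minimal sketch comparing the mean with three common robust choices: the median, the 20% trimmed mean, and MAD-based robust z-scores. The scores are hypothetical, absences are assumed to be recorded as 0, and the |z| > 3 flagging threshold is a conventional choice rather than one taken from the article.

```python
from statistics import mean, median

def trimmed_mean(xs, prop=0.2):
    """Mean after dropping the lowest and highest prop fraction of scores."""
    x = sorted(xs)
    k = int(len(x) * prop)
    return mean(x[k:len(x) - k])

def mad(xs):
    """Median absolute deviation from the median."""
    m = median(xs)
    return median([abs(v - m) for v in xs])

def robust_z(xs):
    """Robust z-scores; 1.4826 makes the MAD consistent for sigma under normality."""
    m, s = median(xs), 1.4826 * mad(xs)
    return [(v - m) / s for v in xs]
```

For the scores [72, 75, 78, 80, 81, 83, 85, 88, 0, 0], two absences drag the mean down to 64.2, while the median (79) and the 20% trimmed mean (about 78.2) stay with the bulk of the class, and only the two zeros have |robust z| > 3.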

19.
A new method for detecting multiple outliers in multivariate data is proposed. It combines minimum subsets, resampling, and the self-organizing map (SOM) algorithm introduced by Kohonen, providing a robust neural-network approach. In this method, the number and arrangement of the neurons are chosen according to the characteristics of the spectra; for example, spectral data often change linearly with the concentrations of the components and are often measured repeatedly, so the spatial distribution of the neurons can be arranged accordingly. The method detects all the outliers in the spectra, which the traditional method cannot, and it computes faster than the traditional neural-network method. Simulation and experimental results show that the method is simple, effective, and intuitive, and that it detects all the outliers in the spectra in a short time. It is useful in combination with regression models in near-infrared research.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417