Similar Articles
20 similar articles found
1.
Practitioners typically face situations in which examinees have not responded to all test items. This study investigated the effect on an examinee's ability estimate when the examinee is presented an item, has ample time to answer, but decides not to respond to it. Three approaches to ability estimation (biweight estimation, expected a posteriori, and maximum likelihood estimation) were examined. A Monte Carlo study was performed and the effect of different levels of omissions on the simulee's ability estimates was determined. Results showed that the worst estimation occurred when omits were treated as incorrect. In contrast, substitution of 0.5 for omitted responses resulted in ability estimates that were almost as accurate as those using complete data. Implications for practitioners are discussed.
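A minimal sketch of the idea behind the 0.5 substitution, using a toy 2PL model with made-up item parameters and a grid-search maximum likelihood estimator (not the biweight or EAP estimators the study also examines): scoring an omit as 0.5 lets it contribute half credit to the likelihood instead of counting fully against the examinee.

```python
import numpy as np

def two_pl_prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ml_ability(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Grid-search ML ability estimate; responses may contain
    fractional scores (e.g., 0.5 substituted for an omit)."""
    best_theta, best_ll = 0.0, -np.inf
    for theta in grid:
        p = two_pl_prob(theta, a, b)
        ll = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

a = np.ones(5)                                   # made-up discriminations
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])        # made-up difficulties
answered = np.array([1.0, 1.0, 1.0, 0.0])        # first four items answered

theta_as_wrong = ml_ability(np.append(answered, 0.0), a, b)  # omit scored 0
theta_as_half = ml_ability(np.append(answered, 0.5), a, b)   # omit scored 0.5
print(theta_as_wrong, theta_as_half)
```

Treating the omit as incorrect pulls the estimate down; the 0.5 substitution leaves it closer to what the answered items alone support.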

2.
This article discusses replication sampling variance estimation techniques that are often applied in analyses using data from complex sampling designs: jackknife repeated replication, balanced repeated replication, and bootstrapping. These techniques are used with traditional analyses such as regression, but are currently not used with structural equation modeling (SEM) analyses. This article provides an extension of these methods to SEM analyses, including a proposed adjustment to the likelihood ratio test, and presents the results from a simulation study suggesting replication estimates are robust. Finally, a demonstration of the application of these methods using data from the Early Childhood Longitudinal Study is included. Secondary analysts can undertake these more robust methods of sampling variance estimation if they have access to certain SEM software packages and data management packages such as SAS, as shown in the article.
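The delete-one jackknife below illustrates the replication idea on a simple random sample; the jackknife repeated replication used with complex designs deletes whole primary sampling units within strata, but the variance formula has the same shape. For the sample mean, the jackknife variance reproduces s²/n exactly, which makes a handy check.

```python
import numpy as np

def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of an arbitrary estimator."""
    n = len(data)
    reps = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

data = np.array([2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 2.5, 3.8])
jk_var = jackknife_variance(data, np.mean)
# For the mean, this equals the classical s^2 / n exactly.
print(jk_var, data.var(ddof=1) / len(data))
```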

3.
Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both the traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.
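One of the data-driven baselines named above, the Mahalanobis distance, can be sketched in a few lines on synthetic data (the flagging rule here, "largest distance", is simplified relative to the chi-square cutoffs used in practice):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 5
X = rng.normal(0, 1, size=(n, k))   # synthetic item-response scores
X[0] = 6.0                          # plant one careless-looking response vector

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
# Squared Mahalanobis distance of every respondent from the centroid.
d2 = np.einsum('ij,jk,ik->i', X - mu, cov_inv, X - mu)

flagged = int(np.argmax(d2))
print(flagged)  # index of the most aberrant respondent
```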

4.
Applied Measurement in Education (《教育实用测度》), 2013, 26(1): 47-64
Optimal appropriateness measurement statistically provides the most powerful methods for identifying individuals who are mismeasured by a standardized psychological test or scale. These methods use a likelihood ratio test to compare the hypothesis of normal responding versus the alternative hypothesis that an individual's responses are aberrant in some specified way. According to the Neyman-Pearson Lemma, no other statistic computed from an individual's item responses can achieve a higher rate of detection of the hypothesized measurement anomaly at the same false positive rate. Use of optimal methods requires a psychometric model for normal responding, which can be readily obtained from the item response theory literature, and a model for aberrant responding. In this article, several concerns about measurement anomalies are described and transformed into quantitative models. We then show how to compute the likelihood of a response pattern u* for each of the aberrance models.
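A toy version of the likelihood ratio statistic, assuming Rasch items with made-up difficulties, a fixed ability under the normal-responding model, and blind guessing as the aberrance model (the article's aberrance models are more elaborate):

```python
import numpy as np

b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # assumed item difficulties
theta = 0.0                                 # ability under the normal model

def loglik_normal(u):
    """Rasch log-likelihood of pattern u under normal responding."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

def loglik_guessing(u):
    """Aberrance model: every response is a coin flip."""
    return len(u) * np.log(0.5)

def lr_statistic(u):
    """Large values favor the aberrance hypothesis."""
    return loglik_guessing(u) - loglik_normal(u)

consistent = np.array([1, 1, 1, 0, 0])  # right on easy items, wrong on hard
aberrant = np.array([0, 0, 0, 1, 1])    # reversed: wrong on easy, right on hard
print(lr_statistic(consistent), lr_statistic(aberrant))
```

The reversed pattern, implausible under normal responding, yields the larger statistic and would be flagged first.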

5.
When both model misspecifications and nonnormal data are present, it is unknown how trustworthy various point estimates, standard errors (SEs), and confidence intervals (CIs) are for standardized structural equation modeling parameters. We conducted simulations to evaluate maximum likelihood (ML), conventional robust SE estimator (MLM), Huber–White robust SE estimator (MLR), and the bootstrap (BS). We found (a) ML point estimates can sometimes be quite biased at finite sample sizes if misfit and nonnormality are serious; (b) ML and MLM generally give egregiously biased SEs and CIs regardless of the degree of misfit and nonnormality; (c) MLR and BS provide trustworthy SEs and CIs given medium misfit and nonnormality, but BS is better; and (d) given severe misfit and nonnormality, MLR tends to break down and BS begins to struggle.

6.
This paper proposes a new robust video stabilization algorithm to remove unwanted vibrations in video sequences. A complete theoretical analysis is first established for video stabilization, providing a basis for the new stabilization algorithm. Secondly, a new robust global motion estimation (GME) algorithm is proposed. Different from classic methods, the GME algorithm is based on spatio-temporally filtered motion vectors computed by block-matching methods. In addition, effective schemes are employed in the correction phase to prevent boundary artifacts and error accumulation. Experiments show that the proposed algorithm has satisfactory stabilization effects while maintaining a good tradeoff between speed and precision.

7.
Robust maximum likelihood (ML) and categorical diagonally weighted least squares (cat-DWLS) estimation have both been proposed for use with categorized and nonnormally distributed data. This study compares results from the 2 methods in terms of parameter estimate and standard error bias, power, and Type I error control, with unadjusted ML and WLS estimation methods included for purposes of comparison. Conditions manipulated include model misspecification, level of asymmetry, level of categorization, sample size, and type and size of the model. Results indicate that the cat-DWLS estimation method results in the least parameter estimate and standard error bias under the majority of conditions studied. Cat-DWLS parameter estimates and standard errors were generally the least affected by model misspecification of the estimation methods studied. Robust ML also performed well, yielding relatively unbiased parameter estimates and standard errors. However, both cat-DWLS and robust ML resulted in low power under conditions of high data asymmetry, small sample sizes, and mild model misspecification. For more optimal conditions, power for these estimators was adequate.

8.
When the multivariate normality assumption is violated in structural equation modeling, a leading remedy involves estimation via normal theory maximum likelihood with robust corrections to standard errors. We propose that this approach might not be best for forming confidence intervals for quantities with sampling distributions that are slow to approach normality, or for functions of model parameters. We implement and study a robust analog to likelihood-based confidence intervals based on inverting the robust chi-square difference test of Satorra (2000). We compare robust standard errors and the robust likelihood-based approach versus resampling methods in confirmatory factor analysis (Studies 1 & 2) and mediation analysis models (Study 3) for both single parameters and functions of model parameters, and under a variety of nonnormal data generation conditions. The percentile bootstrap emerged as the method with the best calibrated coverage rates and should be preferred if resampling is possible, followed by the robust likelihood-based approach.
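The percentile bootstrap that emerged as the preferred method reduces, in its simplest form, to resampling cases and reading off percentiles of the replicated statistic. A sketch for a correlation coefficient on synthetic data (the studies bootstrap SEM parameters, not raw correlations):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)
r_hat = np.corrcoef(x, y)[0, 1]

# Resample cases with replacement and recompute the statistic.
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boots.append(np.corrcoef(x[idx], y[idx])[0, 1])

# Percentile CI: read the 2.5th and 97.5th percentiles directly.
lo, hi = np.percentile(boots, [2.5, 97.5])
print(r_hat, lo, hi)
```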

9.
Measurement specialists routinely assume examinee responses to test items are independent of one another. However, previous research has shown that many contemporary tests contain item dependencies and not accounting for these dependencies leads to misleading estimates of item, test, and ability parameters. The goals of the study were (a) to review methods for detecting local item dependence (LID), (b) to discuss the use of testlets to account for LID in context-dependent item sets, (c) to apply LID detection methods and testlet-based item calibrations to data from a large-scale, high-stakes admissions test, and (d) to evaluate the results with respect to test score reliability and examinee proficiency estimation. Item dependencies were found in the test and these were due to test speededness or context dependence (related to passage structure). Also, the results highlight that steps taken to correct for the presence of LID and obtain less biased reliability estimates may impact the estimation of examinee proficiency. The practical effects of the presence of LID on passage-based tests are discussed, as are issues regarding how to calibrate context-dependent item sets using item response theory.
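One standard LID screen is Yen's Q3, the correlation between item residuals after conditioning on ability. The sketch below cheats by using the true simulated abilities instead of estimates, and plants dependence by duplicating an item's responses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
theta = rng.normal(size=n)                       # simulated abilities
b = np.array([-1.0, 0.0, 1.0])                   # made-up difficulties
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
u = (rng.uniform(size=p.shape) < p).astype(float)
u[:, 1] = u[:, 0]   # item 2 duplicates item 1: extreme local dependence

# Q3: correlations among residuals given (here, known) ability.
resid = u - p
q3 = np.corrcoef(resid, rowvar=False)
print(q3[0, 1], q3[0, 2])  # dependent pair vs. independent pair
```

The dependent pair shows a large positive Q3; the locally independent pair hovers near zero, which is the usual screening contrast.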

10.
For the mean-variance portfolio model, the Fast-MCD multivariate robust estimation method is introduced to robustly estimate the expected stock returns and the covariance matrix in the model, reducing the influence of outliers on portfolio decisions. Taking the characteristics of China's securities market into account, an empirical analysis of the Shanghai A-share market is conducted, and the efficient frontier of the securities portfolio is obtained.
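A crude illustration of the robust-estimation idea: a single concentration step in the spirit of Fast-MCD (not the full algorithm, which iterates over many random subsets), on made-up return data with planted outliers. The robust mean stays near the bulk of the data while the plain sample mean is dragged toward the outliers.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
returns = rng.normal(0.001, 0.02, size=(n, k))   # toy daily stock returns
returns[:10] = 0.5                               # 10 grossly outlying rows

def one_step_mcd(X, h_frac=0.75):
    """One concentration step: keep the h points closest to the
    coordinate-wise median, then re-estimate mean and covariance."""
    h = int(h_frac * len(X))
    med = np.median(X, axis=0)
    d = np.sum((X - med) ** 2, axis=1)   # crude initial distances
    keep = np.argsort(d)[:h]
    sub = X[keep]
    return sub.mean(axis=0), np.cov(sub, rowvar=False)

mu_robust, cov_robust = one_step_mcd(returns)
mu_plain = returns.mean(axis=0)
print(mu_plain, mu_robust)
```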

11.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.g., two-parameter logistic [2PL]) in parameter estimation under simulated conditions, prior research has failed to examine its performance under violations to the model’s assumptions. Therefore, the objective of this simulation study was to examine item and mean ability parameter recovery when violating the assumptions that noneffortful responding occurs randomly (Assumption 1) and is unrelated to the underlying ability of examinees (Assumption 2). Results demonstrated that, across conditions, the EM-IRT model provided robust item parameter estimates to violations of Assumption 1. However, bias values greater than 0.20 SDs were observed for the EM-IRT model when violating Assumption 2; nonetheless, these values were still lower than the 2PL model. In terms of mean ability estimates, model results indicated equal performance between the EM-IRT and 2PL models across conditions. Across both models, mean ability estimates were found to be biased by more than 0.25 SDs when violating Assumption 2. However, our accompanying empirical study suggested that this biasing occurred under extreme conditions that may not be present in some operational settings. Overall, these results suggest that the EM-IRT model provides superior item and equal mean ability parameter estimates in the presence of model violations under realistic conditions when compared with the 2PL model.
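The EM-IRT preprocessing step, flagging rapid responses and treating them as missing rather than wrong, can be sketched with made-up response-time data and an assumed 3-second rapid-guessing threshold:

```python
# Toy flagging step: responses faster than a threshold are treated as
# noneffortful and dropped from scoring instead of being scored wrong.
records = [  # (response_time_sec, correct)
    (1.2, 0), (0.9, 1), (14.0, 1), (22.5, 1), (17.8, 0),
    (0.7, 0), (19.3, 1), (16.1, 1), (1.1, 0), (21.0, 0),
]
THRESHOLD = 3.0  # assumed rapid-guessing boundary

effortful = [c for rt, c in records if rt >= THRESHOLD]
# EM-IRT flavor: rapid responses are missing, so score only the rest.
prop_effortful = sum(effortful) / len(effortful)
# Traditional flavor: rapid responses count as incorrect.
prop_as_wrong = sum(effortful) / len(records)
print(prop_effortful, prop_as_wrong)
```

Scoring rapid guesses as wrong deflates the proficiency estimate relative to treating them as missing, which is the motivation for the effort-moderated model.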

12.
戴杰思, 谢萍. Educational Research (《教育研究》), 2021, 42(2): 140-150
Compared with current studies that rely on a single method, multilevel mixed-methods research, as an important research paradigm, may yield findings that are more rigorous and reliable and therefore exert greater influence on policy and practice. Applying multilevel mixed methods requires not only strengthening researchers' technical skills but also changing their ways of thinking, which may pose a great challenge to most researchers. This is not to say that all research should seek to influence others directly; rather, research serves varied purposes. Nevertheless, researchers across fields agree that the ecology of the social environments on which teachers depend, inside and outside the classroom, is unpredictable. By dissecting a research project carried out in England, this article explores the application of multilevel mixed methods in educational research. It finds that, compared with single-perspective quantitative studies, meta-analyses, or purely qualitative studies, multilevel mixed methods may provide more nuanced, finer-grained, and more rigorous evidence-based research on leadership, thereby revealing more fully how principals achieve and sustain school development through their leadership roles and behaviors.

13.
Missing data are a common problem in a variety of measurement settings, including responses to items on both cognitive and affective assessments. Researchers have shown that such missing data may create problems in the estimation of item difficulty parameters in the Item Response Theory (IRT) context, particularly if they are ignored. At the same time, a number of data imputation methods have been developed outside of the IRT framework and been shown to be effective tools for dealing with missing data. The current study takes several of these methods that have been found to be useful in other contexts and investigates their performance with IRT data that contain missing values. Through a simulation study, it is shown that these methods exhibit varying degrees of effectiveness in terms of imputing data that in turn produce accurate sample estimates of item difficulty and discrimination parameters.

14.
Data collected from questionnaires are often on an ordinal scale. Unweighted least squares (ULS), diagonally weighted least squares (DWLS) and normal-theory maximum likelihood (ML) are commonly used methods to fit structural equation models. Consistency of these estimators demands no structural misspecification. In this article, we conduct a simulation study to compare the equation-by-equation polychoric instrumental variable (PIV) estimation with ULS, DWLS, and ML. Accuracy of PIV for the correctly specified model and robustness of PIV for misspecified models are investigated through a confirmatory factor analysis (CFA) model and a structural equation model with ordinal indicators. The effects of sample size and nonnormality of the underlying continuous variables are also examined. The simulation results show that PIV produces robust factor loading estimates in the CFA model and in structural equation models. PIV also produces robust path coefficient estimates in the model where valid instruments are used. However, robustness highly depends on the validity of instruments.

15.
This study examined and compared various statistical methods for detecting individual differences in change. Considering 3 issues including test forms (specific vs. generalized), estimation procedures (constrained vs. unconstrained), and nonnormality, we evaluated 4 variance tests including the specific Wald variance test, the generalized Wald variance test, the specific likelihood ratio (LR) variance test, and the generalized LR variance test under both constrained and unconstrained estimation for both normal and nonnormal data. For the constrained estimation procedure, both the mixture distribution approach and the alpha correction approach were evaluated for their performance in dealing with the boundary problem. To deal with the nonnormality issue, we used the sandwich standard error (SE) estimator for the Wald tests and the Satorra–Bentler scaling correction for the LR tests. Simulation results revealed that testing a variance parameter and the associated covariances (generalized) had higher power than testing the variance solely (specific), unless the true covariances were zero. In addition, the variance tests under constrained estimation outperformed those under unconstrained estimation in terms of higher empirical power and better control of Type I error rates. Among all the studied tests, for both normal and nonnormal data, the robust generalized LR and Wald variance tests with the constrained estimation procedure were generally more powerful and had better Type I error rates for testing variance components than the other tests. Results from the comparisons between specific and generalized variance tests and between constrained and unconstrained estimation were discussed.

16.
To better suppress non-Gaussian noise in the signal background, this paper proposes a definition of the bispectrum based on fractional lower-order statistics, and gives direct and indirect methods for estimating the bispectrum against a background of fractional lower-order colored noise. Simulation results show that the fractional lower-order bispectrum estimate can effectively suppress non-Gaussian noise and exhibits good robustness.

17.
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as incorrect can lead to inflated Type I error rates (false detection of DIF). The current study builds on this prior research by investigating the utility of multiple imputation methods for missing item responses, in conjunction with standard DIF detection techniques. Results of the study support the use of multiple imputation for dealing with missing item responses. The article concludes with a discussion of these results for multiple imputation in conjunction with other research findings supporting its use in the context of item parameter estimation with missing data.
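For reference, the Mantel-Haenszel common odds ratio that underlies many uniform DIF procedures looks like this on made-up score-stratified 2x2 tables (the study's focus, multiply imputing the missing responses before this step, is omitted here):

```python
# Each stratum (examinees matched on total score) is a 2x2 table:
# (a, b, c, d) = (ref correct, ref wrong, focal correct, focal wrong).
strata = [
    (40, 10, 20, 5),   # odds of success 4.0 in both groups -> no DIF
    (30, 30, 15, 15),  # odds 1.0 in both groups
    (25, 5, 50, 10),   # odds 5.0 in both groups
]

# Mantel-Haenszel common odds ratio across strata.
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
mh_odds_ratio = num / den
print(mh_odds_ratio)  # 1.0 here, i.e., no uniform DIF
```

Because every stratum gives the reference and focal groups identical odds of success, the common odds ratio comes out at exactly 1; values far from 1 signal uniform DIF.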

18.
Single‐best answers to multiple‐choice items are commonly dichotomized into correct and incorrect responses, and modeled using either a dichotomous item response theory (IRT) model or a polytomous one if differences among all response options are to be retained. The current study presents an alternative IRT‐based modeling approach to multiple‐choice items administered with the procedure of elimination testing, which asks test‐takers to eliminate all the response options they consider to be incorrect. The partial credit model is derived for the obtained responses. By extracting more information pertaining to test‐takers’ partial knowledge on the items, the proposed approach has the advantage of providing more accurate estimation of the latent ability. In addition, it may shed some light on the possible answering processes of test‐takers on the items. As an illustration, the proposed approach is applied to a classroom examination of an undergraduate course in engineering science.
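One common scoring rule for elimination testing, to which a partial credit model can then be fit, awards a point per correctly eliminated distractor and no credit if the key is eliminated (penalty schemes vary across implementations; this is an illustrative choice):

```python
def elimination_score(eliminated, key):
    """Partial-credit elimination score: one point per distractor
    correctly eliminated; eliminating the key forfeits all credit.
    (One common rule; actual penalty schemes vary.)"""
    if key in eliminated:
        return 0
    return len(eliminated)

# Item with options A-D, key 'C':
print(elimination_score({'A', 'B', 'D'}, 'C'))  # full knowledge
print(elimination_score({'A'}, 'C'))            # partial knowledge
print(elimination_score({'C', 'D'}, 'C'))       # eliminated the key
```

On a 4-option item this yields scores 0 through 3, exactly the graded categories the partial credit model expects.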

19.
This article applies Bollen’s (1996) 2-stage least squares/instrumental variables (2SLS/IV) approach for estimating the parameters in an unconditional and a conditional second-order latent growth model (LGM). First, the 2SLS/IV approach for the estimation of the means and the path coefficients in a second-order LGM is derived. An empirical example is then used to show that 2SLS/IV yields estimates that are similar to maximum likelihood (ML) in the estimation of a conditional second-order LGM. Three subsequent simulation studies are then presented to show that the new approach is as accurate as ML and that it is more robust against misspecifications of the growth trajectory than ML. Together, these results suggest that 2SLS/IV should be considered as an alternative to the commonly applied ML estimator.
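A bare-bones numeric illustration of why 2SLS/IV helps under endogeneity, with a single instrument and no intercepts (all variables are zero-mean by construction; the latent growth application in the article is far more structured):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # confounding error
x = 0.8 * z + u + rng.normal(size=n)         # x shares u with y: endogenous
y = 2.0 * x + u + rng.normal(size=n)         # true coefficient is 2.0

# OLS through the origin: biased because x is correlated with u.
b_ols = np.sum(x * y) / np.sum(x * x)

# 2SLS: stage 1 projects x on z; stage 2 regresses y on the projection.
x_hat = z * (np.sum(z * x) / np.sum(z * z))
b_2sls = np.sum(x_hat * y) / np.sum(x_hat * x_hat)
print(b_ols, b_2sls)
```

OLS is pulled away from 2.0 by the shared error, while the two-stage estimate (algebraically the classic IV ratio) recovers it.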

20.
The purpose of this study is to investigate the effects of missing data techniques in longitudinal studies under diverse conditions. A Monte Carlo simulation examined the performance of 3 missing data methods in latent growth modeling: listwise deletion (LD), maximum likelihood estimation using the expectation and maximization algorithm with a nonnormality correction (robust ML), and the pairwise asymptotically distribution-free method (pairwise ADF). The effects of 3 independent variables (sample size, missing data mechanism, and distribution shape) were investigated on convergence rate, parameter and standard error estimation, and model fit. The results favored robust ML over LD and pairwise ADF in almost all respects. The exceptions included convergence rates under the most severe nonnormality in the missing not at random (MNAR) condition and recovery of standard error estimates across sample sizes. The results also indicate that nonnormality, small sample size, MNAR, and multicollinearity might adversely affect convergence rate and the validity of statistical inferences concerning parameter estimates and model fit statistics.
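The contrast between listwise deletion and an ML-flavored, model-based recovery under MAR missingness can be seen in a toy mean-estimation problem (regression-based recovery stands in for the EM-based robust ML of the study):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 0.7 * x + rng.normal(scale=0.5, size=n)  # true mean of y is 1.0
observed = x < 0.5      # y is missing when x is large: MAR given x

# Listwise deletion: biased, because missingness depends on x.
mu_listwise = y[observed].mean()

# Model-based recovery: fit y on x among observed cases, then
# average the predictions over ALL cases (ML-under-MAR flavor).
X = np.column_stack([np.ones(observed.sum()), x[observed]])
beta, *_ = np.linalg.lstsq(X, y[observed], rcond=None)
mu_model = (beta[0] + beta[1] * x).mean()
print(mu_listwise, mu_model)
```

Listwise deletion drops exactly the high-x (hence high-y) cases and underestimates the mean; the model-based estimate uses the observed x of every case and lands near 1.0.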


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号