Similar Literature
20 similar documents found (search time: 31 ms)
1.
In this study I compared results of chained linear, Tucker, and Levine observed-score equatings under conditions where the new- and old-form samples were similar in ability and also when they were different in ability. The length of the anchor test was also varied to examine its effect on the three equating methods. The three methods were compared against a criterion equating to obtain estimates of random equating error, bias, and root mean squared error (RMSE). Results showed that, for most studied conditions, chained linear equating produced fairly good results in terms of low bias and RMSE. Levine equating also produced low bias and RMSE in some conditions. Although the Tucker method always produced the lowest random equating error, it produced larger bias and RMSE than either of the other methods. Consistent with the literature, these results suggest that either chained linear or Levine equating be used when the new- and old-form samples differ in ability and/or when the anchor-to-total correlation is not very high. Finally, by testing the missing data assumptions of the three equating methods, this study also shows empirically why an equating method is more or less accurate under certain conditions.
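The chained linear step described above can be sketched in a few lines: equate the new form X to the anchor V in the new-form sample, then the anchor to the old form Y in the old-form sample. All means and standard deviations below are illustrative, not from the study.

```python
def linear_equate(x, mean_from, sd_from, mean_to, sd_to):
    """Linear equating: map a score onto another scale by matching
    means and standard deviations."""
    return mean_to + (sd_to / sd_from) * (x - mean_from)

def chained_linear(x, new_stats, old_stats):
    """Chained linear equating of new-form X onto old-form Y through
    anchor V: X -> V in the new-form sample, then V -> Y in the
    old-form sample."""
    mx, sx, mv_new, sv_new = new_stats   # X and V stats, new-form sample
    my, sy, mv_old, sv_old = old_stats   # Y and V stats, old-form sample
    v = linear_equate(x, mx, sx, mv_new, sv_new)
    return linear_equate(v, mv_old, sv_old, my, sy)

# illustrative summary statistics (not from the study)
y_equiv = chained_linear(30.0, (28.0, 6.0, 14.0, 3.0), (32.0, 7.0, 15.0, 3.5))
```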

2.
Three local observed‐score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias (as defined by Lord's criterion of equity) and percent relative error. The local kernel item response theory observed‐score equating method, which can be used for any of the common equating designs, had a small amount of bias, a low percent relative error, and a relatively low kernel standard error of equating, even when the accuracy of the test was reduced. The local kernel equating methods for the nonequivalent groups with anchor test design generally had low bias and were quite stable against changes in the accuracy or length of the anchor test. Although all proposed methods showed small percent relative errors, the local kernel equating methods for the nonequivalent groups with anchor test design had a somewhat larger standard error of equating than their kernel method counterparts.
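The kernel idea can be sketched as follows: continuize each discrete score distribution with a Gaussian kernel, then equate through the smoothed CDFs. This is a simplified illustration (real kernel equating also rescales the kernel to preserve the discrete mean and variance, and estimates score probabilities from a log-linear presmoothing model); the score distributions here are toy data.

```python
import numpy as np
from math import erf

def kernel_cdf(x, scores, probs, h=0.6):
    """Gaussian-kernel-smoothed CDF of a discrete score distribution
    (simplified continuization with bandwidth h)."""
    z = (np.atleast_1d(np.asarray(x, dtype=float))[:, None] - scores[None, :]) / h
    phi = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2)))
    return (phi * probs).sum(axis=1)

def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h=0.6):
    """Observed-score equipercentile equating through smoothed CDFs:
    e(x) = F_Y^{-1}(F_X(x)), inverted numerically on a fine grid."""
    grid = np.linspace(scores_y.min() - 3 * h, scores_y.max() + 3 * h, 2001)
    Fy = kernel_cdf(grid, scores_y, probs_y, h)
    Fx = kernel_cdf(x, scores_x, probs_x, h)
    return np.interp(Fx, Fy, grid)

# toy 0..10 score distributions; form Y is exactly one point "easier"
sx = np.arange(11.0)
px = np.full(11, 1 / 11)
sy = sx + 1.0
ex = kernel_equate([5.0], sx, px, sy, px)  # should map 5 to about 6
```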

3.
Clustering-based selective neural network ensemble
INTRODUCTION Neural network ensembles have recently become a hot topic in machine learning and data mining. Many researchers have shown that simply combining the outputs of many neural networks can generate more accurate predictions than any of the individual networks. Most previous work focused either on how to combine the outputs of multiple trained networks or on how to directly design a good set of neural networks. Theoretical and empirical work showed that a good ensemble is one wh…
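As a rough illustration of the selective-ensemble idea (not the authors' algorithm: the disagreement-based grouping rule and threshold below are hypothetical stand-ins for the clustering step), one can group members whose predictions nearly coincide and keep only one representative per group:

```python
import numpy as np

def select_members(preds, y, threshold=0.1):
    """Group members whose pairwise disagreement rate is below
    `threshold`, then keep the most accurate member of each group."""
    n = len(preds)
    assigned = [None] * n
    clusters = []
    for i in range(n):
        if assigned[i] is not None:
            continue
        assigned[i] = len(clusters)
        cluster = [i]
        for j in range(i + 1, n):
            if assigned[j] is None and np.mean(preds[i] != preds[j]) < threshold:
                assigned[j] = len(clusters)
                cluster.append(j)
        clusters.append(cluster)
    # keep the lowest-error member of each cluster
    return [min(c, key=lambda k: np.mean(preds[k] != y)) for c in clusters]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                 # ground-truth binary labels
base = y.copy(); base[:20] = 1 - base[:20]  # a member wrong on 10% of cases
members = [base, base.copy(), base.copy(), y.copy()]
members[3][180:] = 1 - members[3][180:]     # a diverse member, also 10% wrong
keep = select_members(members, y)           # redundant copies are dropped
```

Here the three identical members collapse into one cluster, so the selected ensemble keeps only one of them plus the diverse member.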

4.
This paper presents a new algorithm for clustering a large amount of data. We improved the ant colony clustering algorithm, which uses an ant's swarm intelligence, and tried to overcome the weaknesses of classical cluster analysis methods. In our proposed algorithm, the efficiency of agent operations is improved, and a new function, "cluster condensation," is added. Cluster condensation reduces a cluster's size by uniting similar objects and incorporating them into the condensed cluster. Compared with classical cluster analysis methods, the number of steps required to complete the clustering can be reduced to 1% or less by this procedure, and the dispersion of the results can also be reduced. Moreover, thanks to cluster condensation, our algorithm can operate even in a small field. In addition, because condensation frees space in the field, new objects can be added to the emptied space. In other words, the majority of the data is first put on standby and then clustered by gradually adding parts of the standby data to the data being clustered. The method can therefore handle a large amount of data. Numerical experiments confirmed that our proposed algorithm can, in principle, be applied to an unrestricted volume of data.

5.
This simulation study demonstrates how the choice of estimation method affects indexes of fit and parameter bias for different sample sizes when nested models vary in terms of specification error and the data demonstrate different levels of kurtosis. Using a fully crossed design, data were generated for 11 conditions of peakedness, 3 conditions of misspecification, and 5 different sample sizes. Three estimation methods (maximum likelihood [ML], generalized least squares [GLS], and weighted least squares [WLS]) were compared in terms of overall fit and the discrepancy between estimated parameter values and the true parameter values used to generate the data. Consistent with earlier findings, the results show that ML compared to GLS under conditions of misspecification provides more realistic indexes of overall fit and less biased parameter values for paths that overlap with the true model. However, despite recommendations found in the literature that WLS should be used when data are not normally distributed, we find that WLS under no conditions was preferable to the 2 other estimation procedures in terms of parameter bias and fit. In fact, only for large sample sizes (N = 1,000 and 2,000) and mildly misspecified models did WLS provide estimates and fit indexes close to the ones obtained for ML and GLS. For wrongly specified models WLS tended to give unreliable estimates and over-optimistic values of fit.

6.
Using Monte Carlo simulations, this research examined the performance of four missing data methods in SEM under different multivariate distributional conditions. The effects of four independent variables (sample size, missing proportion, distribution shape, and factor loading magnitude) were investigated on six outcome variables: convergence rate, parameter estimate bias, MSE of parameter estimates, standard error coverage, model rejection rate, and model goodness of fit (RMSEA). A three-factor CFA model was used. Findings indicated that full information maximum likelihood (FIML) outperformed the other methods under MCAR (missing completely at random), and multiple imputation (MI) should be used to increase the plausibility of MAR (missing at random). Similar response pattern imputation (SRPI) was not comparable to the other three methods under either MCAR or MAR.

7.
The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet‐based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four‐level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three related models: the Rasch model, the Rasch testlet model, and the three‐level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The bias in both item and person parameter estimation was not affected, but the standard error (SE) was. In some simulation conditions, the difference in classification accuracy between models was as large as 11%. An illustration with real data generally supported the model performance observed in the simulation study.

8.
In surveys of instructional effectiveness that use Likert rating scales, bias is a potential threat to the validity of interpretations. Neither simple summation of ratings nor the use of larger samples removes bias. In this study, a new model for scaling ratings is examined; the method both identifies and corrects for bias. Working with a database of student ratings of college instruction, the model was tested against a variety of criteria. Results indicated that bias was detected and that it was large enough to warrant concern. The statistical corrections were significant in terms of both the order and the magnitude of class means. Implications for future studies include the specification of more potential sources of bias, the interaction of some of these factors, and the development of more systematic evidence supporting the need to be attentive to bias. The many-faceted Rasch model used in this study needs more evaluation before we are convinced of its utility for studying and correcting for bias, but preliminary evidence is encouraging. Recommendations were offered for a theoretical rationale for studying bias in student ratings of instructional effectiveness and for a program of research leading to the use of this model for reporting results to improve instruction and to inform promotion, tenure, and merit decisions. An earlier version of this paper was presented at the annual meeting of the American Educational Research Association, Atlanta, April 1993.

9.
Propensity score matching (PSM) has become a popular approach for research studies when randomization is infeasible. However, the existing PSM methods differ significantly in how effectively they reduce selection bias, so it is challenging for researchers to select an appropriate matching method. This study compares four commonly used PSM methods for reducing selection bias in observational data from which treatment effects are to be assessed. The selection bias, standardized bias, and percent bias reduction are evaluated for each of the PSM methods using empirical data drawn from the national Education Longitudinal Study of 2002. The results provide empirical evidence and helpful information for researchers selecting effective PSM methods for their research studies.
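A schematic of one matching step: given propensity scores, greedily match each treated unit to its nearest control and compare the standardized bias of a covariate before and after matching. The data and the logistic propensity function below are simulated stand-ins, not ELS:2002 or the paper's specific four methods.

```python
import numpy as np

def nearest_neighbor_match(ps_treat, ps_control):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement. Returns indices into the control pool."""
    available = list(range(len(ps_control)))
    matches = []
    for p in ps_treat:
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        matches.append(j)
        available.remove(j)
    return matches

def standardized_bias(x_treat, x_control):
    """Standardized bias (%) of a covariate between the two groups."""
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return 100 * (x_treat.mean() - x_control.mean()) / pooled_sd

rng = np.random.default_rng(1)
x_c = rng.normal(0.0, 1.0, 500)   # covariate, control pool
x_t = rng.normal(0.8, 1.0, 100)   # covariate, treated group (shifted)
# stand-in propensity score: a monotone (logistic) function of x
ps_c = 1 / (1 + np.exp(-x_c))
ps_t = 1 / (1 + np.exp(-x_t))
idx = nearest_neighbor_match(ps_t, ps_c)
before = standardized_bias(x_t, x_c)
after = standardized_bias(x_t, x_c[idx])
pbr = 100 * (1 - abs(after) / abs(before))   # percent bias reduction
```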

10.
Summary This study tested the hypothesis that different techniques of classroom observation result in different degrees of learning by teachers-in-training. Specifically, it was predicted that kinescope recordings (prepared in advance) provide a more effective medium of observation than closed-circuit television and that TV observation is in turn more effective than the traditional procedure of direct observation in the classroom. The logical theoretical basis for this hypothesis and the special conditions of experimentation used in this study were elaborated. Measures of two dependent variables were used to test this hypothesis. One measure of the students’ response to these observational techniques, an objective multiple-choice measure of information about methods of teaching, failed to confirm the hypothesis, but did show systematic variation with several other experimental variables. The other measure, an essay examination assessing ability to evaluate an observed classroom lesson critically, revealed strong confirmation of the hypothesis. Several other results emerged. One significant finding indicated that when used by certain instructors, the differential effect of the observational condition can outweigh the very great importance of general scholastic ability as a correlate of gain in learning. Interpretations of these data were made to clarify the role of classroom observation in the teacher training process. This research was supported by a grant from the Educational Media Branch of the U.S. Office of Education.

11.
Bias in teachers’ judgment formation and decision making has long been acknowledged. More specifically, studies have repeatedly demonstrated discrepancies between teacher ratings of minority and majority students with similar academic profiles. Studies have also demonstrated that increasing accountability reduces bias. Little is known, however, about the effect of accountability and bias on the accuracy of decisions. This study investigated the short- and long-term effects of accountability priming on the accuracy of transition decisions. It considered both the extent to which teacher decision accuracy differed for minority and majority students with similar academic profiles (accuracy bias) and differences in levels of confidence for accurate versus erroneous decisions (metacognitive judgment bias). In a longitudinal experimental design, we presented 38 primary school teachers with 9 student vignettes at 3 points in time (baseline, post priming, and 6-month follow-up), varying students’ ethnic background, and asked them to make a school tracking decision for each student. We measured decision accuracy as well as teachers’ level of confidence in each decision. Accuracy and confidence levels were combined to provide two indices of metacognitive judgment accuracy. Results confirmed the hypothesis that accuracy of decisions would improve with an increased level of accountability. More specifically, teachers made more accurate decisions after priming, whereby ethnic background differences disappeared. In addition, teachers’ metacognitions varied: after priming, decision accuracy was better matched with teachers’ confidence levels. Although accuracy levels were still higher at follow-up than at pre-test, the ethnic bias recurred. This study shows that increased levels of accountability are associated not only with increased decision accuracy but also with reduced metacognitive judgment bias, especially in regard to minority students. It also demonstrates that accountability may be an effective way of reducing systematic errors in decision making. Findings are discussed in terms of theory and current changes in educational practice.

12.
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically, we compared the synthetic, identity, and chained linear functions for various‐sized samples from two types of national assessments. One design used a highly reliable test and an external anchor, and the other used a relatively low‐reliability test and an internal anchor. The results from each of these methods were compared to the criterion equating function derived from the total samples with respect to linking bias and error. The study indicated that the synthetic functions might be a better choice than the chained linear equating method when samples are not large and, as a result, unrepresentative.
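The synthetic function itself is just a weighted average of an equating function and the identity. A sketch with an invented linear equating function (the coefficients and the weight are illustrative, not estimates from the study):

```python
def linear_fn(x, a=2.0, b=1.05):
    """A stand-in chained linear equating function y = a + b*x
    (coefficients invented for illustration)."""
    return a + b * x

def synthetic(x, w):
    """Synthetic function: weight w on the equating function and
    1 - w on the identity; small samples argue for w closer to 0,
    pulling the link toward y = x."""
    return w * linear_fn(x) + (1.0 - w) * x

y_id = synthetic(20.0, 0.0)    # pure identity: 20.0
y_eq = synthetic(20.0, 1.0)    # full equating: 2 + 1.05 * 20 = 23.0
y_half = synthetic(20.0, 0.5)  # halfway between the two: 21.5
```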

13.
In this article, 3-step methods to include predictors and distal outcomes in commonly used mixture models are evaluated. Two Monte Carlo simulation studies were conducted to compare the pseudo class (PC), Vermunt’s (2010), and the Lanza, Tan, and Bray (LTB) 3-step approaches with respect to bias of parameter estimates in latent class analysis (LCA) and latent profile analysis (LPA) models with auxiliary variables. For coefficients of predictors of class membership, results indicated that Vermunt’s method yielded more accurate estimates for LCA and LPA compared to the PC method. With distal outcomes of latent classes and latent profiles, the LTB method produced the lowest relative bias of coefficient estimates and Type I error rates close to nominal levels.

14.
Educational Assessment, 2013, 18(2): 119-129
Although some educators have suggested authentic tests as a solution to the problem of artificially inflated scores from teaching to paper-and-pencil tests, we argue that teaching to the test under high-stakes conditions could be more problematic with the new forms of assessment. The wide range of methods that can potentially be used in authentic assessments introduces method variance that is not part of the construct to be measured. As a consequence, teaching the specific methods used in the assessment potentially invalidates the uses and interpretations that can be made from the test scores by narrowing the definition of the construct measured.

15.
The complex relationships between indicators and water conditions cause fuzzy and gray uncertainties in the evaluation of water quality. Compared to conventional single-factor evaluation methods, a combination evaluation method can consider these two uncertainties to produce more objective and reasonable evaluation results. In this paper, we propose a combination evaluation method with two main parts: (1) the use of fuzzy comprehensive evaluation and gray correlation analysis as submodels with which to consider the fuzzy and gray uncertainties, and (2) the establishment of a combination model based on minimum bias squares. Using this method, we evaluate the water quality of a ditch in a typical rice–wheat system of Yixing city in the Taihu Lake Basin during three rainfall events. The results show that the ditch water quality is not good, and we found chemical oxygen demand to be the indicator that affects water quality most significantly. The proposed combination evaluation method is more accurate and practical than single-factor evaluation methods in that it considers the uncertainties of fuzziness and grayness.

16.
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are present. This bias may be somewhat reduced when cross-national DIF is correlated over study cycles, which is the case in PISA. This article reviews existing methods for calculating standard errors for national trends in international large-scale assessments and proposes a new method that takes into account the dependency of linking errors at different time points. We conducted a simulation study to compare the performance of the standard error estimators. The results showed that the newly suggested estimator outperformed the existing estimators, estimating standard errors more accurately and efficiently across all simulated conditions. Implications for practical applications are discussed.
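One common form of the trend standard error adds linking-error variance to the two cycles' sampling errors; allowing the cycle linking errors to be correlated, as the abstract suggests for PISA, shrinks the total. This is a hedged sketch of the idea only; the exact estimator proposed in the article may differ, and all numbers are invented.

```python
import math

def trend_se(se1, se2, le1, le2, rho=0.0):
    """SE of a trend estimate (cycle-2 mean minus cycle-1 mean).
    se1, se2: sampling SEs of the two cycle means; le1, le2: linking
    errors; rho: correlation of the linking errors across cycles.
    rho = 0 reproduces the usual independent-linking-error formula;
    rho > 0 reduces the total standard error."""
    link_var = le1 ** 2 + le2 ** 2 - 2 * rho * le1 * le2
    return math.sqrt(se1 ** 2 + se2 ** 2 + link_var)

se_indep = trend_se(2.0, 2.5, 1.2, 1.2)           # independent links
se_corr = trend_se(2.0, 2.5, 1.2, 1.2, rho=0.8)   # correlated links
```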

17.
1 Introduction
In recent years, multisensor data fusion techniques have found widespread applications in many tracking and surveillance systems as well as in applications where reliability is of main concern [1]. Multisensor data fusion is defined as a process of integrating data from multi…

18.
An anomalous traffic detection model based on fuzzy C-means
Detecting traffic anomalies first, and analyzing individual packets only after an anomaly appears, reduces system overhead. Clustering is an effective method for anomaly intrusion detection and can be applied to network traffic anomaly detection to judge whether the current traffic is anomalous. This paper applies the fuzzy C-means algorithm to a traffic anomaly detection model; experiments show that the model can effectively detect anomalous traffic states.
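A minimal fuzzy C-means sketch applied to toy traffic features. The feature set, cluster count, initialization, and the rule "smaller cluster = anomalous" are all illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100):
    """Minimal fuzzy C-means: returns the membership matrix U (n x c)
    and the cluster centers. Centers are initialized spread between
    the data's min and max corners for determinism."""
    centers = np.linspace(X.min(axis=0), X.max(axis=0), c)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))     # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers

# toy traffic features (e.g. packets/s, bytes/s, rescaled); 5 anomalous flows
rng = np.random.default_rng(42)
normal = rng.normal([1.0, 1.0], 0.1, size=(95, 2))
attack = rng.normal([5.0, 6.0], 0.2, size=(5, 2))
X = np.vstack([normal, attack])
U, centers = fuzzy_c_means(X)
labels = U.argmax(axis=1)
# flag the smaller cluster as anomalous traffic
anomalous = labels == np.bincount(labels, minlength=2).argmin()
```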

19.
The objective was to examine the impact of different types of accommodations on performance in content tests such as mathematics. The meta‐analysis included 14 U.S. studies that randomly assigned school‐aged English language learners (ELLs) to test accommodation versus control conditions or used repeated measures in counter‐balanced order. Individual effect sizes (Glass's d) were calculated for 50 groups of ELLs and 32 groups of non‐ELLs. Individual effect sizes for English language and native language accommodations were classified into groups according to type of accommodation and timing conditions. Means and standard errors were calculated for each category. The findings suggest that accommodations that require extra printed materials need generous time limits for both the accommodated and unaccommodated groups to ensure that they are effective, equivalent in scale to the original test, and therefore more valid owing to reduced construct‐irrelevant variance. Computer‐administered glossaries were effective even when time limits were restricted. Although the Plain English accommodation had very small average effect sizes, inspection of individual effect sizes suggests that it may be much more effective for ELLs at intermediate levels of English language proficiency. For Spanish‐speaking students with low proficiency in English, the Spanish test version had the highest individual effect size (+1.45).
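Glass's d, the effect size used above, scales the accommodated-minus-control mean difference by the control group's standard deviation (rather than a pooled SD). The numbers below are invented for illustration, not taken from the meta-analysis.

```python
def glass_delta(mean_acc, mean_ctrl, sd_ctrl):
    """Glass's d: mean difference divided by the control group's SD,
    so the scale is unaffected by any treatment-induced variance change."""
    return (mean_acc - mean_ctrl) / sd_ctrl

# illustrative group summaries: accommodated mean 34, control mean 30, SD 8
d = glass_delta(mean_acc=34.0, mean_ctrl=30.0, sd_ctrl=8.0)  # 0.5
```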

20.
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号