Similar literature
20 similar documents found (search time: 31 ms)
1.
Generalizability theory (G theory) employs random-effects ANOVA to estimate the variance components included in generalizability coefficients, standard errors, and other indices of precision. The ANOVA models depend on random sampling assumptions, and the variance-component estimates are likely to be sensitive to violations of these assumptions. Yet, generalizability studies do not typically sample randomly. This kind of inconsistency between assumptions in statistical models and actual data collection procedures is not uncommon in science, but it does raise fundamental questions about the substantive inferences based on the statistical analyses. This article reviews criticisms of sampling assumptions in G theory (and in reliability theory) and examines the feasibility of using representative sampling, stratification, homogeneity assumptions, and replications to address these criticisms.

2.
This article examines the sampling error of the lead of one political party over another as observed in a random sample of voters. The sample size needed to achieve a certain precision is also investigated.
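
As an illustration only (the abstract does not give the article's own derivation), one standard way to quantify the sampling error of a lead assumes simple random sampling from a large electorate, so that vote counts are multinomial. The variance of the estimated lead is then (p1 + p2 - (p1 - p2)^2) / n. The function names and the 95% z-value below are assumptions made for this sketch.

```python
import math


def lead_standard_error(p1, p2, n):
    """Standard error of the estimated lead (p1_hat - p2_hat).

    Assumes multinomial sampling: Var = (p1 + p2 - (p1 - p2)**2) / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return math.sqrt(variance)


def sample_size_for_lead(p1, p2, margin, z=1.96):
    """Smallest n whose z * SE for the lead does not exceed `margin`."""
    return math.ceil(z ** 2 * (p1 + p2 - (p1 - p2) ** 2) / margin ** 2)


if __name__ == "__main__":
    # Hypothetical shares: party A at 45%, party B at 40%, 1000 respondents.
    print(lead_standard_error(0.45, 0.40, 1000))
    print(sample_size_for_lead(0.45, 0.40, margin=0.03))
```

Because the two shares are negatively correlated, the standard error of the lead is larger than that of either share alone, which is why a lead needs a bigger sample than a single proportion for the same margin.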

3.
4.
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long‐term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a‐values while eliminating the need for item pool stratification.

5.
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.
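
To make the contrast between F and STR concrete, the following is a minimal sketch, not the study's simulation code. It assumes a three-parameter logistic model without the 1.7 scaling constant, an item pool given as hypothetical `(a, b, c)` tuples, and equal-sized strata; the USTR refinement and exposure control are not shown.

```python
import math


def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b, c=0.0):
    """Item information at theta for a 3PL item."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((p - c) / (1 - c)) ** 2 * (1 - p) / p


def select_max_info(theta, pool, used):
    """F: pick the unused item with maximum information at theta.

    Raises ValueError if every item has already been used.
    """
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: fisher_info(theta, *pool[i]))


def select_a_stratified(theta, pool, used, stage, n_strata):
    """STR: within the stratum for this stage (strata ordered by ascending a),
    pick the unused item whose difficulty b is closest to theta."""
    order = sorted(range(len(pool)), key=lambda i: pool[i][0])
    size = math.ceil(len(order) / n_strata)
    stratum = order[stage * size:(stage + 1) * size]
    candidates = [i for i in stratum if i not in used]
    return min(candidates, key=lambda i: abs(pool[i][1] - theta))


if __name__ == "__main__":
    pool = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.5, -1.0, 0.0)]
    print(select_max_info(0.0, pool, used=set()))
    print(select_a_stratified(0.0, pool, used=set(), stage=0, n_strata=2))
```

The sketch shows why F tends toward poor item usage: it always reaches for high-a items first, whereas the stratified rule forces early stages to draw from low-a items.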

6.
Using a computer-based model of an item trace line, a random sampling experiment concerned with comparing item sample estimates to traditional (examinee) sample estimates of the mean and variance of a distribution of test scores was conducted. The results indicated that the optimal method for estimating a test's parameters may depend on several conditions. As expected, item sampling proved superior to traditional sampling in estimating test means under all conditions. However, with certain test lengths, ranges of item difficulty, and discrimination, traditional sampling provided better estimates of test variance than did item sampling.

7.
Practical use of the matrix sampling (i.e. item sampling) technique requires the assumption that an examinee's response to an item is independent of the context in which the item occurs. This assumption was tested experimentally by comparing the responses of examinees to a population of items with the responses of examinees to item samples. Matrix sampling mean and variance estimates for verbal, quantitative, and attitude tests were used as dependent variables to test for differences between the “context” and “out-of-context” groups. The estimates obtained from both treatment groups were also compared with actual population values. No significant differences were found between treatments on matrix sample parameter estimates for any of the three types of tests.

8.
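The aim is to provide a scientific, more precise random sampling survey method, together with formulas for its statistics, for multiple-choice sensitive questions. A random sampling survey model under stratified sampling is designed for multiple-choice sensitive questions; formulas for the estimator of the population proportion and for its estimated variance under this model are derived; and the 95% confidence interval for the proportion holding the sensitive attribute is calculated. The method was applied in a survey of examination cheating among undergraduates at Hunan City University (湖南城市学院), where it yielded results of high credibility.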
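
The abstract does not reproduce the paper's formulas, so the following is only a sketch of the textbook stratified estimator of a proportion with a normal-approximation 95% confidence interval. It deliberately omits the randomized-response adjustment that a sensitive-question model would add, and the stratum figures in the example are invented for illustration.

```python
import math


def stratified_proportion_ci(strata, z=1.96):
    """Stratified estimate of a population proportion and its confidence interval.

    strata: list of (N_h, n_h, p_h) = stratum size, sample size, sample proportion.
    Uses Var = sum W_h^2 * (1 - n_h/N_h) * p_h * (1 - p_h) / (n_h - 1).
    """
    total = sum(N_h for N_h, _, _ in strata)
    estimate = 0.0
    variance = 0.0
    for N_h, n_h, p_h in strata:
        weight = N_h / total          # stratum weight W_h
        fpc = 1 - n_h / N_h           # finite population correction
        estimate += weight * p_h
        variance += weight ** 2 * fpc * p_h * (1 - p_h) / (n_h - 1)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


if __name__ == "__main__":
    # Hypothetical strata, e.g. two groups of students.
    print(stratified_proportion_ci([(6000, 300, 0.10), (4000, 200, 0.20)]))
```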

9.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(1): 15-35
This study examines the effects of using item response theory (IRT) ability estimates based on customized tests that were formed by selecting specific content areas from a nationally standardized achievement test. Subsets of items were selected from four different subtests of the Iowa Tests of Basic Skills (Hieronymus, Hoover, & Lindquist, 1985) on the basis of (a) selected content areas (content-customized tests) and (b) a representative sampling of content areas (representative-customized tests). For three of the four tests examined, ability estimates and estimated national percentile ranks based on the content-customized tests in school samples tended to be systematically higher than those based on the full tests. The results of the study suggested that for certain populations, IRT ability estimates and corresponding normative scores on content-customized versions of standardized achievement tests cannot be expected to be equivalent to scores based on the full-length tests.

10.
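A group testing method is proposed for products with a low defective rate. Single-sampling and double-sampling inspection procedures are presented, and methods for choosing the sample size and the critical value are analyzed.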

11.
The sample invariance of item discrimination statistics is evaluated in this case study using real data. The hypothesized superiority of the item response model (IRM) is tested against structural equation modeling (SEM) for responses to the Center for Epidemiologic Studies-Depression (CES-D) scale. Responses from 10 random samples of 500 people were drawn from a base sample of 6,621 participants across gender, age, and different health groups. Hierarchical tests of multiple-group structural equation models indicated statistically significant differences exist in item regressions across contrast groups. Although the IRM item discrimination estimates were most stable in all conditions of this case study, additional research on the precision of individual scores and possible item bias is required to support the validity of either model for scoring the CES-D. The SEM approach to examining between-group differences holds promise for any field where heterogeneous populations are assessed and important consequences arise from score interpretations.

12.
That the sample mean and variance are “good” estimates of the corresponding population parameters is easily accepted as “obvious” by students, but the concept of standard error of the mean is often found to be quite a hurdle. That this standard error decreases inversely as the square-root of the sample size, and the mysterious appearance of the Normal distribution, are often taken as magical and incomprehensible effects, and non-mathematical students can often be turned away from further understanding. This article describes a program which provides an experimental framework in which the student can rapidly develop an intuition for the basic properties of sampling.
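
The program described in the article is not available here; the following is a small stand-in experiment in the same spirit. It assumes an Exponential(1) population (standard deviation 1, clearly non-Normal) and a fixed random seed, both chosen only for illustration, and shows the spread of sample means shrinking roughly as 1/sqrt(n).

```python
import random
import statistics


def sd_of_sample_means(n, replications=2000, seed=0):
    """Empirical standard deviation of the mean of n Exponential(1) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (4, 25, 100):
        # Theory predicts sigma / sqrt(n), with sigma = 1 for this population.
        print(n, round(sd_of_sample_means(n), 3), round(1 / n ** 0.5, 3))
```

Plotting a histogram of `means` for increasing n would also show the approach to the Normal shape that the article mentions.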

13.
Many computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small volume certification programs that wish to make the transition to computerized adaptive testing (CAT). The one-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the three-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test (SPRT). For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrated that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items.
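
For readers unfamiliar with the SPRT in a mastery-testing setting, here is a minimal sketch under the 1-PLM using Wald's decision bounds. The ability points `theta0` and `theta1` on either side of the cut score, the error rates, and the item difficulties are all hypothetical; this is not the study's simulation design.

```python
import math


def p_1pl(theta, b):
    """Probability of a correct response under the one-parameter logistic model."""
    return 1 / (1 + math.exp(-(theta - b)))


def sprt_decision(responses, difficulties, theta0, theta1, alpha=0.05, beta=0.05):
    """Classify an examinee with Wald's SPRT.

    Tests H0: theta = theta0 (non-master) against H1: theta = theta1 (master).
    Returns "pass", "fail", or "continue" if neither bound is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this bound
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this bound
    llr = 0.0
    for x, b in zip(responses, difficulties):
        p0, p1 = p_1pl(theta0, b), p_1pl(theta1, b)
        if x == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "pass"
        if llr <= lower:
            return "fail"
    return "continue"


if __name__ == "__main__":
    print(sprt_decision([1, 1, 1, 1, 1, 1], [0.0] * 6, theta0=-0.5, theta1=0.5))
```

If the item parameters fed into `p_1pl` come from a misspecified calibration, each term of the log-likelihood ratio is biased, which is the mechanism by which misspecification can raise decision error rates.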

14.
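Improving the precision of effectiveness indicators using small-sample data preprocessing techniques   Cited: 1 (self-citations: 0, citations by others: 1)
A method is proposed for processing small-sample data using an entropy-based discrimination method and linear mean-square estimation. The entropy-based discrimination method rests on the fact that the upper bound of entropy corresponds to maximum uncertainty, and uses the entropy (information content) of the collected data to judge whether the data contain gross errors. Linear mean-square estimation removes gross errors by treating them with a softening approach. Repeated experiments show that both methods effectively improve data precision when processing small-sample data.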

15.
Research Findings: This study builds on prior work related to the assessment of young dual language learners (DLLs). The purposes of the study were to (a) determine whether latent subgroups of preschool DLLs would replicate those found previously and (b) examine the validity of GOLD® by Teaching Strategies with empirically derived subgroups. Latent class analysis confirmed previous findings of 3 distinct latent subgroups of DLLs (bilingual children, emergent bilingual children, and heritage language speakers). Results of differential item functioning analysis showed that with few exceptions, GOLD items functioned similarly, which indicates that groups matched on ability were similar in their item scores. The item pertaining to using conventional grammar consistently favored non-DLLs over heritage language speakers. The item pertaining to name writing consistently favored DLLs as a single group, emergent bilingual children, and heritage language speakers. Practice or Policy: Study results provide further support for the heterogeneity of DLLs and the use of GOLD with DLL subgroups. This provides the field with an opportunity to better understand this special population of children and enables teachers to plan with greater precision experiences that contribute to their development and learning.

16.
Randomized experiments are often seen as the “gold standard” for causal research. Despite the fact that experiments use random assignment to treatment conditions, units are seldom selected into the experiment using probability sampling. Very little research on experimental design has focused on how to make generalizations to well-defined populations or on how units should be selected into an experiment to facilitate generalization. This article addresses the problem of sample selection in experiments by providing a method for selecting the sample so that the population and sample are similar in composition. The method begins by requiring that the inference population and eligibility criteria for the study be well defined before study recruitment begins. When the inference population and the population of eligible units differ, the article provides a method for sample recruitment based on stratified selection on a propensity score. The article situates the problem within the example of how to select districts for two scale-up experiments currently in recruitment.

17.
The post mortem item-examinee sampling investigation described herein explored the feasibility of using item-examinee sampling to estimate scale values denoting degree of affect toward stimuli when measured by the method of paired-comparisons. Results indicate that such scale values can be approximated satisfactorily through item-examinee sampling. Defining one observation as the response made by one examinee to one item, the similarity between the estimated scale values and normative scale values increased generally with increases in the number of observations acquired by the sampling plan.

18.
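王秀军 (Wang Xiujun). 《数学教育学报》 (Journal of Mathematics Education), 2003, 12(3): 67-69, 87
Understanding of sampling is the foundation of statistical literacy, and sampling methods are an important part of that knowledge. Before receiving any instruction in sampling, secondary school students already show rudimentary forms of several sampling methods commonly used in statistics. Overall, they lack an appreciation of randomness in the sampling process, fail to recognize the statistical fairness of random samples, and place more trust in stratified sampling. Students in different grades and at different ability levels also differ in their understanding of sampling.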

19.
Multiple matrix sampling procedures can be employed to improve survey research when the results of matrix sampling are equivalent to those obtained by the traditional census testing approach. This study examined the use of multiple matrix sampling as a strategy for the collection of data and compared rates of response when subgroups of items were administered as opposed to an entire instrument. In addition, the study investigated whether responses were equivalent in the two sampling procedures and whether bias was present. The results indicate that multiple matrix sampling is a viable and reasonable procedure to use when a mail survey questionnaire consists of a large number of pages and/or items.

20.
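Determining and implementing the sampling plan is an important step in the follow-up survey project on graduates of the open education pilot at Anhui Radio and TV University (安徽广播电视大学). Drawing on the sampling requirements set by the Central Radio and TV University for follow-up surveys of open education pilot graduates, and taking into account the actual circumstances of Anhui Radio and TV University, this paper proposes a sampling plan based on double sampling, together with its implementation steps. In practice the method proved sound and easy to carry out, and produced fairly satisfactory results.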
