Similar literature
 20 similar documents found (search time: 125 ms)
1.
Large‐scale assessments such as the Programme for International Student Assessment (PISA) have field trials where new survey features are tested for utility in the main survey. Because of resource constraints, there is a trade‐off between how much of the sample can be used to test new survey features and how much can be used for the initial item response theory (IRT) scaling. Utilizing real assessment data of the PISA 2015 Science assessment, this article demonstrates that using fixed item parameter calibration (FIPC) in the field trial yields stable item parameter estimates in the initial IRT scaling for samples as small as n = 250 per country. Moreover, the results indicate that for the recovery of the country‐specific latent trait distributions, the estimates of the trend items (i.e., the information introduced into the calibration) are crucial. Thus, concerning the country‐level sample size of n = 1,950 currently used in the PISA field trial, FIPC is useful for increasing the number of survey features that can be examined during the field trial without the need to increase the total sample size. This enables international large‐scale assessments such as PISA to keep up with state‐of‐the‐art developments regarding assessment frameworks, psychometric models, and delivery platform capabilities.

2.
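Equivalence class partitioning is one of the typical black-box testing methods. It divides the domain of all possible input data for the program under test into several parts and selects a small number of representative values from each part as test cases. This effectively reduces the number of tests, greatly improves software testing efficiency, and shortens the software development cycle.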
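
The partitioning idea can be illustrated with a minimal sketch. The function under test (`classify_age`), its valid range, and the class boundaries are hypothetical and chosen only for illustration; they do not come from the abstract.

```python
def classify_age(age: int) -> str:
    """Hypothetical function under test: maps an age to a ticket category."""
    if age < 0 or age > 120:
        raise ValueError("age out of range")
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


# One representative value per equivalence class instead of every input.
# Each entry: (class description, representative input, expected outcome).
PARTITIONS = [
    ("invalid: below range", -5, ValueError),
    ("valid: child, 0-17", 9, "child"),
    ("valid: adult, 18-64", 40, "adult"),
    ("valid: senior, 65-120", 80, "senior"),
    ("invalid: above range", 130, ValueError),
]


def run_partition_tests() -> int:
    """Run one test per equivalence class; return the number that pass."""
    passed = 0
    for _name, value, expected in PARTITIONS:
        try:
            result = classify_age(value)
        except ValueError:
            result = ValueError
        if result == expected:
            passed += 1
    return passed
```

Five test cases stand in for the whole integer input domain. In practice this is often combined with boundary-value checks (for example 17, 18, 64, 65), which are omitted here to keep the sketch short.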

3.
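Product welded test plates are an important part of process control in pressure vessel manufacturing; the test results for the welded test plates are the main criterion for judging whether product quality is reliable. This paper analyzes the causes of failure in bend specimens taken from thin welded test plates and discusses how to reduce the likelihood of failure.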

4.
This study evaluated the classification accuracy of a second grade oral reading fluency curriculum‐based measure (R‐CBM) in predicting third grade state test performance. It also compared the long‐term classification accuracy of local and publisher‐recommended R‐CBM cut scores. Participants were 266 students who were divided into a calibration sample (n = 170) and two cross‐validation samples (n = 46; n = 50). Using calibration sample data, local fall, winter, and spring R‐CBM cut scores for predicting students’ state test performance were developed using three methods: discriminant analysis (DA), logistic regression (LR), and receiver operating characteristic curve analysis (ROC). The classification accuracy of local and publisher‐recommended cut scores was evaluated across subsamples. Only DA and ROC produced cut scores that maintained adequate sensitivity (≥.70) across cohorts; however, LR and publisher‐recommended scores had higher levels of specificity and overall correct classification. Implications for developing local cut scores are discussed.
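
As a rough illustration of the accuracy indices named in this abstract, the sketch below computes sensitivity, specificity, and overall correct classification for a single cut score. The function name, the rule "flag students scoring below the cut", and the data are assumptions made for this example; none of the numbers come from the study.

```python
def classification_accuracy(scores, failed, cut):
    """Flag students scoring below `cut` as at risk and compare with outcomes.

    scores: fluency scores (illustrative units: words correct per minute)
    failed: True where the student later failed the state test
    Assumes at least one failing and one passing student, so no denominator is zero.
    """
    pairs = list(zip(scores, failed))
    tp = sum(1 for s, f in pairs if s < cut and f)           # flagged, failed
    fn = sum(1 for s, f in pairs if s >= cut and f)          # missed, failed
    tn = sum(1 for s, f in pairs if s >= cut and not f)      # not flagged, passed
    fp = sum(1 for s, f in pairs if s < cut and not f)       # flagged, passed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "overall": (tp + tn) / len(pairs),
    }


# Made-up data for ten students (not from the study).
scores = [35, 48, 52, 60, 71, 75, 88, 93, 101, 110]
failed = [True, True, True, False, True, False, False, False, False, False]
result = classification_accuracy(scores, failed, cut=72)
```

Raising the cut score flags more students, which tends to increase sensitivity at the cost of specificity; that trade-off is what a comparison of cut scores of this kind has to balance.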

5.
Discusses the dissolution of solid samples and sets out several principles for sample dissolution.

6.
This study focused on the development and application of a three‐tier multiple‐choice diagnostic test (or three‐tier test) on the nature and propagation of waves. A question in a three‐tier test comprises the content tier, which measures content knowledge; the reason tier, which measures explanatory knowledge; and the confidence tier, which measures the strength of conceptual understanding of the respondents. This paper presents results based on the responses of 243 Grade 10 students after they were formally instructed on the topic. The vast majority of the respondents showed an inadequate grasp of concepts about waves. Eleven alternative conceptions (ACs), which were expressed with confidence by more than 10% of the students, were identified; four of these ACs were expressed with high confidence.

7.
The rise of computer‐based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple‐choice items. In particular, very short response time (termed rapid guessing) has been shown to indicate disengaged test taking, regardless of whether it occurs in high‐stakes or low‐stakes testing contexts. This article examines rapid‐guessing behavior: its theoretical conceptualization and underlying assumptions, methods for identifying it, misconceptions regarding its dynamics, and the contextual requirements for its proper interpretation. It is argued that because it does not reflect what a test taker knows and can do, a rapid guess to an item represents a choice by the test taker to momentarily opt out of being measured. As a result, rapid guessing tends to negatively distort scores and thereby diminish validity. Therefore, because rapid guesses do not contribute to measurement, it makes little sense to include them in scoring.

8.
Assuming a coin is fair is commonplace in introductory statistical education. This article offers three approaches to test whether a coin is fair. The approaches lend themselves to straightforward simulation studies that can enrich student understanding of joint probability and sample size requirements. Simulation studies comparing the relative merits of the three, or potential other, approaches are an example of problem‐based learning.
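
The abstract does not say which three approaches the article uses. As one common possibility, and purely as an assumption for illustration, the sketch below computes an exact two-sided binomial p-value for the null hypothesis that the probability of heads is 0.5. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: P(heads) = 0.5.

    Sums the probability of every outcome that is no more likely
    than the observed number of heads under a fair coin.
    """
    probs = [comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    observed = probs[heads]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= observed * (1 + 1e-9))


p_8_of_10 = binomial_two_sided_p(8, 10)      # 112 / 1024
p_10_of_10 = binomial_two_sided_p(10, 10)    # 2 / 1024
```

With only 10 flips, 8 heads gives a p-value of about 0.109, so even a visibly lopsided result does not reject fairness at the 5% level. This is the kind of sample size issue that a classroom simulation study can make concrete.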

9.
This article shows that not all statistical software packages correctly calculate a p‐value for the classical F test comparison of two independent Normal variances. This is illustrated with a simple example, and the reasons for the discrepancies are discussed. Eight different software packages are considered.

10.
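Determining sample size in parameter estimation and hypothesis testing   Cited by: 1 (self-citations: 0; citations by others: 1)
Discusses how to determine the sample size in parameter estimation and hypothesis testing for normal populations, including for the most powerful test, and gives concrete methods for determining the sample size.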
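
The abstract does not reproduce the paper's formulas. The sketch below uses the standard textbook results for a normal population with known standard deviation, which are assumed here for illustration: n = (z·σ/E)² for estimating the mean to within a margin E, and n = ((z_α + z_β)·σ/δ)² for a one-sided z test that detects a shift δ with a given power. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_mean_estimate(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n so a two-sided CI for the mean has half-width <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


def n_for_one_sided_z_test(sigma: float, delta: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n so a one-sided z test at level alpha detects a mean
    shift of delta with at least the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


n_estimate = n_for_mean_estimate(sigma=15, margin=3)     # 95% confidence
n_test = n_for_one_sided_z_test(sigma=15, delta=5)       # alpha 0.05, power 0.80
```

Both formulas scale with the square of σ divided by the target precision, so halving the margin or the detectable shift roughly quadruples the required sample size.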

11.
This study examined the factor structure of the Wechsler Intelligence Scale for Children‐Fifth Edition (WISC‐V) with four standardization sample age groups (6–8, 9–11, 12–14, 15–16 years) using exploratory factor analysis (EFA), multiple factor extraction criteria, and hierarchical EFA not included in the WISC‐V Technical and Interpretation Manual. Factor extraction criteria suggested that one to four factors might be sufficient despite the publisher‐promoted, five‐factor solution. Forced extraction of five factors resulted in only one WISC‐V subtest obtaining a salient pattern coefficient on the fifth factor in all four groups, rendering it inadequate. Evidence did not support the publisher's desire to split Perceptual Reasoning into separate Visual Spatial and Fluid Reasoning dimensions. Results indicated that most WISC‐V subtests were properly associated with the four theoretically oriented first‐order factors resembling the WISC‐IV, the g factor accounted for large portions of total and common variance, and the four first‐order group factors accounted for small portions of total and common variance. Results were consistent with EFA of the WISC‐V total standardization sample.

12.
The authors investigated the effectiveness of a mindfulness art activity compared with a free draw/coloring activity on test anxiety in children. The sample consisted of 152 students (50% female; mean age = 10.38 years, SD = 0.88 years) randomly assigned to a mindful (n = 76) or free (n = 76) group. Participants completed a standardized measure of anxiety and state mindfulness before and after the coloring activity, immediately before a spelling test, as well as a measure of dispositional mindfulness. Results revealed an overall significant decrease in test anxiety and an overall significant increase in state mindfulness following the interventions. Furthermore, although a significant negative correlation was found between dispositional mindfulness and change in state mindfulness pre- and post-coloring intervention, a significant positive correlation was found between dispositional mindfulness and pre-intervention state mindfulness, suggesting a possible ceiling effect. Explanations for these findings and implications for school personnel and future research are discussed.

13.
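Analyzes the academic performance of first-year junior secondary school students in each subject. First, independent-samples tests are used to check, separately for regular classes and accelerated classes, whether students of different genders differ significantly in their performance in each subject. Second, correlation analysis is carried out on the subject scores of the accelerated and regular classes, and on that basis regression models of performance in each subject are built.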

14.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediate learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.
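
The abstract does not list which reliability indices the module computes. As an assumption for illustration, the sketch below computes coefficient alpha (Cronbach's alpha), one widely used CTT reliability index, in Python instead of the module's Excel workbook. The function name and the data are made up.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a persons-by-items score table.

    item_scores: list of rows, one row per person, one score per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))                   # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up responses of five persons to three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
```

Alpha equals the reliability only under restrictive conditions (essentially tau-equivalent items with uncorrelated errors) and is otherwise generally a lower bound, which is one reason the choice of statistic depends on the assumed CTT model.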

15.
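Testing for outliers in samples from the Type I minimum-value distribution   Cited by: 1 (self-citations: 1; citations by others: 0)
A new test is proposed for multiple outliers in a sample from the Type I minimum-value (extreme value) distribution. Estimators of the population parameters with good robustness are found first, and a test statistic is then constructed from them. The exact probability density function of the test statistic and its approximate distribution in large samples are derived. Because the sample quantile, the core component of the test statistic, is fairly resistant to interference from outliers, the method can test for them effectively.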

16.
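To meet the needs of teaching reform and of evaluating the quality of talent training in higher vocational education, this study examined the skills spot-check examination systems of higher vocational colleges in China's provinces and abroad. Combined with an analysis of program development at higher vocational colleges in Hubei Province, it constructs a framework for a spot-check examination system for professional skills courses in Hubei's higher vocational colleges. Within this framework, it makes a preliminary exploration of course selection, assessment methods, item bank construction, and evaluation models for the skills spot-check examinations.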

17.
The Teacher Efficacy for Inclusive Practices (TEIP) scale is designed to measure teacher self‐efficacy to teach in inclusive classrooms. The original study identified three scale factors: efficacy in using inclusive instruction (EII), efficacy in collaboration (EC), and efficacy in managing behavior (EMB) (Sharma et al., 2012). The purpose of our study was to examine the TEIP scale for dimensionality and to cross‐validate its factor structure for pre‐service teachers in the context of early childhood education. A bifactor model fit to the data revealed that the TEIP scale is essentially unidimensional, that is, there is one dominant latent factor and the originally found three scale factors (EII, EC, and EMB) represent specific aspects of the general factor of teacher self‐efficacy to teach in inclusive classrooms. Along with providing validation evidence, these findings have important implications for the scoring on the TEIP scale using classical test analysis or unidimensional item response theory models.

18.
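This paper describes a technique for the heat-of-combustion measurement experiment: a pellet-hanging method that uses a grooved die base with two holes and a combination of single- and double-strand ignition wire, in place of the method in which a single-strand ignition wire touches a pellet made with a single-hole or hole-free die base. It also describes the pellet-pressing technique. Four years of practice show that this method greatly increases the success rate of the experiment and shortens the operating time.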

19.
This study sought to devise a parsimonious instrument for evaluating academic self‐concept (ASC) among British‐born students entering ‘mass‐market’ (post‐1992) universities that cater for diverse and ‘non‐traditional’ intakes. Three major facets of ASC were found to be particularly relevant to these students: self‐belief in one’s academic competence; self‐appreciation of one’s personal worth as a student (independent of ability‐related considerations); and self‐connection with being an undergraduate.

20.
Statistical theories of goodness-of-fit tests in structural equation modeling are based on asymptotic distributions of test statistics. When the model includes a large number of variables or the population is not from a multivariate normal distribution, the asymptotic distributions do not approximate the distribution of the test statistics very well at small sample sizes. A variety of methods have been developed to improve the accuracy of hypothesis testing at small sample sizes. However, all these methods have their limitations, especially for nonnormally distributed data. We propose a Monte Carlo test that is able to control Type I error with more accuracy compared to existing approaches in both normal and nonnormally distributed data at small sample sizes. Extensive simulation studies show that the suggested Monte Carlo test has a more accurate observed significance level as compared to other tests with a reasonable power to reject misspecified models.
