9 results found (search time: 15 ms)
1.
In this research, the author addresses whether the application of unidimensional item response models provides valid interpretation of test results when administering items sensitive to multiple latent dimensions. Overall, the study found that unidimensional models are quite robust to the violation of the unidimensionality assumption caused by secondary dimensions from such sensitive items. When secondary dimensions are highly correlated with the main construct, unidimensional models generally fit, and the accuracy of ability estimation is comparable to that of strictly unidimensional tests. In addition, longer tests are more robust to the violation of the essential unidimensionality assumption than shorter ones. The author also shows that unidimensional item response theory models estimate the item difficulty parameter better than the item discrimination parameter in tests with secondary dimensions.
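The difficulty and discrimination parameters discussed in this abstract are those of the two-parameter logistic (2PL) IRT model. A minimal sketch of its response function (illustrative only, not the simulation code used in the study; the unidimensional Rasch/1PL model is the special case a = 1):

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct
    response given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irt_rasch(theta, b):
    """Rasch / 1PL model: the 2PL with discrimination fixed at 1."""
    return irt_2pl(theta, 1.0, b)
```

Robustness studies of the kind summarized above typically generate responses from a multidimensional model and then fit a unidimensional one such as this, comparing the recovered a and b values to the generating parameters.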
2.
In the lead article, Davenport, Davison, Liou, & Love demonstrate the relationship among homogeneity, internal consistency, and coefficient alpha, and also distinguish among them. These distinctions are important because too often coefficient alpha, a reliability coefficient, is interpreted as an index of homogeneity or internal consistency. We argue that factor analysis should be conducted before calculating internal consistency estimates of reliability. If factor analysis indicates that the assumptions underlying coefficient alpha are met, then it can be reported as a reliability coefficient. However, to the extent that items are multidimensional, alternative internal consistency reliability coefficients should be computed based on the parameter estimates of the factor model. If a bifactor model shows good fit and the measure was designed to assess a single construct, omega hierarchical, the proportion of variance of the total scores due to the general factor, should be presented. Omega, the proportion of variance of the total scores due to all factors, should also be reported in that it represents a more traditional view of reliability, although it is computed within a factor analytic framework. By presenting both these coefficients, and potentially other omega coefficients, the reliability results are less likely to be misinterpreted.
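The coefficients contrasted in this abstract have simple closed forms once a bifactor model has been fitted. A minimal sketch (illustrative only; the loadings and uniquenesses would come from your own fitted factor model, and the function names are ours):

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum)."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

def omega_hierarchical(general_loadings, group_loadings, uniquenesses):
    """Omega_h: proportion of total-score variance due to the general factor
    alone.  `group_loadings` is a list of loading vectors, one per group factor."""
    g = sum(general_loadings) ** 2
    s = sum(sum(f) ** 2 for f in group_loadings)
    return g / (g + s + sum(uniquenesses))

def omega_total(general_loadings, group_loadings, uniquenesses):
    """Omega: proportion of total-score variance due to all common factors."""
    g = sum(general_loadings) ** 2
    s = sum(sum(f) ** 2 for f in group_loadings)
    return (g + s) / (g + s + sum(uniquenesses))
```

With no group factors the two omegas coincide; the gap between them widens as multidimensionality grows, which is why the abstract recommends reporting both.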
3.
The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Scores from 140 test-takers drawn from the PTE Academic database were analyzed. The mean age of the participants was 26.45 years (SD = 5.82), ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. No significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings support the stability of PTE Academic as a useful instrument for assessing English language learners' academic English.
4.
Mary Hilton. Literacy, 2006, 40(1): 36-41
This article is written in response to the article published in issue 39.3 of this journal, in November 2005, on the nature of the Key Stage 2 National Curriculum reading tests: ‘Examining England's National Curriculum assessments: an analysis of the KS2 reading test questions’ by Anne Kispal of the National Foundation for Educational Research. It argues that, far from providing a valid and rewarding assessment experience for pupils as Kispal suggests, the primary English tests at the end of KS2 are invalid as a measuring instrument and are having a damaging effect on pedagogy. The tests, and the information on them provided by the Qualifications and Curriculum Authority, are based on a misleading unidimensional conception of reading literacy attainment. Because the test assessment simply adds together marks achieved for very different cognitive skills, it propagates a dysfunctional model of literacy pedagogy that conflates and confuses two separate developmental trajectories – word reading and text comprehension. The article goes on to argue that the unidimensionality of the national tests and their pedagogic apparatus has constricted the primary English curriculum in ways that are damaging for young pupils and for the national need for creativity and enterprise.
5.
Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructors is to summarize the test by its overall sum score, we advocate summability, which we define as the proportion of total test variation that is explained by the sum score. This measure is closely related to Loevinger's H. The mathematical derivation of summability as a measure of explained variation is given for both scale and dichotomously scored items. Using computer simulations, we show that summability performs adequately, and we apply it to an existing productive vocabulary test. An open-source tool to easily calculate summability is provided online ( https://sites.google.com/view/summability ).
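The verbal definition above, the proportion of total test variation explained by the sum score, admits a simple regression-based reading. The sketch below is one naive interpretation (regressing each item on the unweighted sum score and pooling explained variation), not the authors' exact estimator; their online tool provides the reference implementation:

```python
import numpy as np

def summability_sketch(scores):
    """Naive reading of summability: fraction of total item variation
    explained when each item is regressed on the unweighted sum score.
    Illustrative only; not the published estimator."""
    X = np.asarray(scores, dtype=float)
    s = X.sum(axis=1)
    s_c = s - s.mean()                      # centered sum score
    Xc = X - X.mean(axis=0)                 # centered items
    total = (Xc ** 2).sum()                 # total test variation
    explained = 0.0
    for j in range(X.shape[1]):
        beta = (s_c @ Xc[:, j]) / (s_c @ s_c)   # per-item slope on sum score
        explained += ((beta * s_c) ** 2).sum()  # variation of fitted values
    return explained / total
```

When all items are parallel copies of one another, the sum score reproduces every item perfectly and this fraction reaches its maximum of 1.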
6.
Over recent years, UK medical schools have moved to more integrated summative examinations. This paper analyses data from the written assessment of undergraduate medical students to investigate two key psychometric aspects of this type of high-stakes assessment. Firstly, the strength of the relationship between examiner predictions of item performance (as required under the Ebel standard setting method employed) and actual item performance (‘facility’) in the examination is explored. It is found that there is a systematic pattern of difference between these two measures, with examiners tending to underestimate the difficulty of items classified as relatively easy, and to overestimate that of items classified as harder. The implications of these differences for standard setting are considered. Secondly, the integration of the assessment raises the question of whether the student total score in the exam can provide a single meaningful measure of student performance across a broad range of medical specialties. Therefore, Rasch measurement theory is employed to evaluate psychometric characteristics of the examination, including its dimensionality. Once adjustment is made for item interdependency, the examination is shown to be unidimensional, with fit to the Rasch model implying that a single underlying trait, clinical knowledge, is being measured.
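The comparison between predicted and observed item facility can be made concrete with the dichotomous Rasch model the paper later applies. A minimal sketch (illustrative only; in the paper's analysis the abilities would be estimates from the fitted model, and the function names are ours):

```python
import math

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_facility(thetas, b):
    """Expected item facility for a cohort: the mean Rasch probability of a
    correct response across the candidates' ability values.  Comparing this
    with an examiner's predicted facility exposes the kind of systematic
    over/under-estimation described above."""
    return sum(rasch_prob(t, b) for t in thetas) / len(thetas)
```

For example, an item whose Ebel-predicted facility is well below its expected facility for the cohort is one whose difficulty the examiners have overestimated.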
7.
A latent variable modeling approach to evaluating scale reliability under realistic conditions in empirical behavioral and social research is discussed. The method provides point and interval estimation of the reliability of multicomponent measuring instruments when several common assumptions are violated. The violations considered include the presence of missing data, correlated errors, nonnormality, lack of unidimensionality, and data that are not missing at random. The procedure can be readily used to aid scale construction and development efforts in applied settings, and is illustrated using data from an educational study.
8.
Compared with traditional measurement models, the Rasch model has distinct advantages in test quality analysis because of its objectivity and equal-interval properties. Taking the quality analysis of a sixth-grade technology and engineering literacy assessment in Nanjing primary school science as an example, this paper introduces the application of the Rasch model to test quality analysis, covering overall test quality checks, unidimensionality checks, the match between test difficulty and student ability, item-by-item quality analysis, item fit, and measurement error. It also reports that the assessment shows good reliability and validity and reasonable item discrimination, with the great majority of items meeting measurement expectations. In practical applications, analysts should choose suitable Rasch analysis software and the corresponding analysis functions according to their circumstances; once the Rasch model flags problematic items in a test, analysts should interpret and handle those items in light of the actual situation.
9.
I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and unidimensionality, however defined, belong to the realm of validity, viz. the issue of what the test measures. Internal consistency and unidimensionality may play a role in the construction of tests when the theory of the attribute for which the test is constructed implies that the items be internally consistent or unidimensional. I also offer examples of attributes that do not imply internal consistency or unidimensionality, thus limiting these concepts' usefulness in practical applications.