Similar Literature
20 similar documents retrieved (search time: 984 ms)
1.
This study investigated differential item functioning (DIF), differential bundle functioning (DBF), and differential test functioning (DTF) across gender in the reading comprehension section of the Graduate School Entrance English Exam in China. The dataset comprised 10,000 test-takers' item-level responses to 6 five-item testlets. DIF and DBF were examined using the poly-simultaneous item bias test (poly-SIBTEST) and the item response theory likelihood ratio test, and DTF was investigated with multi-group confirmatory factor analysis (MG-CFA). The results indicated that although none of the 30 items exhibited statistically and practically significant DIF across gender at the item level, 2 testlets were consistently identified by the two procedures as having significant DBF at the testlet level. Nonetheless, the DBF did not manifest itself at the overall test score level as DTF based on the MG-CFA results. This suggests that the relationship between item-level DIF and test-level DTF is a complicated issue, mediated by testlet effects in testlet-based language assessment.

2.
The TOEFL® iBT has increased the length of each reading passage to better approximate academic reading at North American universities, resulting in a reduction in the number of passages in the reading section of the test. One concern raised by this change is whether the decrease in topic variety increases the likelihood that an examinee's familiarity with the content of a given passage will influence the examinee's reading performance. This study investigated differential item functioning (DIF) and differential bundle functioning for six TOEFL® iBT reading passages (N = 8,692), three involving physical science topics and three involving cultural topics. The majority of items displayed little or no DIF. When all of the items in a passage were examined together, none of the passages showed differential functioning at the passage level. Hypotheses are provided for the DIF occurrences. Implications for fairness issues in test development are also discussed.

3.
The purpose of this article is to illustrate a seven-step process for determining whether inferential reading items were more susceptible to cultural bias than literal reading items. The seven-step process was demonstrated using multiple-choice data from the reading portion of a reading/language arts test for fifth and seventh grade Hispanic, Black, and White examinees. The process began at the broadest level of analyzing bundles of items for differential bundle functioning and finished at the narrowest level of analyzing individual items for differential distractor functioning. Some evidence was found to indicate that inferential items are more susceptible to cultural bias than literal items. Implications of the results are discussed, and suggestions for item writers and test developers are given.

4.
This study investigated whether scores obtained from the online and paper-and-pencil administrations of a statewide end-of-course English test were equivalent for students with and without disabilities. Score comparability was evaluated by examining equivalence of factor structure (measurement invariance) and by conducting differential item and bundle functioning analyses for the online and paper groups. Results supported measurement invariance between the online and paper groups, suggesting that it is meaningful to compare scores across administration modes. When the data were analyzed at both the item and item bundle (content area) levels, performance was similar between the online and paper groups.

5.
Research on High School Academic Proficiency Examinations (II): Evaluating Examination Quality
Zhou Qun. 考试研究 (Examinations Research), 2012(6): 20-28
The quality of academic proficiency examination items and test papers directly affects the validity and reliability of academic evaluation and diagnosis. Taking the Shanghai high school academic proficiency examination in the Ideological and Political subject as an example, this paper quantitatively evaluates examination quality from four perspectives: differential item and bundle functioning, the correlation between item scores and total scores, discrimination index analysis, and classification consistency and accuracy. The aim is to introduce methods for evaluating the quality of academic proficiency examinations.

6.
Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and tests (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the procedures for dichotomous DBF to the polytomous case and to illustrate how DBF analysis can be conducted with polytomous scoring, which is common to psychological and educational rating scales. The data set used consisted of parent and teacher ratings of child problem behaviors. Three group contrasts (teacher vs. parent, boy vs. girl, and random groups) and two bundle-organizing principles (subscale designation and random selection) were used for the DBF analysis. Interpretations of bundle indexes in the context of child problem behaviors were presented.

7.
This article proposes two multidimensional IRT model-based methods of selecting item bundles (clusters of not necessarily adjacent items chosen according to some organizational principle) suspected of displaying DIF amplification. The approach embodied in these two methods is inspired by Shealy and Stout's (1993a, 1993b) multidimensional model for DIF. Each bundle selected by these methods constitutes a DIF amplification hypothesis. When SIBTEST (Shealy & Stout, 1993b) confirms DIF amplification in selected bundles, differential bundle functioning (DBF) is said to occur. Three real data examples illustrate the two methods for suspect bundle selection. The effectiveness of the methods is argued on statistical grounds. A distinction between benign and adverse DIF is made. The decision whether flagged DIF items or DBF bundles display benign or adverse DIF/DBF must depend in part on nonstatistical construct validity arguments. Conducting DBF analyses using these methods should help in the identification of the causes of DIF/DBF.
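As a point of reference, the SIBTEST index that such DBF analyses typically evaluate can be sketched as a weighted sum, over levels k of the matching subtest score, of regression-corrected mean differences between reference (R) and focal (F) group scores on the studied bundle. The notation below is a hedged reconstruction of the standard form, not the exact expression used in the article:

\[
\hat{\beta}_{U} = \sum_{k} \hat{p}_{k} \left( \bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk} \right),
\qquad
z = \frac{\hat{\beta}_{U}}{\widehat{\mathrm{SE}}\!\left( \hat{\beta}_{U} \right)}
\]

Here \hat{p}_{k} is the proportion of (typically focal-group) examinees at matching level k and \bar{Y}^{*}_{Gk} is the corrected mean bundle score for group G at that level; a positive \hat{\beta}_{U} indicates that the bundle favors the reference group after matching, and DIF amplification corresponds to a bundle-level \hat{\beta}_{U} that is larger than the individual item effects would suggest.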

8.
Researchers interested in exploring substantive group differences are increasingly attending to bundles of items (or testlets): the aim is to understand how gender differences, for instance, are explained by differential performance on different types or bundles of items, hence differential bundle functioning (DBF). Some previous work has modelled hierarchies in data in this context or considered item responses within persons, but here we model the bundles themselves as explanatory variables at the item level, potentially explaining significant intra-class correlation due to gender differences in item difficulty and thus explaining variation at the second, item level. In this study, we analyse DBF using single- and two-level models (the latter modelling random item effects, with responses at Level 1 and items at Level 2) in a high-stakes National Mathematics test. The models yield comparable regression coefficients, but the two-level models reach statistical significance less often because of their larger estimated standard errors. We discuss the contrasting relevance of this effect for test developers and gender researchers.
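To make the modelling strategy concrete, a minimal sketch of a two-level (random item effects) formulation of this kind, with persons p, items i, and bundles b(i), might read as follows; the specific predictors and parameterization are illustrative assumptions, not the exact model reported in the article:

\[
\operatorname{logit} P\!\left( Y_{pi} = 1 \right)
= \theta_{p} - \beta_{i} + \gamma_{1} G_{p} + \gamma_{2, b(i)}\, G_{p} \times B_{b(i)},
\qquad
\theta_{p} \sim N\!\left( 0, \sigma^{2}_{\theta} \right),
\quad
\beta_{i} \sim N\!\left( \mu_{b(i)}, \sigma^{2}_{\beta} \right)
\]

Here G_{p} indicates gender, B_{b(i)} indicates the bundle to which item i belongs, and the \gamma_{2,b} terms capture DBF as gender-by-bundle interactions in difficulty. Treating \beta_{i} as random at the second level is what inflates the estimated standard errors relative to a single-level model with fixed item effects, which is the contrast the abstract describes.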

9.
10.
The study aims to investigate the effects of delivery modality on the psychometric characteristics of cognitive tests and on student performance. A first study assessed the inductive reasoning ability of 715 students under teacher supervision. A second study examined 731 students' performance in applying the control-of-variables strategy in basic physics, but without teacher supervision due to the COVID-19 pandemic. Rasch measurement showed that the online format fitted the data better under the unidimensional model in both conditions. Under teacher supervision, paper-based testing outperformed online testing in terms of reliability and total scores, but the opposite pattern was found without teacher supervision. Although measurement invariance between the two versions was confirmed at the item level, the differential bundle functioning analysis favored the online groups on item bundles constructed from figure-related materials. Response time was also discussed as an advantage of technology-based assessment for test development.

11.
The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research conducted to date. A total of 16 studies were synthesized according to the following characteristics: the tests and learner groups used, the organizing principles used for developing bundles, the DBF detection methods used, and the types of bundles that showed statistically significant DBF in the hypothesized direction on multiple occasions. The article concludes with a list of suggestions for individuals who conduct DBF research. For example, effect size guidelines should be established for interpreting the amount of DBF in bundles of items assessed with the simultaneous item bias test (SIBTEST), given that it is the most commonly used DBF procedure; this would reduce reliance on statistical significance testing alone. General effect size guidelines are needed, as well as guidelines for special circumstances such as small-sample cases. Other useful suggestions are offered as well.

12.
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative approaches to test fairness, counterfactual reasoning is useful to clarify a potential charge of unfairness: Is it plausible to believe that with an alternative assessment (test or item) or under different test conditions an individual or groups of individuals may have fared better? Beyond comparative questions, fairness can also be framed by moral and ethical choices. A number of ongoing issues are evaluated with respect to these topics, including accommodations, differential item functioning (DIF), differential prediction and selection, employment testing, test validation, and classroom assessment.

13.
This article defines and demonstrates a framework for studying differential item functioning (DIF) and differential test functioning (DTF) for tests that are intended to be multidimensional. The procedure introduced here is an extension of the unidimensional differential functioning of items and tests (DFIT) framework recently developed by Raju, van der Linden, and Fleer (1995). To demonstrate the usefulness of these new indexes in a multidimensional IRT setting, two-dimensional data were simulated with known item parameters and known DIF and DTF. The DIF and DTF indexes were recovered reasonably well under various distributional differences of θ after multidimensional linking was applied to put the two sets of item parameters on a common scale. Further studies are suggested in the area of DIF/DTF for intentionally multidimensional tests.
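For orientation, the core DFIT indexes that this framework extends can be summarized as follows; in the multidimensional case the latent trait is a vector and the expectation is taken over the focal group's trait distribution. This is a hedged summary of the standard definitions rather than a quotation from the article:

\[
d_{i}(\boldsymbol{\theta}) = P_{iF}(\boldsymbol{\theta}) - P_{iR}(\boldsymbol{\theta}),
\qquad
\mathrm{NCDIF}_{i} = E_{F}\!\left[ d_{i}(\boldsymbol{\theta})^{2} \right],
\qquad
\mathrm{DTF} = E_{F}\!\left[ \left( \sum_{i} d_{i}(\boldsymbol{\theta}) \right)^{2} \right]
\]

Here P_{iG}(\boldsymbol{\theta}) is the model-implied probability of a correct response to item i computed with group G's item parameter estimates at trait vector \boldsymbol{\theta}, after the two sets of parameters have been linked to a common scale; NCDIF flags individual items, while DTF aggregates the differences over the whole test.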

14.
Score equity assessment (SEA) is introduced, and placed within a fair assessment context that includes differential prediction or fair selection and differential item functioning. The notion of subpopulation invariance of linking functions is central to the assessment of score equity, just as it has been for differential item functioning and differential prediction. Advanced Placement (AP) data are used for illustrative purposes. The use of multiple-choice and constructed response items in AP provides an opportunity to observe a case where subpopulation invariance of linking functions does not hold (U.S. History), and a case in which it does hold (Calculus AB). The lack of invariance for U.S. History might be attributed to several sources. The role of SEA in assessing the fairness of test assembly processes is discussed.

15.
Common Problems in the Practical Application of DIF Analysis and Recent Research Advances
Polytomously scored items, small samples, impure matching variables, and the post-hoc analysis of the causes of detected DIF are common problems facing DIF analysis in practice. Recent advances include DSF analysis for polytomously scored items, smoothing methods for DIF detection with small samples, the MIMIC approach when the matching variable is impure, and the use of logistic models to analyze the causes of DIF after detection. These advances suggest that combining multiple detection methods, and using DIF research to explore latent variables within a multidimensional IRT framework, may make DIF research one of the foundational research areas of measurement in the future.

16.
This study attempted to pinpoint the causes of differential item difficulty for blind students taking the braille edition of the Scholastic Aptitude Test's Mathematical section (SAT-M). The study method involved reviewing the literature to identify factors that might cause differential item functioning for these examinees, forming item categories based on these factors, identifying categories that functioned differentially, and assessing the functioning of the items comprising deviant categories to determine if the differential effect was pervasive. Results showed an association between selected item categories and differential functioning, particularly for items that included figures in the stimulus, items for which spatial estimation was helpful in eliminating at least two of the options, and items that presented figures that were small or medium in size. The precise meaning of this association was unclear, however, because some items from the suspected categories functioned normally, factors other than the hypothesized ones might have caused the observed aberrant item behavior, and the differential difficulty might reflect real population differences in relevant content knowledge.

17.
One approach to measuring unsigned differential test functioning is to estimate the variance of the differential item functioning (DIF) effect across the items of the test. This article proposes two estimators of the DIF effect variance for tests containing dichotomous and polytomous items. The proposed estimators are direct extensions of the noniterative estimators developed by Camilli and Penfield (1997) for tests composed of dichotomous items. A small simulation study is reported in which the statistical properties of the generalized variance estimators are assessed, and guidelines are proposed for interpreting values of DIF effect variance estimators.
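As background on what a DIF effect variance estimator looks like, the general moment-based logic behind such noniterative estimators can be sketched as follows; the exact weighting and divisors used in the article's two estimators may differ, so treat this as an illustrative form only:

\[
\hat{\tau}^{2}
= \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\lambda}_{i} - \bar{\lambda} \right)^{2}
- \frac{1}{n} \sum_{i=1}^{n} \hat{\sigma}_{i}^{2}
\]

Here \hat{\lambda}_{i} is an item-level DIF estimate (for example, a log-odds-ratio index), \bar{\lambda} is the mean of those estimates, and \hat{\sigma}_{i}^{2} is the sampling variance of \hat{\lambda}_{i}; the observed spread of the DIF estimates is corrected for sampling error, and a larger \hat{\tau}^{2} indicates greater unsigned differential test functioning.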

18.
In multiple-choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two-step approach based on application of a nested logit model for this purpose. The approach separates testing of differential distractor functioning (DDF) from DIF, thus allowing for clearer evaluations of where distractors may be responsible for DIF. The approach is contrasted against competing methods and evaluated in simulation and real data analyses.
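For readers unfamiliar with the model class, a common nested logit formulation in this literature combines a 2PL model for the correct response with a conditional multinomial logit for distractor choice given an incorrect response; this is a hedged sketch and not necessarily the exact parameterization used in the paper:

\[
P\!\left( Y_{i} = 1 \mid \theta \right)
= \frac{\exp\{ a_{i} (\theta - b_{i}) \}}{1 + \exp\{ a_{i} (\theta - b_{i}) \}},
\qquad
P\!\left( D_{i} = d \mid Y_{i} = 0, \theta \right)
= \frac{\exp\left( \zeta_{id} + \lambda_{id} \theta \right)}{\sum_{d'} \exp\left( \zeta_{id'} + \lambda_{id'} \theta \right)}
\]

DIF in the correct response is then assessed through group differences in the correct-response parameters (a_{i}, b_{i}), while differential distractor functioning is assessed in a separate step through group differences in the distractor parameters (\zeta_{id}, \lambda_{id}), which is what allows distractor effects to be isolated from DIF.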

19.
In this Reply to the Letter to the Editor, it is emphasized that scientific inquiry requires a cyclic interplay of quantitative and qualitative methods. Furthermore, it is stressed that the results of the predominantly quantitative study could serve as a basis for designing detailed qualitative studies of country-specific cultures of physics education. The overlooked fact that students from Slovenia were not taught two of the topics related to 'Electricity and magnetism' did not compromise the results of the differential item functioning/differential group functioning procedures at all. The large majority of conclusions about the causes of students' differential achievement profiles were not affected either.

20.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.
