Similar Literature
20 similar articles found
1.
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called latent DIF analysis, is provided and its applications in the literature are surveyed. Then, the methodological issues pertaining to latent DIF analysis are described, including mixture item response models, parameter estimation, and latent DIF detection methods. Finally, recommended steps for latent DIF analysis are illustrated using empirical data.

2.
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted using manifest methods that rely on observed characteristics (gender and race/ethnicity) for grouping examinees. Homogeneity of item responses is assumed, meaning that all examinees respond to test items using a similar approach. This assumption may not hold for all groups. In this study, we demonstrate the first application of the latent class (LC) approach to investigate DIF and its sources in heterogeneous populations (linguistic minority groups). We found at least three LCs within each linguistic group, suggesting the need to empirically evaluate this assumption in DIF analysis. We obtained larger proportions of DIF items with larger effect sizes when LCs within language groups, rather than the overall (majority/minority) language groups, were examined. The illustrated approach could be used to improve the ways in which DIF analyses are typically conducted, enhancing DIF detection accuracy and score-based inferences when analyzing DIF in heterogeneous populations.

3.
Differential item functioning (DIF) may be caused by an interaction of multiple manifest grouping variables or by unexplored manifest variables, which cannot be detected by conventional DIF detection methods based on a single manifest grouping variable. Such DIF may be detected by a latent approach using the mixture item response theory model and subsequently explained by multiple manifest variables. This study facilitates the interpretation of latent DIF with the use of background and cognitive variables. The PISA 2009 reading assessment and student survey are analyzed. Results show that members of manifest groups were not homogeneously advantaged or disadvantaged and that a single manifest grouping variable did not suffice as a proxy for latent DIF. This study also demonstrates that DIF items arising from the interaction of multiple variables can be effectively screened by the latent DIF analysis approach. Background and cognitive variables jointly predicted latent class membership well.

4.
Detection of differential item functioning (DIF) is most often done between two groups of examinees under item response theory. It is sometimes important, however, to determine whether DIF is present in more than two groups. In this article we present a method for detection of DIF in multiple groups. The method is closely related to Lord's chi-square for comparing vectors of item parameters estimated in two groups. An example using real data is provided.
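To make the statistic concrete, here is a minimal sketch of the two-group Lord's chi-square for a single item, assuming a 2PL parameterization; the parameter estimates and covariance matrices are invented for illustration, and the article's multiple-group method generalizes this kind of comparison.

```python
# Lord's chi-square for one item across two groups: a quadratic form in
# the vector of parameter differences, weighted by the inverse of the
# summed sampling covariance matrices. All numbers here are hypothetical.
import numpy as np
from scipy import stats

est_ref = np.array([1.10, 0.25])   # (a, b) estimates, reference group
est_foc = np.array([0.95, 0.60])   # (a, b) estimates, focal group

cov_ref = np.array([[0.010, 0.002],    # sampling covariance of the
                    [0.002, 0.015]])   # reference-group estimates
cov_foc = np.array([[0.012, 0.003],
                    [0.003, 0.020]])

v = est_ref - est_foc                          # parameter-difference vector
chi2 = v @ np.linalg.inv(cov_ref + cov_foc) @ v
df = len(v)                                    # one df per compared parameter
print(f"Lord's chi-square = {chi2:.2f}, df = {df}, "
      f"p = {stats.chi2.sf(chi2, df):.4f}")
```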

5.
In this study, the authors explored the importance of item difficulty (equated delta) as a predictor of differential item functioning (DIF) of Black versus matched White examinees for four verbal item types (analogies, antonyms, sentence completions, reading comprehension) using 13 GRE-disclosed forms (988 verbal items) and 11 SAT-disclosed forms (935 verbal items). The average correlation across test forms for each item type (and often the correlation for each individual test form as well) revealed a significant relationship between item difficulty and DIF value for both the GRE and SAT. The most important finding indicates that for hard items, Black examinees perform differentially better than matched-ability White examinees for each of the four item types and for both the GRE and SAT tests. The results further suggest that the amount of verbal context is an important determinant of the magnitude of the relationship between item difficulty and the differential performance of Black versus matched White examinees. Several hypotheses accounting for this result were explored.

6.
When tests are administered under fixed time constraints, test performances can be affected by speededness. Among other consequences, speededness can result in inaccurate parameter estimates in item response theory (IRT) models, especially for items located near the end of tests (Oshima, 1994). This article presents an IRT strategy for reducing contamination in item difficulty estimates due to speededness. Ordinal constraints are applied to a mixture Rasch model (Rost, 1990) so as to distinguish two latent classes of examinees: (a) a "speeded" class, composed of examinees who had insufficient time to adequately answer end-of-test items, and (b) a "nonspeeded" class, composed of examinees who had sufficient time to answer all items. The parameter estimates obtained for end-of-test items in the nonspeeded class are shown to more accurately approximate their difficulties when the items are administered at earlier locations on a different form of the test. A mixture model can also be used to estimate the class memberships of individual examinees. In this way, it can be determined whether membership in the speeded class is associated with other student characteristics. Results are reported for gender and ethnicity.

7.
Identifying the Causes of DIF in Translated Verbal Items
Translated tests are being used increasingly for assessing the knowledge and skills of individuals who speak different languages. There is little research exploring why translated items sometimes function differently across languages. If the sources of differential item functioning (DIF) across languages could be predicted, it could have important implications for test development, scoring, and equating. This study focuses on two questions: "Is DIF related to item type?" and "What are the causes of DIF?" The data were taken from the Israeli Psychometric Entrance Test in Hebrew (source) and Russian (translated). The results indicated that 34% of the items functioned differentially across languages. The analogy items were the most problematic, with 65% showing DIF, mostly in favor of the Russian-speaking examinees. The sentence completion items were also a problem (45% DIF). The main reasons for DIF were changes in word difficulty, changes in item format, differences in cultural relevance, and changes in content.

8.
This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a 3-parameter logistic item response theory model according to the balanced incomplete block (BIB) design used in the National Assessment of Educational Progress (NAEP). The length of each block of items and the number of DIF items in the matching variable were varied, as was the difficulty, discrimination, and presence of DIF in the studied item. Block, booklet, pooled booklet, and extra-information analyses were compared to a complete data analysis using the transformed log-odds on the delta scale. The pooled booklet approach is recommended for use when items are selected for examinees according to a BIB design. This study has implications for DIF analyses of other complex samples of items, such as computer-administered testing or another complex assessment design.
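For reference, the delta-scale Mantel-Haenszel statistic mentioned above can be sketched in a few lines. This is a generic illustration of the MH common odds ratio and its ETS delta transformation (MH D-DIF = -2.35 ln alpha_MH), not the study's simulation code; the 2x2 tables per matching-score level are hypothetical.

```python
# Mantel-Haenszel common odds ratio across score levels, then the
# transformed log-odds on the delta scale. Counts are hypothetical.
import numpy as np

# Rows: matching-score levels; columns: [ref_correct, ref_wrong, foc_correct, foc_wrong]
tables = np.array([
    [ 40.0,  60.0,  25.0,  75.0],
    [ 80.0,  40.0,  55.0,  65.0],
    [120.0,  20.0,  90.0,  30.0],
])

A, B, C, D = tables.T          # unpack the four cell counts by column
N = A + B + C + D              # examinees at each score level

alpha_mh = (A * D / N).sum() / (B * C / N).sum()
mh_d_dif = -2.35 * np.log(alpha_mh)   # negative values signal DIF against the focal group
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.3f}")
```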

9.
Applied Measurement in Education (《教育实用测度》), 2013, 26(3): 217-233
The standardization and Mantel-Haenszel approaches to the assessment of differential item functioning (DIF) are described and compared. For right-wrong scoring of items, these two approaches, which emphasize the importance of comparing comparable groups of examinees, use the same database for analysis, namely, a 2 (Group) x 2 (Item Score: Correct or Incorrect) x S (Score Level) contingency table for each studied item. The two procedures differ with respect to how they operate on these basic data tables to compare the performance of the two groups of examinees. Whereas the operations employed by Mantel-Haenszel are motivated by statistical-power considerations, the operations employed by standardization are motivated by data-interpretation considerations. These differences in operation culminate in different measures of DIF effect size that are nonetheless very highly related indicators of the degree of departure from the null hypothesis of no DIF.
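To show how the standardization effect size operates on the same score-level tables, here is a minimal sketch of STD P-DIF, the focal-weighted average of focal-minus-reference differences in proportion correct; all proportions and counts are hypothetical.

```python
# Standardized p-difference (STD P-DIF): weight the per-score-level
# difference in proportion correct by the focal-group frequency.
import numpy as np

p_ref = np.array([0.20, 0.35, 0.55, 0.75, 0.90])  # proportion correct, reference
p_foc = np.array([0.15, 0.30, 0.52, 0.74, 0.91])  # proportion correct, focal
n_foc = np.array([  50,  120,  200,  150,   80])  # focal examinees per score level

std_p_dif = np.sum(n_foc * (p_foc - p_ref)) / n_foc.sum()
print(f"STD P-DIF = {std_p_dif:+.3f}")  # negative values work against the focal group
```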

10.
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large sets of data (6 sets of data, each involving 60,000 students). It has the advantages of being easy to understand and to explain to practitioners. Several findings emerged from the study that would be useful to pass on to test development committees. For example, when there was DIF in science items, multiple-choice (MC) items tended to favor male examinees and open-response (OR) items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item writing practices will be informed by more than anecdotal reports about DIF.

11.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

12.
In a previous simulation study of methods for assessing differential item functioning (DIF) in computer-adaptive tests (Zwick, Thayer, & Wingersky, 1993, 1994), modified versions of the Mantel-Haenszel and standardization methods were found to perform well. In that study, data were generated using the 3-parameter logistic (3PL) model and this same model was assumed in obtaining item parameter estimates. In the current study, the 3PL data were used but the Rasch model was assumed in obtaining the item parameter estimates, which determined the information table used for item selection. Although the obtained DIF statistics were highly correlated with the generating DIF values, they tended to be smaller in magnitude than in the 3PL analysis, resulting in a lower probability of DIF detection. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch-based computer-adaptive test tended to be biased downward, particularly for lower-ability examinees.

13.
Logistic regression has recently been advanced as a viable procedure for detecting differential item functioning (DIF). One of the advantages of this procedure is the considerable flexibility it offers in the specification of the regression equation. This article describes incorporating two ability estimates into a single regression analysis, with the result that substantially fewer items exhibit DIF. A comparable analysis is conducted using the Mantel-Haenszel procedure, with similar results. It is argued that by simultaneously conditioning on two relevant ability estimates, more accurate matching of examinees in the reference and focal groups is obtained, and thus multidimensional item impact is not mistakenly identified as DIF.
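As a rough illustration of the specification the article describes — a logistic regression that conditions on two ability estimates plus a group indicator — the following sketch fits such a model to simulated data with statsmodels. The variable names, sample size, and effect sizes are invented, and the data are generated with no true DIF, so the group coefficient should be near zero.

```python
# Logistic-regression DIF check conditioning on two ability estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
theta1 = rng.normal(size=n)           # e.g., a verbal ability estimate
theta2 = rng.normal(size=n)           # e.g., a quantitative ability estimate
group = rng.integers(0, 2, size=n)    # 0 = reference, 1 = focal

# Simulate item responses that depend on both abilities but not on group.
eta = -0.2 + 1.0 * theta1 + 0.6 * theta2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

X = sm.add_constant(np.column_stack([theta1, theta2, group]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)       # const, theta1, theta2, group
print(fit.pvalues[3])   # Wald test of the group (uniform-DIF) coefficient
```

Conditioning on both abilities at once is what keeps multidimensional item impact from being flagged as DIF: an item that draws on the second dimension no longer looks group-dependent once that dimension enters the matching.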

14.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (in addition to the common random person abilities). A mixture model is further assumed for the item difficulties, such that the items may belong to one of two classes: a DIF or a non-DIF class. The crucial difference between the two classes is that item difficulties in the DIF class may differ across the observed person groups, whereas they are equal across person groups for items in the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated in a simulation study in which it is compared with traditional procedures, such as the likelihood-ratio test, the Mantel-Haenszel procedure, and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is also demonstrated on a real-life data set.

15.
The standardization approach to assessing differential item functioning (DIF), including standardized distractor analysis, is described. The results of studies conducted on Asian Americans, Hispanics (Mexican Americans and Puerto Ricans), and Blacks on the Scholastic Aptitude Test (SAT) are described and then synthesized across studies. Where the groups were limited to include only examinees who spoke English as their best language, very few items across forms and ethnic groups exhibited large DIF. Major findings include evidence of differential speededness (where minority examinees did not complete SAT-Verbal sections at the same rate as White students with comparable SAT-Verbal scores) for Blacks and Hispanics and, when the item content is of special interest, advantages for the relevant ethnic group. In addition, homographs tend to disadvantage all three ethnic groups, but the effects of vertical relationships in analogy items are not as consistent. Although these findings are important in understanding DIF, they do not seem to account for all differences. Other variables related to DIF still need to be identified. Furthermore, these findings are seen as tentative until corroborated by studies using controlled data collection designs.

16.
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify possible sources of DIF by item type, content, and wording. DIF was primarily found for the citizenship groups. As suggested by expert reviewers, possible sources of DIF favoring U.S. citizens were often Quantitative Reasoning items containing figures, charts, or tables depicting real-world (as opposed to abstract) contexts. DIF items favoring non-U.S. citizens included "mathematical" items containing few words. DIF on the Verbal Reasoning items involved geocultural references and proper names that may be differentially familiar to non-U.S. citizens. This study is responsive to foundational changes in the fairness section of the Standards for Educational and Psychological Testing, which now consider additional groups in sensitivity analyses, given the increasing demographic diversity in test-taker populations.

17.
朱乙艺, 焦丽亚. 《考试研究》 (Examinations Research), 2012(6): 80-87, 19
Compared with DIF research based on empirical data, DIF research based on simulated data not only allows experimental conditions to be manipulated freely but also yields statistical power and Type I error indices. This paper details the principles for generating dichotomously scored DIF simulation data. The generation process comprises four stages: choosing an approach for producing DIF, choosing an item response theory model, specifying examinee characteristics, item characteristics, and the number of replications, and computing each examinee's probability of a correct response to each item and converting it into dichotomous data. The generation of dichotomously scored DIF simulation data is then demonstrated with both the general-purpose software Excel and the specialized software WinGen3.
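Since the abstract lays out a concrete four-stage recipe, a minimal Python sketch of the same process follows, using a 3PL model in place of the Excel/WinGen3 demonstrations; the sample sizes, item parameters, and the injected difficulty shift are all illustrative.

```python
# Generate dichotomously scored DIF data under a 3PL model.
import numpy as np

rng = np.random.default_rng(42)
n_ref, n_foc, n_items = 1000, 1000, 10

# Stage 3: examinee characteristics (abilities) and item characteristics.
theta = np.concatenate([rng.normal(0.0, 1.0, n_ref),   # reference group
                        rng.normal(0.0, 1.0, n_foc)])  # focal group
a = rng.uniform(0.8, 1.8, n_items)                     # discrimination
b = rng.normal(0.0, 1.0, n_items)                      # difficulty
c = np.full(n_items, 0.2)                              # pseudo-guessing

# Inject uniform DIF: item 0 is 0.6 logits harder for the focal group.
b_mat = np.tile(b, (theta.size, 1))
b_mat[n_ref:, 0] += 0.6

# Stage 4: correct-response probabilities, then 0/1 responses.
p = c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta[:, None] - b_mat)))
responses = (rng.uniform(size=p.shape) < p).astype(int)
print(responses.shape, responses.mean(axis=0).round(2))
```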

18.
Most currently accepted approaches for identifying differentially functioning test items compare performance across groups after first matching examinees on the ability of interest. The typical basis for this matching is the total test score. Previous research indicates that when the test is not approximately unidimensional, matching on the total test score may result in an inflated Type I error rate. This study compares the results of differential item functioning (DIF) analysis with matching based on the total test score, matching based on subtest scores, or multivariate matching using multiple subtest scores. Analyses of both actual and simulated data indicate that for the dimensionally complex test examined in this study, using the total test score as the matching criterion is inappropriate. The results suggest that matching on multiple subtest scores simultaneously may be superior to using either the total test score or individual relevant subtest scores.

19.
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable to cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are multidimensional and binary. This study proposes a very general DIF assessment method in the CDM framework that is applicable to various CDMs, more than two groups of examinees, and multiple grouping variables that are categorical, continuous, observed, or latent. The parameters can be estimated with Markov chain Monte Carlo algorithms implemented in the freeware WinBUGS. Simulation results demonstrated good parameter recovery and advantages in DIF assessment for the new method over the Wald method.

20.
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees who take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions that affect investigations of differential item functioning (DIF), such as choice of method, sample size, effect size criteria, conditioning variable, purification, DIF amplification, DIF cancellation, and research designs for evaluating DIF. Our review highlights the necessity of matching the DIF procedure to the nature of the data analysed, the need to include effect size criteria, the need to consider the direction and balance of items flagged for DIF, and the need to use replication to reduce Type I errors whenever possible. Directions for future research and practice in using DIF to enhance the validity of test scores are provided.
