Similar Literature
1.
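Common Problems in the Practical Application of DIF Analysis and New Research Advances   Total citations: 1, self-citations: 0, citations by others: 1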
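Polytomously scored items, small samples, impure matching variables, and analysis of the causes of DIF after detection are common problems in DIF testing. Recent advances in DIF research address each of them: differential step functioning (DSF) analysis for polytomous items, smoothing methods for DIF detection with small samples, the MIMIC method when the matching variable is impure, and logistic models for analyzing the causes of DIF after detection. Our analysis of these advances suggests that combining several detection methods, and using DIF research to explore latent variables within a multidimensional IRT framework, may make DIF research one of the foundational research areas of measurement in the future.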

2.
Zhu Yiyi, Jiao Liya. Examinations Research (《考试研究》), 2012, (6): 80-87, 19
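Compared with DIF research based on real data, DIF research based on simulated data not only lets the researcher manipulate experimental conditions freely but also yields power and Type I error indices. This article explains in detail the principles for generating dichotomous DIF simulation data. The generation process has four stages: choosing an approach for producing DIF; choosing an item response theory model; specifying examinee characteristics, item characteristics, and the number of replications; and computing each examinee's probability of a correct response on each item and converting it into dichotomous (0/1) data. The process is demonstrated with both the general-purpose software Excel and the specialized software WinGen3.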
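The four-stage procedure above can be sketched in Python. This is a minimal illustration, not the authors' Excel or WinGen3 workflow: it assumes a two-parameter logistic (2PL) model with the 1.7 scaling constant, N(0, 1) ability in both groups, and uniform DIF created by adding a fixed shift to the difficulty of selected items for the focal group only. The function names and parameters (`simulate_dif_data`, `dif_shift`, and so on) are hypothetical.

```python
import math
import random


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response, using the D = 1.7 scaling constant."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))


def simulate_dif_data(n_ref, n_focal, items, dif_items, dif_shift, seed=0):
    """Generate 0/1 responses for a reference group and a focal group.

    items:     list of (a, b) item parameters
    dif_items: set of item indices whose difficulty is shifted for the focal group
    dif_shift: amount added to b for those items (uniform DIF against the focal group)
    Returns (responses, groups), where groups[i] is 0 (reference) or 1 (focal).
    """
    rng = random.Random(seed)  # one seed = one replication
    responses, groups = [], []
    for group, n in ((0, n_ref), (1, n_focal)):
        for _ in range(n):
            theta = rng.gauss(0.0, 1.0)  # same N(0, 1) ability distribution in both groups
            row = []
            for j, (a, b) in enumerate(items):
                if group == 1 and j in dif_items:
                    b += dif_shift  # harder for the focal group at equal theta
                p = p_correct_2pl(theta, a, b)
                row.append(1 if rng.random() < p else 0)  # Bernoulli draw
            responses.append(row)
            groups.append(group)
    return responses, groups
```

Calling the function once per seed yields one replication; looping over seeds produces the replicated data sets needed to estimate power and Type I error rates.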

3.
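This article evaluates the gender fairness of the 2011 New HSK (Chinese Proficiency Test, Level 6) through a differential item functioning (DIF) analysis of the items from eight administrations. Results show that 3.3% of the 800 items exhibited DIF; the mean MH value across the 800 items was 0.02, and its 95% confidence interval contains 0, indicating that the test as a whole shows no DIF. HSK (Level 6) therefore has satisfactory gender fairness.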
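For readers unfamiliar with the MH procedure, the following sketch computes the Mantel-Haenszel common odds ratio across matched-score strata and converts it to the ETS delta scale, MH D-DIF = -2.35 ln(α_MH). The abstract does not specify which MH statistic was averaged over the 800 items, so treating the "MH value" as MH D-DIF is an assumption, and the function name is hypothetical. Negative values indicate an item that is relatively harder for the focal group.

```python
import math
from collections import defaultdict


def mantel_haenszel(item, total, group):
    """MH common odds ratio and MH D-DIF for one dichotomous item.

    item:  0/1 scores on the studied item
    total: matching scores (e.g. total test score), used to form strata
    group: 0 = reference group, 1 = focal group
    Returns (alpha_mh, mh_d_dif).
    """
    # strata[k] = [A, B, C, D]: ref right, ref wrong, focal right, focal wrong
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for x, t, g in zip(item, total, group):
        if g == 0:
            cell = 0 if x == 1 else 1
        else:
            cell = 2 if x == 1 else 3
        strata[t][cell] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # sum of A_k * D_k / N_k
        den += b * c / n  # sum of B_k * C_k / N_k
    alpha = num / den  # raises ZeroDivisionError if every B_k * C_k is 0
    return alpha, -2.35 * math.log(alpha)
```

An α_MH of 1 (MH D-DIF of 0) means that, within every score stratum, both groups have the same odds of answering the item correctly.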

4.
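This study introduces differential item functioning detection methods that can handle testlet effects, providing a more rigorous DIF detection approach for passage-based reading tests. The GMH, P-SIBTEST, and P-LR methods were used to test the reading comprehension items of the Chinese Proficiency Test (HSK) (Advanced) for DIF. The three methods agreed closely, and this section of the test showed no significant DIF with respect to either gender or nationality. The study also compared traditional DIF detection methods with the adapted testlet-based DIF methods; the latter showed clear advantages.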

5.
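A review of previous research shows that the factor structure of the Chinese-language test in the National College Entrance Examination (gaokao) has rarely been examined. This study used factor analysis to explore the item structure of the gaokao Chinese test and to examine whether its factor structure matches the theoretically expected structure. The model constructed by exploratory factor analysis showed item-format factors in the Chinese test papers of both Province A and Province B, namely a factor for dichotomously scored items and a factor for polytomously scored items; its overall fit was clearly better than that of the model built according to the examination syllabus. On this basis, the study recommends further improving item-writing techniques and refining the system for measuring and evaluating core competencies in Chinese by constructing scientifically sound, well-designed tests.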

6.
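As polytomous scoring is used increasingly widely in psychology and education, it poses new challenges for methods of detecting differential item functioning (DIF). Previous studies have shown that MIMIC is an economical and effective DIF detection method, but no study has systematically examined its effectiveness for polytomous items. Using Monte Carlo experiments, this study manipulated five factors: the sample sizes of the reference and focal groups, the type of DIF, item discrimination, the between-group difference in ability, and the number of DIF items in the anchor set, and it analyzed the Type I error rate and power of the MIMIC method under combinations of these factors. The findings were: (1) MIMIC is sensitive to uniform DIF and controls the Type I error rate well even when the focal group is small or considerably smaller than the reference group; (2) a purification step is necessary for MIMIC to control the Type I error rate and improve power, although MIMIC tolerates a certain degree of anchor contamination; (3) power is severely reduced by low discrimination, whereas very high discrimination inflates the Type I error rate; (4) the power of MIMIC to detect uniform DIF increases with sample size.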

7.
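This study used the IRT-Δb method to detect gender DIF in the 2015 primary-school English test of Province F, in order to assess the fairness of that year's test. Ten items showed gender DIF: six favored boys and four favored girls. Overall, the test was essentially fair to boys and girls. The study also discusses possible causes of the gender DIF items, providing useful guidance for future item writing.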
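The Δb idea can be illustrated as follows: calibrate item difficulties separately in each group, place them on a common scale, and test the difference item by item. This sketch assumes Rasch-type difficulty estimates with standard errors are already available, uses simple mean-centring as the linking step, and flags items with |z| > 1.96. The linking method and flagging criterion the study actually used are not stated in the abstract, and the function name is hypothetical. A positive Δb means the item is relatively harder for the focal group.

```python
import math


def delta_b(b_ref, b_focal, se_ref, se_focal, z_crit=1.96):
    """Compare item difficulties estimated separately in two groups.

    Each set of b estimates is mean-centred first (a simple mean-linking
    step, assumed here); delta_b = b_focal - b_ref on the centred scale.
    Returns a list of (delta_b, z, flagged) tuples, one per item.
    """
    mean_r = sum(b_ref) / len(b_ref)
    mean_f = sum(b_focal) / len(b_focal)
    results = []
    for br, bf, sr, sf in zip(b_ref, b_focal, se_ref, se_focal):
        d = (bf - mean_f) - (br - mean_r)
        z = d / math.sqrt(sr ** 2 + sf ** 2)  # SE of the difference of independent estimates
        results.append((d, z, abs(z) > z_crit))
    return results
```

Mean-centring removes any overall ability-scale difference between the separate calibrations, so only an item's difficulty relative to the other items in its group contributes to Δb.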

8.
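Male and female students' scores on the Shanghai gaokao English test differ significantly. In principle, this difference could have two causes: differential item functioning and differential test functioning, or a genuine difference between male and female students in their level of English language development. Based on item response theory and classical statistical theory, differential item functioning analyses were conducted at the item level and the section level using the ConQuest and EZDIF software. The DIF effects in the English test were negligible, and DIF did not accumulate at the section level; the English ability of male and female students does differ, and this difference is not caused by differences in item difficulty. It can therefore be concluded that the gap in gaokao English scores between male and female students reflects a real difference in their English language development. Based on this conclusion, the author offers several suggestions for research on, and teaching of, English language learning among Shanghai senior high school students.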

9.
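After outlining the overall financial condition of listed companies in Hubei, this study applies Altman's Z-score model in an empirical study of their financial risk, using as its sample 40 Hubei A-share companies in 2006 (companies with missing data for that year were excluded). The results show that the Z-score model is fairly effective in evaluating the financial risk of listed companies.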
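The Z-score model is a weighted sum of five financial ratios. The sketch below uses the coefficients and cut-offs of Altman's original 1968 model for publicly traded manufacturing firms (Z > 2.99 safe, Z < 1.81 distress, grey zone in between), with the X5 coefficient rounded to 1.0. Whether the Hubei study used these coefficients or a revised variant is not stated in the abstract, and the function names are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_equity, total_liabilities, sales, total_assets):
    """Original (1968) Altman Z-score, assumed variant."""
    x1 = working_capital / total_assets       # liquidity
    x2 = retained_earnings / total_assets     # cumulative profitability
    x3 = ebit / total_assets                  # operating profitability
    x4 = market_equity / total_liabilities    # market value of equity / book liabilities
    x5 = sales / total_assets                 # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    """Classify a Z-score with the original cut-offs."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"
```

For example, ratios of X1 = 0.2, X2 = 0.3, X3 = 0.1, X4 = 1.5, and X5 = 1.0 give Z = 2.89, which falls in the grey zone.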

10.
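Application of the Rasch Model to Quality Analysis of the Graduate Entrance Examination   Total citations: 1, self-citations: 0, citations by others: 1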
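The Rasch model was used to analyze the comprehensive foundation test in psychology of the 2010 National Entrance Examination for Master's Degree Candidates. Results showed that the test was of high overall quality: its items covered examinees at all ability levels, it discriminated well among examinees of different ability, and it achieved its intended selection purpose. The Rasch analysis also found, however, that a few items did not meet their intended measurement goals and could be revised in future work. Rasch-based item analysis can provide more measurement information for analyzing examinee ability and item quality.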
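Rasch analyses commonly judge whether an item meets its measurement goals with mean-square fit statistics. The sketch below computes outfit (the mean of squared standardized residuals) and infit (the information-weighted version) for one dichotomous item, assuming person abilities and the item difficulty have already been estimated. The abstract does not say which fit criteria were used, and the function names are hypothetical. Values near 1 indicate responses consistent with the model; values well above 1 indicate unexpected responses.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Outfit and infit mean-squares for one dichotomous item.

    responses: 0/1 responses to the item
    thetas:    (already estimated) abilities of the same persons
    b:         (already estimated) item difficulty
    """
    sq_std_resid = []
    sq_resid_sum = var_sum = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        w = p * (1.0 - p)       # model variance of the response
        r2 = (x - p) ** 2       # squared raw residual
        sq_std_resid.append(r2 / w)
        sq_resid_sum += r2
        var_sum += w
    outfit = sum(sq_std_resid) / len(sq_std_resid)  # unweighted: sensitive to outliers
    infit = sq_resid_sum / var_sum                  # weighted by information
    return outfit, infit
```

Outfit is dominated by surprising responses from persons far from the item's difficulty, while infit gives more weight to persons near it, which is why both are usually reported.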

11.
Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

12.
Bock, Muraki, and Pfeiffenberger (1988) proposed a dichotomous item response theory (IRT) model for the detection of differential item functioning (DIF), and they estimated the IRT parameters and the means and standard deviations of the multiple latent trait distributions. This IRT DIF detection method is extended to the partial credit model (Masters, 1982; Muraki, 1993) and presented as one of the multiple-group IRT models. Uniform and non-uniform DIF items and heterogeneous latent trait distributions were used to generate polytomous responses of multiple groups. The DIF method was applied to these simulated data using a stepwise procedure. The standardized DIF measures for slope and item location parameters successfully detected the non-uniform and uniform DIF items as well as recovered the means and standard deviations of the latent trait distributions. This stepwise DIF analysis based on the multiple-group partial credit model was then applied to the National Assessment of Educational Progress (NAEP) writing trend data.
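To make the partial credit model concrete, the sketch below computes category probabilities of the generalized partial credit model for one item: the numerator for category k is exp of the sum over v = 1..k of a(θ − d_v), with the empty sum for k = 0 taken as 0. In a multiple-group DIF setting, uniform DIF can be represented by shifting the step difficulties d_v for one group, and non-uniform DIF by giving that group a different slope a. This illustrates the model only, not the stepwise estimation procedure the article describes, and the function names are hypothetical. With a = 1 it reduces to Masters' partial credit model.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities (categories 0..m) of the generalized partial credit model.

    theta: latent trait value
    a:     item slope
    steps: step difficulties d_1..d_m
    """
    logits = [0.0]  # category 0: empty sum
    for d in steps:
        logits.append(logits[-1] + a * (theta - d))  # cumulative sum of a*(theta - d_v)
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    expo = [math.exp(z - m) for z in logits]
    total = sum(expo)
    return [e / total for e in expo]


def expected_score(theta, a, steps):
    """Expected item score at theta."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps)))
```

For example, `expected_score(0.0, 1.0, [-1.0, 1.0])` is 1.0 by symmetry, and shifting both steps by +0.5 for a focal group lowers that group's expected score at the same θ, which is the pattern uniform DIF produces.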

13.
The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification of items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test which includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.

14.
A computer simulation study was conducted to determine the feasibility of using logistic regression procedures to detect differential item functioning (DIF) in polytomous items. One item in a simulated test of 25 items contained DIF; parameters for that item were varied to create three conditions of nonuniform DIF and one of uniform DIF. Item scores were generated using a generalized partial credit model, and the data were recoded into multiple dichotomies in order to use logistic regression procedures. Results indicate that logistic regression is powerful in detecting most forms of DIF; however, it required large amounts of data manipulation, and interpretation of the results was sometimes difficult. Some logistic regression procedures may be useful in the post hoc analysis of DIF for polytomous items.
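Recoding a polytomous item into several dichotomies can be done in more than one way, and the abstract does not say which scheme was used. The sketch below shows two common ones: cumulative dichotomies (score ≥ k versus score < k) and adjacent-category dichotomies (category k versus k − 1, with all other examinees excluded). Each resulting 0/1 variable can then be analyzed with the usual logistic regression DIF model, logit P(y = 1) = β0 + β1·total + β2·group + β3·total·group, where β2 captures uniform DIF and β3 non-uniform DIF. The function names are hypothetical.

```python
def cumulative_dichotomies(scores, max_score):
    """Recode item scores 0..max_score into max_score cumulative 0/1 variables.

    Dichotomy k (k = 1..max_score) is 1 when score >= k, else 0.
    """
    return [[1 if s >= k else 0 for k in range(1, max_score + 1)] for s in scores]


def adjacent_dichotomies(scores, max_score):
    """Dichotomy k compares category k (coded 1) with category k - 1 (coded 0).

    Examinees in any other category get None, i.e. they are excluded
    from the analysis of that dichotomy.
    """
    return [[(1 if s == k else 0) if s in (k - 1, k) else None
             for k in range(1, max_score + 1)]
            for s in scores]
```

The cumulative scheme keeps every examinee in every dichotomy, whereas the adjacent scheme uses only examinees in the two neighbouring categories, so each of its regressions rests on a smaller sample; this is one source of the data-manipulation burden the abstract mentions.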

15.
Applied Measurement in Education (《教育实用测度》), 2013, 26(2): 175-199
This study used three different differential item functioning (DIF) detection procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, differences in ability distributions, etc.) that may be related to DIF. The QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, which were distributed within four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh-grade student responses to each of the four test forms administered in the spring of 1992 at all six school sites participating in the QUASAR project. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 31 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPP1, which favored female students rather than their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPP1 because they were both in the same test form.

16.
The purpose of this article is to describe and demonstrate a three-step process of using differential distractor functioning (DDF) in a post hoc analysis to understand sources of differential item functioning (DIF) in multiple-choice testing. The process is demonstrated on two multiple-choice tests that used complex alternatives (e.g., “No Mistakes”) as distractors. Comparisons were made between different gender and race groups. DIF analyses were conducted using Simultaneous Item Bias Test, whereas DDF analyses were conducted using loglinear model fitting and odds ratios. Five items made it through all three steps and were identified as those with DIF results related to DDF. Implications of the results, as well as suggestions for future research, are discussed.

17.
In this paper I describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis paradigm, with emphasis on its implications for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a failure to display DIF. By contrast, the multidimensional DIF paradigm emphasizes a substantively informed selection of items for both the matching and studied subtest based on the dimensions suspected of underlying the test data. Using two examples, I demonstrate that these two approaches lead to different interpretations about the occurrence of DIF in a test. It is argued that selecting a matching and studied subtest, as identified using the multidimensionality-based DIF analysis paradigm, can lead to a more informed understanding of why DIF occurs.

18.
This study analyzes and classifies items that display sex-related Differential Item Functioning (DIF) in attitude assessment. It applies the Educational Testing Service (ETS) procedure that is used for classifying DIF items in testing to classify sex-related DIF items in attitude scales. A total of 982 items that measure attitudes from 23 real data sets were used in the analysis. Results showed that sex DIF is common in attitude scales: more than 27% of items showed DIF related to sex, 15% of the items exhibited moderate to large DIF, and the magnitudes of DIF against males and females were not equal.
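The ETS classification rules are usually stated in terms of the MH D-DIF statistic: category A (negligible) if |MH D-DIF| < 1 or it is not significantly different from 0; category C (large) if |MH D-DIF| ≥ 1.5 and it is significantly greater than 1; category B (moderate) otherwise. The sketch below applies those rules with z-tests on the estimate and its standard error, using a critical value of 1.96 for both tests. This is a simplification (the operational procedure uses the MH chi-square for the test against 0), and how the study adapted the rules to polytomous attitude items is not described in the abstract. The function name is hypothetical.

```python
def ets_category(mh_d_dif, se, z_crit=1.96):
    """Classify an item into ETS DIF category A, B, or C.

    mh_d_dif: MH D-DIF estimate (delta scale); its sign is ignored here
    se:       standard error of the estimate
    """
    abs_d = abs(mh_d_dif)
    sig_vs_0 = abs_d / se > z_crit          # significantly different from 0
    sig_vs_1 = (abs_d - 1.0) / se > z_crit  # significantly greater than 1 in magnitude
    if abs_d < 1.0 or not sig_vs_0:
        return "A"
    if abs_d >= 1.5 and sig_vs_1:
        return "C"
    return "B"
```

Under this scheme, the "moderate to large DIF" reported in the abstract corresponds to items in categories B and C.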

19.
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study aimed to examine whether students' English language status has an impact on their inquiry science performance. Differential item functioning (DIF) analysis was conducted with regard to ELL status on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students took the science test, including 313 ELL students. The results showed that, overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs while one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning using their own words. Assessment developers and teachers should pay attention to the possible interaction between linguistic challenges and science content when designing assessment for and providing instruction to ELLs.

20.
The study investigates consequences of eliminating items showing gender-specific differential item functioning (DIF) on the psychometric structure of a standard RIASEC interest inventory. Holland’s hexagonal model was tested for structural invariance using a confirmatory methodological approach (confirmatory factor analysis and randomization tests of hypothesized order relations). Results suggest that eliminating items showing gender-specific DIF had no considerable influence on the instrument’s psychometric structure. Considering DIF as one possibility to improve test fairness when developing interest inventories is discussed.

