Similar Articles
20 similar articles were retrieved (search time: 46 ms)
1.
Applied Measurement in Education, 2013, 26(1): 35-48
This study investigated several current coaching practices used in training test-wiseness for analogy items in standardized test batteries. A three-group design was used which included a general test-taking, "encouragement" condition in addition to a no-training control group condition. The specific techniques used in training are described. Scholastic Aptitude Test (SAT) scores were obtained from university admission files to verify that no overall aptitude differences existed in the three conditions. Differences were observed for the coached group relative to the two control groups in terms of overall number of correct responses for the coached item types (analogies). No differences were found for the non-coached item types. Item difficulties for the three groups are also reported which show that several items were indeed made easier for individuals in the coached group. A qualitative analysis of the items made easier by coaching in terms of the training techniques used is given along with an analysis of the items that did not respond to coaching. Finally, a discussion of potentially flawed item types and item characteristics and suggestions for dealing with such flaws are given.

2.
In this experiment we studied the effect of goal setting on the strategies used to perform a block design task called SAMUEL. SAMUEL can measure many indicators, which are then combined to determine the strategies used by participants when solving SAMUEL problems. Two experimental groups were created: one group was given an explicit, difficult goal and the other was not given a goal. The two groups were comparable in their average visual–spatial ability. The results indicated no goal effect on the strategies, defined in terms of the combined indicators. However, the goal did have an effect on some of the indicators taken alone (total problem-solving time, total viewing time, and model-viewing frequency) but this was true only for subjects with a low cognitive ability. These findings demonstrate that setting a goal can have an effect on some strategy indexes used to assess performance on a visual-intelligence design task. This research has implications for defining intelligence-test instructions and educational requirements in school.

3.
The Survey of Young Adult Literacy conducted in 1985 by the National Assessment of Educational Progress included 63 items that elicited skills in acquiring and using information from written documents. These items were analyzed using two different models: (1) a qualitative cognitive model, which characterized items in terms of the processing tasks they required, and (2) an item response theory (IRT) model, which characterized item difficulties and respondents' proficiencies simply by tendencies toward correct response. This paper demonstrates how a generalization of Fischer and Scheiblechner's Linear Logistic Test Model can be used to integrate information from the cognitive analysis into the IRT analysis, providing a foundation for subsequent item construction, test development, and diagnosis of individuals' skill deficiencies.
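For readers who want the structure of the model referred to here: the Linear Logistic Test Model constrains Rasch item difficulties to a weighted sum of cognitive-operation parameters. The sketch below uses generic notation and is not taken from the paper itself.

```latex
% Rasch model with an LLTM decomposition of item difficulty (illustrative notation)
\[
P(X_{ij}=1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)},
\qquad
b_i = \sum_{k=1}^{K} q_{ik}\,\eta_k + c
\]
```

Here theta_j is the proficiency of respondent j, b_i the difficulty of item i, q_ik indicates whether cognitive operation k is required by item i, eta_k is the difficulty contribution of that operation, and c is a normalization constant.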

4.
This study investigates the extent to which contextualized and non-contextualized mathematics test items have a differential impact on examinee effort. Mixture item response theory (IRT) models are applied to two subsets of items from a national assessment of mathematics in the second grade of the pre-vocational track in secondary education in Flanders. One subset focused on elementary arithmetic and consisted of non-contextualized items. Another subset of contextualized items focused on the application of arithmetic in authentic problem-solving situations. Results indicate that differential performance on the subsets is to a large extent due to test effort. The non-contextualized items appear to be much more susceptible to low examinee effort in low-stakes testing situations. However, subgroups of students can be distinguished with regard to the extent to which they show low effort: a compliant, an underachieving, and a dropout group. Group membership is also linked to relevant background characteristics.
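The abstract does not specify which mixture IRT model was used; as a point of reference only, a mixture Rasch model, one common choice for modeling latent subgroups such as the compliant, underachieving, and dropout classes, can be written as follows (notation ours, not from the paper).

```latex
% Mixture Rasch model with latent classes g and class-specific parameters (illustrative)
\[
P(X_{ij}=1) = \sum_{g=1}^{G} \pi_g \,
\frac{\exp(\theta_{jg} - b_{ig})}{1 + \exp(\theta_{jg} - b_{ig})},
\qquad \sum_{g=1}^{G} \pi_g = 1
\]
```

Here pi_g is the proportion of examinees in latent class g, theta_jg the ability of examinee j within class g, and b_ig the difficulty of item i in class g.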

5.
An important trend in educational measurement is the use of principles of cognitive psychology to design achievement and ability test items. Many studies show that manipulating the stimulus features of items influences the processes, strategies, and knowledge structures that are involved in solution. However, little is known about how cognitive design influences individual differences. That is, does applying cognitive design principles change the background skills and abilities that are associated with successful performance? This study compared the correlates of two spatial ability tests that used the same item type but different test design principles (cognitive design versus psychometric design). The results indicated differences in factorial complexity in the two tests; specifically, the impact of verbal abilities was substantially reduced by applying the cognitive design principles.

6.
Problem-solving strategy is frequently cited as mediating the effects of response format (multiple-choice, constructed response) on item difficulty, yet there are few direct investigations of examinee solution procedures. Fifty-five high school students solved parallel constructed response and multiple-choice items that differed only in the presence of response options. Student performance was videotaped to assess solution strategies. Strategies were categorized as "traditional" (those associated with constructed response problem solving, e.g., writing and solving algebraic equations) or "nontraditional" (those associated with multiple-choice problem solving, e.g., estimating a potential solution). Surprisingly, participants sometimes adopted nontraditional strategies to solve constructed response items. Furthermore, differences in difficulty between response formats did not correspond to differences in strategy choice: some items showed a format effect on strategy but no effect on difficulty; other items showed the reverse. We interpret these results in light of the relative comprehension challenges posed by the two groups of items.

7.
This study examined the conflict resolution strategies (CRSs) used by sixth, seventh, and eighth grade primary school pupils in Turkey and identified gender differences in the strategies typically employed. In addition, the study aimed to find out which conflicts students sought teacher assistance with and what strategies students thought teachers used in dealing with their conflicts. The data for this research were collected via a questionnaire consisting mostly of open-ended items. Results supported the notion that three main groups of strategies (problem-solving, avoiding, and aggressive) are typically employed in resolving conflicts. Problem-solving strategies were the most frequently employed by the participants. There was a significant gender difference in the use of CRSs, in that girls were more likely to use problem-solving strategies than boys. The majority of the participants tended not to ask teachers for assistance in resolving their conflicts. However, students from low SES schools were more likely to ask for teacher assistance than students from middle and high SES schools. The participants also stated that teachers typically used two main strategies in helping them resolve their conflicts: problem-solving and aggressive strategies.

8.
Contrasts between constructed-response items and multiple-choice counterparts have yielded but a few weak generalizations. Such contrasts typically have been based on the statistical properties of groups of items, an approach that masks differences in properties at the item level and may lead to inaccurate conclusions. In this article, we examine item-level differences between a certain type of constructed-response item (called figural response) and comparable multiple-choice items in the domain of architecture. Our data show that in comparing two item formats, item-level differences in difficulty correspond to differences in cognitive processing requirements and that relations between processing requirements and psychometric properties are systematic. These findings illuminate one aspect of construct validity that is frequently neglected in comparing item types, namely the cognitive demand of test items.

9.
Using factor analysis, we conducted an assessment of multidimensionality for 6 forms of the Law School Admission Test (LSAT) and found 2 subgroups of items or factors for each of the 6 forms. The main conclusion of the factor analysis component of this study was that the LSAT appears to measure 2 different reasoning abilities: inductive and deductive. The technique of N. J. Dorans & N. M. Kingston (1985) was used to examine the effect of dimensionality on equating. We began by calibrating (with item response theory [IRT] methods) all items on a form to obtain Set I of estimated IRT item parameters. Next, the test was divided into 2 homogeneous subgroups of items, each having been determined to represent a different ability (i.e., inductive or deductive reasoning). The items within these subgroups were then recalibrated separately to obtain item parameter estimates, and then combined into Set II. The estimated item parameters and true-score equating tables for Sets I and II corresponded closely.
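The true-score equating tables mentioned above are built from the test characteristic curve; a standard textbook formulation (not specific to the LSAT analysis) is sketched below.

```latex
% Test characteristic curve under a generic 3PL model, used for IRT true-score equating
\[
\tau_X(\theta) = \sum_{i \in X} P_i(\theta)
= \sum_{i \in X} \left[ c_i + (1 - c_i)\,
\frac{1}{1 + \exp\{-1.7\,a_i(\theta - b_i)\}} \right]
\]
```

For a given true score x on form X, one solves tau_X(theta*) = x and takes y = tau_Y(theta*) as the equated true score on form Y; comparing the tables produced from the Set I and Set II parameter estimates is what reveals any effect of multidimensionality on equating.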

10.
Many researchers have suggested that the main cause of item bias is the misspecification of the latent ability space, where items that measure multiple abilities are scored as though they are measuring a single ability. If two different groups of examinees have different underlying multidimensional ability distributions and the test items are capable of discriminating among levels of abilities on these multiple dimensions, then any unidimensional scoring scheme has the potential to produce item bias. It is the purpose of this article to provide the testing practitioner with insight about the difference between item bias and item impact and how they relate to item validity. These concepts will be explained from a multidimensional item response theory (MIRT) perspective. Two detection procedures, the Mantel-Haenszel (as modified by Holland and Thayer, 1988) and Shealy and Stout's Simultaneous Item Bias (SIB; 1991) strategies, will be used to illustrate how practitioners can detect item bias.
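As background for the first of the two detection procedures, the Mantel-Haenszel common odds ratio and its ETS delta-scale transformation are usually written as follows (a textbook statement, not a result from this article).

```latex
% Mantel-Haenszel common odds ratio over matched score levels k, and the ETS delta metric
\[
\hat{\alpha}_{\mathrm{MH}} =
\frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k},
\qquad
\text{MH D-DIF} = -2.35\,\ln \hat{\alpha}_{\mathrm{MH}}
\]
```

Within matched score level k, A_k and B_k are the numbers of reference-group examinees answering the item correctly and incorrectly, C_k and D_k are the corresponding focal-group counts, and N_k is the total number of examinees at that level.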

11.
Nonparametric item response theory models include the monotone homogeneity model and the double monotonicity model. Applying the monotone homogeneity model to the results of an English listening comprehension test, and using a sequential item-selection procedure, 11 of the 16 listening items were found to satisfy the model's requirements and to form a unidimensional scale. Ranking examinees by their total score on these 11 items is equivalent to ranking them by the latent trait. A differential item functioning analysis of the 11-item scale under the double monotonicity model showed that 5 items were ordered differently in the female subgroup than in the male subgroup and in the group as a whole, with female examinees having a markedly higher probability of a correct response than male examinees. This difference is at least partly attributable to differences in listening ability between the two subgroups.
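As a rough illustration of the scalability analysis underlying a monotone homogeneity (Mokken-type) scale, the sketch below computes Loevinger's H coefficient for dichotomous items. The data and the threshold mentioned in the comments are made up; this is not the selection procedure actually used in the study.

```python
import numpy as np

def loevinger_H(X):
    """Loevinger's scalability coefficient H for dichotomous item scores.

    X: (n_persons, n_items) array of 0/1 responses.
    H = 1 - (observed Guttman errors) / (errors expected under independence).
    H >= 0.3 is a commonly cited minimal threshold for a Mokken scale.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    p = X.mean(axis=0)                      # item popularities (proportion correct)
    obs_err, exp_err = 0.0, 0.0
    for i in range(k):
        for j in range(i + 1, k):
            # Order the pair: 'hard' is the less popular item, 'easy' the more popular.
            hard, easy = (i, j) if p[i] <= p[j] else (j, i)
            # Guttman error: correct on the hard item but incorrect on the easy one.
            obs_err += np.sum((X[:, hard] == 1) & (X[:, easy] == 0))
            exp_err += n * p[hard] * (1.0 - p[easy])
    return 1.0 - obs_err / exp_err

# Tiny synthetic example (hypothetical data, 6 persons x 4 items).
scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])
print(round(loevinger_H(scores), 3))   # perfect Guttman pattern, so H = 1.0
```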

12.
Statistics used to detect differential item functioning can also reflect differential strengths and weaknesses in the performance characteristics of population subgroups. In turn, item features associated with the differential performance patterns are likely to reflect some facet of the item task, and hence its difficulty, that might previously have been overlooked. In this study, several item features were identified and coded for a large number of reading comprehension items from two admissions testing programs. Item features included subject matter content, various properties of item structure, cognitive demand indicators, and semantic content (propositional analysis). Differential item functioning was evaluated for males and females and for White and Black examinees. Results showed a number of significant relationships between item features and indicators of differential item functioning—many of which were consistent across testing programs. Implications of the results for related areas of research are discussed.

13.
Applied Measurement in Education, 2013, 26(2): 175-199
This study used three different differential item functioning (DIF) detection procedures to examine the extent to which items in a mathematics performance assessment functioned differently for matched gender groups. In addition to examining the appropriateness of individual items in terms of DIF with respect to gender, an attempt was made to identify factors (e.g., content, cognitive processes, differences in ability distributions, etc.) that may be related to DIF. The QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) Cognitive Assessment Instrument (QCAI) is designed to measure students' mathematical thinking and reasoning skills and consists of open-ended items that require students to show their solution processes and provide explanations for their answers. In this study, 33 polytomously scored items, which were distributed within four test forms, were evaluated with respect to gender-related DIF. The data source was sixth- and seventh-grade student responses to each of the four test forms administered in the spring of 1992 at all six school sites participating in the QUASAR project. The sample consisted of 1,782 students with approximately equal numbers of female and male students. The results indicated that DIF may not be serious for 31 of the 33 items (94%) in the QCAI. For the two items that were detected as functioning differently for male and female students, several plausible factors for DIF were discussed. The results from the secondary analyses, which removed the mutual influence of the two items, indicated that DIF in one item, PPP1, which favored female students over their matched male students, was of particular concern. These secondary analyses suggest that the detection of DIF in the other item in the original analysis may have been due to the influence of Item PPP1, because the two items were in the same test form.

14.
AN ITERATIVE ITEM BIAS DETECTION METHOD
Two strategies for assessing item bias are discussed: methods that compare (transformed) item difficulties unconditional on ability level and methods that compare the probabilities of correct response conditional on ability level. In the present study, the logit model was used to compare the probabilities of correct response to an item by members of two groups, these probabilities being conditional on the observed score. Here the observed score serves as an indicator of ability level. The logit model was iteratively applied: In the Tth iteration, the T items with the highest value of the bias statistic are excluded from the test, and the observed score indicator of ability for the (T + 1)th iteration is computed from the remaining items. This method was applied to simulated data. The results suggest that the iterative logit method is a substantial improvement on the noniterative one, and that the iterative method is very efficient in detecting biased and unbiased items.
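A minimal sketch of the iterative logit idea described above, written against statsmodels with 0/1 response data in mind; the stopping rule, significance level, and function names are our own choices, so treat it as an illustration of the approach rather than a reproduction of the simulation study.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def iterative_logit_dif(X, group, max_iter=5, alpha=0.01):
    """Iterative logit screening for item bias (illustrative implementation).

    X     : (n_persons, n_items) 0/1 response matrix.
    group : (n_persons,) 0/1 indicator of group membership.
    In iteration t, the observed score is computed from the currently retained
    items, each retained item is tested for a group effect conditional on that
    score, and the t most biased items are removed before the next iteration.
    """
    n_items = X.shape[1]
    retained = list(range(n_items))
    flagged = []
    for t in range(1, max_iter + 1):
        score = X[:, retained].sum(axis=1)       # ability indicator from retained items
        lr_stats = {}
        for i in retained:
            y = X[:, i]
            full = sm.Logit(y, sm.add_constant(np.column_stack([score, group]))).fit(disp=0)
            reduced = sm.Logit(y, sm.add_constant(score)).fit(disp=0)
            lr_stats[i] = 2.0 * (full.llf - reduced.llf)   # LR test of the group term
        # Flag the t items with the largest bias statistic, if significant.
        worst = sorted(lr_stats, key=lr_stats.get, reverse=True)[:t]
        worst = [i for i in worst if stats.chi2.sf(lr_stats[i], df=1) < alpha]
        if not worst:
            break
        flagged.extend(worst)
        retained = [i for i in retained if i not in flagged]
    return flagged, retained

# Hypothetical usage with simulated data:
# rng = np.random.default_rng(0)
# X = (rng.random((500, 20)) < 0.6).astype(int)
# group = rng.integers(0, 2, size=500)
# flagged, retained = iterative_logit_dif(X, group)
```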

15.
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content, cognitive demands, item type, item text, and visual-spatial or reference factors. To facilitate the analyses, DIF studies were conducted at 3 grade levels and for 2 randomly equivalent forms of the science assessment at each grade level (administered in different years). The DIF procedure itself was a variant of the "standardization procedure" of Dorans and Kulick (1986) and was applied to very large sets of data (6 sets of data, each involving 60,000 students). It has the advantages of being easy to understand and to explain to practitioners. Several findings emerged from the study that would be useful to pass on to test development committees. For example, when there was DIF in science items, MC items tended to favor male examinees and OR items tended to favor female examinees. Compiling DIF information across multiple grades and years increases the likelihood that important trends in the data will be identified and that item writing practices will be informed by more than anecdotal reports about DIF.
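For reference, the standardization statistic on which the study's variant is based is usually written as follows (this is the generic Dorans and Kulick definition, not the modified procedure used here).

```latex
% Standardized p-difference (Dorans and Kulick, 1986) over matching-score levels s
\[
\text{STD P-DIF} =
\frac{\sum_{s} w_s \,(P_{fs} - P_{rs})}{\sum_{s} w_s}
\]
```

Here P_fs and P_rs are the proportions of focal-group and reference-group examinees at matching score s who answer the item correctly, and w_s is a weight, typically the number of focal-group examinees at score level s.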

16.
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of test data structure, reliability comparisons and test characteristic curves) and item-levels (differential item functioning, item parameter correlations, and linguistic comparisons). Results from the test-level analyses indicate that the two language versions of PISA are highly similar as shown by similarity of internal consistency coefficients, test data structure (same number of factors and item factor loadings) and test characteristic curves for the two language versions of the tests. However, results of item-level analyses reveal several differences between the two language versions as shown by large proportions of items displaying differential item functioning, differences in item parameter correlations (discrimination parameters) and number of items found to contain linguistic differences.

17.
Applied Measurement in Education, 2013, 26(3): 167-180
In the figural response item format, proficiency is expressed by manipulating elements of a picture or diagram. Figural response items in architecture were contrasted with multiple-choice counterparts in their ability to predict architectural problem-solving proficiency. Problem-solving proficiency was measured by performance on two architecture design problems, one of which involved a drawing component, whereas the other required only a written verbal response. Both figural response and multiple-choice scores predicted verbal design problem solving, but only the figural response scores predicted graphical problem solving. The presumed mechanism for this finding is that figural response items more closely resemble actual architectural tasks than do multiple-choice items. Some evidence for this explanation is furnished by architects' self-reports, in which architects rated figural response items as "more like what an architect does" than multiple-choice items.

18.
To ensure the validity of statistical results, model-data fit must be evaluated for each item. In practice, certain actions or treatments are needed for misfit items. If all misfit items are treated, much item information would be lost during calibration. On the other hand, if only severely misfit items are treated, the inclusion of misfit items may invalidate the statistical inferences based on the estimated item response models. Hence, given response data, one has to find a balance between treating too few and too many misfit items. In this article, misfit items are classified into three categories based on the extent of misfit. Accordingly, three different item treatment strategies are proposed in determining which categories of misfit items should be treated. The impact of using different strategies is investigated. The results show that the test information functions obtained under different strategies can be substantially different in some ability ranges.
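The test information functions being compared are sums of item information; under a two-parameter logistic model this takes the standard form below (included only as a reminder, not as part of the article's results).

```latex
% Test information as a sum of item information; 2PL item information shown for concreteness
\[
I(\theta) = \sum_{i=1}^{n} I_i(\theta),
\qquad
I_i(\theta) = a_i^{2}\, P_i(\theta)\bigl(1 - P_i(\theta)\bigr),
\qquad
P_i(\theta) = \frac{1}{1 + \exp\{-a_i(\theta - b_i)\}}
\]
```

Because treating or dropping misfit items changes which I_i(theta) terms enter the sum, or the parameter estimates behind them, the resulting I(theta) can differ across strategies in some ability ranges, which is what the comparison examines.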

19.
This study sought a scientific way to examine whether item response curves are influenced systematically by the cognitive processes underlying solution of the items in a procedural domain (addition of fractions). Starting from an expert teacher's logical task analysis and prediction of various erroneous rules and sources of misconceptions, an error diagnostic program was developed. This program was used to carry out an error analysis of test performance by three samples of students. After the cognitive structure of the subtasks was validated by a majority of the students, the items were characterized by their underlying subtask patterns. It was found that item response curves for items in the same categories were significantly more homogeneous than those in different categories. In other words, underlying cognitive subtasks appeared to systematically influence the slopes and difficulties of item response curves.

20.
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and illustrate how the method can be used to promote cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees' test item responses into a set of attribute mastery patterns associated with different components from a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature in reading comprehension as well as research related to SAT Critical Reading. Then, the cognitive models were validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model-data fit between the expected and observed response patterns produced from two random samples of 2,000 examinees who wrote the items. The model that provided the best data-model fit was then used to calculate attribute probabilities for 15 examinees to illustrate our diagnostic testing procedure.
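A toy sketch of the classification step in an attribute hierarchy approach: expected response patterns are generated from a Q-matrix under a conjunctive rule, and an observed response vector is assigned to the nearest expected pattern. The hierarchy, Q-matrix, and distance rule below are hypothetical and illustrate only the general mechanics, not the SAT reading models or the AHM estimation procedure used in the study.

```python
import numpy as np
from itertools import product

# Hypothetical Q-matrix: rows = items, columns = attributes (1 = attribute required).
Q = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])

# Hypothetical linear hierarchy A1 -> A2 -> A3: a later attribute cannot be
# mastered unless all earlier ones are mastered.
def permissible(pattern):
    return all(pattern[k] >= pattern[k + 1] for k in range(len(pattern) - 1))

attribute_patterns = [p for p in product([0, 1], repeat=Q.shape[1]) if permissible(p)]

# Conjunctive rule: an item is answered correctly iff all required attributes are mastered.
def expected_responses(pattern):
    return np.all(Q <= np.array(pattern), axis=1).astype(int)

expected = {p: expected_responses(p) for p in attribute_patterns}

def classify(observed):
    """Assign an observed 0/1 response vector to the attribute pattern whose
    expected response vector is closest in Hamming distance."""
    observed = np.asarray(observed)
    return min(expected, key=lambda p: int(np.sum(expected[p] != observed)))

print(classify([1, 1, 0, 0]))   # -> (1, 1, 0): mastered A1 and A2, not A3
```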

