Similar Documents
20 similar documents retrieved (search time: 453 ms)
1.
Applied Measurement in Education, 2013, 26(4): 255-268
The study applied a psychometric model, the rule-space model, to diagnose students' states of knowledge about how exponents behave in multiplication and division of quantities with exponents. A 38-item test was administered to 431 Grade 10 students. Each item was characterized by a list of task attributes required for answering the item correctly, and each student was classified, based on his or her item-score pattern, into the most likely knowledge state (i.e., attribute-mastery pattern) corresponding to an ideal item-score pattern. The following outcomes of the rule-space model were presented: (a) the results of the classification of examinees to knowledge states at the group level along with individual examples, (b) the mastery level of the underlying task attributes as evaluated at three different test-score groups, and (c) a tree diagram of the transitional relationships among the knowledge states that can guide the design of effective remediation. Implications for utilizing the feedback provided by the rule-space model in the context of instruction and assessment are discussed.

2.
The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees' cognitive strengths and weaknesses. Hence, the AHM can be used for cognitive diagnostic assessment. The purpose of this study is to introduce and evaluate a new concept for assessing attribute reliability using the ratio of true score variance to observed score variance on items that probe specific cognitive attributes. This reliability procedure is evaluated and illustrated using both simulated data and student response data from a sample of algebra items taken from the March 2005 administration of the SAT. The reliability of diagnostic scores and the implications for practice are also discussed.
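The reliability concept in this abstract, the ratio of true score variance to observed score variance, can be illustrated numerically. A minimal sketch with simulated data (the sample size, score scale, and error variance below are arbitrary assumptions, not values from the study):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate latent true scores on items probing one attribute, plus measurement error.
n_examinees = 1000
true_scores = rng.normal(loc=0.0, scale=1.0, size=n_examinees)
errors = rng.normal(loc=0.0, scale=0.5, size=n_examinees)
observed = true_scores + errors

# Attribute reliability: true-score variance over observed-score variance.
reliability = true_scores.var(ddof=1) / observed.var(ddof=1)
print(round(reliability, 3))  # near 1 / (1 + 0.25) = 0.8 in expectation
```

With an error standard deviation of 0.5, the expected reliability is 1/(1 + 0.25) = 0.8; larger error variance drives the ratio toward 0.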

3.
In this digital ITEMS module, Dr. Roy Levy describes Bayesian approaches to psychometric modeling. He discusses how Bayesian inference is a mechanism for reasoning in a probability-modeling framework and is well-suited to core problems in educational measurement: reasoning from student performances on an assessment to make inferences about their capabilities more broadly conceived, as well as fitting models to characterize the psychometric properties of tasks. The approach is first developed in the context of estimating a mean and variance of a normal distribution before turning to the context of unidimensional item response theory (IRT) models for dichotomously scored data. Dr. Levy illustrates the process of fitting Bayesian models using the JAGS software facilitated through the R statistical environment. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as education, psychology, sociology, political science, business, health, and other social sciences. It contains audio-narrated slides, diagnostic quiz questions, and data-based activities with video solutions as well as curated resources and a glossary.
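Before turning to IRT, the module develops Bayesian inference for the parameters of a normal distribution. As a minimal illustration of that first step, here is the standard conjugate update for a normal mean with known variance, written in plain Python rather than the JAGS/R setup the module itself uses (the prior and data values are hypothetical):

```python
import numpy as np

# Conjugate Bayesian update for the mean of a normal with known variance.
# Prior: mu ~ Normal(mu0, tau0_sq); data: y_i ~ Normal(mu, sigma_sq).
mu0, tau0_sq = 0.0, 100.0  # diffuse prior on the mean
sigma_sq = 4.0             # known data variance (an assumption of this sketch)
y = np.array([1.2, 0.8, 1.5, 1.1, 0.9])

n = y.size
post_var = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
post_mean = post_var * (mu0 / tau0_sq + y.sum() / sigma_sq)
print(round(post_mean, 3), round(post_var, 3))
```

The posterior mean is a precision-weighted compromise between the prior mean and the sample mean; with a diffuse prior it lands near the sample mean of 1.1.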

4.
The analytically derived asymptotic standard errors (SEs) of maximum likelihood (ML) item estimates can be approximated by a mathematical function without examinees' responses to test items, and the empirically determined SEs of marginal maximum likelihood estimation (MMLE)/Bayesian item estimates can be obtained when the same set of items is repeatedly estimated from the simulation (or resampling) test data. The latter method will result in rather stable and accurate SE estimates as the number of replications increases, but requires cumbersome and time-consuming calculations. Instead of using the empirically determined method, the adequacy of using the analytically based method in predicting the SEs for item parameter estimates was examined by comparing results produced from both approaches. The results indicated that the SEs yielded from both approaches were, in most cases, very similar, especially when they were applied to a generalized partial credit model. This finding encourages test practitioners and researchers to apply the analytically derived asymptotic SEs of item estimates to the context of item-linking studies, as well as to the method of quantifying the SEs of equating scores for the item response theory (IRT) true-score method. Three-dimensional graphical presentation for the analytical SEs of item estimates as the bivariate function of item difficulty together with item discrimination was also provided for a better understanding of several frequently used IRT models.
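The analytical approach rests on the fact that the asymptotic SE of an ML item estimate is the inverse square root of the information accumulated over the calibration sample. A hedged sketch for the simplest case, a Rasch difficulty parameter (the ability grid below is a hypothetical calibration sample, not the paper's setup):

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def asymptotic_se_difficulty(thetas, b):
    """Asymptotic SE of an ML difficulty estimate: inverse square root of the
    item information summed over the abilities in the calibration sample."""
    info = sum(p * (1.0 - p) for p in (rasch_p(t, b) for t in thetas))
    return 1.0 / math.sqrt(info)

# Hypothetical calibration sample: 41 abilities evenly spaced from -2 to 2.
thetas = [-2.0 + 0.1 * i for i in range(41)]
se = asymptotic_se_difficulty(thetas, b=0.0)
print(round(se, 3))
```

Because the information depends only on the assumed ability distribution and the item parameters, the SE can indeed be predicted without observed responses, which is the premise of the analytical method discussed above.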

5.
The goal of the current study was to compare two forms of dynamic assessment and standard assessment of preschool children's phonological awareness. The first form of dynamic assessment was a form of scaffolding in which item formats were modified in response to an error so as to make the task easier or more explicit. The second form of dynamic assessment was direct instruction of the phonological awareness tasks. The results indicate that preschool children's phonological awareness can be assessed using standard assessment procedures, provided the items require processing units larger than the individual phoneme. No advantage was found in reliability or validity for either dynamic assessment condition relative to the standard assessment condition. Dynamic assessment does not appear to improve reliability or validity of phonological awareness assessments when preschool children are given tasks that they can perform using standard administration procedures.

6.
7.
Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20‐item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed.
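The two reported effect sizes are consistent with each other: moving an item from the first to the last slot of a 20-item form is a shift of 19 positions, so the cumulative drift follows directly:

```python
# Reported linear position effect from the early-literacy study.
drift_per_position = 0.024  # logits per one-position increase
positions_moved = 19        # from the first to the last slot of a 20-item form

predicted_shift = drift_per_position * positions_moved
print(round(predicted_shift, 2))  # 0.46, matching the reported value
```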

8.
ANGUS DUFF, Educational Psychology, 2003, 23(2): 123-139
This investigation: first, examines some psychometric properties of the scores obtained on a 30-item short form of the Revised Approaches to Studying Inventory (RASI) using samples of postgraduate management (MBA) students (n=75); second, examines the relationship between scores on the three dimensions of the RASI and background variables of age, gender and prior educational experience; third, tests for any relationship between the background variables and academic performance as measured by four distinct types of assessment; and fourth, examines the relationship between scores on the three dimensions of the RASI and academic performance. No previous published work has examined the approaches to learning of postgraduate business students. Key findings include: the instrument has satisfactory psychometric properties; and scores obtained on the RASI using samples of MBA students are good predictors of academic performance in continuous assessment tasks but poor predictors of performance in examinations and oral presentations.

9.
A series of item analyses of the CAK-C was conducted for a sample of 155 educable mentally retarded children. The probability of a correct response was found to differ from task to task, and there was evidence that the order of difficulty of the tasks for this sample resembled that for nonretarded children. The probabilities of the two incorrect responses were generally not equal, and the choice of one or the other incorrect response showed some relation to CA, MA, and IQ, particularly the last two variables.

10.
In response to the demand for sound science assessments, this article presents the development of a latent construct called knowledge integration as an effective measure of science inquiry. Knowledge integration assessments ask students to link, distinguish, evaluate, and organize their ideas about complex scientific topics. The article focuses on assessment topics commonly taught in 6th- through 12th-grade classes. Items from both published standardized tests and previous knowledge integration research were examined in 6 subject-area tests. Results from Rasch partial credit analyses revealed that the tests exhibited satisfactory psychometric properties with respect to internal consistency, item fit, weighted likelihood estimates, discrimination, and differential item functioning. Compared with items coded using dichotomous scoring rubrics, those coded with the knowledge integration rubrics yielded significantly higher discrimination indexes. The knowledge integration assessment tasks, analyzed using knowledge integration scoring rubrics, demonstrate strong promise as effective measures of complex science reasoning in varied science domains.

11.
The adaptation of experimental cognitive tasks into measures that can be used to quantify neurocognitive outcomes in translational studies and clinical trials has become a key component of the strategy to address psychiatric and neurological disorders. Unfortunately, while most experimental cognitive tests have strong theoretical bases, they can have poor psychometric properties, leaving them vulnerable to measurement challenges that undermine their use in applied settings. Item response theory–based computerized adaptive testing has been proposed as a solution but has been limited in experimental and translational research due to its large sample requirements. We present a generalized latent variable model that, when combined with strong parametric assumptions based on mathematical cognitive models, permits the use of adaptive testing without large samples or the need to precalibrate item parameters. The approach is demonstrated using data from a common measure of working memory—the N-back task—collected across a diverse sample of participants. After evaluating dimensionality and model fit, we conducted a simulation study to compare adaptive versus nonadaptive testing. Computerized adaptive testing either made the task 36% more efficient or made score estimates 23% more precise, compared to nonadaptive testing. This proof-of-concept study demonstrates that latent variable modeling and adaptive testing can be used in experimental cognitive testing even with relatively small samples. Adaptive testing has the potential to improve the impact and replicability of findings from translational studies and clinical trials that use experimental cognitive tasks as outcome measures.
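The heart of any adaptive-testing simulation like the one above is item selection. A common rule, sketched here, is to administer the unanswered item with maximum Fisher information at the current ability estimate; the 2PL item bank below is hypothetical and unrelated to the authors' N-back calibration:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, item_bank, administered):
    """Pick the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *item_bank[i]))

# Hypothetical bank of (discrimination, difficulty) pairs.
bank = [(1.0, -1.5), (1.2, 0.0), (0.8, 1.0), (1.5, 0.2)]
first = select_next_item(0.0, bank, administered=set())
print(first)  # index of the most informative item at theta = 0
```

Because information scales with the square of discrimination, the highly discriminating item near the current ability estimate is chosen even though another item matches the estimate's difficulty exactly.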

12.
Cross‐level invariance in a multilevel item response model can be investigated by testing whether the within‐level item discriminations are equal to the between‐level item discriminations. Testing the cross‐level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model applications, the cross‐level invariance is assumed without testing of the cross‐level invariance assumption. In this study, the detection methods of differential item discrimination (DID) over levels and the consequences of ignoring DID are illustrated and discussed with the use of multilevel item response models. Simulation results showed that the likelihood ratio test (LRT) performed well in detecting global DID at the test level when some portion of the items exhibited DID. At the item level, the Akaike information criterion (AIC), the sample‐size adjusted Bayesian information criterion (saBIC), LRT, and Wald test showed a satisfactory rejection rate (>.8) when some portion of the items exhibited DID and the items had lower intraclass correlations (or higher DID magnitudes). When DID was ignored, the accuracy of the item discrimination estimates and standard errors was mainly problematic. Implications of the findings and limitations are discussed.
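Once the restricted model (equal within- and between-level discriminations) and the full model have both been fit with specialized multilevel IRT software, the LRT itself is simple arithmetic. A sketch with hypothetical log-likelihoods, testing DID for a single item (1 degree of freedom):

```python
import math

def lrt_pvalue_df1(loglik_restricted, loglik_full):
    """Likelihood-ratio test with 1 degree of freedom; here, equal vs. separate
    within- and between-level discriminations for a single item."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    # chi-square(1) survival function via the complementary error function
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods from fitting both multilevel IRT models.
stat, p = lrt_pvalue_df1(loglik_restricted=-1042.7, loglik_full=-1039.4)
print(round(stat, 2), round(p, 4))
```

The identity used in the comment follows from a chi-square variate with 1 df being a squared standard normal, so no external statistics library is needed for this one case.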

13.
14.
Students’ performance in assessments is commonly attributed to more or less effective teaching. This implies that students’ responses are significantly affected by instruction. However, the assumption that outcome measures indeed are instructionally sensitive is rarely investigated empirically. In the present study, we propose a longitudinal multilevel‐differential item functioning (DIF) model to combine two existing yet independent approaches to evaluate items’ instructional sensitivity. The model permits a more informative judgment of instructional sensitivity, allowing the distinction of global and differential sensitivity. As an illustration, the model is applied to two empirical data sets, with classical indices (Pretest–Posttest Difference Index and posttest multilevel‐DIF) computed for comparison. Results suggest that the approach works well in the application to empirical data, and may provide important information to test developers.

15.
Applied Measurement in Education, 2013, 26(3): 167-180
In the figural response item format, proficiency is expressed by manipulating elements of a picture or diagram. Figural response items in architecture were contrasted with multiple-choice counterparts in their ability to predict architectural problem-solving proficiency. Problem-solving proficiency was measured by performance on two architecture design problems, one of which involved a drawing component, whereas the other required only a written verbal response. Both figural response and multiple-choice scores predicted verbal design problem solving, but only the figural response scores predicted graphical problem solving. The presumed mechanism for this finding is that figural response items more closely resemble actual architectural tasks than do multiple-choice items. Some evidence for this explanation is furnished by architects' self-reports, in which architects rated figural response items as "more like what an architect does" than multiple-choice items.

16.
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.

17.
In psychometric studies, disagreement among experts is frequently treated as error void of informational value. The present study deviates from this approach and utilizes disagreement among experts in item classification as a source for examining the structural characteristics of the classification scheme. Guttman's smallest space analysis applied to an agreement-disagreement matrix suggests that the item-classification scheme reflects two different approaches to testing outcomes of studying history: that of cognitive psychology and that of the philosophy of history.

18.
In the past decade, there has been interest in the assessment of cognitive and affective processes and products for the purposes of meaningful learning. Meaningful measurement (MM), which accords with a humanistic constructivist information‐processing perspective, has been proposed. Students’ responses to the assessment tasks are now evaluated according to an item response measurement model, together with a hypothesized model detailing the progressive forms of knowing/competence under examination. There is a possibility of incorporating student errors and alternative frameworks into these evaluation procedures. Meaningful measurement leads us to examine the composite concepts of “ability” and “difficulty”. Under the rubric of meaningful measurement, validity assessment (i.e. internal and external components of construct validity) is essentially the same as an inquiry into the meanings afforded by the measurements. Concepts of reliability expressed as a group statistic applied in the same way to all examinees in the sample are obviated when the precision of the trait estimates stemming from item response measurement models can be determined at each trait level. Reliability, measured in terms of standard errors of the estimates, needs to be within acceptable limits if internal validity is to be secured. Further evidence of validity may be provided by in‐depth analyses of how “epistemic subjects” of different levels of competence and proficiency engage in different types of assessment tasks, where affective and metacognitive behaviours may be examined as well. These ways of undertaking MM can be codified by proposing a three‐level conceptualization of MM. It is within the rubric of this conceptualization and the MM enquiry paradigm that validity and reliability of test measures are discussed in this paper.

19.
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these frameworks. They review the different stages of test development and associated item analyses to identify poorly performing items and effective item selection. Moreover, they walk through the computational and interpretational steps for CTT‐ and IRT‐based evaluation statistics using simulated data examples and review various graphical displays such as distractor response curves, item characteristic curves, and item information curves. The digital module contains sample data, Excel sheets with various templates and examples, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.

20.
Sixty deaf and hearing students were asked to search for goods in a Hypertext Supermarket with either graphical or textual links of high typicality, frequency, and familiarity. Additionally, they performed a picture and word categorization task and two working memory span tasks (spatial and verbal). Results showed that deaf students were faster in graphical than in verbal hypertext when the number of visited pages per search trial was blocked. Regardless of stimuli format, accuracy differences between groups did not appear, although deaf students were slower than hearing students in both Web search and categorization tasks (graphical or verbal). No relation between the two tasks was found. Correlation analyses showed that deaf students with higher spatial span were faster in graphical Web search, but no correlations emerged between verbal span and verbal Web search. A hypothesis of different strategies used by the two groups for searching information in hypertext is formulated. It is suggested that deaf users use a visual-matching strategy more than a semantic approach to make navigation decisions.
