Similar Articles (20 results)
1.
2.
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows scores to be returned faster at lower cost. In the module, they discuss automated scoring from a number of perspectives. First, they discuss benefits and weaknesses of automated scoring, and what psychometricians should know about automated scoring. Next, they describe the overall process of automated scoring, moving from data collection to engine training to operational scoring. Then, they describe how automated scoring systems work, including the basic functions around score prediction as well as other flagging methods. Finally, they conclude with a discussion of the specific validity demands around automated scoring and how they align with the larger validity demands around test scores. Two data activities are provided. The first is an interactive activity that allows the user to train and evaluate a simple automated scoring engine. The second is a worked example that examines the impact of rater error on test scores. The digital module contains a link to an interactive web application as well as its R-Shiny code, diagnostic quiz questions, activities, curated resources, and a glossary.
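The engine-training idea described above can be sketched in a few lines. This is a hypothetical toy, not the module's R-Shiny engine: it "mimics human scoring" by giving a new response the human score of its most similar training response (cosine similarity over bag-of-words counts), and the training responses and scores are invented.

```python
# Toy automated scoring "engine" (illustrative only): nearest-neighbor
# scoring over bag-of-words vectors of human-scored training responses.
from collections import Counter
import math

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(responses, scores):
    # "Engine training" here is just storing vectorized, human-scored responses.
    return [(vectorize(r), s) for r, s in zip(responses, scores)]

def predict(engine, response):
    # Assign the human score of the most similar training response.
    v = vectorize(response)
    best = max(engine, key=lambda pair: cosine(v, pair[0]))
    return best[1]

# Tiny human-scored training set (invented data, not from the module).
engine = train(
    ["plants use sunlight to make food",
     "photosynthesis converts light energy into sugar",
     "plants eat dirt"],
    [2, 2, 0],
)
print(predict(engine, "light energy is turned into sugar by photosynthesis"))  # 2
```

A production engine would add the flagging methods the module describes (e.g., detecting off-topic responses) on top of score prediction.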

3.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediate learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.
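One of the key reliability indices walked through above, coefficient alpha, can be computed directly from raw scores. A minimal sketch, assuming item scores are arranged as one list per item over the same persons (the scores are invented, not the workbook's sample data):

```python
def cronbach_alpha(items):
    """Coefficient alpha from raw scores; items is a list of lists,
    one inner list of person scores per item."""
    k = len(items)
    n_persons = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[p] for item in items) for p in range(n_persons)]
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Two perfectly parallel items yield alpha = 1.0 (invented scores).
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```

Alpha equals the true reliability only under the tau-equivalence assumptions the module's second part discusses; under a congeneric model it is a lower bound.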

4.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model‐data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding Bayesian approaches for evaluating model‐data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model‐data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real‐life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data‐based activities, curated resources, and a glossary.
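The computational steps of PPMC for the simple linear regression case can be sketched as follows. This is a hypothetical Python illustration rather than the module's SAS code: the "posterior draws" are invented values standing in for the output of an actual Bayesian sampler, the data are made up, and the discrepancy statistic is the maximum absolute residual.

```python
# PPMC sketch: for each posterior draw, simulate a replicated data set and
# compare a discrepancy statistic on observed vs. replicated data.
import random

random.seed(0)

# Observed data (invented) for the model y = a + b*x + error.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

# Stand-in posterior draws of (intercept a, slope b, error SD s); a real
# analysis would take these from an MCMC sampler.
draws = [(0.1, 1.95, 0.3), (0.0, 2.0, 0.25), (-0.1, 2.05, 0.35)] * 100

def discrepancy(ys, a, b):
    return max(abs(yi - (a + b * xi)) for xi, yi in zip(x, ys))

exceed = 0
for a, b, s in draws:
    y_rep = [a + b * xi + random.gauss(0, s) for xi in x]
    if discrepancy(y_rep, a, b) >= discrepancy(y, a, b):
        exceed += 1

ppp = exceed / len(draws)  # posterior predictive p-value
print(ppp)
```

A posterior predictive p-value near .5 indicates the model reproduces the chosen feature of the data well; values near 0 or 1 flag misfit with respect to that discrepancy.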

5.
In this ITEMS module, we frame the topic of scale reliability within a confirmatory factor analysis and structural equation modeling (SEM) context and address some of the limitations of Cronbach's α. This modeling approach has two major advantages: (1) it allows researchers to make explicit the relation between their items and the latent variables representing the constructs those items intend to measure, and (2) it facilitates a more principled and formal practice of scale reliability evaluation. Specifically, we begin the module by discussing key conceptual and statistical foundations of the classical test theory model and then framing it within an SEM context; we do so first with a single item and then expand this approach to a multi‐item scale. This allows us to set the stage for presenting different measurement structures that might underlie a scale and, more importantly, for assessing and comparing those structures formally within the SEM context. We then make explicit the connection between measurement model parameters and different measures of reliability, emphasizing the challenges and benefits of key measures while ultimately endorsing the flexible McDonald's ω over Cronbach's α. We then demonstrate how to estimate key measures in both a commercial software program (Mplus) and three packages within an open‐source environment (R). In closing, we make recommendations for practitioners about best practices in reliability estimation based on the ideas presented in the module.
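The connection between measurement model parameters and reliability can be made concrete for McDonald's ω. A minimal sketch, assuming a congeneric one-factor model with standardized factor loadings and error variances already estimated elsewhere (the values below are invented, not output from the module's Mplus or R examples):

```python
def mcdonald_omega(loadings, error_vars):
    """McDonald's omega for a one-factor model:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    return s * s / (s * s + sum(error_vars))

# Three items, each with standardized loading .7 and error variance 1 - .7**2 = .51.
print(round(mcdonald_omega([0.7, 0.7, 0.7], [0.51, 0.51, 0.51]), 3))  # 0.742
```

Unlike α, ω does not assume equal loadings; with unequal loadings the two diverge, which is the module's reason for endorsing ω.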

6.
In this digital ITEMS module, Nikole Gregg and Dr. Brian Leventhal discuss strategies to ensure data visualizations achieve graphical excellence. Data visualizations are commonly used by measurement professionals to communicate results to examinees, the public, educators, and other stakeholders. These visualizations achieve graphical excellence when they simultaneously display data effectively, efficiently, and accurately. Unfortunately, the default graphics produced by measurement and statistical software typically fail to meet these standards and are therefore not suitable for publication or presentation to the public. To illustrate best practices, the instructors provide an introduction to the Graph Template Language in SAS and show how elementary components can be used to make efficient, effective, and accurate graphics for a variety of audiences. The module contains audio-narrated slides, embedded illustrative videos, quiz questions with diagnostic feedback, a glossary, sample SAS code, and other learning resources.

7.
Thailand passed the National Education Act (1999), which introduced the largest educational change there in over 50 years. This study investigated lecturer receptivity to that change at four Rajabhat Universities in the second year of the implementation stage, during 2002. Receptivity was conceptualized as relating to eight aspects of the change. Data were collected by questionnaire (N = 659) with 50 stem-items answered from three perspectives: (1) how I expect the change to be planned, (2) how I think the change was really implemented, and (3) what my actual behavior was. Data were analyzed with a Rasch measurement model, and 18 of the 50 stem-items fitted the model. A linear scale of receptivity was created in which the proportion of observed variance considered true was 95%, and the data were considered valid and reliable. The easiest aspect was comparison with the previous system and the hardest was participation in decision-making. For most items, the perspectives were found to be ordered from easy (perspective 1) to hard (perspective 3), as conceptualized.

8.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance for how to implement MCSS using other item response models and best practices for efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
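The item-response simulation step for the two-parameter logistic model can be sketched as follows. This is a hypothetical Python analogue of the module's SAS code; the item parameters and sample size are invented:

```python
# Simulate dichotomous item responses under the 2PL model:
# P(correct) = 1 / (1 + exp(-a * (theta - b))).
import math
import random

random.seed(1)

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def simulate_responses(n_persons, items):
    """items: list of (discrimination a, difficulty b) pairs."""
    thetas = [random.gauss(0, 1) for _ in range(n_persons)]
    data = [[int(random.random() < p_2pl(t, a, b)) for a, b in items]
            for t in thetas]
    return thetas, data

# An easy item (b = -2) and a harder, more discriminating item (a = 1.5, b = 0).
thetas, data = simulate_responses(2000, [(1.0, -2.0), (1.5, 0.0)])
print(sum(row[0] for row in data) / 2000)  # proportion correct on the easy item
```

A full MCSS would wrap this in a loop over replications and conditions (sample size, test length) and summarize estimation accuracy across replications.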

9.
In this ITEMS module, we introduce the generalized deterministic inputs, noisy “and” gate (G‐DINA) model, which is a general framework for specifying, estimating, and evaluating a wide variety of cognitive diagnosis models. The module contains a nontechnical introduction to diagnostic measurement, an introductory overview of the G‐DINA model, as well as common special cases, and a review of model‐data fit evaluation practices within this framework. We use the flexible GDINA R package, which is available for free within the R environment and provides a user‐friendly graphical interface in addition to the code‐driven layer. The digital module also contains videos of worked examples, solutions to data activity questions, curated resources, a glossary, and quizzes with diagnostic feedback.
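The best-known special case within this framework, the DINA model, can be sketched in a few lines. This is a hypothetical illustration, not code from the GDINA package; the attribute patterns, Q-matrix row, and guessing/slipping values are invented:

```python
def dina_p(alpha, q_row, guess, slip):
    """P(correct) under the DINA model: an examinee who has mastered every
    attribute the item requires answers correctly unless they slip (prob. slip);
    everyone else can only guess (prob. guess)."""
    mastered_all = all(a >= r for a, r in zip(alpha, q_row))
    return 1 - slip if mastered_all else guess

# Item requiring attributes 1 and 2 (Q-matrix row [1, 1, 0]), g = .2, s = .1.
print(dina_p([1, 1, 0], [1, 1, 0], 0.2, 0.1))  # 0.9: mastered both, may slip
print(dina_p([1, 0, 1], [1, 1, 0], 0.2, 0.1))  # 0.2: missing attribute 2, can only guess
```

G-DINA generalizes this by letting each combination of required attributes carry its own success probability instead of collapsing them into the two values above.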

10.
Views on testing—its purpose and uses and how its data are analyzed—are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as contestants in some detail. Test takers who are contestants in high‐stakes settings want reliable outcomes obtained via acceptable scoring of tests administered under clear rules. In addition, it is essential to empirically verify interpretations attached to scores. At the very least, item and test scores should exhibit certain invariance properties. I note that the “do no harm” dictum borrowed from the field of medicine is particularly relevant to the perspective of test takers as contestants.

11.
12.
Mixture Rasch models have been used to study a number of psychometric issues such as goodness of fit, response strategy differences, strategy shifts, and multidimensionality. Although these models offer the potential for improving understanding of the latent variables being measured, under some conditions overextraction of latent classes may occur, potentially leading to misinterpretation of results. In this study, a mixture Rasch model was applied to data from a statewide test that was initially calibrated to conform to a 3‐parameter logistic (3PL) model. Results suggested how latent classes could be explained and also suggested that these latent classes might be due to applying a mixture Rasch model to 3PL data. To support this latter conjecture, a simulation study was presented to demonstrate how data generated to fit a one‐class 2‐parameter logistic (2PL) model required more than one class when fit with a mixture Rasch model.

13.
The purpose of this study was to analyze and assess the Jordan National Test for Controlling the Quality of Science Instruction (NTCQSI) from the perspective provided by Rasch measurement. The test was administered to a stratified random sample of 41,556 tenth graders from all over Jordan, and the test results were saved in a data bank. A random sample of 150 participants' records was selected from this data bank. To address the purpose of this study, a series of analyses was conducted using the WINSTEPS and RUMM programs. The procedures used in this paper might be adopted by testing agencies worldwide to illustrate how Rasch measurement can be used to obtain evidence for the validity of inferences drawn from test data.
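The kind of fit evidence that programs such as WINSTEPS and RUMM report can be illustrated with the dichotomous Rasch model and a simple outfit statistic. A hypothetical sketch with invented ability and difficulty values, not the NTCQSI data:

```python
import math

def rasch_p(theta, b):
    """Probability of success under the dichotomous Rasch model."""
    return 1 / (1 + math.exp(-(theta - b)))

def outfit_ms(responses, thetas, b):
    """Outfit mean-square for one item: the average squared standardized
    residual. Values near 1 indicate good fit; values well above 1 flag misfit."""
    z2 = [(x - rasch_p(t, b)) ** 2 / (rasch_p(t, b) * (1 - rasch_p(t, b)))
          for x, t in zip(responses, thetas)]
    return sum(z2) / len(z2)

# A high-ability person answering an average-difficulty item correctly
# produces a tiny residual, so the item shows no misfit for that response.
print(outfit_ms([1], [3.0], 0.0) < 0.1)  # True
```

Operational programs also report infit (information-weighted) statistics and estimate θ and b from the data; both are omitted here for brevity.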

14.
The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Scores of 140 test takers drawn from the PTE Academic database were analyzed. The mean age of the participants was 26.45 years (SD = 5.82), with ages ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. No significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings support the stability of PTE Academic as a useful tool for assessing English language learners' academic English.
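The partial credit model used in the evaluation above can be sketched as follows. This is a hypothetical illustration with invented step difficulties, not the operational PTE Academic item parameters:

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities (scores 0..m) for one polytomous item under the
    partial credit model; deltas are the m step difficulties. The numerator for
    category k is exp(sum over the first k steps of (theta - delta_j))."""
    nums = [1.0]
    for d in deltas:
        nums.append(nums[-1] * math.exp(theta - d))
    total = sum(nums)
    return [v / total for v in nums]

# A four-category item (three steps) for a person of ability theta = 0.5.
probs = pcm_probs(0.5, [-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

The dichotomous Rasch model is the special case with a single step difficulty.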

15.
Principles of Rasch Measurement and an Empirical Study of Their Application in Evaluating Gaokao Item Development
Wang Lei. China Examinations, 2008, (1): 32-39.
Rasch measurement provides an objective, equal-interval scale in current educational and psychological measurement, overcoming classical test theory's dependence on the particular test instrument and the particular sample. By introducing the principles of Rasch measurement and their application to the analysis of examinee sample data in the evaluation of Gaokao (college entrance examination) item development, this paper offers education policymakers and item writers intuitive graphical representations of Rasch-based quantitative item evaluation. It is hoped that Rasch measurement can provide a new and valuable way of thinking about quantitative item evaluation in the analysis of Gaokao sample data, and that it will be recognized and used effectively by policymakers and item writers.

16.
This study examined the underlying structure of the Depression scale of the revised Minnesota Multiphasic Personality Inventory using the dichotomous Rasch model and factor analysis. Rasch methodology was used to identify and restructure the Depression scale, and factor analysis was used to confirm the structure established by the Rasch model. The item calibration and factor analysis were carried out on the full sample of 2,600 normative subjects. The results revealed that the Depression scale did not consist of one homogeneous set of items, even though the scale was developed to measure one dimension of depression. Rasch analysis, as well as factor analysis, recognized two distinct content‐homogeneous subscales, here labeled mental depression and physical depression. The Rasch methodology provided a basis for a better understanding of the underlying structure and furnished a useful solution to the scale refinement.

17.
A new entry in the testing lexicon is through‐course summative assessment, a system consisting of components administered periodically during the academic year. As defined in the Race to the Top program, these assessments are intended to yield a yearly summative score for accountability purposes. They must provide for both individual and group proficiency estimates and allow for the measurement of growth. They must accommodate students who vary in their patterns of curricular exposure. Because they are meant to provide actionable information to teachers they must be instructionally sensitive, so item‐operating characteristics can be expected to change relative to one another as a function of patterns of curricular exposure. This paper discusses methodology one can draw upon to tackle this ambitious collection of inferences. We consider a modeling framework that consists of an item response theory component and a population component, as in the National Assessment of Educational Progress, and show how performance and growth could be expressed in terms of expected performance on a market basket of tasks. We discuss conditions under which modeling simplifications might be possible and discuss studies that would be needed to fit models, estimate parameters, and evaluate data requirements.

18.
In this paper we discuss the background to this study in the development of the international MSc e‐Learning Multimedia and Consultancy. The aims of the study focus on the conditions for achieving communication, interaction and collaboration in open and flexible e‐learning environments. We present the theoretical framework, based on a socio‐constructivist perspective on learning, that has informed the design of the programme as a whole. Our research is placed within an action research framework, and we outline our position within the critical or emancipatory tradition as well as our standpoint on the use of ICT in education. We discuss the design of the programme and our pedagogical approach, and describe in detail the particular context for this study. We report on the students' experience of being learners on this module, their perceptions of what they have gained most from learning from and with each other, and their responses to the various ways in which ‘scaffolding’ has been designed and implemented by the tutors. Finally we offer some reflections on the conditions for achieving well‐orchestrated interdependence in open and flexible e‐learning environments.

19.
Understanding infectious diseases such as influenza is an important element of health literacy. We present a fully validated knowledge instrument called the Assessment of Knowledge of Influenza (AKI) and use it to evaluate knowledge of influenza, with a focus on misconceptions, in Midwestern United States high-school students. A two-phase validation process was used. In phase 1, an initial factor structure was calculated based on 205 students of grades 9–12 at a rural school. In phase 2, one- and two-dimensional factor structures were analyzed from the perspectives of classical test theory and the Rasch model using structural equation modeling and principal components analysis (PCA) on Rasch residuals, respectively. Rasch knowledge measures were calculated for 410 students from 6 school districts in the Midwest, and misconceptions were verified through the χ² test. Eight items measured knowledge of flu transmission, and seven measured knowledge of flu management. While alpha reliability measures for the subscales were acceptable, Rasch person reliability measures and PCA on residuals advocated for a single-factor scale. Four misconceptions were found, which have not been previously documented in high-school students. The AKI is the first validated influenza knowledge assessment, and can be used by schools and health agencies to provide a quantitative measure of the impact of interventions aimed at increasing understanding of influenza. This study also adds significantly to the literature on misconceptions about influenza in high-school students, a necessary step toward strategic development of educational interventions for these students.

20.

The cheap and powerful personal computer (PC) has become an important and efficient tool for supporting engineering education. In this paper a PC‐based training module, AIROBOT, is presented. The purpose of this module is to provide a platform for students to develop and experiment with artificial intelligence techniques.

The training module, AIROBOT, utilizes an electronic noughts and crosses game board which is interfaced to the Scorbot‐ER VII robot and a PC. The development and implementation of the module are discussed. Two techniques developed by the students are presented to illustrate the utilization of the module. The first technique involves searching a game tree data structure; the learning involves the on‐line generation of the game tree as the games are played, and an evaluation function is used to facilitate the search. The other technique is based on the artificial neural network approach using the backpropagation paradigm. The structure of the neural networks, their training, and their performance are presented. The PC‐based training module has the potential to enhance student understanding through the practical application of artificial intelligence. It is envisaged that similar modules can be easily integrated into most engineering undergraduate robotics courses.
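The first technique, searching the noughts-and-crosses game tree with an evaluation at the leaves, can be sketched with plain minimax. This is a hypothetical simplification: the paper's on-line tree generation and the Scorbot-ER VII interface are omitted, and the evaluation function is simply win/draw/loss at terminal positions.

```python
# Minimax search over a 3x3 noughts-and-crosses board; 'X' maximizes, 'O' minimizes.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for i, j, k in WIN_LINES:
        if board[i] and board[i] == board[j] == board[k]:
            return board[i]
    return None

def minimax(board, player):
    """Return (score, best_move): +1 if X can force a win, -1 if O can, 0 for a draw."""
    w = winner(board)
    if w == 'X':
        return 1, None
    if w == 'O':
        return -1, None
    moves = [i for i, c in enumerate(board) if c == '']
    if not moves:
        return 0, None  # draw: board full, no winner
    results = []
    for m in moves:
        board[m] = player
        score, _ = minimax(board, 'O' if player == 'X' else 'X')
        board[m] = ''  # undo the trial move
        results.append((score, m))
    return (max if player == 'X' else min)(results, key=lambda r: r[0])

# X has two in a row on the top line; search finds the winning square, index 2.
board = ['X', 'X', '', 'O', 'O', '', '', '', '']
print(minimax(board, 'X'))  # (1, 2)
```

A learning variant like the one the students built would grow the tree incrementally as games are played instead of searching it exhaustively each turn.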


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)