Similar Literature
2.
Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling is typically conducted to help users interpret test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of information into a score scale, and introduces vertical scaling, along with its associated designs and methodologies, as a special type of scaling. After completing this module, the reader should be able to understand the relationships among various types of raw scores, understand the relationship between raw scores and scale scores, construct a scale with desired properties, evaluate an existing score scale, understand how content and standards information are built into a scale, and understand how vertical scales are developed and used in practice.
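A common starting point for building a score scale is a linear transformation of raw scores. The sketch below assumes number-correct raw scores from a single reference group and a hypothetical reporting scale with a chosen mean, standard deviation, and score range; the function name `linear_scale` and all numbers are illustrative, not taken from the module. It fits s = A*x + B so that the reference group's scale scores have the target mean and SD, then rounds and truncates to the reporting range.

```python
from statistics import mean, pstdev

def linear_scale(raw_scores, target_mean, target_sd, lo, hi):
    """Hypothetical helper: map raw scores to integer scale scores.

    Chooses slope A and intercept B so that A*x + B has the target mean and
    (population) SD in this reference group, then rounds to the nearest
    integer and truncates to the reporting range [lo, hi].
    """
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    a = target_sd / sigma        # slope A
    b = target_mean - a * mu     # intercept B
    return [min(hi, max(lo, round(a * x + b))) for x in raw_scores]

# Hypothetical raw scores mapped to a scale with mean 100 and SD 15
raw = [10, 20, 30, 40, 50]
print(linear_scale(raw, 100, 15, 50, 150))
print(linear_scale(raw, 100, 15, 80, 120))  # narrower range truncates the ends
```

Rounding and truncation mean the reported scores no longer have exactly the target mean and SD; nonlinear transformations are an alternative this sketch does not cover.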

5.
Two problems in test development relate to the use of illustrations: (1) Do illustrated items perform better than written items? (2) Does item performance vary as a function of the type and size of the illustration? A sample of 63 tests was drawn from all the Air Force Specialty Knowledge Tests containing illustrations. These 63 tests had been administered to approximately 28,261 airmen under operational conditions. Item statistics for illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated that (1) illustrated items in general performed slightly better than matched written items and (2) the best-performing category of illustrated items was tables.
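The abstract does not say which item statistics or which ANOVA design were used. As a minimal sketch, assuming a simple one-way comparison of a single item statistic (here, hypothetical proportion-correct values) between illustrated and written items, the F ratio is the between-group mean square divided by the within-group mean square. The function name `f_ratio` and the data are illustrative only.

```python
from statistics import mean

def f_ratio(*groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Spread of values around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical proportion-correct values for two sets of items
illustrated = [0.6, 0.7, 0.8]
written = [0.5, 0.6, 0.7]
print(f_ratio(illustrated, written))
```

With two groups, this F equals the square of the pooled-variance two-sample t statistic; judging significance still requires comparing it with an F distribution on (k - 1, n - k) degrees of freedom.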

6.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediate learners, we present a distribution‐centered perspective on the same content. We discuss the assumptions of various CTT models, ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionality to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.
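The module carries out these computations in an Excel workbook. As a rough Python sketch of two standard CTT indices (not a reproduction of that workbook), assuming a complete examinee-by-item score matrix: Cronbach's α from item and total-score variances, and an odd-even split-half correlation stepped up with the Spearman-Brown formula. The helper names and the response data are hypothetical.

```python
from statistics import mean, pvariance

def cronbach_alpha(scores):
    """Coefficient alpha; scores is a list of examinees, each a list of item scores."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r, factor=2.0):
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * r / (1 + (factor - 1) * r)

def split_half(scores):
    """Odd-even split-half correlation, stepped up to full test length."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd, even))

# Hypothetical 0/1 responses: 5 examinees x 4 items
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(cronbach_alpha(data), split_half(data))
```

The split-half value depends on how the items are divided; odd-even is only one of many possible splits, and Spearman-Brown assumes the halves are parallel.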

11.
In this ITEMS module, we frame the topic of scale reliability within a confirmatory factor analysis and structural equation modeling (SEM) context and address some of the limitations of Cronbach's α. This modeling approach has two major advantages: (1) it allows researchers to make explicit the relation between their items and the latent variables representing the constructs those items intend to measure, and (2) it facilitates a more principled and formal practice of scale reliability evaluation. Specifically, we begin the module by discussing key conceptual and statistical foundations of the classical test theory model and then framing it within an SEM context; we do so first with a single item and then extend this approach to a multi‐item scale. This sets the stage for presenting different measurement structures that might underlie a scale and, more importantly, for assessing and comparing those structures formally within the SEM context. We then make explicit the connection between measurement model parameters and different measures of reliability, emphasizing the challenges and benefits of key measures while ultimately endorsing the flexible McDonald's ω over Cronbach's α. We then demonstrate how to estimate key measures in both a commercial software program (Mplus) and three packages within an open‐source environment (R). In closing, we make recommendations for practitioners about best practices in reliability estimation based on the ideas presented in the module.
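Under a one-factor congeneric model with uncorrelated errors and the factor variance fixed at 1, McDonald's ω is (Σλ)² / ((Σλ)² + Σθ), where λ are the loadings and θ the unique variances. The sketch below assumes loadings and uniquenesses have already been estimated (for example in Mplus or R, which it does not call) and computes ω alongside the α implied by the same model-implied covariance matrix. The function name and loading values are hypothetical.

```python
def omega_and_alpha(loadings, uniquenesses):
    """Return (omega, alpha) implied by a one-factor model.

    Assumed model-implied covariance: Sigma = lambda lambda^T + diag(theta),
    with factor variance 1 and uncorrelated errors.
    """
    k = len(loadings)
    sum_l = sum(loadings)
    sum_theta = sum(uniquenesses)
    # Sum of all entries of Sigma = variance of the unit-weighted total score
    total_var = sum_l ** 2 + sum_theta
    # Trace of Sigma = sum of the item variances
    item_var_sum = sum(l ** 2 for l in loadings) + sum_theta
    omega = sum_l ** 2 / total_var
    alpha = k / (k - 1) * (1 - item_var_sum / total_var)
    return omega, alpha

# Hypothetical standardized loadings; uniquenesses = 1 - loading^2
tau_equivalent = [0.7, 0.7, 0.7, 0.7]
congeneric = [0.9, 0.7, 0.5, 0.3]
for lam in (tau_equivalent, congeneric):
    theta = [1 - l ** 2 for l in lam]
    print(omega_and_alpha(lam, theta))
```

With equal loadings the two coefficients coincide; with the unequal (congeneric) loadings, α comes out lower than ω, which is one reason the module favors ω.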

12.
Two matched forms of a 50-item multiple-choice grammar test were developed. Twenty items designed to be humorous were included in one form. Test forms were randomly assigned to 126 eighth graders, who received the test plus alternate forms of a questionnaire. Including humorous items affected neither grammar scores on matched humorous/nonhumorous items nor scores on common post-treatment items, nor did it affect the results of anxiety measures. Students favored the inclusion of humor on tests, judged its effects positively, and estimated humorous items to be easier. Humor did not lower performance and was actively sought by students. The potential for more valid and humane measurement is discussed.

14.
This module describes and extends X‐to‐Y regression measures that have been proposed for use in assessing X‐to‐Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to scaling and equating evaluations. The regression and scaling function measures are compared in terms of their uncertainty reductions, their error variances, and the contributions of true score and measurement error variances to the total error variances. The measures are also demonstrated in an assessment of scaling results for a math test and a reading test. The results of these analyses illustrate the similarity of the regression and scaling measures in scaling situations where the tests correlate at least .80, and also show the extent to which the measures can adequately summarize nonlinear regression and nonlinear scaling functions and heteroskedastic errors. After reading this module, readers will have a comprehensive understanding of the purposes, uses, and differences of regression and scaling functions.
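One way to see why regression and scaling measures agree when the correlation is high: with population moments, a least-squares linear regression of Y on X leaves error variance σ_Y²(1 − r²), whereas a linear scaling function that matches X's mean and SD to Y's leaves 2σ_Y²(1 − r). As proportional reductions in Y's variance these are r² and 2r − 1, which are .64 and .60 at r = .80 but .25 and 0 at r = .50. The sketch below computes both from paired scores; it uses simple linear functions only, is not necessarily how the module defines its measures, and the function name and data are hypothetical.

```python
from statistics import mean, pstdev

def compare_regression_and_scaling(x, y):
    """Proportional error-variance reductions: X-to-Y linear regression vs. linear scaling."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)
    # Regression: slope shrunk by r, minimizing squared prediction error
    reg = [my + r * sy / sx * (a - mx) for a in x]
    # Linear (mean-sigma) scaling: matches Y's mean and SD, no shrinkage
    lin = [my + sy / sx * (a - mx) for a in x]
    err_reg = mean([(b - p) ** 2 for b, p in zip(y, reg)])
    err_lin = mean([(b - p) ** 2 for b, p in zip(y, lin)])
    return {
        "r": r,
        "reduction_regression": 1 - err_reg / sy ** 2,  # equals r**2
        "reduction_scaling": 1 - err_lin / sy ** 2,     # equals 2*r - 1
    }

# Hypothetical paired scores on two tests
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(compare_regression_and_scaling(x, y))
```

The regression always has the smaller error variance, since it minimizes squared error by construction; the scaling function gives that up in exchange for symmetry and matching Y's score distribution moments.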

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号