首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 23 毫秒
1.
2.
3.
4.
Scaling is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of information into a score scale, and introduces vertical scaling and its related designs and methodologies as a special type of scaling. After completion of this module, the reader should be able to understand the relationship between various types of raw scores, understand the relationship between raw scores and scale scores, construct a scale with desired properties, evaluate an existing score scale, understand how content and standards information are built into a scale, and understand how vertical scales are developed and used in practice.  相似文献   

5.
Two problems in test development relate to the use of illustrations: (1) Do illustrated items perform better than written items, and (2) Does item performance vary as a function of the type and size of the illustration? A sample of 63 tests was drawn from all the Air Force Specialty Knowledge Tests containing illustrations. These 63 tests had been administered to approximately 28,261 airmen under operational conditions. Item statistics between illustrated and written items drawn from the same content areas were compared using F ratios. The results indicated: (1) That illustrated items in general performed slightly better than matched written items; (2) That the best-performing category of illustrated items was tables.  相似文献   

6.
7.
8.
9.
Two matched forms of a 50 item multiple-choice grammar test were developed. Twenty items designed to be humorous were included in one form. Test forms were randomly assigned to 126 eighth graders who received the test plus alternate forms of a questionnaire. Inclusion of humorous items did not affect grammar scores on matched humorous/nonhumorous items nor on common post-treatment items, nor did inclusion affect results of anxiety measures. Students favored inclusion of humor on tests, judged effects of humor positively, and estimated humorous items to be easier. Humor did not lower performance but was sought by the students. Potential for more valid and humane measurement is discussed.  相似文献   

10.
11.
12.
13.
14.
15.
16.
17.
科学研究与试验发展(R&D)是科技活动的核心部分。做好R&D项目的申报立项工作在科研管理活动中具有重要的意义。笔者根据自身从事科研管理多年的经验,并广泛参阅国内已有的成果,提出了一个综合评价指标体系,并对有关指标作了说明和界定,最后建立了R&D项目申项立项的评价模型。以供立项时进行决策。  相似文献   

18.
In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediary learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook.  相似文献   

19.
20.
Investigated were the effects of several variables on the expression of unwarranted confidence in the accuracy of responses to objective test items. A final examination was administered to 72 subjects under confidence-weighting instructions with two levels of penalty for incorrect responses. A 2-way ANOVA revealed no significant main effects or interaction attributable to level of penalty or sex. Although increased penalty level had no effect on confidence-expression, the test's reliability decreased from .85 to .39, and the correlation between conventional and weighted scores dropped from .88 to .095.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号