期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Improving the Measurement of School Climate Using Item Response Theory

Sarah Lindstrom Johnson Ray E. Reichenberg Kathan Shukla Tracy E. Waasdorp Catherine P. Bradshaw 《Educational Measurement》2019,38(4):99-107

The U.S. government has become increasingly focused on school climate, as recently evidenced by its inclusion as an accountability indicator in the Every Student Succeeds Act. Yet, there remains considerable variability in both conceptualizing and measuring school climate. To better inform the research and practice related to school climate and its measurement, we leveraged item response theory (IRT), a commonly used psychometric approach for the design of achievement assessments, to create a parsimonious measure of school climate that operates across varying individual characteristics. Students (n = 69,513) in 111 secondary schools completed a school climate assessment focused on three domains of climate (i.e., safety, engagement, and environment), as defined by the U.S. Department of Education. Item and test characteristics were estimated using the mirt package in R using unidimensional IRT. Analyses revealed measurement difficulties that resulted in a greater ability to assess less favorable perspectives on school climate. Differential item functioning analyses indicated measurement differences based on student academic success. These findings support the development of a broad measure of school climate but also highlight the importance of work to ensure precision in measuring school climate, particularly when considering use as an accountability measure. 相似文献

2.

The Benefits of Fixed Item Parameter Calibration for Parameter Accuracy in Small Sample Situations in Large‐Scale Assessments

Christoph Knig Lale Khorramdel Kentaro Yamamoto Andreas Frey 《Educational Measurement》2021,40(1):17-27

Large‐scale assessments such as the Programme for International Student Assessment (PISA) have field trials where new survey features are tested for utility in the main survey. Because of resource constraints, there is a trade‐off between how much of the sample can be used to test new survey features and how much can be used for the initial item response theory (IRT) scaling. Utilizing real assessment data of the PISA 2015 Science assessment, this article demonstrates that using fixed item parameter calibration (FIPC) in the field trial yields stable item parameter estimates in the initial IRT scaling for samples as small as n = 250 per country. Moreover, the results indicate that for the recovery of the county‐specific latent trait distributions, the estimates of the trend items (i.e., the information introduced into the calibration) are crucial. Thus, concerning the country‐level sample size of n = 1,950 currently used in the PISA field trial, FIPC is useful for increasing the number of survey features that can be examined during the field trial without the need to increase the total sample size. This enables international large‐scale assessments such as PISA to keep up with state‐of‐the‐art developments regarding assessment frameworks, psychometric models, and delivery platform capabilities. 相似文献

3.

Measuring Student Involvement: A Comparison of Classical Test Theory and Item Response Theory in the Construction of Scales from Student Surveys

Jessica Sharkness Linda DeAngelo 《Research in higher education》2011,52(5):480-507

This study compares the psychometric utility of Classical Test Theory (CTT) and Item Response Theory (IRT) for scale construction with data from higher education student surveys. Using 2008 Your First College Year (YFCY) survey data from the Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, two scales are built and tested—one measuring social involvement and one measuring academic involvement. Findings indicate that although both CTT and IRT can be used to obtain the same information about the extent to which scale items tap into the latent trait being measured, the two measurement theories provide very different pictures of scale precision. On the whole, IRT provides much richer information about measurement precision as well as a clearer roadmap for scale improvement. The findings support the use of IRT for scale construction and survey development in higher education. 相似文献

4.

A comprehensive needs assessment to facilitate prevention of school drop out and violence

Mary Helen Hunt Joel Meyers Gwen Davies Barbara Meyers Kathryn Rogers Grogg John Neel 《Psychology in the schools》2002,39(4):399-416

The present study addresses school violence and school drop out and proposes that the underlying factor of school connectedness/school climate should guide preventive and intervention efforts. Data were gathered from five schools in a small city school district in north Georgia. Group and individual interviews served as the basis for constructing a 78‐item district‐wide survey administered to 304 school employees. Data are presented on individual items from the survey. Principal components analysis revealed five distinct factors: school connectedness/positive school climate, causes of violence, causes of school drop out, interventions for drop out, and interventions for violence. The principal components analysis was the basis for construction of a revised scale. Differences between revised scale scores were noted as a function of whether respondents were from central office, elementary or secondary schools. The five revised scales had correlation ranging from .31 to .59. Implications for research and practice are discussed. © 2002 Wiley Periodicals, Inc. 相似文献

5.

Examining students’ affective commitment toward country: a case study of a Singapore primary school

Khe Foon Hew Wing Sum Cheung 《Asia Pacific Journal of Education》2011,31(1):19-31

The purpose of this study was to examine students’ affective commitment toward Singapore. Affective commitment refers to the sense of attachment to the nation state. The sample was taken from 286 students in a primary school. In the first section of the paper, we described the design of a Likert-type Affective Commitment to Country questionnaire. Factor analyses (principal component analysis and confirmatory factor analysis) showed evidence of construct validity for the 10-item scale, and an overall Cronbach alpha reliability coefficient of 0.91. In the second section, we reported the statistics related to the students’ affective commitment scores. Overall, a positive affective commitment toward the country was found. Results of our t-test analyses revealed that no statistically significant difference was found between boys and girls for each of the questionnaire items. However, students who had higher academic achievement reported significantly higher scores than their lower ability counterparts with regard to six items of the questionnaire. Suggestions for future research are discussed. 相似文献

6.

Factors Which Influence Precision of School-Level IRT Ability Estimates 总被引：1，自引：0，他引：1

Richard L. Tate FJ King 《Journal of Educational Measurement》1994,31(1):1-15

The precision of the group-level IRT model applied to school ability estimation is described, assuming use of Bayesian estimation with precision represented by the standard deviation of the posterior distribution. Similarities and differences between the school-level model and the familiar individual-level IRT model are considered. School size and between-school variability, two factors not relevant at the student level, are dominant determinants of school-level precision. Under the multiple-matrix sampling design required for the school-level IRT, the number of items associated with a scale does not influence the precision at the school level. Also, the effects of school ability and item quality on school-level precision are often relatively weak. It was found that the use of Bayesian estimation could result in a systematic distortion of the true ranking of schools based on ability because of an estimation bias which is a function of school size. 相似文献

7.

测验等值是开发中考评价功能之必需

杨悦《教育科学》2010,26(1)

中考是各地区规模较大和有影响力的高利害性考试,只有建立科学完善的考试评价系统才能充分发挥中考对地区初中教学多方面的服务作用,而建立完善考试评价系统的必备程序是等值。IRT等值的步骤包括估计项目参数、进行IRT量表转换以及制作分数转换表。相似文献

8.

Detecting Local Item Dependence in Polytomous Adaptive Data

Jessica L. Mislevy André A. Rupp Jeffrey R. Harring 《Journal of Educational Measurement》2012,49(2):127-147

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health‐outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of local item dependence have been studied both for polytomous items in fixed‐form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen's Q₃ Statistic and Pearson's Statistic X², in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient‐Reported Outcomes Measurement Information System (PROMIS). 相似文献

9.

Relating Unidimensional IRT Parameters to a Multidimensional Response Space: A Review of Two Alternative Projection IRT Models for Scoring Subscales

Nilufer Kahraman Tony Thompson 《Journal of Educational Measurement》2011,48(2):146-164

A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article is to compare two different formulations of an alternative Item Response Theory (IRT) model developed to parameterize unidimensional projections of multidimensional test items: Analytical and Empirical formulations. Two real data applications are provided to illustrate how the projection IRT model can be used in practice, as well as to further examine how ability estimates from the projection IRT model compare to external examinee measures. The results suggest that collateral information extracted by a projection IRT model can be used to improve reliability and validity of subscale scores, which in turn can be used to provide diagnostic information about strength and weaknesses of examinees helping stakeholders to link instruction or curriculum to assessment results. 相似文献

10.

Minimizing the Influence of Item Parameter Estimation Errors in Test Development: A Comparison of Three Selection Procedures

Mark J. Gierl Dianne Henderson Michael Jodoin Don Klinger 《Journal of Experimental Education》2013,81(3):261-279

In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed. 相似文献

11.

Integrating Cognitive and Psychometric Models to Measure Document Literacy

Kathleen Sheehan Robert J. Mislevy 《Journal of Educational Measurement》1990,27(3):255-272

The Survey of Young Adult Literacy conducted in 1985 by the National Assessment of Educational Progress included 63 items that elicited skills in acquiring and using information from written documents. These items were analyzed using two different models: (1) a qualitative cognitive model, which characterized items in terms of the processing tasks they required, and (2) an item response theory (IRT) model, which characterized items difficulties and respondents' proficiencies simply by tendencies toward correct response. This paper demonstrates how a generalization of Fischer and Seheibleehner's Linear Logistic Test Model can be used to integrate information from the cognitive analysis into the IRT analysis, providing a foundation for subsequent item construction, test development, and diagnosis of individuals skill deficiencies. 相似文献

12.

Monitoring Items in Real Time to Enhance CAT Security

Jinming Zhang Jie Li 《Journal of Educational Measurement》2016,53(2):131-151

An IRT‐based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed CTT‐based procedure through simulation studies. The results show that when the total number of examinees is fixed both procedures can control the rate of type I errors at any reasonable significance level by choosing an appropriate cutoff point and meanwhile maintain a low rate of type II errors. Further, the IRT‐based method has a much lower type II error rate or more power than the CTT‐based method when the number of compromised items is small (e.g., 5), which can be achieved if the IRT‐based procedure can be applied in an active mode in the sense that flagged items can be replaced with new items. 相似文献

13.

The Effects of Dimensionality on Equating the Law School Admission Test

Gregory Camilli Ming-mei Wang Jacqueline Fesq 《Journal of Educational Measurement》1995,32(1):79-96

Using factor analysis, we conducted an assessment of multidimensionality for 6 forms of the Law School Admission Test (LSAT) and found 2 subgroups of items or factors for each of the 6 forms. The main conclusion of the factor analysis component of this study was that the LSAT appears to measure 2 different reasoning abilities: inductive and deductive. The technique of N. J. Dorans & N. M. Kingston (1985) was used to examine the effect of dimensionality on equating. We began by calibrating (with item response theory [IRT] methods) all items on a form to obtain Set I of estimated IRT item parameters. Next, the test was divided into 2 homogeneous subgroups of items, each having been determined to represent a different ability (i.e., inductive or deductive reasoning). The items within these subgroups were then recalibrated separately to obtain item parameter estimates, and then combined into Set II. The estimated item parameters and true-score equating tables for Sets I and II corresponded closely. 相似文献

14.

Equating Minimum-Competency Tests: Comparisons of Methods

John R. Hills Raja G. Subhiyah Thomas M. Hirsch 《Journal of Educational Measurement》1988,25(3):221-231

The 1986 scores from Florida's Statewide Student Assessment Test, Part II (SSAT-II), a minimum-competency test required for high school graduation in Florida, were placed on the scale of the 1984 scores from that test using five different equating procedures. For the highest scoring 84 % of the students, four of the five methods yielded results within 1.5 raw-score points of each other. They would be essentially equally satisfactory in this situation, in which the tests were made parallel item by item in difficulty and content and the groups of examinees were population cohorts separated by only 2 years. Also, the results from six different lengths of anchor items were compared. Anchors of 25, 20, 15, or 10 randomly selected items provided equatings as effective as 30 items using the concurrent IRT equating method, but an anchor of 5 randomly selected items did not 相似文献

15.

用项目反应理论修订教学效能感量表

杨建原臧运洪赵守盈《教育科学》2012,28(2):46-51

采用随机整群抽样抽取505名中小学教师作为被试,其中,男教师189名,女教师271名,年龄均在25至55岁之间。采用教学效能感问卷进行施测,基于项目反应理论,对测试结果进行分析,得出所有项目的区分度、难度和项目信息峰值,参考项目区分度、难度及项目信息函数峰值对教学效能感量表做了修订,再运用结构方程模型、层面理论技术和最小空间分析对修订后的量表进行质量检验,结果表明修订后的量表测量拥有更为清晰的结构效度和更高的信度,测量更为精确。运用SPSS15.0管理数据,运用Hudap6.0和MULTILOG 7.03分析数据,研究得出如下五个结论:1)教学效能感量表为单一维度,可以使用项目反应理论进行分析;2)修订后的量表项目的区分度、难度更为合理;3)修订后的量表的测验信息峰值较原量表稍低;4)修订前后量表对应层面元素之间存在高相关;5)量表的三个方面内容结构得以证实,即学生品德行为教育、课堂组织管理和知识传授。相似文献

16.

Development and Reliability of the Comprehensive Crisis Plan Checklist, 2nd Edition

Daniel F. McCleary Kathleen B. Aspiranti 《Psychology in the schools》2020,57(7):1155-1170

As the research base for school crisis intervention and prevention expands, the need for well-developed tools to assess school readiness in the event of a crisis increases. This paper describes how the Comprehensive Crisis Plan Checklist (CCPC) was updated to reflect advances in crisis management and crisis planning. An extensive literature search and pilot study were used to refine existing items and create new items for the checklist. The Comprehensive Crisis Plan Checklist, 2^nd Edition (CCPC-2) has 102 items separated into three sections: prevention, intervention, and postvention. The CCPC-2 can be used by crisis teams to create new crisis plans or evaluate existing ones. Users are encouraged to carefully consider the inclusion of all items and articulate why individual items are not necessarily based on their specific needs. The CCPC-2 was given to 10 pairs of raters to evaluate school-based crisis plans; average interrater reliability was 89.04%. Discussion focuses on item analysis and how to use the checklist within a school setting. 相似文献

17.

An NCME Instructional Module on Polytomous Item Response Theory Models

Randall David Penfield 《Educational Measurement》2014,33(1):36-48

A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understating of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models. 相似文献

18.

Stability of Rasch Scales Over Time

Catherine S. Taylor Yoonsun Lee 《教育实用测度》2013,26(1):87-113

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items. When tests are equated across forms, researchers check for the stability of common items before including them in equating procedures. Stability is usually examined in relation to polytomous items' central “location” on the scale without taking into account the stability of the different item scores (step difficulties). We examined the stability of score scales over a 3–5-year period, considering both stability of location values and stability of step difficulties for common item equating. We also investigated possible changes in the scale measured by the tests and systematic scale drift that might not be evident in year-to-year equating. Results across grades and content areas suggest that equating results are comparable whether or not the stability of step difficulties is taken into account. Results also suggest that there may be systematic scale drift that is not visible using year-to-year common item equating. 相似文献

19.

从大学新生职业生涯规划现状调查看中学教育的缺陷——以丽水学院2009级新生为例 总被引：1，自引：0，他引：1

陈轶张缨《丽水学院学报》2010,32(4):120-122

通过对丽水学院2009级新生的问卷调查,发现大部分学生在中学阶段基本未进行过职业生涯规划。通过分项解读调查结果,探析中学教育存在的缺陷,以求中学教育更加重视学生的职业生涯规划教育。相似文献

20.

Routing Strategies and Optimizing Design for Multistage Testing in International Large‐Scale Assessments

Dubravka Svetina Yuan‐Ling Liaw Leslie Rutkowski David Rutkowski 《Journal of Educational Measurement》2019,56(1):192-213

This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number‐correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s) and whether routing choices (optimal versus suboptimal routing) have an impact on achievement precision. Additionally, we examine the impact of testlet length on both person and item recovery. Overall, our results suggest that no single approach works best across the studied conditions. With respect to the mean person parameter recovery, IRT scoring (via either Fisher information or preliminary EAP estimates) outperformed classical NC methods, although differences in bias and root mean squared error were generally small. Item exposure rates were found to be more evenly distributed when suboptimal routing methods were used, and item recovery (both difficulty and discrimination) was most precisely observed for items with moderate difficulties. Based on the results of the simulation study, we draw conclusions and discuss implications for practice in the context of international large‐scale assessments that recently introduced adaptive assessment in the form of MST. Future research directions are also discussed. 相似文献