1.

Robustness of the School-Level IRT Model (total citations: 1; self-citations: 0; citations by others: 1)

Richard L. Tate 《Journal of Educational Measurement》1995,32(2):145-162

The robustness of the school-level item response theoretic (IRT) model to violations of distributional assumptions was studied in a computer simulation. Estimated precision of "expected a posteriori" (EAP) estimates of the mean school ability from BILOG 3 was compared with actual precision, varying school size, intraclass correlation, school ability, number of forms comprising the test, and item parameters. Under conditions where the school-level precision might be acceptable for real school comparisons, the EAP estimates of school ability were robust over a wide range of violations and conditions, with the estimated precision being either consistent with the actual precision or somewhat conservative. Some lack of robustness was found, however, under conditions where the precision was inherently poor and the test would presumably not be used for serious school comparisons.
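For readers unfamiliar with EAP scoring, the sketch below shows what an EAP estimate and its posterior standard deviation (the model-based "estimated precision") are, computed by numerical quadrature. It is a minimal individual-level illustration under assumed conditions (a 2PL model, a standard normal prior, and a fixed grid on [-4, 4]); it is not BILOG 3's implementation and does not reproduce the group-level model studied here. The function name `eap_estimate` is hypothetical.

```python
import math


def eap_estimate(responses, a, b, n_points=61, lo=-4.0, hi=4.0):
    """Return (EAP ability estimate, posterior SD) for one response pattern.

    Assumes a 2PL model, a standard normal prior, and an evenly spaced
    quadrature grid on [lo, hi]. responses: 0/1 scores; a, b: item
    discriminations and difficulties (same length as responses).
    """
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    weights = []
    for theta in nodes:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior density
        for x, ai, bi in zip(responses, a, b):
            p = 1.0 / (1.0 + math.exp(-ai * (theta - bi)))  # 2PL probability of success
            weight *= p if x == 1 else 1.0 - p
        weights.append(weight)  # proportional to the posterior at this node
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, nodes)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, nodes)) / total
    return mean, math.sqrt(var)


# Example: three hypothetical items, two answered correctly.
theta_hat, posterior_sd = eap_estimate([1, 1, 0], [1.2, 0.8, 1.0], [-0.5, 0.0, 0.7])
print(f"EAP = {theta_hat:.3f}, posterior SD = {posterior_sd:.3f}")
```

Comparing the returned posterior SD with the empirical spread of EAP estimates across simulation replications is, roughly, the estimated-versus-actual precision comparison described above.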

2.

School-Level IRT Scaling of Writing Assessment Data

《Applied Measurement in Education》2013,26(4):371-383

School-level assessment of student writing ability using a group-level, polytomous item response theory (IRT) model was illustrated in this study. The study supported the viability of an IRT-based school assessment as an alternative to the conventional approach based on aggregation of individual scores. The precision provided by the assumed assessment design varied dramatically depending on school size and school average ability. For small schools and schools whose students had low average ability, differences in average school performance had to be quite large to be trustworthy. In contrast, the design provided greater precision in detecting differences for large schools and schools whose students had high average ability. An operational use of this design would require great care in the reporting of results to ensure that unreliable school comparisons are clearly identified.

3.

Test Reliability under Item Response Theory: A Research Review

Chen Shiqi, Dai Haiqi 《Examinations Research》2013,(6):65-72

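Test reliability under item response theory (IRT) evaluates the reliability and stability of latent trait estimates. Because it is a global, test-level index, IRT reliability cannot be replaced by the test information function and remains an important index for IRT-based tests. Drawing on the Chinese and international literature, this paper first presents scholars' views on the role of IRT reliability, then introduces and evaluates a range of methods for estimating IRT reliability, briefly discusses the factors that affect it, and finally identifies directions in which future research on IRT reliability could be pursued.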

4.

A Comparison of Estimation Techniques for IRT Models With Small Samples

Holmes Finch Brian F. French 《Applied Measurement in Education》2019,32(2):77-96

The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard three-parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors affect traditional marginal maximum likelihood (ML) estimation of IRT model parameters, including sample size: smaller samples are generally associated with lower parameter estimation accuracy and inflated standard errors. Because of this deleterious impact of small samples, estimation becomes difficult for low-incidence populations, where IRT models might prove particularly useful, especially when the models are more complex. Recently, a pairwise estimation method for Rasch model parameters has been suggested for use with missing data, and it may also hold promise for parameter estimation with small samples. This simulation study compared the item difficulty estimation accuracy of ML with that of the pairwise approach to ascertain the benefits of the latter method. The results support the use of the pairwise method with small samples, particularly for obtaining item location estimates.
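A minimal sketch of a Choppin-style pairwise estimator for Rasch item difficulties follows. For each item pair it counts how many persons answered one item correctly and the other incorrectly; under the Rasch model, the log of the ratio of these two counts estimates the difference in difficulty, because the person's ability cancels out. Difficulties are recovered here by averaging the log ratios (the least-squares solution when every pair is observed), with 0.5 added to every count as an assumed continuity correction. This is an illustrative variant, not necessarily the exact pairwise procedure evaluated in the study, and the function name is hypothetical.

```python
import math


def pairwise_rasch_difficulties(data, correction=0.5):
    """Estimate Rasch item difficulties (centered at 0) from pairwise counts.

    data: one list of item scores per person; 1 = correct, 0 = incorrect,
    None = missing. correction: value added to every count (an assumption)
    so that empty cells do not produce log(0).
    """
    n_items = len(data[0])
    # n[i][j]: persons with item i correct and item j incorrect.
    n = [[0] * n_items for _ in range(n_items)]
    for row in data:
        for i in range(n_items):
            for j in range(n_items):
                if i != j and row[i] == 1 and row[j] == 0:
                    n[i][j] += 1
    # Under the Rasch model, n[j][i] / n[i][j] estimates exp(b_i - b_j).
    b = []
    for i in range(n_items):
        log_ratios = [
            math.log((n[j][i] + correction) / (n[i][j] + correction))
            for j in range(n_items)
            if j != i
        ]
        b.append(sum(log_ratios) / n_items)  # row average, with d_ii = 0 included
    mean_b = sum(b) / n_items  # already ~0 by antisymmetry; guards against rounding
    return [bi - mean_b for bi in b]


responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, None, 0]]
print([round(x, 3) for x in pairwise_rasch_difficulties(responses)])
```

Because only persons who answered both items of a pair contribute to that pair's counts, missing responses are simply skipped, which is why the approach suits incomplete designs; persons with all-correct or all-incorrect patterns contribute nothing.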

5.

Subjective Priors for Item Response Models: Application of Elicitation by Design

Allison Ames Elizabeth Smith 《Journal of Educational Measurement》2018,55(3):373-402

Bayesian methods incorporate model parameter information prior to data collection. Eliciting information from content experts is an option, but has seen little implementation in Bayesian item response theory (IRT) modeling. This study aims to use ethical reasoning content experts to elicit prior information and incorporate this information into Markov Chain Monte Carlo (MCMC) estimation. A six‐step elicitation approach is followed, with relevant details at each stage for two IRT item parameters: difficulty and guessing. Results indicate that content expert–elicited priors are preferable to noninformative priors for both parameter types. For small samples, the noninformative prior produced dramatically different results from the content expert–elicited priors. The WAMBS (When to worry and how to Avoid the Misuse of Bayesian Statistics) checklist is used to aid in comparisons.

6.

Detecting Guessing Behavior in Vocabulary Tests: Applying and Reflecting on the Bayesian Guessing Model Family

Tan Yanji, Cao Yiwei 《Examinations Research》2012,(5):19-28

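This study applied the Bayesian IRT guessing models of Cao Jing et al. to analyze guessing behavior among second-year junior high school (Grade 8) students on a Chinese vocabulary test. Model fit was evaluated with the DIC3 index, and the parameter estimates were compared with those of the two-parameter logistic (2PL) model. The findings were: (1) the guessing models fit better than the 2PL model; (2) the Grade 8 test data were best fit by the threshold guessing model (IRT-TG), with about 3.5% of students showing TG-type guessing; (3) the presence of guessers noticeably affected the guessers' own ability estimates and the item difficulty estimates, but had little effect on the ability estimates of non-guessers or on the discrimination parameter estimates.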

7.

Assessing Fit of Unidimensional Item Response Theory Models Using a Bayesian Approach (total citations: 1; self-citations: 0; citations by others: 1)

Sandip Sinharay 《Journal of Educational Measurement》2005,42(4):375-394

Even though Bayesian estimation has recently become quite popular in item response theory (IRT), there has been little work on model checking from a Bayesian perspective. This paper applies the posterior predictive model checking (PPMC) method (Guttman, 1967; Rubin, 1984), a popular Bayesian model-checking tool, to a number of real applications of unidimensional IRT models. The applications demonstrate how the flexibility of posterior predictive checks can be exploited to meet the researcher's needs. The paper also examines the practical consequences of misfit, an aspect often ignored in the educational measurement literature when assessing model fit.
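The core of PPMC can be sketched generically: for each posterior draw, simulate a replicated data set from the model, compute a discrepancy measure on both the replicated and the observed data, and report the proportion of draws in which the replicated discrepancy is at least as large (the posterior predictive p-value). The sketch assumes the posterior draws are already available (for example from MCMC); the toy Beta-Bernoulli example and the "number of switches" discrepancy are hypothetical illustrations, not the IRT discrepancy measures used in the paper.

```python
import random


def posterior_predictive_pvalue(observed, posterior_draws, simulate, discrepancy, seed=0):
    """Share of posterior draws whose replicated discrepancy is >= the observed one.

    simulate(params, rng) -> a replicated data set drawn from the model at params.
    discrepancy(data, params) -> a real number; it may depend on the parameters.
    """
    rng = random.Random(seed)
    exceed = 0
    for params in posterior_draws:
        replicated = simulate(params, rng)
        if discrepancy(replicated, params) >= discrepancy(observed, params):
            exceed += 1
    return exceed / len(posterior_draws)


if __name__ == "__main__":
    rng = random.Random(1)
    observed = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
    s, n = sum(observed), len(observed)
    # Hypothetical posterior for one success probability under a Beta(1, 1) prior.
    draws = [rng.betavariate(1 + s, 1 + n - s) for _ in range(2000)]

    def simulate(p, r):
        return [1 if r.random() < p else 0 for _ in range(n)]

    def switches(y, p):
        # Number of changes between consecutive responses (a hypothetical discrepancy).
        return sum(y[k] != y[k + 1] for k in range(len(y) - 1))

    print(posterior_predictive_pvalue(observed, draws, simulate, switches))
```

Values near 0 or 1 flag features of the data the model fails to reproduce; the flexibility the abstract mentions comes from being free to choose the `discrepancy` function to match the researcher's question.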

8.

Using State School Accountability Data to Evaluate Federal Programs: A Long Uphill Road

《Peabody Journal of Education》2013,88(4):122-145

Evaluations of federal programs designed to improve student achievement generally depend on data gathered by the states for school accountability purposes, rather than data specifically designed for program evaluation. In addition, these data are available at the school level but not at the student level. This article first discusses issues related to the quality of school-level data collected as part of state accountability systems, including the reliability and validity of school-level test scores as a measure of the value added by schools to student learning. It then outlines various ways in which school-level data can be usefully analyzed and illustrates the challenges inherent in doing so, including the challenges of aggregating data across states to find an overall program effect. The final section discusses the implications of the arguments presented here for measuring changes in school performance and linking these effects to a specific program. Ultimately, our ability to measure changes in outcomes and link them back to the intervention depends on three factors: (a) identifying a set of activities attributable to the program, (b) measuring the quality of implementation of these activities, and (c) obtaining a valid and reliable measure of the desired outcome. The article makes it clear that none of these is easy to come by.

9.

Pedagogical and Social School Climate: Psychometric Evaluation and Validation of the Student Edition of PESOC

H. Hultin K. Eichas L. Ferrer-Wreder R. Dimitrova M. Karlberg M.R. Galanti 《Scandinavian Journal of Educational Research》2019,63(4):534-550

Previous studies indicate that school climate is important for student health and academic achievement. This study concerns the validity and reliability of the student edition of a Swedish instrument for measuring pedagogical and social school climate (PESOC). Data were collected from 5,745 students at 97 Swedish secondary schools. Multilevel confirmatory factor analyses were conducted, and multilevel composite reliability estimates, as well as correlations with school-level achievement indicators, were calculated. The results supported an 8-factor structure at the student level and 1 general factor at the school level. Factor loadings and composite reliability estimates were acceptable at both levels. The school-level factor was moderately and positively correlated with school-level academic achievement. The student PESOC is a promising instrument for studying school climate.
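Composite reliability for a factor is commonly computed as ω = (Σλ)² / ((Σλ)² + Σθ), where λ are the factor loadings and θ the residual variances; in the multilevel case the same formula is applied separately to the student-level and school-level estimates. The sketch below is a minimal illustration of that formula, assuming loadings and residual variances taken from an already-fitted CFA. The abstract does not specify the exact multilevel estimator used, and the function name and numbers are hypothetical.

```python
def composite_reliability(loadings, residual_variances):
    """Omega-type composite reliability for one factor at one level.

    loadings: factor loadings; residual_variances: unique (error) variances,
    both from an already-estimated (multilevel) CFA solution.
    """
    common = sum(loadings) ** 2  # variance attributable to the common factor
    return common / (common + sum(residual_variances))


# Hypothetical standardized student-level loadings; residual variance = 1 - loading**2.
student_loadings = [0.72, 0.65, 0.80, 0.58]
student_residuals = [1 - lam ** 2 for lam in student_loadings]
print(round(composite_reliability(student_loadings, student_residuals), 3))
```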

10.

The heterogeneous effects of ability grouping on national college entrance exam performance – evidence from a large city in China

《International Journal of Educational Development》2014

This study attempts to evaluate the effect of ability grouping on student performance on the National College Entrance Exam in China. The context of this study is the ongoing school reform movement occurring in many Chinese municipalities. The current reform movement is striving to achieve educational equity and quality by integrating initial low achievers into high-performing schools. The propensity score matching method is employed as the identification strategy. After controlling for self-selection bias in high school assignment, this study finds that while high ability grouping at the school level has no effect on academic achievement, initial low achievers' academic performance can be significantly improved when they are integrated with high-performing students at the school level. In addition, between-class grouping significantly improves student performance in a heterogeneous school-level grouping. The study also reports detailed information on school-level inputs.
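The abstract names propensity score matching as the identification strategy but not the matching algorithm. As an assumed illustration, the sketch below estimates the average treatment effect on the treated (ATT) by 1:1 nearest-neighbor matching with replacement, taking already-estimated propensity scores (for example from a logistic regression of treatment on covariates) as input. The function name and data are hypothetical.

```python
def att_nearest_neighbor(scores, treated, outcomes):
    """ATT via 1:1 nearest-neighbor propensity score matching with replacement.

    scores: estimated propensity scores; treated: 1/0 (or True/False) flags;
    outcomes: observed outcomes, all indexed by unit.
    """
    treated_idx = [k for k, t in enumerate(treated) if t]
    control_idx = [k for k, t in enumerate(treated) if not t]
    diffs = []
    for i in treated_idx:
        # Closest control on the propensity score (ties go to the first one found).
        j = min(control_idx, key=lambda c: abs(scores[c] - scores[i]))
        diffs.append(outcomes[i] - outcomes[j])
    return sum(diffs) / len(diffs)


# Hypothetical data: two treated and three control students.
ps = [0.62, 0.35, 0.60, 0.30, 0.90]
flag = [1, 1, 0, 0, 0]
exam = [540.0, 498.0, 520.0, 490.0, 600.0]
print(att_nearest_neighbor(ps, flag, exam))
```

Matching with replacement lets every treated unit use its closest control at the cost of reusing controls; applied analyses typically also impose a caliper and check covariate balance after matching.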

11.

Evaluating the Accuracy of Judgments Obtained From Item Review Committees

《Applied Measurement in Education》2013,26(2):199-210

When an item response theory (IRT) model is estimated by marginal maximum likelihood, person parameters are usually treated as random effects that follow an assumed distribution, which serves as the prior when estimating the structural parameters of the model. For example, both PARSCALE (Muraki & Bock, 1999) and BILOG 3 (Mislevy & Bock, 1990) use a standard normal distribution as the default person prior. When the fixed-item linking method is used with an IRT program that has a fixed person prior distribution, the misspecified prior biases person ability growth downward or upward, depending on the direction of the growth. This study demonstrated by simulation how much the use of the fixed prior distribution in fixed-item linking biases person ability growth for mixed-format test data. In addition, the study demonstrated how to recover growth through an iterative prior-update calibration procedure, showing that fixed-item linking is still a viable linking method for IRT calibration with a fixed person prior.

12.

Are school-SES effects statistical artefacts? Evidence from longitudinal population data

Gary N. Marks 《Oxford Review of Education》2013,39(1):122-144

Schools’ socioeconomic status (SES) has been claimed as an important influence on student performance and there are calls for a policy response. However, there is an extensive literature which for various reasons casts doubt on the veracity of school-SES effects. This paper investigates school-SES effects with population data from a longitudinal cohort of school students which includes achievement measures in Years 3, 5 and 7. Estimates for school-SES are unstable under differing model and measurement specifications. School-SES effects are trivial controlling for student- and school-level prior ability. Inconsistent with theoretical explanations, school-SES effects were stronger with weaker SES measures. Furthermore, school-SES effects differ somewhat by achievement domain. Also contrary to expectations, there were school-SES effects on Year 7 achievement in secondary school for the primary schools students attended in Year 5. In each of five domains of achievement, fixed effect models show a small negative effect for school-SES and a small positive effect for school-level prior ability. The large school-SES effects prominent in some research and policy literatures are statistical artefacts.

13.

The effect of technology funding on school-level student proficiency

《Economics of Education Review》2021

This study presents new evidence on the effect of technology funding on school-level student proficiency. I exploit exogenous variation in school-level technology funding using the California Education Technology K-12 Voucher Program. The program provided eligible schools with technology vouchers to purchase qualifying hardware and software products and services. Using a regression discontinuity difference-in-differences design and data on voucher eligibility, voucher use, and school-level student proficiency, I find that voucher eligibility had no significant impact on school-level student proficiency, while voucher use had positive impacts on school-level student proficiency. The voucher-use results are driven entirely by schools that used the voucher funds for technology resources while reallocating dollars initially earmarked for technology to other school inputs.

14.

Equity in opportunity-to-learn and achievement in reading: A secondary analysis of PISA 2009 data

《Studies in Educational Evaluation》2015

Using data from PISA 2009, the present study investigates, first, how equally students are exposed to opportunities to improve their reading skills (OTL) depending on the school they attend and, second, the links between OTL in reading and achievement at the school level. A multidimensional within-item IRT model is used to model OTL. The intraclass correlation of both OTL dimensions derived from the IRT analysis (reading fiction and reading non-continuous tasks) is high, especially in differentiated education systems, showing that exposure to OTL in reading is unequal across schools. Robust correlations between the two OTL dimensions and reading achievement are observed at the school level. In addition, the results of a multilevel regression analysis show that a substantial proportion of the between-school variance in reading can be explained by OTL and by the school social intake. The proportion of between-school variance explained jointly by OTL and social intake is higher in differentiated education systems than in comprehensive ones.
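The intraclass correlation mentioned above is the share of variance lying between schools. A standard one-way ANOVA estimator, ICC(1) = (MSB - MSW) / (MSB + (n0 - 1) * MSW), where n0 is the adjusted average group size for unbalanced groups, is sketched below as a generic illustration. The study computed its ICCs for IRT-derived OTL scores and does not state its estimator in the abstract; the function name is hypothetical.

```python
def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    groups: one list of scores per school (group sizes may differ).
    Requires at least two groups and more observations than groups.
    """
    all_vals = [x for g in groups for x in g]
    n_total, n_groups = len(all_vals), len(groups)
    grand = sum(all_vals) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (n_groups - 1)
    msw = ssw / (n_total - n_groups)
    # Adjusted average group size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


# Hypothetical OTL scores for students in three schools.
print(round(icc1([[0.2, 0.5, 0.1], [1.1, 0.9, 1.4, 1.0], [-0.3, 0.0]]), 3))
```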

15.

Teacher Evaluation and Teacher Effectiveness in the United Kingdom (total citations: 1; self-citations: 0; citations by others: 1)

David Reynolds Daniel Muijs David Treharne 《Journal of Personnel Evaluation in Education》2003,17(1):83-100

An outline is given of the UK research situation for the knowledge bases of school effectiveness and teacher effectiveness, and the UK policy situation in terms of school and teacher evaluation, improvement and development. It is argued that the UK has seen a much greater use of school-level policies, reflecting its substantial school effectiveness research base, rather than teacher-level interventions, although there are currently some attempts at policy and practice level to focus upon teacher effects, teacher evaluation and related issues of professional development. Speculations are given concerning future policy and research needs.

16.

Improving the Measurement of School Climate Using Item Response Theory

Sarah Lindstrom Johnson Ray E. Reichenberg Kathan Shukla Tracy E. Waasdorp Catherine P. Bradshaw 《Educational Measurement》2019,38(4):99-107

The U.S. government has become increasingly focused on school climate, as recently evidenced by its inclusion as an accountability indicator in the Every Student Succeeds Act. Yet, there remains considerable variability in both conceptualizing and measuring school climate. To better inform the research and practice related to school climate and its measurement, we leveraged item response theory (IRT), a commonly used psychometric approach for the design of achievement assessments, to create a parsimonious measure of school climate that operates across varying individual characteristics. Students (n = 69,513) in 111 secondary schools completed a school climate assessment focused on three domains of climate (i.e., safety, engagement, and environment), as defined by the U.S. Department of Education. Item and test characteristics were estimated using the mirt package in R using unidimensional IRT. Analyses revealed measurement difficulties that resulted in a greater ability to assess less favorable perspectives on school climate. Differential item functioning analyses indicated measurement differences based on student academic success. These findings support the development of a broad measure of school climate but also highlight the importance of work to ensure precision in measuring school climate, particularly when considering use as an accountability measure.
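The abstract does not say which DIF procedure was run alongside mirt. As a swapped-in, widely used alternative, the sketch below computes the Mantel-Haenszel common odds ratio across matched score strata and its ETS delta-scale transform, MH D-DIF = -2.35 * ln(alpha_MH); negative values indicate an item that favors the reference group. This illustrates the Mantel-Haenszel procedure only, not the study's analysis, and the function name and counts are hypothetical.

```python
import math


def mantel_haenszel_dif(strata):
    """Return (alpha_MH, MH D-DIF) for one item.

    strata: one (ref_correct, ref_wrong, focal_correct, focal_wrong) tuple per
    matching-score level. Assumes at least one stratum has ref_wrong * focal_correct > 0.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)  # ETS delta metric
    return alpha_mh, delta_mh


# Hypothetical counts in three total-score strata (reference = higher-achieving students).
tables = [(40, 20, 30, 30), (60, 10, 50, 20), (80, 5, 70, 10)]
print(mantel_haenszel_dif(tables))
```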

17.

School-level facilitators of inclusive education: the case of Serbia

Dragica Pavlović Babić Eben Friedman 《European Journal of Special Needs Education》2018,33(4):449-465

Relying on Bronfenbrenner’s ecological model, this paper attempts to identify school-level factors that contribute to effective implementation of inclusive education. We also explored how government policy, with emphasis on individual education plans, school teams, Roma assistants and inter-sectorial committees, is implemented at the school level. Qualitative data were collected from various informants (students, parents, teachers, school associates, Roma assistants and local community representatives) in five schools selected on the basis of regional distribution and success in supporting diverse student needs. Two core categories of school-level facilitators were generated: inclusive practices and inclusive culture. Within the first category, which refers to concrete actions and relationships in the school and local community, five themes emerged: individualisation and use of individual education plans; cooperation between teachers and school inclusive education expert team; cooperation with internal and external specialists; cooperation with parents, and cooperation with the local community. The second category, which reflects beliefs, values and implicit school norms, was further divided into five subcategories: willingness for life-long learning; proactive stance; sense of teamwork; sophisticated personal philosophies of development and learning; and acceptance of difference. We concluded that successful schools have developed into professional learning communities. Finally, recommendations for improving relevant practices were provided.

18.

Estimating Non-Normal Latent Trait Distributions within Item Response Theory Using True and Estimated Item Parameters

D. A. Sass T. A. Schmitt C. M. Walker 《Applied Measurement in Education》2013,26(1):65-88

Item response theory (IRT) procedures have been used extensively to study normal latent trait distributions and have been shown to perform well; however, less is known concerning the performance of IRT with non-normal latent trait distributions. This study investigated the degree of latent trait estimation error under normal and non-normal conditions using four latent trait estimation procedures and also evaluated whether the test composition, in terms of item difficulty level, reduces estimation error. Most importantly, both true and estimated item parameters were examined to disentangle the effects of latent trait estimation error from item parameter estimation error. Results revealed that non-normal latent trait distributions produced a considerably larger degree of latent trait estimation error than normal data. Estimated item parameters tended to have comparable precision to true item parameters, thus suggesting that increased latent trait estimation error results from latent trait estimation rather than item parameter estimation.

19.

IRT‐Estimated Reliability for Tests Containing Mixed Item Formats

Lianghua Shu Richard D. Schwarz 《Journal of Educational Measurement》2014,51(2):163-177

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt‐Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test‐part similarity are discussed. A detailed computational example is presented for the targeted coefficients. A comparison of the IRT model‐derived coefficients is made and the impact of varying ability distributions is evaluated. The advantages of IRT‐derived reliability coefficients for problems such as automated test form assembly and vertical scaling are discussed.
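For orientation, the classical observed-score forms of two of these coefficients are sketched below: Cronbach's α = k/(k - 1) * (1 - Σ var(item) / var(total)) and stratified α = 1 - Σ var(part j) * (1 - α_j) / var(total), where the parts j might be the multiple-choice and constructed-response sections of a mixed-format test. The paper derives IRT model-based versions of these coefficients; this sketch instead computes them directly from raw item scores, and the function names and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one list of item scores per person; needs at least 2 items
    and nonzero total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(scores, parts):
    """parts: lists of item indices, one per stratum (each with >= 2 items)."""
    total_var = pvariance([sum(row) for row in scores])
    penalty = 0.0
    for idx in parts:
        sub = [[row[i] for i in idx] for row in scores]
        part_var = pvariance([sum(r) for r in sub])
        penalty += part_var * (1 - cronbach_alpha(sub))  # error variance of this part
    return 1 - penalty / total_var


# Hypothetical mixed-format data: items 0-2 multiple choice (0/1), items 3-4 scored 0-3.
data = [[1, 1, 0, 2, 3], [0, 1, 0, 1, 1], [1, 1, 1, 3, 3], [0, 0, 0, 0, 1], [1, 0, 1, 2, 2]]
print(round(cronbach_alpha(data), 3), round(stratified_alpha(data, [[0, 1, 2], [3, 4]]), 3))
```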

20.

Comparing the Difficulty of Examination Subjects with Item Response Theory

Oksana B. Korobko Cees A. W. Glas Roel J. Bosker Johan W. Luyten 《Journal of Educational Measurement》2008,45(2):139-157

Methods are presented for comparing grades obtained in a situation where students can choose between different subjects. It must be expected that the comparison between the grades is complicated by the interaction between the students' pattern and level of proficiency on one hand, and the choice of the subjects on the other hand. Three methods based on item response theory (IRT) for the estimation of proficiency measures that are comparable over students and subjects are discussed: a method based on a model with a unidimensional representation of proficiency, a method based on a model with a multidimensional representation of proficiency, and a method based on a multidimensional representation of proficiency where the stochastic nature of the choice of examination subjects is explicitly modeled. The methods are compared using the data from the Central Examinations in Secondary Education in the Netherlands. The results show that the unidimensional IRT model produces unrealistic results, which do not appear when using the two multidimensional IRT models. Further, it is shown that both the multidimensional models produce acceptable model fit. However, the model that explicitly takes the choice process into account produces the best model fit.