1.

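Zhang Jun (张军) 《考试研究》 (Examinations Research) 2011,(6)

This study uses a compensatory model from multidimensional item response theory (MIRT) to conduct an exploratory analysis of the latent dimensional space of the reading section of the HSK (Elementary-Intermediate), providing supporting evidence for research on the construct validity of the HSK. Parameters were estimated with a program written by the author. For the model-fit test, cluster analysis was used to group the examinees. The results show that a three-dimensional MIRT model is the best psychometric model for the reading section. Dimension 1 discriminates well among examinees and is the main object of measurement of the reading section. The abilities represented by Dimension 1 and Dimension 2 tend to vary in somewhat opposite directions, while Dimension 3 is relatively independent of the other two dimensions.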

2.

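Measurement Structure in Facet Theory (total citations: 1; self-citations: 0; citations by others: 1)

Zhao Shouying (赵守盈), Jiang Xinhui (江新会), Luo Wenshu (骆文淑) 《中国考试》 (China Examinations) 2007,(3):13-19

As an effective strategy for behavioral science research, facet theory's most distinctive feature is that mapping sentences constructed according to it can be used to write test items that conform to a given theoretical construct. This way of writing items, however, is in practice implemented through a factorial design of the facets. When multidimensional scaling is used to analyze examinees' responses, researchers often expect to obtain a multidimensional spatial structure that reflects the facets, so that the space can be partitioned by the different levels of each facet. This paper argues that, in the spatial structure produced by multidimensional scaling, each dimension reflects a specific cluster of examinees rather than one or several facets. Partitioning the multidimensional space by facet levels leads to neglect of the analysis of response effects.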

3.

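Developing Academic Assessment Instruments for Regional Education Quality Evaluation

《中国考试》 (China Examinations) 2018,(6)

Student academic assessment is an important component of education quality assessment. Academic quality assessments in China and abroad focus on mathematics, science, and reading as the main domains and emphasize the assessment of literacy; their frameworks comprise content dimensions and cognitive processes, test items consist of objective and subjective questions, and computer-based testing has begun to be implemented. The design of the regional academic quality assessment instruments draws on international experience: it assesses students' mathematics and reading literacy in terms of content dimensions and cognitive processes, the test items cover all dimensions, and results are comparable from year to year. Future development of academic instruments for regional education quality evaluation should be combined with diagnostic assessment, gradually adopt computer-based testing, and apply information technology to process-oriented learning and assessment.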

4.

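Item Writers: A Factor Affecting the Validity of Reading Comprehension Tests

Li Xue (李雪), Zeng Yongqiang (曾用强) 《考试研究》 (Examinations Research) 2012,(4):49-60

This study applies item response theory to analyze, in terms of examinees' reading ability estimates and item difficulty estimates, how the writers of multiple-choice items affect test validity in reading comprehension testing. In the experimental design, two groups of examinees first took the same "reading proficiency test"; the ability estimates obtained for the two groups from this test did not differ significantly. The two groups then each took items written by one of two different item writers. Although the items were all based on the same reading passages and their difficulty values did not differ significantly, the examinees' performance differed significantly. Under the Rasch model, examinee performance is determined jointly by examinee ability and item difficulty. It can therefore be inferred that the items written by different item writers affected examinee performance and, in turn, the validity of reading comprehension tests that use multiple-choice items.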
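
The Rasch-model reasoning in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the function name `rasch_probability` and the numeric values are hypothetical, and only the standard Rasch formula, P(correct) = 1 / (1 + exp(-(theta - b))), is assumed. The point is that two items of equal difficulty answered by examinees of equal ability should show the same expected success rate, so a significant difference in observed performance suggests some other influence.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta: examinee ability (logits); b: item difficulty (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical values: the same ability and two items of equal difficulty,
# one from each item writer. The model predicts identical success rates.
p_writer_a = rasch_probability(theta=0.5, b=0.0)
p_writer_b = rasch_probability(theta=0.5, b=0.0)

# A harder item (larger b) lowers the predicted success rate.
p_harder = rasch_probability(theta=0.5, b=1.0)
```

Under these assumptions `p_writer_a` equals `p_writer_b`, which is the baseline against which the study's observed performance difference is interpreted.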

5.

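An Analysis of the Content Domains of the PISA Reading Literacy Assessment and Its Implications

《中国考试》 (China Examinations) 2014,(10)

Reading literacy assessment is an important component of PISA. The content domains of the assessment play a key role in test design, item writing, and score interpretation. Against the background and trends of development in modern language testing theory, and taking the PISA 2015 Reading Literacy Framework (Draft) as its basis, this paper analyzes the content domains of the PISA reading literacy assessment and summarizes the implications for foreign language reading tests in China.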

6.

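A Review of ASCOT, the German Vocational Competence Assessment Project

《职教论坛》 (Vocational Education Forum) 2015,(21)

As one of the earliest international large-scale vocational competence assessment projects, Germany's ASCOT competence assessment has had an important influence on international vocational education research. It was the first to establish common occupational tasks and qualification requirements, it constructed competence assessment models and developed assessment items according to item response theory, and it used virtual work situations for assessment. From the perspective of vocational education theory, however, ASCOT does not yet meet all the requirements of large-scale vocational competence assessment, in particular those concerning the goals of vocational education and occupational validity. This paper proposes related improvements, such as the use of open-ended tests.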

7.

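A Study of Performance-Based Items for Junior Secondary Chemistry Literacy Based on the Rasch Model

Xia Zhenyang (夏振洋), Liu Kaifu (刘开福), Wang Houxiong (王后雄), Huang Yong (黄勇) 《教育与装备研究》 (Education and Equipment Research) 2023,(5):47-52

Curriculum standards are the basis for teachers' instruction and for examinations and evaluation. Developing assessment items is important in the process of literacy progression. The aim of this study is to use item response theory to develop assessment items for junior secondary students' chemistry literacy, providing a reference for strengthening empirical and applied research on literacy progression. The study developed mid-term survey assessment items based on the quality requirements of the Compulsory Education Chemistry Curriculum Standards (2022 edition). Using a sample of 500 examinees, it analyzed the data from several angles, including item consistency, unidimensionality, item-person correspondence, and degree of fit, and then examined the poorly fitting items. The results show that, in the early period after the promulgation of the new curriculum standards, junior secondary students' chemistry literacy progression and academic quality were in line with expectations, and that performance-based diagnostic reports provide a scientific tool for understanding students' individual development and improving teaching.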

8.

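Implications of the PISA 2012 Financial Literacy Assessment

Zhao Wenmin (赵闻敏), Bao Jiansheng (鲍建生) 《现代教学》 (Modern Teaching) 2014,(4):15-17

In 2012, the OECD Programme for International Student Assessment (PISA) added financial literacy as an assessment domain alongside its existing assessments of mathematical, scientific, and reading literacy. Eighteen countries and regions took part in the assessment of this new domain. Drawing on specific items, this paper introduces the basic assessment dimensions of financial literacy in PISA 2012, analyzes their connection with mathematical literacy, and discusses the significance of financial literacy for mathematics teaching in China.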

9.

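Constructing an Assessment Model of Core Competencies for High School Mathematics Teachers

Wu Lisha (武丽莎), Zhu Liming (朱立明), Zhang Li (张莉), Ma Zhen (马振) 《天津师范大学学报(基础教育版)》 (Journal of Tianjin Normal University, Elementary Education Edition) 2023,(1):42-47

As students' core competencies in mathematics have developed, the core competencies of mathematics teachers have also become a research focus. How to assess these competencies is a problem that urgently needs to be solved in teacher education. By reviewing the literature on mathematics teachers' core competencies and examining their components and assessment indicators, and by using literature review, expert consultation, and the analytic hierarchy process, this study constructed a preliminary KACE assessment model of high school mathematics teachers' core competencies from both theoretical and empirical perspectives. The model covers the mathematical knowledge (Knowledge, K), mathematical ability (Ability, A), mathematical culture (Culture, C), and mathematical affect (Emotion, E) competencies that teachers need, together with 17 observation indicators derived from them, and thereby offers theoretical support and practical guidance for the current assessment of high school mathematics teachers' core competencies.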

10.

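Assessing Core Mathematical Competencies: Test Item Design for Primary School

Hu Dianshun (胡典顺), Zhang Kexin (张可心) 《湖北教育》 (Hubei Education) 2023,(4):41-44

Combining test-based assessment with questionnaire surveys is a common approach to assessing core mathematical competencies; this paper focuses on test-based assessment. According to the assessment framework for core mathematical competencies, test items are organized around four dimensions: the core competency domain, the content domain, the context domain, and the process domain. Because the main manifestations of core mathematical competencies differ between the primary and junior secondary stages, item design must distinguish between primary and junior secondary performance. Based on the "WJ City Compulsory Education Core Competency Monitoring" project, this paper describes the design of primary school test items for assessing core mathematical competencies.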

11.

Using Multidimensional Item Response Theory to Evaluate Educational and Psychological Tests

Terry A. Ackerman Mark J. Gierl Cindy M. Walker 《Educational Measurement》2003,22(3):37-51

Many educational and psychological tests are inherently multidimensional, meaning these tests measure two or more dimensions or constructs. The purpose of this module is to illustrate how test practitioners and researchers can apply multidimensional item response theory (MIRT) to understand better what their tests are measuring, how accurately the different composites of ability are being assessed, and how this information can be cycled back into the test development process. Procedures for conducting MIRT analyses, from obtaining evidence that the test is multidimensional, to modeling the test as multidimensional, to illustrating the properties of multidimensional items graphically, are described from both a theoretical and a substantive basis. This module also illustrates these procedures using data from a ninth-grade mathematics achievement test. It concludes with a discussion of future directions in MIRT research.

12.

The Impact of Item Stem Format on the Dimensional Structure of Mathematics Assessments

Adnan Kan Damien C. Cormier 《Educational Assessment》2019,24(1):13-32

Item stem formats can alter the cognitive complexity as well as the type of abilities required for solving mathematics items. Consequently, it is possible that item stem formats can affect the dimensional structure of mathematics assessments. This empirical study investigated the relationship between item stem format and the dimensionality of mathematics assessments. A sample of 671 sixth-grade students was given two forms of a mathematics assessment in which mathematical expression (ME) items and word problems (WP) were used to measure the same content. The effects of mathematical language and reading abilities in responding to ME and WP items were explored using unidimensional and multidimensional item response theory models. The results showed that WP and ME items appear to differ with regard to the underlying abilities required to answer these items. Hence, the multidimensional model fit the response data better than the unidimensional model. For the accurate assessment of mathematics achievement, students’ reading and mathematical language abilities should also be considered when implementing mathematics assessments with ME and WP items.

13.

Comparing Multidimensional and Unidimensional Proficiency Classifications: Multidimensional IRT as a Diagnostic Aid

Cindy M. Walker S. Natasha Beretvas 《Journal of Educational Measurement》2003,40(3):255-275

This research examined the effect of scoring items thought to be multidimensional using a unidimensional model and demonstrated the use of multidimensional item response theory (MIRT) as a diagnostic tool. Using real data from a large-scale mathematics test, previously shown to function differentially in favor of proficient writers, the difference in proficiency classifications was explored when a two- versus one-dimensional confirmatory model was fit. The estimate of ability obtained when using the unidimensional model was considered to represent general mathematical ability. Under the two-dimensional model, one of the two dimensions was also considered to represent general mathematical ability. The second dimension was considered to represent the ability to communicate in mathematics. The resulting pattern of mismatched proficiency classifications suggested that examinees found to have less mathematics communication ability were more likely to be placed in a lower general mathematics proficiency classification under the unidimensional than multidimensional model. Results and implications are discussed.

14.

The Use of Hierarchical Generalized Linear Model for Item Dimensionality Assessment

S. Natasha Beretvas Natasha J. Williams 《Journal of Educational Measurement》2004,41(4):379-395

To assess item dimensionality, the following two approaches are described and compared: hierarchical generalized linear model (HGLM) and multidimensional item response theory (MIRT) model. Two generating models are used to simulate dichotomous responses to a 17-item test: the unidimensional and compensatory two-dimensional (C2D) models. For C2D data, seven items are modeled to load on the first and second factors, θ₁ and θ₂, with the remaining 10 items modeled unidimensionally, emulating a mathematics test with seven items requiring an additional reading ability dimension. For both types of generated data, the multidimensionality of item responses is investigated using HGLM and MIRT. Comparison of HGLM's and MIRT's results is possible through a transformation of items' difficulty estimates into probabilities of a correct response for a hypothetical examinee at the mean on θ₁ and θ₂. HGLM and MIRT performed similarly. The benefits of HGLM for item dimensionality analyses are discussed.
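
The compensatory two-dimensional model and the "examinee at the mean" transformation mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the article's code: it assumes the common slope-intercept logistic form P = 1 / (1 + exp(-(a₁θ₁ + a₂θ₂ + d))), and the function name, the discriminations `a1`, `a2`, and the intercept `d` are hypothetical values chosen for the example.

```python
import math


def c2d_probability(theta1: float, theta2: float,
                    a1: float, a2: float, d: float) -> float:
    """Correct-response probability under a compensatory 2D logistic model.

    Assumed form: logit = a1*theta1 + a2*theta2 + d, where d is an
    intercept (higher d means an easier item).
    """
    logit = a1 * theta1 + a2 * theta2 + d
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical item that loads on both dimensions (e.g. mathematics and reading).
a1, a2, d = 1.0, 1.0, -0.5

# Examinee at the mean on both dimensions: the probability depends only on d,
# which gives a common scale for comparing estimates across methods.
p_at_mean = c2d_probability(0.0, 0.0, a1, a2, d)

# Compensation: a deficit on theta2 is offset by a surplus on theta1.
p_compensated = c2d_probability(1.0, -1.0, a1, a2, d)
```

With equal discriminations, `p_compensated` matches `p_at_mean`, which is what "compensatory" means here: a high standing on one dimension can make up for a low standing on the other.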

15.

Examining the Reliability of Student Growth Percentiles Using Multidimensional IRT

Scott Monroe Li Cai 《Educational Measurement》2015,34(4):21-30

Student growth percentiles (SGPs, Betebenner, 2009) are used to locate a student's current score in a conditional distribution based on the student's past scores. Currently, following Betebenner (2009), quantile regression (QR) is most often used operationally to estimate the SGPs. Alternatively, multidimensional item response theory (MIRT) may also be used to estimate SGPs, as proposed by Lockwood and Castellano (2015). A benefit of using MIRT to estimate SGPs is that techniques and methods already developed for MIRT may readily be applied to the specific context of SGP estimation and inference. This research adopts a MIRT framework to explore the reliability of SGPs. More specifically, we propose a straightforward method for estimating SGP reliability. In addition, we use this measure to study how SGP reliability is affected by two key factors: the correlation between prior and current latent achievement scores, and the number of prior years included in the SGP analysis. These issues are primarily explored via simulated data. In addition, the QR and MIRT approaches are compared in an empirical application.
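
The idea of locating a current score in a conditional distribution can be made concrete with a deliberately simplified sketch. It is neither the quantile-regression nor the MIRT estimator discussed in the abstract: it assumes a single prior score and that the standardized prior and current scores are bivariate normal with correlation `rho`. The function names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sgp_bivariate_normal(prior_z: float, current_z: float, rho: float) -> float:
    """Percentile of the current score among students with the same prior score.

    Assumes standardized scores that are bivariate normal with correlation
    rho, so current | prior ~ Normal(rho * prior, 1 - rho**2).
    """
    conditional_mean = rho * prior_z
    conditional_sd = math.sqrt(1.0 - rho ** 2)
    return 100.0 * normal_cdf((current_z - conditional_mean) / conditional_sd)


# A student who scores exactly at the conditional mean sits at the 50th percentile.
typical_growth = sgp_bivariate_normal(prior_z=1.0, current_z=0.7, rho=0.7)

# The same current score implies more growth for a student who started lower.
strong_growth = sgp_bivariate_normal(prior_z=-1.0, current_z=0.7, rho=0.7)
```

In this toy setting the correlation between prior and current scores sets the spread of the conditional distribution, which is one reason the abstract treats that correlation as a key factor.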

16.

A Multidimensional Item Response Theory Model for Continuous and Graded Responses With Error in Persons and Items

Pere J. Ferrando David Navarro-González 《Educational and Psychological Measurement》2021,81(6):1029

Item response theory “dual” models (DMs) in which both items and individuals are viewed as sources of differential measurement error so far have been proposed only for unidimensional measures. This article proposes two multidimensional extensions of existing DMs: the M-DTCRM (dual Thurstonian continuous response model), intended for (approximately) continuous responses, and the M-DTGRM (dual Thurstonian graded response model), intended for ordered-categorical responses (including binary). A rationale for the extension to the multiple-content-dimensions case, which is based on the concept of the multidimensional location index, is first proposed and discussed. Then, the models are described using both the factor-analytic and the item response theory parameterizations. Procedures for (a) calibrating the items, (b) scoring individuals, (c) assessing model appropriateness, and (d) assessing measurement precision are finally discussed. The simulation results suggest that the proposal is quite feasible, and an illustrative example based on personality data is also provided. The proposals are submitted to be of particular interest for the case of multidimensional questionnaires in which the number of items per scale would not be enough for arriving at stable estimates if the existing unidimensional DMs were fitted on a separate-scale basis.

17.

IRT Approaches to Modeling Scores on Mixed-Format Tests

Won-Chan Lee Stella Y. Kim Jiwon Choi Yujin Kang 《Journal of Educational Measurement》2020,57(2):230-254

This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and classification consistency and accuracy under three item response theory (IRT) frameworks: unidimensional IRT (UIRT), simple structure multidimensional IRT (SS-MIRT), and bifactor multidimensional IRT (BF-MIRT) models. Illustrative examples are presented using data from three mixed-format exams with various levels of format effects. In general, the two MIRT models produced similar results, while the UIRT model resulted in consistently lower estimates of reliability and classification consistency/accuracy indices compared to the MIRT models.

18.

Longitudinal Analysis of Early Mathematics Learning

Gregory Camilli Sunhee Kim 《Educational Measurement》2018,37(3):4-10

The trend in mathematics achievement from preschool to kindergarten is studied with a longitudinal growth item response theory model. The three measurement occasions included the spring of preschool and the spring and fall of kindergarten. The growth trend was nonlinear, with a steep drop between spring of preschool and fall of kindergarten. The modeling results provide validation for the argument that a classroom assessment in mathematics can be used to assess developmental skill levels that are consistent with a theory of early mathematics acquisition. The statistical model employed enables an effective illustration of overall gains and individual variability. Implications of the summer loss are discussed as well as model limitations.

19.

Evaluation of linking methods for multidimensional IRT calibrations

Kyung-Seok Min 《Asia Pacific Education Review》2007,8(1):41-55

Most researchers agree that psychological/educational tests are sensitive to multiple traits, implying the need for a multidimensional item response theory (MIRT). One limitation of applying a MIRT in practice is the difficulty in establishing equivalent scales of multiple traits. In this study, a new MIRT linking method was proposed and evaluated by comparison with two existing methods. The results showed that the new method was more acceptable in transforming item parameters and maintaining dimensional structures. Limitations and cautions in using multidimensional linking techniques were also discussed.

20.

The challenges and possibilities of aligning large‐scale testing with mathematical reform: the case of Ontario

Alex Lawson Christine Suurtamm 《Assessment in Education: Principles, Policy & Practice》2006,13(3):305-325

In 1997, the Ontario government, like many other jurisdictions, undertook systemic reform of their elementary school mathematics programme, developing a new mathematics curriculum, report card, and province‐wide assessment. The curricular reform embodied a new vision of mathematics learning and instruction that emphasized instruction using challenging problems, the student construction of multiple solution methods, and mathematical communication and defence of ideas. While the design of the original large‐scale assessment incorporated much of the latest research and theory on effective practices at that time, these traditional item development and scoring practices no longer adequately assess mathematics achievement in reform‐inspired classrooms. The difficulties of marrying traditional assessment practices with a reform‐inspired curriculum could be addressed by creating a construct definition from the recent research findings on students’ mathematical development in reform‐inspired classrooms. The importance, challenges and implications of redefining the construct on the basis of existing research on students’ mathematical development, as well as collapsing the traditional content‐by‐process matrix for item development, are explored.