Similar literature
1.
Abstract

One major aim of international large-scale assessments (ILSAs) is to monitor changes in student performance over time. To accomplish this task, a set of common items is repeatedly administered in each assessment and linking methods are used to align the results from the different assessments on a common scale. The present article introduces a framework for discussing linking errors in ILSAs, in which different components of linking errors are distinguished (country-by-item interaction, assessment-by-item interaction and country-by-assessment-by-item interaction). Furthermore, the different components of linking errors are used to analytically derive standard errors for national trend estimates. In a simulation study, the proposed standard error formula outperforms the method that is used in PISA. In addition, the PISA 2006 and 2009 reading data are used to illustrate how the interpretation of national trend estimates can change when different procedures are applied to calculate standard errors.

2.
Errors don't exist in our data, but they serve a vital function. Reality is complicated, but our models need to be simple in order to be manageable. We assume that attributes are invariant over some conditions of observation, and once we do that we need some way of accounting for the variability in observed scores over these conditions of observation. We relegate this inconvenient variability to errors of measurement. The seriousness of errors of measurement depends on the intended interpretations and uses of the scores and the context in which they are used. Errors are too large if they interfere with the intended interpretations and uses, and otherwise are acceptable. The errors of measurement have to be small compared to the tolerance for error, and errors that are too large have to be controlled in some way. We have several ways of doing this. We can redefine the attribute of interest, we can standardize the assessments and leave the attribute alone, and/or we can sample the relevant performance domain more thoroughly. It is particularly important to control the larger sources of error. If a source of error (systematic or random) is small compared to the dominant sources of error for a testing procedure, it can generally be ignored.

3.
In large-scale assessment programs such as NAEP, TIMSS and PISA, the student achievement data sets provided for secondary analysts contain so-called plausible values. Plausible values are multiple imputations of the unobservable latent achievement for each student. This article shows how plausible values are used to: (1) address concerns with bias in the estimation of certain population parameters when point estimates of latent achievement are used to estimate those population parameters; (2) allow secondary data analysts to employ standard techniques and tools (e.g., SPSS, SAS procedures) to analyse achievement data that contain substantial measurement error components; and (3) facilitate the computation of standard errors of estimates when the sample design is complex. The advantages of plausible values are illustrated by comparing the use of maximum likelihood estimates and plausible values (PV) for estimating a range of population statistics.
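As a rough illustration of point (3), the sketch below combines one statistic computed separately on each plausible value using the standard multiple-imputation combining rules (Rubin's rules). It is not taken from the article: the function name and the input numbers are hypothetical, and the per-value sampling variances are assumed to have been obtained already by whatever method suits the sample design.

```python
from math import sqrt
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with Rubin's rules.

    estimates: the statistic computed once on each plausible value.
    sampling_variances: the sampling variance of each of those estimates
        (assumed to be supplied by the caller).
    Returns (point_estimate, standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two estimates and matching variances")

    point = mean(estimates)            # final estimate: average over plausible values
    within = mean(sampling_variances)  # average sampling variance
    between = variance(estimates)      # imputation variance (denominator m - 1)
    total = within + (1 + 1 / m) * between
    return point, sqrt(total)


# Hypothetical numbers: five plausible values, each with sampling variance 4.0.
point, se = combine_plausible_values(
    [500.0, 502.0, 498.0, 501.0, 499.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
print(point, se)
```

With these made-up inputs the combined estimate is 500 and the total variance is 4 + 1.2 × 2.5 = 7, so the standard error is about 2.65. Ignoring the between-imputation term would give 2.0 and understate the uncertainty.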

4.
Formative assessments and feedback are vital to enhancing learning outcomes but require that learners feel at ease identifying their errors and receiving feedback from a trusted source: teachers. An experimental test of a new theoretical framework was conducted to cultivate a pedagogical alliance to enhance students’ (a) trust in the teacher, (b) well-being in the learning environment and (c) identification of confusion and errors for the purpose of learning, assessment and feedback. A sample of 101 undergraduate students was randomly assigned to either an intervention (n = 51) or control (n = 50) condition in Elementary Statistics. Results indicated that a pedagogical alliance could be created to enhance student trust in the instructor, leading students to report greater well-being and a higher number of potential areas of confusion in their understanding of new content material relative to a control group. These results have implications for formative feedback, assessments, and by extension learning outcomes.

5.
Background: Assessment grades are ‘estimates’ of ability or performance and there are many reasons why an awarded grade might not meet a candidate's expectations, being either better or poorer than anticipated. Although there may be some obvious reasons for grade discrepancies, such as a lack of preparation or under-performance, there are a number of technical issues to consider, such as the potential effects of random measurement error, human error and grade misclassification. However, traditionally, there has been limited information available to the public about such issues.

Purpose: This study formed part of a two-year investigation into the reliability of public examination outcomes in England and the current paper explores participants’ narratives relating to one of the themes that emerged from the study of public perceptions of assessment reliability. It examines how individuals interpreted and rationalised their examination results, particularly those that failed to meet expectations, and discusses the impact that such results may have on individuals’ academic self-concept.

Sample and method: Ten focus groups were conducted across five qualification user groups: two each with employees, employers, teachers, trainee teachers, and job-seekers (74 participants in total). A flexible discussion schedule was employed to explore participants’ experiences and perceptions of assessment reliability.

Main findings:?Participants tended to internalise ‘blame’ for results that were poorer than expected by constructing explanations that focused on a perceived lack of preparation, ability or knowledge. These experiences appeared to have a negative impact on individuals' academic self-concept. Secondary school teacher participants shared experiences of marking, technical and standard setting errors, and were more aware than other qualification user groups of the external factors that can impact on assessment outcomes.

Conclusion: Examination results that are poorer than expected can threaten individuals’ academic self-concept and confidence in their ability, and can influence their study and career intentions and opportunities. A better understanding of educational measurement issues may offer individuals a more informed framework for understanding their examination results, especially where results do not meet expectations.

6.
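Procedures for Researching Grammatical Errors in Chinese Made by Non-Native Speakers
Drawing on grammatical errors made by foreign learners of Chinese, this article discusses eight steps in error research: (1) corpus collection; (2) error identification; (3) error correction; (4) selection of error points; (5) formal description; (6) rule explanation; (7) exploration of causes; and (8) teaching suggestions. Corpus collection can be either open-ended or focused. Error-point selection and focused corpus collection can enter the process at different stages. Once these two steps have entered, some of the other steps may have to be carried out again. This is not simple repetition, however, but a cycle at a higher level. Because the selected error points are backed by more abundant and more complete data, the related steps, such as error correction, category description, rule explanation and exploration of causes, become more comprehensive and accurate, and the resulting teaching suggestions become more practicable and effective.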

7.
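For most learners at an elementary level of English, errors in spoken language are a common phenomenon. The applied linguist S. P. Corder pointed out that, for teachers, errors are a way of finding out what strategies and procedures students use to acquire the language, and that, for learners, they are an indispensable device. Drawing on error analysis theory, teaching theory and teaching practice, this article analyses the sources and types of errors, briefly reviews several common classroom error-correction methods, and indicates how to choose among them, in the hope of offering some insight for teaching practice.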

8.
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are present. This bias may be somewhat reduced when cross-national DIF is correlated over study cycles, which is the case in PISA. This article reviews existing methods for calculating standard errors for national trends in international large-scale assessments and proposes a new method that takes into account the dependency of linking errors at different time points. We conducted a simulation study to compare the performance of the standard error estimators. The results showed that the newly suggested estimator outperformed the existing estimators as it estimated standard errors more accurately and efficiently across all simulated conditions. Implications for practical applications are discussed.

9.
Engineering Criteria 2000 (EC 2000), the recently revised set of accreditation criteria for engineering programs in the USA, places considerable stress on outcomes assessments. EC 2000 requires the assessment results to be used to identify program improvements, and requires both this usage and the resulting improvements to be documented. While numerous assessment instruments have been developed and discussed in the literature, less attention has been paid to the question of how to use these to improve the programs or to document this usage. In this paper, we present an approach that serves both to identify possible improvements based on the results of assessments and to provide high quality documentation. As an added bonus, it also helps incoming students and new faculty to get a good understanding of the structure and evolution of the program.

10.
This module describes and extends X‐to‐Y regression measures that have been proposed for use in the assessment of X‐to‐Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and scaling function measures are compared in terms of their uncertainty reductions, error variances, and the contribution of true score and measurement error variances to the total error variances. The measures are also demonstrated as applied to an assessment of scaling results for a math test and a reading test. The results of these analyses illustrate the similarity of the regression and scaling measures for scaling situations when the tests have a correlation of at least .80, and also show the extent to which the measures can be adequate summaries of nonlinear regression and nonlinear scaling functions, and of heteroskedastic errors. After reading this module, readers will have a comprehensive understanding of the purposes, uses, and differences of regression and scaling functions.

11.
Abstract

Background: International large-scale assessments (ILSAs) are a much-debated phenomenon in education. Increasingly, their outcomes attract considerable media attention and influence educational policies in many jurisdictions worldwide. The relevance, uses and consequences of these assessments are often the focus of research scrutiny. Whilst some argue that the assessment outcomes provide an effective basis for informed policy-making, critics claim that the use of international assessment data can result in a range of unintended consequences, such as the shaping and governing of school systems ‘by numbers’.

Purpose: This article explores and analyses the arguments about the uses and consequences of ILSAs. In particular, the discourse about the assessments’ consequential validity will be discussed and evaluated.

Sources of evidence: Literature relating to the uses and consequences of large-scale assessment was analysed, with a focus on research on the consequential aspects of validity.

Main argument: Much research suggests that ILSAs have unintended consequences that affect and influence educational policy. However, the influences on educational policy are complex and interwoven: for example, it is not clear-cut whether effects such as converging curricula are, necessarily, direct consequences of large-scale assessments. Further, it is suggested that a beneficial consequence of large-scale assessments is the infrastructure they provide for studies in the social sciences, although caution must be applied to causal claims, in particular because of the cross-sectional design of the assessments.

Conclusions: The considerable literature discussing the uses and consequences of large-scale assessments tends to point out potential negative aspects of the studies. However, it is also apparent that large-scale international assessments can be a valuable resource for studying global trends and evolving systems in education. Despite the extensive debates around large-scale assessment outcomes both in the media and in educational policy arenas, empirical educational research all too often appears underused in the discussion.

12.
Science teachers’ content knowledge is an important influence on student learning, highlighting an ongoing need for programs, and assessments of those programs, designed to support teacher learning of science. Valid and reliable assessments of teacher science knowledge are needed for direct measurement of this crucial variable. This paper describes multiple sources of validity and reliability (Cronbach’s alpha greater than 0.8) evidence for physical, life, and earth/space science assessments that form part of the Diagnostic Teacher Assessments of Mathematics and Science (DTAMS) project. Validity was strengthened by systematic synthesis of relevant documents, extensive use of external reviewers, and field tests with 900 teachers during the assessment development process. Subsequent results from 4,400 teachers, analyzed with Rasch IRT modeling techniques, offer construct and concurrent validity evidence.

13.
14.
It is expected that children increasingly learn to identify errors throughout their schooling process and even before it. As a further step, however, some scholars have suggested how a culture of error should be implemented in the classroom so that students are able not only to locate errors but also, and above all, to learn from them. Yet the various proposals aimed at generating a culture of error in the classroom keep regarding as error all those responses and reactions that are not considered true or correct in each specific case, thereby failing to recognise that many of these alleged errors are really anomalies with very different characteristics and consequences despite their seeming resemblance. In this paper, I rely on Ludwig Wittgenstein’s On Certainty to clarify the difference between errors and anomalies. Subsequently, I provide guidelines that each teacher may adapt to her students’ needs and development level in order to foster a culture of error that begins by distinguishing error from anomaly. This distinction is a practical as well as conceptual necessity, particularly in Child and Primary Education, as it is precisely then that anomalies most frequently arise in the form of questions and answers.

15.
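This article systematically introduces the working principles and errors of GPS navigators. It focuses on a theoretical analysis of the sources and magnitudes of three kinds of error: pseudorange measurement errors related to satellite signal propagation, geometric errors related to the angles between the relative positions of the satellites and the ship, and errors caused by the chart coordinate system and the chart itself. Based on the characteristics of these errors, it proposes several methods for reducing them and improving positioning accuracy, and it uses the GPS satellite navigator of the authors' own ship to take field measurements and verify whether the measured results fall within the theoretical error range.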

16.
The Bologna Declaration brought reforms into higher education that imply changes in teaching methods, didactic materials and textbooks, infrastructures and laboratories, etc. Statistics and mathematics are disciplines that traditionally have the worst success rates, particularly in non‐mathematics core curricula courses. The research project Mathematics and Statistics for the Development of Professional Skills, which is in progress at the University of Beira Interior in Portugal, has as one of its main objectives the development of an e‐assessment system as a resource for learning assessment and student self‐regulation. Based on the results of this project, this paper gives evidence of how to improve the reliability of an e‐assessment system, and shows that e‐assessment can be a good alternative to open‐ended tests and that students tend to show a positive attitude towards its use. We show that this can be done by checking the internal consistency and measurement error of the e‐assessment tests, by analysing the association between student scores obtained by different methods of assessment, and by analysing survey data on student opinion about e‐assessment methods.

17.
ABSTRACT

In the last decades, most countries have adopted data-intensive policy instruments aimed at modernizing the governance of education systems, and strengthening their competitiveness. Instruments such as national large-scale assessments and test-based accountabilities have disseminated widely, to the point that they are being enacted in countries with very different administrative traditions and levels of economic development. Nonetheless, comparative research on the trajectories that governance instruments follow in different institutional and socio-economic contexts is still scarce. On the basis of a systematic literature review (n = 158), this paper enquires into the scope and modalities of educational governance change that national large-scale assessments and test-based accountability instruments have triggered in a broad range of institutional settings. The paper shows that, internationally, educational governance reforms advance through path-dependent and contingent processes of policy instrumentation that are markedly conditioned by prevailing politico-administrative regimes. The paper also reflects on the additive and evolving nature of educational governance reforms.

18.
This study investigated (1) the extent to which presentations of measurement error in score reports influence teachers’ decisions and (2) teachers’ preferences in relation to these presentations. Three presentation formats of measurement error (blur, colour value and error bar) were compared to a presentation format that omitted measurement error. The results from a factorial survey analysis showed that the position of a score in relation to a cut-off score impacted most significantly on decisions. Moreover, the teachers (N = 337) indicated the need for additional information significantly more often when the score reports included an error bar compared to when they omitted measurement error. The error bar was also the most preferred presentation format. The results were supported in think-aloud protocols and focus groups, although several interpretation problems and misconceptions of measurement error were identified.

19.
Experimental designs involving the randomization of cases to treatment and control groups are powerful and under-used in many areas of social science and social policy. This paper reminds readers of the pre- and post-test, and the post-test only, designs, before explaining briefly how measurement errors propagate according to error theory. The substance of the paper involves a series of comparisons using the same measurements, all assumed to have a small initial error, and seeing what would happen to that error in the two different experimental designs. The findings from these calculations and simulations are that although post-test only and pre- and post-test designs yield different ‘manifest’ results with the same data, the substantive conclusions drawn would be similar in most real-life situations. However, if these manifest results are assumed to be in error, stemming from small initial errors in the measurements at pre- and post-test, then these substantive conclusions could be completely wrong. In one example, the pre- and post-test designs propagate an initial maximum measurement error of 10% to an error of over 60,000% in the answer. In general, and perhaps counter-intuitively, the post-test only results are less misleading. The paper ends by summarizing the lessons drawn. The key message is that, all other things being equal, the post-test only design is to be preferred. We may also need to use bigger samples and more strictly accurate measures capable of objective calibration, and to focus on seeking larger effect sizes.
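The abstract does not give the paper's own calculations, so the following is only a minimal sketch of why a gain score can magnify measurement error. It assumes a simple worst-case bound in which each score may be off by a fixed fraction of its value. The function name and the scores are hypothetical.

```python
def max_relative_error_of_gain(pre, post, rel_error):
    """Worst-case relative error of the gain (post - pre).

    Assumes each score may be off by at most rel_error times its own value,
    and that the two errors can point in opposite directions.
    """
    gain = post - pre
    if gain == 0:
        raise ValueError("relative error is undefined for a zero gain")
    # Absolute errors add when one score is subtracted from another.
    max_abs_error = rel_error * (abs(pre) + abs(post))
    return max_abs_error / abs(gain)


# Hypothetical scores: a one-point gain on a scale around 50, 10% maximum error.
print(max_relative_error_of_gain(50.0, 51.0, 0.10))
```

Under these assumptions the one-point gain carries a worst-case error of about 10.1 points, roughly 1,010% of the gain itself, whereas the post-test score taken alone keeps its original 10% bound. The smaller the gain relative to the scores, the larger the multiplier.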

20.
The school as an institution assumes that students' grades are constituted by their assessments. This paper examines the background of this presupposition and provides a micro-analytical perspective on the grading practice of teachers in German high schools (Gymnasium). It situates its theoretical framework within the research discussion on educational measurement. It is shown that the measured assessment of students and the teacher's observations are linked. When grading, teachers construct their own assessments. This process is depicted in this paper by two forms of observation: self-observation within the context of written examinations and third-party observation within the context of final oral exams.

