Similar literature
1.
Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are present. This bias may be somewhat reduced when cross-national DIF is correlated over study cycles, which is the case in PISA. This article reviews existing methods for calculating standard errors for national trends in international large-scale assessments and proposes a new method that takes into account the dependency of linking errors at different time points. We conducted a simulation study to compare the performance of the standard error estimators. The results showed that the newly suggested estimator outperformed the existing estimators as it estimated standard errors more accurately and efficiently across all simulated conditions. Implications for practical applications are discussed.
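
For orientation, the conventional trend standard error that this line of work builds on can be sketched as follows. This is the commonly documented operational form, not the new estimator proposed in the article, and the symbols (the national means at two cycles t_1 and t_2, and the linking error LE) are notation assumed here for illustration. It treats the two sampling errors and the linking error as independent, which is the assumption the abstract calls into question when DIF is correlated over cycles.

```latex
\[
\mathrm{SE}\!\left(\hat{\mu}_{t_2} - \hat{\mu}_{t_1}\right)
  = \sqrt{\mathrm{SE}\!\left(\hat{\mu}_{t_1}\right)^{2}
        + \mathrm{SE}\!\left(\hat{\mu}_{t_2}\right)^{2}
        + \mathrm{LE}_{t_1,t_2}^{2}}
\]
```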

2.
ABSTRACT

Based on concerns about the item response theory (IRT) linking approach used in the Programme for International Student Assessment (PISA) until 2012 as well as the desire to include new, more complex, interactive items with the introduction of computer-based assessments, alternative IRT linking methods were implemented in the 2015 PISA round. The new linking method represents a concurrent calibration using all available data, enabling us to find item parameters that maximize fit across all groups and allowing us to investigate measurement invariance across groups. Apart from the Rasch model that historically has been used in PISA operational analyses, we compared our method against more general IRT models that can incorporate item-by-country interactions. The results suggest that our proposed method holds promise not only to provide a strong linkage across countries and cycles but also to serve as a tool for investigating measurement invariance.

3.
This article addresses the policy implications of participation in international large-scale assessments (ILSAs), particularly the Programme for International Student Assessment (PISA), and the ways in which such implications might influence mathematics education. Taking Norway as a special case, this discussion focuses on insights into teaching, learning and assessment practices that can be inferred from the PISA study, and how participation in ILSAs has contributed to educational policy and even changed policymakers’ perspectives on schools, teachers and students. Following publication of the PISA 2000 results, Norway experienced a ‘PISA shock’, leading to the implementation of a national quality assessment system and national tests. In addition, changes were made to the mathematics curriculum for compulsory school and to mathematics teacher education. More recently, public debate has focused less on rank and league tables, shifting instead to the high number of low-achieving students and the low number of high achievers. Moreover, there has been little uptake of policy advice provided by the Organisation for Economic Co-operation and Development (OECD), which focuses on strengthening accountability measures. Furthermore, although the Norwegian educational system in the past decade has undergone a decentralisation process, the educational system still follows the Nordic model, which focuses on equity and ‘education for all’. Analyses of the Norwegian case indicate that policymaking takes place in highly cultural contexts, and that international studies might be used merely to validate existing policy directions.

4.
This paper examines whether, to what extent, and how international large-scale assessments (ILSAs) have influenced education policy-making at the national level. Based on an exploratory review of the research and policy literature on ILSAs and two surveys administered to educational policy experts, researchers, policymakers, and educators, our research found that ILSAs, with their multiple and ambiguous uses, increasingly function as solutions in search of the right problem – that is, they appear to be used as tools to legitimize educational reforms. The survey results pointed to a growing perception among stakeholders that ILSAs are having an effect on national educational policies, with 38% of respondents stating that ILSAs were generally misused in national policy contexts. However, while the ILSA literature indicates that these assessments are having some influence, there is little evidence that any positive or negative causal relationship exists between ILSA participation and the implementation of education reforms. Perhaps the most significant change associated with the use of ILSAs in the literature reviewed is the way in which new conditions for educational comparison have been made possible at the national, regional, and global levels.

5.
Ji Liu, Oxford Review of Education, 2019, 45(3): 315-332
This study explores the multidimensionality of engagements with international large-scale standardised assessments (ILSAs). The objective is to understand how different policy actors (government, media, and citizens) rationalise, report, and perceive China’s PISA participation. First, government archive analysis traces a decade of documents (2005–2015), and the findings show that Shanghai’s initial participation in PISA was rationalised as a policy experiment for learning Western ideas of education governance. Second, media content analysis of two major news outlets indicates that media framing of PISA participation was strategic on timing, intensity, and tone. Third, a public opinion survey yields results which show that low public knowledge of Shanghai’s PISA participation in 2012 is prevalent. Drawing on these findings, this study investigates how the ILSA movement, exemplified by PISA, engages different levels of stakeholders in China.

6.
ABSTRACT

This paper examines how international, large-scale skills assessments (ILSAs) engage with the broader societies they seek to serve and improve. It looks particularly at the discursive work that is done by different interest groups and the media through which the findings become part of public conversations and are translated into usable form in policy arenas. The paper discusses how individual countries are mobilised to participate in international surveys, how the public release of findings is managed and what is known from current research about how the findings are reported and interpreted in the media. Research in this area shows that international and national actors engage actively and strategically with ILSAs, to influence the interpretation of findings and subsequent policy outcomes. However, these efforts are indeterminate and this paper argues that it is at the more profound level of the public imagination of education outcomes and of the evidence needed to know about these that ILSAs achieve their most totalising effects.

7.
ABSTRACT

Modern international studies of educational achievement have grown in terms of participating educational systems. Accompanying this development is an increase in heterogeneity, as more and different kinds of educational systems take part. This growth has been particularly pronounced among low-performing, less economically developed systems. Although studies such as PISA have made modifications to account for increased diversity, the degree to which international assessments serve educational systems at the lower ends of the achievement continuum is understudied. We used modified Wright maps and PISA’s definition of proficiency to evaluate the fitness of PISA, especially among low performers. Our findings suggest that there is a mismatch between some populations and PISA. Results from a simulation show that such disparities produced biased achievement estimates and correlations with policy relevant variables. Projected PISA growth and new instantiations of PISA, particularly geared toward developing educational systems, make these findings timely and especially relevant.

8.
Trend estimation in international comparative large‐scale assessments relies on measurement invariance between countries. However, cross‐national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare trend estimation performances to two linking methods employing international item parameters across several conditions. The trend estimates based on the national item parameters were more accurate than the trend estimates based on the international item parameters when cross‐national DIF was present. Moreover, the use of fixed common item parameter calibrations led to biased trend estimates. The detection and elimination of DIF can reduce this bias but is also likely to increase the total error.

9.
Abstract

This article analyses international large-scale assessments in education from a temporal perspective. The article discusses and compares the different conceptions of time in the early international assessments conducted in the 1960s and 1970s by the IEA with the PISA studies conducted by the OECD from the year 2000 onwards. The paper argues that there has been a shift in the ways that the assessments structure time. The early IEA surveys were characterized by a relative slowness, lack of synchronization and lack of trend analyses. PISA, by contrast, is characterized by high pace, simultaneous publication of results around the world and regular and recurrent studies making the analysis of trends possible. The emergence of this new time regime, it is argued, has implications for how education is governed. At the transnational level, it strengthens the influence and importance of the OECD as a significant policy actor. At the national level, as educational discourse and policy adapt to the temporalities of the PISA calendar, two kinds of effects can be distinguished. First, there is a tendency towards searching for “retrotopian” solutions for contemporary problems. Second, there is a tendency towards acceleration and short-term planning when it comes to educational reforms.

10.
The media analysis is situated in the larger body of studies that explore the varied reasons why different policy actors advocate for international large-scale student assessments (ILSAs) and adds to the research on the fast advance of the global education industry. The analysis of The Economist, Financial Times, and Wall Street Journal covers publications on ‘PISA’, ‘TIMSS’, and related search items over the period 1996–2016. The three media outlets vary in terms of ILSA reporting. The Economist and Financial Times tend to focus on PISA, whereas the Wall Street Journal pays greater attention to TIMSS than PISA. The content analysis of 59 articles yields interesting results about how the business-oriented readership of the three media outlets frames public education and why it sees education as a profitable business opportunity. The three most common narratives, reflecting the business logic, are the following: (i) public education is in crisis; (ii) there is no correlation between spending and education outcome; and (iii) school accountability, teacher performance, and decentralisation represent the most effective policies to improve the quality of education. Drawing on these three common narratives, the financial media outlets present a particular vision of how to improve education; a vision in which the private sector is supposed to play a major role.

11.
The present paper aims to discuss how data from international large-scale assessments (ILSAs) can be utilized and combined, even with other existing data sources, in order to monitor educational outcomes and study the effectiveness of educational systems. We consider different purposes of linking data, namely, extending outcomes measures, analyzing differences over time or across cohorts, and supplementing context information. These linking strategies are illustrated by a non-exhaustive selection of studies that exploited ILSAs to investigate a wide range of educational topics. We conclude that the main contribution of ILSA to educational research lies in the ways they facilitate analyses of educational policy and policy-related issues at the institutional level by means of cross-country analyses. However, the scope of these studies also covers high-quality data on lower levels of the educational system.

12.
Abstract

Background: International large-scale assessments (ILSAs) are a much-debated phenomenon in education. Increasingly, their outcomes attract considerable media attention and influence educational policies in many jurisdictions worldwide. The relevance, uses and consequences of these assessments are often the focus of research scrutiny. Whilst some argue that the assessment outcomes provide an effective basis for informed policy-making, critics claim that the use of international assessment data can result in a range of unintended consequences, such as the shaping and governing of school systems ‘by numbers’.

Purpose: This article explores and analyses the arguments about the uses and consequences of ILSAs. In particular, the discourse about the assessments’ consequential validity will be discussed and evaluated.

Sources of evidence: Literature relating to the uses and consequences of large-scale assessment was analysed, with a focus on research on the consequential aspects of validity.

Main argument: Much research suggests that ILSAs have unintended consequences that affect and influence educational policy. However, the influences on educational policy are complex and interwoven: for example, it is not clear-cut whether effects such as converging curricula are, necessarily, direct consequences of large-scale assessments. Further, it is suggested that a beneficial consequence of large-scale assessments is the infrastructure they provide for studies in the social sciences, although caution must be applied to causal claims, in particular because of the cross-sectional design of the assessments.

Conclusions: The considerable literature discussing the uses and consequences of large-scale assessments tends to point out potential negative aspects of the studies. However, it is also apparent that large-scale international assessments can be a valuable resource for studying global trends and evolving systems in education. Despite the extensive debates around large-scale assessment outcomes both in the media and in educational policy arenas, empirical educational research all too often appears underused in the discussion.

13.
Large‐scale assessments such as the Programme for International Student Assessment (PISA) have field trials where new survey features are tested for utility in the main survey. Because of resource constraints, there is a trade‐off between how much of the sample can be used to test new survey features and how much can be used for the initial item response theory (IRT) scaling. Utilizing real assessment data of the PISA 2015 Science assessment, this article demonstrates that using fixed item parameter calibration (FIPC) in the field trial yields stable item parameter estimates in the initial IRT scaling for samples as small as n = 250 per country. Moreover, the results indicate that for the recovery of the country‐specific latent trait distributions, the estimates of the trend items (i.e., the information introduced into the calibration) are crucial. Thus, concerning the country‐level sample size of n = 1,950 currently used in the PISA field trial, FIPC is useful for increasing the number of survey features that can be examined during the field trial without the need to increase the total sample size. This enables international large‐scale assessments such as PISA to keep up with state‐of‐the‐art developments regarding assessment frameworks, psychometric models, and delivery platform capabilities.

14.
This article presents findings from a recent study of the education policy uses and impact of international large-scale assessments, namely the Programme for International Student Assessment (PISA). The paper focuses on two overlapping dimensions of PISA’s education policy use in the context of Spain. These include political dimensions, such as the use of PISA to initiate and justify the 2012–2013 educational reforms; and technical dimensions, namely the use of PISA in the development of national-level indicators used to benchmark progress and guide education and curriculum reform. The study points to the growing dominance of PISA as a powerful policy tool. Findings from the paper add to the body of literature on the different ways in which international assessments are used to guide education policy within national spaces and the role of the OECD as an agent of transnational policy steering.

15.
In large-scale assessment programs such as NAEP, TIMSS and PISA, students' achievement data sets provided for secondary analysts contain so-called plausible values. Plausible values are multiple imputations of the unobservable latent achievement for each student. In this article it has been shown how plausible values are used to: (1) address concerns with bias in the estimation of certain population parameters when point estimates of latent achievement are used to estimate those population parameters; (2) allow secondary data analysts to employ standard techniques and tools (e.g., SPSS, SAS procedures) to analyse achievement data that contains substantial measurement error components; and (3) facilitate the computation of standard errors of estimates when the sample design is complex. The advantages of plausible values have been illustrated by comparing the use of maximum likelihood estimates and plausible values (PV) for estimating a range of population statistics.
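
Because plausible values are multiple imputations, a statistic is typically computed once per plausible value and the results are then pooled. The sketch below illustrates that pooling step with the standard multiple-imputation combining rules (Rubin's rules); it is not taken from the article. The function name `combine_plausible_values` and its inputs are hypothetical, and the per-plausible-value sampling variances are assumed to have been obtained separately (for example, from replicate weights).

```python
from math import sqrt
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one statistic computed separately on each plausible value.

    estimates: the statistic (e.g. a country mean) computed once per PV.
    sampling_variances: the sampling variance of that statistic per PV
        (assumed to be computed elsewhere, e.g. with replicate weights).
    Returns (pooled point estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two PVs and one variance per PV")

    point = mean(estimates)            # average of the per-PV estimates
    within = mean(sampling_variances)  # average sampling variance
    between = variance(estimates)      # between-PV variance, divisor m - 1

    # Total variance: sampling variance plus the imputation (measurement)
    # variance, inflated by (1 + 1/m) for the finite number of PVs.
    total = within + (1 + 1 / m) * between
    return point, sqrt(total)


# Hypothetical numbers for five plausible values of one country mean.
pooled_mean, pooled_se = combine_plausible_values(
    [500.0, 502.0, 498.0, 501.0, 499.0],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
```

The between-PV term is what a single point estimate of achievement would leave out, which is one reason standard errors based on a single score tend to be too small.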

16.
ABSTRACT

International large-scale assessments and comparisons (ILSAs) in education have become significant policy phenomena. How a country fares in these assessments has come to signify not only how a nation’s education system is performing, but also its future prospects in a global economic ‘race’. These assessments provoke passionate arguments at specialist conferences and in scholarly journals and they are just as passionately debated in the media. Within academe, ILSAs are researched by sociologists and psychometricians, policy experts and statisticians. This multidisciplinary, multi-voice discussion has not always served to highlight the complexity of the issues involved. Instead, discussions across various groups of actors have often led to a polarisation of views and a hardening of stances. Large-scale comparisons have deeply divided academic opinion with regard to their validity, usefulness and use. The divergence in ontological commitments, methodologies and paradigms of research makes discussions among one set of scholars almost incomprehensible to another. New theories, concepts and vocabularies are urgently required to engage productively with this important phenomenon. Borrowing concepts from Science and Technology Studies (STS) and the history and sociology of numbers, I argue that understanding such comparative exercises as socio-technical assemblages would move the critique of large-scale comparisons in education in more productive directions.

17.
18.
There is widespread concern that assessments which have no direct consequences for students, teachers or schools underestimate student ability, and that the extent of this underestimation increases as the students become ever more familiar with such tests. This issue is particularly relevant for international comparative studies such as the IEA’s Third International Mathematics and Science Study (TIMSS) and the OECD’s Programme for International Student Assessment (PISA). In the present experimental study, a short form of the PISA mathematical literacy test is used to explore whether the levels of test motivation and test performance observed in the context of the standard PISA assessment situation can be improved by raising the stakes of testing. The impact of (1) informational feedback, (2) grading, and (3) performance-contingent financial rewards on the personal value of performing well, perceived utility of participating in the test, intended and invested effort, task-irrelevant cognitions, and test performance are investigated. The central finding of the study is that the different treatment conditions make the various value components of test motivation equally salient. Consequently, no differences were found either with respect to intended and invested effort or to test performance.

19.
This article presents the pseudo-equivalent group approach and discusses how it can enhance the quality of linking in the presence of nonequivalent groups. The pseudo-equivalent group approach makes it possible to achieve pseudo-equivalence using propensity score reweighting techniques. We use it to perform linking to establish scale concordance between two assessments. The article presents Monte-Carlo simulations and a real data application based on data from the Survey of Adult Skills (PIAAC) and the Programme for International Student Assessment (PISA). Monte-Carlo simulations suggest that the pseudo-equivalent group design is particularly useful whenever there is a large overlap across the two groups with respect to balancing variables and when the correlation between such variables and ability is medium or high. The example based on PISA and PIAAC data indicates that the approach can provide reasonably accurate linking that can be used for group-level comparisons.

20.
Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is to use polychoric correlations and fit the models using methods such as unweighted least squares (ULS), maximum likelihood (ML), weighted least squares (WLS), or diagonally weighted least squares (DWLS). In this simulation evaluation we study the behavior of these methods in combination with polychoric correlations when the models are misspecified. We also study the effect of model size and number of categories on the parameter estimates, their standard errors, and the common chi-square measures of fit when the models are both correct and misspecified. When used routinely, these methods give consistent parameter estimates but ULS, ML, and DWLS give incorrect standard errors. Correct standard errors can be obtained for these methods by robustification using an estimate of the asymptotic covariance matrix W of the polychoric correlations. When used in this way the methods are here called RULS, RML, and RDWLS.
