Similar documents
20 similar documents found (search time: 65 ms)
1.
The use of the Experiences of Teaching & Learning Questionnaire (ETLQ) for the evaluation of learning quality in higher education has been expanding during the last decade, so a review of the instrument’s validity evidence is warranted. The study was designed as a systematic critical literature review. We evaluated the strength of the validity evidence of 17 included studies with a quality appraisal framework reflecting current standards for educational testing. The evidence supporting the central validity assumptions of the ETLQ scales is currently weak to moderate and incomplete. Thus, caution against the uncritical use of ETLQ scores for high-stakes educational decisions is warranted. The appraisal framework was useful for creating an overview of the evidence. However, attention to more general aspects of study quality and consensus deliberations with three to four raters were also important for sufficiently reliable appraisal of the evidence.

2.
As current procedures for teacher assessment are often based on non-standardized, qualitative information derived from multiple sources, the overall validity of the assessment depends heavily on the judgement processes of the assessors. Because it is of great importance for assessors to be aware of their own judgement processes and of the possible threats to validity in these processes, investigating assessors’ perceptions is of vital significance. In the present study, the perceptions of 22 assessors who judged a student teacher pair-wise using a specific assessment procedure were explored using semi-structured interviews. A qualitative analysis of the individual assessors’ perceptions with regard to the essential judgement processes of consideration of evidence and combination of evidence to attain an overall judgement resulted in an overview of successful strategies and threats underlying a valid assessment process. General implications for ensuring the validity of the assessment process and the preparation of assessors are discussed.

3.
The Chinese Early Childhood Environment Rating Scale (trial) (CECERS) is a new instrument for measuring early childhood program quality in Chinese socio-cultural contexts, based on substantial adaptation of the Early Childhood Environment Rating Scale-Revised Edition (ECERS-R). This paper describes the development and validation process of CECERS. Empirical data were collected from a stratified random sample of 178 classrooms, from which a random sample of 1012 children was measured for child development outcomes. Guided by the broad conceptualization of validity and validation advocated by Messick (1989), evidence in a variety of forms is presented and discussed, including content validity considerations (e.g., measuring socially and culturally relevant domains), measurement reliability considerations (e.g., internal consistency reliability, inter-rater reliability), and measurement validity considerations (concurrent validity, criterion-related validity, internal structure based on exploratory factor analysis). The empirical findings for CECERS compare very favorably with the validation outcomes of ECERS-R. The body of evidence accumulated in the validation process supports the use and interpretation of CECERS scores as quality indicators of early childhood education programs in Chinese social and cultural contexts. Limitations and future directions are also discussed.

4.
The argument-based approach to validation has been widely adopted in validation theory. However, this approach aims to validate the intended interpretation and use of a single test or assessment. This article proposes an extension of the argument-based approach for validation of multiple tests. This extension is illustrated with the validation of a competency assessment program (CAP). This CAP was validated in collaboration with a quality manager of an educational program. In this case study, it became apparent that this approach fosters an in-depth evaluation of the assessment program and that the approach appears suitable for validation efforts of competency assessment programs. The approach guides validation research from a more general perspective, but also guides more detailed validation efforts.

5.
Peer assessment exercises yield varied reliability and validity. To maximise reliability and validity, the literature recommends adopting various design principles including the use of explicit assessment criteria. Counter to this literature, we report a peer assessment exercise in which criteria were deliberately avoided yet acceptable reliability and validity were achieved. Based on this finding, we make two arguments. First, the comparative judgement approach adopted can be applied successfully in different contexts, including higher education and secondary school. Second, the success was due to this approach; an alternative technique based on absolute judgement yielded poor reliability and validity. We conclude that sound outcomes are achievable without assessment criteria, but success depends on how the peer assessment activity is designed.

6.
This study investigated the psychometric properties of a newly developed measure, the Student Goals and Behaviour Questionnaire (SGBQ). The SGBQ is a unique measure in that it assesses both students' goal setting attitudes and behaviour within a tertiary education context. In other words, the SGBQ measures students' actual, rather than preferred academic goals. To date, no such instrument exists in the psychological and educational literature. The SGBQ, a modified version of Locke & Latham's (1990) Goal Setting Questionnaire (MGSQ), and Wood & Locke's (1987) measure of academic self-efficacy (ASEQ) were administered to 100 first-year Psychology students. With regard to construct validity, predicted moderate levels of convergence were found between the SGBQ factors, and MGSQ and ASEQ. It was also found that the SGBQ had predictive validity with respect to subsequent academic performance. Furthermore, a number of demographic characteristics, such as course of study and age were found to be associated with goal setting attitudes and behaviour. The SGBQ would seem to have some promise as an instrument for the assessment and monitoring of student goal setting.

7.
Perry and Winne (2006) describe their computer program, ‘gStudy’, and argue that it facilitates valid measurement of self-regulated learning (SRL) over time. This commentary addresses the assumptions underlying this argument and raises additional validity questions regarding the use of this tool. These include issues related to the development of SRL in young children, the difficulty in embedding assessments in a learning tool, and the extent to which the log analyzer can separate SRL sequences from other behavior. Finally, the extent to which behavior ‘inside’ gStudy reflects SRL in other contexts is discussed.

8.
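Validity is the indicator used to evaluate how effectively the college entrance examination (gaokao) selects talent. The best criterion for estimating the validity of the gaokao is overall academic performance at university, and the examination's validity should be judged by the degree to which it improves the quality of university admission decisions. Empirical research suggests that the gaokao has effectively fulfilled its role of selecting suitable students for regular higher education institutions, and that the effectiveness of its selection reflects the shift from "elite" to "mass" higher education. By subject, the English test is the most effective for selection, while the comprehensive test is the least effective. Improving the validity of the gaokao requires attention to factors such as item writing, test administration, marking, and score use. The practical conditions that constrain such improvement are mainly examination theory and technology, the examination model, examination costs, the university admissions system and universities' internal management systems, the climate of public opinion, and the adjustment of interests.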

9.
The Teacher Assessment in Primary Science project is funded by the Primary Science Teaching Trust and based at Bath Spa University. The study aims to develop a whole-school model of valid, reliable and manageable teacher assessment to inform practice and make a positive impact on primary-aged children’s learning in science. The model is based on a data-flow ‘pyramid’ (analogous to the flow of energy through an ecosystem), whereby the rich formative assessment evidence gathered in the classroom is summarised for monitoring, reporting and evaluation purposes [Nuffield Foundation. (2012). Developing policy, principles and practice in primary school science assessment. London: Nuffield Foundation]. Using a design-based research (DBR) methodology, the authors worked in collaboration with teachers from project schools and other expert groups to refine, elaborate, validate and operationalise the data-flow ‘pyramid’ model, resulting in the development of a whole-school self-evaluation tool. In this paper, we argue that a DBR approach to theory-building and school improvement drawing upon teacher expertise has led to the identification, adaptation and successful scaling up of a promising approach to school self-evaluation in relation to assessment in science.

10.
This paper is supportive of the judgemental model of assessment posited by Hager & Butler (1996) and is a twofold response. First, it examines the underpinning principles of assessment and testing theory on which their paper is based. Whilst the judgemental model of assessment is not based on the traditional notions of validity and reliability, as in classical test theory, it nevertheless does take account of and is built on appropriate conceptions of these two fundamental principles of any instrument of assessment. It is the concomitant and necessary shift in perceptions of these underlying principles that gives support to Hager & Butler's justification for a judgemental model of assessment as that which is appropriate for the assessment of workplace performance. Second, it supports the application of the judgemental model to workplace performance by considering a pertinent area, that of a postgraduate certificate in initial teacher education.

11.
The aim of the current study was to assess the validity of the sex-plus versus sex-only categorization method for distinguishing between different types of adolescent sex offenders (ASOs; Butler & Seto, 2002). It is hypothesized that this categorization method has utility when attempting to distinguish between generalist and specialist ASOs (Seto & Pullman, 2014). Additionally, further classification of ASOs was attempted using a well-known juvenile delinquency classification scheme, early-onset versus late-onset offenders (Moffitt, 1993). The current study was an archival analysis of clinical files from a sample of 158 male ASOs seen for clinical assessment at a Metropolitan Family Court Clinic. Results indicate that sex-plus offenders are more antisocial, exhibit more psychiatric issues, and have greater deficits in general social skills compared to sex-only offenders. Conversely, sex-only offenders were found to have more atypical sexual interests, and were more likely to have greater deficits in romantic relationships compared to sex-plus offenders. Due to a power-related limitation, little support was found for the use of the early-onset versus late-onset classification scheme with ASOs. Overall, these results provide further support for the validity of a sex-only versus sex-plus distinction. Given that these results mirror those found in the generalist/specialist literature regarding the etiology of ASOs, sex-only and sex-plus offenders may indeed have different etiological pathways: sex-plus offenders are more driven by general antisociality factors, as the generalist perspective suggests, and sex-only offenders are more driven by special factors, as the specialist explanations suggest.

12.
The richness and complexity of video portfolios endanger both the reliability and validity of the assessment of teacher competencies. In a post-graduate teacher education program, the assessment of video portfolios was evaluated for its reliability, construct validity, and consequential validity. Although video portfolios facilitated a reliable and valid assessment of teacher competencies, procedures to improve assessment quality were also revealed and are therefore discussed: more explicit grounding of assessment results in the data, peer debriefing, prolonged engagement with the assessment data, and cross-checking to find confirmatory or counter examples.

13.
With the rising number of Latino and dual language learner (DLL) children attending pre-k and the importance of assessing the quality of their experiences in those settings, this study examined the extent to which a commonly used assessment of teacher-child interactions, the Classroom Assessment Scoring System (CLASS), demonstrated similar psychometric properties in classrooms serving ethnically and linguistically diverse children as it does in other classrooms. Specifically, this study investigated: (1) whether CLASS observations of teacher-child interactions are organized in three domains across classrooms with varying ethnic and language compositions (measurement invariance) and (2) the extent to which CLASS-assessed teacher-child interactions (emotional support, classroom organization, and instructional support) predict children's social, math, and literacy outcomes equally well for Latino and DLL children (predictive validity). CLASS observations of teacher-child interactions were conducted in 721 state-funded pre-k classrooms across 11 states. Direct assessments and teacher ratings of social, math, and literacy outcomes were collected for four randomly selected children in each classroom. CLASS observations factored similarly across pre-k classrooms with different Latino and DLL compositions and predicted improvements in school readiness regardless of a child's Latino or DLL status. Results suggest CLASS functions equally well as an assessment of the quality of teacher-child interactions in pre-k settings regardless of the proportion of Latino children and/or the language diversity of the children in that setting.

14.
The Family Day Care Rating Scale (FDCRS; Harms & Clifford, 1989) was developed in the USA. The scale attempts to define quality in family day care for pre‐schoolers and to provide a standardised way of assessing it. The FDCRS has been shown to have good reliability and validity in the USA and Canada. However, the rating scale has never been used to assess quality of family day care in the UK. This paper describes data collected from 104 family day care providers (childminders) in England using the FDCRS. It reports on the internal consistency of the FDCRS subscales and the validity of aggregating the subscales to derive one single measure of quality. Analyses suggest that the scale may provide a useful global measure of quality. However, not all subscales yielded good internal consistency. Recommendations are made as to how the FDCRS could be modified for effective use as a research tool in England.

15.
The number of raters is theoretically central to peer assessment reliability and validity, yet it is rarely studied. Further, requiring each student to assess more peers’ documents increases both the number of evaluations per document and assessor workload, which can degrade performance. Moreover, task complexity is likely a moderating factor, influencing both workload and validity. This study examined whether changing the number of required peer assessments per student (and hence the number of raters per document) affected peer assessment reliability and validity for tasks at different levels of task complexity. A total of 181 students completed and provided peer assessments for tasks at three levels of task complexity: low complexity (dictation), medium complexity (oral imitation), and high complexity (writing). Adequate validity of peer assessments was observed for all three task complexities at low reviewing loads. However, the impacts of increasing reviewing load varied by reliability vs. validity outcomes and by task complexity.

16.
Background: International large-scale assessments (ILSAs) are a much-debated phenomenon in education. Increasingly, their outcomes attract considerable media attention and influence educational policies in many jurisdictions worldwide. The relevance, uses and consequences of these assessments are often the focus of research scrutiny. Whilst some argue that the assessment outcomes provide an effective basis for informed policy-making, critics claim that the use of international assessment data can result in a range of unintended consequences, such as the shaping and governing of school systems ‘by numbers’.

Purpose: This article explores and analyses the arguments about the uses and consequences of ILSAs. In particular, the discourse about the assessments’ consequential validity will be discussed and evaluated.

Sources of evidence: Literature relating to the uses and consequences of large-scale assessment was analysed, with a focus on research on the consequential aspects of validity.

Main argument: Much research suggests that ILSAs have unintended consequences that affect and influence educational policy. However, the influences on educational policy are complex and interwoven: for example, it is not clear-cut whether effects such as converging curricula are, necessarily, direct consequences of large-scale assessments. Further, it is suggested that a beneficial consequence of large-scale assessments is the infrastructure they provide for studies in the social sciences, although caution must be applied to causal claims, in particular because of the cross-sectional design of the assessments.

Conclusions: The considerable literature discussing the uses and consequences of large-scale assessments tends to point out potential negative aspects of the studies. However, it is also apparent that large-scale international assessments can be a valuable resource for studying global trends and evolving systems in education. Despite the extensive debates around large-scale assessment outcomes both in the media and in educational policy arenas, empirical educational research all too often appears underused in the discussion.

17.
Interim tests are a central component of district-wide assessment systems, yet their technical quality to guide decisions (e.g., instructional) has been repeatedly questioned. In response, the study purpose was to investigate the validity of a series of English Language Arts (ELA) interim assessments in terms of dimensionality and prediction of summative test performance, based on Grade 6 student data (N = 4,651) from a larger, urban district. Factor analytic results supported modeling the interim test data in terms of a bifactor model (Gibbons & Hedeker, 1992), with items reporting moderate to high relationships to the primary dimension (i.e., ELA) and varying estimates on the secondary domains. Hierarchical multiple linear regression results indicated that primary ELA scores were the strongest predictors of summative test performance, with subscale scores not improving predictive accuracy. Findings address issues pertaining to investigating the technical quality of test data widely used in district-wide assessment systems.

18.
Assessing Writing, 2008, 13(3): 153-170
Despite the debate among writing researchers about its viability as a pedagogical tool in writing instruction [e.g., Helms-Park, R., & Stapleton, P. (2003). Questioning the importance of individualized voice in undergraduate L2 argumentative writing: An empirical study with pedagogical implications. Journal of Second Language Writing, 12 (3), 245–265; Stapleton, P. (2002). Critiquing voice as a viable pedagogical tool in L2 writing: Returning spotlight to ideas. Journal of Second Language Writing, 11 (3), 177–190], voice remains one of the constructs commonly addressed in learning standards and assessed in high-stakes English Language Arts tests. It is assumed, therefore, that the presence of a strong authorial voice plays an important role in the evaluation of the overall quality of students’ writing. In reality, however, there is a critical lack of empirical research that explores the nature and characteristics of the relationship between voice and overall writing quality. The present study builds on and extends the work of Helms-Park and Stapleton [Helms-Park, R., & Stapleton, P. (2003). Questioning the importance of individualized voice in undergraduate L2 argumentative writing: An empirical study with pedagogical implications. Journal of Second Language Writing, 12 (3), 245–265] and examines such a relationship in the context of an L1 high-stakes academic writing assessment. Results show a positive and significant relationship between voice intensity and writing quality, which contradicts what Helms-Park and Stapleton [Helms-Park, R., & Stapleton, P. (2003). Questioning the importance of individualized voice in undergraduate L2 argumentative writing: An empirical study with pedagogical implications. Journal of Second Language Writing, 12 (3), 245–265] found in the context of L2 argumentative writing. This study therefore contributes to the exploration of the role of voice in writing instruction and assessment.

19.
This case study provides evidence-based suggestions for the use of Question and Answer discussion forums for improving quality and assessment of online learning. General online discussion forums are accessible at any time to all subscribers, making it possible for some learners to update, concur with or paraphrase discussions posted earlier by their peers or the tutors. Consequently, the usefulness of such forums in individual and constructivist learning is compromised, especially when ‘correct’ responses are posted early on by participants. The Question and Answer (Q & A) version of discussion forums significantly addresses such inadequacies by restricting access to forum subscribers until they have made a post. We focus on Public Health learners’ perceptions of Q & A discussion forums implemented at Hamdan Bin Mohammed Smart University, UAE. Analyses of learners’ perception surveys of 25 participating Master of Public Health learners and 8 Bachelor of Health Administration learners reveal that the Q & A discussion forum platform offers distinct advantages over general discussion forums in synergising individual and co-operative learning in Public Health training.

20.