Similar literature
 20 similar documents found.
1.
An Approach for Evaluating the Technical Quality of Interim Assessments   Total citations: 1, self-citations: 1, citations by others: 0
Increasing numbers of schools and districts have expressed interest in interim assessment systems to prepare for summative assessments and to improve teaching and learning. However, with so many commercial interim assessments available, schools and districts are struggling to determine which interim assessment is most appropriate to their needs. Unfortunately, there is little research-based guidance to help schools and districts to make the right choice about how to spend their money. Because we realize the urgency of developing criteria that can describe or evaluate the quality of interim assessments, this article presents the results of an initial attempt to create an instrument that school and district educators could use to evaluate the quality and usefulness of the interim assessment. The instrument is designed for use by state and district leaders to help them select an appropriate interim assessment system for their needs, but it could also be used by test vendors looking to evaluate and improve their own systems and by researchers engaged in studies of interim assessment use.

2.
This study examined the role of formative and summative assessment in instructional video games on student learning and engagement. A 2 (formative feedback: present vs absent) × 2 (summative feedback: present vs absent) factorial design with an offset control (recorded lecture) was conducted to explore the impacts of assessment in video games. A total of 172 undergraduates were randomly assigned to one of four instructional video game conditions or the control. Results found that knowledge significantly increased from the pretest for players in all game conditions. Participants in summative assessment conditions learned more than players without summative assessment. In terms of engagement outcomes, formative assessment conditions did not significantly produce better learning engagement outcomes than conditions without formative assessment. However, summative assessment conditions were associated with higher temporal disassociation than non-summative conditions. Implications for future instructional video game development and testing are discussed in the paper.

3.
Understanding how situational features of assessment tasks impact reasoning is important for many educational pursuits, notably the selection of curricular examples to illustrate phenomena, the design of formative and summative assessment items, and determination of whether instruction has fostered the development of abstract schemas divorced from particular instances. The goal of our study was to employ an experimental research design to quantify the degree to which situational features impact inferences about participants’ understanding of Mendelian genetics. Two participant samples from different educational levels and cultural backgrounds (high school, n = 480; university, n = 444; Germany and USA) were used to test for context effects. A multi-matrix test design was employed, and item packets differing in situational features (e.g., plant, animal, human, fictitious) were randomly distributed to participants in the two samples. Rasch analyses of participant scores from both samples produced good item fit, person reliability, and item reliability and indicated that the university sample displayed stronger performance on the items compared to the high school sample. We found, surprisingly, that in both samples, no significant differences in performance occurred among the animal, plant, and human item contexts, or between the fictitious and “real” item contexts. In the university sample, we were also able to test for differences in performance between genders, among ethnic groups, and by prior biology coursework. None of these factors had a meaningful impact upon performance or context effects. Thus some, but not all, types of genetics problem solving or item formats are impacted by situational features.

4.
This study examined the effectiveness and influence on validity of a computer-based pop-up English glossary accommodation for English learners (ELs) in grades 3 and 7. In a randomized controlled trial, we administered pop-up English glossaries with audio to students taking statewide accountability English language arts (ELA) and mathematics assessments. As is typically found, EL students exhibited lower achievement scores than non-EL students in all portions of the test. The pop-up glossaries provided inconsistent benefit for EL students. There was some evidence that the pop-up English glossaries had a minimal inhibitory effect for 3rd-grade students on both the ELA and mathematics assessments. Furthermore, 7th-grade ELs also showed slightly inhibited performance when using the pop-up glossary on the mathematics assessment. However, 7th-grade EL students had a performance benefit when using the pop-up glossary on the ELA assessment. We discuss how increased cognitive load placed on younger students may play a role in diminishing performance when using pop-up glossaries. We explore potential explanations for the different outcomes between mathematics and ELA in grade 7.

5.
Local assessment systems are being marketed as formative, benchmark, predictive, and a host of other terms. Many so-called formative assessments are not at all similar to the types of assessments and strategies studied by Black and Wiliam (1998) but instead are interim assessments. In this article, we clarify the definition and uses of interim assessments and argue that they can be an important piece of a comprehensive assessment system that includes formative, interim, and summative assessments. Interim assessments are given on a larger scale than formative assessments, have less flexibility, and are aggregated to the school or district level to help inform policy. Interim assessments are driven by their purposes, which fall into the categories of instructional, evaluative, or predictive. Our intent is to provide a specific definition for these "interim assessments" and to develop a framework that district and state leaders can use to evaluate these systems for purchase or development. The discussion lays out some concerns with the current state of these assessments as well as hopes for future directions and suggestions for further research.

6.
Peer assessment is often used for formative learning, but few studies have examined the validity of group-based peer assessment for the summative evaluation of course assignments. The present study contributes to the literature by using online technology (the course management system Moodle) to implement structured, summative peer review based on an anchored rubric in an ecological statistics course taught to graduate students. We found that grade discrepancies between students and the instructor were fairly common (60% of assignments), relatively low in value (mean = 3.3 ± 2.5% on assignments that had discrepancies) and proportionally higher for criteria related to interpretation of statistical results and code quality and organisation than for criteria related to the successful completion of analysis or instructional tasks (e.g. fitting particular statistical methods, de-identification of one’s submission). Students reported that the peer assessment process increased their exposure to alternative ways of approaching statistical and computational problem-solving, but there were concerns raised about the fairness of the process and the effectiveness of the group component. We conclude with some recommendations for implementing peer assessment to maximise student learning and satisfaction.

7.
Teachers require specialised assessment knowledge and skills in order to effectively assess student learning. These knowledge and skills develop over time through ongoing teacher learning and experiences. The first part of this paper presents a Summative Assessment Literacy Rubric (SALRubric) constructed to track the development of secondary science teachers’ summative assessment literacy. The analytic rubric consists of 10 dimensions spread across three categories drawn from the literature and context-specific empirical evidence: knowledge of assessment, understanding the context for assessment, and recognising the impact of assessment. The second part of this paper applies the SALRubric in a case study to explore the development of summative assessment literacy of New Zealand secondary science pre-service and novice teachers. An increasing sophistication in these teachers’ summative assessment literacy was evident over 20 months albeit in a nuanced manner for individual teachers. The rubric was a very useful tool for evaluating and documenting shifts in teachers’ summative assessment literacy over time. Implications of the use of SALRubric are discussed in terms of summative assessment literacy practice and development.

8.
We address the reliability of scores obtained on the summative performance assessments during the pilot year of our research. We discuss the advantages of generalizability theory over classical test theory for estimating the reliability of scores on summative performance assessments. Generalizability theory was used as the framework because of the flexibility this approach provides for examining sources of inconsistency within a complex assessment. The two major sources of inconsistency in scores considered in this study were raters and agencies (teachers' rating vs. researchers' rating). Overall, results showed that the inconsistency in scores attributable to raters and agencies was relatively small. Suggestions for improving consistency in the subsequent years of our research are provided.
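As a rough illustration of the kind of analysis this abstract describes, the sketch below estimates variance components for a simplified one-facet design in which every person is scored by every rater. This is an assumption made for brevity: the study itself considered two sources of inconsistency (raters and agencies), and the abstract does not give its design or data. The function name `g_study` and the scores are hypothetical.

```python
def g_study(scores):
    """One-facet persons x raters G-study (fully crossed, one score per cell).

    scores[p][r] is the score that rater r gave to person p.
    Returns (var_person, var_rater, var_residual, g_coefficient), where the
    G coefficient is the relative one for the mean over the raters used.
    """
    n_p = len(scores)
    n_r = len(scores[0]) if n_p else 0
    if n_p < 2 or n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a complete table with >= 2 persons and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Sums of squares for the two main effects and the residual.
    ss_person = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_rater

    ms_person = ss_person / (n_p - 1)
    ms_rater = ss_rater / (n_r - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero.
    var_resid = max(ms_resid, 0.0)
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_p, 0.0)

    denom = var_person + var_resid / n_r
    g_coef = var_person / denom if denom > 0 else float("nan")
    return var_person, var_rater, var_resid, g_coef


# Hypothetical ratings, invented for illustration (4 persons x 2 raters).
ratings = [[4, 5], [6, 6], [8, 9], [5, 7]]
var_p, var_r, var_e, g = g_study(ratings)
print(f"person={var_p:.3f} rater={var_r:.3f} residual={var_e:.3f} G={g:.3f}")
```

A small rater component relative to the person component would correspond to the abstract's finding that inconsistency attributable to raters was relatively small. Adding a second facet such as agency would need further variance components that this sketch does not estimate.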

9.
Applied Measurement in Education, 2013, 26(3): 281-299
The growing use of computers for test delivery, along with increased interest in performance assessments, has motivated test developers to develop automated systems for scoring complex constructed-response assessment formats. In this article, we add to the available information describing the performance of such automated scoring systems by reporting on generalizability analyses of expert ratings and computer-produced scores for a computer-delivered performance assessment of physicians' patient management skills. Two different automated scoring systems were examined. These automated systems produced scores that were approximately as generalizable as those produced by expert raters. Additional analyses also suggested that the traits assessed by the expert raters and the automated scoring systems were highly related (i.e., true correlations between test forms, across scoring methods, were approximately 1.0). In the appendix, we discuss methods for estimating this correlation, using ratings and scores produced by an automated system from a single test form.

10.
This chapter addresses the complex issues of a district-wide school information system design and development in Hong Kong. The SAMS (School Administration & Management System) is being implemented by the government in about 1,500 schools. The focus of the chapter is on the system design and development strategy and the related outcome. Evidence that a design and development strategy without genuine user participation produces a low quality system and low user acceptance is provided. Besides technical issues there are political factors that need to be considered in designing and developing a district-wide school information system. Moreover, an officially intended design and development strategy is one thing; the actual strategy is another. Furthermore, the SAMS project shows that high quality systems cannot be developed using a strategy that ignores a detailed study of school organizational information requirements.

11.
This article focuses on timed tests and specifically on whether increased time enhances test performance. Three courses during the Winter 2015 term (quizzes n = 573) and three courses over the Spring 2015 term (quizzes n = 600) comprised this sample. Students were given the same tests, but the experimental group (Spring 2015) was given 50% more time than the control group. The results indicate that more time on tests did not enhance student performance in terms of higher scores.

Much attention has been given to student assessment of learning in the online classroom. One such method of measurement is online tests, quizzes, and exams. The focus of this research is to determine whether test scores would improve if students were allowed more time on tests.


12.
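Research on Web-Based Formative Assessment and Summative Examination   Total citations: 2, self-citations: 0, citations by others: 2
Since its pilot launch in 2005, web-based course assessment at the China Central Radio and TV University has steadily expanded in scale. It adopts an integrated design of formative assessment and summative examination, and its assessment methods and outcomes differ in certain respects from those of traditional assessment. Using linear regression, we studied the relationship between formative assessment scores and summative examination scores. We found a positive correlation between web-based formative assessment scores and summative examination scores, showing the same regularity and validity as traditional assessment methods.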
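The linear regression mentioned in this abstract can be sketched as follows. This is a minimal ordinary least squares fit of summative scores on formative scores, assuming one score pair per student. The abstract does not report its model specification or data, so the function name `fit_line` and the scores are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a line to be fitted")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical scores, invented for illustration only (not data from the study).
formative = [60, 70, 80, 90]
summative = [55, 65, 70, 90]

slope, intercept, r = fit_line(formative, summative)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A positive slope, and equivalently a positive `r`, is what a positive relationship between the two sets of scores would look like in such a fit.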

13.
Every Classroom, Every Day (ECED) is a set of instructional improvement interventions designed to increase student achievement in math and English/language arts (ELA). ECED includes three primary components: (a) systematic classroom observations by school leaders, (b) intensive professional development and support for math teachers and instructional leaders to reorganize math instruction, assessment, and grading around mastery of benchmarks, and (c) a structured literacy curriculum that supplements traditional English courses, with accompanying professional development and support for teachers surrounding its use. The present study is a two-year trial, conducted by independent researchers, which employed a school-randomized design and included 20 high schools (10 treatment; 10 control) in five districts in four states. The students were ethnically diverse and most were eligible for free or reduced-price lunch. Results provided evidence that ECED improved scores on standardized tests of math achievement, but not standardized tests of ELA achievement. Findings are discussed in terms of differences between math and ELA and of implications for future large-scale school-randomized trials.

14.
This longitudinal study examines the relationship between students' knowledge-in-use performance and their performance on third-party designed summative tests within a coherent and equitable learning environment. Focusing on third-grade students across three consecutive project-based learning (PBL) units aligned with the Next Generation Science Standards (NGSS), the study includes 1067 participants from 23 schools in a Great Lakes state. Two-level hierarchical linear modeling estimates the effects of post-unit assessments on end-of-year summative tests. Results indicate that post-unit assessment performances predict NGSS-aligned summative test performance. Students experiencing more PBL units demonstrate greater gains on the summative test, with predictions not favoring students from diverse backgrounds. This study underscores the importance of coherence, equity, and the PBL approach in promoting knowledge-in-use and science achievement. A systematically coherent PBL environment across multiple units facilitates the development of students' knowledge-in-use, highlighting the significance of designing science and engineering practices (SEPs) and crosscutting concepts coherently and progressively, with intentional revisitation of disciplinary core ideas (DCIs). The study also investigates how the PBL approach fosters equitable learning environments for diverse demographic groups, offering equitable opportunities through equity-oriented design. Contributions include a coherent assessment system that tracks and supports learning aligned with NGSS, emphasizing the predictive power of post-unit assessments, continuous monitoring and tracking. The implications of context similarity and optimal performance expectations within units are discussed. Findings inform educators, administrators, and policymakers about the benefits of NGSS-aligned PBL systems and the need for coherent and equitable learning and assessment systems supporting knowledge-in-use development and equitable opportunities for all learners.

15.
Teaching a large class can present real challenges in design, management and standardisation of assessment practices. One of the main dilemmas for university teachers is how to implement effective formative assessment practices with accompanying high-quality feedback consistently over time with large classroom groups. This article reports on how elements of formative practices can be implemented as part of summative assessment in very large undergraduate cohorts (n = 1500 in one semester), studying in different modes (on- and off-campus), with multiple markers, and under common cost and time constraints. Design features implemented include the use of exemplars, rubrics and audio feedback. The article draws on the reflections of the leading teacher, and argues that, for summative assessment to benefit learners, it should contain formative assessment elements. The teaching practices utilised in the case study provide some means to resolve the tensions between formative assessment and summative assessment that may be more generally applicable.

16.
Journal of Education for Teaching, 2012, 38(1): 61-75
Teachers' thinking about four conceptions of teaching (i.e., apprenticeship-developmental, nurturing, social reform, and transmission) was captured using the Teaching Perspectives Inventory (TPI). New Zealand and Queensland have very similar teaching-related policies and practices, but differences in assessment policies and practices are expected to influence teachers' conceptions of teaching. Results from two surveys (New Zealand primary (n = 241) and Queensland primary (n = 784) and secondary (n = 614) teachers) found acceptably fitting models. TPI models were not invariant between primary and secondary teachers in Queensland, while the models for primary teachers in Queensland and New Zealand were partially invariant. There were only small differences in mean perspectives scores, except for transmission, which elicited large differences.

17.
We examine how teacher leaders (TLs), working in a low-income urban elementary school, supported their colleagues to learn how to collect quality formative data and to discuss it in collaborative conversations in order to make their students’ learning visible. The TLs faced challenges reflecting consequences resulting from the district’s high stakes accountability policies restricting teachers’ agency with instructional decision-making and limiting their definitions of data as summative test scores. We document how the TLs worked to reframe teachers’ understanding of data to include evidence of student thinking and supported their colleagues to reclaim teaching as professional versus technical work.

18.
This study compared short-form constructed responses evaluated by both human raters and machine scoring algorithms. The context was a public competition in which both public competitors and commercial vendors vied to develop machine scoring algorithms that would match or exceed the performance of operational human raters in a summative high-stakes testing environment. Data (N = 25,683) were drawn from three different states, employed 10 different prompts, and were drawn from two different secondary grade levels. Samples ranging in size from 2,130 to 2,999 were randomly selected from the data sets provided by the states and then randomly divided into three sets: a training set, a test set, and a validation set. Machine performance on all of the agreement measures failed to match that of the human raters. The current study concluded with recommendations on steps that might improve machine-scoring algorithms before they can be used in any operational way.

19.
To the author’s knowledge, this is the first Australian study to empirically compare the use of a multiple-choice questionnaire (MCQ) with the use of a written assignment for interim, summative law school assessment. This study also surveyed the same student sample as to what types of assessments are preferred and why. In total, 182 undergraduate property law students participated in this study. Results showed that scores for the MCQ (assessing five topics) and assignment (assessing one topic) followed a similar distribution. This indicates that an MCQ does not necessarily skew students towards higher grades than an assignment. Results also showed significant but low correlations of test scores across instruments. When asked which instrument best assessed their knowledge of property law, students expressed a strong preference for an assignment over an MCQ or examination. Comments revealed a strong belief that, because lawyers write, law schools must assess legal writing – a skill not captured by MCQs. This study is important as many Australian law schools face increasing marking loads due to higher student numbers and compulsory mid-term assessments. This article endorses the use of MCQs but only as part of a diverse suite of law school assessment.

20.
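Under the "Double Reduction" policy, primary and secondary schools must sharply cut the number of examinations and strictly control the amount of homework, while still ensuring the high-quality development of education and teaching and providing a high-quality education that satisfies parents. They must therefore step up their exploration of reforms to the evaluation of students' academic performance. Feasible reform strategies include: simplifying evaluation design to raise evaluation efficiency; improving outcome evaluation by deepening the reform of test-item design and exploring performance assessment; and strengthening process evaluation by integrating evaluation with learning and using evaluation to promote learning. Only with the supporting reform of academic evaluation can low-burden, high-quality education truly be achieved.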
