Similar literature
Found 20 similar articles (search time: 593 ms)
1.
2.
The NTID Writing Test was developed to assess the writing ability of postsecondary deaf students entering the National Technical Institute for the Deaf and to determine their appropriate placement into developmental writing courses. While previous research (Albertini et al., 1986; Albertini et al., 1996; Bochner, Albertini, Samar, & Metz, 1992) has shown the test to be reliable across multiple raters and a valid measure of writing ability for placement into these courses, changes in curriculum and the rater pool necessitated a new look at interrater reliability and concurrent validity. We evaluated the rating scores for 236 samples from students who entered the college in fall 2001. Using a multiprong approach, we confirmed the interrater reliability and the validity of this direct measure of assessment. The implications of continued use of this and similar tests in light of definitions of validity, local control, and the nature of writing are discussed.

3.
The current study examined efficient modes for providing standardized feedback to improve performance on an assignment for a second-year college class involving writing a brief research proposal. Two forms of standardized feedback (detailed rubric and proposal exemplars) were used in an experimental design with undergraduate students (N = 100) at three urban college campuses. Students completed a draft of a proposal as part of their course requirements and were then randomly assigned to receive a detailed rubric, proposal exemplars, or a rubric and proposal exemplars for use in revising their work. Analyses of students’ writing from first draft to second draft indicated that all three conditions led to improvements in writing that were significant and strong in terms of effect size, with the stand-alone detailed rubric leading to the greatest improvement. Follow-up focus groups with students indicated that a stand-alone rubric potentially engages greater mindfulness on the part of the student. Practical implications are discussed.

4.
The development of a framework for the content and assessment of National Curriculum science in England and Wales, following the 1988 Education Reform Act, is described, with a particular emphasis on assessment at the end of Key Stage 3 (14-year-old pupils). The University of Exeter evaluations of Key Stage 3 science assessments in 1995 and 1996 are outlined and the findings concerning the reliability and validity of the testing are presented. The views of science teachers on the impact of this assessment on teaching and learning are summarised, with particular reference to the structure, delivery and interpretation of the National Curriculum, the setting of pupils, continuity and progression, the preparation of pupils for the tests and teacher assessment.

5.
The Professional Development Inventory, an instrument used in assessment centers sponsored by the National Association of Elementary School Principals, has recently been revised from a two- to a one-day process to reduce the time for those being assessed and for those assessing. This article reports a preliminary study to determine the validity and reliability of the instrument using field-based data from the first 113 candidates assessed. The assessment consists of thirteen skills assessed in twelve different simulated activities that principals might face in their daily duties. When finished, the artifacts produced during the simulations are then analyzed by trained assessors, scores entered into a computer data file, and results returned to the participants for direction in professional development. Preliminary results as reported indicate user satisfaction with the simplified process, construct validity for the thirteen skills, and a high degree of validity and reliability for the instrument.

6.
The last decade has seen changes in the systems used in the summative assessment of musical performance at the end of compulsory schooling. One trend has been the replacement of holistic assessment by segmented assessment. The author discusses the subjectivity, reliability and musical validity of the two systems, and summarises an experiment which investigates the extent to which holistic assessment can be accounted for by means of a segmented marking system. Multiple regression analysis produces a regression equation which accounts for 71% of the variance in holistic marks produced by 29 assessors marking 10 performances. The paper concludes with a consideration of the implications of this study for assessment in musical performance and other fields.

7.
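For nationwide tests, regular evaluation is essential. The key to language test evaluation and validity research is the study of reliability, or consistency. This study uses parallel forms of the TEM4 to carry out reliability statistics and difference analyses. It not only examines the consistency between the parallel tests but also, where differences exist, locates the tests or items that differ. Locating these differences will benefit future test construction, pretesting, and test assembly.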

8.
This study was undertaken in order to evaluate the use of students as peer assessors, in collaboration with academic tutors, in the assessment of second-year viva examinations as part of a problem-based learning occupational therapy curriculum. Data were collected from three consecutive cohorts of second-year students (N = 93), and an assessment was made of the reliability of the academic tutor marking, and the reliability of peer marking against the tutor marks. Results demonstrated that overall ratings of the viva examination performances given by the panel of assessors (two peer assessors and one academic tutor) were significantly correlated. On some occasions, such as the assessment of a borderline student, the ratings given were not as closely correlated. Some modifications of the examination process are suggested in order to optimise the reliability of the outcomes, but the study results lend support for the practice of peer assessment.

9.
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation. The purpose of this study was to assess the validity, labor costs, and efficiency of comparative judgments as a potential substitute for rubric scoring. An analysis of two essay prompts revealed that comparative judgment measures were comparable to rubric scores at a level similar to that expected of two professional scorers. The comparative judgment measures correlated slightly higher than rubric scores with a multiple-choice writing test. Score reliability exceeding .80 was achieved with approximately nine judgments per response. The average judgment time was 94 seconds, which compared favorably to 119 seconds per rubric score. Practical challenges to future implementation are discussed.

10.
Prior knowledge and exemplar encoding in children's concept acquisition   Total citations: 1 (self-citations: 0, citations by others: 1)
Three experiments examined how children's domain knowledge and observation of exemplars interact during concept acquisition and how exposure to novel exemplars causes revision of such knowledge. In Experiments 1 (N = 126) and 2 (N = 64), children aged 4 to 10 years were shown exemplars of fictitious animal categories that were either unrelated to, or consistent with, their prior knowledge in 25% or 75% of presented exemplars. In Experiment 3, children (N = 290) saw fictitious animal, artifact, or unfamiliar social categories that were either consistent or inconsistent with their prior knowledge in 20%, 40%, 60%, or 80% of exemplars. In the test, children made judgments about the likely co-occurrence of features. In all experiments, prior knowledge and exemplar observation independently influenced children's categorization judgments. Utilization of prior knowledge was consistent across age and domain, but 10-year-olds were more sensitive to observed feature covariation. Training with larger categories increased the impact of observed feature covariation and decreased reliance on prior knowledge.

11.
This study investigates agreement between professional assessors and laypersons (participants) in a group procedure that draws from assessment center principles designed to evaluate candidates to teacher-education programs. Earlier studies have established the validity of this assessment procedure and indicated high interrater agreement of professionals. Evidence that participants concur with professional evaluators will further increase our confidence in the process. The study was conducted in Israel and encompassed 159 applicants to two different educational programs. Results showed high correlations between professional and participant ratings, suggesting that the interactional process provides sufficient information for lay assessors to reach judgments that agree with expert evaluations. Nonetheless, the finding that professional ratings were significantly lower than peer and self-evaluations seems to imply that participant assessors can enhance, but by no means replace, professionals. The social and economic benefits of including lay participants in the assessment process are discussed.

12.
13.
Assessing Writing, 2004, 9(3): 190-207
Specialists in the field of large-scale, high-stakes writing assessment have, over the last forty years, alternately discussed the issue of maximizing either reliability or validity in test design. Factors complicating the debate–such as Messick's (1989) expanded definition of validity, and the ethical implications of testing–are explored. An inverse relationship between the loss of reliability and the loss of validity of a test is proffered. The term Quality, in reference to writing assessment, is defined and introduced. Construct complexity is hypothesized as a factor that influences validity, reliability, and quality. It is suggested that the either/or debate concerning emphasis on reliability or validity in test design be put aside in favor of a discussion on how to maximize the quality of an assessment. Insofar as this goal can be achieved, it is necessary in the design of the test to minimize and balance the loss of both validity and reliability. The discussion draws on literature from within the field of writing assessment and from works in the fields of mathematics and information theory.

14.
In writing assessment, the inconsistency of teachers’ scorings is among the frequently reported concerns regarding the validity and the reliability of assessment. The study aimed to find out to what extent participating in a community of assessment practice (CAP) can impact the discrepancies among raters’ scorings. Adopting a one-group pretest-posttest design, patterns in the teachers’ scoring judgments were explored based on both quantitative and qualitative data. The results indicate a significant increase in the degrees of agreement in the teachers’ differential scorings, showing changes in their severity tendencies for the structural variety, lexical accuracy, organization and mechanics criteria, while their scoring judgements on the structural accuracy, task achievement, and lexical variety criteria had low levels of agreement.

15.
Feedback is central to pedagogic theory, and if feedback is to be effective, students need to engage with it and apply it at some point in the future. However, student dissatisfaction with feedback – as evidenced in the National Student Survey – suggests that there are problems which limit student engagement with feedback, such as their perception that much of their feedback is irrelevant to future assignments. This article reports on a study which sought to enhance engagement by giving students exemplar assignments annotated with feedback before submission of their final assignments. This was done by providing an online facility where students could view exemplars and post comments or questions to tutors and peers on a discussion board. The exemplar facility was highly valued by students, although there were no quantitative effects such as an increase in students’ assignment marks when compared with the previous cohort. The article reflects on possible reasons for this result and discusses ways to improve the exemplar facility, for example by facilitating dialogue between tutors and students. The article concludes with lessons learned about how to construct exemplars, and considers how exemplars might also be used within marking teams to improve consistency of marking.

16.
Book Reviews     
The increase in litigation by students who are dissatisfied with their assessment, and to a lesser extent the time and monetary costs of student appeals, makes it imperative that institutions adopt a robust assessment strategy. The concern of consumers with respect to professional services offered by students after graduation is also an issue. This paper examines issues around the ‘borderline method’ of standard setting, using regression analysis. The paper will present data from two cohorts of students, and will examine the benefits and problems associated with this method. Increasingly academic institutions are being required to improve the validity of assessment processes; often this is at the expense of reliability. The new assessment procedures often have different assessors, have practical aspects which cannot be replicated across the cohort, and therefore raise issues with respect to the robustness of the comparative student grading mechanism. This issue has been particularly important in the field of medical education for a number of years, with medical students in the latter stages of their courses being required to demonstrate competence in a variety of different simulated clinical activities with different patients, in front of different assessors in different hospitals and on different days. This poses serious questions about the robustness of setting the pass–fail boundary and to a lesser extent the honours boundaries.

17.
As current procedures for teacher assessment are often based on non-standardized, qualitative information derived from multiple sources, the overall validity of the assessment depends heavily on the judgement processes of the assessors. Because it is of great importance for assessors to be aware of their own judgement processes and of the possible threats to validity in these processes, investigating assessors’ perceptions is of vital significance. In the present study, the perceptions of 22 assessors who judged a student teacher pair-wise using a specific assessment procedure were explored using semi-structured interviews. A qualitative analysis of the individual assessors’ perceptions with regard to the essential judgement processes of consideration of evidence and combination of evidence to attain an overall judgement resulted in an overview of successful strategies and threats underlying a valid assessment process. General implications for ensuring the validity of the assessment process and the preparation of assessors are discussed.

18.
Categorization theory helps make sense of writing placement. As developed by psychologists, social scientists, and language analysts, current theory suggests that human acts of categorization fall roughly under three models: classical, prototypical, and exemplar. Results of actual holistic scoring, however, show that readers behave as if the categories (scale points) had prototypical structure. In many ways, prototypical and exemplar categorization support a different writing-placement method, one more efficient than the holistic, a two-tiered sequence of quick, blind, and single readings followed by careful, informed, and multiple readings. The procedure is flexible and admits wide variations. Categorization theory itself, and standard research techniques associated with it, can be productive in inquiry into writing assessment.

19.
Vahid Aryadoust. Educational Psychology, 2016, 36(10): 1742-1770
This study sought to examine the development of paragraph writing skills of 116 English as a second language university students over the course of 12 weeks and the relationship between the linguistic features of students’ written texts as measured by Coh-Metrix – a computational system for estimating textual features such as cohesion and coherence – and the scores assigned by human raters. The raters’ reliability was investigated using many-facet Rasch measurement (MFRM); the growth of students’ paragraph writing skills was explored using a factor-of-curves latent growth model (LGM); and the relationships between changes in linguistic features and writing scores across time were examined by path modelling. MFRM analysis indicates that despite several misfits, students’ and raters’ performances and the scale’s functionality conformed to the expectations of MFRM, thus providing evidence of psychometric validity for the assessments. LGM shows that students’ paragraph writing skills develop steadily during the course. The Coh-Metrix indices have more predictive power before and after the course than during it, suggesting that Coh-Metrix may struggle to discriminate between some ability levels. Whether a Coh-Metrix index gains or loses predictive power over time is argued to be partly a function of whether raters maintain or lose sensitivity to the linguistic feature measured by that index in their own assessment as the course progresses.

20.
We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fixed-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fixed-item tonal memory test constructed from items in the adaptive test pool and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fixed-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fixed-item tests, and it yielded higher levels of reliability and concurrent validity than the fixed-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items.
