Similar Literature
20 similar documents found (search time: 640 ms)
1.
Automated Scoring of Chinese Essays Using Latent Semantic Analysis   Cited by: 3 (self-citations: 0, other citations: 3)
This is the first study to apply latent semantic analysis (LSA) to the computer-based automated scoring of Chinese essays. First, 202 senior high school essays were scored by two human raters, whose scores correlated at 0.62. LSA was then used to evaluate the essays: a content score was obtained that correlated at 0.47 with the human content scores, and regressing the total score on this content score yielded a coefficient of determination of 0.3, with the regressed total scores correlating at 0.55 with the human total scores. The study shows that LSA plays an important role in the automated scoring of Chinese essays. Further research should identify additional indicators and combine LSA with other methods to improve scoring performance.
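
The abstract gives only the pipeline (an LSA-derived content score regressed onto the human total score), not the implementation. The following is a minimal sketch of one common way to realize it; the toy segmented essays, the choice of reference essays, and the component count are assumptions rather than the study's materials.

```python
# Minimal sketch of an LSA-based essay-scoring pipeline (a hypothetical
# reconstruction, not the study's code). The content score of an essay is its
# cosine similarity, in LSA space, to the centroid of reference (high-quality)
# essays; the total score is then regressed on that content score.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression
from sklearn.metrics.pairwise import cosine_similarity

def lsa_content_scores(essays, reference_idx, n_components=2):
    """Project pre-segmented essays into an LSA space and score each one by
    similarity to the centroid of the reference essays."""
    tfidf = TfidfVectorizer()              # essays must be pre-segmented (space-delimited)
    X = tfidf.fit_transform(essays)
    Z = TruncatedSVD(n_components=n_components, random_state=0).fit_transform(X)
    centroid = Z[reference_idx].mean(axis=0, keepdims=True)
    return cosine_similarity(Z, centroid).ravel()

# toy usage: four segmented essays with human total scores
essays = ["我 喜欢 读书 因为 读书 使 人 进步", "读书 使 人 进步 我 每天 读书",
          "今天 天气 很 好 我们 去 公园", "公园 里 有 很多 花 和 树"]
human_total = np.array([85.0, 82.0, 60.0, 58.0])

content = lsa_content_scores(essays, reference_idx=[0, 1]).reshape(-1, 1)
reg = LinearRegression().fit(content, human_total)
print("R^2:", round(reg.score(content, human_total), 2))
print("predicted totals:", reg.predict(content).round(1))
```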

2.
ABSTRACT

Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K–12 large-scale assessment. In this study, human and automated scores on essays written by college students with and without learning disabilities and/or attention deficit hyperactivity disorder were compared, using a nationwide (U.S.) sample of prospective graduate students taking the revised Graduate Record Examination. The findings are that, on average, human raters and the automated scoring engine assigned similar essay scores for all groups, despite average differences among groups with respect to essay length and spelling errors.

3.
ABSTRACT

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are used to calibrate and test AES systems; (b) which human raters provided the scores on these essays; and (c) given that multiple human raters are generally used for this purpose, which human scores should ultimately be used when there are score disagreements? This article provides commentary on the first two questions and an empirical investigation into the third question. The authors suggest that addressing these three questions strengthens the scoring component of the validity argument for any assessment that includes AES scoring.

4.
The technical strengths of automated essay scoring systems provide a good platform for innovating and reforming the English writing instruction model. This study designed and implemented an English writing instruction model based on an automated essay scoring system, covering the design of a pre-writing stage, a first-draft and peer-review stage, a revision and automated-review stage, and an in-class commentary and final-draft stage. A one-year writing instruction experiment showed that the new model prompts students to write and to keep writing regularly, stimulates their interest in writing, fosters independent writing ability, and raises their English writing proficiency.

5.
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure, and observed differences could be due to random sampling of items and/or responses in the validation sets. Any statistical hypothesis test of the differences in rankings needs to be appropriate for use with rater statistics and to adjust for multiple comparisons. This study considered different statistical methods to evaluate differences in performance across multiple raters and items. These methods are illustrated using data from the 2012 Automated Scoring Assessment Prize competitions. Using average rankings to test for significant differences in performance between automated and human raters, the findings show that, for essays, most automated raters did not differ statistically significantly from human-to-human inter-rater agreement, but they did perform differently on short-answer items. Differences in average rankings between most automated raters were not statistically significant, even when their observed performance differed substantially.
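
As a rough illustration of the rank-based comparison described above, the sketch below scores several simulated automated raters against human scores with quadratic weighted kappa on each item, ranks the raters within each item, and applies a Friedman test to the per-item statistics. The data, rater count, and score scale are invented; the competition analyses are not reproduced.

```python
# Hypothetical sketch of ranking automated raters across items and testing
# whether their average ranks differ (not the study's actual procedure or data).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_items, n_raters, n_resp = 8, 4, 200
human = rng.integers(0, 4, size=(n_items, n_resp))                 # human scores on a 0-3 scale
noise = rng.integers(-1, 2, size=(n_raters, n_items, n_resp))
raters = np.clip(human[None, :, :] + noise, 0, 3)                  # simulated automated raters

# quadratic weighted kappa of each rater against the human scores, per item
qwk = np.array([[cohen_kappa_score(human[i], raters[r, i], weights="quadratic")
                 for i in range(n_items)] for r in range(n_raters)])

ranks = np.vstack([rankdata(-qwk[:, i]) for i in range(n_items)])  # rank raters within each item
print("average rank per rater:", ranks.mean(axis=0))

stat, p = friedmanchisquare(*[qwk[r] for r in range(n_raters)])    # repeated-measures rank test
print(f"Friedman chi^2 = {stat:.2f}, p = {p:.3f}")
```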

6.
Discussion of the relationship between human and machine essay evaluation has prompted research on automated essay evaluation systems to shift from automated essay scoring toward computer-assisted writing. The introduction and application of a large number of computer-assisted writing tools have made automated essay evaluation systems increasingly vital and promising in writing instruction.

7.
Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource-intensive to score. In this study we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed-response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we have been able to develop computer scoring models that achieve substantial to almost perfect agreement between human-assigned and computer-predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher-level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or the provision of sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. However, this analytic approach was found to be differentially biased on some items when scoring responses from English learner (EL) students as compared to responses from non-EL students. Differences in severity between human and computer scores for EL students under the two approaches are explored, and potential sources of bias in automated scoring are discussed.
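
The analytic approach the study found more effective (scoring argument components separately and combining them into a holistic score) can be sketched roughly as follows; the component names, toy responses, and labels are placeholders, not the study's corpus or models.

```python
# Hypothetical sketch of analytic (component-based) scoring: one text classifier
# per argument component, with component predictions summed into a holistic score.
# Responses, labels, and component names are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

responses = [
    "Plants grew taller in the light because light drives photosynthesis, as the data show.",
    "The plant got bigger.",
    "Light increased growth; the table shows taller plants in the bright condition.",
    "I think plants like the sun.",
]
labels = {                       # 1 = component present, 0 = absent (toy annotations)
    "accurate_claim":      [1, 0, 1, 0],
    "sufficient_evidence": [1, 0, 1, 0],
}

component_models = {}
for component, y in labels.items():
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    component_models[component] = clf.fit(responses, y)

def holistic_score(text):
    """Sum the analytic component predictions into a single holistic score."""
    return sum(int(m.predict([text])[0]) for m in component_models.values())

print(holistic_score("Light causes growth, and the measurements support this claim."))
```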

8.
This article presents a novel method, the Complex Dynamics Essay Scorer (CDES), for automated essay scoring using complex network features. Texts produced by college students in China were represented as scale-free networks (e.g., a word adjacency model) from which typical network features, such as in-/out-degrees, the clustering coefficient (CC), and dynamic network measures, were obtained. The CDES integrates the classical concepts of network feature representation and essay score series variation. Several experiments indicated that the network measures distinguish essays of different quality and demonstrate that complex networks can be developed for autoscoring tasks. The average agreement of the CDES and human rater scores was 86.5%, and the average Pearson correlation was .77. The results indicate that the CDES produced functional complex systems and autoscored Chinese essays in a manner consistent with human raters. Our research suggests potential applications in other areas of educational assessment.
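
A rough, hypothetical reconstruction of the word-adjacency representation is sketched below. It is not the CDES implementation; it only derives the degree and clustering-coefficient features named in the abstract from a directed adjacency graph built over a pre-segmented text.

```python
# Hypothetical word-adjacency network features (not the CDES implementation):
# link consecutive words in a directed graph, then read off degree and
# clustering statistics of the resulting network.
import networkx as nx

def adjacency_network_features(tokens):
    g = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):        # edge from each word to its successor
        g.add_edge(a, b)
    in_deg = [d for _, d in g.in_degree()]
    out_deg = [d for _, d in g.out_degree()]
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "mean_in_degree": sum(in_deg) / len(in_deg),
        "mean_out_degree": sum(out_deg) / len(out_deg),
        # clustering coefficient computed on the undirected projection
        "clustering_coefficient": nx.average_clustering(g.to_undirected()),
    }

# toy usage with a pre-segmented (space-delimited) Chinese sentence
print(adjacency_network_features("我 喜欢 读书 因为 读书 使 人 进步".split()))
```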

9.
ABSTRACT

This study investigates the role of automated scoring and feedback in supporting students' construction of written scientific arguments while learning about factors that affect climate change in the classroom. The automated scoring and feedback technology was integrated into an online module. Students' written scientific argumentation occurred when they responded to structured argumentation prompts. After submitting the open-ended responses, students received scores generated by a scoring engine and written feedback associated with the scores in real time. Using log data that recorded argumentation scores as well as argument submission and revision activities, we answer three research questions: first, how students behaved after receiving the feedback; second, whether and how students' revisions improved their argumentation scores; and third, whether item difficulties shifted with the availability of the automated feedback. Results showed that the majority of students (77%) made revisions after receiving the feedback, and students with higher initial scores were more likely to revise their responses. Students who revised had significantly higher final scores than those who did not, and each revision was associated with an average increase of 0.55 on the final scores. Analysis of item difficulty shifts showed that written scientific argumentation became easier after students used the automated feedback.
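
A toy sketch of the kind of log-data analysis implied above follows. The column names and numbers are invented; only the general approach (regressing final scores on revision counts to estimate the average gain per revision) is illustrated, not the study's actual estimate.

```python
# Toy sketch of analyzing submission/revision logs (hypothetical columns and
# numbers): estimate the average score gain per revision by regressing final
# argumentation scores on the number of revisions each student made.
import numpy as np
import pandas as pd

log = pd.DataFrame({
    "student_id":    [1, 2, 3, 4, 5, 6],
    "n_revisions":   [0, 1, 1, 2, 3, 0],
    "initial_score": [2, 2, 3, 1, 2, 3],
    "final_score":   [2, 3, 4, 2, 4, 3],
})

slope, intercept = np.polyfit(log["n_revisions"], log["final_score"], deg=1)
print(f"average gain per revision ≈ {slope:.2f} points")
print(f"share of students who revised: {(log['n_revisions'] > 0).mean():.0%}")
```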

10.
Automated essay scoring programs are becoming more common and more technically advanced. They provoke strong reactions from both their advocates and their detractors. Arguments tend to fall into two categories: technical and principled. This paper argues that since technical difficulties will be overcome with time, the debate ought to be held in terms of the principles. A thought experiment, based on a technically perfect Automated Essay Scorer, is proposed in order to explore the moral questions related to this topic, such as whether students deserve to have their work read by a human. It concludes that affect is an important component both of writing and of the debate, but that if the move to automated scoring stops being an ‘all or nothing’ debate, then many of the objections on principle will be obviated.

11.
Students in tertiary education are often faced with the prospect of writing an essay on a topic they know nothing about in advance. In UK distance-learning institutions, essays are a common method of assessment, and specified course texts remain the main sources of information available to students. How do students use a source text to construct an essay? The present paper presents a methodology for mapping the source text onto the finished student essay. The underlying assumption is that students use a form of imitative problem solving when faced with the complex task of writing an essay. Twenty-two essays written by Open University students in the UK, based on three different questions, were analysed in terms of the order in which novel concepts were introduced and the extent to which this order mirrored that of the source textbook. Correlations were then carried out between the structure of each essay, the structure of the source text, and the eventual grade awarded. The average correlation across the three questions and their source texts was 0.8, with some individual essays having a correlation of 0.98, demonstrating that the students were closely imitating the argument structure of the source text.
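
One way to operationalize the mapping described above is to correlate the order in which shared concepts are introduced in the source text with their order in the essay. The sketch below does this with a rank correlation over invented concept lists; it is an assumption about the method, not the paper's exact procedure.

```python
# Hypothetical sketch: rank-correlate the order in which concepts appear in the
# source chapter with the order in which they appear in a student essay.
from scipy.stats import spearmanr

source_order = ["supply", "demand", "equilibrium", "elasticity", "surplus"]
essay_order  = ["supply", "demand", "elasticity", "equilibrium", "surplus"]

# position of each shared concept in the essay, listed in source-text order
positions = [essay_order.index(c) for c in source_order if c in essay_order]
rho, p = spearmanr(range(len(positions)), positions)
print(f"order correlation rho = {rho:.2f} (p = {p:.3f})")
```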

12.
Analysis of questionnaire responses describing the writing processes associated with a total of 715 essays (term papers) produced by undergraduate psychology students identified four distinct patterns of writing behaviour: a minimal-drafting strategy which typically involved the production of one or at most two drafts; an outline-and-develop strategy which entailed content development both prior to and during drafting; a detailed-planning strategy which involved the use of content-development methods (mindmapping, brainstorming or rough drafting) in addition to outlining; and a 'think-then-do' strategy which, unlike the other three strategies, did not involve the production of a written outline. The minimal-drafting and outline-and-develop strategies appeared to produce the poorest results, with the latter being more time consuming. The detailed-planning and 'think-then-do' strategies both appeared to result in better quality essays, although differences were small. We analysed the writing strategies for a subset of these essays produced by a cohort of 48 students followed through the three years of their degree course. We found some evidence of within-student consistency in strategy use, with on average two out of every three of a student's essays being written using the same type of strategy. There was no evidence of systematic change in writing strategy from year to year.

13.
赵慧  唐建敏 《教育技术导刊》2019,18(11):168-171
Developing English writing ability has long been both a focus and a difficulty of college English teaching. Automated Essay Scoring (AES) technology is now widely used, but how to integrate it effectively with college English writing instruction still requires further study. In view of this, based on the current state of college English writing instruction in China and the characteristics of second language (L2) learning, and building on an analysis of the principles underlying AES technology, this paper examines models of college English writing instruction. The results show that Chinese college English writing instruction should combine AES technology with the characteristics of L2 learning to build an AES-based teaching model that stimulates students' interest in learning and improves their English writing ability.

14.
Computer scoring of student written essays about an inquiry topic can be used to diagnose student progress both to alert teachers to struggling students and to generate automated guidance. We identify promising ways for teachers to add value to automated guidance to improve student learning. Three teachers from two schools and their 386 students participated. We draw on evidence from student progress, observations of how teachers interact with students, and reactions of teachers. The findings suggest that alerts for teachers prompted rich teacher–student conversations about energy in photosynthesis. In one school, the combination of the automated guidance plus teacher guidance was more effective for student science learning than two rounds of personalized, automated guidance. In the other school, both approaches resulted in equal learning gains. These findings suggest optimal combinations of automated guidance and teacher guidance to support students to revise explanations during inquiry and build integrated understanding of science.

15.
This study explored the use of machine learning to automatically evaluate the accuracy of students’ written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
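
The human-machine correspondence check described above is essentially an inter-rater agreement computation. A minimal sketch with invented scores is shown below, using unweighted Cohen's kappa and the 0.80 benchmark mentioned in the abstract.

```python
# Minimal sketch of checking human-machine scoring correspondence with Cohen's
# kappa (invented scores, not the study's corpus or the SIDE models).
from sklearn.metrics import cohen_kappa_score

human_scores   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # expert ratings of explanations
machine_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]   # model predictions for the same responses

kappa = cohen_kappa_score(human_scores, machine_scores)
print(f"kappa = {kappa:.2f} ->",
      "meets the 0.80 benchmark" if kappa > 0.80 else "below the 0.80 benchmark")
```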

16.
History courses at The University of Auckland are typically assessed at two or three moments during a semester. The methods used normally employ two essays and a written examination answering questions set by the lecturer. This study describes an assessment innovation in 2008 that expanded both the frequency and variety of activities completed by 182 undergraduates taking a course on the history of African-American freedom struggles. All week-by-week tutorial assignments were collected for textual analysis to see if students were moving beyond the recollection and regurgitation of facts (surface learning) and instead were dealing with the deeper historical issues. The quality of student work, coupled with our own classroom observations, indicates that innovative assessment methods at regular moments during the semester made a positive difference to the student learning experience.

17.
Automated Essay Scoring (AES) has garnered a great deal of attention from the rhetoric and composition/writing studies community since the Educational Testing Service began using e-rater® and the Criterion® Online Writing Evaluation Service as products in scoring writing tests, and most of the responses have been negative. While the criticisms leveled at AES are reasonable, the more important, underlying issues relate to the aspects of the writing construct of the tests AES can rate. Because these tests underrepresent the construct as it is understood by the writing community, such tests should not be used in writing assessment, whether for admissions, placement, formative, or achievement testing. Instead of continuing the traditional, large-scale, commercial testing enterprise associated with AES, we should look to well-established, institutionally contextualized forms of assessment as models that yield fuller, richer information about the student's control of the writing construct. Such tests would be more valid, as reliable, and far fairer to the test-takers, whose stakes are often quite high.

18.
This study investigates how experienced and inexperienced raters score essays written by ESL students on two different prompts. The quantitative analysis using multi-faceted Rasch measurement, which provides measurements of rater severity and consistency, showed that the inexperienced raters were more severe than the experienced raters on one prompt but not on the other prompt, and that differences between the two groups of raters were eliminated following rater training. The qualitative analysis, which consisted of analysis of raters' think-aloud protocols while scoring essays, provided insights into reasons for these differences. Differences were related to the ease with which the scoring rubric could be applied to the two prompts and to differences in how the two groups of raters perceived the appropriateness of the prompts.

19.
Assessing Writing, 2005, 10(1): 5–43
We assessed whether and how the discourse written for prototype integrated tasks (involving writing in response to print or audio source texts) field tested for Next Generation TOEFL® differs from the discourse written for independent essays (i.e., the TOEFL Essay®). We selected 216 compositions written for six tasks by 36 examinees in a field test—representing score levels 3, 4, and 5 on the TOEFL Essay—then coded the texts for lexical and syntactic complexity, grammatical accuracy, argument structure, orientations to evidence, and verbatim uses of source text. Analyses with non-parametric MANOVAs followed a three (task type: TOEFL Essay, writing in response to a reading passage, writing in response to a listening passage) by three (English proficiency level: score levels 3, 4, and 5 on the TOEFL Essay) within-subjects factorial design. The discourse produced for the integrated writing tasks differed significantly from the discourse produced in the independent essay for the variables of: lexical complexity (text length, word length, ratio of different words to total words written), syntactic complexity (number of words per T-unit, number of clauses per T-unit), rhetoric (quality of propositions, claims, data, warrants, and oppositions in argument structure), and pragmatics (orientations to source evidence in respect to self or others and to phrasing the message as either declarations, paraphrases, or summaries). Across the three English proficiency levels, significant differences appeared for the variables of grammatical accuracy as well as all indicators of lexical complexity (text length, word length, ratio of different words to total words written), one indicator of syntactic complexity (words per T-unit), one rhetorical aspect (quality of claims in argument structure), and two pragmatic aspects (expression of self as voice, messages phrased as summaries).
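
Several of the lexical-complexity indicators named above (text length, word length, and the ratio of different words to total words) are straightforward to compute, as the sketch below illustrates on an invented response; the T-unit-based syntactic measures are omitted because they require syntactic segmentation.

```python
# Sketch of simple lexical-complexity indicators on a hypothetical response
# (not the study's coding scheme): text length in words, mean word length,
# and type-token ratio (different words / total words).
import re

def lexical_complexity(text):
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "text_length_words": len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

sample = ("The lecture claims that the reading passage overstates the benefits, "
          "because the data it cites come from a single, unrepresentative study.")
print(lexical_complexity(sample))
```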

20.
This action research demonstrates the answer to this question: How can literacy professors provide effective training in evaluating writing to preservice graduate education students? The study examines writing assessment instruction in the context of a literacy course required of preservice teachers seeking secondary (7–12) certification in content area instruction. Approximately half of the course is devoted to instruction in 3 areas of writing assessment: (1) theory and practice in aspects of holistic writing assessment analysis, (2) methods for designing teachable rubrics, and (3) approaches to creating and sharing written feedback. Student-participants’ written responses to protocols demonstrate learning outcomes in these 3 areas along with their attitudes and the effects of their practice with an authentic set of high school students’ essays. The study demonstrates the effectiveness of this assessment instruction as a part of overall effectiveness in teacher preparation programs at the graduate level.
