1.

Problem solving in schools and beyond: Transitions from the naive to the neophyte to the master

STUART A. KARABENICK MICHAEL E. WOOLLEY JEANNE M. FRIEDEL BRIDGET V. AMMON JULIANNE BLAZEVSKI CHRISTINA RHEE BONNEY 《Educational Psychologist》2013,48(3):139-151

Techniques emerging from the considerable research on cognitive aspects of survey methodology include various forms of probing and cognitive interviewing. These techniques are used to examine whether respondents' interpretations of self-report items are consistent with researchers' assumptions and intended meanings given the constructs the items are designed to measure. However, although informal procedures are common, such developments have not been systematically applied in educational research. We describe how information derived from the systematic application of cognitive pretesting can contribute to determining the validity (designated cognitive validity) of self-report items. Examples are presented from prominent motivation-related instruments that assess real-world instructional practices, mastery classroom goal structure, and student self-efficacy. The implications and pragmatics of adopting this approach are discussed.

2.

A preliminary study of the "three-rater method" for online essay scoring

Li Yinling (李银玲)《考试研究》(Examinations Research) 2013,(2):64-70

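Addressing the shortcomings of deciding essay scores from a single rating (the "one-rater method") in the current online marking environment, this article proposes applying a "three-rater method" to essay scoring. The results show that under the one-rater method, consistency between raters was unsatisfactory, with significant differences between them. The three-rater method reduced scoring error to some extent and safeguarded marking quality. In implementing the method, however, care is needed to guard against third raters' tendency to play it safe, so that the method is used in a scientifically sound way. Whether the method can be adopted for large-scale online essay scoring awaits further research.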

3.

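Application of the many-facet Rasch model in rater training for subjectively scored items (Cited by: 7; self-citations: 2; citations by others: 7)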

Li Zhongquan (李中权), Sun Xiaomin (孙晓敏), Zhang Houcan (张厚粲), Zhang Lisong (张立松)《中国考试》(China Examinations) 2008,(1):26-31

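The scoring of subjectively scored items is affected by many factors, such as raters' level of knowledge, general ability, and personal preferences. These rater biases not only produce subjective differences between raters but also make the same rater's judgments unstable over time, ultimately lowering the reliability of scoring. This study applied the many-facet Rasch model to rater training for the essay question of a national-level examination. By analysing trial-scoring data from 6 experienced raters on 58 scripts, four types of rater bias were identified, and individual feedback was then given to each rater accordingly, thereby improving the objectivity and precision of scoring.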

4.

The effect of rating augmentation on inter-rater reliability: An empirical study of a holistic rubric

Jim Penny Robert L. Johnson Belita Gordon 《Assessing Writing》2000,7(2):163

A two-stage process by which a holistic rubric is applied to the assessment of open-ended items, such as writing samples, is defined. The first stage involves scoring a performance by the assignment of an integer rating that is congruent with the proficiency level that is exhibited in the performance. The second stage is the subsequent assignment by the rater of an augmentation that indicates whether or not the writing competency reflected in the paper is a bit higher or lower than the competency level reflected in the benchmark paper for the given proficiency level. If the rater feels that the paper represents benchmark proficiency for the given level, no augmentation is assigned to the rating. The results of this study indicate that the use of rating augmentation can improve the inter-rater reliability of holistic assessments, as indicated by generalizability phi coefficients, correlation coefficients, and percent agreement indices. Implications and suggestions for follow-up research are discussed.

5.

THE RELIABILITY OF GENERAL CERTIFICATE OF EDUCATION EXAMINATION ENGLISH COMPOSITION PAPERS IN WEST AFRICA

S. A. AKEJU 《Journal of Educational Measurement》1972,9(3):175-180

6.

A study of how CET-4 essay raters use the scoring rubric

Xu Ying (徐鹰)《浙江教育学院学报》(Journal of Zhejiang Education Institute) 2014,(2):39-46,93

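This study used a mixed-methods approach to analyse how CET-4 essay raters use the scoring rubric. Twenty-six CET-4 essay raters scored 30 mock CET-4 essays and gave three reasons for each score, ranked by importance. The results show that: (1) although the raters differed in severity, agreement among the 26 raters was fairly good, and most raters were also internally consistent; (2) the reasons given by some raters tended toward a single type; (3) 71.91% of the reasons given reflected the 5 textual features specified in the CET-4 essay scoring rubric, indicating that most raters understood and applied the rubric fairly accurately.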

7.

The structure and measurement of autonomy (Cited by: 3; self-citations: 0; citations by others: 3)

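Xia Lingxiang (夏凌翔), Huang Xiting (黄希庭), Wu Bo (吴波)《西南师范大学学报(人文社会科学版)》(Journal of Southwest China Normal University, Humanities and Social Sciences Edition) 2007,33(3):10-15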

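Scholars have proposed a wide variety of structures, instruments, methods, and indicators for autonomy. Classifications of the structure of autonomy fall into two types: structures based on the measurement of autonomy, and classifications based solely on theoretical analysis. The measurement of autonomy comprises self-report methods and other-rating methods. Self-report methods fall into four categories: autonomy scales, autonomy subscales within other scales, scales that measure one aspect of autonomy, and other methods. In other-rating methods, researchers mainly collect data through observation, interviews, and similar means, and then evaluate an individual's autonomy using a relevant coding system or similar tools. Finally, the structure and measurement of autonomy and of self-reliance are compared.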

8.

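An empirical study of small-class cooperative learning in college English reading (Cited by: 2; self-citations: 0; citations by others: 2)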

Zhao Junhai (赵俊海)《成都教育学院学报》(Journal of Chengdu College of Education) 2008,22(8):81-84

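The study used the "group achievement grouping method", one of the cooperative learning strategies, over a period of 10 weeks, with 48 first-year non-English-major undergraduates as participants. The instruments were achievement tests, an attitude scale, a cooperative learning behaviour evaluation form, and interviews. The procedure consisted of achievement tests before and after the study period, together with explanation of the cooperative learning strategy, grouping, adaptive learning, and formal learning. After the learning period, the two sets of test scores were compared and tested for significance to examine the effect of the cooperative learning strategy on college students' English reading ability. The results show that the cooperative learning strategy can effectively improve the English reading ability of non-English-major college students, and that 80% of participants viewed cooperative learning positively. Besides markedly improving English reading ability, the strategy also significantly improved the students' sense of cooperation and team spirit.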

9.

Counselor Anxiety During a Counseling Interview

JAMES T. BOWMAN GAYLE T. ROBERTS 《Counselor Education & Supervision》1978,17(3):205-212

Physiological and subjective measures of counselor anxiety were compared to determine if counselors experienced greater anxiety during a counseling interview than during a conversation. Twenty experienced rehabilitation counselors in a graduate-level practicum course volunteered to participate in a 10-minute conversation and counseling session. Anxiety was assessed by self-report, skin conductance, and heart-rate measures. Results indicated that there were no significant treatment, period, or interaction effects for heart-rate data; however, there was a significant period effect for conductance data. There were no significant differences for participants' self-report evaluations of the two situations. Baseline autonomic data were highly related to autonomic data during the anticipation and stimulus periods, and preexperimental self-report data were moderately related to postexperimental self-report data. Conclusions were that counselors experience comparable anxiety during counseling and conversing, that expectation accounts for most of the counselors' anxiety, and that baseline physiological and self-report data may prove useful in identifying counselors who would experience anxiety during an interview.

10.

Measuring Cognitive Engagement With Self-Report Scales: Reflections From Over 20 Years of Research

Barbara A. Greene 《Educational Psychologist》2015,50(1):14-30

Research spanning 20 years is reviewed as it relates to the measurement of cognitive engagement using self-report scales. The author's research program is at the forefront of the review, although the review is couched within the broader context of the research on motivation and cognitive engagement that began in the early 1990s. The theoretical origins of self-report instruments are examined, along with the early measurement findings and struggles. Research in science, technology, engineering, and mathematics contexts is highlighted. The author concludes that self-report data have made significant and important contributions to the understanding of motivation and cognitive engagement. However, the evidence also suggests a need to develop and use multiple approaches to measuring engagement in academic work rather than rely only on self-report instruments. Some alternatives to self-report measures are suggested here and throughout this issue.

11.

A survey of students' extracurricular autonomous English learning activities

Chen Jingyan (陈婧燕)《海南师范大学学报(社会科学版)》(Journal of Hainan Normal University, Social Sciences Edition) 2013,(9):68-70,134

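Autonomous learning, or learner autonomy, has attracted much attention since Holec introduced it into language teaching in the 1980s, and it has now become an essential ability for every learner. Using a questionnaire followed by interviews, this article surveyed 140 second-year non-English-major undergraduates in order to find out how students take part in autonomous English learning activities outside class, to analyse and discuss possible problems in depth, and to offer necessary suggestions.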

12.

Development and psychometric evaluation of a new assessment method for childhood maltreatment experiences: The interview for traumatic events in childhood (ITEC)

Jill Lobbestael Arnoud Arntz Petra Harkema-Schouten David Bernstein 《Child abuse & neglect》2009,33(8):505-517

Objective: We conducted a comprehensive assessment of the reliability and validity of the Interview for Traumatic Events in Childhood (ITEC, Lobbestael, Arntz, Kremers, & Sieswerda, 2006), a retrospective, semi-structured interview for childhood maltreatment. The ITEC aims to yield dimensional scores for severity of experiences of different childhood maltreatment dimensions. Methods: Initial psychometric properties were tested with the pilot version of the ITEC in 362 participants. A second study assessed the revised ITEC in 217 participants, patients and non-patients. Results: Factor analyses produced the best fit for a five-factor model (sexual, physical and emotional abuse, physical and emotional neglect). The scales had good internal consistency, except for the physical neglect subscale, and excellent inter-rater reliability. The scales were highly associated with equivalent scales of the Childhood Trauma Questionnaire (i.e., good convergent validity), and showed good correspondence with patient file information (i.e., good criterion validity). Conclusion: These results support the reliability and validity of the ITEC, making it a potentially useful tool for assessing a broad range of traumatic events in childhood. Practice implication: The first step in therapy for dealing with childhood maltreatment is to map abusive experiences and assess their severity and impact. Since maltreatment is a sensitive topic that is not reported on easily, trauma interviews are promising assessment instruments because they provide the opportunity to probe and clarify. There are hardly any well-validated trauma interviews available that assess the extent of maltreatment in and outside the family in various dimensions. The current study tries to fill this gap by presenting a new trauma interview: the Interview for Traumatic Events in Childhood.

13.

The number of feedbacks needed for reliable evaluation. A multilevel analysis of the reliability, stability and generalisability of students’ evaluation of teaching

Pekka Rantanen 《Assessment & Evaluation in Higher Education》2013,38(2):224-239

A multilevel analysis approach was used to analyse students’ evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher’s general teaching effectiveness, one needs to evaluate four randomly chosen course implementations. Two implementations are needed when one course is evaluated, and if one implementation is evaluated, up to 15 feedbacks are needed. The stability of students’ ratings is very high, which reflects students’ stable rating criteria. There is an obvious rating paradox: from the student’s point of view, each rating is very precise, stable and justifiable, but from the teacher’s point of view a single feedback reflects the quality of teaching to just a moderate extent. Cross-hierarchical analysis reveals that there are large discrepancies between the uses of rating scales; some students are systematically more lenient in their rating whereas others are systematically more severe. The study also reveals that some courses are generally rated more favourably and that some courses are more suitable for certain teachers. Managers can thus improve the quality of teaching by finding the most suitable courses for each teacher.

14.

PSYCHOMETRIC ANALYSIS OF A 5E LEARNING CYCLE LESSON PLAN ASSESSMENT INSTRUMENT

M. Jenice Goldston Jeanelle Bland Day Cheryl Sundberg John Dantzler 《International Journal of Science and Mathematics Education》2010,8(4):633-648

The purpose of this paper is to describe the procedures and the analysis of an instrument designed to measure preservice teachers’ ability to develop appropriate 5E learning cycle lesson plans. The 5E inquiry lesson plan (ILP) rubric comprises 12 items with a scoring range of zero to four points per item. Content validity was determined through the expertise of a panel of five science educators. Sixty-six preservice teachers enrolled in elementary science methods at three universities prepared lesson plans, which were scored by their instructors using the ILP rubric. Using a Pearson two-tailed correlation, inter-rater reliability was established at a value of 0.83. An exploratory factor analysis provided evidence of construct validity, with three factors. The factors included (1) explore, (2) engage/explain/elaborate, and (3) evaluate. In addition, a secondary analysis revealed the means and standard deviations of the students' performance on each of the phases of the 5E: engage, explore, explain, elaborate, and evaluate. The engage item held the highest mean rating, and the evaluation items had the lowest mean ratings. Examination of the instrument's structure in light of the 5E phases is discussed and provides directions for future revisions and research.

15.

A Review of the Psychometric Properties of Retell Instruments

Deborah K. Reed 《Educational Assessment》2013,18(3):123-144

This narrative synthesis reviews the psychometric properties of commercially and publicly available retell instruments used to assess the reading comprehension of students in grades K–12. Eleven instruments met selection criteria and were systematically coded for data related to the administration procedures, scoring procedures, and technical adequacy of the retell component. High variability was evident in the prompting conditions and the use of quantitative and qualitative scoring mechanisms. Because no two instruments shared the same features, their retell scores are likely not equitable. None of the measures provided sufficient information to substantiate their reliability and validity. Many were lacking data on critical psychometric aspects, such as passage equivalency and construct validity, and nearly all had insufficient or ill-defined norming samples.

16.

The design and development of a dyscalculia checklist based on a focus group interview

Soo May Yoong Noor Aini Ahmad Charanjit Kaur Swaran Singh Wei Lun Wong 《British Journal of Special Education》2023,50(3):403-412

This article aims to explore the symptoms and characteristics of dyscalculia. This is a qualitative study. Five experts in the field of special education took part in a focus group interview. Each expert had more than ten years of experience in their area of expertise. To determine the content validity of the protocol, three experts in special education, language and qualitative research evaluated each of the eight items. Cohen's kappa analysis was used to assess inter-rater reliability. The findings of this study indicate that 59 items have been developed, based on six constructs in the dyscalculia checklist. The six constructs were subitising, estimating, Arabic numerals, verbal numbers, arithmetic facts and calculating processes. Following the focus group interview, a new construct emerged: math anxiety. The study implies that teachers might utilise this checklist to carry out early detection of students with dyscalculia in primary schools. This will enable appropriate intervention, resulting in significant benefits for the Ministry of Education, for educators and teachers, and for the students themselves. Although this study was based in Malaysia, the results have wider implications because dyscalculia is present everywhere.

17.

Modeling Randomness in Judging Rating Scales with a Random-Effects Rating Scale Model

Wen-Chung Wang Mark Wilson Ching-Lin Shih 《Journal of Educational Measurement》2006,43(4):335-353

This study presents the random-effects rating scale model (RE-RSM), which takes into account randomness in the thresholds over persons by treating them as random effects and adding a random variable for each threshold in the rating scale model (RSM) (Andrich, 1978). The RE-RSM turns out to be a special case of the multidimensional random coefficients multinomial logit model (MRCMLM) (Adams, Wilson, & Wang, 1997), so that the estimation procedures for the MRCMLM can be directly applied. The results of the simulation indicated that when the data were generated from the RSM, using the RSM and the RE-RSM to fit the data made little difference: both resulted in accurate parameter recovery. When the data were generated from the RE-RSM, using the RE-RSM to fit the data resulted in unbiased estimates, whereas using the RSM resulted in biased estimates, large fit statistics for the thresholds, and inflated test reliability. An empirical example of 10 items with four-point rating scales was illustrated in which four models were compared: the RSM, the RE-RSM, the partial credit model (Masters, 1982), and the constrained random-effects partial credit model. In this real data set, the need for a random-effects formulation becomes clear.
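
The abstract above describes the RE-RSM only in words. As a rough sketch of what "adding a random variable for each threshold" could look like, the block below writes the standard adjacent-category form of the RSM and then a random-threshold variant. The symbols ($\theta_n$ for person ability, $\delta_i$ for item difficulty, $\tau_j$ for the $j$-th threshold, $\gamma_{nj}$ for the person-specific threshold deviation, $\sigma_j^2$ for its variance) and the normality assumption are illustrative choices made here, not notation taken from this listing.

```latex
% Rating scale model (RSM): log-odds of person n scoring j rather than j-1 on item i
\log\frac{P_{nij}}{P_{ni(j-1)}} = \theta_n - (\delta_i + \tau_j)

% Random-effects variant (assumed form): threshold j varies over persons
\log\frac{P_{nij}}{P_{ni(j-1)}} = \theta_n - (\delta_i + \tau_j + \gamma_{nj}),
\qquad \gamma_{nj} \sim N(0, \sigma_j^2)
```

Under this reading, fixing every $\sigma_j^2$ at zero recovers the ordinary RSM, which fits the abstract's report that the two models perform alike on RSM-generated data.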

18.

Assessing the reliability of student evaluations of teaching: choosing the right coefficient

Donald Morley 《Assessment & Evaluation in Higher Education》2014,39(2):127-139

Many of the studies used to support the claim that student evaluations of teaching are reliable measures of teaching effectiveness have calculated inappropriate reliability coefficients. This paper points to three coefficients that would be appropriate depending on whether student evaluations are used for formative or summative purposes. Results from the present study indicated that students had very low absolute inter-rater reliability, but somewhat higher consistency inter-rater reliability.

19.

Measuring the demand for agricultural information

Frank van Steenbergen 《The Journal of Agricultural Education and Extension》2013,19(1):41-44

Networked learning aims to foster students’ knowledge construction processes as well as the quality of knowledge construction. In this respect, it is crucial to be able to analyse both aspects of networked learning. Based on theories on networked learning and the empirical work of relevant authors in this domain, two coding schemes are presented to analyse the nature of learning processes and the quality of knowledge construction in networked learning. The coding schemes were used to analyse the learning processes and learning results of students in an MSc course on land use planning at Wageningen University in which networked learning played an important role. The inter-rater reliability of both instruments appeared to be satisfactory. The relation between the two coding schemes is discussed and recommendations for future research and educational practice are formulated.

20.

Measuring teacher perceptions of the “how” and “why” of student motivation

Patricia L. Hardré Kendrick A. Davis David W. Sullivan 《Educational Research and Evaluation》2013,19(2):155-179

In the field of educational psychology, there is diverse and active research in motivation for learning and achievement. Many instruments exist for assessing students' motivation, primarily as self-report. Fewer instruments are available for assessing teachers' perceptions of their students' motivation, and fewer still for assessing teachers' perceptions of reasons for students' lack of motivation. Teachers' intervention strategies for motivation are linked to their causal perceptions. Therefore, it is important to assess those causal perceptions. In this paper, we offer evidence for the Perceptions of Student Motivation questionnaire, a new measure that offers evidence of validity and reliability for this purpose among high school teachers. It offers potential to increase efficiency and clarity of findings regarding teachers' perceptions of students' motivation.