Similar literature
Found 20 similar articles; search took 46 ms
1.
Techniques emerging from the considerable research on cognitive aspects of survey methodology include various forms of probing and cognitive interviewing. These techniques are used to examine whether respondents' interpretations of self-report items are consistent with researchers' assumptions and intended meanings given the constructs the items are designed to measure. However, although informal procedures are common, such developments have not been systematically applied in educational research. We describe how information derived from the systematic application of cognitive pretesting can contribute to determining the validity (designated cognitive validity) of self-report items. Examples are presented from prominent motivation-related instruments that assess real-world instructional practices, mastery classroom goal structure, and student self-efficacy. The implications and pragmatics of adopting this approach are discussed.

2.
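This article addresses the shortcomings of the current single-rating ("one-rater") method for scoring essays in online marking environments and proposes applying a "three-rater method" to essay scoring. The results show that under the single-rating method, inter-rater consistency was unsatisfactory, with significant differences among raters. The three-rater method reduced scoring error to some extent and safeguarded marking quality. In implementing the method, however, care must be taken to avoid a play-it-safe mentality among the raters involved in the three-rating process, so that the method is used in a scientifically sound way. Whether the method can be adopted for large-scale online essay scoring remains to be studied further.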

3.
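Application of the Many-Facet Rasch Model in Rater Training for Subjective Items   Total citations: 7, self-citations: 2, citations by others: 7
The scoring of subjective items is affected by many factors, such as the rater's level of knowledge, overall ability, and personal preferences. These rater biases not only produce subjective differences between raters but also make the same rater subjectively unstable over time, ultimately lowering the reliability of subjective-item scoring. This study applied the many-facet Rasch model to rater training for the essay questions of a national-level examination. By analyzing trial scoring data from 6 experienced raters on 58 test papers, the study identified four types of rater bias and then gave each rater individual feedback accordingly, thereby improving the objectivity and precision of scoring.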
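
For readers unfamiliar with the model named in this abstract, a common three-facet formulation of the many-facet Rasch model is sketched below. The abstract does not state which facets or parameterization the study used, so the symbols here (examinee ability, item difficulty, rater severity, category threshold) are assumptions for illustration only.

```latex
% Log-odds that examinee n receives category k rather than k-1
% on item i from rater j (assumed three-facet form):
\log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k
% \theta_n : ability of examinee n
% \delta_i : difficulty of item i
% \alpha_j : severity of rater j
% \tau_k   : threshold between categories k-1 and k
```

Under this form, a rater with a larger severity term lowers the log-odds of every higher category by the same amount, which is what makes severity separable from examinee ability.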

4.
A two-stage process by which a holistic rubric is applied to the assessment of open-ended items, such as writing samples, is defined. The first stage involves scoring a performance by the assignment of an integer rating that is congruent with the proficiency level that is exhibited in the performance. The second stage is the subsequent assignment by the rater of an augmentation that indicates whether or not the writing competency reflected in the paper is a bit higher or lower than the competency level reflected in the benchmark paper for the given proficiency level. If the rater feels that the paper represents benchmark proficiency for the given level, no augmentation is assigned to the rating. The results of this study indicate that the use of rating augmentation can improve the inter-rater reliability of holistic assessments, as indicated by generalizability phi coefficients, correlation coefficients, and percent agreement indices. Implications and suggestions for follow-up research are discussed.

5.
6.
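This study used a mixed-methods approach to analyze how CET-4 essay raters use the rating criteria. Twenty-six CET-4 essay raters scored 30 mock CET-4 essays and gave three reasons for each score, ranked by importance. The results show that: (1) although the raters differed in severity, agreement among the 26 raters was fairly good, and most raters were also internally consistent; (2) the reasons given by some raters tended toward uniformity; (3) 71.91% of the reasons given reflected the five textual features specified in the CET-4 essay rating criteria, indicating that most raters understood and applied the criteria fairly accurately.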

7.
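The Structure and Measurement of Autonomy   Total citations: 3, self-citations: 0, citations by others: 3
Scholars have proposed a wide variety of autonomy structures, measurement instruments, measurement methods, and measurement indices. Classifications of the structure of autonomy fall into two types: those based on the measurement of autonomy and those based solely on theoretical analysis. The measurement of autonomy comprises self-report methods and other-report methods. Self-report methods fall into four categories: autonomy scales, autonomy subscales within other scales, scales measuring one particular aspect of autonomy, and other methods. In other-report methods, the researcher mainly collects data through observation, interviews, and similar means, and then evaluates the individual's autonomy using a relevant coding system or similar tools. Finally, the structure and measurement of autonomy and of self-reliance are compared and analyzed.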

8.
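An Empirical Study of Small-Class Cooperative Learning in College English Reading   Total citations: 2, self-citations: 0, citations by others: 2
The study used the "group achievement grouping" method, a cooperative learning strategy, over a 10-week period with 48 first-year non-English-major undergraduates. The instruments were achievement tests, an attitude scale, a cooperative learning behavior assessment form, and interviews. The procedure consisted of achievement tests before and after the study, an explanation of the cooperative learning strategy, grouping, an adaptation period, and formal learning. After the learning period, the two sets of test scores were compared and tested for significance to examine the effect of the cooperative learning strategy on college students' English reading ability. The results show that the cooperative learning strategy effectively improved the English reading ability of non-English-major college students, and 80% of participants held a positive attitude toward cooperative learning. Beyond clearly improving reading ability, the strategy also significantly improved students' sense of cooperation and team spirit.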

9.
Physiological and subjective measures of counselor anxiety were compared to determine if counselors experienced greater anxiety during a counseling interview than during a conversation. Twenty experienced rehabilitation counselors in a graduate-level practicum course volunteered to participate in a 10-minute conversation and counseling session. Anxiety was assessed by self-report, skin conductance, and heart-rate measures. Results indicated that there were no significant treatment, period, or interaction effects for heart-rate data; however, there was a significant period effect for conductance data. There were no significant differences for participants' self-report evaluations of the two situations. Baseline autonomic data were highly related to autonomic data during the anticipation and stimulus periods, and preexperimental self-report data were moderately related to postexperimental self-report data. Conclusions were that counselors experience comparable anxiety during counseling and conversing, that expectation accounts for most of the counselors' anxiety, and that baseline physiological and self-report data may prove useful in identifying counselors who would experience anxiety during an interview.

10.
Research spanning 20 years is reviewed as it relates to the measurement of cognitive engagement using self-report scales. The author's research program is at the forefront of the review, although the review is couched within the broader context of the research on motivation and cognitive engagement that began in the early 1990s. The theoretical origins of self-report instruments are examined, along with the early measurement findings and struggles. Research in science, technology, engineering, and mathematics contexts is highlighted. The author concludes that self-report data have made significant and important contributions to the understanding of motivation and cognitive engagement. However, the evidence also suggests a need to develop and use multiple approaches to measuring engagement in academic work rather than rely only on self-report instruments. Some alternatives to self-report measures are suggested here and throughout this issue.

11.
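Since Holec introduced autonomous learning, or learner autonomy, into language teaching in the 1980s, it has attracted great attention and has now become an essential ability for every learner. Using a questionnaire survey of 140 second-year non-English-major undergraduates, followed by interviews, this article aims to understand students' participation in autonomous English learning activities outside class and, through in-depth analysis, to explore possible problems and offer necessary suggestions.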

12.
Objective: We conducted a comprehensive assessment of the reliability and validity of the Interview for Traumatic Events in Childhood (ITEC, Lobbestael, Arntz, Kremers, & Sieswerda, 2006), a retrospective, semi-structured interview for childhood maltreatment. The ITEC aims to yield dimensional scores for severity of experiences of different childhood maltreatment dimensions. Methods: Initial psychometric properties were tested with the pilot version of the ITEC in 362 participants. A second study assessed the revised ITEC in 217 participants, patients and non-patients. Results: Factor analyses produced the best fit for a five-factor model (sexual, physical and emotional abuse, physical and emotional neglect). The scales had good internal consistency, except for the physical neglect subscale, and excellent inter-rater reliability. The scales were highly associated with equivalent scales of the Childhood Trauma Questionnaire (i.e., good convergent validity), and showed good correspondence with patient file information (i.e., good criterion validity). Conclusion: These results support the reliability and validity of the ITEC, making it a potentially useful tool for assessing a broad range of traumatic events in childhood. Practice implication: The first step in therapy for dealing with childhood maltreatment is to map abusive experiences and assess their severity and impact. Since maltreatment is a sensitive topic that is not reported on easily, trauma interviews are promising assessment instruments since they provide the opportunity to probe and clarify. There are hardly any well-validated trauma interviews available that assess the extent of maltreatment in and outside the family in various dimensions. The current study tries to fill this gap by presenting a new trauma interview: the Interview for Traumatic Events in Childhood.

13.
A multilevel analysis approach was used to analyse students’ evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher’s general teaching effectiveness, one needs to evaluate four randomly chosen course implementations. Two implementations are needed when one course is evaluated, and if one implementation is evaluated, up to 15 feedbacks are needed. The stability of students’ ratings is very high, which reflects students’ stable rating criteria. There is an obvious rating paradox: from the student’s point of view, each rating is very precise, stable and justifiable, but from the teacher’s point of view a single feedback reflects the quality of teaching to just a moderate extent. Cross-hierarchical analysis reveals that there are large discrepancies between the uses of rating scales; some students are systematically more lenient in their rating whereas others are systematically more severe. The study also reveals that some courses are generally rated more favourably and that some courses are more suitable for certain teachers. Managers can thus improve the quality of teaching by finding the most suitable courses for each teacher.

14.
The purpose of this paper is to describe the procedures and the analysis of an instrument designed to measure preservice teachers’ ability to develop appropriate 5E learning cycle lesson plans. The 5E inquiry lesson plan (ILP) rubric comprises 12 items with a scoring range of zero to four points per item. Content validity was determined through the expertise of a panel of five science educators. Sixty-six preservice teachers enrolled in elementary science methods at three universities prepared lesson plans, which were scored by their instructors using the ILP rubric. Using a Pearson two-tailed correlation, inter-rater reliability was established at a value of 0.83. An exploratory factor analysis provided evidence of construct validity, with three factors. The factors included (1) explore, (2) engage/explain/elaborate, and (3) evaluate. In addition, a secondary analysis revealed the means and standard deviations of the students' performance on each of the 5E phases: engage, explore, explain, elaborate, and evaluate. The engage item held the highest mean rating, and the evaluation items had the lowest mean ratings. Examination of the instrument's structure in light of the 5E phases is discussed and provides directions for future revisions and research.
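
The Pearson correlation reported above as an inter-rater reliability estimate can be illustrated with a minimal sketch. The function name `pearson_r` and the two score lists are hypothetical (totals on a 0 to 48 scale, matching 12 items at 0 to 4 points each); they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater gives constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rubric totals from two raters scoring the same five lesson plans.
rater_1 = [30, 41, 25, 36, 44]
rater_2 = [28, 43, 27, 33, 45]
print(round(pearson_r(rater_1, rater_2), 2))
```

Pearson's r reflects how similarly two raters rank and space the lesson plans; it does not penalize one rater for scoring uniformly higher than the other.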

15.
This narrative synthesis reviews the psychometric properties of commercially and publicly available retell instruments used to assess the reading comprehension of students in grades K–12. Eleven instruments met selection criteria and were systematically coded for data related to the administration procedures, scoring procedures, and technical adequacy of the retell component. High variability was evident in the prompting conditions and the use of quantitative and qualitative scoring mechanisms. Because no two instruments shared the same features, their retell scores are likely not equitable. None of the measures provided sufficient information to substantiate their reliability and validity. Many were lacking data on critical psychometric aspects, such as passage equivalency and construct validity, and nearly all had insufficient or ill-defined norming samples.

16.
This article aims to explore the symptoms and characteristics of dyscalculia. This is a qualitative study. Five experts in the field of special education took part in a focus group interview. Each expert had more than ten years of experience in their area of expertise. To determine the content validity of the protocol, three experts in special education, language and qualitative research evaluated each of the eight items. Cohen's kappa analysis was used to assess inter-rater reliability. The findings of this study indicate that 59 items have been developed, based on six constructs in the dyscalculia checklist. The six constructs were subitising, estimating, Arabic numerals, verbal numbers, arithmetic facts and calculating processes. Following the focus group interview, a new construct emerged: math anxiety. The study implies that teachers might utilise this checklist to carry out early detection of students with dyscalculia in primary schools. This will enable appropriate intervention, resulting in significant benefits for the Ministry of Education, for educators and teachers, and for the students themselves. Although this study was based in Malaysia, the results have wider implications because dyscalculia is present everywhere.
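
As an illustration of the Cohen's kappa statistic mentioned in this abstract, the sketch below computes kappa for one pair of raters. The function name `cohens_kappa` and the eight "keep"/"drop" judgments are hypothetical; the abstract does not say how its three experts were paired or what categories they used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments by two experts on eight protocol items.
expert_1 = ["keep", "keep", "drop", "keep", "drop", "drop", "keep", "keep"]
expert_2 = ["keep", "drop", "drop", "keep", "drop", "keep", "keep", "keep"]
print(round(cohens_kappa(expert_1, expert_2), 3))  # 0.467
```

Here the raters agree on 6 of 8 items (0.75), but chance agreement is 34/64, so kappa is 7/15, noticeably lower than the raw agreement rate.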

17.
This study presents the random-effects rating scale model (RE-RSM), which takes into account randomness in the thresholds over persons by treating them as random effects and adding a random variable for each threshold in the rating scale model (RSM) (Andrich, 1978). The RE-RSM turns out to be a special case of the multidimensional random coefficients multinomial logit model (MRCMLM) (Adams, Wilson, & Wang, 1997), so that the estimation procedures for the MRCMLM can be directly applied. The results of the simulation indicated that when the data were generated from the RSM, using the RSM and the RE-RSM to fit the data made little difference: both resulted in accurate parameter recovery. When the data were generated from the RE-RSM, using the RE-RSM to fit the data resulted in unbiased estimates, whereas using the RSM resulted in biased estimates, large fit statistics for the thresholds, and inflated test reliability. An empirical example of 10 items with four-point rating scales was illustrated in which four models were compared: the RSM, the RE-RSM, the partial credit model (Masters, 1982), and the constrained random-effects partial credit model. In this real data set, the need for a random-effects formulation becomes clear.
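
The relation between the two models described above can be written out as follows. This is a sketch consistent with the abstract's description ("adding a random variable for each threshold"); the symbol names and the normal-distribution assumption for the random term are illustrative assumptions, not taken from the abstract.

```latex
% Rating scale model (RSM): log-odds of category j over j-1
% for person n on item i, with thresholds shared by all persons.
\log\frac{P_{nij}}{P_{ni(j-1)}} = \theta_n - (\delta_i + \tau_j)

% Random-effects RSM (RE-RSM): each threshold gets a
% person-specific random deviation \gamma_{nj}.
\log\frac{P_{nij}}{P_{ni(j-1)}} = \theta_n - (\delta_i + \tau_j + \gamma_{nj}),
\qquad \gamma_{nj} \sim N(0, \sigma_j^2) \quad \text{(assumed)}
```

Setting every \sigma_j^2 to zero recovers the RSM, which fits the simulation finding that the two models behave alike when the data are generated from the RSM.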

18.
Many of the studies used to support the claim that student evaluations of teaching are reliable measures of teaching effectiveness have calculated inappropriate reliability coefficients. This paper points to three coefficients that would be appropriate depending on whether student evaluations are used for formative or summative purposes. Results from the present study indicated that students had very low absolute inter-rater reliability, but somewhat higher consistency inter-rater reliability.
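
The difference between absolute and consistency inter-rater reliability can be illustrated with single-rater intraclass correlations from a two-way layout. This is a minimal sketch: the abstract does not name its three coefficients, so the ICC formulas (in the style of McGraw and Wong), the function name `icc_single`, and the ratings are assumptions for illustration.

```python
def icc_single(ratings):
    """Return (consistency ICC, absolute-agreement ICC) for single ratings.

    `ratings` has one row per rated target (e.g. a course) and one column
    per rater (e.g. a student); every rater rates every target.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 targets and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares for targets (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    if ms_rows == 0 and ms_err == 0:
        raise ValueError("ICC is undefined when targets do not differ")
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    absolute = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, absolute


# Hypothetical ratings: the second rater is always exactly 2 points more lenient.
table = [[1, 3], [2, 4], [3, 5], [4, 6], [5, 7]]
consistency, absolute = icc_single(table)
print(round(consistency, 3), round(absolute, 3))  # 1.0 0.556
```

In this example the raters order the targets identically, so consistency is perfect, while the constant leniency gap pulls absolute agreement down to 5/9.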

19.
Networked learning aims to foster students’ knowledge construction processes as well as the quality of knowledge construction. In this respect, it is crucial to be able to analyse both aspects of networked learning. Based on theories on networked learning and the empirical work of relevant authors in this domain, two coding schemes are presented to analyse the nature of learning processes and the quality of knowledge construction in networked learning. The coding schemes were used to analyse the learning processes and learning results of students in an MSc course on land use planning at Wageningen University in which networked learning played an important role. The inter-rater reliability of both instruments appeared to be satisfactory. The relation between the two coding schemes is discussed and recommendations for future research and educational practice are formulated.

20.
In the field of educational psychology, there is diverse and active research in motivation for learning and achievement. Many instruments exist for assessing students' motivation, primarily as self-report. Fewer instruments are available for assessing teachers' perceptions of their students' motivation, and fewer still for assessing teachers' perceptions of reasons for students' lack of motivation. Teachers' intervention strategies for motivation are linked to their causal perceptions. Therefore, it is important to assess those causal perceptions. In this paper, we offer evidence for the Perceptions of Student Motivation questionnaire, a new measure that offers evidence of validity and reliability for this purpose among high school teachers. It offers potential to increase efficiency and clarity of findings regarding teachers' perceptions of students' motivation.
