Similar Documents
20 similar documents found (search time: 468 ms)
1.
Due to variation in test difficulty, the use of pre-fixed cut-off scores in criterion-referenced standard setting methods may lead to variation in grades and pass rates. This paper aims to empirically investigate the strength of this relationship. To this end we examine a dataset of over 500 observations from an institution of higher education in the Netherlands over the period 2008–2013. We measure variation in test difficulty by using students’ perceptions of the validity of the examination and by recording personnel changes in the primary instructor. The latter measure is based on the considerable variation in teachers’ ability to assess test difficulty that is found in the literature. Other explanatory variables are course evaluations, instructor evaluations and self-reported study time. Variation in student quality is controlled for by measuring course results in deviation from the cohort average. We take a panel approach in estimating the effect of the explanatory variables on the variability in grades and pass rates. Our findings indicate that exam validity and instructor change are significantly related to variation in test results. The latter finding supports the hypothesis that instructors’ difficulty in assessing test difficulty may introduce subjectivity in criterion-referenced standard setting methods.

2.
This study uses decision tree analysis to determine the most important variables that predict high overall teaching and course scores on a student evaluation of teaching (SET) instrument at a large public research university in the United States. Decision tree analysis is a more robust and intuitive approach for analysing and interpreting SET scores compared to more common parametric statistical approaches. Variables in this analysis included individual items on the SET instrument, self-reported student characteristics, course characteristics and instructor characteristics. The results show that items on the SET instrument that most directly address fundamental issues of teaching and learning, such as helping the student to better understand the course material, are most predictive of high overall teaching and course scores. SET items less directly related to student learning, such as those related to course grading policies, have little importance in predicting high overall teaching and course scores. Variables irrelevant to the construct, such as an instructor’s gender and race/ethnicity, were not predictive of high overall teaching and course scores. These findings provide evidence of criterion and discriminant validity, and show that high SET scores do not reflect student biases against an instructor’s gender or race/ethnicity.
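An analysis of this kind can be illustrated with a short sketch. The data below are synthetic and the variable names (`understand`, `grading`, `gender`) are illustrative assumptions, not items from the study's actual instrument; the sketch only shows how decision-tree feature importances single out a learning-related item when it alone drives the overall rating:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# Simulated 1-5 item ratings plus a construct-irrelevant instructor attribute.
understand = rng.integers(1, 6, n)  # "helped me understand the material"
grading = rng.integers(1, 6, n)     # "grading policies were clear" (pure noise here)
gender = rng.integers(0, 2, n)      # instructor gender (pure noise here)

# High overall rating is driven only by the learning-related item.
overall_high = (understand + rng.normal(0.0, 1.0, n) > 4).astype(int)

X = np.column_stack([understand, grading, gender])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, overall_high)

# Gini feature importances: the learning-related item should dominate.
for name, imp in zip(["understand", "grading", "gender"], tree.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Under this data-generating process essentially all of the importance should fall on the learning-related item, mirroring the pattern of results the study reports.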

3.
The validity of student evaluation of teaching (SET) scores depends on minimal influence from extraneous response processes or biases. A bias may increase or decrease scores and change the relationship with other variables. In contrast, the SET literature defines bias as an irrelevant variable correlated with SET scores; among the many factors discussed, a salient one is the instructor’s gender. This study examines the extent to which acquiescence (the tendency to endorse the highest response option across items, a bias in the first sense) affects students’ responses to a SET rating scale. The study also explores how acquiescence affects the difference in teaching quality (TQ) by instructor’s gender, a bias in the latter sense. SET data collected at a faculty of education in Ontario, Canada were analysed using the Rasch rating scale model. Findings provide empirical support for acquiescence affecting students’ responses. Latent regression analyses show how acquiescence reduces the difference in TQ by instructor’s gender. Findings encourage greater attention to response process quality as a way to better defend the utility of SET and prevent potentially misleading conclusions from the analysis of SET data.

4.
Instructors whose teaching was evaluated by students were given the opportunity to rate how applicable the evaluation items were to their classes. This study examined the kinds of items which instructors felt to be applicable or inapplicable, the relationships between the student ratings and the instructor applicability ratings, and the effect on an overall evaluation score of using the instructor applicability judgments as weights. Results generally support the consensus procedure of establishing rating forms. They suggest that the common criticism that faculty judgments of item applicability are influenced by anticipation of student ratings may be true for specific items, and that while weighting composite evaluation scores by means of faculty applicability judgments does not affect the overall scores, the distributions of certain items may be altered.

5.
This paper studies the effect of teacher gender and ethnicity on student evaluations of teaching at university. We analyze a unique data-set featuring mixed teaching teams and a diverse, multicultural, multi-ethnic group of students and teachers. Blended co-teaching allows us to study the link between student evaluations of teaching and teacher gender as well as ethnicity exploiting within course variation in a panel data model with course-year fixed effects. We document a negative effect of being a female teacher on student evaluations of teaching, which amounts to roughly one fourth of the sample standard deviation of teaching scores. Overall women are 11 percentage points less likely to attain the teaching evaluation cut-off for promotion to associate professor compared to men. The effect is robust to a host of covariates such as course leadership, teacher experience and research quality, as well as an alternative teacher fixed effect specification. There is no evidence of a corresponding ethnicity effect. Our results are suggestive of a gender bias against female teachers and indicate that the use of teaching evaluations in hiring and promotion decisions may put female lecturers at a disadvantage.

6.
The present research examined whether students’ likelihood to take a course with a male or female professor was affected by different expectations of professors based on gender stereotypes. In an experimental vignette study, 503 undergraduate students from a Canadian university were randomly assigned to read a fictitious online review, similar to those found on RateMyProfessors.com, that varied professor gender, overall quality score and level of caring for students. Students responded to items assessing their likelihood to take a course with the professor, perceived competence and warmth of the professor, and their own gender bias. An analysis of variance revealed an interaction between professor gender, student gender, quality score and caring. When quality score was low, male students indicated a lower likelihood of taking a course with female professors who were not described as caring. Regression analyses showed, however, that students' gender bias was negatively associated with likelihood to take a course with a female professor. These results imply that student gender plays a role in evaluations of female professors who do not display stereotypical warmth but that gender bias, which is typically higher for males at the group-level, may be an underlying factor.

7.
A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. This coefficient is based on deviations of scores from the criterion score, rather than from the mean. The coefficient is shown to have several of the important properties of the conventional norm-referenced reliability coefficient, including its interpretation as a ratio of variances and as a correlation between parallel forms, its relationship to test length, its estimation from a single form of a test, and its use in correcting for attenuation due to measurement error. Norm-referenced measurement is considered a special case of criterion-referenced measurement.
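The coefficient described matches the form of Livingston's criterion-referenced reliability coefficient, k² = (r·σ² + (μ − C)²) / (σ² + (μ − C)²). Assuming that is the intended coefficient, a minimal sketch of its two key properties, reduction to the norm-referenced coefficient when the criterion equals the mean, and growth as the cut score moves away from the mean:

```python
def livingston_k2(reliability, variance, mean, criterion):
    """Criterion-referenced reliability: deviations are taken from the
    criterion (cut) score C rather than from the group mean."""
    d2 = (mean - criterion) ** 2
    return (reliability * variance + d2) / (variance + d2)

# When the criterion equals the mean, the coefficient reduces to the
# conventional norm-referenced reliability (the special case noted above).
print(livingston_k2(0.70, variance=25.0, mean=60.0, criterion=60.0))
# Moving the cut score away from the mean raises the coefficient:
# (0.70 * 25 + 100) / (25 + 100) = 0.94.
print(livingston_k2(0.70, variance=25.0, mean=60.0, criterion=50.0))
```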

8.
Growth in the use of testing to determine student eligibility for community college courses has prompted debate and litigation over the equity, access, and legal implications of these practices. In California, this has resulted in state regulations requiring that community colleges provide predictive validity evidence for test-score-based inferences and course prerequisites. In addition, companion measures that supplement placement test scores must be used for placement purposes. However, for both theoretical and technical reasons, the predictive validity coefficients between placement test scores and final grades or retention in a course generally demonstrate a weak relationship. The study discussed in this article examined the predictive validity of placement test scores with course grade and retention in English and mathematics courses. The investigation produced a model to explain variance in course outcomes using test scores, student background data, and instructor differences in grading practices. The resulting model suggests that student dispositional characteristics explain a high proportion of variance in the dependent variables. Including instructor grading practices in the model adds significantly to the explanatory power and suggests that grading variations make accurate placement more problematic. This investigation underscores the importance of academic standards as something imposed on students by an institution and not something determined by the entering abilities of students.

9.
In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
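The 0.5 threshold follows directly from classical test theory: the observed score's error variance is σ²(1 − r), while assigning everyone the mean incurs squared error equal to the true-score variance σ²r, so the observed score wins only when r > 0.5. A small simulation sketch (with assumed, synthetic variances) makes the comparison concrete:

```python
import numpy as np

rng = np.random.default_rng(1)

def mse_of_estimators(reliability, n=200_000, true_var=1.0):
    """Compare mean squared error of two universe-score estimators:
    the observed score itself vs. the constant group mean."""
    true_scores = rng.normal(0.0, np.sqrt(true_var), n)
    # reliability = true_var / (true_var + error_var)  =>  solve for error_var.
    error_var = true_var * (1 - reliability) / reliability
    observed = true_scores + rng.normal(0.0, np.sqrt(error_var), n)
    mse_observed = np.mean((observed - true_scores) ** 2)
    mse_mean = np.mean((0.0 - true_scores) ** 2)  # group mean is 0 here
    return mse_observed, mse_mean

# Below reliability 0.5 the test-based estimate is worse than simply
# assigning everyone the mean; above 0.5 it is better.
for r in (0.3, 0.5, 0.7):
    mo, mm = mse_of_estimators(r)
    print(f"reliability={r}: observed-score MSE={mo:.3f}, mean MSE={mm:.3f}")
```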

10.
Conclusion: Technological advancement is shifting our education paradigm. The role of the instructor is changing from information-giver to facilitator. Students no longer passively receive information but may become instructional resources in class. Given opportunities, they may be self-learners and self-trainers. In a multimedia course, the instructor employed teaching methods allowing her to be a facilitator and her students to be self-learners. It was discovered that the course motivated students; fostered active, meaningful, and constructive learning; enhanced critical thinking skills; and increased students’ confidence. Class observations, interviews, and student feedback revealed that the new teaching methods and new role of the instructor had a positive impact on student learning. As a university professor in Instructional Technology, the author may have experienced the education paradigm shift and its impact on the role of the instructor earlier than instructors in other subject areas. However, the new paradigm is expected to spread widely in education. As NCATE stated in 1997, teachers need to develop a new understanding, new attitude, new approach, and new role. Every instructor should be open to these changes and explore the possibility of creating a learning community in which instructors, students, and community members may contribute, benefit, and generate meaningful learning experiences. One can only look forward to participating in this dynamic learning and expect its positive impact on our society.

11.
12.
Student evaluations of teaching (SETs) are widely used to measure teaching quality in higher education and compare it across different courses, teachers, departments and institutions. Indeed, SETs are of increasing importance for teacher promotion decisions, student course selection, as well as for auditing practices demonstrating institutional performance. However, survey response is typically low, rendering these uses unwarranted if students who respond to the evaluation are not randomly selected along observed and unobserved dimensions. This paper is the first to fully quantify this problem by analyzing the direction and size of selection bias resulting from both observed and unobserved characteristics for over 3000 courses taught in a large European university. We find that course evaluations are upward biased, and that correcting for selection bias has non-negligible effects on the average evaluation score and on the evaluation-based ranking of courses. Moreover, this bias mostly derives from selection on unobserved characteristics, implying that correcting evaluation scores for observed factors such as student grades does not solve the problem. However, we find that adjusting for selection only has small impacts on the measured effects of observables on SETs, validating a large related literature which considers the observable determinants of evaluation scores without correcting for selection bias.
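The paper's key mechanism, that adjusting for observed factors such as grades cannot remove bias driven by selection on unobservables, can be sketched with synthetic data. All variable names and parameter values below are illustrative assumptions, and the adjustment shown is simple reweighting across grade deciles, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
grade = rng.normal(7.0, 1.0, n)          # observed factor
u = rng.normal(0.0, 1.0, n)              # unobserved satisfaction
# Students respond more often when satisfied: selection on the unobservable.
responds = 0.5 * (grade - 7.0) + u + rng.normal(0.0, 1.0, n) > 0
# The evaluation score depends on the same unobservable.
score = 6.0 + 0.3 * (grade - 7.0) + 0.8 * u + rng.normal(0.0, 0.5, n)

true_mean = score.mean()                 # what all students would report
observed_mean = score[responds].mean()   # upward-biased responder average

# "Correct" for the observed factor by reweighting responders so their grade
# distribution matches the full population: the bias from u remains.
edges = np.quantile(grade, np.linspace(0, 1, 11)[1:-1])
bins = np.digitize(grade, edges)
adjusted_mean = np.mean([score[responds & (bins == b)].mean() for b in range(10)])

print(f"true {true_mean:.2f}, observed {observed_mean:.2f}, "
      f"grade-adjusted {adjusted_mean:.2f}")
```

Within every grade decile the responders still have above-average satisfaction, so the grade-adjusted average stays biased upward, which is the point the abstract makes about unobserved selection.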

13.
Abstract

Evaluation of college instructors often centers on course ratings; however, there is little evidence that these ratings only reflect teaching. The purpose of this study was to assess the relative importance of three facets of course ratings: instructor, course and occasion. We sampled 2,459 fully-crossed dyads from a large university where two instructors taught the same two courses at least twice in a 3-year period. Generalizability theory was used to estimate unconfounded variance components for instructor, course and occasion, as well as their interactions. Meta-analysis was used to summarize those estimates. Results indicated that a three-way interaction between instructor, course and occasion that includes measurement error accounted for the most variance in student ratings (24%), with instructor accounting for the second largest amount (22%). While instructor, and presumably teaching, accounted for substantial variance in student course ratings, factors other than instructor quality had a larger influence on student ratings.
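A generalizability-style variance decomposition can be sketched for a simplified instructor-by-course design with repeated occasions, using the expected-mean-squares method. The design sizes and component values below are illustrative assumptions, not the study's data, and occasion is folded into the residual as in the confounded interaction the study describes:

```python
import numpy as np

rng = np.random.default_rng(4)
I, C, O = 200, 20, 4                   # instructors, courses, occasions
var_i, var_c, var_res = 1.0, 0.5, 1.5  # assumed true variance components

# Fully crossed synthetic ratings: instructor effect + course effect +
# a residual that confounds interaction, occasion and error.
x = (rng.normal(0, np.sqrt(var_i), (I, 1, 1))
     + rng.normal(0, np.sqrt(var_c), (1, C, 1))
     + rng.normal(0, np.sqrt(var_res), (I, C, O)))

grand = x.mean()
mi = x.mean(axis=(1, 2))               # instructor means
mc = x.mean(axis=(0, 2))               # course means
cell = x.mean(axis=2)                  # instructor-by-course cell means

# Mean squares for a two-facet crossed design with O replications per cell.
ms_i = C * O * ((mi - grand) ** 2).sum() / (I - 1)
ms_c = I * O * ((mc - grand) ** 2).sum() / (C - 1)
ms_ic = O * ((cell - mi[:, None] - mc[None, :] + grand) ** 2).sum() \
        / ((I - 1) * (C - 1))
ms_res = ((x - cell[:, :, None]) ** 2).sum() / (I * C * (O - 1))

# Expected-mean-squares estimates recover the components used to simulate.
est_res = ms_res
est_i = (ms_i - ms_ic) / (C * O)
est_c = (ms_c - ms_ic) / (I * O)
print(f"instructor: {est_i:.2f}, course: {est_c:.2f}, residual: {est_res:.2f}")
```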

14.
Student evaluation of teaching (SET) is now common practice across higher education, with the results used for both course improvement and quality assurance purposes. While much research has examined the validity of SETs for measuring teaching quality, few studies have investigated the factors that influence student participation in the SET process. This study aimed to address this deficit through the analysis of an SET respondent pool at a large Canadian research-intensive university. The findings were largely consistent with available research (showing influence of student gender, age, specialisation area and final grade on SET completion). However, the study also identified additional influential course-specific factors such as term of study, course year level and course type as statistically significant. Collectively, such findings point to substantively significant patterns of bias in the characteristics of the respondent pool. Further research is needed to specify and quantify the impact (if any) on SET scores. We conclude, however, that such bias does not invalidate SET implementation; rather, it should be acknowledged and reported within standard institutional practice, allowing better understanding of the feedback received and driving future efforts at recruiting student respondents.

15.
Active learning is based on self-directed and autonomous teaching methods, whereas passive learning is grounded in instructor-taught lectures. An animal physiology course was studied over a two-year period (Year 1, n = 42 students; Year 2, n = 30 students) to determine the effects of student-led seminar (andragogical) and lecture (pedagogical) teaching methods on students' retention of information and performance. For each year of the study, the course was divided into two time periods. The first half was dedicated to instructor-led lectures, followed by a control survey in which the students rated the efficiency of pedagogical learning on a five-point Likert scale from one (strongly disagree) to five (strongly agree). During the second period, students engaged in andragogical learning via peer-led seminars. An experimental survey was then administered to students using the same scale as above to determine students' preferred teaching method. Raw examination scores and survey results from both halves of the course were statistically analyzed by ANOVA with Newman-Keuls multiple comparison test. By the end of the study, student preference for peer-led seminars had increased (mean ± SD: 2.47 ± 0.94 vs. 4.03 ± 1.36, P < 0.04), and examination scores had significantly increased (mean ± SD: 73.91 ± 13.18% vs. 85.77 ± 5.22%, P < 0.001). A majority of students (68.8%) preferred a method that combined peer-led seminars and instructor-led lectures. These results may indicate that integration of active and passive learning into undergraduate courses may have greater benefit in terms of student preference and performance than either method alone.

16.
Two studies examined student psychological need satisfaction as a predictor of positive teacher-course evaluations. In Study 1, 268 undergraduates recalled and rated the quality of a recent important college course, then rated their feelings of autonomy, competence, and relatedness within that course. Consistent with self-determination theory, all three ratings predicted instructor and/or course ratings. Study 2 found the same pattern in a sample of 179 introductory journalism students nested within 12 sections of a single course. Study 2 also evaluated instructor characteristics as predictors of mean levels of student need satisfaction across the 12 classes. Although instructor age and overall teaching experience were unrelated to students' need satisfaction, greater experience teaching their particular class negatively predicted student autonomy and relatedness need satisfaction. Implications for pedagogical practice are discussed.

17.
The use of student evaluation of teaching (SET) to evaluate and improve teaching is widespread amongst institutions of higher education. Many authors have searched for a conclusive understanding about the influence of student, course, and teacher characteristics on SET. One hotly debated question concerns the interpretation of the positive and statistically significant relationship that has been found between course grades and SET scores. In addition to reviewing the literature, the main purpose of the present study is to examine the influence of course grades and other characteristics of students, courses, and teachers on SET. Data from 1244 evaluations were collected using the SET-37 instrument and analyzed by means of cross-classified multilevel models. The results show positive significant relationships between course grades, class attendance, the examination period in which students receive their highest course grades, and the SET score. These relationships, however, are subject to different interpretations. Future research should focus on providing a definitive and empirically supported interpretation for these relationships. In the absence of such an interpretation, it will remain unclear whether these relationships offer proof of the validity of SET or whether they are a biasing factor.

18.
Whenever the purpose of measurement is to inform an inference about a student’s achievement level, it is important that we be able to trust that the student’s test score accurately reflects what that student knows and can do. Such trust requires the assumption that a student’s test event is not unduly influenced by construct-irrelevant factors that could distort his or her score. This article examines one such factor, test-taking motivation, which tends to induce a person-specific, systematic negative bias on test scores. Because current measurement models underlying achievement testing assume students respond effortfully to test items, it is important to identify test scores that have been materially distorted by non-effortful test taking. A method for conducting effort-related individual score validation is presented, and it is argued that measurement professionals have a responsibility to identify invalid scores for those who make inferences about student achievement on the basis of those scores.
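One common way to operationalize effort-related score validation is response-time-based flagging: responses faster than a plausible reading-and-solving threshold are treated as non-effortful, and test events with too many of them are flagged. The sketch below is a synthetic illustration in that spirit; the time threshold and the 0.90 cutoff are assumed values, not the article's specific method:

```python
import numpy as np

rng = np.random.default_rng(3)
n_students, n_items = 500, 40

# Simulated per-item response times (seconds): most students solution-behave
# (log-normal times), while ~10% rapid-guess on every item.
times = rng.lognormal(mean=3.0, sigma=0.5, size=(n_students, n_items))
guessers = rng.random(n_students) < 0.10
times[guessers] = rng.uniform(1.0, 4.0, size=(int(guessers.sum()), n_items))

THRESHOLD = 5.0        # seconds; faster responses are treated as non-effortful
effortful = times >= THRESHOLD

# Response-time effort: the proportion of a student's responses that are
# effortful. Test events below the cutoff are flagged as potentially invalid.
rte = effortful.mean(axis=1)
flagged = rte < 0.90
print(f"{int(flagged.sum())} of {n_students} test events flagged for review")
```

In this simulation the flagged set essentially coincides with the rapid-guessing students, since solution-behaving response times rarely fall under the threshold.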

19.
This paper examines the effects of instructors’ attractiveness on student evaluations of their teaching. We build on previous studies by holding both observed and unobserved characteristics of the instructor and classes constant. Our identification strategy exploits the fact that many instructors, in addition to traditional teaching in the classroom, also teach in the online environment, where attractiveness is either unknown or less salient. We utilize multiple attractiveness measures, including facial symmetry software, subjective evaluations, and a novel proxy methodology that resembles a “Keynesian Beauty Contest.” We identify a substantial beauty premium in face-to-face classes for women but not for men. While gender on its own does not impact teaching evaluation scores, female instructors rated as more attractive receive higher instructional ratings. This result holds across several beauty measures, given a multitude of controls and while controlling for unobserved instructor characteristics and skills. Notably, the positive relationship between beauty and teaching effectiveness is not found in the online environment, suggesting the observed premium may be due to discrimination.

20.
We conduct a framed field experiment at a Dutch university to compare student effort provision and exam performance under the two most prevalent evaluation practices: absolute (criterion-referenced) and relative (norm-referenced) grading. We hypothesize that the rank-order tournament created by relative grading will increase effort provision and performance among students with competitive preferences. We use student gender and survey measures (self-reported as well as incentivized) as proxies for competitiveness. Contrary to our expectations, we find no significant impact of relative grading on preparation behavior or exam scores, neither among men nor among students with higher measures of competitiveness. We discuss several potential explanations for this finding, and argue that it is likely attributable to the low value that students in our sample attach to academic excellence.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号