1.

The generalizability of student ratings of instructors: Item specificity and section effects

Terence J. Crooks Michael T. Kane 《Research in higher education》1981,15(4):305-313

A number of recent studies have used generalizability theory to examine the dependability of student ratings of instruction. This study extends this line of research by examining the consistency of ratings between different sections of a course taught in a given semester by the same instructor, and by comparing the performance of global- and attribute-type instructor rating items. Five samples of physics instructors, varying in size from 5 to 12 instructors, were rated by their students on a form containing two global and eight attribute items. Each instructor taught two sections of a course. The study found that the section effect was small (ratings of instructors were consistent across different sections of the same course), and that the generalizability of ratings was substantially influenced by item specificity. For summary purposes, one global item seemed sufficient.

2.

EXAMINING THE USE OF SPACING EFFECT TO INCREASE THE EFFICIENCY OF INCREMENTAL REHEARSAL

Sarah E. Swehla Matthew K. Burns Anne F. Zaslofsky Matthew S. Hall Sashank Varma Robert J. Volpe 《Psychology in the schools》2016,53(4):404-415

Incremental rehearsal (IR) is a highly effective intervention that uses high repetition and a high ratio of known to unknown items with linearly spaced known items between the new items. It has been hypothesized that narrowly spaced practice would result in quick learning, whereas items that are widely spaced would result in longer‐term retention. The current study examined the effect of spacing by teaching vocabulary words to 36 fourth‐grade students. Each student was randomly assigned to a widely spaced IR condition (i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, three known items, and an increase in the number of known items presented each time by one) or an IR condition in which spacing increased exponentially (IR‐Exp; i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, four known items, and one unknown item, eight known items). The results indicated that the students in the study retained twice as much information with the widely spaced IR than with the IR‐Exp condition, but the latter required half as much time. IR and IR‐Exp were equally efficient, but IR continues to be superior to all other flashcard approaches in improving retention.
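The two spacing patterns described in this abstract can be sketched in a few lines of Python. This is only an illustration of the pattern as the abstract words it, not the authors' procedure: the function names `spacing_schedule` and `presentation_sequence` are hypothetical, and the fourth linear gap (four known items) is extrapolated from the phrase "an increase in the number of known items presented each time by one".

```python
def spacing_schedule(num_unknown_presentations, kind):
    """Return how many known items follow each presentation of the unknown item.

    kind="linear" gives 1, 2, 3, ... (the widely spaced IR pattern in the abstract);
    kind="exponential" gives 1, 2, 4, 8, ... (the IR-Exp pattern in the abstract).
    """
    if kind == "linear":
        return [k + 1 for k in range(num_unknown_presentations)]
    if kind == "exponential":
        return [2 ** k for k in range(num_unknown_presentations)]
    raise ValueError("kind must be 'linear' or 'exponential'")


def presentation_sequence(gaps):
    """Expand a list of gaps into a flashcard order: 'U' = unknown item, 'K' = known item."""
    sequence = []
    for gap in gaps:
        sequence.append("U")
        sequence.extend(["K"] * gap)
    return sequence


linear_gaps = spacing_schedule(4, "linear")
exponential_gaps = spacing_schedule(4, "exponential")
print(linear_gaps, len(presentation_sequence(linear_gaps)))
print(exponential_gaps, len(presentation_sequence(exponential_gaps)))
```

Under these assumptions, the sketch only reproduces the order of cards; it says nothing about retention or time, which the abstract reports from the study's data.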

3.

Faculty ratings of course evaluation items

John B. Francis 《Research in higher education》1976,4(1):23-40

Instructors whose teaching was evaluated by students were given the opportunity to rate how applicable the evaluation items were to their classes. This study examined the kinds of items which instructors felt to be applicable or inapplicable, the relationships between the student ratings and the instructor applicability ratings, and the effect on an overall evaluation score of using the instructor applicability judgments as weights. Results generally support the consensus procedure of establishing rating forms; they suggest that the common criticism that faculty judgments of item applicability are influenced by anticipation of student ratings may be true for specific items and that while weighting composite evaluation scores by means of faculty applicability judgments does not affect those overall scores, the distributions of certain items may be altered.

4.

A comparison of difficulty and discrimination values of selected true-false item types

Douglas Barker Robert L Ebel 《Contemporary educational psychology》1982,7(1):35-40

Thirty-eight undergraduate students were randomly assigned one of two alternate forms of a 144-item true-false midterm examination. Whenever a statement appeared on one form as true and positively stated, it appeared on the alternate form as false and negatively stated. Similarly, a false and positively stated item on one form was true and negatively stated on the other. The subject matter of the two forms was identical and the four kinds of true-false items were equally represented on each form. Difficulty and discrimination indices were computed for each of the four item types. The statistical results showed negatively stated items were more difficult, but no more discriminating, than positively stated items. Also, false items were not statistically more difficult than true items, but were significantly more discriminating. It was concluded that test constructors should include more false items than true items in their instruments and that all items should be stated positively.
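The abstract does not say which difficulty and discrimination indices were computed. As a rough illustration, the sketch below assumes the classical definitions: difficulty as the proportion of examinees answering the item correctly, and discrimination as the difference in that proportion between the highest- and lowest-scoring groups. The function names, the 27% group fraction, and the scores are all made up for illustration.

```python
def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination: proportion correct in the top group minus the bottom group.

    Groups are formed from the examinees with the highest and lowest total test scores.
    """
    n = len(item_scores)
    group_size = max(1, round(n * fraction))
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Made-up responses of eight examinees to one item, with their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [90, 85, 80, 70, 60, 50, 40, 30]
print(difficulty_index(item))
print(discrimination_index(item, totals))
```

A lower proportion correct means a harder item, so "more difficult" in the abstract corresponds to a smaller value of the first index under this convention.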

5.

STUDENT EVALUATION OF INSTRUCTION: PERCEPTIONS OF COMMUNITY COLLEGE STUDENTS, FACULTY, AND ADMINISTRATORS

William E. Piland 《Community College Journal of Research & Practice》2013,37(1-4):115-125

The purpose of this study was to compare the opinions of students, teachers, and administrators relative to student evaluation of instruction in selected community colleges. While important educational decisions in community colleges are made on the basis of students’ evaluations (as in retention, promotion, tenure, and pay), little has been accomplished in testing the assumptions behind student evaluation of instruction. The student evaluation process assumes that students are honest, serious, and evaluate instruction, not some incidental activity.

A 25‐item Student Evaluation Process Scale was completed by 607 students, 130 faculty, and 45 administrators in five Illinois community colleges. Findings revealed few significant differences in the opinions of students regarding evaluation of instruction based on variables of sex, age, school location, student type (transfer or occupational), and class standing. There were few significant differences in faculty opinion and within the administrative groups based on selected variables. There were significant differences when the opinions of students, faculty, and administrators were compared. Students and faculty tended to agree with those items that questioned the objectivity of student evaluation of instruction. Administrators and students tended to agree with items reflecting the seriousness with which students evaluate instruction. Faculty and administrators indicated that student evaluation of instruction impacted faculty members’ instructional performances. Neither students, faculty, nor administrators supported the concept of merit pay tied to student evaluation of instruction.

The role of student evaluation of instruction in a faculty evaluation system must be investigated. A variety of groups should participate in this investigation.

6.

Development of an item bank for assessing generic competences in a higher-education institute: a Rasch modelling approach

Qin Xie Xiaoling Zhong Wen-Chung Wang Cher Ping Lim 《Higher Education Research & Development》2014,33(4):821-835

This paper describes the development and validation of an item bank designed for students to assess their own achievements across an undergraduate-degree programme in seven generic competences (i.e., problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social interaction skills and global perspective). The Rasch modelling approach was adopted for instrument development and validation. A total of 425 items were developed. The content validity of these items was examined via six focus group interviews with target students, and the construct validity was verified against data collected from a large student sample (N = 1151). A matrix design was adopted to assemble the items in 26 test forms, which were distributed at random in each administration session. The results demonstrated that the item bank had high reliability and good construct validity. Cross-sectional comparisons of Years 1–4 students revealed patterns of changes over the years. Correlation analyses shed light on the relationships between the constructs. Implications are drawn to inform future efforts to develop the instrument, and suggestions are made regarding ways to use the instrument to enhance the teaching and learning of generic skills.

7.

An NCME Instructional Module on Booklet Designs in Large-Scale Assessments of Student Achievement: Theory and Practice [Total citations: 2; self-citations: 0; citations by others: 2]

Andreas Frey Johannes Hartig André A. Rupp 《Educational Measurement》2009,28(3):39-53

In most large-scale assessments of student achievement, several broad content domains are tested. Because more items are needed to cover the content domains than can be presented in the limited testing time to each individual student, multiple test forms or booklets are utilized to distribute the items to the students. The construction of an appropriate booklet design is a complex and challenging endeavor that has far-reaching implications for data calibration and score reporting. This module describes the construction of booklet designs as the task of allocating items to booklets under context-specific constraints. Several types of experimental designs are presented that can be used as booklet designs. The theoretical properties and construction principles for each type of design are discussed and illustrated with examples. Finally, the evaluation of booklet designs is described and future directions for researching, teaching, and reporting on booklet designs for large-scale assessments of student achievement are identified.

8.

Following student ratings over time with a catalog-based system

David M. Gray Dale C. Brandenburg 《Research in higher education》1985,22(2):155-168

First, the longitudinal nature of student ratings of instructors has not received deserved emphasis from researchers. Second, the use of item banks for designing student rating questionnaires, especially for instructor feedback, has needed attention. These two factors are investigated in this study, which tracks 304 instructors over a four semester period. It was found that the type of questionnaire generated from the item bank led to statistically significant differences among designated groups. The longitudinal analysis, however, indicated only minor improvement over time, regardless of whether or not an instructor chose to use items yielding specific feedback on the instructional components of a course. Additionally, although main effect differences were noted between teaching assistants and regular faculty, other results were very similar.

9.

Evaluating Statistical Targets for Assembling Parallel Mixed‐Format Test Forms

Dries Debeer Usama S. Ali Peter W. van Rijn 《Journal of Educational Measurement》2017,54(2):218-242

Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical targets to obtain this parallelism. In a recent study, Ali and van Rijn proposed combining the TIF and TCC as statistical targets, rather than using only a single statistical target. In this article, we propose two new methods using this combined approach, and compare these methods with single statistical targets for the assembly of mixed‐format tests. In addition, we introduce new criteria to evaluate the parallelism of multiple forms. The results show that single statistical targets can be problematic, while the combined targets perform better, especially in situations with increasing numbers of polytomous items. Implications of using the combined target are discussed.

10.

Using a single score for summative teacher evaluation by students [Total citations: 1; self-citations: 1; citations by others: 1]

Nira Hativa Alona Raviv 《Research in higher education》1993,34(5):625-646

In spite of the hundreds of studies done on teacher evaluation by student ratings, there are still several major controversial issues of how to construct evaluation instruments that best serve their purpose. The present study contributes to one of the ongoing debates of recent years that addresses the number of teaching dimensions to be considered in decision making, namely, whether to use a single score or multiple scores for teacher evaluation. The study demonstrates the use of a short form on which the global score, overall teaching performance, can almost perfectly predict the mean of all teacher-attribute items. The questionnaire used in this study was administered eight times (twice a semester for two years) for all faculty members and TAs in the departments of physics and chemistry at Tel Aviv University. The composition of teacher attribute items was different for the faculty members and TAs, reflecting their different teaching functions: lectures versus recitation problem solving. Results show that while for the faculty the global score can simply replace the mean of all instructor-attribute items and serve as a single score that faithfully represents all dimensions of teacher ratings, for the TAs, a linear transformation is needed.

11.

Transformational classroom leadership: a novel approach to evaluating classroom performance

James S. Pounder 《Assessment & Evaluation in Higher Education》2008,33(3):233-243

In higher education, student evaluation of teaching is widely used as a measure of an academic’s teaching performance despite considerable disagreement as to its value. This paper begins by examining the merit of teaching evaluations with reference to the factors influencing the accuracy of the teaching evaluation process. One of the central assumptions on which student evaluation of teaching is based is that there is a relationship between student achievement and student rating of teachers. However, the findings of the majority of studies do not support this assumption. The absence of a strong link between student achievement and teaching evaluations suggests that there is scope for examining other approaches to measuring effective classroom dynamics. This paper presents such an approach based on the notion of transformational classroom leadership.

12.

Rikkert M. van der Lans Wim J.C.M. van de Grift Klaas van Veen 《Educational Measurement》2019,38(3):55-64

Using item response theory, this study explores whether student survey and classroom observation items can be calibrated onto a common metric of teaching quality. The data comprises 269 lessons of 141 teachers that were scored on the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument and the My Teacher student survey. Using Rasch model concurrent calibration, items from both instruments were calibrated onto a common one‐dimensional metric of teaching quality. Most items were found to fit the model. Challenges pertain mainly to items measuring teaching students learning strategies and differentiation. Explanations for these difficulties are discussed.

13.

Student evaluation of teaching: the use of best–worst scaling

Twan Huybers 《Assessment & Evaluation in Higher Education》2014,39(4):496-513

An important purpose of student evaluation of teaching is to inform an educator’s reflection about the strengths and weaknesses of their teaching approaches. Quantitative instruments are one way of obtaining student responses. They have traditionally taken the form of surveys in which students provide their responses to various statements using item-by-item agree/disagree ratings. Previous research has identified shortcomings of such rating scales, including response bias and the associated lack of discrimination amongst the items evaluated. In this paper, best–worst scaling is proposed as a novel method for quantitative teaching evaluation. The way in which best–worst scaling can be used in this context is illustrated in three different applications. Two applications demonstrate how it can be used for evaluations in a small-size classroom environment. The third application is a broader evaluation of university courses on a larger scale. In comparison with conventional rating scales, the best–worst scaling approach enables better highlighting of the differences between evaluation items. In doing so, it can provide enhanced guidance to educators in their reflection about their teaching. Moreover, implementation and analysis of a best–worst scaling evaluation is relatively straightforward, which establishes it as a feasible method for teaching practitioners and researchers.

14.

The Scaling of Mixed-Item-Format Tests With the One-Parameter and Two-Parameter Partial Credit Models

Robert C. Sykes Wendy M. Yen 《Journal of Educational Measurement》2000,37(3):221-244

Item response theory scalings were conducted for six tests with mixed item formats. These tests differed in their proportions of constructed response (c.r.) and multiple choice (m.c.) items and in overall difficulty. The scalings included those based on scores for the c.r. items that had maintained the number of levels as the item rubrics, either produced from single ratings or multiple ratings that were averaged and rounded to the nearest integer, as well as scalings for a single form of c.r. items obtained by summing multiple ratings. A one-parameter (1PPC) or two-parameter (2PPC) partial credit model was used for the c.r. items and the one-parameter logistic (1PL) or three-parameter logistic (3PL) model for the m.c. items. Item fit was substantially worse with the combination 1PL/1PPC model than the 3PL/2PPC model due to the former's restrictive assumptions that there would be no guessing on the m.c. items and equal item discrimination across items and item types. The presence of varying item discriminations resulted in the 1PL/1PPC model producing estimates of item information that could be spuriously inflated for c.r. items that had three or more score levels. Information for some items with summed ratings was usually overestimated by 300% or more for the 1PL/1PPC model. These inflated information values resulted in underestimated standard errors of ability estimates. The constraints posed by the restricted model suggest limitations on the testing contexts in which the 1PL/1PPC model can be accurately applied.

15.

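An empirical analysis of student evaluation of teaching at universities [Total citations: 13; self-citations: 0; citations by others: 13]

宋光辉《教学研究(河北)》2002,25(4):317-320

Drawing on an empirical analysis of university students' evaluations of classroom teaching, this paper analyzes the factors that influence the design of indicator systems for student evaluation of teaching, and discusses the relationships between student evaluation results and class size, course type, instructor rank, and course hours.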

16.

Gender-Based Differential Item Performance in Mathematics Achievement Items

Allen E. Doolittle T. Anne Cleary 《Journal of Educational Measurement》1987,24(2):157-166

A procedure for the detection of differential item performance (DIP) is used to investigate the relationships between characteristics of mathematics achievement items and gender differences in performance. Eight randomly equivalent samples of high school seniors were each given a unique form of the ACT Assessment Mathematics Usage Test (ACTM). Students without requisite mathematics courses were deleted from the samples to reduce the confounding effects of differences in instruction at the high school level. Signed measures of DIP were obtained for each item in the eight ACTM forms. These DIP estimates were then analyzed in a 6 × 8 (item category by form) experimental design. A significant item category effect was found indicating a relationship between item characteristics and gender-based DIP. Predictions, based on previous research about the categories of items that would contribute to gender-based DIP, were supported: Geometry and mathematics reasoning items were relatively more difficult for female examinees and the more algorithmic, computation-oriented items were relatively easier.

17.

An empirical examination of the construct validity of goal commitment in the persistence process

David Allen Ph.D. Amaury Nora Ph.D. 《Research in higher education》1995,36(5):509-533

This study represents the first published investigation into the construct validity of goal commitment as it affects the persistence process. Confirmatory factor analyses revealed that goal commitment could be decomposed into multiple indicators of the same latent construct: a special factor called goal commitment that groups items related to goal importance, specificity of goals, and situational influence; a second factor represented by items indicating certainty of purpose; and a third factor consisting of items related to goals in general. The predictive validity of each subcomponent on different outcomes related to student persistence was established. While goal commitment was found to have a significant direct effect on both students' intents to persist and actual persistence behavior, neither of the other two factors were as equally predictive as measures of student retention.

18.

Estimating Average Domain Scores

Mary Pommerich W. Alan Nicewander Bradley A. Hanson 《Journal of Educational Measurement》1999,36(3):199-216

A simulation study was performed to determine whether a group's average percent correct in a content domain could be accurately estimated for groups taking a single test form and not the entire domain of items. Six Item Response Theory based domain score estimation methods were evaluated, under conditions of few items per content area per form taken, small domains, and small group sizes. The methods used item responses to a single form taken to estimate examinee or group ability; domain scores were then computed using the ability estimates and domain item characteristics. The IRT-based domain score estimates typically showed greater accuracy and greater consistency across forms taken than observed performance on the form taken. For the smallest group size and least number of items taken, the accuracy of most IRT-based estimates was questionable; however, a procedure that operates on an estimated distribution of group ability showed promise under most conditions.

19.

The construct validity of Institutional Commitment: A confirmatory factor analysis [Total citations: 1; self-citations: 0; citations by others: 1]

Amaury Nora Alberto F. Cabrera 《Research in higher education》1993,34(2):243-262

The present study examined the underlying structure of the variable Institutional Commitment by testing for the convergence, or lack thereof, among different indicators of the construct as represented by three theoretical frameworks (Tinto, 1975, 1987; Bean, 1985; Huselid and Day, 1991). Confirmatory factor analyses revealed that Institutional Commitment could be decomposed into two multiple indicators of the same latent construct: a general factor that groups items related to institutional quality, practical value of an education, utility of an education, fit between student and institution, and loyalty to the institution and another factor represented by items indicating similarity of values (Affinity of Values). Moreover, the study established the predictive validity of each subcomponent on different outcomes related to student persistence. While Institutional Commitment was found to have a significant direct effect on both students' intents to persist and actual persistence behavior, Affinity of Values was not as equally predictive of measures of student retention. Paper presented before the 1991 ASHE Annual Meeting. Boston, Massachusetts.

20.

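Design concepts of large-scale open educational resources abroad and their implications: an experiential study of MOOC courses on the Coursera platform

王海荣 张伟《天津电大学报》2013,(3):32-36

As a new mode of online teaching, the MOOC is reshaping the reform and development of school education worldwide. For front-line education practitioners, the pedagogy of large-scale online education embodied in MOOCs is the essence worth learning from and borrowing. The authors took part in several Coursera courses as online learners and, taking into account the forms and characteristics of existing courses in China, distilled teaching methods in course content organization, resource design, teacher-student interaction, and multi-dimensional assessment. In light of online education practice in China, they offer several reflections and implications, in the hope of providing a reference for the promotion and practical application of MOOCs in China.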