
1.

The Effect of Drag-and-Drop Item Features on Test-Taker Performance and Response Strategies

Burcu Arslan Yang Jiang Madeleine Keehner Tao Gong Irvin R. Katz Fred Yan 《Educational Measurement》2020,39(2):96-106

Computer-based educational assessments often include items that involve drag-and-drop responses. There are different ways that drag-and-drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts’ professional judgments and design constraints, rather than empirical research, which might threaten the validity of interpretations of test outcomes. To this end, we investigated the effect of drag-and-drop item features on test-taker performance and response strategies with a cognition-centered approach. Four hundred and seventy-six adult participants solved content-equivalent drag-and-drop mathematics items under five design variants. Results showed that: (a) test takers’ performance and response strategies were affected by the experimental manipulations, and (b) test takers mostly used cognitively efficient response strategies regardless of the manipulated item features. Implications of the findings are provided to support test developers’ design decisions.

2.

An exploration of the use of eye‐gaze tracking to study problem‐solving on standardized science assessments

《International Journal of Research & Method in Education》2013,36(2):185-208


3.

Do images influence assessment in anatomy? Exploring the effect of images on item difficulty and item discrimination

Marc A. T. M. Vorstenbosch Tim P. F. M. Klaassen Jan G. M. Kooloos Sanneke M. Bolhuis Roland F. J. M. Laan 《Anatomical sciences education》2013,6(1):29-41

Anatomists often use images in assessments and examinations. This study aims to investigate the influence of different types of images on item difficulty and item discrimination in written assessments. A total of 210 of 460 students volunteered for an extra assessment in a gross anatomy course. This assessment contained 39 test items grouped in seven themes. The answer format alternated per theme and was either a labeled image or an answer list, resulting in two versions containing both images and answer lists. Subjects were randomly assigned to one version. Answer formats were compared through item scores. Both examinations had similar overall difficulty and reliability. Two cross‐sectional images resulted in greater item difficulty and item discrimination, compared to an answer list. A schematic image of fetal circulation led to decreased item difficulty and item discrimination. Three images showed variable effects. These results show that effects on assessment scores are dependent on the type of image used. Results from the two cross‐sectional images suggest an extra ability is being tested. Data from a scheme of fetal circulation suggest a cueing effect. Variable effects from other images indicate that a context‐dependent interaction takes place with the content of questions. The conclusion is that item difficulty and item discrimination can be affected when images are used instead of answer lists; thus, the use of images as a response format has potential implications for the validity of test items. Anat Sci Educ © 2012 American Association of Anatomists.

4.

Helping preservice teachers realize the transformative potential of read alouds

Rebecca Stortz Miriam Martinez Raquel Cataldo Lucinda Marie Juarez 《Journal of Early Childhood Teacher Education》2013,34(3):238-255

ABSTRACT

A common instructional practice in early childhood classrooms is the picturebook read aloud. The purpose of this investigation was to help preservice teachers learn to plan picturebook read alouds with the goal of helping children interpret the visual affordances of picturebooks, including visual elements (e.g., color, line) and peritextual components (e.g., title page, endpapers) as they engage in collaborative meaning making. In this qualitative study, the participants were 12 preservice teachers enrolled in an undergraduate children’s literature course. The findings presented here are focused on three participants who represented a range of growth and understandings about the picturebook format as well as growth and understandings about designing read aloud lessons. Findings indicate that while participants grew in their understandings of picturebook formats and in their ability to develop read aloud lesson plans with a visual focus, the rates and ways in which this occurred varied. Furthermore, the findings also suggest that crafting read alouds with a focus on visual affordances is a complex process and deserves careful attention in teacher preparation programs.

5.

Assessment of Genetics Understanding

Philipp Schmiemann Ross H. Nehm Robyn E. Tornabene 《Science & Education》2017,26(10):1161-1191

Understanding how situational features of assessment tasks impact reasoning is important for many educational pursuits, notably the selection of curricular examples to illustrate phenomena, the design of formative and summative assessment items, and determination of whether instruction has fostered the development of abstract schemas divorced from particular instances. The goal of our study was to employ an experimental research design to quantify the degree to which situational features impact inferences about participants’ understanding of Mendelian genetics. Two participant samples from different educational levels and cultural backgrounds (high school, n = 480; university, n = 444; Germany and USA) were used to test for context effects. A multi-matrix test design was employed, and item packets differing in situational features (e.g., plant, animal, human, fictitious) were randomly distributed to participants in the two samples. Rasch analyses of participant scores from both samples produced good item fit, person reliability, and item reliability and indicated that the university sample displayed stronger performance on the items compared to the high school sample. We found, surprisingly, that in both samples, no significant differences in performance occurred among the animal, plant, and human item contexts, or between the fictitious and “real” item contexts. In the university sample, we were also able to test for differences in performance between genders, among ethnic groups, and by prior biology coursework. None of these factors had a meaningful impact upon performance or context effects. Thus some, but not all, types of genetics problem solving or item formats are impacted by situational features.
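The Rasch analyses mentioned in this abstract rest on the dichotomous Rasch model, in which the log-odds of a correct response equal person ability minus item difficulty. A minimal Python sketch follows; the ability and difficulty values are hypothetical and are not taken from the study. Under this model, a context effect would show up as a shift in an item's difficulty `b`.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical example: one person (theta = 0.5) answering the same item
# framed in two contexts whose difficulties differ by 0.3 logits.
theta = 0.5
for context, b in [("plant", 0.0), ("fictitious", 0.3)]:
    p = rasch_p(theta, b)
    log_odds = math.log(p / (1 - p))  # recovers theta - b
    print(f"{context}: P(correct) = {p:.3f}, log-odds = {log_odds:.2f}")
```

Because only the difference `theta - b` enters the model, a finding of "no context effect" corresponds to the difficulties of context variants being statistically indistinguishable.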

6.

The Effect of the Most-Attractive-Distractor Location on Multiple-Choice Item Difficulty

Jinnie Shin Okan Bulut Mark J. Gierl 《Journal of Experimental Education》2020,88(4):643-659

Abstract

The arrangement of response options in multiple-choice (MC) items, especially the location of the most attractive distractor, is considered critical in constructing high-quality MC items. In the current study, a sample of 496 undergraduate students taking an educational assessment course was given three test forms consisting of the same items but the positions of the most attractive distractor varied across the forms. Using a multiple-indicators–multiple-causes (MIMIC) approach, the effects of the most attractive distractor's positions on item difficulty were investigated. The results indicated that the relative placement of the most attractive distractor and the distance between the most attractive distractor and the keyed option affected students’ response behaviors. Moreover, low-achieving students were more susceptible to response-position changes than high-achieving students.

7.

Evaluating the Comparability of Paper‐ and Computer‐Based Science Tests Across Sex and SES Subgroups

Jennifer Randall Stephen Sireci Xueming Li Leah Kaira 《Educational Measurement》2012,31(4):2-12

As access to and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult due to limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence that scores obtained from different formats are comparable must be gathered. In this study, we illustrate how contemporary statistical methods can be used to provide evidence regarding the comparability of CBT and PBT scores at the total test score and item levels. Specifically, we looked at the invariance of test structure and item functioning across test administration modes for subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct with moderate statistical support for strong factorial‐level invariance across SES subgroups, and moderate support of invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.

8.

Item Arrangement, Cognitive Entry Characteristics, Sex, and Test Anxiety as Predictors of Achievement Examination Performance

《Journal of Experimental Education》2012,80(4):214-219

This study’s general research question was: Given male and female students in an introductory educational psychology course who vary in cognitive entry characteristics and test anxiety, how do three item arrangements (easy to difficult, difficult to easy, and random) located within a 50-item multiple-choice achievement examination influence students’ total test performance? Two hierarchical multiple regression analyses were used to analyze the data. The four predictor variables and their interactions were tested for the amount of variation that they explained in the dependent variable. The main finding within the context of this study is that item arrangements based on item difficulties do not influence achievement examination performance.

9.

“…If we were cavemen we'd be fine”: Facebook as a catalyst for critical literacy learning by dyslexic sixth‐form students

Owen Barden 《Literacy》2012,46(3):123-132

This article is derived from a study of the use of Facebook as an educational resource by five dyslexic students at a sixth form college in north‐west England. Through a project in which teacher‐researcher and student‐participants co‐constructed a group Facebook page about the students’ scaffolded research into dyslexia, the study examined the educational affordances of a digitally mediated social network. An innovative, flexible, experiential methodology combining action research and case study with an ethnographic approach was devised. This enabled the use of multiple mixed methods, capturing much of the rich complexity of the students’ online and offline interactions with each other and with digital media as they contributed to the group and co‐constructed their group Facebook page. Social perspectives on dyslexia and multiliteracies were used to help interpret the students’ engagement with the social network and thereby deduce its educational potential. The research concludes that as a digitally mediated social network, Facebook engages the students in active, critical learning about and through literacies in a rich and complex semiotic domain. Offline dialogue plays a crucial role. This learning is reciprocally shaped by the students’ developing identities as both dyslexic students and able learners. The findings suggest that social media can have advantageous applications for literacy learning in the classroom. In prompting learning yet remaining unchanged by it, Facebook can be likened to a catalyst.

10.

Accounting for unexpected test responses through examinees’ and their teachers’ explanations

Alexandra Petridou Julian Williams 《Assessment in Education: Principles, Policy & Practice》2010,17(4):357-382

Researchers have developed indices to identify persons whose test results ‘misfit’ and are considered statistically ‘aberrant’ or ‘unexpected’ and whose measures are consequently potentially invalid, drawing the test’s validity into question. This study draws on interviews of pupils and their teachers, using a sample of 31 10‐year‐olds who were flagged as most ‘aberrant’ in a standardised mathematics test. The children’s and their teachers’ explanations were analysed and attributed: (i) to item‐, person‐ (self/other) and classroom‐levels; and (ii) according to causal dimensions. Children’s and teachers’ explanations were mostly in agreement in relation to unexpected negative results and they included references to previously well‐cited sources of construct‐irrelevant variance (e.g. ineffective test‐taking strategies, careless mistakes) as well as construct‐relevant variance (e.g. misconceptions, weaknesses in particular topics). Findings of this exploratory study are discussed from a test validity and attribution theory perspective: we conclude that this approach offers grounds for multi‐level explanations of person misfit and that this qualitative research approach to unexpected responses is worthy of more attention.
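Person-misfit indices of the kind referred to here compare a person's observed responses with model-expected probabilities. One widely used example is the Rasch outfit mean-square, the mean of squared standardized residuals across items; the abstract does not say which index was used, so this choice is an assumption. A minimal Python sketch with hypothetical item difficulties and response patterns:

```python
import math

def rasch_p(theta, b):
    """P(correct) under the dichotomous Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def person_outfit(responses, theta, difficulties):
    """Outfit mean-square for one person (an assumed choice of misfit index).

    Each squared standardized residual is (x - p)^2 / (p * (1 - p)).
    Values near 1 fit the model; large values flag unexpected patterns.
    """
    z2 = []
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        z2.append((x - p) ** 2 / (p * (1 - p)))
    return sum(z2) / len(z2)

# Hypothetical five-item test ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0
expected_pattern = [1, 1, 1, 0, 0]  # passes easy items, fails hard ones
aberrant_pattern = [0, 0, 1, 1, 1]  # fails easy items, passes hard ones

print(f"expected: {person_outfit(expected_pattern, theta, difficulties):.2f}")
print(f"aberrant: {person_outfit(aberrant_pattern, theta, difficulties):.2f}")
```

With these made-up values the reversed pattern yields an outfit of about 4.2 against about 0.4 for the expected pattern. The statistic only signals that a pattern is unexpected; interviews like those in this study are needed to explain why.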

11.

Performing support in higher education: negotiating conflicting agendas in academic language and learning advisory work

L. Gurney V. Grossi 《Higher Education Research & Development》2019,38(5):940-953

ABSTRACT

This article critically explores neoliberal administrative priorities and the ways in which they shape and constrain academic language and learning (ALL) advisory practice in the Australian higher education sector. Drawing on semi-structured interview data collected from in-service ALL advisors working in Australian universities, we highlight a disjuncture between the participants’ understandings of effective practice and the institutional framing of ALL advisory work as a mechanism to improve accessibility of higher study and retention of diversifying student cohorts. The data indicate the potential for institutional priorities to generate dilemmas in advisors’ daily practice as they are tasked with performing support via formats considered antithetical to meeting students’ needs and fostering productive approaches to study. Under these circumstances, participants characterised collaboration between advisors and content specialists in the pursuit of more embedded approaches as dependent on individual endeavours and stymied by a lack of institutional support. We call for renewed consideration of the formats and affordances with which advisors work in order to better align ALL advisory practice with the holistic development of students’ academic literacy skills.

12.

Ethos of Ambiguity: Artist Teachers and the Transparency Exclusion Paradox

Miranda Matthews 《The International Journal of Art & Design Education》2019,38(4):853-866

Addressing changes in conditions for practitioners that can be related to education policy in England and Wales since 2010, this article presents issues faced by teachers of art and design and their responses in practice. The current insistence on transparency in education emerges through policy that audits performativity, in a limiting skills bank. Practitioners in art and design are particularly affected by what I term ‘the transparency‐exclusion paradox’, as they battle to maintain the subject area and are ‘othered’ by the English Baccalaureate and Progress 8. I will discuss an emergent ‘ethos of ambiguity’ among artist‐teachers and contemporary artists, with a theoretical basis informed by Beauvoir and Foucault. Empirical data from research participants will be evidenced, to explore strategies of response in inclusive social practice. This article adds to literature that considers the effects of policy in implementation and it contributes to research on creative expressions of ambiguity in the arts.

13.

Examinees' Perceptions of Feedback in Applied Performance Testing: The Case of the National Board for Professional Teaching Standards

Hye K. Pae 《Educational Assessment》2013,18(2):97-115

This study investigated the role of item formats in the performance of 206 nonnative speakers of English on expressive skills (i.e., speaking and writing). Test scores were drawn from the field test of the Pearson Test of English Academic for Chinese, French, Hebrew, and Korean native speakers. Four item formats, including multiple-choice questions asking for a single answer (SAMC), multiple-choice questions allowing for multiple answers (MAMC), gap-filling, and summarizing items, were examined in relation to expressive skills. The results showed that, although the four groups showed different score distributions, their first language itself did not account for a significant variance in the expressive skills. The summarizing item format assessing listening skills accounted for the greatest variance in the test takers' expressive skills. The SAMC format consistently explained less variance in the measured expressive skills than the MAMC format. Unlike the findings of previous research, no gender difference was found.

14.

Measurement Efficiency of Innovative Item Formats in Computer-Based Testing

Michael G. Jodoin 《Journal of Educational Measurement》2003,40(1):1-15

The psychometric literature provides little empirical evaluation of examinee test data to assess essential psychometric properties of innovative items. In this study, examinee responses to conventional (e.g., multiple choice) and innovative item formats in a computer-based testing program were analyzed for IRT information with the three-parameter and graded response models. The innovative item types considered in this study provided more information across all levels of ability than multiple-choice items. In addition, accurate timing data captured via computer administration were analyzed to consider the relative efficiency of the multiple choice and innovative item types. Consistent with previous research, multiple-choice items provided more information per unit time. Implications for balancing policy, psychometric, and pragmatic factors in selecting item formats are also discussed.
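The trade-off between total information and information per unit time can be illustrated with the three-parameter logistic (3PL) model named in the abstract. The sketch below computes the 3PL item information function and divides it by a mean response time. The item parameters, the scaling constant D = 1.7, and the timings are hypothetical assumptions, and the graded response model used for polytomous items is not shown.

```python
import math

D = 1.7  # logistic scaling constant (assumption; some programs use D = 1.0)

def p_3pl(theta, a, b, c):
    """P(correct) under the 3PL model: discrimination a, difficulty b, guessing c."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: D^2 a^2 (Q/P) ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    q = 1 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1 - c)) ** 2

# Hypothetical items: a multiple-choice item with a guessing floor, and an
# "innovative" item modeled here with no guessing and higher discrimination.
theta = 0.0
mc_info = info_3pl(theta, a=1.0, b=0.0, c=0.2)
innov_info = info_3pl(theta, a=1.4, b=0.0, c=0.0)
mc_seconds, innov_seconds = 40.0, 150.0  # hypothetical mean response times

print(f"MC:         info = {mc_info:.3f}, info/second = {mc_info / mc_seconds:.4f}")
print(f"Innovative: info = {innov_info:.3f}, info/second = {innov_info / innov_seconds:.4f}")
```

With these made-up numbers the innovative item carries roughly three times as much information at θ = 0 yet less information per second, the same direction of trade-off the abstract reports.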

15.

Examining learners’ perspective of taking a MOOC: reasons, excitement, and perception of usefulness

M. Liu J. Kang E. McKelroy 《Educational Media International》2015,52(2):129-146

In recent years, massive open online courses (MOOCs) as an online instruction format have attracted educators’ attention in higher education. While there are many news reports and blog entries about MOOCs, evidence-based research is still emerging. Research examining the learners’ perspective on taking a MOOC is scarce but very much needed. This study, using both quantitative and qualitative data, investigated participants’ reasons and excitement levels to take a MOOC and their perception of the usefulness of the course. The findings indicated that the majority of the participants were working professionals who sought to get opportunities and resources for their career development without the constraints of their geographical locations and time. Flexibility of the course schedule, credibility of the instructor, and quality of the materials are important factors for these individuals. The findings highlighted the importance of good pedagogies regardless of whether the platform is a MOOC, face-to-face, or other online formats; the hands-on nature was the most helpful aspect of this MOOC. The findings also showed that course design is important, as difficult navigation and a not-so-intuitive interface negatively affected participants’ learning experience and perception of the course.

16.

Measurement Properties of Two Innovative Item Formats in a Computer-Based Test

Lei Wan George A. Henly 《Applied Measurement in Education》2013,26(1):58-78

Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats—the figural response (FR) and constructed response (CR) formats used in a K–12 computerized science test. The item response theory (IRT) information function and confirmatory factor analysis (CFA) were employed to address the research questions. It was found that the FR items were similar to the multiple-choice (MC) items in providing information and efficiency, whereas the CR items provided noticeably more information than the MC items but tended to provide less information per minute. The CFA suggested that the innovative formats and the MC format measure similar constructs. Innovations in computerized item formats are reviewed, and the merits as well as challenges of implementing the innovative formats are discussed.

17.

Probabilistic Approaches to Examining Linguistic Features of Test Items and Their Effect on the Performance of English Language Learners

Guillermo Solano-Flores 《Applied Measurement in Education》2014,27(4):236-247

This article addresses validity and fairness in the testing of English language learners (ELLs)—students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that these limitations stem from the fact that current ELL testing practices are not effective in addressing three basic notions on the nature of language and the linguistic features of test items: (a) language is a probabilistic phenomenon, (b) the linguistic features of test items are multidimensional and interconnected, and (c) each test item has a unique set of linguistic affordances and constraints. Along with the limitations of current testing practices, for each notion, the article discusses evidence of the effectiveness of several probabilistic approaches to examining the linguistic features of test items in ELL testing.

18.

More evidence that less is better: Sub-optimal choice in dogs

Rebecca J. Chase David N. George 《Learning & behavior》2018,46(4):462-471

The less-is-better effect is a preference for the lesser of two alternatives sometimes observed when they are evaluated separately. For example, a dinner service of 24 intact pieces might be judged to be more valuable than a 40-piece dinner service containing nine broken pieces. Pattison and Zentall (Animal Cognition, 17: 1019-1022, 2014) reported similar sub-optimal choice behavior in dogs using a simultaneous choice procedure. Given a choice between a single high-value food item (cheese) and an equivalent high-value item plus a lower-value food item (carrot), their dogs chose the individual item. In a subsequent test, the dogs preferred two high-value items to a single high-value item, suggesting that avoidance of multiple items did not cause the sub-optimal choice behavior. In two experiments, we replicated Pattison and Zentall’s procedure while including additional controls. In Experiment 1, habituation of neophobia for multiple items was controlled for by intermixing the two types of test trial within a single experimental session. In Experiment 2, we controlled for avoidance of heterogeneous rewards by including test trials in which a choice was offered between the combination of items and a single low-value item. In both experiments we observed sub-optimal choice behavior which could not be explained by either of these putative mechanisms. Our results, as well as those of Pattison and Zentall, are consistent with the suggestion that dogs’ assessment of the total value of multiple items is based, at least partly, on their average quality.
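The averaging account in the final sentence can be made concrete with a toy valuation rule that blends the mean and the sum of item values. This is a hypothetical model for illustration, not the authors' analysis; the averaging weight and the food values (cheese = 10, carrot = 2) are arbitrary assumptions.

```python
def perceived_value(values, w_average):
    """Hypothetical valuation of a set of food items.

    Blends the average item value (weight w_average) with the total value
    (weight 1 - w_average). w_average = 1 is pure averaging; 0 is pure summing.
    """
    mean = sum(values) / len(values)
    total = sum(values)
    return w_average * mean + (1 - w_average) * total

CHEESE, CARROT = 10.0, 2.0  # arbitrary value units (assumption)
w = 0.9                     # mostly, but not entirely, averaging (assumption)

single_cheese = perceived_value([CHEESE], w)
cheese_plus_carrot = perceived_value([CHEESE, CARROT], w)
two_cheese = perceived_value([CHEESE, CHEESE], w)
single_carrot = perceived_value([CARROT], w)

# Each line compares the two options offered on one type of test trial.
print(f"cheese {single_cheese:.1f} vs cheese+carrot {cheese_plus_carrot:.1f}")
print(f"cheese {single_cheese:.1f} vs two cheese {two_cheese:.1f}")
print(f"carrot {single_carrot:.1f} vs cheese+carrot {cheese_plus_carrot:.1f}")
```

Under these assumptions the rule ranks the single cheese above cheese plus carrot (10 vs. 6.6) and two cheeses above one (11 vs. 10), the two choices described for Pattison and Zentall's dogs, while still ranking the combination above a lone carrot, so it needs no aversion to mixed rewards. The blend of mean and sum is what separates "at least partly" averaging from pure averaging, which would make one and two cheeses equally valuable.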

19.

EFFECTS OF A PROFESSIONAL DEVELOPMENT PROGRAM ON BEHAVIORAL ENGAGEMENT OF STUDENTS IN MIDDLE AND HIGH SCHOOL

Anne Gregory Joseph P. Allen Amori Y. Mikami Christopher A. Hafen Robert C. Pianta 《Psychology in the schools》2014,51(2):143-163

Student behavioral engagement is a key condition supporting academic achievement, yet student disengagement in middle and high schools is all too common. The current study used a randomized controlled design to test the efficacy of the My Teaching Partner‐Secondary program to increase behavioral engagement. The program offers teachers personalized coaching and systematic feedback on teachers’ interactions with students, based on systematic observation of video recordings of teacher‐student interactions in the classroom. The study found that intervention teachers had significantly higher increases, albeit to a modest degree, in student behavioral engagement in their classrooms after 1 year of involvement with the program compared to the teachers in the control group (explaining 4% of variance). In exploratory analyses, two dimensions of teachers’ interactions with students—their focus on analysis and problem solving during instruction and their use of diverse instructional learning formats—acted as mediators of increased student engagement. The findings offer implications for new directions in teacher professional development and for understanding the classroom as a setting for adolescent development.

20.

Impact of Both Local Item Dependencies and Cut‐Point Locations on Examinee Classifications


Jonathan D. Rubright 《Educational Measurement》2018,37(3):40-45

Performance assessments, scenario‐based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this violation on examinee ability estimates has been comparatively neglected. It is known that such item dependencies cause low‐ability examinees to have their scores overestimated and high‐ability examinees' scores underestimated. However, the impact of these biases on examinee classification decisions has been little examined. In addition, because the influence of these dependencies varies along the underlying ability continuum, whether or not the location of the cut‐point is important in regard to correct classifications remains unanswered. This simulation study demonstrates that the strength of item dependencies and the location of an examination system’s cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.
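Sensitivity and specificity of pass/fail decisions can be computed directly from true and estimated abilities. The minimal Python sketch below uses hypothetical values in which estimates are shrunk toward the mean, mimicking the over- and underestimation described above. It is not the study's simulation design, which also varied the strength of item dependencies.

```python
def classification_accuracy(true_theta, est_theta, cut):
    """Return (sensitivity, specificity) of pass/fail decisions at a cut-point.

    Sensitivity: share of truly passing examinees classified as passing.
    Specificity: share of truly failing examinees classified as failing.
    """
    tp = fn = tn = fp = 0
    for t, e in zip(true_theta, est_theta):
        truly_passes = t >= cut
        classified_pass = e >= cut
        if truly_passes and classified_pass:
            tp += 1
        elif truly_passes:
            fn += 1  # false negative
        elif classified_pass:
            fp += 1  # false positive
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical abilities; each estimate is 0.6 x the true theta, i.e. shrunk
# toward zero, so low abilities are overestimated and high ones underestimated.
true_theta = [-1.5, -0.5, 0.2, 0.8, 1.6]
est_theta = [-0.9, -0.3, 0.12, 0.48, 0.96]

for cut in (0.5, -1.0):
    sens, spec = classification_accuracy(true_theta, est_theta, cut)
    print(f"cut = {cut:+.1f}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Here a cut above the center (+0.5) produces a false negative (sensitivity 0.50, specificity 1.00), while a cut below it (-1.0) produces a false positive (sensitivity 1.00, specificity 0.00). This shows why the cut-point location matters when the bias in ability estimates varies along the scale.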