Similar Articles
20 similar articles found
1.
The performance of English language learners (ELLs) has been a concern given the rapidly changing demographics in US K-12 education. This study examined whether students' English language status affects their performance on inquiry science tasks. A differential item functioning (DIF) analysis with respect to ELL status was conducted on an inquiry-based science assessment, using a multifaceted Rasch DIF model. A total of 1,396 seventh- and eighth-grade students, including 313 ELLs, took the science test. Overall, non-ELLs significantly outperformed ELLs. Of the four items that showed DIF, three favored non-ELLs and one favored ELLs. The item that favored ELLs provided a graphic representation of a science concept within a family context. There is some evidence that constructed-response items may help ELLs articulate scientific reasoning in their own words. Assessment developers and teachers should attend to the possible interaction between linguistic challenges and science content when designing assessments for, and providing instruction to, ELLs.
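The abstract does not show how a DIF effect is quantified. The sketch below illustrates one common Rasch-family idea under simplifying assumptions:

- Person abilities are treated as already estimated and held fixed (anchored).
- One item's difficulty is re-estimated by maximum likelihood separately in each group.
- The DIF contrast is the difference between the two group difficulties.

The function names and the toy data are hypothetical. The study itself used a multifaceted Rasch DIF model, which estimates these effects jointly rather than item by item.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_item_difficulty(thetas, responses, tol=1e-8, max_iter=100):
    """Newton-Raphson MLE of one item's difficulty with person abilities held fixed.

    Returns (difficulty, standard_error) in logits. An all-correct or
    all-incorrect response vector has no finite estimate, so it is rejected.
    """
    if not 0 < sum(responses) < len(responses):
        raise ValueError("extreme score: difficulty is not estimable")
    b = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(t, b) for t in thetas]
        residual = sum(p - x for p, x in zip(probs, responses))  # expected minus observed
        info = sum(p * (1 - p) for p in probs)                   # Fisher information
        step = max(-1.0, min(1.0, residual / info))              # damp large Newton jumps
        b += step
        if abs(step) < tol:
            break
    info = sum(rasch_p(t, b) * (1 - rasch_p(t, b)) for t in thetas)
    return b, 1.0 / math.sqrt(info)

def dif_contrast(reference, focal):
    """Focal-minus-reference difficulty for one item, with an approximate t statistic.

    Each argument is a (thetas, responses) pair for the same item in one group.
    A positive contrast means the item was harder for the focal group.
    """
    b_ref, se_ref = estimate_item_difficulty(*reference)
    b_foc, se_foc = estimate_item_difficulty(*focal)
    contrast = b_foc - b_ref
    return contrast, contrast / math.sqrt(se_ref ** 2 + se_foc ** 2)

# Hypothetical toy data: four students per group, all anchored at theta = 0.
non_ell = ([0.0, 0.0, 0.0, 0.0], [1, 1, 1, 0])
ell = ([0.0, 0.0, 0.0, 0.0], [1, 0, 0, 0])
contrast, t_stat = dif_contrast(reference=non_ell, focal=ell)
print(f"DIF contrast = {contrast:.3f} logits, t = {t_stat:.2f}")
```

With this toy data the group difficulties are -ln 3 and +ln 3, so the contrast is 2 ln 3, about 2.197 logits. In a real analysis the flagging rule (a minimum contrast size plus a significance test) is a design choice and should follow the software and conventions actually used.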

2.
The emphasis on scientific inquiry has increased the importance of developing the fundamental abilities needed to conduct scientific investigations and has created a need for valid assessments of students' inquiry abilities. We took advantage of advanced technology to develop a simulation-based assessment of inquiry abilities (SAIA) that allows students to generate scientific explanations and demonstrate their experimental abilities. This paper describes the validation of the assessment. Data were collected from 48 twelfth-grade students at a local high school, who were categorized into three groups based on their program majors. Both quantitative and qualitative approaches were used to validate SAIA. The quantitative results showed that SAIA was aligned with a validated reasoning-skill test (criterion-related validity), discriminated among the different groups (construct validity), and was highly suitable for examining inquiry abilities (content validity). Additionally, we used the think-aloud technique to identify the performances students exhibited while completing the SAIA tasks. The protocol analysis indicated that, in general, students demonstrated the expected abilities in SAIA and that their SAIA scores accurately reflected their levels of inquiry ability. The results suggest that SAIA is a valid assessment for evaluating the inquiry abilities of high school students. This study also provides systematic strategies for validating simulation-based assessments.

3.
This study explores the measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the roles that multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that:

(1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration, while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration;
(2) explanation items discriminate between high- and low-ability students much more effectively than multiple-choice items;
(3) explanation items measure a wider range of knowledge integration levels than multiple-choice items; and
(4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.
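The Partial Credit Model gives each score category of a polytomous item, such as a multi-level explanation rubric, a probability. That probability depends on the student's ability theta and the item's step difficulties (thresholds). Below is a minimal sketch of the category probabilities and the expected score; the four thresholds are hypothetical.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Score-category probabilities for one item under the Partial Credit Model.

    thresholds: step difficulties delta_1..delta_m in logits.
    Returns m + 1 probabilities for scores 0..m.
    """
    # The log-numerator for score k is the cumulative sum of (theta - delta_j) for j <= k;
    # for score 0 it is the empty sum, 0.
    log_nums = [0.0]
    for delta in thresholds:
        log_nums.append(log_nums[-1] + (theta - delta))
    peak = max(log_nums)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(v - peak) for v in log_nums]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, thresholds):
    """Model-expected item score at ability theta."""
    return sum(k * p for k, p in enumerate(pcm_probabilities(theta, thresholds)))

# Hypothetical 0-4 explanation rubric with evenly spaced thresholds.
rubric = [-1.5, -0.5, 0.5, 1.5]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, rubric), 2))
```

Spreading an item's thresholds further apart widens the range of theta over which its expected score keeps changing. This is one way to picture an item type "measuring a wider range" of a scale.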

4.
This study established a Chinese-language scale for measuring high school students' ocean literacy and tested its reliability, validity, and differential item functioning (DIF), to address the lack of DIF testing for existing scales. Construct validity and reliability were examined by analyzing the scale's items with the Rasch model. A gender DIF test was conducted to ensure that results remain fair when different groups are compared. The results indicated that the scale is unidimensional and has good internal consistency and construct validity. The gender DIF results showed that several items were harder for either female or male students to answer correctly; however, expert reviewers discussed these items individually and recommended retaining them. The final Chinese version of the ocean literacy scale comprises 48 items that reflect high school students' understanding of ocean literacy, which helps students understand the marine science topics they encounter in real life.

5.
As access to and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer-based test (CBT) implementation can be difficult because of limited resources. As a result, some testing programs offer both CBT and paper-based test (PBT) administration formats. In such situations, evidence must be gathered that scores obtained from the different formats are comparable. In this study, we illustrate how contemporary statistical methods can provide evidence about the comparability of CBT and PBT scores at the total-score and item levels. Specifically, we examined the invariance of test structure and item functioning across administration modes for subgroups of students defined by SES and sex. Multiple replications of confirmatory factor analysis and of Rasch differential item functioning analysis were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct, with moderate statistical support for strong factorial invariance across SES subgroups and moderate support for invariance across sex. Issues involved in applying these analyses to future evaluations of score comparability across versions of a test are discussed.

6.
Science teachers' content knowledge is an important influence on student learning. This highlights an ongoing need for programs designed to support teachers' learning of science, and for assessments of those programs. Valid and reliable assessments of teacher science knowledge are needed to measure this crucial variable directly. This paper describes multiple sources of validity and reliability evidence (Cronbach's alpha greater than 0.8) for the physical, life, and earth/space science assessments developed as part of the Diagnostic Teacher Assessments of Mathematics and Science (DTAMS) project. Validity was strengthened by systematic synthesis of relevant documents, extensive use of external reviewers, and field tests with 900 teachers during the assessment development process. Subsequent results from 4,400 teachers, analyzed with Rasch IRT modeling techniques, offer construct and concurrent validity evidence.
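The reliability figure above is Cronbach's alpha. For k items, alpha equals k / (k - 1) times (1 minus the sum of item variances divided by the variance of total scores). Below is a minimal sketch of that formula on a hypothetical score matrix, where rows are respondents and columns are items.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical: four respondents answering three dichotomously scored items.
scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
print(round(cronbach_alpha(scores), 3))  # 0.75
```

Population and sample variances give the same alpha as long as the same choice is used in the numerator and the denominator.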

7.
Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and in science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, testing the effects of varying levels of NOS knowledge on domain-specific science understanding and beliefs requires instruments validated in accordance with the AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument:

(1) validity and reliability;
(2) instrument dimensionality; and
(3) item scales, properties, and qualities, within the frameworks of Classical Test Theory and Item Response Theory (Rasch modeling).

A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument, and of a reduced item set, showed that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study supports NOS and SI as two separate dimensions, corroborating theoretical distinctions in the literature. Item quality analyses were used to identify items with unacceptable fit values. A Wright map revealed that few items sufficiently distinguished the high performers in the sample, while an excessive number of items were located at the low end of the performance scale. Overall, our study outlines how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
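The abstract reports that the two-dimensional model fit "significantly better" but does not name the statistic. For nested Rasch models, a common choice is a likelihood-ratio (deviance-difference) test, often reported alongside AIC and BIC. The sketch below makes three assumptions:

- That choice of statistic.
- Hypothetical deviances.
- That the two-dimensional model adds exactly one variance and one covariance, giving 2 degrees of freedom.

To stay within the standard library, its chi-square tail formula is valid only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x).

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, which holds only
    for positive even degrees of freedom.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def compare_nested_models(dev_simple, k_simple, dev_complex, k_complex, n_persons):
    """Likelihood-ratio test and AIC/BIC differences for nested models.

    dev_*: deviance (-2 log-likelihood); k_*: number of estimated parameters.
    Positive dAIC/dBIC favour the more complex model.
    """
    lr = dev_simple - dev_complex
    df = k_complex - k_simple
    return {
        "LR": lr,
        "df": df,
        "p": chi2_sf_even_df(lr, df),
        "dAIC": lr - 2 * df,                     # AIC(simple) - AIC(complex)
        "dBIC": lr - math.log(n_persons) * df,   # BIC(simple) - BIC(complex)
    }

# Hypothetical deviances: the 2-D model adds one variance and one covariance.
result = compare_nested_models(dev_simple=10000.0, k_simple=30,
                               dev_complex=9980.0, k_complex=32, n_persons=500)
print(result)
```

BIC penalizes extra parameters more heavily than AIC as the sample grows. When the two criteria disagree, the choice between them is a judgment call rather than a fixed rule.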

8.
The efficacy of the Measure of Understanding of Macroevolution (MUM) as a measurement tool has been a point of contention among scholars who need a valid measure of macroevolution knowledge. We explored the structure and construct validity of the MUM using Rasch methodologies in the context of a general education biology course designed with an emphasis on macroevolution content. The Rasch model was used to quantify item- and test-level characteristics, including dimensionality, reliability, and fit to the Rasch model. Contrary to previous work, we found that the MUM provides a valid, reliable, and unidimensional scale for measuring knowledge of macroevolution in introductory non-science majors, and that its psychometric behavior does not change substantially over time. Although all items provide productive measurement information, several depart substantially from ideal behavior, warranting a collective effort to improve them. Suggestions for improving the measurement characteristics of the MUM at the item and test levels are put forward and discussed.
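Item "fit with the Rasch model" is usually judged with infit and outfit mean-square statistics. These compare observed responses with the model's expected probabilities. Below is a minimal sketch for one dichotomous item, with abilities and difficulty assumed already estimated; the data are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(thetas, responses, b):
    """Infit and outfit mean-square statistics for one dichotomous item.

    Outfit is the plain mean of squared standardized residuals, so it is sensitive
    to unexpected responses from persons far from the item; infit weights squared
    residuals by their model variance, so it focuses on persons near the item.
    """
    sq_resid = []   # (x - P)^2
    variances = []  # P(1 - P), the model variance of each response
    for theta, x in zip(thetas, responses):
        p = rasch_p(theta, b)
        sq_resid.append((x - p) ** 2)
        variances.append(p * (1 - p))
    outfit = sum(r / w for r, w in zip(sq_resid, variances)) / len(sq_resid)
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit

# Hypothetical: a strong student misses the item and a weak student answers it correctly.
infit, outfit = item_fit(thetas=[2.0, -2.0], responses=[0, 1], b=0.0)
print(round(infit, 2), round(outfit, 2))  # 7.39 7.39
```

Mean squares near 1 indicate responses about as noisy as the model expects; values well above 1 flag unexpected responses. Ranges such as 0.5 to 1.5 are often quoted as acceptable, but they are conventions rather than fixed cutoffs.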

9.
This paper traces the research process behind the design and development of a science learning environment called WiMVT (web-based inquirer with modeling and visualization technology). The WiMVT system uses a model-based collaborative inquiry approach to help secondary school students build a sophisticated understanding of scientific conceptions and of the science inquiry process, and to develop critical learning skills. It is intended to support collaborative inquiry, real-time social interaction, and progressive modeling, and to provide multiple sources of scaffolding for students. We first discuss the theoretical underpinnings of the WiMVT design framework, introduce the components and features of the system, and describe the proposed workflow of WiMVT instruction. We also describe the research approach that supported the development of the system. Finally, the findings of a pilot study are briefly presented to demonstrate the potential learning efficacy of WiMVT in science learning. Implications are drawn for improving the existing system, refining teaching strategies, and providing feedback to researchers, designers, and teachers. The pilot study also shows designers like us how to narrow the gap between the learning environment's intended design and its actual use in the classroom.

10.
In recent years, science education has placed increasing importance on learners' mastery of scientific reasoning. This growing emphasis presents a challenge for both developers and users of assessments. We report on our effort to conceptualize, develop, and validate an assessment of students' ability to reason about physical dynamic models in Earth Science. Building on the research literature on analogical mapping and informed by current perspectives on learning progressions, we present a three-tiered construct. It describes the increasing sophistication of students' analogical reasoning about the correspondences and non-correspondences between models and the Earth System:

- Level 1: entities;
- Level 2: configurations in space or relative motion of entities;
- Level 3: the mechanism or cause of observed phenomena.

Grounded in a construct-centered design approach, we describe our process for developing assessments to examine and validate this construct, including how we selected topics and models, designed items, and developed outcome spaces. We present the specific example of one assessment centered on moon phases, which was administered to 164 eighth- and ninth-grade Earth Science students as a pre/post measure. A total of 294 responses were analyzed using a Rasch modeling approach. Item difficulties and student proficiency scores were calculated and analyzed with respect to the three levels of the construct. The results provided initial evidence supporting the construct as conceived, with students displaying a range of analogical reasoning spanning all three construct levels. The analysis also identified problematic items that merit further examination. Overall, the assessment has given us the opportunity to better describe and frame students' cognitive uses of models during learning situations in Earth Science.
Implications for instruction and future directions for research in this area are discussed. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 713–743, 2012
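A Wright map places person proficiencies and item difficulties on the same logit scale. It is one way to check whether items targeting Level 3 sit above Level 2 and Level 1 items, as the construct predicts. The sketch below prints a minimal text version. The measures and item labels are hypothetical; in practice they would come from a fitted Rasch model.

```python
import math

def wright_map(person_measures, item_measures, bin_width=0.5):
    """Plain-text Wright map: persons as '#', item labels on the right, high logits first.

    person_measures: list of ability estimates (logits).
    item_measures: dict mapping item label -> difficulty (logits).
    Each row is labelled with the lower edge of its logit bin.
    """
    def bin_of(value):
        return math.floor(value / bin_width)

    person_bins = [bin_of(v) for v in person_measures]
    item_bins = {label: bin_of(v) for label, v in item_measures.items()}
    all_bins = person_bins + list(item_bins.values())
    width = max(person_bins.count(k) for k in set(all_bins))  # widest '#' column
    lines = []
    for k in range(max(all_bins), min(all_bins) - 1, -1):
        persons = "#" * person_bins.count(k)
        labels = " ".join(sorted(lab for lab, b in item_bins.items() if b == k))
        lines.append(f"{k * bin_width:+5.1f} {persons:<{width}} | {labels}".rstrip())
    return lines

# Hypothetical measures; item labels carry the construct level they target.
persons = [-0.8, -0.3, 0.1, 0.2, 0.6, 1.3]
items = {"L1-a": -0.6, "L1-b": -0.4, "L2-a": 0.3, "L3-a": 1.2}
lines = wright_map(persons, items)
print("\n".join(lines))
```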

11.
This study focuses on teachers' predictions of students' performance, in particular that of medium-low achievers, on tasks testing inquiry competencies. The tasks come from PISA science. More specifically, we study science teachers' predictions of three aspects: the tasks' levels of difficulty, their potential sources of difficulty, and how hard they are for medium-low achievers to solve. We also study which assessed competencies science teachers identify in the tasks. Our approach is questionnaire-based: a sample of 125 French science and technology teachers responded to the questionnaire. The teachers show a rather good ability to predict the difficulty levels of inquiry tasks for medium-low achievers and are able to identify relevant potential sources of difficulty or easiness in the items. However, they are not aware of some essential difficulties that medium-low achievers encounter while solving science inquiry tasks. Moreover, the teachers have difficulty identifying the competencies that an item tests.

12.
This article describes the development, validation and application of a Rasch-based instrument, the Elementary School Science Classroom Environment Scale (ESSCES), for measuring students' perceptions of constructivist practices in the elementary science classroom. The instrument, designed to complement the Reformed Teaching Observation Protocol (RTOP), is conceptualised using the RTOP's three construct domains: Lesson Design and Implementation; Content; and Classroom Culture. Data from 895 elementary students were used to develop the Rasch scale, which was assessed for item fit, invariance and dimensionality. Overall, the data conformed to the assumptions of the Rasch model. In addition, the structural relationships among the retained items supported and validated the instrument for measuring the theoretical construct of the reformed science classroom environment. The application of the ESSCES in a research study involving fourth-grade students provides evidence that educators and researchers have a reliable instrument for understanding the elementary science classroom environment through the lens of the students.

13.
The purpose of this study was to develop a questionnaire that could measure preservice mathematics teachers' mathematics educational values. The questionnaire was developed and validated through a sequential inquiry. Design principles were established from the existing literature, and a pool of items was constructed and then submitted to experts to evaluate its construct validity. The items were revised based on the experts' suggestions to produce a trial version of the questionnaire. A pilot study with preservice mathematics teachers explored the validity and usefulness of the questionnaire. The pilot results were used to revise the questionnaire, which was then administered to a sample of preservice mathematics teachers at Cumhuriyet University, Sivas, Turkey. Construct and structural validity, item contributions, and reliability were further examined using factor analysis and two different item analysis methods. Results revealed that the questionnaire comprised four factors, with satisfactory item contributions and acceptable internal consistency. One finding suggested that some mathematics educational values rooted in Western culture (e.g., accessibility–special) have not been accepted by Turkish preservice mathematics teachers.

14.
This article reports on our efforts to determine engineering students' competence in mathematics. Our research is embedded in a larger project, KoM@ING–Modeling and developing competence: Integrated IRT-based and qualitative studies with a focus on mathematics and its usage in engineering studies, within the program Modeling and Measuring Competencies in Higher Education (KoKoHS). KoKoHS serves as the umbrella for several research projects addressing the modeling and measuring of competences at the college level. KoM@ING aims to model the role of engineering students' mathematical competences in their studies from both a quantitative and a qualitative perspective.

Here, we report the development of a large-scale instrument assessing engineering freshmen's competence in mathematics, using Rasch analysis to obtain measures of item difficulty and student ability. Several analyses were performed to provide insights into the measures' reliability and validity. In particular, to examine cognitive validity, we scrutinized students' think-aloud protocols as they solved the items, investigating their problem-solving abilities as a proxy for item difficulty. Overall, we found initial evidence that our instrument is suitable for assessing engineering freshmen's competence in mathematics. The instrument may be helpful for further research and for informing those concerned with college organization and policy.
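Under the Rasch model, once item difficulties are calibrated, a student's raw score is a sufficient statistic for ability. The ability measure can then be found by maximum likelihood. Below is a minimal Newton-Raphson sketch that assumes the difficulties are fixed at previously calibrated (here hypothetical) values. Operational software may use joint, conditional, or marginal estimation, or bias-reducing variants such as weighted likelihood, so its results can differ.

```python
import math

def estimate_ability(difficulties, raw_score, tol=1e-8, max_iter=100):
    """Newton-Raphson MLE of a student's ability given anchored item difficulties.

    Returns (theta, standard_error) in logits. Zero and perfect scores have no
    finite maximum-likelihood estimate and are rejected.
    """
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        raise ValueError("extreme score: ability is not estimable by MLE")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        info = sum(p * (1 - p) for p in probs)
        step = (raw_score - sum(probs)) / info  # observed minus expected score
        step = max(-1.0, min(1.0, step))        # damp large jumps for stability
        theta += step
        if abs(step) < tol:
            break
    probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
    return theta, 1.0 / math.sqrt(sum(p * (1 - p) for p in probs))

# Hypothetical five-item test with evenly spaced difficulties.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
for score in range(1, 5):
    theta, se = estimate_ability(difficulties, score)
    print(score, round(theta, 2), round(se, 2))
```

Every student with the same raw score on the same items receives the same measure, which is the practical meaning of sufficiency here.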

15.
This paper describes the development and validation of an item bank that students can use to self-assess their achievement across an undergraduate degree programme in seven generic competences: problem-solving skills, critical-thinking skills, creative-thinking skills, ethical decision-making skills, effective communication skills, social interaction skills and global perspective. The Rasch modelling approach was adopted for instrument development and validation. A total of 425 items were developed. The content validity of these items was examined through six focus group interviews with target students, and construct validity was verified with data collected from a large student sample (N = 1151). A matrix design was adopted to assemble the items into 26 test forms, which were distributed at random in each administration session. The results demonstrated that the item bank had high reliability and good construct validity. Cross-sectional comparisons of Years 1–4 students revealed patterns of change over the years. Correlation analyses shed light on the relationships between the constructs. Implications are drawn to inform future development of the instrument, and suggestions are made on ways to use it to enhance the teaching and learning of generic skills.
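The abstract does not specify how the 425 items were distributed over the 26 forms. One common way to build such a matrix (multi-form) design, shown here only as a hypothetical sketch, is to split the bank into blocks and chain them so that each pair of adjacent forms shares a block. The shared items let a Rasch calibration place all forms on one scale. The block structure, function names, and seeded assignment are assumptions, not the authors' design.

```python
import random

def split_into_blocks(items, n_blocks):
    """Split items into n_blocks consecutive blocks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(items[start:end])
        start = end
    return blocks

def chained_forms(items, n_forms):
    """Hypothetical chained design: form f holds block f plus block f+1 (wrapping
    around), so neighbouring forms share a block of common linking items."""
    blocks = split_into_blocks(items, n_forms)
    return [blocks[f] + blocks[(f + 1) % n_forms] for f in range(n_forms)]

def assign_forms(student_ids, n_forms, seed=None):
    """Shuffle students, then deal forms out in rotation so form counts stay balanced."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    return {sid: i % n_forms for i, sid in enumerate(ids)}

item_ids = list(range(1, 426))       # 425 items, as in the abstract
forms = chained_forms(item_ids, 26)  # 26 forms, as in the abstract
print(len(forms), min(map(len, forms)), max(map(len, forms)))  # 26 32 34
assignment = assign_forms(range(1151), 26, seed=0)  # hypothetical seed
```

Each item appears in exactly two forms in this design. That halves per-student testing time compared with giving everyone the whole bank while keeping every form linked.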

16.
This study adapts Levels 1 and 2 of Kirkpatrick's model of training evaluation to evaluate the learning outcomes of an English as a second language (ESL) paragraph writing course offered by a major Asian university. The study uses a combination of surveys and writing tests administered at the beginning and end of the course. The survey evaluated changes in students' perceptions of their skills, attitude, and knowledge (SAK), and the writing tests measured their writing ability. Rasch measurement was applied to examine the psychometric validity of the instruments. The measured abilities were subsequently subjected to path modeling to evaluate Levels 1 and 2 of the model. The students reported that the module was enjoyable and useful. In addition, their self-perceived skills and knowledge developed over time alongside their writing scores, but their attitude remained unchanged. Limitations of Kirkpatrick's model, as well as the lack of solid frameworks for evaluating educational effectiveness in applied linguistics, are discussed.

17.
This study presents evidence regarding the construct validity and internal consistency of the IFSP Rating Scale (McWilliam & Jung, 2001), which was designed to rate individualized family service plans (IFSPs) on 12 indicators of family-centered practice. Here, the Rasch measurement model is used to investigate the scale's functioning and fit, using both person and item diagnostics, for 120 IFSPs that were previously analyzed with a classical test theory approach. Analyses demonstrated that scores on the IFSP Rating Scale fit the model well, though additional items could improve the scale's reliability. Implications for applying the Rasch model to improve special education research and practice are discussed.

18.
19.
This research explored the measurement characteristics of two science examinations and the potential of using access arrangements data to investigate how features of exam questions affect students who require reading support. For both examinations, traditional and Rasch analyses provided estimates of difficulty and information on item functioning. For one examination, the performance of students eligible for support from a reader in exams was compared with that of a 'norm' group. For selected items, a sample of student responses was analysed. The analysis identified a number of factors that may make questions easier or more difficult, or contribute to problems with item functioning. A number of features that may particularly affect students requiring reading support were also identified.
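"Traditional" item analysis of exam questions often reports two statistics for each question. One is facility, the mean mark as a proportion of the maximum. The other is a discrimination index, such as the correlation between the item and the rest of the test. Below is a minimal sketch under those assumed definitions, with hypothetical marks. A Rasch analysis would then add difficulty estimates on a logit scale and fit statistics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns nan when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def classical_item_stats(scores, max_marks):
    """Facility and corrected item-total correlation for each item.

    scores: students-by-items matrix of marks; max_marks: maximum mark per item.
    The corrected correlation relates each item to the total of all other items,
    so the item's own marks do not inflate it.
    """
    totals = [sum(row) for row in scores]
    stats = []
    for i, max_mark in enumerate(max_marks):
        item = [row[i] for row in scores]
        rest = [t - x for t, x in zip(totals, item)]
        facility = sum(item) / (len(item) * max_mark)
        stats.append((facility, pearson(item, rest)))
    return stats

# Hypothetical marks for four students on three one-mark items.
scores = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
stats = classical_item_stats(scores, max_marks=[1, 1, 1])
for facility, r_rest in stats:
    print(round(facility, 2), round(r_rest, 2))
```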

20.
This study explores the effects of metacognitive and cognitive prompting on the scientific inquiry practices of students with various levels of initial metacognition. Two junior high school classes participated. One class, the experimental group (n = 26), received an inquiry-based curriculum with a combination of cognitive and metacognitive prompts. It was compared with the other class, the comparison group (n = 25), which received only cognitive prompts in the same curriculum. Data sources included a test of inquiry practices, a metacognition questionnaire, and worksheets. The results showed that the mixed cognitive and metacognitive prompts had significant effects on students' inquiry practices, especially their planning and analyzing abilities. Furthermore, the mixed prompts appeared to have a differential effect on students with lower levels of metacognition, who showed significant improvement in their inquiry abilities. Combining cognitive and metacognitive prompts during an inquiry cycle was found to promote students' inquiry practices.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417