Similar literature
20 similar articles found
1.
Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain‐specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument—as well as a reduced item set—indicated that a two‐dimensional Rasch model fit significantly better than a one‐dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert‐type instruments in science education.

2.
Understanding what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses the degree to which 15-year-old students have acquired knowledge and skills essential for full participation in society. Our aim is to identify characteristics of PISA science items that could influence an item’s proficiency level. The study is based on an a priori item analysis and a statistical analysis. Results show that, of the characteristics of PISA science items identified in our a priori analysis, only cognitive complexity and format have explanatory power for an item’s proficiency level. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction, or by the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students’ low scores, for items displaying a high cognitive complexity. For items of middle or low cognitive complexity, the complexity level alone is not sufficient to predict item difficulty; other characteristics play a crucial role. We discuss anticipating difficulties in assessment from a broader perspective.

3.
This inquiry is an investigation of item response theory (IRT) proficiency estimators’ accuracy under multistage testing (MST). We chose a two‐stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two‐stage MST panels (i.e., forms) by manipulating two assembly conditions in each module, such as difficulty level and module length. For each panel, we investigated the accuracy of examinees’ proficiency levels derived from seven IRT proficiency estimators. The choice of Bayesian (prior) versus non‐Bayesian (no prior) estimators was of more practical significance than the choice of number‐correct versus item‐pattern scoring estimators. The Bayesian estimators were slightly more efficient than the non‐Bayesian estimators, resulting in smaller overall error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for low‐ and high‐performing examinees.

4.
We describe the procedures for constructing an instrument designed to evaluate children's proficiency in American Sign Language (ASL). The American Sign Language Proficiency Assessment (ASL-PA) is a much-needed tool that potentially could be used by researchers, language specialists, and qualified school personnel. A half-hour ASL sample is collected on video from a target child (between ages 6 and 12) across three separate discourse settings and is later analyzed and scored by an assessor who is highly proficient in ASL. After the child's language sample is scored, he or she can be assigned an ASL proficiency rating of Level 1, 2, or 3. At this phase in its development, substantial evidence of reliability and validity has been obtained for the ASL-PA using a sample of 80 profoundly deaf children (ages 6-12) of varying ASL skill levels. The article first explains the item development and administration of the ASL-PA instrument, then describes the empirical item analysis, standard setting procedures, and evidence of reliability and validity. The ASL-PA is a promising instrument for assessing elementary school-age children's ASL proficiency. Plans for further development are also discussed.

5.
This study proposes a structured constructs model (SCM) to examine measurement in the context of a multidimensional learning progression (LP). The LP is assumed to have features that go beyond a typical multidimensional IRT model, in that there are hypothesized to be certain cross‐dimensional linkages that correspond to requirements between the levels of the different dimensions. The new model builds on multidimensional item response theory models and change‐point analysis to add cut‐score and discontinuity parameters that embody these substantive requirements. This modeling strategy allows us to place the examinees in the appropriate LP level and simultaneously to model the hypothesized requirement relations. Results from a simulation study indicate that the proposed change‐point SCM recovers the generating parameters well. When the hypothesized requirement relations are ignored, the model fit tends to become worse, and the model parameters appear to be more biased. Moreover, the proposed model can be used to find validity evidence to support or disprove the initially hypothesized theoretical links in the LP through empirical data. We illustrate the technique with data from an assessment system designed to measure student progress in a middle‐school statistics and modeling curriculum.

6.
The effect of item parameters (discrimination, difficulty, and level of guessing) on the item-fit statistic was investigated using simulated dichotomous data. Nine tests were simulated using 1,000 persons, 50 items, three levels of item discrimination, three levels of item difficulty, and three levels of guessing. The item fit was estimated using two fit statistics: the likelihood ratio statistic (X²_B), and the standardized residuals (SRs). All the item parameters were simulated to be normally distributed. Results showed that the levels of item discrimination and guessing affected the item-fit values. As the level of item discrimination or guessing increased, item-fit values increased and more items misfit the model. The level of item difficulty did not affect the item-fit statistic.
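
The kind of simulation this abstract describes can be illustrated with a minimal Python sketch. It assumes a three-parameter logistic (3PL) response model, which the abstract does not name explicitly, and draws item parameters from normal distributions. The function names, the means, the standard deviations, and the clipping bounds are illustrative assumptions, not values taken from the study.

```python
import math
import random


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    theta: person ability, a: discrimination, b: difficulty, c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(n_persons, n_items, a_mean, b_mean, c_mean, seed=0):
    """Simulate a persons-by-items matrix of 0/1 responses (hypothetical design)."""
    rng = random.Random(seed)
    # Abilities: standard normal (an assumption).
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    items = []
    for _ in range(n_items):
        # Normally distributed item parameters; the SDs are assumed values.
        a = max(0.1, rng.gauss(a_mean, 0.1))             # keep discrimination positive
        b = rng.gauss(b_mean, 1.0)
        c = min(0.5, max(0.0, rng.gauss(c_mean, 0.02)))  # keep guessing in [0, 0.5]
        items.append((a, b, c))
    data = [
        [1 if rng.random() < p_correct_3pl(t, a, b, c) else 0 for (a, b, c) in items]
        for t in thetas
    ]
    return thetas, items, data


if __name__ == "__main__":
    # One condition of the design: 1,000 persons and 50 items.
    thetas, items, data = simulate_responses(1000, 50, a_mean=1.0, b_mean=0.0, c_mean=0.2)
    proportion_correct = sum(map(sum, data)) / (1000 * 50)
    print(f"overall proportion correct: {proportion_correct:.3f}")
```

Varying `a_mean`, `b_mean`, and `c_mean` across runs would give the different levels of discrimination, difficulty, and guessing; computing the fit statistics themselves is outside this sketch.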

7.
General and specific Content Didactic Profiles of engineering courseware have been developed in the framework of a proposed model for evaluating a given curriculum. To carry out a diagnosis of written material, a mapping sentence, including three facets (one for each of the two didactic profiles and a Complexity facet), has been prepared to establish a definitional system for the evaluating model. The common range is the student's score in the achievement tests. The model has been applied to 119 students taking the self-instructional course ‘Digital Systems’ in the framework of Israel's Everyman's University. A Euclidean structure (Cylindrex) has been found repeatedly in four tests relating to learning units on separate subjects. This indicates that structural lawfulness exists between the general didactic profile, complexity of test items, and students’ achievements. The location of a test item within the cylindrex shows its degree of complexity and didactic profile level, thus providing an estimate of the cognitive difficulty level required to arrive at a correct solution. A factor of Curriculum Awareness (FCA) has been established as an indicator providing a hierarchy of content subjects in the written material that need didactic improvement.

8.
Using item response theory, this study explores whether student survey and classroom observation items can be calibrated onto a common metric of teaching quality. The data comprise 269 lessons of 141 teachers that were scored on the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument and the My Teacher student survey. Using Rasch model concurrent calibration, items from both instruments were calibrated onto a common one‐dimensional metric of teaching quality. Most items were found to fit the model. Challenges pertain mainly to items measuring differentiation and the teaching of learning strategies to students. Explanations for these difficulties are discussed.

9.
In this article we focus on ‘cooperative engineering’, in which teachers and researchers co-design didactic sequences. In the first part of the article, we present cooperative engineering by describing some of the main principles on which it is grounded. The second part is dedicated to a case study, which enables us to illustrate some elements of the collective work in a specific cooperative design in kindergarten. The designed learning game, the ‘Treasure Game’, aimed to assist kindergarten students to build a system of graphical representations, which was implemented in a series of phases in which students were asked to memorise a series of items with increasing levels of difficulty. The game demonstrated the students’ growing competence in recalling items using strategies such as making lists and working collaboratively to collectively recall items through a ‘treasure box’. In the third part of the article, we show how this case study embodies some of the main principles put forward in the first part.

10.
Even though guessing biases difficulty estimates as a function of item difficulty in the dichotomous Rasch model, assessment programs with tests which include multiple‐choice items often construct scales using this model. Research has shown that when all items are multiple‐choice, this bias can largely be eliminated. However, many assessments have a combination of multiple‐choice and constructed response items. Using vertically scaled numeracy assessments from a large‐scale assessment program, this article shows that eliminating the bias on estimates of the multiple‐choice items also impacts on the difficulty estimates of the constructed response items. This implies that the original estimates of the constructed response items were biased by the guessing on the multiple‐choice items. This bias has implications for both defining difficulties in item banks for use in adaptive testing composed of both multiple‐choice and constructed response items, and for the construction of proficiency scales.

11.
The current concern about low levels of English proficiency among international students who graduate from degree courses – that students’ English language skills are not being developed during their higher education experience – reflects negatively on the quality of Australian higher education and its graduates. More careful selection of students and increased use of English language testing are among the solutions put forward. These debates over English language proficiency tend to construct English language as a skill that can be applied in any context and ‘native‐speaker’‐levels of language ability as essential for employment. Within such a formulation international students can only ever be defined as in deficit. Drawing on socio‐cultural theories of language learning and academic literacy, alternative understandings of language proficiency in internationalized higher education are explored. Improved communication skills among graduates are likely to be achieved through a better understanding of issues beyond classroom instruction, such as barriers to social integration with native‐speakers, which reveal many international students unable to access adequate levels of language experience. Without wider perspectives on the debate over English language proficiency in higher education, the many benefits of having international students in higher education institutions are obscured by negative attitudes and unrealistic expectations.

12.
Professionalisation in teaching has been the topic of extensive research in recent years, following in general two different approaches: the ‘competence-based approach’ and the ‘critical reflection approach’. With large-scale comparative studies such as PISA, TIMSS and PIRLS at the beginning of the 21st century, the former approach came to dominate the field and—especially in Germany—a specific model of professionalisation advanced to the status of a paradigm. However, this model does not seem unproblematic when one considers its roots. In this article I begin by inquiring into the concept and methodology behind the status quo in order to reveal its limitations. As a means of developing this concept further, I then introduce a model of educational expertise that takes as its theoretical foundation the didactic triangle, which has a rigid systematic structure for critical and reflective thinking about teaching and is backed up by empirical findings. At the core of this model of educational expertise are mind frames, a concept established by John Hattie. They may be seen as a theoretically founded and systematically structured interaction between competencies and attitudes backed up by empirical findings. Thus, they stand for an integrative model for professionalisation in teaching. Finally, I use this model to provide an outlook on university teacher education.

13.
In the past decade, there has been interest in the assessment of cognitive and affective processes and products for the purposes of meaningful learning. Meaningful measurement (MM) has been proposed which is in accordance with a humanistic constructivist information‐processing perspective. Students’ responses to the assessment tasks are now evaluated according to an item response measurement model, together with a hypothesized model detailing the progressive forms of knowing/competence under examination. There is a possibility of incorporating student errors and alternative frameworks into these evaluation procedures. Meaningful measurement leads us to examine the composite concepts of “ability” and “difficulty”. Under the rubric of meaningful measurement, validity assessment (i.e. internal and external components of construct validity) is essentially the same as an inquiry into the meanings afforded by the measurements. Concepts of reliability expressed as a group statistic, applied in the same way to all the examinees in the sample, have to be obviated when the precision of the trait estimates stemming from the item response measurement models can be determined at each trait level. Reliability, measured in terms of standard errors of estimates, needs to be within acceptable limits when internal validity is to be secured. Further evidence of validity may be provided by in‐depth analyses of how “epistemic subjects” of different levels of competence and proficiency engage in different types of assessment tasks, where affective and metacognitive behaviours may be examined as well. These ways of undertaking MM can be codified by proposing a three‐level conceptualization of MM. It is within the rubric of this conceptualization and the MM enquiry paradigm that validity and reliability of test measures are discussed in this paper.

14.
This paper examines subject-specific structures and the levels achieved by students of Mechanical and Construction Engineering in the subject of Engineering Mechanics (EM). EM presents a major obstacle for students in the two courses of study mentioned. Until now, researchers have not examined which characteristics of the requirements in EM cause this obstacle. Initial efforts to address this research gap were made in the research project KoM@ING. Competence structure modelling confirms three dimensions of EM: statics, elastostatics and dynamics. This paper presents results on proficiency scaling for statics and dynamics. We found that the subject-specific mathematical requirements primarily explain the item difficulties. Moreover, the following features are relevant for item difficulty: the number of solution steps (for statics), the number of EM-specific terms and the content-specific features of complexity (for dynamics). The mathematical requirements and the content-specific features of complexity are revealed as important didactic elements in this higher education context.

15.
16.
This study investigated the usefulness of the many‐facet Rasch model (MFRM) in evaluating the quality of performance related to PowerPoint presentations in higher education. The Rasch model is an item response theory model in which the probability of a correct response to a test item or task depends on the ability of the person relative to the difficulty of the item. MFRM extends this one‐parameter model to further facets, for example, rater severity, rating scale format, and task difficulty levels. This paper specifically investigated presentation ability in terms of item/task difficulty and rater severity/leniency. First‐year science education students prepared presentations with the PowerPoint software during the autumn semester of the 2005–2006 school year in the ‘Introduction to the Teaching Profession’ course. The students were divided into six sub‐groups, and each sub‐group was given an instructional topic, based on the content and objectives of the course, for which to prepare a PowerPoint presentation. Seven judges, including the course instructor, evaluated each group’s PowerPoint presentation performance using the ‘A+ PowerPoint Rubric’. The results of this study show that the MFRM technique is a powerful tool for handling polytomous data in performance and peer assessment in higher education.
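
The relationship between the dichotomous Rasch model and its many‐facet extension can be sketched in a few lines of Python. The sketch assumes the common rating‐scale formulation of the MFRM, in which the log‐odds of category k over category k−1 equal ability minus task difficulty minus rater severity minus a category threshold. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(correct) = logistic(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def mfrm_category_probabilities(ability, task_difficulty, rater_severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Assumed form:
        log(P_k / P_{k-1}) = ability - task_difficulty - rater_severity - thresholds[k-1]
    Returns probabilities for categories 0 .. len(thresholds).
    """
    base = ability - task_difficulty - rater_severity
    # Cumulative log-numerators; category 0 is the reference with value 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + base - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical values: one presenter rated on a 4-category rubric item
    # by a lenient rater (severity -0.5) and a severe rater (severity +0.5).
    thresholds = [-1.0, 0.0, 1.0]
    for severity in (-0.5, 0.5):
        probs = mfrm_category_probabilities(1.0, 0.0, severity, thresholds)
        print(severity, [round(p, 3) for p in probs])
```

With a single threshold of zero and no rater effect, the top‐category probability reduces to the dichotomous Rasch probability, which is one way to see the MFRM as an extension of the simpler model.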

17.
Students' evaluation of teaching skills has been an important yet controversial tool in the improvement of teaching quality during the last few decades. When searching for an apt student questionnaire to measure instructional skills, it appeared that most existing questionnaires the authors were able to collect are based on a single‐item type of evaluation. Additionally, most of these instruments lack a theoretical foundation and hardly any instrument was tested with modern tests for reliability and validity. The authors managed to create a 31‐item instrument which comprises 10 Likert scales and is based on both educational theory and empirical data. In this article, they present the different steps in constructing the instrument and discuss its reliability and validity. The results of this study underline the value of the use of a scaling technique in students' evaluation of teacher performance.

18.
The purpose of the present study was to investigate the relationship between personal variables of physical education students and their attitudes towards participation of children with disabilities and self‐efficacy (SE) in teaching students with disabilities in regular classes. A total of 153 PE majors (95 females and 58 males) participated in the study. A 15‐item attitude instrument and a 15‐item SE instrument concerning dilemmas during educational tasks were administered as a part of the didactic assignments. Factor analysis revealed one challenge and two threat factors in the attitude instrument. The statistical analysis revealed significant effects on attitudes of gender (females higher than males) and years in college (advanced students higher than novices). Significant effects on SE were found in the coursework, previous experience and years in college variables. SE was inversely related to both threat factors of the attitude instrument (r = −0.42 and −0.43, respectively).

19.
This article reports on our efforts to determine engineering students' competence in mathematics. Our research is embedded in a larger project, KoM@ING–Modeling and developing competence: Integrated IRT based and qualitative studies with a focus on mathematics and its usage in engineering studies, within the program Modeling and Measuring Competencies in Higher Education (KoKoHS). KoKoHS is the umbrella organization for several research projects addressing the modeling and measuring of competences at the college level. KoM@ING aims to model the role of engineering students' mathematical competences for their studies from both a quantitative and a qualitative perspective.

Here, we report the development of a large-scale instrument assessing engineering freshmen's competence in mathematics by applying Rasch analysis to determine measures for item difficulties and student abilities. Several analyses were performed to provide insights into the measures' reliability and validity. In particular, to examine cognitive validity, we scrutinized students' think-aloud protocols when solving the items to investigate their problem solving abilities as a proxy for item difficulty. Overall, we found initial evidence that our instrument is suitable for assessing engineering freshmen's competence in mathematics. This instrument may be helpful for conducting further research and for informing those concerned with college organization and policy.

20.
Recent national reports have stressed the importance of teacher knowledge in teaching reading. However, in the past, teachers' knowledge of language and literacy constructs has typically been assessed with instruments that are not fully tested for validity. In the present study, an instrument was developed, and its reliability, item difficulty, and item discrimination were computed and examined to identify model fit by applying exploratory factor analysis. Such analyses showed that the instrument demonstrated adequate estimates of reliability in assessing teachers' knowledge of language constructs. The implications for professional development of in-service teachers as well as preservice teacher education are also discussed.
