Similar literature (20 results)
1.
This paper describes a four-step approach to constructing diagnostic test profiles that provide precise but practical information on students' instructional needs. The approach is based on the specification and analysis of a domain and uses generalizability theory to determine which skills within the domain need to be assessed to diagnose gaps in students' skills and to estimate score profiles. A 64-item test of pronoun use was constructed to represent 32 categories of usage defined by different combinations of five factors in the domain. Generalizability analyses were conducted to determine the optimal number of categories to be included in students' profiles and the number of items needed for each category, and to produce univariate and multivariate estimates of students' universe scores. Multivariate profiles of universe scores were the most accurate and differed substantially from observed score and univariate universe score profiles.

2.
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because simulations can represent best-case scenarios for methodology, it is important to verify results with a real data application. This study administered a domain of elementary algebra (EA) items created from operational test forms. An IRT-based group-level domain score was estimated from responses to a subset of the items taken (the EA items from a single operational form) and compared to the actual observed domain score. Domain item parameters were calibrated using item responses both from the special study and from national operational administrations of the items. The accuracy of the domain score estimates was evaluated within schools and across school sizes for each set of parameters. The IRT-based domain score estimates typically were closer to the actual domain score than observed performance on the EA items from the single form. Previously simulated findings for the IRT-based domain score estimation procedure were supported by the results of the real data application.

3.
Two qualitatively different information-processing algorithms for solution of Raven's Progressive Matrices items have been identified. Whereas the Gestalt algorithm involves spatial operations upon the test stimuli, the Analytic algorithm employs logical operations upon features abstracted from the displays. In this study, training groups were established varying both in the Strength (Weak or Strong) and Type (Gestalt or Analytic) of training at three grade levels. Two sets of post-test measures were given. Ambiguous items were constructed such that more than one correct answer was possible, some being the result of the Gestalt algorithm and others of the Analytic algorithm. Subjects' performances on the Ambiguous items indicated that strong Analytic training had been particularly effective and was specific to Analytic answer options. The second post-test measure was Set I of the Advanced Progressive Matrices. Performance on these Test items indicated that the effects of strategy training had been maintained, and were due to the facilitation of Analytic item performance by Analytic training. The effects of Strength and Type of training were consistent across Grades. These results support Hunt's analysis of Raven's Progressive Matrices items, and demonstrate that strategy training based upon a precise information processing task analysis can be effective in improving Progressive Matrices performance. The implications of these results for intellectual assessment are discussed.

4.
Lawrence’s Self‐Esteem Questionnaire (LAWSEQ) was administered to 120 Year 1 pupils in six schools in Belfast, Northern Ireland. A principal components analysis indicated that the scale items were unidimensional and that the reliability of the scores, as estimated by Cronbach’s alpha, was satisfactory (α = .73). There were no differences between boys and girls on either total scores or the individual items comprising the LAWSEQ. A follow‐up study, involving 71 of the children in Year 3, confirmed these findings but the stability of the scores between the two occasions (as indicated by Pearson’s r) was extremely low.
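
For readers unfamiliar with the reliability statistic named in this abstract, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from a respondents-by-items score matrix. The function name and the toy data are illustrative assumptions; nothing here reproduces the LAWSEQ data or the study's analysis.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses the standard formula
        alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(scores[0])
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 respondents answering 3 items.
toy = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
alpha = cronbach_alpha(toy)
```

Sample variances are used for both the items and the total score; using population variances throughout gives the same ratio, so the choice does not affect the result.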

5.
A simulation study was performed to determine whether a group's average percent correct in a content domain could be accurately estimated for groups taking a single test form and not the entire domain of items. Six Item Response Theory based domain score estimation methods were evaluated, under conditions of few items per content area per form taken, small domains, and small group sizes. The methods used item responses to a single form taken to estimate examinee or group ability; domain scores were then computed using the ability estimates and domain item characteristics. The IRT-based domain score estimates typically showed greater accuracy and greater consistency across forms taken than observed performance on the form taken. For the smallest group size and the fewest items taken, the accuracy of most IRT-based estimates was questionable; however, a procedure that operates on an estimated distribution of group ability showed promise under most conditions.
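
The second step described in this abstract (computing a domain score from ability estimates and domain item characteristics) can be sketched as below. The abstract does not say which IRT model or which of the six estimation methods was used, so the three-parameter logistic model, the function names, and all parameter values here are assumptions for illustration only.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under an assumed 3PL model.

    a = discrimination, b = difficulty, c = lower asymptote (guessing).
    The optional scaling constant D = 1.7 is omitted for simplicity.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def group_domain_score(thetas, domain_items):
    """Expected proportion correct over the whole domain, averaged over a group.

    thetas: ability estimates obtained from the single form taken.
    domain_items: (a, b, c) tuples for every item in the domain.
    """
    per_person = [
        sum(p_correct(t, a, b, c) for a, b, c in domain_items) / len(domain_items)
        for t in thetas
    ]
    return sum(per_person) / len(per_person)


# Hypothetical ability estimates and domain item parameters.
thetas = [-0.5, 0.0, 0.8]
domain = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.25)]
estimate = group_domain_score(thetas, domain)
```

Averaging person-level expected scores is only one way to obtain a group estimate; the procedure the abstract singles out works from an estimated distribution of group ability instead, which this sketch does not implement.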

6.
Differences in attitudes toward the profession were determined among samples of teachers varying in chronological age (CA) and between teachers and students enrolled in teacher education programs. The students rated items reflecting “altruistic” and “pragmatic” needs met by teaching in terms of their importance in selecting teaching as a career and in terms of the way they perceived experienced teachers would rate them. The teachers rated the items in terms of their importance in their present teaching career and as they thought current students would rate them. The results indicated the absence of age differences in attitudes among the teacher samples. However, both objective (self) and perceived intergroup differences in attitudes were found between the student and teacher samples. These results provide strong support for Neugarten's hypothesis that differences in the behavior and attitudes of adults reflect changes in social states rather than developmental differences and that the hypothesis can be generalized both to self and perceived intergroup differences in attitudes.

7.
Multiple-choice items are a mainstay of achievement testing. Certifying achievement proficiency with meaningful, precise scores requires adequate coverage of the content domain, and therefore many high-quality items. More 3-option items than 4- or 5-option items can be administered in the same testing time, which improves content coverage without detrimental effects on the psychometric quality of test scores. Researchers have endorsed 3-option items for over 80 years with empirical evidence, the results of which have been synthesized in an effort to unify this endorsement and encourage its adoption.

8.
A three-week experiment was conducted comparing the academic achievement of pupils in five classrooms (N = 108) taught in small cooperative groups against that of pupils from five classes (N = 109) taught in the traditional whole-class approach. Special achievement tests were prepared for each grade level, two through six. These tests were constructed with items requiring responses at low and high levels of cognitive functioning. Pupils in grades two, four, and six from small-group classrooms excelled on high-level items as predicted. Pupils in the fifth grade produced superior answers on questions requiring original contributions. Achievement scores of the two groups did not differ on items measuring low-level cognitive functioning.

9.
ABSTRACT

This study sought to provide a framework for evaluating machine score-ability of items using a new score-ability rating scale, and to determine the extent to which ratings were predictive of observed automated scoring performance. The study listed and described a set of factors that are thought to influence machine score-ability; these factors informed the score-ability rating applied by expert raters. Five Reading items, six Science items, and 10 Math items were examined. Experts in automated scoring served as reviewers, providing independent ratings of score-ability before engine calibration. Following the rating, engines were calibrated and their performances were evaluated using common industry criteria. Three derived criteria from the engine evaluations were computed: the score-ability value in the rating scale based on the empirical results, the number of industry evaluation criteria met by the engine, and the approval status of the engine based on the number of criteria met. The results indicated that the score-ability ratings were moderately correlated with Science score-ability, weakly correlated with Math score-ability, and not correlated with Reading score-ability.

10.
11.
The study examined two approaches for equating subscores: (1) equating subscores using internal common items as the anchor, and (2) equating subscores using equated and scaled total scores as the anchor. Since equated total scores are comparable across the new and old forms, they can be used as an anchor to equate the subscores. Both chained linear and chained equipercentile methods were used. Data from two tests were used to conduct the study, and results showed that when more internal common items were available (i.e., 10–12 items), using common items to equate the subscores was preferable. However, when the number of common items was very small (i.e., five to six items), using total scaled scores to equate the subscores was preferable. For both tests, not equating (i.e., using raw subscores) was not reasonable, as it resulted in a considerable amount of bias.

12.
ABSTRACT

The Student Background survey administered along with achievement tests in studies of the International Association for the Evaluation of Educational Achievement includes scales of student motivation, competence, and attitudes toward mathematics and science. The scales consist of positively and negatively keyed items. The current research examined the factorial structure of the 18-item motivational scales in fourth-grade mathematics in the 2011 Trends in International Mathematics and Science Study (TIMSS). Survey data from six European countries were analyzed. In comparisons of alternative models, the fit was adequate when three correlated factors were specified and negative keying was taken into account as a latent factor, or with correlated uniquenesses among negatively keyed items. Participants' reading achievement scores correlated systematically with negative keying, with coefficients ranging from .254 to .395 in the six samples. Unlike their higher-scoring peers, fourth-graders with lower reading achievement responded differentially to similar items depending on the direction of item keying, in such a way that their motivation scores were biased downward. Implications for the use of reverse keying in surveys for young students are discussed.

13.
This study evaluated the reliability and validity of a performance assessment designed to measure students' thinking and reasoning skills in mathematics. The QUASAR Cognitive Assessment Instrument (QCAI) was administered to over 1,700 sixth and seventh grade students of various ethnic backgrounds in six schools that are participating in the QUASAR project. The consistency of students' responses across tasks and the validity of inferences drawn from the scores on the assessment to the more broadly defined construct domain were examined. The intertask consistency and the dimensionality of the assessment were assessed through the use of polychoric correlations and confirmatory factor analysis, and the generalizability of the derived scores was examined through the use of generalizability theory. The results from the confirmatory factor analysis indicate that a one-factor model fits the data for each of the four QCAI forms. The major findings from the generalizability studies (person x task and person x rater x task) indicate that, for each of the four forms, the person x task variance component accounts for the largest percentage of the total variability and the percentage of variance accounted for by the variance components that include the rater effect is negligible. The generalizability and dependability coefficients for the person x task decision studies (n_t = 9) range from .71 to .84. These results indicate that the use of nine tasks may not be adequate for generalizing to the larger domain of mathematics for individual student level scores. The QUASAR project, however, is interested in assessing mathematics achievement at the program level, not the student level; therefore, these coefficients are not alarmingly low.
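
As a rough illustration of the decision-study coefficients mentioned in this abstract, the sketch below applies the textbook formulas for a single-facet person x task design. The variance components used are hypothetical placeholders, not values from the QCAI study, and the function names are invented for this example.

```python
def generalizability_coefficient(var_p, var_pt_e, n_t):
    """Relative (norm-referenced) coefficient for a person x task design.

    var_p:    person (universe score) variance component
    var_pt_e: person x task interaction confounded with residual error
    n_t:      number of tasks in the decision study
    """
    return var_p / (var_p + var_pt_e / n_t)


def dependability_coefficient(var_p, var_t, var_pt_e, n_t):
    """Absolute (criterion-referenced) coefficient; task variance adds to the error."""
    return var_p / (var_p + (var_t + var_pt_e) / n_t)


# Hypothetical variance components, chosen only to show the effect of adding tasks.
var_p, var_t, var_pt_e = 0.30, 0.10, 0.90
for n_t in (1, 9, 18):
    g = generalizability_coefficient(var_p, var_pt_e, n_t)
    phi = dependability_coefficient(var_p, var_t, var_pt_e, n_t)
    print(f"n_t={n_t:2d}  G={g:.2f}  Phi={phi:.2f}")
```

Because the person x task component is divided by the number of tasks, a large interaction term (as reported in the abstract) means that many tasks are needed before student-level scores generalize well, which is consistent with the authors' caution about nine tasks.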

14.
This study was designed to identify and analyze possible factors that mediate the effect of gender on ninth‐grade Turkish students' misconceptions concerning electric circuits. A Simple Electric Circuit Concept Test (SECCT), including items with both practical and theoretical contexts, and an Interest‐Experience Questionnaire about Electricity (IEQ) were administered to 1,678 ninth‐grade students (764 male, 914 female) after the completion of a unit on electricity to assess students' misconceptions and interests‐experiences about electricity. Results of the concept test indicated that general performances of the students were relatively low and that many students had misconceptions in interpreting electric circuits. When the data were analyzed using MANOVA and follow‐up ANOVAs, a gender difference for males was observed on the dependent variable of total scores on the 10 practical items; however, there was no significant gender difference on the dependent variable of total scores on the six theoretical items. Moreover, when the same data were analyzed using MANCOVA and follow‐up ANCOVAs, controlling students' age and interest‐experience related to electricity, the observed gender difference was mediated on the total scores on the practical items. © 2004 Wiley Periodicals, Inc. J Res Sci Teach 41: 603–616, 2004

15.
Twenty first graders and twenty second graders were examined on skills in segmenting, reading, and spelling 50 words with regular and exceptional spelling patterns. By using the same words for each task, it was possible to assess the interrelationships among these skills on a word by word, child by child basis. A multivariate analysis of variance was conducted on difference scores among segmentation, reading, and spelling. Generally, differences favored segmentation and were maximized when final sounds were deleted and minimized when medial sounds were deleted. In addition, graphical analyses showed a greater probability of correct reading and spelling given correct segmentation than incorrect segmentation. Results were interpreted to support a computational notion of phonology as a prerequisite to reading and spelling, with a more reflective notion explaining the reciprocal relation between reading and segmentation of consonant blends and medial sounds.

16.
In separate studies on academic self-concept, previous research has shown: (1) the distinctiveness of a cognitive and an affective component, (2) the domain specificity of self-concepts, (3) the reciprocal effects of self-concept and achievement, (4) the internal/external frame of reference in self-concept development, (5) the reciprocal effects of the internal/external frame of reference, (6) the big-fish-little-pond effect, and (7) the interrelatedness of self-concepts in similar domains. The present study demonstrates that all of these seven findings are replicable and may be synthesized in a single study with a sample of students in Singapore. Secondary 1 students (7th graders; N = 275) were surveyed with 24 items about their academic self-concepts in physics, English, and math in two components (cognitive and affective), and their respective achievement scores were recorded over two time points. Confirmatory factor analysis found that the cognitive and affective components of academic self-concept were separable. The students’ self-concepts in different curriculum domains were distinct, supporting the domain specificity of self-concepts. The frame of reference and reciprocal effects were both supported, but only for the cognitive component of self-concept. Positive and statistically significant correlations between physics and math suggest that these curriculum domains were interrelated. Results of self-concept studies in schools can encourage and guide the design of interventions that could enhance students’ self-concept for positive sustainable effects on desirable educational outcomes. Attempts to improve learning outcomes should emphasize an enhancement of specific components of academic self-concept in domain-specific and related curriculum domains for optimal effects.

17.
The 2006 Programme for International Student Assessment focussed on students’ scientific competencies, measured their knowledge and provided questionnaires focussed on different aspects of life. One aspect was students’ experience with information and communication technology (ICT). A secondary analysis of variance of the Czech Republic data (N = 5,932 students) was conducted using the science knowledge test score and ICT familiarity items. The science knowledge items explored different thematic areas, such as evolution, mousepox, genetics and acid rain. The main result was that students who were connected in some way with ICT achieved better scores on the science knowledge test in comparison with students who were not. Furthermore, students whose ICT activity was connected with the educational process achieved a higher score in comparison with students whose ICT activity was not connected with the educational process.

18.
Most currently accepted approaches for identifying differentially functioning test items compare performance across groups after first matching examinees on the ability of interest. The typical basis for this matching is the total test score. Previous research indicates that when the test is not approximately unidimensional, matching using the total test score may result in an inflated Type I error rate. This study compares the results of differential item functioning (DIF) analysis with matching based on the total test score, matching based on subtest scores, or multivariate matching using multiple subtest scores. Analyses of both actual and simulated data indicate that for the dimensionally complex test examined in this study, using the total test score as the matching criterion is inappropriate. The results suggest that matching on multiple subtest scores simultaneously may be superior to using either the total test score or individual relevant subtest scores.

19.
This study investigates the relation between students' tendency to self-regulate their level of motivation and other aspects of their self-regulated learning and achievement. Ninth- and tenth-grade students (N = 88) responded to survey items designed to assess five motivational regulation strategies identified in previous research. An exploratory factor analysis of these items revealed distinct, internally consistent scales reflecting the strategies of Self-Consequating, Environmental Control, Performance Self-Talk, Mastery Self-Talk, and Interest Enhancement. Self-report measures of effort, use of six cognitive and metacognitive learning strategies, and teacher-reported grades were also collected. Findings revealed mean level differences in students' reported use of the motivational strategies. In addition, results from a series of multivariate regressions indicated that students' use of motivational regulation strategies could be used to predict their use of learning strategies, effort, and classroom performance. As a whole, findings support the belief that motivational self-regulation should be integrated more completely into current models of volition and self-regulated learning.

20.
We investigated whether instruction on a Table of Specifications (TOS) would influence the quality of classroom test construction. Results should prove informative for educational researchers, teacher educators, and practising teachers interested in evidence-based strategies that may improve assessment-related practices. Fifty-three college undergraduates were randomly assigned to an experimental condition (exposed to the TOS strategy) or a comparison condition (no specific strategy support) and given materials for an instructional unit to use to construct a classroom test. Results of a multivariate analysis of covariance suggested that students exposed to the TOS strategy constructed a test with higher test content evidence scores but not higher response process evidence scores. Furthermore, we found that treatment participants were able to accurately complete the TOS tool and choose items that reflected the subject matter specified in the TOS tool. However, they experienced difficulty selecting items at the cognitive level specified in the TOS tool.
