Similar literature
1.
In this article, performance assessments are cast within a sampling framework. More specifically, a performance assessment is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, we present evidence bearing on the generalizability and convergent validity of performance assessments sampled from a range of measurement facets and measurement methods. Results at both the individual and school level indicate that task-sampling variability is the major source of measurement error. Large numbers of tasks are needed to get a reliable measure of mathematics and science achievement at the elementary level. With respect to convergent validity, results suggest that methods do not converge. Students' performance scores, then, are dependent on both the task and method sampled.

2.
《教育实用测度》2013,26(4):323-342
This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide science performance assessment. Results at both individual and school levels indicate that task-sampling variability was the major source of measurement error in the performance assessment; rater-sampling variability was negligible. Adding more tasks improves the generalizability of the measurement. For the school-level assessment, the variation of performance among students within a school was larger than the variation among schools. Increasing the number of students taking a test within a school thus increases the generalizability of the assessment. Finally, the allocation of students in a matrix-sampling design is compared to a students-crossed-with-tasks design. The former would require fewer tasks per student than the latter to build a generalizable measure of school performance.

3.
This study examined the stability of scores on two types of performance assessments, an observed hands-on investigation and a notebook surrogate. Twenty-nine sixth-grade students in a hands-on inquiry-based science curriculum completed three investigations on two occasions separated by 5 months. Results indicated that: (a) the generalizability across occasions for relative decisions was, on average, moderate for the observed investigations (.52) and the notebooks (.50); (b) the generalizability for absolute decisions was only slightly lower; (c) the major source of measurement error was the person by occasion (residual) interaction; and (d) the procedures students used to carry out the investigations tended to change from one occasion to the other.

4.
This study evaluated the reliability and validity of a performance assessment designed to measure students' thinking and reasoning skills in mathematics. The QUASAR Cognitive Assessment Instrument (QCAI) was administered to over 1,700 sixth and seventh grade students of various ethnic backgrounds in six schools that are participating in the QUASAR project. The consistency of students' responses across tasks and the validity for inferences drawn from the scores on the assessment to the more broadly-defined construct domain were examined. The intertask consistency and the dimensionality of the assessment were assessed through the use of polychoric correlations and confirmatory factor analysis, and the generalizability of the derived scores was examined through the use of generalizability theory. The results from the confirmatory factor analysis indicate that a one-factor model fits the data for each of the four QCAI forms. The major findings from the generalizability studies (person x task and person x rater x task) indicate that, for each of the four forms, the person x task variance component accounts for the largest percentage of the total variability and the percentage of variance accounted for by the variance components that include the rater effect is negligible. The generalizability and dependability coefficients for the person x task decision studies (n_t = 9) range from .71 to .84. These results indicate that the use of nine tasks may not be adequate for generalizing to the larger domain of mathematics for individual student level scores. The QUASAR project, however, is interested in assessing mathematics achievement at the program level, not the student level; therefore, these coefficients are not alarmingly low.
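To illustrate how the number of tasks drives the coefficients in a person x task decision study, here is a minimal sketch using the standard generalizability-theory formulas. The function name and the variance components are hypothetical values chosen for illustration; they are not taken from the QCAI study.

```python
def d_study_coefficients(var_p, var_t, var_pt_e, n_tasks):
    """Return (generalizability, dependability) for a person x task design.

    var_p    : person (universe-score) variance component
    var_t    : task variance component
    var_pt_e : person x task interaction confounded with residual error
    n_tasks  : number of tasks averaged over in the decision study
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    # Relative error: only effects that change the rank order of persons.
    relative_error = var_pt_e / n_tasks
    # Absolute error: also includes the task main effect.
    absolute_error = (var_t + var_pt_e) / n_tasks
    generalizability = var_p / (var_p + relative_error)
    dependability = var_p / (var_p + absolute_error)
    return generalizability, dependability


# Hypothetical variance components (assumed, not reported in the abstract).
VAR_P, VAR_T, VAR_PT_E = 0.30, 0.05, 0.60

for n in (1, 5, 9, 20):
    g, phi = d_study_coefficients(VAR_P, VAR_T, VAR_PT_E, n)
    print(f"n_t = {n:2d}  generalizability = {g:.2f}  dependability = {phi:.2f}")
```

Because both error terms are divided by the number of tasks, a large person x task component can only be offset by sampling more tasks, which is the pattern the abstract reports.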

5.
Eight rhesus monkeys were trained on a counterbalanced series of concurrent, two-choice, discrimination tasks that provided different numbers of correct or incorrect objects as lists of discriminanda. Small, large, or infinite lengths of correct or incorrect object lists were combined in different tasks, and acquisition performances were compared. When tasks had an infinite number of objects in their correct list and a small number (4) in their incorrect list, acquisition entailed significantly less error than was seen when a small number of correct objects was paired with an infinite incorrect list. This pattern of outcomes seemed attributable to novelty preference. However, comparison of error distributions from tasks with infinite list lengths to distributions from analogous tasks with fixed list lengths provided some basis for interpreting the way monkeys integrated information that emerged from the temporally discriminative properties of the tasks. One prospective concern was whether or not these performances represented behaviors like those seen in human cognitive discriminations of frequency of occurrence.

6.
The latent state–trait (LST) theory is an extension of the classical test theory that allows one to decompose a test score into a true trait, a true state residual, and an error component. For practical applications, the variances of these latent variables may be estimated with standard methods of structural equation modeling (SEM). These estimates allow one to decompose the coefficient of reliability into a coefficient of consistency (indicating true effects of the person) plus a coefficient of occasion specificity (indicating true effects of the situation and the person–situation interaction). One disadvantage of this approach is that the standard SEM analysis requires large sample sizes. This article aims to overcome this disadvantage by presenting a simple method that allows one to estimate the LST parameters algebraically from the observed covariance matrix. A Monte Carlo simulation suggests that the proposed method may be superior to the standard SEM analysis in small samples.
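The decomposition described in this abstract can be written out as follows. This is a simplified sketch that assumes unit loadings and mutually uncorrelated components; the symbols (ξ for the trait, ζ_k for the state residual on occasion k, ε_ik for the error of test i on occasion k) are notation chosen here for illustration, not necessarily the article's own.

```latex
\begin{align}
Y_{ik} &= \xi + \zeta_k + \varepsilon_{ik} \\
\operatorname{Var}(Y_{ik}) &= \operatorname{Var}(\xi) + \operatorname{Var}(\zeta_k) + \operatorname{Var}(\varepsilon_{ik}) \\
\operatorname{Con}(Y_{ik}) &= \frac{\operatorname{Var}(\xi)}{\operatorname{Var}(Y_{ik})}, \qquad
\operatorname{Spe}(Y_{ik}) = \frac{\operatorname{Var}(\zeta_k)}{\operatorname{Var}(Y_{ik})} \\
\operatorname{Rel}(Y_{ik}) &= \operatorname{Con}(Y_{ik}) + \operatorname{Spe}(Y_{ik})
\end{align}
```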

7.
Why do students give incorrect answers in PISA? What are the reasons for giving incorrect answers? Do all incorrect answers reflect only a lack of competence, or might even a competent child make a mistake? The aim of this article is to contribute to a better understanding of these issues. In the current investigation, we selected six students who responded incorrectly to one PISA question in mathematics or science when they solved it individually. Then, we analyzed their understanding of the PISA task and their reasoning about it through dialogical problem solving in triads to identify why they gave an incorrect answer. Moreover, we tried to determine how the shared peer interaction might change the understanding and reasoning of the child and enable her/him to solve the task. The results of this study illustrate the differences between incorrect answers reflecting a lack of competence and incorrect answers that appear for other reasons. Based on the dialogical problem solving approach, we analyzed these two types of incorrect answers and the reasoning trajectories behind them.

8.
Children aged 4 to 10 years old were asked to draw a person standing absolutely still and a person walking very fast so that someone not present would know from the pictures alone what had been depicted. Even at four some children were able to convey the difference to a viewer and there was increasing success with age. The number of differentiating cues increased with age and there was an age‐related trend in the order in which specific cues appeared in the drawings. The ability of the children to respond flexibly to the task gives no support to notions of rigid mental representations determining what young children can draw. It is argued that contrast tasks are a useful tool for investigating problem solving skills in the domain of drawing and could be used to extend children's skill by providing an occasion for explicit dialogue about how representational information is conveyed to a viewer.

9.
The primary purpose of this study was to estimate the amount of variability in the proportions of students in a school district scoring within each of three achievement levels that could be attributed to factors other than random sampling error. The approach taken is based on a general conceptual framework that collectively incorporates five sources of variability: instructional intervention, random sampling error, measurement error, equating error, and systematic error. Statewide school-level assessment data for reading and mathematics in grades four and eight from four consecutive years were used to examine annual grade-group change. The intent was to assess the impact of random sampling error in grade-group change estimates when either single-year proportions or 2-year average proportions are used to report school improvement with achievement levels. Observed variability in change was compared with theoretically-derived estimates of change due to random sampling error to determine the relative influence of sampling error and the aggregate of the other four sources of variability. Results indicate that the error variance of estimates of change at the school level is large enough to interfere with interpretations of annual change estimates. Recommendations are offered for establishing annual improvement goals and for reporting results with achievement levels, all in the context of adequate yearly progress (AYP), while taking error estimates into account.

10.
With a focus on the interaction between computer technology and assessment, we first review the typical functions served by technology in the support of various assessment purposes. These include efficiencies in person and item sampling and in administration, analysis, and reporting. Our major interest is the extent to which technology can provide unique opportunities to understand performance. Two examples are described: a tool-based knowledge representation approach to assess content understanding and a team problem-solving task involving negotiation. The first example, using HyperCard as well as paper-and-pencil variations, has been tested in science and history fields. Its continuing challenge is to determine a strategy for creating and validating scoring criteria. The second example, involving a workforce readiness task for secondary school, has used expert-novice comparisons to infer performance standards. These examples serve as the context for the exploration of validity, equity, and utility.

11.
In this article two studies on the use of diagrams in computer-supported collaborative learning are compared. Focus is on the way argumentative diagrams can be used during collaborative learning tasks, more specifically how diagrams support argumentative interaction between students when they discuss ill-defined topics. The main goal is to discover how diagram construction before discussion, and diagram construction during discussion, influence the way students explore the space of debate during discussion. Twenty pairs of 16/17-year-old students were randomly selected from 126 pairs. Ten pairs worked with a diagram before discussion and ten during discussion. The research showed that students use diagrams in very different ways, ranging from a means for talking to just a notebook. Our expectation that using a diagram during discussion leads to more depth in discussion than using one before discussion was not confirmed. Possible explanations for this finding are the structure of the task and the way students interpreted the goal of the task.

12.
《教育实用测度》2013,26(3):201-214
The New Standards Project conducted a pilot test of a series of performance-based assessment tasks in mathematics and English language arts at Grades 4 and 8 in the spring of 1993. This article reports the results of a series of generalizability analyses conducted for a subset of the 1993 pilot study data in mathematics. Generalizability analyses for completely crossed designs of Raters x Tasks x Pupils were conducted for a total of nine collections of mathematics tasks. The results of those analyses were used to estimate standard errors of measurement for absolute decision studies using various combinations of number of raters and number of tasks. Consistent with results of previous analyses of performance-based assessment tasks, sampling variability due to tasks was found to be substantially larger than that due to raters. Implications for assessment designs are discussed.

13.
INTRODUCTION: With the prevalence of distributed computing and parallel programming languages (Barry and Allen, 1998), performance evaluation of parallel execution systems becomes important. In this work we derive bounds and an approximation of the mean response time of a particular type of parallel program: a program with Fork-Join tasks executed on a multiprocessor under a first-come, first-served (FCFS) policy. This kind of program is common in large-scale simulation and numerical …

14.
Methodological problems have limited the usefulness of findings from experiments into learning by discovery. By using programmed instruction materials, a within-class design, and other controls, an attempt was made to remove confounding. Two tasks were used: concept learning and principle learning. For each task, a separate 2x2x2 factorial design containing sixteen Ss in each cell was used. Independent variables were instructional method (egrule and ruleg), school grade (9 and 5), and intelligence (high and average). A set of eight different measures, involving retention, transfer, and ease of relearning, was used for each task. It was found that the egrule and ruleg methods did not differ significantly, and that interaction between instructional method and the other variables was low.

15.
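This paper discusses the least-squares circle method for evaluating roundness error measured with a roundness instrument. By orthogonalizing the sampled data, a practical computer-aided data-processing method is developed, and the corresponding C source program is given.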
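A least-squares circle evaluation of this kind can be sketched as below. This is an illustrative Python version, not the paper's C program (which is not reproduced in the abstract). It assumes the radii are sampled at equally spaced angles over one full revolution and that the eccentricity is small relative to the radius, so the sine and cosine terms are orthogonal and the fit reduces to simple sums. The function name is hypothetical.

```python
import math


def least_squares_circle_roundness(radii):
    """Fit a least-squares reference circle to equally spaced radial samples.

    Returns (r0, a, b, roundness): mean radius r0, centre offsets a and b,
    and the roundness error as the peak-to-valley deviation from the fit.
    """
    n = len(radii)
    if n < 3:
        raise ValueError("at least 3 samples are needed")
    angles = [2.0 * math.pi * i / n for i in range(n)]
    # With equally spaced angles, 1, cos and sin are mutually orthogonal,
    # so each least-squares parameter is an independent sum.
    r0 = sum(radii) / n
    a = 2.0 / n * sum(r * math.cos(t) for r, t in zip(radii, angles))
    b = 2.0 / n * sum(r * math.sin(t) for r, t in zip(radii, angles))
    # Radial deviation of each sample from the fitted reference circle.
    deviations = [
        r - (r0 + a * math.cos(t) + b * math.sin(t))
        for r, t in zip(radii, angles)
    ]
    return r0, a, b, max(deviations) - min(deviations)


# Synthetic profile (assumed data): offset circle plus a three-lobe form error.
N = 360
profile = [
    10.0
    + 0.1 * math.cos(2.0 * math.pi * i / N)
    + 0.2 * math.sin(2.0 * math.pi * i / N)
    + 0.01 * math.cos(3.0 * 2.0 * math.pi * i / N)
    for i in range(N)
]
r0, a, b, roundness = least_squares_circle_roundness(profile)
print(f"r0 = {r0:.4f}, a = {a:.4f}, b = {b:.4f}, roundness = {roundness:.4f}")
```

The centre offsets absorb the eccentricity of the part on the spindle, so only the remaining form error contributes to the reported roundness value.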

16.
Following Cronbach (1970) and others, it is useful to decompose test score variation into common factor, time‐specific, item‐specific, and residual components. In the traditional approach to factor analysis, only two sources of variance can be estimated: common factor variance and a uniqueness term that confounds specific sources of variation and residual error. When the same items are measured on different occasions, however, it is possible to separate specific variance and residual error. Two approaches, the first‐order approach described by Raffalovich and Bohrnstedt (1987) and a second‐order approach based on Jöreskog and Sörbom (1989; Jöreskog, 1974) are considered initially. The two approaches, although based on different rationales, both suffer a similar weakness in that two of the four sources of variance are confounded. In the Raffalovich and Bohrnstedt approach, time‐specific variance is confounded with common factor variance that generalizes across items and time. In the second‐order approach based on Jöreskog and Sörbom, time‐specific variance is confounded with residual error. Here we demonstrate that by combining features from both approaches we can eliminate these weaknesses and estimate all four of Cronbach's sources of variance, and that this combined approach is easily generalized to a wide variety of applications.

17.
Long-term memory retrieval efficiency was investigated as a potential underlying source of individual and developmental differences in cognitive functioning. Fourth-grade, eighth-grade, and college-aged subjects participated in a task using the Posner letter-matching paradigm. Letter pairs were presented simultaneously under physical-match and name-match instruction conditions. Reaction times were used to estimate parameters of long-term memory retrieval efficiency, basic encoding, decision, and response time, and name and physical output interference. Psychometric tests of verbal and spatial ability were included to assess convergent and discriminant validity of hypothesized relationships between aptitude test performance and basic cognitive processes. Developmental differences were observed in most but not all of the processing variables. Individual difference analyses indicated that less confounded estimates of processing parameters were not systematically related to verbal ability at any age level. Basic encoding and response speed was the most consistent correlate of spatial ability. The results suggest difficulties in previous interpretations of NIPI-verbal ability relationships. The study of cognitive processes in interaction and embedded in meaningful tasks is discussed.

18.
A series of item analyses of the CAK-C was conducted for a sample of 155 educable mentally retarded children. The probability of a correct response was found to differ from task to task, and there was evidence that the order of difficulty of the tasks for this sample resembled that for nonretarded children. The probabilities of the two incorrect responses were generally not equal, and the choice of one or the other incorrect response showed some relation to CA, MA, and IQ, particularly the last two variables.

19.
Patrick  J.  Gregov  A.  Halliday  P. 《Instructional Science》2000,28(1):51-79
Hierarchical Task Analysis (HTA) is used particularly in the context of instructional development. This paper involves two exploratory studies concerning the difficulties of those learning to perform HTA (Study 1) and how these might be overcome (Study 2). In Study 1 seventeen students were provided with declarative training in the major features of HTA and were then asked to analyse the task of making a cup of tea (task 1) or of painting a door (task 2). HTAs were analysed in terms of five HTA criteria (hierarchical representation, logical decomposition rule, logical equivalence, specification of plans and the P × C rule) and four other error categories (task boundaries incorrect, cognitive goals omitted, operations described as activities rather than goals, and lack of versatility of the analysis in terms of encompassing task variation). Errors occurred with respect to all HTA criteria and other error categories suggesting that carrying out HTA is itself a complex cognitive task. This together with an analysis of questionnaire responses concerning self-reported difficulties and strategies suggested that the tendency to use an action-oriented representation of the task being analysed might be one cause of poor performance. Study 2 investigated the effectiveness of three instructional conditions at improving analysts' performance at HTA: procedure training which specified eight main goals in carrying out HTA, criteria training which involved understanding and practice at using or recognising the five HTA criteria and types of error, and combined criteria/procedure training. Performance at HTA improved in both conditions that involved criteria training.

20.
Saito  H.  Masuda  H.  Kawakami  M. 《Reading and writing》1998,10(3-5):323-357
Four experiments are reported here to address the question of whether figurative and phonological processing based on sub-word components (radicals) interact in the recognition of Japanese kanji characters. A delayed matching task was used in which two briefly exposed source characters (e.g., ), each made up of two radicals, were followed by a probe character (e.g., ) which in critical conditions was different from the source characters. The task of the subject was to decide whether the probe was one of the two source characters. When a probe was figuratively similar to the source display, the homophonic relatedness between source and probe characters elicited more false responses to the probe. However, no homophony effect was found when the probe was dissimilar to the source display. Further, the false alarm rate in the homophone condition with figurative similarity was shown to be sensitive to the proportion of homophonous trials in negative sets. The results suggest that phonological information of both the whole character and its components was automatically activated despite experimental tasks in which subjects were given little incentive to execute phonetic processing. It is concluded that the interaction of figurative and phonological processing is due to mutual activation of the whole character and its radical(s) in the process of word identification in kanji. The results are considered within an interactive-activation framework with fore- and background activation device in multilevels.
