Similar articles
 20 similar articles found (search time: 31 ms)
1.
Measurement bias can be detected using structural equation modeling (SEM), by testing measurement invariance with multigroup factor analysis (Jöreskog, 1971; Meredith, 1993; Sörbom, 1974), MIMIC modeling (Muthén, 1989), or restricted factor analysis (Oort, 1992, 1998). In educational research, data often have a nested, multilevel structure, for example when data are collected from children in classrooms. Multilevel structures might complicate measurement bias research. In 2-level data, the potentially “biasing trait” or “violator” can be a Level 1 variable (e.g., pupil sex), or a Level 2 variable (e.g., teacher sex). One can also test measurement invariance with respect to the clustering variable (e.g., classroom). This article provides a stepwise approach for the detection of measurement bias with respect to these 3 types of violators. This approach works from Level 1 upward, so the final model accounts for all bias and substantive findings at both levels. The 5 proposed steps are illustrated with data on teacher–child relationships.

2.
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g., English) and checking to see that both language versions of the item mean the same. The evidence collected through the item development stage suggested that the simultaneous test development method allowed the influence and integration of information from item writers representing different language and cultural groups to affect test development directly. Certified English/French translators and interpreters and the French Immersion students confirmed that the test items in French and English had comparable meanings. The pairs of test forms had equal standard errors of measurement. The source of differential item functioning was not attributable to the adaptation process used to produce the two language forms, but to the lack of French language proficiency as well as other unknown sources. Lastly, the simultaneous approach used in the present study was somewhat more efficient than the forward translation procedure currently in use.

3.
People are often better at comparing fractions when the larger fraction has the larger rather than the smaller natural number components. However, there is conflicting evidence about whether this “natural number bias” occurs for complex fraction comparisons (e.g., 23/52 vs. 11/19). It is also unclear whether using benchmarks such as 1/2 or 1/4 enhances performance and reduces the bias (e.g., 11/19 > 1/2 and 23/52 < 1/2, hence 11/19 > 23/52). We asked 107 adults to solve complex fraction comparisons that did or did not afford using benchmarks, and we assessed response time and accuracy. We found a reverse bias (i.e., smaller components—larger fraction) that was greater among participants with lower mathematics experience. Fractions' proximity to 0 or 1 facilitated performance and decreased bias; effects of other benchmarks were nonsignificant. These results challenge the generality of the natural number bias in fraction comparison and highlight its variability.
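
The benchmark reasoning in this abstract (11/19 > 1/2 and 23/52 < 1/2, hence 11/19 > 23/52) can be illustrated with a short sketch. It is only an illustration and not the study's materials or analysis: the function names `benchmark_between` and `components_congruent` and the default benchmark set (1/4, 1/2) are assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical default benchmarks, taken from the two named in the abstract.
DEFAULT_BENCHMARKS = (Fraction(1, 4), Fraction(1, 2))


def benchmark_between(a, b, benchmarks=DEFAULT_BENCHMARKS):
    """Return the first benchmark lying strictly between a and b, else None.

    If such a benchmark exists, the comparison can be settled by checking
    each fraction against the benchmark instead of comparing them directly.
    """
    low, high = sorted((a, b))
    for mark in benchmarks:
        if low < mark < high:
            return mark
    return None


def components_congruent(a, b):
    """True if the larger fraction also has the larger numerator and denominator.

    Note: Fraction stores values in lowest terms, so pass fractions that are
    already reduced if the written components matter.
    """
    small, large = sorted((a, b))
    return (large.numerator > small.numerator
            and large.denominator > small.denominator)


x, y = Fraction(11, 19), Fraction(23, 52)
print(benchmark_between(x, y))     # benchmark separating the pair, if any
print(components_congruent(x, y))  # whether the item is bias-congruent
```

For the pair from the abstract, the sketch finds 1/2 as a separating benchmark and classifies the item as incongruent, because the larger fraction (11/19) has the smaller components.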

4.
Two experiments investigated response tendencies of preschoolers toward yes–no questions about actions. Two hundred 2‐ to 5‐year‐old children were asked questions concerning actions commonly associated with particular objects (e.g., drinking from a cup) and actions not commonly associated with particular objects (e.g., kicking a toothbrush). The impact of delay and comprehension of questions were also investigated. Results revealed a consistent developmental transition: Younger children tended to display a yes bias whereas older children did not display a bias unless they faced incomprehensible questions, in which case they displayed a nay‐saying bias. Delay shifted children's responses in such a way that “no” answers were given more often. These findings hold important implications regarding the use of yes–no questions with children.

5.
Several studies have indicated that bifactor models fit a broad range of psychometric data better than alternative multidimensional models such as second-order models (e.g., Carnivez, 2016; Gignac, 2016; Rodriguez, Reise, & Haviland, 2016). Murray and Johnson (2013) and Gignac (2016) argued that this phenomenon is partially due to unmodeled complexities (e.g., unmodeled cross-factor loadings) that induce a bias in standard statistical measures that favors bifactor models over second-order models. We extend the Murray and Johnson simulation studies to show how the ability to distinguish second-order and bifactor models diminishes as the amount of unmodeled complexity increases. By using theorems about rank constraints on the covariance matrix to find submodels of measurement models that have less unmodeled complexity, we are able to reduce the statistical bias in favor of bifactor models; this allows researchers to reliably distinguish between bifactor and second-order models.

6.
Recent research suggests that preschool children approach the task of word learning equipped with implicit biases that lead them to prefer some possible meanings over others. The noun-category bias proposes that children favor category relations when interpreting the meaning of novel nouns. In the series of experiments reported here, we develop a stringent test of the noun-category bias and reveal that it is present in children as young as 2 years of age. In each experiment, children participated in a 5-item match-to-sample task. Children were presented with a target item (e.g., a cow) and 4 choices, 2 of which belonged to the same superordinate category as the target (e.g., a fox and a zebra) and 2 of which were thematically related to the target (e.g., milk and a barn). In Experiment 1 we demonstrate that novel nouns prompt preschool children to attend to superordinate-level category relations, even in the presence of multiple thematic alternatives. In Experiment 2, we ascertain that the bias is specific to nouns; novel adjectives do not highlight superordinate category relations. In Experiment 3, we demonstrate the noun-category bias in 2-year-olds. The nature and utility of the noun-category bias are discussed.

7.
In automated test assembly (ATA), the methodology of mixed‐integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different cases are discussed: (i) computerized test forms in which the items are presented on a screen one at a time and only their optimal order has to be determined; (ii) paper forms in which the items need to be ordered and paginated and the typical goal is to minimize paper use; and (iii) published test forms with the same requirements but a more sophisticated layout (e.g., double‐column print). For each case, a menu of possible test‐form specifications is identified, and it is shown how they can be modeled as linear constraints using 0–1 decision variables. The methodology is demonstrated using two empirical examples.

8.
This study examined the effects of conversational language (e.g., asking questions, inviting replies, acknowledgments, referencing others by name, closing signatures, ‘I agree, but’, greetings, etc.) on the frequency and types of responses posted in reply to given types of messages (e.g., argument, evidence, critique, explanation), and how the resulting response patterns support and inhibit collaborative argumentation in asynchronous online discussions. Using event sequence analysis to analyze message-response exchanges in eight online group debates, this study found that (a) arguments elicited 41% more challenges when presented with more conversational language (effect size .32), (b) challenges with more conversational language elicited three to eight times more explanations (effect size .12 to .31), and (c) the amount of supporting evidence elicited by challenges did not differ significantly between challenges that used more versus less conversational language. Overall, these and other findings from exploratory post-hoc tests show that conversational language can help to produce patterns of interaction that foster high levels of critical discourse, and that some forms of conversational language are more effective in eliciting responses than others.

9.
A case is presented against a data collection system that is intended to provide increased accountability of teachers, professors and the profession. The utilization of some current data collection systems may in fact jeopardize the integrity of the profession's mission and goals. The cause of concern is the use of the easiest form of data collection (e.g., fitness, skills, math, and science scores) rather than evidence viewing the student as a complex organism that would require a more appropriate and complex assessment system (i.e., “life skills” activity participation and social skills rather than a fitness test). I also note that a focus on increased accountability and simplified data collection provides the impetus that research in higher education needs to consider a paradigm shift to be more collaborative and holistic. In presenting these issues, I note that the vision of Delphine Hanna was similar, specifically more collaborative, more holistic, and more humanistic in making scholarly and professional decisions.

10.
Careless responding is a bias in survey responses that disregards the actual item content, constituting a threat to the factor structure, reliability, and validity of psychological measurements. Different approaches have been proposed to detect aberrant responses, such as probing questions that directly assess test-taking behavior (e.g., bogus items), auxiliary or paradata (e.g., response times), or data-driven statistical techniques (e.g., Mahalanobis distance). In the present study, gradient boosted trees, a state-of-the-art machine learning technique, are introduced to identify careless respondents. The performance of the approach was compared with established techniques previously described in the literature (e.g., statistical outlier methods, consistency analyses, and response pattern functions) using simulated data and empirical data from a web-based study, in which diligent versus careless response behavior was experimentally induced. In the simulation study, gradient boosting machines outperformed traditional detection mechanisms in flagging aberrant responses. However, this advantage did not transfer to the empirical study. In terms of precision, the results of both the traditional and the novel detection mechanisms were unsatisfactory, although the latter incorporated response times as additional information. The comparison between the results of the simulation and the online study showed that responses in real-world settings seem to be much more erratic than can be expected from the simulation studies. We critically discuss the generalizability of currently available detection methods and provide an outlook on future research on the detection of aberrant response patterns in survey research.
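
As a small illustration of what a response-pattern index for careless responding can look like, the sketch below computes a long-string index, i.e., the longest run of identical consecutive answers in one respondent's record. This is an assumption chosen for illustration: the abstract does not say which response pattern functions were used, and this is not the gradient boosting approach the study introduces. The function name `longstring` is hypothetical.

```python
from itertools import groupby


def longstring(responses):
    """Length of the longest run of identical consecutive responses.

    Unusually long runs (e.g., the same option chosen many times in a row)
    are one possible, imperfect hint of content-independent responding.
    Returns 0 for an empty response vector.
    """
    return max((sum(1 for _ in run) for _, run in groupby(responses)), default=0)


# Illustrative, made-up response vector on a 5-point scale.
answers = [3, 3, 3, 3, 2, 4, 4]
print(longstring(answers))
```

Any cutoff for flagging a respondent would have to be justified for the specific questionnaire; none is implied here.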

11.
Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource-intensive to score. In this study we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed-response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we have been able to develop computer scoring models that can achieve substantial to almost perfect agreement between human-assigned and computer-predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher-level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or providing sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. However, this analytical approach was found to be differentially biased when scoring responses from English learner (EL) students as compared to responses from non-EL students on some items. Differences in the severity between human and computer scores for EL between these approaches are explored, and potential sources of bias in automated scoring are discussed.

12.
Standard errors of measurement of scale scores by score level (conditional standard errors of measurement) can be valuable to users of test results. In addition, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) recommends that conditional standard errors be reported by test developers. Although a variety of procedures are available for estimating conditional standard errors of measurement for raw scores, few procedures exist for estimating conditional standard errors of measurement for scale scores from a single test administration. In this article, a procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores. This method is illustrated using a strong true score model. Practical applications of this methodology are given. These applications include a procedure for constructing score scales that equalize standard errors of measurement along the score scale. Also included are examples of the effects of various nonlinear raw-to-scale score transformations on scale score reliability and conditional standard errors of measurement. These illustrations examine the effects on scale score reliability and conditional standard errors of measurement of (a) the different types of raw-to-scale score transformations (e.g., normalizing scores), (b) the number of scale score points used, and (c) the transformation used to equate alternate forms of a test. All the illustrations use data from the ACT Assessment testing program.

13.
The teaching brain is a new concept that mirrors the complex, dynamic, and context‐dependent nature of the learning brain. In this article, I use the structure of the human nervous system and its sensing, processing, and responding components as a framework for a re‐conceptualized teaching system. This teaching system is capable of responses on an instinctual level (e.g., spinal cord teaching) as well as higher order student‐centered teaching and even more complex teaching brain teaching. At the most complex level the teacher and student engage in a synchronistic teaching flow that achieves the optimal teaching and learning experience.

14.
Most large-scale secondary data sets used in higher education research (e.g., NPSAS or BPS) are constructed using complex survey sample designs where the population of interest is stratified on a number of dimensions and oversampled within certain of these strata. Moreover, these complex sample designs often cluster lower level units (e.g., students) within higher level units (e.g., colleges) to achieve efficiencies in the sampling process. Ignoring oversampling (unequal probability of selection) in complex survey designs presents problems when trying to make inferences—data from these designs are, in their raw form, admittedly nonrepresentative of the population to which they are designed to generalize. Ignoring the clustering of observations in these sampling designs presents a second set of problems when making inferences about variability in the population and testing hypotheses and usually leads to an increased likelihood of committing Type I errors (declaring something as an effect when in fact it is not). This article presents an extended example using complex sample survey data to demonstrate how researchers can address problems associated with oversampling and clustering of observations in these designs.
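
The oversampling problem described in this abstract can be made concrete with a minimal sketch of an inverse-probability-weighted mean. It assumes each observation's selection probability is known; the function name `design_weighted_mean` and the numbers are made up for illustration and are not taken from NPSAS, BPS, or the article. The sketch addresses only unequal selection probabilities, not the clustering issue, which affects standard errors and needs design-based variance estimation.

```python
def design_weighted_mean(values, selection_probs):
    """Mean of `values` weighted by the inverse of each selection probability.

    Units from oversampled strata have high selection probabilities and thus
    receive small weights, which counteracts their overrepresentation.
    """
    if len(values) != len(selection_probs):
        raise ValueError("values and selection_probs must have equal length")
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up example: the first unit comes from an oversampled stratum.
scores = [10.0, 20.0]
probs = [0.5, 0.25]
print(sum(scores) / len(scores))            # unweighted mean
print(design_weighted_mean(scores, probs))  # design-weighted mean
```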

15.
Two studies examined whether young children use their knowledge of the spelling of base words to spell inflected and derived forms. In Study 1, 5- to 9-year-olds wrote the correct letter (s or z) more often to represent the medial /z/ sound of words derived from base forms (e.g., noisy, from noise) than to represent the medial /z/ sound of one-morpheme control words (e.g., busy). In Study 2, 7- to 9-year-olds preserved the spelling of /z/ in pseudoword base forms when writing ostensibly related inflected and derived forms (e.g., kaise-kaisy). In both studies, the children’s tendency to preserve the spelling of /z/ between base and inflected/derived words was related to their performance on analogy tasks of morphological awareness. These findings add to the growing body of evidence that children recognise and represent links of meaning between words from relatively early in their writing experience, and that morphological awareness facilitates the spelling of morphologically complex words.

16.
In large-scale assessment programs such as NAEP, TIMSS and PISA, students' achievement data sets provided for secondary analysts contain so-called plausible values. Plausible values are multiple imputations of the unobservable latent achievement for each student. This article shows how plausible values are used to: (1) address concerns with bias in the estimation of certain population parameters when point estimates of latent achievement are used to estimate those population parameters; (2) allow secondary data analysts to employ standard techniques and tools (e.g., SPSS, SAS procedures) to analyse achievement data that contain substantial measurement error components; and (3) facilitate the computation of standard errors of estimates when the sample design is complex. The advantages of plausible values are illustrated by comparing the use of maximum likelihood estimates and plausible values (PV) for estimating a range of population statistics.
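
Because plausible values are multiple imputations, a statistic is typically computed once per plausible value and the results are then pooled. The sketch below shows the standard multiple-imputation combining rules (often called Rubin's rules) as one way this pooling can be done; the abstract does not spell out its formulas, so treat this as an assumption. The function name `combine_plausible_values` and the input numbers are hypothetical, and the per-value sampling variances are assumed to have been obtained elsewhere (e.g., by a replication method suited to the sample design).

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool M per-plausible-value estimates with the usual combining rules.

    point estimate : average of the M estimates
    total variance : average sampling variance
                     + (1 + 1/M) * variance between the M estimates
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two estimates and matching variances")
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return point, total


# Made-up example with five plausible values.
est = [500, 502, 498, 501, 499]
var = [4.0, 4.0, 4.0, 4.0, 4.0]
point, total = combine_plausible_values(est, var)
print(point, total ** 0.5)  # pooled estimate and its standard error
```

Averaging the plausible values per student first and analysing that single score would understate the uncertainty, which is why the pooling happens at the level of the statistic.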

17.
This special issue of Learning and Instruction examines the role of emotions in academic learning, with a special focus on emotions in computer-supported academic learning (or e-learning). Three central research challenges concerning emotion in e-learning are: identification (e.g., what are the key emotions in e-learning?), measurement (e.g., how can we tell how strongly a learner is experiencing each key emotion during e-learning?), and explanation (e.g., what are the causes and consequences of the learner's emotional state during learning?). A useful goal of research on emotions in e-learning is to test an affective-cognitive model of e-learning with links among an e-learning episode, the learner's emotional reaction during learning, the learner's cognitive processing during learning, and the learning outcome.

18.
We discuss generalizability (G) theory and the fair and valid assessment of linguistic minorities, especially emergent bilinguals. G theory allows examination of the relationship between score variation and language variation (e.g., variation of proficiency across languages, language modes, and social contexts). Studies examining score variation across items administered in emergent bilinguals' first and second languages show that the interaction of student and the facets (sources of measurement error) item and language is an important source of score variation. Each item poses a unique set of linguistic challenges in each language, and each emergent bilingual individual has a unique set of strengths and weaknesses in each language. Based on these findings, G theory can inform the process of test construction in large-scale testing programmes and the development of testing models that ensure more valid and fair interpretations of test scores for linguistic minorities.

19.
A two-part questionnaire was administered to 143 Head Start personnel in order to determine how personal characteristics of the Head Start workers and the characteristics of the families they serve affect the identification and reporting of child maltreatment. Of additional interest was whether some forms of maltreatment, once identified, would be more likely to be reported than other forms of maltreatment. The results support the efficacy of educational programs in child maltreatment for increasing the identification and reporting of maltreatment by workers. They also indicate that there are complex interactions between certain characteristics of the reporter (e.g., educational level) and prior training in maltreatment identification. Finally, neglect, although more frequently identified by the workers, appears to be least likely of all forms of maltreatment to be reported to official sources. Results are discussed in light of their implications for future research and practical application.

20.
Physical activity is associated with numerous health benefits in youth; however, these benefits could extend further than health, into education. Our aim was to systematically review and combine in meta-analyses evidence concerning the association between physical activity and the dimensions of school engagement, including behavior (e.g., time-on-task), emotions (e.g., lesson enjoyment), and cognition (e.g., self-regulated learning). We conducted meta-analyses using structural equation modeling on results from 38 studies. Overall, physical activity had a small, positive association with school engagement (d = .28, I² = .86), 95% confidence interval [.12, .46]. This association was moderated by study design, with significant associations shown in randomized controlled trials but not in studies employing other designs. Risk of bias was also a significant effect moderator, as studies with a low risk of bias showed significant associations but not high risk of bias studies. Altogether, these results suggest that physical activity could improve school engagement.
