Similar literature
20 similar documents found.
1.
This article demonstrates the use of a new class of model‐free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model‐free person‐fit statistics found in the literature as well as an adaptation of another CUSUM approach. The study used both simulated responses and real response data from a large‐scale standardized admission test.

2.
Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person‐fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person‐fit tests take the correlation between ability and speed into account, as well as the correlation between item characteristics. They are posited as Bayesian significance tests, which have the advantage that the extremeness of a test statistic value is quantified by a posterior probability. The person‐fit tests can be computed as by‐products of a Markov chain Monte Carlo algorithm. Simulation studies were conducted in order to evaluate their performance. For all person‐fit tests, the simulation studies showed good detection rates in identifying aberrant patterns. A real data example is given to illustrate the person‐fit statistics for the evaluation of the joint model.

3.
Researchers have documented the impact of rater effects, or raters’ tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers’ achievement estimates given their response patterns, has not been investigated. In rater-mediated assessments, person fit reflects the reasonableness of rater judgments of individual test-takers’ achievement over components of the assessment. This study illustrates an approach to visualizing and evaluating person fit in assessments that involve rater judgment using rater-mediated person response functions (rm-PRFs). The rm-PRF approach allows analysts to consider the impact of rater effects on person fit in order to identify individual test-takers for whom the assessment results may not have a straightforward interpretation. A simulation study is used to evaluate the impact of rater effects on person fit. Results indicate that rater effects can compromise the interpretation and use of performance assessment results for individual test-takers. Recommendations are presented that call researchers and practitioners to supplement routine psychometric analyses for performance assessments (e.g., rater reliability checks) with rm-PRFs to identify students whose ratings may have compromised interpretations as a result of rater effects, person misfit, or both.

4.
Individual person fit analyses provide important information regarding the validity of test score inferences for an individual test taker. In this study, we use data from an undergraduate statistics test (N = 1135) to illustrate a two-step method that researchers and practitioners can use to examine individual person fit. First, person fit is examined numerically with several indices based on the Rasch model (i.e., Infit, Outfit, and Between-Subset statistics). Second, person misfit is presented graphically with person response functions, and these person response functions are interpreted using a heuristic. Individual person fit analysis holds promise for improving score interpretation in that it may detect potential threats to validity of score inferences for some test takers. Individual person fit analysis may also highlight particular subsets of items (on which a test taker performs unexpectedly) that can be used to further contextualize her or his test performance.
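As an illustration of the numerical step mentioned in this abstract, the following is a minimal sketch of the standard Infit and Outfit mean-square statistics for one person under the dichotomous Rasch model. It is not the authors' code: the function names, the assumption that the ability `theta` and the item difficulties are already known, and the five-item example are all hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_fit(responses, theta, difficulties):
    """Return (infit, outfit) mean squares for one person.

    Assumes theta and the difficulties are given; in practice they
    would be estimated from data.
    """
    sum_sq_resid = 0.0  # sum of squared raw residuals
    sum_var = 0.0       # sum of response variances (information)
    sum_z2 = 0.0        # sum of squared standardized residuals
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)
        sum_sq_resid += (x - p) ** 2
        sum_var += var
        sum_z2 += (x - p) ** 2 / var
    infit = sum_sq_resid / sum_var      # information-weighted mean square
    outfit = sum_z2 / len(responses)    # unweighted mean square
    return infit, outfit


# Hypothetical five-item test, easiest item first.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
expected_pattern = person_fit([1, 1, 1, 0, 0], 0.0, difficulties)
reversed_pattern = person_fit([0, 0, 1, 1, 1], 0.0, difficulties)
```

Values near 1 are consistent with the model; in this sketch the reversed pattern (failing easy items, passing hard ones) yields values well above 1 on both indices, while the expected pattern yields values below 1.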

5.
The development of statistical methods for detecting test collusion is a new research direction in the area of test security. Test collusion may be described as large‐scale sharing of test materials, including answers to test items. Current methods of detecting test collusion are based on statistics also used in answer‐copying detection. Therefore, in computerized adaptive testing (CAT) these methods lose power because the actual test varies across examinees. This article addresses that problem by introducing a new approach that works in two stages: in Stage 1, test centers with an unusual distribution of a person‐fit statistic are identified via Kullback–Leibler divergence; in Stage 2, examinees from identified test centers are analyzed further using the person‐fit statistic, where the critical value is computed without data from the identified test centers. The approach is extremely flexible. One can employ any existing person‐fit statistic. The approach can be applied to all major testing programs: paper‐and‐pencil testing (P&P), computer‐based testing (CBT), multiple‐stage testing (MST), and CAT. Also, the definition of test center is not limited by the geographic location (room, class, college) and can be extended to support various relations between examinees (from the same undergraduate college, from the same test‐prep center, from the same group at a social network). The suggested approach was found to be effective in CAT for detecting groups of examinees with item pre‐knowledge, meaning those with access (possibly unknown to us) to one or more subsets of items prior to the exam.

6.
We evaluate the performance of the most common estimators of latent Markov (LM) models with covariates in the presence of direct effects of the covariates on the indicators of the LM model. In LM modeling it is common practice not to model such direct effects, ignoring the consequences this might have for the overall model fit and the parameters of interest. However, in the general literature about latent variable modeling it is well known that unmodeled direct effects can severely bias the parameter estimates of the model at hand. We evaluate how the presence of direct effects influences the bias and efficiency of the 3 most common estimators of LM models, the 1-step, 2-step, and 3-step approaches. Furthermore, we propose amendments (that were thus far not used in the context of LM modeling) to the 2- and 3-step approaches that make it possible to account for direct effects and eliminate bias as a consequence. This is done by modeling the (possible) direct effects in the first step of the stepwise estimation procedures. We evaluate the proposed estimators through an extensive simulation study, and illustrate them via a real data application. Our results show, first, that the augmented 2-step and 3-step approaches are unbiased and efficient estimators of LM models with direct effects. Second, ignoring the direct effects leads to biased estimates with all existing estimators, the 1-step approach being the most sensitive.

7.
Writing is part and parcel of children's active meaning‐making on and with screens, but it has been relatively neglected in the literature focused on children's digital literacies. This study synthesises existing empirical evidence focused on young children's (aged between 2 and 8 years) writing on screen and identifies the relationships between dominant themes in published literature and contemporary theories of children's technology use. A systematic literature review that included studies from diverse disciplines yielded 21 papers. Constant comparative analysis generated five themes that indicate four key directions for future research. We call attention to researchers' theoretical framing to supplement mono‐disciplinary approaches and single levels of analysis. We suggest that future research should provide greater specification of the purpose of children's writing on screen and the different types of tools and applications supporting the activity. We also highlight the need for interdisciplinary approaches that would capture the composing stages involved in the writing process with and around screens. Finally, we point out possible age‐related differences in documenting and reporting the composing process in classrooms. Overall, limitations in the current evidence base highlight the need for research conducted from a critical perspective and focused more directly on multimodality.

8.
Decades of student persistence and retention literature have brought to light factors of social, academic, and religious fit that influence a student's decision to remain at or depart from an institution. At Christian institutions, increasing student pluralism raises the likelihood that students will not fit religiously. This qualitative study of 21 first-time, full-time students contributes to the existing literature by exploring how students who already feel they do not fit for religious reasons work at constructing a sense of fit at a Christian research university. Many participants coped with religious discontinuity by redefining specifically Christian practices and teachings in terms that were personally palatable: as either general moral lessons that would help them to be a better person or as cultural insights that would benefit them socially and professionally in the future. In many cases, university staff were instrumental. Finally, participants worked to construct an acceptable level of fit, or fit threshold, through various combinations of social fit, academic fit, and religious fit, often compensating for one with others. As Christian institutions increasingly invite students from diverse religious backgrounds into their campus community, understanding ways that these students attempt to adapt to religious incongruence will be paramount.

9.
Comparing the fit of alternative models has become a standard procedure in covariance structure analysis. Comparison of alternative models is typically accomplished by examining the fit of each model to sample data. It is argued that rather than using this indirect approach, one should do direct comparisons of the similarities and differences among competing models. It is shown that among the existing goodness‐of‐fit indexes, the root mean square residual (RMSR) is the only one that can be used for this purpose. However, the RMSR fails to satisfy some important statistical desiderata. Rao's Distance (RD), an alternate measure, is shown to overcome this limitation of RMSR. The preference for RD over RMSR for model comparisons is illustrated through a detailed analysis of a particular sample of multitrait‐multimethod data. A simulation study conducted to empirically investigate the sampling behavior of RD reveals that the true orderings of intermodel proximities are recovered (on average) with a fair degree of accuracy.

10.
As with any psychometric model, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models can be useful. For inferences from CDMs to be valid, it is crucial that the fit of the model to the data is ascertained. Based on a simulation study, this study investigated the sensitivity of various fit statistics for absolute or relative fit under different CDM settings. The investigation covered various types of model–data misfit that can occur with the misspecifications of the Q‐matrix, the CDM, or both. Six fit statistics were considered: –2 log likelihood (–2LL), Akaike's information criterion (AIC), Bayesian information criterion (BIC), and residuals based on the proportion correct of individual items (p), the correlations (r), and the log‐odds ratio of item pairs (l). An empirical example involving real data was used to illustrate how the different fit statistics can be employed in conjunction with each other to identify different types of misspecifications. With these statistics and the saturated model serving as the basis, relative and absolute fit evaluation can be integrated to detect misspecification efficiently.
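The three relative fit statistics named in this abstract follow standard textbook definitions, sketched below for reference. The function name and the example numbers are hypothetical and are not taken from the study; the log likelihood is assumed to come from an already fitted model.

```python
import math


def relative_fit(log_likelihood, n_params, n_obs):
    """Return (-2LL, AIC, BIC) from a fitted model's maximized log likelihood.

    Uses the standard definitions:
      -2LL = -2 * log likelihood
      AIC  = -2LL + 2 * (number of free parameters)
      BIC  = -2LL + (number of free parameters) * ln(sample size)
    """
    neg2ll = -2.0 * log_likelihood
    aic = neg2ll + 2.0 * n_params
    bic = neg2ll + n_params * math.log(n_obs)
    return neg2ll, aic, bic


# Hypothetical values: two candidate models fitted to the same 500 examinees.
simple_model = relative_fit(log_likelihood=-3120.0, n_params=20, n_obs=500)
complex_model = relative_fit(log_likelihood=-3105.0, n_params=45, n_obs=500)
```

Lower values indicate better relative fit. Because BIC penalizes each parameter by ln(N) rather than 2, it favors the smaller model more strongly than AIC once N exceeds about 7.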

11.
De la Torre and Deng suggested a resampling‐based approach for person‐fit assessment (PFA). The approach involves the use of the statistic, a corrected expected a posteriori estimate of the examinee ability, and the Monte Carlo (MC) resampling method. The Type I error rate of the approach was closer to the nominal level than that of the traditional approach of using along with the assumption of a standard normal null distribution. This article suggests a generalized resampling‐based approach for PFA that allows one to employ or another person‐fit statistic (PFS) based on item response theory, the corrected expected a posteriori estimate or another ability estimate, and the MC method or another resampling method. The suggested approach includes the approach of de la Torre and Deng as a special case. Several approaches belonging to the generalized approach perform very similarly to the approach of de la Torre and Deng in two simulation studies and in applications to three real data sets, irrespective of the PFS used. The generalized approach promises to be useful to those interested in resampling‐based PFA.

12.
Applied Measurement in Education, 2013, 26(1): 77-89
In person-fit analysis, it is investigated whether an item score pattern is improbable given the item score patterns of the other persons in the group or given an expected score pattern on the basis of a test model. In this study, several existing group-based statistics are discussed to detect such improbable item score patterns, along with the cut scores that were proposed in the literature to classify an item score pattern as aberrant. By means of a simulation study and an empirical study, the detection rate of these statistics is compared, and the practical use of various cut scores is investigated. It is furthermore demonstrated that person-fit statistics can be used to detect persons with a deficiency of knowledge on an achievement test.

13.
This paper furnishes recommendations for improving the presentation of distance education study guides. Two approaches were used to distil these recommendations. First, a review of the literature was undertaken. Then, the opinions of thirty‐five practitioners with first‐hand experience of printed study guide production for distance education were surveyed. The questionnaire (which was constructed on the basis of the literature findings) covered many aspects of textual design and layout including general, as well as macro and micro textual issues. Both sources generally agreed that simplicity, consistency, adequate use of white space, utilisation of a hierarchical heading structure, and use of access devices are essential for optimal textual design. But the applicability of a universal layout style, methods of separating paragraphs, whether to use section numbering, replacing textual cues with icons, the readability of fully justified text, and techniques for differentiating levels of headings were more contentious issues.

14.
The present article reviews reminiscence research with regard to people with intellectual disabilities. Although the term “reminiscence” is not often used in intellectual disability research, the concept offers a useful framework for charting the different approaches in literature, thanks to its multidisciplinary character and eclectic theoretical background. Three main perspectives are identified: a critical approach, in which reminiscence is stimulated to let people with intellectual disabilities become critically aware of their past; a person‐centred approach, in which reminiscence serves informational and social purposes; and a clinical approach, in which reminiscence is presented as an alternative diagnostic instrument and/or a “low‐threshold” narrative counselling method for people with intellectual disabilities. The three approaches differ in language use, aims, and backgrounds, but there is congruency amongst the approaches in that reminiscence work can strengthen the identity of people with intellectual disabilities, raise self‐esteem, and enhance social contacts. The review concludes that a more balanced view of reminiscence, better methodological procedures, and more evaluation studies on the effect and process of reminiscence work are needed in future research.

15.
The goal of this study was to investigate the usefulness of person‐fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two‐stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person‐fit statistic, the hierarchy consistency index (HCI; Cui, 2007; Cui & Leighton, 2009), was used to identify the misfitting student item‐score vectors. In the second stage, students’ verbal reports were collected to provide additional information about students’ response processes so as to reveal the actual causes of misfits. This two‐stage procedure helped to identify the misfits of item‐score vectors to the cognitive model used in the design and analysis of the diagnostic test, and to discover the reasons for misfits so that students’ problem‐solving strategies were better understood and their performances were interpreted in a more meaningful way.

16.
In this article, professional development in the context of the current reforms in science education is discussed from the perspective of developing teachers' practical knowledge. It is argued that reform efforts in the past have often been unsuccessful because they failed to take teachers' existing knowledge, beliefs, and attitudes into account. Teachers' practical knowledge is conceptualized as action‐oriented and person‐bound. As it is constructed by teachers in the context of their work, practical knowledge integrates experiential knowledge, formal knowledge, and personal beliefs. To capture this complex type of knowledge, multimethod designs are necessary. On the basis of a literature review, it is concluded that long‐term professional development programs are needed to achieve lasting changes in teachers' practical knowledge. In particular, the following strategies are potentially powerful: (a) learning in networks, (b) peer coaching, (c) collaborative action research, and (d) the use of cases. In any case, it is recommended that teachers' practical knowledge be investigated at the start of a reform project, and that changes in this knowledge be monitored throughout the project. In that way, the reform project may benefit from teachers' expertise. Moreover, this makes it possible to adjust the reform so as to enhance the chances of a successful implementation. © 2001 John Wiley & Sons, Inc. J Res Sci Teach 38: 137–158, 2001

17.
This study examined the utility of response time‐based analyses in understanding the behavior of unmotivated test takers. For the data from an adaptive achievement test, patterns of observed rapid‐guessing behavior and item response accuracy were compared to the behavior expected under several types of models that have been proposed to represent unmotivated test taking behavior. Test taker behavior was found to be inconsistent with these models, with the exception of the effort‐moderated model. Effort‐moderated scoring was found to both yield scores that were more accurate than those found under traditional scoring, and exhibit improved person fit statistics. In addition, an effort‐guided adaptive test was proposed and shown by a simulation study to alleviate item difficulty mistargeting caused by unmotivated test taking.

18.
The purpose of this article is to examine the use of sample weights in the latent variable modeling context. A sample weight is the inverse of the probability that the unit in question was sampled and is used to obtain unbiased estimates of population parameters when units have unequal probabilities of inclusion in a sample. Although sample weights are discussed at length in survey research literature, virtually no discussion of sample weights can be found in the latent variable modeling literature. This article examines sample weights in latent variable models applied to the case where a simple random sample is drawn from a population containing a mixture of strata. A bootstrap simulation study is used to compare raw and normalized sample weights to conditions where weights are ignored. The results show that ignoring weights can lead to serious bias in latent variable model parameters and that this bias is mitigated by the incorporation of sample weights. Standard errors appear to be underestimated when sample weights are applied. Results on goodness‐of‐fit statistics demonstrate the advantages of utilizing sample weights.
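The definition of a sample weight given in this abstract can be made concrete with a small sketch. The code below only illustrates raw and normalized inverse-probability weights applied to a simple mean; it does not reproduce the latent variable models or the bootstrap study, and the function names, the normalization convention (weights rescaled to sum to the sample size), and the data are hypothetical.

```python
def raw_weights(inclusion_probs):
    """Raw sample weight: inverse of each unit's inclusion probability."""
    return [1.0 / p for p in inclusion_probs]


def normalized_weights(weights):
    """Rescale weights so they sum to the sample size (one common convention)."""
    scale = len(weights) / sum(weights)
    return [w * scale for w in weights]


def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: two units from a stratum sampled at rate 0.1,
# one unit from a stratum sampled at rate 0.5.
probs = [0.1, 0.1, 0.5]
values = [10.0, 10.0, 20.0]

w_raw = raw_weights(probs)
w_norm = normalized_weights(w_raw)

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, w_raw)
```

In this sketch the unweighted mean overstates the contribution of the oversampled stratum, while raw and normalized weights give the same point estimate of the mean; the two weighting schemes differ only in the implied sample size, which matters for standard errors and fit statistics.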

19.
This article sets out to offer the reader an opportunity to engage with our emerging ideas about a reflective, person‐centred model for students and facilitators using problem‐based learning (PBL). The model was developed initially through several strands of qualitative inquiry including research with students, immersion in the existing literature and a reflexive approach to our own work. We feel it enables the user to explore key challenges of PBL curricula through the interactive elements of readiness, congruence, group dynamics, communication and environment with the person being pivotal to the process. It is suggested that the model offers an opportunity to consider crucial questions in the successful implementation and development of PBL practice and requires users to challenge their own thinking. It is a tool for individuals, groups and organizations involved in PBL.

20.
In this article, we propose using Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach, we further use it to evaluate person fit in simulated and empirical data, and compare the results with those of HT and the infit and outfit statistics. We found that overall BF performed as well as HT statistics and better than the infit and outfit statistics when detecting aberrant responses. Given the flexibility of BF in handling data sets with a small number of examinees, we suggest that BF can be used as person fit statistics, especially in computerized adaptive tests.
