Similar literature
1.
Content‐based automated scoring has been applied in a variety of science domains. However, many prior applications used simplified scoring rubrics rather than rubrics that represent multiple levels of understanding. This study tested c‐rater, a concept‐based tool for content‐based scoring, on four science items whose rubrics aim to differentiate among multiple levels of understanding. The automated scores showed moderate to good agreement with human scores. The findings suggest that automated scoring has the potential to score constructed‐response items with complex scoring rubrics, but in its current design it cannot replace human raters. This article discusses sources of disagreement and factors that could improve the accuracy of concept‐based automated scoring.

2.
The psychometric characteristics and practicality of concept mapping as a technique for classroom assessment were evaluated. Subjects received 90 min of training in concept mapping techniques and were then given a list of terms, drawn from a course in which they were enrolled, and asked to produce a concept map. The maps were scored by pairs of graduate students, each pair using one of six different scoring methods. The score reliability of the six scoring methods ranged from r = .23 to r = .76. The highest score reliability was found for the method based on the evaluation of separate propositions represented. Correlations of map scores with a measure of the concept maps' similarity to a master map provided evidence supporting the validity of five of the six scoring methods. The times required to provide training in concept mapping, produce concepts, and score concept maps were compatible with the adoption of concept mapping as a classroom assessment technique. © 1999 John Wiley & Sons, Inc. J Res Sci Teach 36: 475–492, 1999

3.
This study examined the predictive power of age in the academic performance of Behavioural Science students at the Darling Downs Institute of Advanced Education. The other predictor variables were study methods, Tertiary Entrance score, personal problems, satisfaction with college, self‐concept, locus of control and flexibility of thinking. Seventy-nine students (93% of the total population) responded to a questionnaire containing scales measuring these variables. Results from multiple regression analyses showed that the contribution of age far outweighed that of any other variable. The next best predictors were study methods and environment factors. The contribution of personality traits and T.E. score was minimal. A high correlation between age and study methods was also noted. The authors suggest greater acceptance of older students into Behavioural Science courses.

4.
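Using concept-map assessment and testing, this study examined the characteristics of concept maps constructed by eighth-grade students and found that these characteristics reflect students' academic achievement well. Compared with the concept scores and hierarchy scores of the participants' maps, proposition scores had higher predictive validity for students' science achievement test results. Analyzing the errors in students' concept maps can reveal problems in their learning and can effectively guide teachers' instruction.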

5.
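Before the publication of the Standards for Educational and Psychological Testing (5th edition) in 1985, the core concept of validity research was the "criterion": validity research was regarded as a process of using a criterion to verify a test's validity and to give a valid interpretation of test scores. After 1985, the core concept became "evidence": validity research was regarded as a process of accumulating evidence to support a test's validity and to give a reasonable interpretation of test scores. This understanding of validity is most prominent in the Standards for Educational and Psychological Testing (6th edition), published in 1999. Educational Measurement, compiled jointly by the American Council on Education and the National Council on Measurement in Education, is known in the field as "the Bible of educational measurement." Since the publication of Educational Measurement (4th edition) in 2006, the core concept of validity research has evolved into the "warrant": validity research is regarded as a process of constructing a "system of warrants" and a "network of warrants" to make an "argument" for validity and to give a plausible interpretation of test scores. Drawing on the author's testing practice, this article introduces these new developments in the concept of validity.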

6.
Malignant peritoneal mesothelioma (MPM) is a rare tumor that develops in the peritoneum. In this paper, we describe an extremely rare case of MPM metastasizing to the appendix in a 48-year-old woman who initially presented with a persistent high fever. She reported four months of slight lower abdominal discomfort that was relieved by urination, and she had lost 5 kg. There was no nausea, vomiting, diarrhea, abdominal pain, or abdominal distension. Multiple broad-spectrum antibiotics were given without relief of the fever. Computed tomography (CT) scans revealed a thickened greater omentum and multiple diffuse omental nodules. An omentectomy, appendectomy, and adnexectomy were carried out. A gross pathologic specimen of omental tissue revealed a firm gray-white mass. Microscopic and immunohistochemical examinations confirmed the diagnosis of appendiceal and bilateral adnexal metastases of an MPM. These results suggest that MPM should be considered in the differential diagnosis of unexplained persistent high fever. Awareness of such atypical presentations of mesothelioma may help clinicians reach a correct diagnosis.

7.
This study was designed to examine students’ use of multiple modal representations within their written arguments as a consequence of completing a series of investigations in an organic chemistry laboratory course. One hundred and eleven students from a major Midwestern university used the Science Writing Heuristic (SWH) approach, which requires them to follow the argument structure of question, claim, evidence and reflection when completing the written report on their laboratory investigations for their instructor. Results indicate that students who achieved a high score for embedded multiple modal representations in the evidence section also constructed high-quality arguments. That is, students who were able to embed multiple modal representations in evidence made strong reasoned connections to support their claim(s) and construct a cohesive argument. Further, there were strong correlations between the laboratory examination score and the holistic quality of the argument. This study suggests a need to build pedagogical support structures that help individual students understand the role and function of multiple modal representations in science.

8.
As the primary interface between test developers and multiple educational stakeholders, score reports are a critical component to the success (or failure) of any assessment program. The purpose of this review is to document recent research on individual‐level score reporting to advance the research and practice of score reporting. We conducted a search for research studies published or presented between 2005 and 2015, examining 60 scholarly works for (1) the research focus, (2) stated or implied theoretical frameworks of communication, and (3) the characteristics of data sets employed in the studies. Results show that research on score properties, especially subscores, and score report design/layout are well‐represented in the literature base. The predominant approach to score reporting has been through a cybernetics tradition of communication. Data sets were often small or localized to a single context. We present example research questions from novel communication frameworks, and encourage our colleagues to adopt new roles in their relationships to stakeholders to advance score reporting research and practice.

9.
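The concept of the multiple integral is closely tied to integrability criteria, integrability properties, and the computation of integrals for multivariable functions. A precise grasp of what the concept means helps in solving multiple-integral problems completely.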

10.
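A concept map is a spatial, network-like knowledge-structure diagram that represents concepts and the relationships between them in graphical form; it can evaluate students' knowledge structure comprehensively. This article reports a comparative experimental study, both quantitative and qualitative, of the subject-knowledge structures of Chinese and American secondary school students, using four variables: mean score, stem-and-leaf plots, proposition construction, and map structure. The experiment shows that the subject-knowledge structures of Chinese and American secondary school students differ significantly. Compared with American students, Chinese students are stronger test takers and have a higher mean score, but their scores are unevenly distributed; although their basic knowledge is solid and they master and apply subject knowledge well, they need to improve in general knowledge and knowledge innovation.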

11.
A concept map is a schematic device for representing a set of concept meanings embedded in a framework of propositions. It can be used to evaluate students’ knowledge structure. This article presents a comparative study of Chinese and American secondary school students’ knowledge structures. The two groups are compared quantitatively and qualitatively in terms of mean score, individual proposition scores, proposition choice and map structure. The results indicate that students’ knowledge structures in the two countries are remarkably different. Compared with American students, Chinese students are stronger test takers and their mean score is higher. However, Chinese students need to improve their general knowledge and creativity, although their basic knowledge is solid and they are better at mastering and applying discipline knowledge.

12.
The use of tablets for large‐scale testing programs has transitioned from concept to reality for many state testing programs. This study extended previous research on score comparability between tablets and computers with high school students to compare score distributions across devices for reading, math, and science and to evaluate device effects for gender and ethnicity subgroups. Results indicated no significant differences between tablets and computers for math and science. For reading, a small device effect favoring tablets was found for the middle to lower part of the score distribution. This effect seemed to be driven by increases in performance for male students when testing on tablets. No interactions of device with ethnicity were observed. Consistent with previous research, this study provides additional evidence for a relatively high degree of comparability between tablets and computers.

13.
When tests are designed to measure dimensionally complex material, DIF analysis with matching based on the total test score may be inappropriate. Previous research has demonstrated that matching can be improved by using multiple internal or both internal and external measures to more completely account for the latent ability space. The present article extends this line of research by examining the potential to improve matching by conditioning simultaneously on test score and a categorical variable representing the educational background of the examinees. The responses of male and female examinees from a test of medical competence were analyzed using a logistic regression procedure. Results show a substantial reduction in the number of items identified as displaying significant DIF when conditioning is based on total test score and a variable representing educational background as opposed to total test score only.

14.
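This study compares the score-interpretation systems of large-scale second-language tests, including TOEFL, TOEIC, IELTS, the College English Test, and the new HSK (Hanyu Shuiping Kaoshi), and proposes the concepts of single-criterion versus multiple-criterion referencing and precision-criterion versus span-criterion referencing. Large-scale second-language tests should provide both criterion-referenced and norm-referenced interpretations so that test users receive richer score-interpretation information. For criterion referencing, the single-criterion system that reports the "percentage of the ability standard attained" is preferable.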

15.
Most currently accepted approaches for identifying differentially functioning test items compare performance across groups after first matching examinees on the ability of interest. The typical basis for this matching is the total test score. Previous research indicates that when the test is not approximately unidimensional, matching using the total test score may result in an inflated Type I error rate. This study compares the results of differential item functioning (DIF) analysis with matching based on the total test score, matching based on subtest scores, or multivariate matching using multiple subtest scores. Analysis of both actual and simulated data indicates that for the dimensionally complex test examined in this study, using the total test score as the matching criterion is inappropriate. The results suggest that matching on multiple subtest scores simultaneously may be superior to using either the total test score or individual relevant subtest scores.

16.
Educators from a variety of disciplines include the concept of transgender and multiple gender identities in course curricula. The ‘T’ (denoting the specific inclusion of transgender) in the popular acronym LGBT (lesbian, gay, bisexual, transgender) often is given less pedagogical attention than is sexual orientation, and the transgender concept often is taught in the context of sexual orientation. An experimental study was conducted to determine if a brief film intervention would yield differences in knowledge of the transgender concept. Individuals were randomly assigned to one of three conditions. Conditions varied to include no exposure to the concept of being transgender, a 14-minute news documentary about the transgender concept, or a 28-minute news documentary about the transgender concept. Results supported hypotheses that a brief film intervention would produce higher accuracy of transgender knowledge. In addition, and supporting the contact hypothesis, participants with a transgender friend reported less transphobia and more empathy for transgender individuals.

17.
Thirty mentally retarded persons took part in a study intended to verify the predictive value of learning potential for work adjustment. A multiple regression equation was derived from the data produced by a nonverbal intelligence test (PM-47) and a test of learning potential (an adaptation of the Block design test; Ionescu et al., 1974). The results showed that only the total score on the block design test has predictive value; this score is the sum of two scores, a “without help” score and a “transfer” score (a measure of learning potential).

18.
A challenge facing nearly all studies in the psychological sciences is how to best combine multiple items into a valid and reliable score to be used in subsequent modeling. The most ubiquitous method is to compute a mean of items, but more contemporary approaches use various forms of latent score estimation. Regardless of approach, outside of large-scale testing applications, scoring models rarely include background characteristics to improve score quality. This article used a Monte Carlo simulation design to study score quality for different psychometric models that did and did not include covariates across levels of sample size, number of items, and degree of measurement invariance. The inclusion of covariates improved score quality for nearly all design factors, and in no case did the covariates degrade score quality relative to not considering the influences at all. Results suggest that the inclusion of observed covariates can improve factor score estimation.

19.
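Aromaticity is an important concept in chemistry that has been extended from organic compounds to all-metal clusters and alloys. All-metal clusters exhibit not only π-aromaticity but also σ-aromaticity and δ-aromaticity. This article explains the concepts of multiple aromaticity, multiple antiaromaticity, and conflicting aromaticity, and introduces three criteria for aromaticity.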

20.
Diagnosis and treatment of pheochromocytoma in urinary bladder
Objective: To study the diagnosis and treatment of pheochromocytoma in the urinary bladder. Methods: Six cases of bladder pheochromocytoma were studied. Four cases showed hypertension, 3 of which were paroxysmal hypertension during urination. Catecholamine (CA) was increased in 1 case, and vanillylmandelic acid (VMA) was increased in 2 cases. A bladder submucosal mass was detected by B-ultrasound in 5 cases (5/5), computerized tomography (CT) in 3 cases (3/3), and cystoscopy in 5 cases (5/6). Four cases took an α-receptor blocker for 2 weeks, and 1 case took a β-receptor blocker to decrease heart rate. All patients were treated surgically, with 4 partial cystectomies and 2 excavations. Results: Three cases had manifestations including headache, excessive perspiration and hypertension during cystoscopy. Four cases were confirmed before operation. Two cases showed hypertension during operation. All patients were pathologically diagnosed with pheochromocytoma postoperatively. In the five cases followed up, blood pressure returned to normal. No patient had relapse or malignancy. Conclusions: Typical hypertension during urination is the main symptom. Bladder pheochromocytoma should be strongly suspected if a submucosal mass is discovered with B-ultrasound, CT, ¹³¹I-MIBG (metaiodobenzylguanidine) and cystoscopy. The determination of CA in urine is valuable for qualitative diagnosis. Preoperative control of blood pressure and expansion of the blood volume are very important. Surgery is an effective treatment. Long-term postoperative follow-up is necessary.
