Similar literature
20 similar articles found.
1.
Every year, thousands of college and university applicants with learning disabilities (LD) present scores from standardized examinations as part of the admissions process for postsecondary education. Many of these scores are from tests administered with nonstandard procedures due to the examinees' learning disabilities. Using a sample of college students with LD and a control sample, this study investigated the criterion validity and comparability of scores on the Miller Analogies Test when accommodations for the examinees with LD were in place. Scores for examinees with LD from test administrations with accommodations were similar to those of examinees without LD on standard administrations, but were less strongly associated with grade point averages. The results of this study provide evidence that although scores for examinees with LD from nonstandard test administrations are comparable to scores for examinees without LD, they have less criterion validity and are less meaningful for their intended purpose.

2.
The Formulating-Hypotheses (F-H) item presents a situation and asks examinees to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted examinees to 7 words per explanation, and half allowed up to 15 words. Generalizability results showed high interrater agreement, with tests of between 2 and 4 items scored by one judge achieving coefficients in the .80s. Construct validity analyses found that F-H was only marginally related to the GRE General Test, and more strongly related than the General Test to a measure of ideational fluency. Different response limits tapped somewhat different abilities, with the 15-word constraint appearing more useful for graduate assessment. These items added significantly to conventional measures in explaining school performance and creative expression.

3.
Admission decisions frequently rely on multiple assessments. As a consequence, it is important to explore rational approaches to combining the information from different educational tests. For example, U.S. graduate schools usually receive both TOEFL iBT® scores and GRE® General scores of foreign applicants for admission; however, little guidance has been given on how to combine information from these two assessments, even though the relationships between such sections as GRE Verbal and TOEFL iBT Reading are obvious. In this study, principles are provided to explore the extent to which different assessments complement one another and are distinguishable. Augmentation approaches developed for individual tests are applied to provide an accurate evaluation of combined assessments. Because augmentation methods require estimates of measurement error and internal reliability data are unavailable, required estimates of measurement error are obtained from repeaters, examinees who took the same test more than once. Because repeaters are not representative of all examinees in typical assessments, minimum discriminant information adjustment techniques are applied to the available sample of repeaters to treat the effect of selection bias. To illustrate the methodology, combining information from TOEFL iBT scores and GRE General scores is examined. Analysis suggests that information from the GRE General and TOEFL iBT assessments is complementary rather than redundant, indicating that the two tests measure related but somewhat different constructs. The proposed methodology can be readily applied to other situations where multiple assessments are needed.

4.
This study examined the relationship between 403 counseling graduate students' scores on the Counselor Preparation Comprehensive Examination (CPCE; Center for Credentialing and Education, n.d.) and 3 admissions requirements used as predictor variables: undergraduate grade point average (UGPA), Graduate Record Examinations (GRE) General Test Verbal Reasoning (GRE‐V) score, and GRE General Test Quantitative Reasoning (GRE‐Q) score. Multiple regression analyses revealed that all predictor variables accounted for somewhat limited, yet significant variations in the CPCE‐Total scores (R² = .21). Results indicated that UGPAs, GRE‐V scores, and GRE‐Q scores are valid criteria for determining counseling graduate student success on the CPCE.

5.
This study gathered the judgments of Graduate Record Examination test takers, both actual and prospective, about a sample of essay prompts being considered for possible use in a graduate admissions writing test. Our thesis was that test-takers' views, which have not been frequently considered in any systematic fashion, may provide valuable information to developers of writing assessments. The specific objective was to determine the kinds of prompts and topics on which examinees feel they can write strong essays, as well as those that they perceive as more difficult. The study identified several features that underlie examinee perceptions of essay prompts. Prominent among these features was the extent to which prompts allow writers to draw on their personal experiences. Some study participants also wrote essays on a small subset of the prompts. With these data, the relation of examinee opinions to performance on the prompts was examined. Though apparent, this relation was less dramatic than writers' strong opinions would suggest.

6.
This study examines the perceptions of a representative sample of GRE test takers who were asked to indicate their views of the importance of eight widely considered factors in graduate admissions. Overall, candidates perceived undergraduate grades as the most important factor in graduate admissions. Recommendations and one's undergraduate field were rated as somewhat less important than undergraduate grades, and GRE Aptitude Test scores were rated even less important. GRE Advanced (Subject) Test scores were perceived as considerably less important than any other factor. Analyses by subgroup revealed that candidates' perceptions differed markedly according to the graduate field they intended to enter. Perceptions also differed by ethnic group (blacks versus whites) but not by sex or age.

7.
The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear.

8.
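This study randomly sampled 21,387 examinees who took the clinical practitioner category of the comprehensive written medical examination of the Medical Licensing Examination in a given year. It performed a cluster analysis of their scores in surgery and compared the cut scores computed with the borderline-group and contrasting-groups methods against the passing score set by Angoff expert judgment. The results show that the Kappa coefficient for the agreement between the two in classifying examinees was as high as 0.934, which strongly supports the validity of the Angoff method for setting passing scores.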

9.
Applied Measurement in Education, 2013, 26(3): 303-322
We investigated the functioning of a new computer-delivered response type for potential use in graduate admissions assessment. This response type, which is open-ended and automatically scorable, presents problems calling for the examinee to draw a graph modeling a given situation. Problem situations can be like the single-best-answer items currently found on the Graduate Record Examinations (GRE) General Test (ETS, 1998) or they can be more loosely defined, allowing for multiple-correct responses. Two graphical modeling (GM) tests differing from one another in the manipulation of specific item features were randomly spiraled among study participants. Results showed that GM scores were very reliable and moderately related to the General Test's quantitative section, suggesting that GM might help broaden the GRE quantitative construct. In exploratory difficulty analyses, 1 of 3 manipulated item features, problem structure, had a dependable effect. No significant gender differences independent of those associated with the GRE quantitative section were detected. Finally, more participants preferred regular multiple-choice graphical reasoning questions to GM items but thought GM was the fairer indicator of their ability to undertake graduate study.

10.
Scores on essay‐based assessments that are part of standardized admissions tests are typically given relatively little weight in admissions decisions compared to the weight given to scores from multiple‐choice assessments. Evidence is presented to suggest that more weight should be given to these assessments. The reliability of the writing scores from two of the large volume admissions tests, the GRE General Test (GRE) and the Test of English as a Foreign Language Internet‐based test (TOEFL iBT), based on retesting with a parallel form, is comparable to the reliability of the multiple‐choice Verbal or Reading scores from those tests. Furthermore, and even more important, the writing scores from both tests are as effective as the multiple‐choice scores in predicting academic success and could contribute to fairer admissions decisions.

11.
Sixty WISC-III protocols, administered by graduate students in training, were examined to obtain preliminary data on the frequency and types of administration and scoring errors that examiners commit. Results were compared with previous studies that have evaluated examiner errors on the Wechsler scales. In general, the present results were consistent with those of previous studies that have illustrated that a large number of scoring errors are committed by graduate students as well as by other professional groups. The majority of errors committed by participants in this study were general errors. That is, errors were not specific to a particular subtest. The five most frequent errors included failure to query, failure to record responses verbatim, reporting Full Scale IQ incorrectly, reporting Verbal IQ incorrectly, and adding individual subtest scores incorrectly. However, the traditionally difficult-to-score Verbal subtests were not as troublesome for examiners in this study as they were for examiners in previous studies. In addition, significant decreases in the mean number of errors per protocol and in the number of most frequently occurring errors per protocol were noted. © 1998 John Wiley & Sons, Inc.

12.
The objective of this study is to identify the competencies required to achieve success in the transition from higher education to the labour market based on the perceptions of employers. This paper analyses the assessments made by a group of engineering company employers. An item-battery of 20 competencies was grouped into 3 dimensions by using factor analysis. Subsequently, respondents’ scores were also clustered into three groups and characterised through contingency tables. The competencies demanded by employers were grouped into business and finance, problem-solving and strategic planning. Significant differences were found between responses from employers working in medium and small companies, who placed more importance on competencies related to problem-solving and strategic planning, and employers in big companies, who were more concerned about the difficulties of finding well-trained graduates. The findings from this paper have important implications for research in the areas of higher education and organisations that usually employ graduate engineers.

13.
To determine the role of time limits on both test performance and test validity, we asked approximately 300 volunteers, all prospective graduate students, to each write two essays: one in a 40-minute time period and the other in 60 minutes. Analyses revealed that, on average, test performance was significantly better when examinees were given 60 minutes instead of 40. However, there was no interaction between test-taking style (fast vs. slow) and time limits. That is, examinees who described themselves as slow writers/test takers did not benefit any more (or any less) from generous time limits than did their quicker counterparts. In addition, there was no detectable effect of different time limits on the meaning of essay scores, as suggested by their relationship to several nontest indicators of writing ability.

14.
In this study we examined alternative item types and section configurations for improving the discriminant and convergent validity of the GRE General Test. A computer-based test of reasoning items and a generating-explanations measure was administered to a sample of 388 examinees who previously had taken the General Test. Confirmatory factor analyses indicated that three dimensions of reasoning (verbal, analytical, and quantitative) and a fourth dimension of verbal fluency based on the generating-explanations task could be distinguished. Notably, generating explanations was as distinct from new variations of reasoning items as it was from verbal and quantitative reasoning. In the full sample, this differentiation was evident in relation to such external criteria as undergraduate grade point average (UGPA), self-reported accomplishments, and a measure of ideational fluency, with generating explanations relating uniquely to aesthetic and linguistic accomplishments and to ideational fluency. For the subset of participants with undergraduate majors in the humanities and social sciences, generating explanations added to the relationship with UGPA over that contributed by the General Test.

15.
The first generation of computer-based tests depends largely on multiple-choice items and constructed-response questions that can be scored through literal matches with a key. This study evaluated scoring accuracy and item functioning for an open-ended response type where correct answers, posed as mathematical expressions, can take many different surface forms. Items were administered to 1,864 participants in field trials of a new admissions test for quantitatively oriented graduate programs. Results showed automatic scoring to approximate the accuracy of multiple-choice scanning, with all processing errors stemming from examinees improperly entering responses. In addition, the items functioned similarly in difficulty, item-total relations, and male-female performance differences to other response types being considered for the measure.

16.
Using a conceptual model, this study examines the variables associated with the U.S. News and World Report peer assessment ratings of graduate and professional schools in business, education, engineering, law, and medicine. What are the correlates of prestige among the nation’s leading graduate and professional schools, and are they consistent with prior studies of prestige? Not since the studies of the 1995 National Research Council (NRC) data have scholars examined the correlates of prestige for individual graduate programs, and no study has ever extensively examined the U.S. News graduate ratings. Using available data from U.S. News, as well as institutional websites and ISI Web of Science information, this analysis finds robust relationships between the U.S. News graduate school reputation ratings and the model-relevant indicators, especially enrollment size, admissions test scores, and faculty publications per capita.

17.
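The gaokao (national college entrance examination) is a matter of great concern to the education community and to society as a whole. China's college entrance examination system has been continually reformed and refined, for example through the recommended-admission system, the special-talent student system, and the independent admissions system. Recently, the establishment and expansion of three major "university alliances" has once again made independent admissions a focus of public attention. The independent admissions examinations, for which on average more than 100,000 candidates have registered each year in recent years, have affected and will continue to affect a growing number of Chinese candidates and their families. In a sense, this "battle of the alliances" hints at the future direction of China's gaokao reform.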

18.
Although a few studies report sizable score gains for examinees who repeat performance‐based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single‐take examinees and 4,030 repeat examinees who completed a 6‐hour clinical skills assessment required for physician licensure. Each examinee was rated in four skill domains: data gathering, communication‐interpersonal skills, spoken English proficiency, and documentation proficiency. Conditional standard errors of measurement computed for single‐take and multiple‐take examinees indicated that ratings were of comparable precision for the two groups within each of the four skill domains; however, conditional errors were larger for low‐scoring examinees regardless of retest status. In addition, on their first attempt multiple‐take examinees exhibited less score consistency across the skill domains, but on their second attempt their scores became more consistent. Further, the median correlation between scores on the four clinical skill domains and three external measures was .15 for multiple‐take examinees on their first attempt but increased to .27 for their second attempt, a value comparable to the median correlation of .26 for single‐take examinees. The findings support the validity of inferences based on scores from the second attempt.

19.
The purpose of this study was to re-evaluate the validity of traditional admissions criteria (UGPA and GRE scores) in predicting academic success for students admitted to a counselor education program in the United States. In contrast to prior research, we also included the newer GRE-Analytical Writing scores in our analyses. In general, we found that both UGPA and GRE scores were useful for predicting both graduate grade point averages (GGPAs) and students’ scores on the Counselor Preparation Comprehensive Exam (CPCE). We also found that a discriminant model that included all four admission variables was useful for predicting program completion outcomes: successfully graduated, dropped out, or dismissed from the program. Implications for the admissions and screening process are presented.

20.
Test scores matter these days. Test‐takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees’ information or usability needs, but this is clearly changing for the better due to recent, much‐needed additions to the psychometric literature as well as improved efforts in reporting practices. This paper provides an overview of score reports from a development perspective, focusing on current practices and emerging efforts in content of reports as well as the process by which reports are designed, evaluated, and ultimately used to communicate with the public.
