Application of the Levenshtein Distance Metric for the Construction of Longitudinal Data Files
Authors: Harold C. Doran, Paul B. Van Wamelen
Institution: American Institutes for Research
Abstract: The analysis of longitudinal data in education is becoming more prevalent given the nature of the testing systems constructed for the No Child Left Behind Act (NCLB). However, constructing the longitudinal data files remains a significant challenge. Students move into new schools, but in many cases the unique identifiers (IDs) that should remain constant for each student change. As a result, different students frequently share the same ID, and merging records for an ID that is erroneously assigned to different students becomes problematic. In small data sets, quality assurance of the merge can proceed through human review of the data to ensure that all merged records are properly joined. In data sets with hundreds of thousands of cases, however, quality assurance via human review is impossible. While the record linkage literature has many applications in other disciplines, the educational measurement literature lacks details of formal protocols that can be used in quality assurance procedures for longitudinal data files. This article presents an empirical quality assurance procedure that may be used to verify the integrity of the merges performed for longitudinal analysis. We also discuss possible extensions that would permit merges to occur even when unique identifiers are not available.
Keywords: Levenshtein algorithm; longitudinal analysis; quality assurance; R program; record linkage
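
The abstract does not spell out the procedure itself, so the following is only a minimal sketch of how a Levenshtein-based check on merged records might look. It is written in Python for illustration (the keywords indicate the authors worked in R), and the record layout `(student_id, year, name)`, the function `flag_suspect_merges`, and the threshold `max_distance` are all hypothetical choices, not taken from the article. The idea shown: for records that share an ID, compare the names attached to consecutive years, and flag the ID for review when the edit distance is too large to be a plausible typo.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def flag_suspect_merges(records, max_distance=2):
    """Hypothetical QA check (an assumption, not the article's procedure).

    records: iterable of (student_id, year, name) tuples.
    Returns a list of (student_id, earlier_name, later_name, distance)
    for consecutive-year records under the same ID whose names differ
    by more than max_distance edits.
    """
    by_id = {}
    for student_id, year, name in records:
        by_id.setdefault(student_id, []).append((year, name.strip().lower()))

    suspects = []
    for student_id, rows in by_id.items():
        rows.sort()  # order each student's records by year
        for (_, earlier), (_, later) in zip(rows, rows[1:]):
            distance = levenshtein(earlier, later)
            if distance > max_distance:
                suspects.append((student_id, earlier, later, distance))
    return suspects
```

Under these assumptions, a record pair such as "Jon Smith" and "John Smith" under one ID would pass (one edit), while two unrelated names under the same ID would be flagged for review. The threshold trades missed errors against false alarms and would need to be tuned on real data.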