

Topic discovery based on text mining techniques
Authors: Aurora Pons-Porrata, Rafael Berlanga-Llavori, José Ruiz-Shulcloper
Institutions: 1. Center of Pattern Recognition and Data Mining, Universidad de Oriente, Patricio Lumumba s/n, Santiago de Cuba 90500, Cuba; 2. Computer Science, Universitat Jaume I, Avda. Vicent Sos Banyat, Campus del Riu Sec s/n, E-12071 Castellón, Spain; 3. Advanced Technologies Application Center, 7ma, No. 21812, Siboney, C. Habana, Cuba
Abstract: In this paper, we present a topic discovery system aimed at revealing the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from those documents. These summaries are useful for browsing the generated hierarchies and selecting topics of interest. Our proposal consists of a new incremental hierarchical clustering algorithm that combines the partitional and agglomerative approaches, taking the main benefits of each. Finally, we propose a new summarization method based on Testor Theory to build the topic summaries. Experimental results on the TDT2 collection demonstrate the system's usefulness and effectiveness not only for topic detection, but also as a classification and summarization tool.
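Testor Theory, from the logical-combinatorial pattern recognition tradition, works with subsets of features. A testor is a subset of features on which no two objects from different classes coincide. A typical (irreducible) testor is a testor with no proper subset that is also a testor. The abstract does not say how the authors map documents, sentences, or terms onto this framework, so the sketch below only illustrates the underlying concept. It runs a brute-force search over a toy binary matrix. The function names `is_testor` and `typical_testors` and the example data are hypothetical, and exhaustive enumeration is exponential in the number of features, so a real system would need a dedicated algorithm.

```python
from itertools import combinations


def is_testor(rows, labels, features):
    """Return True if no two rows with different labels agree on every feature in `features`."""
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j] and all(rows[i][f] == rows[j][f] for f in features):
            return False  # two objects of different classes are indistinguishable
    return True


def typical_testors(rows, labels):
    """Brute-force search for typical (irreducible) testors, smallest subsets first.

    Adding features never breaks the testor property, so any non-typical testor
    contains a smaller typical one, which was already found at an earlier size.
    """
    n_features = len(rows[0])
    found = []
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            # Skip supersets of known testors: they are testors but not irreducible.
            if any(set(t) <= set(subset) for t in found):
                continue
            if is_testor(rows, labels, subset):
                found.append(subset)
    return found


# Hypothetical toy data: four objects, three binary features, two classes.
rows = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
labels = ["A", "A", "B", "B"]
print(typical_testors(rows, labels))  # [(2,), (0, 1)]
```

In this toy matrix, feature 2 alone separates the two classes, and features 0 and 1 together do too, although neither does on its own. Both subsets are therefore typical testors.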
Keywords: Hierarchical clustering; Text summarization; Topic detection
This article is indexed in ScienceDirect and other databases.