A comparative study of automated legal text classification using random forests and deep learning
Institution: 1. School of Information Management, Wuhan University, Wuhan 430072, PR China; 2. Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, PR China; 3. Department of Information Management, Peking University, Beijing, China
Abstract: Automated legal text classification is a prominent research topic in the legal field and lays the foundation for building intelligent legal systems. The current literature focuses on international legal texts, such as Chinese, European, and Australian cases; little attention has been paid to text classification for U.S. legal texts. Deep learning has been applied to improve text classification performance, but its effectiveness needs further exploration in domains such as the legal field. This paper investigates legal text classification on a large collection of labeled U.S. case documents by comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm that uses domain concepts as features and random forests as the classifier. Our experimental results on 30,000 full U.S. case documents in 50 categories demonstrate that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, the best performance is achieved when only the top 400 domain concepts are used as features for building the random forests. This study provides a reference for selecting machine learning techniques to build high-performance text classification systems in the legal domain or other fields.
Keywords: Legal text classification; Machine learning; Deep learning; Domain concept; Word embedding; Random forests
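
The abstract describes representing each case document by a limited set of top-ranked domain concepts before training random forests. The following is a minimal sketch of that feature-building step only, not the authors' implementation. The concept list, the toy documents, the substring matching, and the ranking by document frequency are all assumptions made for illustration; the abstract does not say how concepts are extracted or ranked.

```python
from collections import Counter


def top_k_concepts(docs, concepts, k):
    """Keep the k concepts that occur in the most documents.

    Ranking by document frequency is an assumption for this sketch;
    the abstract does not state the ranking criterion.
    """
    doc_freq = Counter()
    for text in docs:
        lowered = text.lower()
        for concept in concepts:
            if concept in lowered:
                doc_freq[concept] += 1
    # Descending document frequency, alphabetical order to break ties.
    ranked = sorted(doc_freq, key=lambda c: (-doc_freq[c], c))
    return ranked[:k]


def concept_features(text, selected):
    """Count occurrences of each selected concept in one document."""
    lowered = text.lower()
    return [lowered.count(concept) for concept in selected]


# Hypothetical toy corpus and concept vocabulary (not from the paper).
docs = [
    "The court found a breach of contract and awarded damages.",
    "The defendant was charged with burglary; the court denied bail.",
    "Damages for breach of contract were reduced on appeal.",
]
concepts = ["breach of contract", "damages", "burglary", "bail", "court", "appeal"]

selected = top_k_concepts(docs, concepts, k=3)
X = [concept_features(d, selected) for d in docs]

print(selected)
print(X)
```

Each row of `X` is a fixed-length count vector that could then be passed, together with category labels, to a random forest implementation such as scikit-learn's `RandomForestClassifier`. In the study the cutoff corresponds to the top 400 concepts rather than the `k=3` used here.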
This article is indexed in ScienceDirect and other databases.