
Research on Topics Semantic Similarity Calculation and Knowledge Discovery of Medical Based on Deep Learning Representation (基于深度学习表示的医学主题语义相似度计算及知识发现研究)
Cite this article: Shen Si, Sun Hao, Wang Dongbo. Research on Topics Semantic Similarity Calculation and Knowledge Discovery of Medical Based on Deep Learning Representation[J]. 情报理论与实践 (Information Studies: Theory & Application), 2020, 43(5): 183-190.
Authors: Shen Si (沈思), Sun Hao (孙豪), Wang Dongbo (王东波)
Affiliations: Nanjing University of Science and Technology, Nanjing, Jiangsu 210094; Nanjing Agricultural University, Nanjing, Jiangsu 210095
Funding: Jiangsu Provincial Natural Science Foundation Youth Program; National Natural Science Foundation of China; National Social Science Fund of China
Abstract: [Purpose/significance] Few knowledge discovery studies on medical texts combine topics with entity associations such as disease-gene relationships, which leaves the deep semantic associations of medical domain knowledge at the topic level insufficiently revealed. To address this, the paper proposes a semantic similarity calculation method that combines full-text content with domain knowledge topics. [Method/process] Taking the full text of oncology journals as the research object, the TWE model is used to produce embedded representations of word vectors and topic vectors, and similarity is calculated within a Siamese Network framework that combines the text with domain knowledge topics. [Result/conclusion] Experiments show that the proposed method reaches a predicted F value of 94% on the validation set. Finally, cluster analysis of the test set data is used to analyze diseases and their associated genes from the perspectives of high, medium, and low frequency as well as the absence of registered clinical trials, identifying current research hotspots and target genes that may become research hotspots in the future.

Keywords: deep learning; semantic similarity; siamese network; knowledge discovery
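
The abstract describes computing similarity in a Siamese Network framework over word vectors and topic vectors. The record gives no architectural details, so the following is only a minimal sketch of the general idea under stated assumptions: each text is represented by the mean of its word vectors concatenated with one topic vector, both texts pass through the same encoder (shared weights), and the score is the cosine similarity of the two encodings. The names `SharedEncoder`, `mean_vector`, and `siamese_similarity`, the single tanh layer, the random untrained weights, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
import random


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


class SharedEncoder:
    """One dense tanh layer (hypothetical); both Siamese branches reuse it."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights: an assumption for illustration only.
        self.weights = [
            [rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def encode(self, x):
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in self.weights
        ]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def siamese_similarity(encoder, text_a, text_b):
    """Score two texts, each given as (word_vectors, topic_vector)."""

    def features(text):
        word_vectors, topic_vector = text
        # Assumed input layout: mean word vector followed by the topic vector.
        return mean_vector(word_vectors) + list(topic_vector)

    return cosine(encoder.encode(features(text_a)),
                  encoder.encode(features(text_b)))


# Toy inputs: 3-dimensional word vectors and a 2-dimensional topic vector.
doc_a = ([[0.2, 0.1, 0.4], [0.0, 0.3, 0.2]], [0.7, 0.3])
doc_b = ([[0.1, 0.2, 0.3]], [0.6, 0.4])
encoder = SharedEncoder(in_dim=5, out_dim=4)
score = siamese_similarity(encoder, doc_a, doc_b)
print(score)
```

Because both inputs go through the same `encoder` object, the score is symmetric in its two arguments. In a trained system the weights would be learned from labeled similar and dissimilar pairs rather than drawn at random.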
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).