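Support Vector Classifiers and Their Application in Computational Recognition of Prokaryotic Genes
Citation: HUANG Guo-hua. Support vector classifiers and their application in computational recognition of prokaryotic genes[J]. Journal of First Teachers College of Hunan (湖南第一师范学报), 2011, 11(2): 133-136.
Author: HUANG Guo-hua (黄国华)
Affiliation: Department of Science and Information Science, Shaoyang University, Shaoyang, Hunan 422000
Abstract: Using a support vector machine (SVM) as the classifier and the k-letter words of a sequence as features, a gene recognition model for prokaryotes was built. Genes of known function were selected as positive samples, and randomly mutated sequences of the same length as the positive samples were used as negative samples; together they form the training set. Results of 5-fold cross-validation show that prediction accuracy differs across SVMs with different kernel functions and across word features of different lengths, reaching above 94% in the best case and falling below 60% in the worst. Features based on words of length 3 give the best classification results, followed by words of length 4. This indicates that trinucleotides are a relatively good statistical feature of gene sequences.

Keywords: support vector machine; gene recognition; kernel function; k-letter word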
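The abstract does not specify how the k-letter word features are computed or exactly how the mutated negative samples are produced. The sketch below is a minimal illustration under stated assumptions, not the author's code: each sequence is mapped to the relative frequencies of its overlapping k-letter words over the alphabet ACGT (a 4^k-dimensional vector, i.e. 64 dimensions for k = 3 and 256 for k = 4), and each negative sample is a copy of a known gene in which every base is substituted by a different base with probability `rate`. The functions `kmer_vector`, `mutate` and `build_dataset` and the default `rate = 0.5` are hypothetical.

```python
import itertools
import random

BASES = "ACGT"


def kmer_vector(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-letter words in seq (assumed feature).

    Words are ordered lexicographically over ACGT; overlapping windows are
    counted, and windows containing a character outside ACGT (e.g. N) are skipped.
    """
    words = ["".join(p) for p in itertools.product(BASES, repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(words)


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical negative-sample generator: substitute each base with a
    different random base with probability `rate`; the length is unchanged."""
    out = []
    for base in seq.upper():
        if base in BASES and rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def build_dataset(genes: list[str], k: int, rate: float = 0.5, seed: int = 0):
    """Pair each known gene (label +1) with one mutated copy of equal length (label -1)."""
    rng = random.Random(seed)
    X, y = [], []
    for gene in genes:
        X.append(kmer_vector(gene, k))
        y.append(1)
        X.append(kmer_vector(mutate(gene, rate, rng), k))
        y.append(-1)
    return X, y
```

For example, `build_dataset(genes, k=3)` returns one 64-dimensional vector per gene and one per mutated copy, labelled +1 and -1 respectively; these vectors can then be passed to any SVM implementation.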

Support Vector Machine and Its Application in Computational Recognition of Prokaryotes Gene
Citation: HUANG Guo-hua. Support Vector Machine and Its Application in Computational Recognition of Prokaryotes Gene[J]. Journal of First Teachers College of Hunan, 2011, 11(2): 133-136.
Authors: HUANG Guo-hua
Institution: HUANG Guo-hua (Department of Science and Information Science, Shaoyang University, Shaoyang, Hunan 422000)
Abstract: A gene recognition model for prokaryotes is built, using a support vector machine (SVM) as the classifier and the k-letter words of a sequence as features. The training set consists of positive samples chosen from genes of known function and an equal number of negative samples, each generated by randomly mutating the corresponding positive sample while keeping its length. Results of 5-fold cross-validation show that prediction accuracy varies with the SVM kernel function and with the word length, reaching above 94% in the best case and falling below 60% in the worst; 3-letter words give the best classification results, followed by 4-letter words. This indicates that trinucleotides are a relatively good statistical feature of gene sequences.
Keywords: Support Vector Machine; gene recognition; kernel function; k-letter word
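The kernel comparison and 5-fold cross-validation described in the abstract can be sketched as follows. This is not the paper's implementation (the abstract does not name the SVM software used): to stay within the Python standard library, the solver here is kernelized Pegasos, a stochastic sub-gradient method for the SVM hinge-loss objective without a bias term, whereas a real study would more likely use a library such as LIBSVM or scikit-learn's `SVC`. The kernel parameters (`gamma`, `coef0`, `degree`), the regularization `lam`, the iteration count `iters`, and all function names are illustrative assumptions.

```python
import math
import random


# Candidate kernels; the default parameter values are illustrative only.
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_pegasos(X, y, kernel, lam=0.01, iters=1000, seed=0):
    """Kernelized Pegasos (no bias term); returns per-sample update counts alpha."""
    rng = random.Random(seed)
    n = len(X)
    alpha = [0] * n
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n) if alpha[j])
        # Increment alpha_i when sample i violates the margin at step t.
        if y[i] * s / (lam * t) < 1:
            alpha[i] += 1
    return alpha


def predict(X_train, y_train, alpha, kernel, x):
    # The positive 1/(lam*T) scale factor does not change the sign, so it is dropped.
    s = sum(alpha[j] * y_train[j] * kernel(X_train[j], x)
            for j in range(len(X_train)) if alpha[j])
    return 1 if s > 0 else -1


def kfold_indices(n, k=5, seed=0):
    # Folds are taken round-robin from a shuffled index list; sizes differ by at most one.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(X, y, kernel, k=5, **train_kw):
    """k-fold cross-validation; accuracy is pooled over all held-out samples."""
    correct = 0
    for test_idx in kfold_indices(len(X), k):
        held_out = set(test_idx)
        tr = [i for i in range(len(X)) if i not in held_out]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        alpha = train_pegasos(Xtr, ytr, kernel, **train_kw)
        correct += sum(predict(Xtr, ytr, alpha, kernel, X[i]) == y[i] for i in test_idx)
    return correct / len(X)
```

Swapping `linear_kernel` for `functools.partial(rbf_kernel, gamma=0.5)` or `functools.partial(poly_kernel, degree=2)` and comparing the returned accuracies mirrors the kernel comparison in the abstract; a toy solver like this one should not be expected to reproduce the reported accuracies.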
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).