
Learning from positive examples when the negative class is undetermined- microRNA gene identification

BACKGROUND: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generat...


Bibliographic Details
Main authors:	Yousef, Malik; Jung, Segun; Showe, Louise C; Showe, Michael K
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2248178/ https://www.ncbi.nlm.nih.gov/pubmed/18226233 http://dx.doi.org/10.1186/1748-7188-3-2

author	Yousef, Malik; Jung, Segun; Showe, Louise C; Showe, Michael K
collection	PubMed
description	BACKGROUND: The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and, if it is not done properly, can dramatically affect the performance of the classifier and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species. RESULTS: Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs. CONCLUSION: One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when they are not well defined. In these cases one-class methods can be superior to two-class methods when the features chosen as representative of the positive class are well defined. AVAILABILITY: The OneClassmiRNA program is available at: [1]
format	Text
id	pubmed-2248178
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2248178 2008-02-20 Learning from positive examples when the negative class is undetermined- microRNA gene identification Yousef, Malik; Jung, Segun; Showe, Louise C; Showe, Michael K Algorithms Mol Biol Research BioMed Central 2008-01-28 /pmc/articles/PMC2248178/ /pubmed/18226233 http://dx.doi.org/10.1186/1748-7188-3-2 Text en Copyright © 2008 Yousef et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Learning from positive examples when the negative class is undetermined- microRNA gene identification
topic	Research
