Cargando…

NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data

BACKGROUND: The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A number of computational methods have been developed to systematically identi...

Descripción completa

Detalles Bibliográficos
Autores principales: Ye, Yuting, Li, Jingyi Jessica
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895266/
https://www.ncbi.nlm.nih.gov/pubmed/26818007
http://dx.doi.org/10.1186/s12864-015-2304-8
_version_ 1782435814345539584
author Ye, Yuting
Li, Jingyi Jessica
author_facet Ye, Yuting
Li, Jingyi Jessica
author_sort Ye, Yuting
collection PubMed
description BACKGROUND: The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A number of computational methods have been developed to systematically identify mRNA isoforms in a high-throughput manner from RNA-seq data. However, a common drawback of these methods is that their identified mRNA isoforms contain a high percentage of false positives, especially for genes with complex splicing structures, e.g., many exons and exon junctions. RESULTS: We have developed a preselection method called “Non-negative Matrix Factorization Preselection” (NMFP) which is designed to improve the accuracy of computational methods in identifying mRNA isoforms from RNA-seq data. We demonstrated through simulation and real data studies that NMFP can effectively shrink the search space of isoform candidates and increase the accuracy of two mainstream computational methods, Cufflinks and SLIDE, in their identification of mRNA isoforms. CONCLUSION: NMFP is a useful tool to preselect mRNA isoform candidates for downstream isoform discovery methods. It can greatly reduce the number of isoform candidates while maintaining a good coverage of unknown true isoforms. Adding NMFP as an upstream step, computational methods are expected to achieve better accuracy in identifying mRNA isoforms from RNA-seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2304-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4895266
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-48952662016-06-10 NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data Ye, Yuting Li, Jingyi Jessica BMC Genomics Proceedings BACKGROUND: The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A number of computational methods have been developed to systematically identify mRNA isoforms in a high-throughput manner from RNA-seq data. However, a common drawback of these methods is that their identified mRNA isoforms contain a high percentage of false positives, especially for genes with complex splicing structures, e.g., many exons and exon junctions. RESULTS: We have developed a preselection method called “Non-negative Matrix Factorization Preselection” (NMFP) which is designed to improve the accuracy of computational methods in identifying mRNA isoforms from RNA-seq data. We demonstrated through simulation and real data studies that NMFP can effectively shrink the search space of isoform candidates and increase the accuracy of two mainstream computational methods, Cufflinks and SLIDE, in their identification of mRNA isoforms. CONCLUSION: NMFP is a useful tool to preselect mRNA isoform candidates for downstream isoform discovery methods. It can greatly reduce the number of isoform candidates while maintaining a good coverage of unknown true isoforms. Adding NMFP as an upstream step, computational methods are expected to achieve better accuracy in identifying mRNA isoforms from RNA-seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2304-8) contains supplementary material, which is available to authorized users. BioMed Central 2016-01-11 /pmc/articles/PMC4895266/ /pubmed/26818007 http://dx.doi.org/10.1186/s12864-015-2304-8 Text en © Ye and Li. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Proceedings
Ye, Yuting
Li, Jingyi Jessica
NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title_full NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title_fullStr NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title_full_unstemmed NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title_short NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data
title_sort nmfp: a non-negative matrix factorization based preselection method to increase accuracy of identifying mrna isoforms from rna-seq data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895266/
https://www.ncbi.nlm.nih.gov/pubmed/26818007
http://dx.doi.org/10.1186/s12864-015-2304-8
work_keys_str_mv AT yeyuting nmfpanonnegativematrixfactorizationbasedpreselectionmethodtoincreaseaccuracyofidentifyingmrnaisoformsfromrnaseqdata
AT lijingyijessica nmfpanonnegativematrixfactorizationbasedpreselectionmethodtoincreaseaccuracyofidentifyingmrnaisoformsfromrnaseqdata