
Probabilistic error correction for RNA sequencing

Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference ge...


Bibliographic Details
Main Authors:	Le, Hai-Son; Schulz, Marcel H.; McCauley, Brenna M.; Hinman, Veronica F.; Bar-Joseph, Ziv
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2013
Subjects:	Methods Online
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3664804/ https://www.ncbi.nlm.nih.gov/pubmed/23558750 http://dx.doi.org/10.1093/nar/gkt215

collection	PubMed
description	Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)–based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.
id	pubmed-3664804
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3664804 2013-05-28 Nucleic Acids Res 2013-05 2013-04-03 Text en © The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
