
Improved transcriptome assembly using a hybrid of long and short reads with StringTie

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate...


Bibliographic Details
Main authors: Shumate, Alaina; Wong, Brandon; Pertea, Geo; Pertea, Mihaela
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191730/
https://www.ncbi.nlm.nih.gov/pubmed/35648784
http://dx.doi.org/10.1371/journal.pcbi.1009730
author Shumate, Alaina
Wong, Brandon
Pertea, Geo
Pertea, Mihaela
collection PubMed
description Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.
format Online
Article
Text
id pubmed-9191730
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9191730 2022-06-14 Improved transcriptome assembly using a hybrid of long and short reads with StringTie Shumate, Alaina Wong, Brandon Pertea, Geo Pertea, Mihaela PLoS Comput Biol Research Article Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie. Public Library of Science 2022-06-01 /pmc/articles/PMC9191730/ /pubmed/35648784 http://dx.doi.org/10.1371/journal.pcbi.1009730 Text en © 2022 Shumate et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Improved transcriptome assembly using a hybrid of long and short reads with StringTie
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191730/
https://www.ncbi.nlm.nih.gov/pubmed/35648784
http://dx.doi.org/10.1371/journal.pcbi.1009730