Improved transcriptome assembly using a hybrid of long and short reads with StringTie
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate...
Main authors: | Shumate, Alaina; Wong, Brandon; Pertea, Geo; Pertea, Mihaela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191730/ https://www.ncbi.nlm.nih.gov/pubmed/35648784 http://dx.doi.org/10.1371/journal.pcbi.1009730 |
_version_ | 1784726080666468352 |
---|---|
author | Shumate, Alaina Wong, Brandon Pertea, Geo Pertea, Mihaela |
author_facet | Shumate, Alaina Wong, Brandon Pertea, Geo Pertea, Mihaela |
author_sort | Shumate, Alaina |
collection | PubMed |
description | Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie. |
format | Online Article Text |
id | pubmed-9191730 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9191730 2022-06-14 Improved transcriptome assembly using a hybrid of long and short reads with StringTie Shumate, Alaina Wong, Brandon Pertea, Geo Pertea, Mihaela PLoS Comput Biol Research Article Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie. Public Library of Science 2022-06-01 /pmc/articles/PMC9191730/ /pubmed/35648784 http://dx.doi.org/10.1371/journal.pcbi.1009730 Text en © 2022 Shumate et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Shumate, Alaina Wong, Brandon Pertea, Geo Pertea, Mihaela Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title | Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title_full | Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title_fullStr | Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title_full_unstemmed | Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title_short | Improved transcriptome assembly using a hybrid of long and short reads with StringTie |
title_sort | improved transcriptome assembly using a hybrid of long and short reads with stringtie |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9191730/ https://www.ncbi.nlm.nih.gov/pubmed/35648784 http://dx.doi.org/10.1371/journal.pcbi.1009730 |
work_keys_str_mv | AT shumatealaina improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT wongbrandon improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT perteageo improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie AT perteamihaela improvedtranscriptomeassemblyusingahybridoflongandshortreadswithstringtie |