LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data

Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads gene...

Bibliographic Details
Main authors: Al Qaffas, Ahmed; Nichols, Jenna; Davison, Andrew J; Ourahmane, Amine; Hertel, Laura; McVoy, Michael A; Camiolo, Salvatore
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8111061/
https://www.ncbi.nlm.nih.gov/pubmed/33996146
http://dx.doi.org/10.1093/ve/veab042
author Al Qaffas, Ahmed
Nichols, Jenna
Davison, Andrew J
Ourahmane, Amine
Hertel, Laura
McVoy, Michael A
Camiolo, Salvatore
collection PubMed
description Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads generated on long-read platforms make the tools used for assembling short reads unsuitable for use in data assembly and motivate the development of new approaches. We present LoReTTA (Long Read Template-Targeted Assembler), a tool designed for performing de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads. The tool was designed to deal with reads originating from viral genomes, which feature high genetic variability, possible multiple isoforms, and the dominant presence of additional organisms in clinical or environmental samples. LoReTTA was tested on a range of simulated and experimental datasets and outperformed established long-read assemblers in terms of assembly contiguity and accuracy. The software runs under the Linux operating system, is designed for easy adaptation to alternative systems, and features an automatic installation pipeline that takes care of the required dependencies. A command-line version and a user-friendly graphical interface version are available under a GPLv3 license at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
format Online
Article
Text
id pubmed-8111061
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8111061 2021-05-13 Virus Evol Resources Oxford University Press 2021-04-23 /pmc/articles/PMC8111061/ /pubmed/33996146 http://dx.doi.org/10.1093/ve/veab042 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data
topic Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8111061/
https://www.ncbi.nlm.nih.gov/pubmed/33996146
http://dx.doi.org/10.1093/ve/veab042