
Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage

BACKGROUND: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of un...
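The role of coverage can be made concrete with the classic Lander-Waterman approximation. This approximation is not part of the article. Suppose N reads of length L start at uniformly random positions on a target of length G. The mean coverage is then c = NL/G, and the expected fraction of positions covered by no read is roughly e^(-c). Amplicon reads need not satisfy the uniform-start assumption, and the read counts below are hypothetical, so this is only a back-of-the-envelope sketch.

```python
import math


def mean_coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Mean coverage c = N * L / G for N reads of length L over a target of length G."""
    return n_reads * read_len / target_len


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) approximation: P(no read covers a position) = exp(-c)."""
    return math.exp(-coverage)


# Hypothetical example, not data from the article: 20 reads of 150 bp on a 3,000 bp amplicon.
c = mean_coverage(20, 150, 3000)
print(f"coverage = {c:.2f}x, uncovered fraction ~ {expected_uncovered_fraction(c):.3f}")
# prints: coverage = 1.00x, uncovered fraction ~ 0.368
```

Under this idealised model, roughly a third of the amplicon would be expected to have no read at 1x coverage. This illustrates why very low coverage leaves uncovered gaps.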


Bibliographic details
Main authors:	Ranjard, Louis; Wong, Thomas K. F.; Rodrigo, Allen G.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907241/ https://www.ncbi.nlm.nih.gov/pubmed/31829137 http://dx.doi.org/10.1186/s12859-019-3287-2

author	Ranjard, Louis; Wong, Thomas K. F.; Rodrigo, Allen G.
collection	PubMed
description	BACKGROUND: In short-read DNA sequencing experiments, read coverage is a key parameter for successfully assembling the reads and reconstructing the sequence of the input DNA. When coverage is very low, reconstructing the original sequence from the reads can be difficult because of uncovered gaps. Reference-guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping aim to modify the reference dynamically according to the reads. Such approaches can significantly improve the alignment rate of reads onto distant references, but the processing of insertions and deletions remains challenging. RESULTS: Here, we introduce a new algorithm that updates the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are applied to the reference sequence dynamically. We evaluate this approach by assembling a western grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned. In cases where classic approaches fail to recover the correct length, this method produces assemblies whose length is comparable to that of the true sequence while limiting the error rate. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences. CONCLUSIONS: We introduced an algorithm to perform dynamic alignment of reads on a distant reference. We showed that such an approach can improve the reconstruction of an amplicon compared with classic bioinformatics pipelines. Although the method cannot be applied at genome scale in its current form, we suggested several improvements to investigate that would make it more flexible and allow dynamic alignment to be used for large genome assemblies.
format	Online Article Text
id	pubmed-6907241
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6907241 2019-12-20 BMC Bioinformatics Research Article BioMed Central 2019-12-11 /pmc/articles/PMC6907241/ /pubmed/31829137 http://dx.doi.org/10.1186/s12859-019-3287-2 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907241/ https://www.ncbi.nlm.nih.gov/pubmed/31829137 http://dx.doi.org/10.1186/s12859-019-3287-2
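The abstract describes updating the reference from previously aligned reads, including substitutions, insertions and deletions. The article's actual algorithm is not given in this record, so the sketch below is a generic stand-in rather than the authors' method. It combines three parts:

- a semi-global Needleman-Wunsch aligner with a hypothetical linear scoring scheme;
- a pileup of per-position votes;
- a majority-vote rebuild of the reference, repeated until the reference stops changing.

All names (`align_read`, `update_reference`, `iterate_reference`, `min_votes`) are hypothetical. The abstract describes read-by-read updating, whereas this sketch updates in batches.

```python
from collections import Counter, defaultdict

# Hypothetical linear scoring scheme; the article's own scoring is not described here.
MATCH, MISMATCH, GAP = 1, -1, -2


def _sub(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def align_read(ref: str, read: str) -> list:
    """Semi-global alignment: every read base is aligned, reference overhangs are free.

    Returns columns (ref_index, read_base); ref_index None marks an insertion
    in the read and read_base '-' marks a deletion of that reference base.
    """
    n, m = len(ref), len(read)
    score = [[0] * (m + 1) for _ in range(n + 1)]  # score[i][0] = 0: free reference prefix
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _sub(ref[i - 1], read[j - 1]),
                              score[i - 1][j] + GAP,   # reference base deleted in read
                              score[i][j - 1] + GAP)   # extra base inserted in read
    i, j = max(range(n + 1), key=lambda k: score[k][m]), m  # free reference suffix
    cols = []
    while j > 0:
        if i > 0 and score[i][j] == score[i - 1][j - 1] + _sub(ref[i - 1], read[j - 1]):
            cols.append((i - 1, read[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            cols.append((i - 1, "-"))
            i -= 1
        else:
            cols.append((None, read[j - 1]))
            j -= 1
    return cols[::-1]


def update_reference(ref: str, reads: list, min_votes: int = 2) -> str:
    """Pile the aligned reads onto ref, then rebuild it by majority vote."""
    base_votes = [Counter() for _ in ref]  # per position: base or '-' (deletion)
    ins_votes = defaultdict(Counter)       # anchor position -> inserted string counts
    for read in reads:
        last, pending = None, ""
        for ref_i, base in align_read(ref, read):
            if ref_i is None:
                pending += base
                continue
            if pending:  # insertion sits just before reference position ref_i
                ins_votes[last if last is not None else ref_i - 1][pending] += 1
                pending = ""
            base_votes[ref_i][base] += 1
            last = ref_i
        if pending and last is not None:
            ins_votes[last][pending] += 1

    def accepted_insertion(k: int) -> str:
        if not ins_votes[k]:
            return ""
        seq, votes = ins_votes[k].most_common(1)[0]
        depth = sum(base_votes[max(k, 0)].values())
        return seq if votes >= min_votes and 2 * votes > depth else ""

    out = [accepted_insertion(-1)]
    for k, original in enumerate(ref):
        votes = base_votes[k]
        if sum(votes.values()) >= min_votes:
            best = votes.most_common(1)[0][0]
            out.append("" if best == "-" else best)  # substitution or deletion
        else:
            out.append(original)  # too few reads: keep the current reference base
        out.append(accepted_insertion(k))
    return "".join(out)


def iterate_reference(ref: str, reads: list, rounds: int = 5, min_votes: int = 2) -> str:
    """Re-align and update until the reference stops changing (or rounds run out)."""
    for _ in range(rounds):
        new_ref = update_reference(ref, reads, min_votes)
        if new_ref == ref:
            break
        ref = new_ref
    return ref
```

For example, `iterate_reference("ACGTTAGCCATG", ["ACGTTGCCATG"] * 2)` returns `"ACGTTGCCATG"`, because both reads vote to delete the `A` at index 5. Because reference overhangs are free in this aligner, edits at the very ends of the reference are not applied.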
