
Ultra-fast genome comparison for large-scale genomic experiments

In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large co...


Bibliographic Details
Main Authors:	Pérez-Wohlfeil, Esteban; Diaz-del-Pino, Sergio; Trelles, Oswaldo
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6635410/ https://www.ncbi.nlm.nih.gov/pubmed/31312019 http://dx.doi.org/10.1038/s41598-019-46773-w

author	Pérez-Wohlfeil, Esteban Diaz-del-Pino, Sergio Trelles, Oswaldo
author_facet	Pérez-Wohlfeil, Esteban Diaz-del-Pino, Sergio Trelles, Oswaldo
author_sort	Pérez-Wohlfeil, Esteban
collection	PubMed
description	In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time and resource consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide software implementation that computes in linear time using one core as a minimum and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
format	Online Article Text
id	pubmed-6635410
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-66354102019-07-24 Ultra-fast genome comparison for large-scale genomic experiments Pérez-Wohlfeil, Esteban Diaz-del-Pino, Sergio Trelles, Oswaldo Sci Rep Article In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time and resource consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide software implementation that computes in linear time using one core as a minimum and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies. 
Nature Publishing Group UK 2019-07-16 /pmc/articles/PMC6635410/ /pubmed/31312019 http://dx.doi.org/10.1038/s41598-019-46773-w Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Pérez-Wohlfeil, Esteban Diaz-del-Pino, Sergio Trelles, Oswaldo Ultra-fast genome comparison for large-scale genomic experiments
title	Ultra-fast genome comparison for large-scale genomic experiments
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6635410/ https://www.ncbi.nlm.nih.gov/pubmed/31312019 http://dx.doi.org/10.1038/s41598-019-46773-w


Similar Items