
seq-seq-pan: building a computational pan-genome data structure on whole genome alignment

Bibliographic details
Main authors:	Jandrasits, Christine; Dabrowski, Piotr W.; Fuchs, Stephan; Renard, Bernhard Y.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5769345/ https://www.ncbi.nlm.nih.gov/pubmed/29334898 http://dx.doi.org/10.1186/s12864-017-4401-3

collection	PubMed
description	BACKGROUND: The increasing application of next generation sequencing technologies has led to the availability of thousands of reference genomes, often providing multiple genomes for the same or closely related species. The current approach of representing a species or a population with a single reference sequence and a set of variations cannot capture their full diversity and introduces bias towards the chosen reference. There is a need to represent multiple sequences in a composite way that is compatible with existing data sources for annotation and suitable for established sequence analysis methods. At the same time, this representation needs to be easily accessible and extendable to account for the constant change in available genomes.
RESULTS: We introduce seq-seq-pan, a framework that provides methods for adding genomes to, or removing genomes from, a set of aligned genomes and uses these methods to construct a whole genome alignment. Throughout the sequential workflow, the alignment is optimized for generating a linear representation that is representative of the aligned set of genomes, which enables its use for annotation and in downstream analyses.
CONCLUSIONS: By providing dynamic updates and optimized processing, our approach enables the use of whole genome alignment in the field of pan-genomics. In addition, the sequential workflow can be used as a fast alternative to existing whole genome aligners for aligning closely related genomes. seq-seq-pan is freely available at https://gitlab.com/rki_bioinformatics.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4401-3) contains supplementary material, which is available to authorized users.
id	pubmed-5769345
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Genomics (Methodology Article); BioMed Central, 2018-01-15
rights	© The Author(s) 2018. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
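The abstract describes deriving a linear representation from a whole genome alignment so that it can carry annotation. As a hedged illustration only, the Python sketch below assumes a single alignment block given as equal-length gapped rows and uses a column-wise majority vote to build the consensus. This consensus rule and the function names `majority_consensus`, `genome_to_consensus_map` and `lift_interval` are hypothetical; they are not taken from seq-seq-pan, whose actual consensus construction may differ.

```python
from __future__ import annotations

from collections import Counter

GAP = "-"


def majority_consensus(rows: list[str]) -> str:
    """Column-wise majority vote over equal-length gapped alignment rows.

    Assumption for illustration: each column contributes its most frequent
    non-gap character (ties go to the character seen first, as in
    Counter.most_common). Columns consisting only of gaps are dropped.
    """
    if not rows:
        raise ValueError("need at least one alignment row")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows must have equal length")
    consensus = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != GAP)
        if counts:  # skip all-gap columns
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def genome_to_consensus_map(rows: list[str], row: int) -> list[int]:
    """For each ungapped position of rows[row], return its consensus index."""
    mapping = []
    cons_idx = 0
    for column in zip(*rows):
        if all(c == GAP for c in column):
            continue  # this column is not part of the consensus
        if column[row] != GAP:
            mapping.append(cons_idx)
        cons_idx += 1
    return mapping


def lift_interval(mapping: list[int], start: int, end: int) -> tuple[int, int]:
    """Lift a half-open genome interval [start, end) to consensus coordinates."""
    if not 0 <= start < end <= len(mapping):
        raise ValueError("interval out of range")
    return mapping[start], mapping[end - 1] + 1


if __name__ == "__main__":
    block = ["ACGT-A", "ACGTTA", "AC-TTA"]  # hypothetical three-genome block
    print(majority_consensus(block))  # ACGTTA
    third = genome_to_consensus_map(block, 2)
    print(third)  # [0, 1, 3, 4, 5]
    print(lift_interval(third, 1, 3))  # (1, 4)
```

Because every column containing at least one non-gap character is kept, every position of every genome receives a consensus coordinate, which is what lets annotation intervals be lifted in this sketch. A full pan-genome structure would also have to order and join many blocks across the whole alignment, which this single-block example does not attempt.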
