scanPAV: a pipeline for extracting presence–absence variations in genome pairs

MOTIVATION: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new s...

Bibliographic Details
Main authors: Giordano, Francesca, Stammnitz, Maximilian R, Murchison, Elizabeth P, Ning, Zemin
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129304/
https://www.ncbi.nlm.nih.gov/pubmed/29608694
http://dx.doi.org/10.1093/bioinformatics/bty189
author Giordano, Francesca
Stammnitz, Maximilian R
Murchison, Elizabeth P
Ning, Zemin
collection PubMed
description MOTIVATION: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence–Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, consequence of species or individual evolution, or horizontal transfer from viruses and bacteria. RESULTS: We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10× genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies. AVAILABILITY AND IMPLEMENTATION: The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6129304
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-61293042018-09-12 scanPAV: a pipeline for extracting presence–absence variations in genome pairs Giordano, Francesca Stammnitz, Maximilian R Murchison, Elizabeth P Ning, Zemin Bioinformatics Applications Notes MOTIVATION: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its Presence–Absence variations (PAV) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, consequence of species or individual evolution, or horizontal transfer from viruses and bacteria. RESULTS: We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10× genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies. AVAILABILITY AND IMPLEMENTATION: The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2018-09-01 2018-03-28 /pmc/articles/PMC6129304/ /pubmed/29608694 http://dx.doi.org/10.1093/bioinformatics/bty189 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title scanPAV: a pipeline for extracting presence–absence variations in genome pairs
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129304/
https://www.ncbi.nlm.nih.gov/pubmed/29608694
http://dx.doi.org/10.1093/bioinformatics/bty189