scanPAV: a pipeline for extracting presence–absence variations in genome pairs
MOTIVATION: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new s...
Main authors: | Giordano, Francesca; Stammnitz, Maximilian R; Murchison, Elizabeth P; Ning, Zemin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129304/ https://www.ncbi.nlm.nih.gov/pubmed/29608694 http://dx.doi.org/10.1093/bioinformatics/bty189 |
_version_ | 1783353777983586304 |
---|---|
author | Giordano, Francesca Stammnitz, Maximilian R Murchison, Elizabeth P Ning, Zemin |
author_facet | Giordano, Francesca Stammnitz, Maximilian R Murchison, Elizabeth P Ning, Zemin |
author_sort | Giordano, Francesca |
collection | PubMed |
description | MOTIVATION: The recent technological advances in genome sequencing techniques have resulted in an exponential increase in the number of sequenced human and non-human genomes. The ever increasing number of assemblies generated by novel de novo pipelines and strategies demands the development of new software to evaluate assembly quality and completeness. One way to determine the completeness of an assembly is by detecting its presence–absence variations (PAVs) with respect to a reference, where PAVs between two assemblies are defined as the sequences present in one assembly but entirely missing in the other one. Beyond assembly error or technology bias, PAVs can also reveal real genome polymorphism, a consequence of species or individual evolution, or horizontal transfer from viruses and bacteria. RESULTS: We present scanPAV, a pipeline for pairwise assembly comparison to identify and extract sequences present in one assembly but not the other. In this note, we use the GRCh38 reference assembly to assess the completeness of six human genome assemblies from various assembly strategies and sequencing technologies including Illumina short reads, 10x Genomics linked-reads, PacBio and Oxford Nanopore long reads, and Bionano optical maps. We also discuss the PAV polymorphism of seven Tasmanian devil whole genome assemblies of normal animal tissues and devil facial tumour 1 (DFT1) and 2 (DFT2) samples, and the identification of bacterial sequences as contamination in some of the tumorous assemblies. AVAILABILITY AND IMPLEMENTATION: The pipeline is available under the MIT License at https://github.com/wtsi-hpag/scanPAV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6129304 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6129304 2018-09-12 scanPAV: a pipeline for extracting presence–absence variations in genome pairs Giordano, Francesca Stammnitz, Maximilian R Murchison, Elizabeth P Ning, Zemin Bioinformatics Applications Notes Oxford University Press 2018-09-01 2018-03-28 /pmc/articles/PMC6129304/ /pubmed/29608694 http://dx.doi.org/10.1093/bioinformatics/bty189 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Giordano, Francesca Stammnitz, Maximilian R Murchison, Elizabeth P Ning, Zemin scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title | scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title_full | scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title_fullStr | scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title_full_unstemmed | scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title_short | scanPAV: a pipeline for extracting presence–absence variations in genome pairs |
title_sort | scanpav: a pipeline for extracting presence–absence variations in genome pairs |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129304/ https://www.ncbi.nlm.nih.gov/pubmed/29608694 http://dx.doi.org/10.1093/bioinformatics/bty189 |
work_keys_str_mv | AT giordanofrancesca scanpavapipelineforextractingpresenceabsencevariationsingenomepairs AT stammnitzmaximilianr scanpavapipelineforextractingpresenceabsencevariationsingenomepairs AT murchisonelizabethp scanpavapipelineforextractingpresenceabsencevariationsingenomepairs AT ningzemin scanpavapipelineforextractingpresenceabsencevariationsingenomepairs |