PBHoney: identifying genomic variants via long-read discordance and interrupted mapping

BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA seq...

Bibliographic Details
Main Authors: English, Adam C; Salerno, William J; Reid, Jeffrey G
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4082283/
https://www.ncbi.nlm.nih.gov/pubmed/24915764
http://dx.doi.org/10.1186/1471-2105-15-180
author English, Adam C
Salerno, William J
Reid, Jeffrey G
collection PubMed
description BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. RESULTS: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. CONCLUSIONS: By implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is shown to be effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the existence of Rac-Phage in the isolate and evidence of E. coli’s circular genome.
format Online
Article
Text
id pubmed-4082283
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4082283 2014-07-18 PBHoney: identifying genomic variants via long-read discordance and interrupted mapping English, Adam C; Salerno, William J; Reid, Jeffrey G BMC Bioinformatics Software BioMed Central 2014-06-10 /pmc/articles/PMC4082283/ /pubmed/24915764 http://dx.doi.org/10.1186/1471-2105-15-180 Text en Copyright © 2014 English et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PBHoney: identifying genomic variants via long-read discordance and interrupted mapping
topic Software
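The abstract mentions "soft-clipped tails" of long reads as one of the two signals PBHoney considers. As a rough illustration of what that signal looks like in an alignment, the sketch below reads a SAM-style CIGAR string and reports the soft-clipped length at each end of the read. It is not PBHoney's code: the function name `soft_clipped_tails` and the `min_tail` threshold of 200 bp are hypothetical, chosen only for this example.

```python
import re

# SAM CIGAR operations: a run length followed by a one-letter operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar, min_tail=200):
    """Return (left, right) soft-clip lengths, or 0 where below min_tail.

    min_tail is an illustrative cutoff (an assumption), not a PBHoney default.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips; ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (left if left >= min_tail else 0,
            right if right >= min_tail else 0)


if __name__ == "__main__":
    # A read whose first 350 bases did not align: a candidate interrupted mapping.
    print(soft_clipped_tails("350S9000M12I400M"))
```

A read with a long unaligned tail at one end is the kind of candidate that a tool would then try to remap or cluster with other reads; that downstream logic is outside this sketch.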