
A recurrence-based approach for validating structural variation using long-read sequencing technology


Bibliographic Details
Main authors: Zhao, Xuefang; Weber, Alexandra M.; Mills, Ryan E.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737365/
https://www.ncbi.nlm.nih.gov/pubmed/28873962
http://dx.doi.org/10.1093/gigascience/gix061
_version_ 1783287504886038528
author Zhao, Xuefang
Weber, Alexandra M.
Mills, Ryan E.
author_sort Zhao, Xuefang
collection PubMed
description Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches for evaluating their results. This matters because accurate identification of structural variation remains an important unsolved problem in genomics. New sequencing technologies that generate longer reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this way require large computational resources to assemble the sequences, as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report high overall accuracy across different sequencing depths. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations, and that it provides additional information on breakpoint precision and predicted genotype. We further show that VaPoR runs quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long-read-based validation approach for genomic SVs that requires relatively low read depth and computing resources, and it is therefore useful for accurate SV assessment with targeted or low-pass sequencing coverage. The VaPoR software is available at: https://github.com/mills-lab/vapor.
format Online
Article
Text
id pubmed-5737365
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737365 2018-01-08 A recurrence-based approach for validating structural variation using long-read sequencing technology. Zhao, Xuefang; Weber, Alexandra M.; Mills, Ryan E. GigaScience. Technical Note.
Oxford University Press 2017-07-19 /pmc/articles/PMC5737365/ /pubmed/28873962 http://dx.doi.org/10.1093/gigascience/gix061 Text en © The Authors 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A recurrence-based approach for validating structural variation using long-read sequencing technology
topic Technical Note