
SAP—A Sequence Mapping and Analyzing Program for Long Sequence Reads Alignment and Accurate Variants Discovery

The third-generation of sequencing technologies produces sequence reads of 1000 bp or more that may contain high polymorphism information. However, most currently available sequence analysis tools are developed specifically for analyzing short sequence reads. While the traditional Smith-Waterman (SW...


Bibliographic Details
Main authors: Sun, Zheng; Tian, Weidong
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413671/
https://www.ncbi.nlm.nih.gov/pubmed/22880129
http://dx.doi.org/10.1371/journal.pone.0042887
author Sun, Zheng
Tian, Weidong
collection PubMed
description The third-generation of sequencing technologies produces sequence reads of 1000 bp or more that may contain high polymorphism information. However, most currently available sequence analysis tools are developed specifically for analyzing short sequence reads. While the traditional Smith-Waterman (SW) algorithm can be used to map long sequence reads, its naive implementation is computationally infeasible. We have developed a new Sequence mapping and Analyzing Program (SAP) that implements a modified version of SW to speed up the alignment process. In benchmarks with simulated and real exon sequencing data and a real E. coli genome sequence data generated by the third-generation sequencing technologies, SAP outperforms currently available tools for mapping short and long sequence reads in both speed and proportion of captured reads. In addition, it achieves high accuracy in detecting SNPs and InDels in the simulated data. SAP is available at https://github.com/davidsun/SAP.
format Online
Article
Text
id pubmed-3413671
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3413671 2012-08-09
PLoS One
Research Article
Public Library of Science 2012-08-07
Text en
© 2012 Sun, Tian
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title SAP—A Sequence Mapping and Analyzing Program for Long Sequence Reads Alignment and Accurate Variants Discovery
topic Research Article