How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?

BACKGROUND: Next-generation sequencing technologies generate a significant number of short reads that are utilized to address a variety of biological questions. However, quite often, sequencing reads tend to have low quality at the 3’ end and are generated from the repetitive regions of a genome. It...

Bibliographic Details
Main Authors:	Yu, Xiaoqing; Guda, Kishore; Willis, Joseph; Veigl, Martina; Wang, Zhenghe; Markowitz, Sanford; Adams, Mark D; Sun, Shuying
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3414812/ https://www.ncbi.nlm.nih.gov/pubmed/22709551 http://dx.doi.org/10.1186/1756-0381-5-6

_version_	1782240262324486144
author	Yu, Xiaoqing Guda, Kishore Willis, Joseph Veigl, Martina Wang, Zhenghe Markowitz, Sanford Adams, Mark D Sun, Shuying
author_facet	Yu, Xiaoqing Guda, Kishore Willis, Joseph Veigl, Martina Wang, Zhenghe Markowitz, Sanford Adams, Mark D Sun, Shuying
author_sort	Yu, Xiaoqing
collection	PubMed
description	BACKGROUND: Next-generation sequencing technologies generate a significant number of short reads that are utilized to address a variety of biological questions. However, quite often, sequencing reads tend to have low quality at the 3’ end and are generated from the repetitive regions of a genome. It is unclear how different alignment programs perform under these different cases. In order to investigate this question, we use both real data and simulated data with the above issues to evaluate the performance of four commonly used algorithms: SOAP2, Bowtie, BWA, and Novoalign. METHODS: The performance of different alignment algorithms is measured in terms of concordance between any pair of aligners (for real sequencing data without known truth) and the accuracy of simulated read alignment. RESULTS: Our results show that, for sequencing data with reads that have relatively good quality or that have had low quality bases trimmed off, all four alignment programs perform similarly. We have also demonstrated that trimming off low quality ends markedly increases the number of aligned reads and improves the consistency among different aligners as well, especially for low quality data. However, Novoalign is more sensitive to the improvement of data quality. Trimming off low quality ends significantly increases the concordance between Novoalign and other aligners. As for aligning reads from repetitive regions, our simulation data show that reads from repetitive regions tend to be aligned incorrectly, and suppressing reads with multiple hits can improve alignment accuracy. CONCLUSIONS: This study provides a systematic comparison of commonly used alignment algorithms in the context of sequencing data with varying qualities and from repetitive regions. Our approach can be applied to different sequencing data sets generated from different platforms. It can also be utilized to study the performance of other alignment programs.
format	Online Article Text
id	pubmed-3414812
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-34148122012-08-10 How do alignment programs perform on sequencing data with varying qualities and from repetitive regions? Yu, Xiaoqing Guda, Kishore Willis, Joseph Veigl, Martina Wang, Zhenghe Markowitz, Sanford Adams, Mark D Sun, Shuying BioData Min Research BACKGROUND: Next-generation sequencing technologies generate a significant number of short reads that are utilized to address a variety of biological questions. However, quite often, sequencing reads tend to have low quality at the 3’ end and are generated from the repetitive regions of a genome. It is unclear how different alignment programs perform under these different cases. In order to investigate this question, we use both real data and simulated data with the above issues to evaluate the performance of four commonly used algorithms: SOAP2, Bowtie, BWA, and Novoalign. METHODS: The performance of different alignment algorithms are measured in terms of concordance between any pair of aligners (for real sequencing data without known truth) and the accuracy of simulated read alignment. RESULTS: Our results show that, for sequencing data with reads that have relatively good quality or that have had low quality bases trimmed off, all four alignment programs perform similarly. We have also demonstrated that trimming off low quality ends markedly increases the number of aligned reads and improves the consistency among different aligners as well, especially for low quality data. However, Novoalign is more sensitive to the improvement of data quality. Trimming off low quality ends significantly increases the concordance between Novoalign and other aligners. As for aligning reads from repetitive regions, our simulation data show that reads from repetitive regions tend to be aligned incorrectly, and suppressing reads with multiple hits can improve alignment accuracy. 
CONCLUSIONS: This study provides a systematic comparison of commonly used alignment algorithms in the context of sequencing data with varying qualities and from repetitive regions. Our approach can be applied to different sequencing data sets generated from different platforms. It can also be utilized to study the performance of other alignment programs. BioMed Central 2012-06-18 /pmc/articles/PMC3414812/ /pubmed/22709551 http://dx.doi.org/10.1186/1756-0381-5-6 Text en Copyright ©2012 Yu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Yu, Xiaoqing Guda, Kishore Willis, Joseph Veigl, Martina Wang, Zhenghe Markowitz, Sanford Adams, Mark D Sun, Shuying How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title	How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title_full	How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title_fullStr	How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title_full_unstemmed	How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title_short	How do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
title_sort	how do alignment programs perform on sequencing data with varying qualities and from repetitive regions?
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3414812/ https://www.ncbi.nlm.nih.gov/pubmed/22709551 http://dx.doi.org/10.1186/1756-0381-5-6
