
A novel multi-alignment pipeline for high-throughput sequencing data

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence...


Bibliographic Details
Main Authors:	Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062837/ https://www.ncbi.nlm.nih.gov/pubmed/24948510 http://dx.doi.org/10.1093/database/bau057

collection	PubMed
description	Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo
id	pubmed-4062837
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4062837; 2014-06-23; Database (Oxford); Original Article; Oxford University Press; 2014-06-19; Text; en
© The Author(s) 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
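The merging step described in the abstract (align reads to each founder reference, merge the results, and annotate each read with its genomic origin) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Alignment` record, the founder names, and the rule of keeping the alignment with the fewest mismatches are all assumptions made for illustration, and positions are assumed to have already been translated into a shared coordinate system.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment of a read to one founder reference (hypothetical record)."""
    read_id: str
    founder: str     # name of the founder reference the read was aligned to
    position: int    # mapped coordinate, assumed to be in a shared coordinate system
    mismatches: int  # number of mismatches against that founder reference


def merge_alignments(alignments):
    """Keep the best alignment per read and annotate its founder origin(s).

    Assumed rule: the alignment with the fewest mismatches wins. If several
    founders tie, the read is annotated with all of them (ambiguous origin).
    """
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    merged = {}
    for read_id, candidates in by_read.items():
        best_score = min(a.mismatches for a in candidates)
        best = [a for a in candidates if a.mismatches == best_score]
        origins = sorted({a.founder for a in best})
        merged[read_id] = (best[0], origins)
    return merged


# Toy input: per-founder alignment results pooled into one list.
example = [
    Alignment("read1", "founderA", 100, 0),
    Alignment("read1", "founderB", 100, 2),
    Alignment("read2", "founderA", 250, 1),
    Alignment("read2", "founderB", 250, 1),
    Alignment("read3", "founderB", 400, 0),  # aligned to only one founder
]

for read_id, (aln, origins) in merge_alignments(example).items():
    print(read_id, aln.position, origins)
```

In this toy input, `read1` is assigned to `founderA`, `read2` ties between both founders, and `read3` is kept even though it aligned to only one reference, which is the kind of read a single-reference pipeline could lose.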

