A novel multi-alignment pipeline for high-throughput sequencing data

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence...


Bibliographic Details
Main authors: Huang, Shunping, Holt, James, Kao, Chia-Yu, McMillan, Leonard, Wang, Wei
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062837/
https://www.ncbi.nlm.nih.gov/pubmed/24948510
http://dx.doi.org/10.1093/database/bau057
author Huang, Shunping
Holt, James
Kao, Chia-Yu
McMillan, Leonard
Wang, Wei
collection PubMed
description Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo
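The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each read has already been aligned to every founder reference, that positions have been translated to a shared coordinate system, and that the alignment with the fewest mismatches is preferred. The names `Alignment`, `merge_alignments`, `founderA` and `founderB` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One alignment of a read to one founder reference (hypothetical record)."""
    read_id: str
    founder: str      # founder reference the read was aligned to
    chrom: str
    pos: int          # assumed to be in a shared coordinate system already
    mismatches: int   # mismatch count reported by the aligner


def merge_alignments(alignments):
    """For each read, keep the alignment(s) with the fewest mismatches and
    annotate which founder reference(s) they came from."""
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    merged = {}
    for read_id, alns in by_read.items():
        best = min(a.mismatches for a in alns)
        winners = [a for a in alns if a.mismatches == best]
        founders = sorted({a.founder for a in winners})
        loci = {(a.chrom, a.pos) for a in winners}
        unique_locus = len(loci) == 1
        merged[read_id] = {
            # Report a position only when all best alignments agree on it.
            "chrom": winners[0].chrom if unique_locus else None,
            "pos": winners[0].pos if unique_locus else None,
            "mismatches": best,
            # A single best founder gives the read an assigned origin.
            "origin": founders[0] if len(founders) == 1 else "ambiguous",
            "candidates": founders,
        }
    return merged


# Toy input: made-up reads, not data from the paper.
alignments = [
    Alignment("r1", "founderA", "chr1", 1000, 0),
    Alignment("r1", "founderB", "chr1", 1000, 2),
    Alignment("r2", "founderA", "chr1", 2500, 0),
    Alignment("r2", "founderB", "chr1", 2500, 0),
    Alignment("r3", "founderB", "chr2", 300, 1),  # maps to one reference only
]

merged = merge_alignments(alignments)
for read_id, record in sorted(merged.items()):
    print(read_id, record["origin"], record["chrom"], record["pos"])
```

In this toy input, read `r3` aligns to only one founder reference, which illustrates how mapping to several references can rescue reads that a single reference would miss. Read `r2` matches both founders equally well, so the sketch leaves its origin unassigned.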
format Online
Article
Text
id pubmed-4062837
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-40628372014-06-23 A novel multi-alignment pipeline for high-throughput sequencing data Huang, Shunping Holt, James Kao, Chia-Yu McMillan, Leonard Wang, Wei Database (Oxford) Original Article Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo Oxford University Press 2014-06-19 /pmc/articles/PMC4062837/ /pubmed/24948510 http://dx.doi.org/10.1093/database/bau057 Text en © The Author(s) 2014. 
Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel multi-alignment pipeline for high-throughput sequencing data
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062837/
https://www.ncbi.nlm.nih.gov/pubmed/24948510
http://dx.doi.org/10.1093/database/bau057