A novel multi-alignment pipeline for high-throughput sequencing data
Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence...
Main Authors: | Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062837/ https://www.ncbi.nlm.nih.gov/pubmed/24948510 http://dx.doi.org/10.1093/database/bau057 |
_version_ | 1782321697776467968 |
---|---|
author | Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei |
author_facet | Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei |
author_sort | Huang, Shunping |
collection | PubMed |
description | Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo |
format | Online Article Text |
id | pubmed-4062837 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4062837 2014-06-23 A novel multi-alignment pipeline for high-throughput sequencing data Huang, Shunping Holt, James Kao, Chia-Yu McMillan, Leonard Wang, Wei Database (Oxford) Original Article Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo Oxford University Press 2014-06-19 /pmc/articles/PMC4062837/ /pubmed/24948510 http://dx.doi.org/10.1093/database/bau057 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Huang, Shunping Holt, James Kao, Chia-Yu McMillan, Leonard Wang, Wei A novel multi-alignment pipeline for high-throughput sequencing data |
title | A novel multi-alignment pipeline for high-throughput sequencing data |
title_full | A novel multi-alignment pipeline for high-throughput sequencing data |
title_fullStr | A novel multi-alignment pipeline for high-throughput sequencing data |
title_full_unstemmed | A novel multi-alignment pipeline for high-throughput sequencing data |
title_short | A novel multi-alignment pipeline for high-throughput sequencing data |
title_sort | novel multi-alignment pipeline for high-throughput sequencing data |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062837/ https://www.ncbi.nlm.nih.gov/pubmed/24948510 http://dx.doi.org/10.1093/database/bau057 |
work_keys_str_mv | AT huangshunping anovelmultialignmentpipelineforhighthroughputsequencingdata AT holtjames anovelmultialignmentpipelineforhighthroughputsequencingdata AT kaochiayu anovelmultialignmentpipelineforhighthroughputsequencingdata AT mcmillanleonard anovelmultialignmentpipelineforhighthroughputsequencingdata AT wangwei anovelmultialignmentpipelineforhighthroughputsequencingdata AT huangshunping novelmultialignmentpipelineforhighthroughputsequencingdata AT holtjames novelmultialignmentpipelineforhighthroughputsequencingdata AT kaochiayu novelmultialignmentpipelineforhighthroughputsequencingdata AT mcmillanleonard novelmultialignmentpipelineforhighthroughputsequencingdata AT wangwei novelmultialignmentpipelineforhighthroughputsequencingdata |
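
The abstract describes mapping reads to several founder-specific references, merging the results, and annotating each read with its genomic origin. The record does not give the merging rule, so the following is only a minimal sketch under stated assumptions: the function name `merge_alignments`, the input layout (a per-founder dictionary of read IDs to mismatch counts), and the "fewest mismatches wins, ties keep every founder" rule are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: for each founder reference, the number of
# mismatches each read had when aligned to it. A read that is absent from
# a founder's dictionary did not align to that reference.
Alignments = Dict[str, Dict[str, int]]


def merge_alignments(alignments: Alignments) -> Dict[str, Tuple[List[str], int]]:
    """Merge per-reference alignments and annotate each read's likely origin.

    Assumption (not taken from the paper): a read's origin is the founder,
    or the tied set of founders, whose reference yields the fewest mismatches.
    """
    merged: Dict[str, Tuple[List[str], int]] = {}
    for founder, reads in alignments.items():
        for read_id, mismatches in reads.items():
            if read_id not in merged or mismatches < merged[read_id][1]:
                # First alignment seen for this read, or a strictly better one.
                merged[read_id] = ([founder], mismatches)
            elif mismatches == merged[read_id][1]:
                # Equally good alignment: the origin stays ambiguous.
                merged[read_id][0].append(founder)
    return merged


if __name__ == "__main__":
    example = {
        "founder_A": {"read1": 0, "read2": 2},
        "founder_B": {"read1": 1, "read2": 2, "read3": 0},
    }
    for read_id, (origins, mismatches) in sorted(merge_alignments(example).items()):
        print(read_id, origins, mismatches)
```

In this toy input, `read3` aligns only to `founder_B`, which illustrates how a read missed by one reference can be rescued by another, while `read2` matches both references equally well and therefore keeps both founders as candidate origins.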