Haplotype and minimum-chimerism consensus determination using short sequence data

Bibliographic Details
Main authors: O'Neil, Shawn T, Emrich, Scott J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394418/
https://www.ncbi.nlm.nih.gov/pubmed/22537299
http://dx.doi.org/10.1186/1471-2164-13-S2-S4
_version_ 1782237865755803648
author O'Neil, Shawn T
Emrich, Scott J
author_facet O'Neil, Shawn T
Emrich, Scott J
author_sort O'Neil, Shawn T
collection PubMed
description BACKGROUND: Assembling haplotypes given sequence data derived from a single individual is a well-studied problem, but only recently has haplotype assembly been considered for population-sampled data. We discuss a software tool called Hapler, which is designed specifically for low-diversity, low-coverage data such as ecological samples derived from natural populations. Because such data may contain error as well as ambiguous haplotype information, we developed methods that increase confidence in these assemblies. Hapler also reconstructs full consensus sequences while minimizing and identifying possible chimeric points. RESULTS: Experiments on simulated data indicate that Hapler is effective at assembling haplotypes from gene-sized alignments of short reads. Further, in our tests Hapler-generated consensus sequences are less chimeric than the alternative consensus approaches of majority vote and viral quasispecies estimation regardless of error rate, read length, or population haplotype bias. CONCLUSIONS: The analysis of genetically diverse sequence data is increasingly common, particularly in the field of ecoinformatics where transcriptome sequencing of natural populations is a cost-effective alternative to genome sequencing. For such studies, it is important to consider and identify haplotype diversity. Hapler provides robust haplotype information and identifies possible phasing errors in consensus sequences, providing valuable information for population studies and downstream usage of resulting assemblies.
format Online
Article
Text
id pubmed-3394418
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33944182012-07-16 Haplotype and minimum-chimerism consensus determination using short sequence data O'Neil, Shawn T Emrich, Scott J BMC Genomics Research BACKGROUND: Assembling haplotypes given sequence data derived from a single individual is a well studied problem, but only recently has haplotype assembly been considered for population-sampled data. We discuss a software tool called Hapler, which is designed specifically for low-diversity, low-coverage data such as ecological samples derived from natural populations. Because such data may contain error as well as ambiguous haplotype information, we developed methods that increase confidence in these assemblies. Hapler also reconstructs full consensus sequences while minimizing and identifying possible chimeric points. RESULTS: Experiments on simulated data indicate that Hapler is effective at assembling haplotypes from gene-sized alignments of short reads. Further, in our tests Hapler-generated consensus sequences are less chimeric than the alternative consensus approaches of majority vote and viral quasispecies estimation regardless of error rate, read length, or population haplotype bias. CONCLUSIONS: The analysis of genetically diverse sequence data is increasingly common, particularly in the field of ecoinformatics where transcriptome sequencing of natural populations is a cost effective alternative to genome sequencing. For such studies, it is important to consider and identify haplotype diversity. Hapler provides robust haplotype information and identifies possible phasing errors in consensus sequences, providing valuable information for population studies and downstream usage of resulting assemblies. BioMed Central 2012-04-12 /pmc/articles/PMC3394418/ /pubmed/22537299 http://dx.doi.org/10.1186/1471-2164-13-S2-S4 Text en Copyright ©2012 O'Neil and Emrich; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
O'Neil, Shawn T
Emrich, Scott J
Haplotype and minimum-chimerism consensus determination using short sequence data
title Haplotype and minimum-chimerism consensus determination using short sequence data
title_full Haplotype and minimum-chimerism consensus determination using short sequence data
title_fullStr Haplotype and minimum-chimerism consensus determination using short sequence data
title_full_unstemmed Haplotype and minimum-chimerism consensus determination using short sequence data
title_short Haplotype and minimum-chimerism consensus determination using short sequence data
title_sort haplotype and minimum-chimerism consensus determination using short sequence data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394418/
https://www.ncbi.nlm.nih.gov/pubmed/22537299
http://dx.doi.org/10.1186/1471-2164-13-S2-S4
work_keys_str_mv AT oneilshawnt haplotypeandminimumchimerismconsensusdeterminationusingshortsequencedata
AT emrichscottj haplotypeandminimumchimerismconsensusdeterminationusingshortsequencedata