A computational method for genotype calling in family-based sequencing data
BACKGROUND: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage s...
Main Authors: | Chang, Lun-Ching; Li, Bingshan; Fang, Zhou; Vrieze, Scott; McGue, Matt; Iacono, William G.; Tseng, George C.; Chen, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4715317/ https://www.ncbi.nlm.nih.gov/pubmed/26772743 http://dx.doi.org/10.1186/s12859-016-0880-5 |
_version_ | 1782410450991841280 |
---|---|
author | Chang, Lun-Ching; Li, Bingshan; Fang, Zhou; Vrieze, Scott; McGue, Matt; Iacono, William G.; Tseng, George C.; Chen, Wei |
author_facet | Chang, Lun-Ching; Li, Bingshan; Fang, Zhou; Vrieze, Scott; McGue, Matt; Iacono, William G.; Tseng, George C.; Chen, Wei |
author_sort | Chang, Lun-Ching |
collection | PubMed |
description | BACKGROUND: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage sequence data. However, genotype-calling methods for family-based sequence data, particularly for complex families beyond parent-offspring trios, are still lacking. RESULTS: In this study, first, we proposed an algorithm that considers both linkage disequilibrium (LD) patterns and familial transmission in nuclear and multi-generational families while retaining the computational efficiency. Second, we extended our method to incorporate external reference panels to analyze family-based sequence data with a small sample size. In simulation studies, we show that modeling multiple offspring can dramatically increase genotype calling accuracy and reduce phasing and Mendelian errors, especially at low to modest coverage. In addition, we show that using external panels can greatly facilitate genotype calling of sequencing data with a small number of individuals. We applied our method to a whole genome sequencing study of 1339 individuals at ~10X coverage from the Minnesota Center for Twin and Family Research. CONCLUSIONS: The aggregated results show that our methods significantly outperform existing ones that ignore family constraints or LD information. We anticipate that our method will be useful for many ongoing family-based sequencing projects. We have implemented our methods efficiently in a C++ program FamLDCaller, which is available from http://www.pitt.edu/~wec47/famldcaller.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0880-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4715317 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-47153172016-01-17 A computational method for genotype calling in family-based sequencing data Chang, Lun-Ching Li, Bingshan Fang, Zhou Vrieze, Scott McGue, Matt Iacono, William G. Tseng, George C. Chen, Wei BMC Bioinformatics Methodology Article BACKGROUND: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage sequence data. However, genotype-calling methods for family-based sequence data, particularly for complex families beyond parent-offspring trios, are still lacking. RESULTS: In this study, first, we proposed an algorithm that considers both linkage disequilibrium (LD) patterns and familial transmission in nuclear and multi-generational families while retaining the computational efficiency. Second, we extended our method to incorporate external reference panels to analyze family-based sequence data with a small sample size. In simulation studies, we show that modeling multiple offspring can dramatically increase genotype calling accuracy and reduce phasing and Mendelian errors, especially at low to modest coverage. In addition, we show that using external panels can greatly facilitate genotype calling of sequencing data with a small number of individuals. We applied our method to a whole genome sequencing study of 1339 individuals at ~10X coverage from the Minnesota Center for Twin and Family Research. CONCLUSIONS: The aggregated results show that our methods significantly outperform existing ones that ignore family constraints or LD information. We anticipate that our method will be useful for many ongoing family-based sequencing projects. We have implemented our methods efficiently in a C++ program FamLDCaller, which is available from http://www.pitt.edu/~wec47/famldcaller.html. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0880-5) contains supplementary material, which is available to authorized users. BioMed Central 2016-01-16 /pmc/articles/PMC4715317/ /pubmed/26772743 http://dx.doi.org/10.1186/s12859-016-0880-5 Text en © Chang et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Chang, Lun-Ching Li, Bingshan Fang, Zhou Vrieze, Scott McGue, Matt Iacono, William G. Tseng, George C. Chen, Wei A computational method for genotype calling in family-based sequencing data |
title | A computational method for genotype calling in family-based sequencing data |
title_full | A computational method for genotype calling in family-based sequencing data |
title_fullStr | A computational method for genotype calling in family-based sequencing data |
title_full_unstemmed | A computational method for genotype calling in family-based sequencing data |
title_short | A computational method for genotype calling in family-based sequencing data |
title_sort | computational method for genotype calling in family-based sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4715317/ https://www.ncbi.nlm.nih.gov/pubmed/26772743 http://dx.doi.org/10.1186/s12859-016-0880-5 |
work_keys_str_mv | AT changlunching acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT libingshan acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT fangzhou acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT vriezescott acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT mcguematt acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT iaconowilliamg acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT tsenggeorgec acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT chenwei acomputationalmethodforgenotypecallinginfamilybasedsequencingdata AT changlunching computationalmethodforgenotypecallinginfamilybasedsequencingdata AT libingshan computationalmethodforgenotypecallinginfamilybasedsequencingdata AT fangzhou computationalmethodforgenotypecallinginfamilybasedsequencingdata AT vriezescott computationalmethodforgenotypecallinginfamilybasedsequencingdata AT mcguematt computationalmethodforgenotypecallinginfamilybasedsequencingdata AT iaconowilliamg computationalmethodforgenotypecallinginfamilybasedsequencingdata AT tsenggeorgec computationalmethodforgenotypecallinginfamilybasedsequencingdata AT chenwei computationalmethodforgenotypecallinginfamilybasedsequencingdata |