A computational method for genotype calling in family-based sequencing data

BACKGROUND: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage s...

Bibliographic Details
Main authors: Chang, Lun-Ching; Li, Bingshan; Fang, Zhou; Vrieze, Scott; McGue, Matt; Iacono, William G.; Tseng, George C.; Chen, Wei
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4715317/
https://www.ncbi.nlm.nih.gov/pubmed/26772743
http://dx.doi.org/10.1186/s12859-016-0880-5
author Chang, Lun-Ching
Li, Bingshan
Fang, Zhou
Vrieze, Scott
McGue, Matt
Iacono, William G.
Tseng, George C.
Chen, Wei
author_sort Chang, Lun-Ching
collection PubMed
description BACKGROUND: As sequencing technologies can help researchers detect common and rare variants across the human genome in many individuals, it is known that jointly calling genotypes across multiple individuals based on linkage disequilibrium (LD) can facilitate the analysis of low to modest coverage sequence data. However, genotype-calling methods for family-based sequence data, particularly for complex families beyond parent-offspring trios, are still lacking. RESULTS: In this study, first, we proposed an algorithm that considers both linkage disequilibrium (LD) patterns and familial transmission in nuclear and multi-generational families while retaining the computational efficiency. Second, we extended our method to incorporate external reference panels to analyze family-based sequence data with a small sample size. In simulation studies, we show that modeling multiple offspring can dramatically increase genotype calling accuracy and reduce phasing and Mendelian errors, especially at low to modest coverage. In addition, we show that using external panels can greatly facilitate genotype calling of sequencing data with a small number of individuals. We applied our method to a whole genome sequencing study of 1339 individuals at ~10X coverage from the Minnesota Center for Twin and Family Research. CONCLUSIONS: The aggregated results show that our methods significantly outperform existing ones that ignore family constraints or LD information. We anticipate that our method will be useful for many ongoing family-based sequencing projects. We have implemented our methods efficiently in a C++ program FamLDCaller, which is available from http://www.pitt.edu/~wec47/famldcaller.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0880-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4715317
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4715317 2016-01-17 A computational method for genotype calling in family-based sequencing data Chang, Lun-Ching Li, Bingshan Fang, Zhou Vrieze, Scott McGue, Matt Iacono, William G. Tseng, George C. Chen, Wei BMC Bioinformatics Methodology Article BioMed Central 2016-01-16 /pmc/articles/PMC4715317/ /pubmed/26772743 http://dx.doi.org/10.1186/s12859-016-0880-5 Text en © Chang et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A computational method for genotype calling in family-based sequencing data
topic Methodology Article