
trioPhaser: using Mendelian inheritance logic to improve genomic phasing of trios

Bibliographic Details
Main authors: Miller, Dustin B., Piccolo, Stephen R.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8607709/
https://www.ncbi.nlm.nih.gov/pubmed/34809557
http://dx.doi.org/10.1186/s12859-021-04470-4
Collection: PubMed
Description:
BACKGROUND: When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67–83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase it. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges.

RESULTS: trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to that of linked-read and read-backed phasing.

CONCLUSION: trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By combining both phasing methods, it phases more variant positions than either method can phase alone.
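The two-stage idea described above (phase by inheritance first, then fall back to a statistical phaser for the positions inheritance cannot resolve) can be sketched at the level of single biallelic sites. This is a minimal illustration under stated assumptions, not trioPhaser's implementation: the function names `phase_by_inheritance` and `combine_phasing`, the "paternal|maternal" allele ordering, the `chrom:pos` keys and all genotype values are hypothetical. The `statistical` dictionary is a made-up stand-in for SHAPEIT4 output; the sketch does not call SHAPEIT4 or read gVCF files, it treats Mendelian inconsistencies the same as ambiguous sites, and it ignores the real problem of orienting statistically phased haplotypes consistently with the inheritance-phased ones.

```python
from typing import Dict, Optional, Tuple

# A biallelic, unphased genotype as two allele indices, e.g. (0, 1) for VCF "0/1".
Genotype = Tuple[int, int]


def phase_by_inheritance(child: Genotype, mother: Genotype, father: Genotype) -> Optional[str]:
    """Phase one site with Mendelian logic (hypothetical helper).

    Returns the child's genotype as "paternal|maternal" (an ordering chosen
    for this sketch), or None when inheritance alone cannot decide: either
    several allele assignments fit (e.g. all three are heterozygous) or none
    do (a Mendelian inconsistency).
    """
    target = tuple(sorted(child))
    # Every (paternal, maternal) allele pair that reproduces the child's genotype.
    candidates = {
        (pat, mat)
        for pat in set(father)
        for mat in set(mother)
        if tuple(sorted((pat, mat))) == target
    }
    if len(candidates) != 1:
        return None
    pat, mat = candidates.pop()
    return f"{pat}|{mat}"


def combine_phasing(
    mendelian: Dict[str, Optional[str]], statistical: Dict[str, str]
) -> Dict[str, str]:
    """Prefer inheritance-based calls and fall back to statistical ones.

    `statistical` stands in for the output of a tool such as SHAPEIT4. Only
    sites present in `mendelian` are considered; sites left unresolved by
    both sources are omitted from the result.
    """
    combined: Dict[str, str] = {}
    for site, call in mendelian.items():
        if call is not None:
            combined[site] = call
        elif site in statistical:
            combined[site] = statistical[site]
    return combined


if __name__ == "__main__":
    # Hypothetical trio genotypes keyed by "chrom:pos": (child, mother, father).
    trio = {
        "chr1:1000": ((0, 1), (0, 0), (1, 1)),  # resolvable: allele 1 from father
        "chr1:2000": ((0, 1), (0, 1), (0, 1)),  # all heterozygous: ambiguous
    }
    mendelian = {site: phase_by_inheritance(*gts) for site, gts in trio.items()}
    statistical = {"chr1:2000": "0|1"}  # made-up stand-in for a SHAPEIT4 call
    print(combine_phasing(mendelian, statistical))
```

Run as a script, it prints `{'chr1:1000': '1|0', 'chr1:2000': '0|1'}`: the first site is resolved by inheritance alone, while the all-heterozygous second site falls back to the stand-in statistical call.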
ID: pubmed-8607709
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software), BioMed Central, published 2021-11-22
Rights: © The Author(s) 2021, licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Topic: Software