
CompoundHetVIP: Compound Heterozygous Variant Identification Pipeline


Bibliographic Details
Main authors: Miller, Dustin B., Piccolo, Stephen R.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7905494/
https://www.ncbi.nlm.nih.gov/pubmed/33680433
http://dx.doi.org/10.12688/f1000research.26848.2
author Miller, Dustin B.
Piccolo, Stephen R.
collection PubMed
description Compound heterozygous (CH) variant identification requires distinguishing maternally from paternally derived nucleotides, a process that requires numerous computational tools. Using such tools often introduces unforeseen challenges such as installation procedures that are operating-system specific, software dependencies that must be installed, and formatting requirements for input files. To overcome these challenges, we developed Compound Heterozygous Variant Identification Pipeline (CompoundHetVIP), which uses a single Docker image to encapsulate commonly used software tools for file aggregation (BCFtools or GATK4), VCF liftover (Picard Tools), joint-genotyping (GATK4), file conversion (Plink2), phasing (SHAPEIT2, Beagle, and/or Eagle2), variant normalization (vt tools), annotation (SnpEff), relational database generation (GEMINI), and identification of CH, homozygous alternate, and de novo variants in a series of 13 steps. To begin using our tool, researchers need only install the Docker engine and download the CompoundHetVIP Docker image. The tools provided in CompoundHetVIP, subject to the limitations of the underlying software, can be applied to whole-genome, whole-exome, or targeted exome sequencing data of individual samples or trios (a child and both parents), using VCF or gVCF files as initial input. Each step of the pipeline produces an analysis-ready output file that can be further evaluated. To illustrate its use, we applied CompoundHetVIP to data from a publicly available Ashkenazim trio and identified two genes with a candidate CH variant and two genes with a candidate homozygous alternate variant after filtering based on user-set thresholds for global minor allele frequency, Combined Annotation Dependent Depletion, and Gene Damage Index. While this example uses genomic data from a healthy child, we anticipate that most researchers will use CompoundHetVIP to uncover missing heritability in human diseases and other phenotypes.
CompoundHetVIP is open-source software and can be found at https://github.com/dmiller903/CompoundHetVIP; this repository also provides detailed, step-by-step examples.
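
The core idea in the description (after phasing, a gene is a CH candidate when the child carries alternate alleles on both haplotypes of that gene) can be illustrated with a minimal sketch. This is not CompoundHetVIP's code, which relies on GEMINI and the other tools listed above. The `PhasedVariant` record, the `candidate_ch_genes` function, and the default thresholds are hypothetical and chosen only for illustration; the Gene Damage Index filter is omitted for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one annotated, phased variant in the child."""
    gene: str
    variant_id: str
    child_gt: str   # phased genotype, e.g. "1|0" (alt on haplotype 1) or "0|1"
    maf: float      # global minor allele frequency
    cadd: float     # Combined Annotation Dependent Depletion score


def candidate_ch_genes(variants, max_maf=0.01, min_cadd=20.0):
    """Return {gene: [variant ids]} for genes with filtered heterozygous
    variants on both haplotypes. Thresholds are illustrative assumptions."""
    by_gene = defaultdict(lambda: {"hap1": [], "hap2": []})
    for v in variants:
        # Keep only rare, predicted-deleterious variants.
        if v.maf > max_maf or v.cadd < min_cadd:
            continue
        if v.child_gt == "1|0":
            by_gene[v.gene]["hap1"].append(v.variant_id)
        elif v.child_gt == "0|1":
            by_gene[v.gene]["hap2"].append(v.variant_id)
        # Homozygous ("1|1") and reference ("0|0") genotypes are ignored here.
    return {
        gene: sorted(haps["hap1"] + haps["hap2"])
        for gene, haps in by_gene.items()
        if haps["hap1"] and haps["hap2"]
    }


example = [
    PhasedVariant("GENE_A", "a1", "1|0", 0.001, 25.0),
    PhasedVariant("GENE_A", "a2", "0|1", 0.002, 30.0),
    PhasedVariant("GENE_B", "b1", "1|0", 0.001, 25.0),  # both on haplotype 1
    PhasedVariant("GENE_B", "b2", "1|0", 0.001, 25.0),
    PhasedVariant("GENE_C", "c1", "1|0", 0.200, 25.0),  # too common
    PhasedVariant("GENE_C", "c2", "0|1", 0.001, 25.0),
]
result = candidate_ch_genes(example)
print(result)  # {'GENE_A': ['a1', 'a2']}
```

In this toy input only GENE_A qualifies: GENE_B has both variants on the same haplotype, and GENE_C loses one of its two variants to the allele-frequency filter.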
format Online
Article
Text
id pubmed-7905494
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7905494 2021-03-04 CompoundHetVIP: Compound Heterozygous Variant Identification Pipeline Miller, Dustin B. Piccolo, Stephen R. F1000Res Software Tool Article F1000 Research Limited 2021-02-10 /pmc/articles/PMC7905494/ /pubmed/33680433 http://dx.doi.org/10.12688/f1000research.26848.2 Text en Copyright: © 2021 Miller DB and Piccolo SR http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title CompoundHetVIP: Compound Heterozygous Variant Identification Pipeline
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7905494/
https://www.ncbi.nlm.nih.gov/pubmed/33680433
http://dx.doi.org/10.12688/f1000research.26848.2