A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules...

Bibliographic Details
Main Authors: Battle, Stephanie L, Puiu, Daniela, Verlouw, Joost, Broer, Linda, Boerwinkle, Eric, Taylor, Kent D, Rotter, Jerome I, Rich, Stephan S, Grove, Megan L, Pankratz, Nathan, Fetterman, Jessica L, Liu, Chunyu, Arking, Dan E
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112767/
https://www.ncbi.nlm.nih.gov/pubmed/35591888
http://dx.doi.org/10.1093/nargab/lqac034
author Battle, Stephanie L
Puiu, Daniela
Verlouw, Joost
Broer, Linda
Boerwinkle, Eric
Taylor, Kent D
Rotter, Jerome I
Rich, Stephan S
Grove, Megan L
Pankratz, Nathan
Fetterman, Jessica L
Liu, Chunyu
Arking, Dan E
author_sort Battle, Stephanie L
collection PubMed
description Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have the same variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and by recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions, and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Compared with existing pipelines, our bioinformatics pipeline more accurately captures features of mitochondrial genetics that are important in understanding how mitochondrial dysfunction contributes to disease.
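The quantities named in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the 3%/97% classification thresholds, and the coverage-ratio copy-number estimate are assumptions chosen for illustration; the coordinate helper only shows the general idea of mapping positions from a rotated (circularized) reference back to the original chrM coordinates, using the 16,569 bp length of the rCRS reference.

```python
CHRM_LENGTH = 16569  # length of the rCRS human mitochondrial reference


def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def classify_variant(fraction: float, low: float = 0.03, high: float = 0.97) -> str:
    """Label a variant by its allele fraction.

    The thresholds are illustrative assumptions, not values from the article.
    """
    if fraction < low:
        return "below threshold"
    if fraction > high:
        return "homoplasmic"
    return "heteroplasmic"


def mtdna_copy_number(mt_coverage: float, autosomal_coverage: float,
                      nuclear_ploidy: int = 2) -> float:
    """Coverage-ratio estimate of mtDNA copies per cell (assumed approach)."""
    if autosomal_coverage <= 0:
        raise ValueError("autosomal_coverage must be positive")
    return nuclear_ploidy * mt_coverage / autosomal_coverage


def to_original_position(shifted_pos: int, offset: int,
                         length: int = CHRM_LENGTH) -> int:
    """Map a 1-based position on a reference rotated by `offset` bases
    back to the 1-based coordinate on the original circular genome."""
    if not 1 <= shifted_pos <= length:
        raise ValueError("shifted_pos out of range")
    return (shifted_pos - 1 + offset) % length + 1


if __name__ == "__main__":
    frac = heteroplasmy_fraction(alt_reads=25, total_reads=100)
    print(frac, classify_variant(frac))
    print(mtdna_copy_number(mt_coverage=3000.0, autosomal_coverage=30.0))
    print(to_original_position(8570, offset=8000))
```

Rotating the reference moves the artificial start and end of the linear chrM sequence into the interior, so reads spanning the original junction can align end to end; the helper converts their coordinates back afterwards.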
format Online
Article
Text
id pubmed-9112767
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9112767 2022-05-18 NAR Genom Bioinform Methods Article
Oxford University Press 2022-05-17 /pmc/articles/PMC9112767/ /pubmed/35591888 http://dx.doi.org/10.1093/nargab/lqac034 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data
topic Methods Article