A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data
Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules...
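The two quantities named in the title can be illustrated with a minimal sketch. It is not the authors' pipeline: the `AlleleCounts` container, the 5%/95% classification thresholds, and the coverage-ratio estimate of copy number (twice the mean chrM coverage divided by the mean autosomal coverage, a commonly used approximation for diploid cells) are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read counts at one chrM position (hypothetical container)."""
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the variant allele


def heteroplasmy_fraction(counts: AlleleCounts) -> float:
    """Fraction of reads (a proxy for mtDNA molecules) carrying the variant."""
    total = counts.ref_reads + counts.alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return counts.alt_reads / total


def classify(fraction: float, low: float = 0.05, high: float = 0.95) -> str:
    """Label a variant; the thresholds are illustrative assumptions."""
    if fraction < low:
        return "absent_or_noise"
    if fraction > high:
        return "homoplasmic"
    return "heteroplasmic"


def mtdna_copy_number(mt_coverage: float, autosomal_coverage: float) -> float:
    """Coverage-ratio estimate of mtDNA copies per diploid cell (assumed formula)."""
    if autosomal_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    return 2.0 * mt_coverage / autosomal_coverage


if __name__ == "__main__":
    site = AlleleCounts(ref_reads=750, alt_reads=250)
    fraction = heteroplasmy_fraction(site)
    print(fraction, classify(fraction))
    print(mtdna_copy_number(mt_coverage=3000.0, autosomal_coverage=30.0))
```

In practice the read counts would come from an aligner and variant caller, and both estimates are sensitive to mapping artifacts such as reads misassigned to nuclear-encoded mitochondrial sequences, which the abstract says the pipeline addresses.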
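Main authors: | Battle, Stephanie L; Puiu, Daniela; Verlouw, Joost; Broer, Linda; Boerwinkle, Eric; Taylor, Kent D; Rotter, Jerome I; Rich, Stephan S; Grove, Megan L; Pankratz, Nathan; Fetterman, Jessica L; Liu, Chunyu; Arking, Dan E |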
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112767/ https://www.ncbi.nlm.nih.gov/pubmed/35591888 http://dx.doi.org/10.1093/nargab/lqac034 |
_version_ | 1784709468392521728 |
---|---|
author | Battle, Stephanie L Puiu, Daniela Verlouw, Joost Broer, Linda Boerwinkle, Eric Taylor, Kent D Rotter, Jerome I Rich, Stephan S Grove, Megan L Pankratz, Nathan Fetterman, Jessica L Liu, Chunyu Arking, Dan E |
author_sort | Battle, Stephanie L |
collection | PubMed |
description | Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have the same variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and by recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline captures, more accurately than existing pipelines, features of mitochondrial genetics that are important in understanding how mitochondrial dysfunction contributes to disease. |
format | Online Article Text |
id | pubmed-9112767 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9112767 2022-05-18 NAR Genom Bioinform Methods Article
Oxford University Press 2022-05-17 /pmc/articles/PMC9112767/ /pubmed/35591888 http://dx.doi.org/10.1093/nargab/lqac034 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data |
title_sort | bioinformatics pipeline for estimating mitochondrial dna copy number and heteroplasmy levels from whole genome sequencing data |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112767/ https://www.ncbi.nlm.nih.gov/pubmed/35591888 http://dx.doi.org/10.1093/nargab/lqac034 |