
Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

BACKGROUND: Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation technologies, allowing the sequencing of all nuclear exons. Off-target regions may be captured if they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve a...


Bibliographic Details
Main authors: Diroma, Maria Angela; Calabrese, Claudia; Simone, Domenico; Santorsola, Mariangela; Calabrese, Francesco Maria; Gasparre, Giuseppe; Attimonelli, Marcella
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083402/
https://www.ncbi.nlm.nih.gov/pubmed/25077682
http://dx.doi.org/10.1186/1471-2164-15-S3-S2
author Diroma, Maria Angela
Calabrese, Claudia
Simone, Domenico
Santorsola, Mariangela
Calabrese, Francesco Maria
Gasparre, Giuseppe
Attimonelli, Marcella
collection PubMed
description BACKGROUND: Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation technologies, allowing the sequencing of all nuclear exons. Off-target regions may be captured if they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve a large amount of off-target mitochondrial DNA (mtDNA) from WES data by exploiting the non-specificity of probes that partially overlap Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project is one of the largest resources from which mtDNA sequences can be extracted from WES data, which is valuable given the large effort the scientific community is making to reconstruct human population history using mtDNA as a marker, and given the involvement of mtDNA in pathology. RESULTS: A previously published pipeline for assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project and yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the fact that NumtS filtering reduced the number of reads recognized as mitochondrial relative to the original annotation produced by the Consortium. An accurate survey was carried out on 1242 individuals: 215 indels, mostly heteroplasmic, and 3407 single-base variants were mapped. Mismatches were homogeneously distributed along the whole mitochondrial genome, whereas indels were less frequent within protein-coding regions, where frameshift mutations may be deleterious. Most of the indels and mismatches found had not previously been annotated in mitochondrial databases, since conventional sequencing methods were limited to detecting homoplasmy or quasi-homoplasmy. Intriguingly, upon filtering out non haplogroup-defining variants, we detected a widespread occurrence in the population of rare events predicted to be damaging. Finally, samples were stratified into blood-derived and lymphoblastoid-derived groups to detect possibly different mutability trends in the two datasets; this analysis did not yield significant discordances. CONCLUSIONS: To the best of our knowledge, this is likely the most extensive population-scale mitochondrial genotyping in humans, enriched with the estimation of heteroplasmies.
format Online
Article
Text
id pubmed-4083402
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4083402 2014-07-18 BMC Genomics Research BioMed Central 2014-05-06 /pmc/articles/PMC4083402/ /pubmed/25077682 http://dx.doi.org/10.1186/1471-2164-15-S3-S2 Text en Copyright © 2014 Diroma et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083402/
https://www.ncbi.nlm.nih.gov/pubmed/25077682
http://dx.doi.org/10.1186/1471-2164-15-S3-S2