Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs
A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the la...
Main Authors: | Fiedler, Lisa; Middendorf, Martin; Bernt, Matthias |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448254/ https://www.ncbi.nlm.nih.gov/pubmed/37636259 http://dx.doi.org/10.3389/fgene.2023.1250907 |
_version_ | 1785094692675780608 |
---|---|
author | Fiedler, Lisa Middendorf, Martin Bernt, Matthias |
author_facet | Fiedler, Lisa Middendorf, Martin Bernt, Matthias |
author_sort | Fiedler, Lisa |
collection | PubMed |
description | A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method uses a clustering routine that operates on the mapping of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems. |
format | Online Article Text |
id | pubmed-10448254 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10448254 2023-08-25 Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs Fiedler, Lisa Middendorf, Martin Bernt, Matthias Front Genet Genetics Frontiers Media S.A. 2023-08-10 /pmc/articles/PMC10448254/ /pubmed/37636259 http://dx.doi.org/10.3389/fgene.2023.1250907 Text en Copyright © 2023 Fiedler, Middendorf and Bernt. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Fiedler, Lisa Middendorf, Martin Bernt, Matthias Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs |
title | Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs |
title_sort | fully automated annotation of mitochondrial genomes using a cluster-based approach with de bruijn graphs |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448254/ https://www.ncbi.nlm.nih.gov/pubmed/37636259 http://dx.doi.org/10.3389/fgene.2023.1250907 |
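
The description above mentions a reference database stored as an annotated de Bruijn graph and gene predictions derived from mapping a new sequence onto it. The following is a minimal sketch of that general idea only, not DeGeCI's implementation: the function names, the tiny k-mer length, the majority vote, and the run-merging step (a simple stand-in for the clustering routine) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

K = 4  # k-mer length; tiny for illustration, an assumption not taken from the record


def build_annotated_graph(references, k=K):
    """Build a node-centric de Bruijn graph with gene labels on each k-mer.

    `references` is a list of (sequence, annotations) pairs, where
    annotations is a list of (start, end, gene) half-open intervals.
    Returns (successors, labels): k-mer -> set of following k-mers,
    and k-mer -> Counter of genes that fully contain an occurrence.
    """
    successors = defaultdict(set)
    labels = defaultdict(Counter)
    for seq, annotations in references:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if i + k < len(seq):
                # Consecutive k-mers overlap by k-1 characters.
                successors[kmer].add(seq[i + 1:i + k + 1])
            for start, end, gene in annotations:
                if start <= i and i + k <= end:
                    labels[kmer][gene] += 1
    return successors, labels


def predict_genes(query, labels, k=K):
    """Label each query k-mer by majority vote, then merge equal runs.

    Hypothetical stand-in for a clustering step: consecutive k-mers
    with the same label become one (start, end, gene) interval.
    """
    calls = []
    for i in range(len(query) - k + 1):
        votes = labels.get(query[i:i + k])
        calls.append(votes.most_common(1)[0][0] if votes else None)

    predictions = []
    run_start = 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[run_start]:
            if calls[run_start] is not None:
                predictions.append((run_start, i - 1 + k, calls[run_start]))
            run_start = i
    return predictions


if __name__ == "__main__":
    # Toy reference with two made-up gene intervals.
    refs = [("ACGTACGGTTCA", [(0, 6, "cox1"), (6, 12, "nad1")])]
    graph, gene_labels = build_annotated_graph(refs)
    print(predict_genes("ACGTACGGTTCA", gene_labels))
```

In this sketch, adding a new reference only means calling `build_annotated_graph` logic on one more sequence and merging the dictionaries, which hints at why a graph-based database can be updated incrementally; how the actual software handles updates is not stated in this record.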