Cargando…

Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)

BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA b...

Descripción completa

Detalles Bibliográficos
Autor principal: Kobayashi, Genki
Formato: Online Artículo Texto
Lenguaje:English
Publicado: PeerJ Inc. 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10693233/
https://www.ncbi.nlm.nih.gov/pubmed/38047014
http://dx.doi.org/10.7717/peerj.16446
_version_ 1785153117021536256
author Kobayashi, Genki
author_facet Kobayashi, Genki
author_sort Kobayashi, Genki
collection PubMed
description BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA barcoding). The need for such reference data has been increasing due to the application of environmental DNA (eDNA) analysis for environmental assessments. Recently, the number of publicly available sequence reads obtained with next-generation sequencing (NGS) has been increasing in the public database (the NCBI Sequence Read Archive, SRA). Such freely available NGS reads would be promising sources for assembling mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA. METHODS: The recent progress in the classification of Annelida was briefly introduced. In the present study, the mPCGs of 32 annelid species of 19 families in clitellates and allies in Sedentaria (echiurans and polychaetes) were newly assembled from the reads deposited in the SRA. Assembly was performed with a recently published pipeline mitoRNA, which includes cycles of Bowtie2 mapping and Trinity assembly. Assembled mPCGs were deposited in GenBank as Third Party Data (TPA) data. A phylogenetic tree was reconstructed with maximum likelihood (ML) analysis, together with other mPCGs deposited in GenBank. RESULTS AND DISCUSSION: mPCG assembly was largely successful except for Travisia forbesii; only four genes were detected from the assembled contigs of the species probably due to the reads targeting its parasite. Most genes were largely successfully obtained, whereas atp8, nad2, and nad4l were only successful in 22–24 species. The high nucleotide substitution rates of these genes might be relevant to the failure in the assembly although nad6, which showed a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages were not resolved in the present study, the phylogenetic relationships of some polychaetes and leeches that were not inferred by transcriptomes were well resolved probably due to a more dense taxon sampling than previous phylogenetic analyses based on transcriptomes. Although NGS data are generally better sources for resolving phylogenetic relationships of both higher and lower classifications, there are ensuring needs for specific loci of the mitochondrial genes for analyses that do not require high resolutions, such as DNA barcoding, eDNA, and phylogenetic analysis among lower taxa. Assembly from publicly available NGS reads would help design specific primers for the mitochondrial gene sequences of species, whose mitochondrial genes are hard to amplify by Sanger sequencing using universal primers.
format Online
Article
Text
id pubmed-10693233
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-106932332023-12-03 Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) Kobayashi, Genki PeerJ Bioinformatics BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA barcoding). The need for such reference data has been increasing due to the application of environmental DNA (eDNA) analysis for environmental assessments. Recently, the number of publicly available sequence reads obtained with next-generation sequencing (NGS) has been increasing in the public database (the NCBI Sequence Read Archive, SRA). Such freely available NGS reads would be promising sources for assembling mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA. METHODS: The recent progress in the classification of Annelida was briefly introduced. In the present study, the mPCGs of 32 annelid species of 19 families in clitellates and allies in Sedentaria (echiurans and polychaetes) were newly assembled from the reads deposited in the SRA. Assembly was performed with a recently published pipeline mitoRNA, which includes cycles of Bowtie2 mapping and Trinity assembly. Assembled mPCGs were deposited in GenBank as Third Party Data (TPA) data. A phylogenetic tree was reconstructed with maximum likelihood (ML) analysis, together with other mPCGs deposited in GenBank. RESULTS AND DISCUSSION: mPCG assembly was largely successful except for Travisia forbesii; only four genes were detected from the assembled contigs of the species probably due to the reads targeting its parasite. Most genes were largely successfully obtained, whereas atp8, nad2, and nad4l were only successful in 22–24 species. The high nucleotide substitution rates of these genes might be relevant to the failure in the assembly although nad6, which showed a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages were not resolved in the present study, the phylogenetic relationships of some polychaetes and leeches that were not inferred by transcriptomes were well resolved probably due to a more dense taxon sampling than previous phylogenetic analyses based on transcriptomes. Although NGS data are generally better sources for resolving phylogenetic relationships of both higher and lower classifications, there are ensuring needs for specific loci of the mitochondrial genes for analyses that do not require high resolutions, such as DNA barcoding, eDNA, and phylogenetic analysis among lower taxa. Assembly from publicly available NGS reads would help design specific primers for the mitochondrial gene sequences of species, whose mitochondrial genes are hard to amplify by Sanger sequencing using universal primers. PeerJ Inc. 2023-11-29 /pmc/articles/PMC10693233/ /pubmed/38047014 http://dx.doi.org/10.7717/peerj.16446 Text en ©2023 Kobayashi https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Kobayashi, Genki
Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title_full Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title_fullStr Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title_full_unstemmed Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title_short Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
title_sort buried treasure in a public repository: mining mitochondrial genes of 32 annelid species from sequence reads deposited in the sequence read archive (sra)
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10693233/
https://www.ncbi.nlm.nih.gov/pubmed/38047014
http://dx.doi.org/10.7717/peerj.16446
work_keys_str_mv AT kobayashigenki buriedtreasureinapublicrepositoryminingmitochondrialgenesof32annelidspeciesfromsequencereadsdepositedinthesequencereadarchivesra