Cargando…
Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA)
BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA b...
Autor principal: | |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
PeerJ Inc.
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10693233/ https://www.ncbi.nlm.nih.gov/pubmed/38047014 http://dx.doi.org/10.7717/peerj.16446 |
_version_ | 1785153117021536256 |
---|---|
author | Kobayashi, Genki |
author_facet | Kobayashi, Genki |
author_sort | Kobayashi, Genki |
collection | PubMed |
description | BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA barcoding). The need for such reference data has been increasing due to the application of environmental DNA (eDNA) analysis for environmental assessments. Recently, the number of publicly available sequence reads obtained with next-generation sequencing (NGS) has been increasing in the public database (the NCBI Sequence Read Archive, SRA). Such freely available NGS reads would be promising sources for assembling mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA. METHODS: The recent progress in the classification of Annelida was briefly introduced. In the present study, the mPCGs of 32 annelid species of 19 families in clitellates and allies in Sedentaria (echiurans and polychaetes) were newly assembled from the reads deposited in the SRA. Assembly was performed with a recently published pipeline mitoRNA, which includes cycles of Bowtie2 mapping and Trinity assembly. Assembled mPCGs were deposited in GenBank as Third Party Data (TPA) data. A phylogenetic tree was reconstructed with maximum likelihood (ML) analysis, together with other mPCGs deposited in GenBank. RESULTS AND DISCUSSION: mPCG assembly was largely successful except for Travisia forbesii; only four genes were detected from the assembled contigs of the species probably due to the reads targeting its parasite. Most genes were largely successfully obtained, whereas atp8, nad2, and nad4l were only successful in 22–24 species. The high nucleotide substitution rates of these genes might be relevant to the failure in the assembly although nad6, which showed a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages were not resolved in the present study, the phylogenetic relationships of some polychaetes and leeches that were not inferred by transcriptomes were well resolved probably due to a more dense taxon sampling than previous phylogenetic analyses based on transcriptomes. Although NGS data are generally better sources for resolving phylogenetic relationships of both higher and lower classifications, there are ensuring needs for specific loci of the mitochondrial genes for analyses that do not require high resolutions, such as DNA barcoding, eDNA, and phylogenetic analysis among lower taxa. Assembly from publicly available NGS reads would help design specific primers for the mitochondrial gene sequences of species, whose mitochondrial genes are hard to amplify by Sanger sequencing using universal primers. |
format | Online Article Text |
id | pubmed-10693233 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-106932332023-12-03 Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) Kobayashi, Genki PeerJ Bioinformatics BACKGROUND: The mitochondrial genomes (mitogenomes) of metazoans generally include the same set of protein-coding genes, which ensures the homology of mitochondrial genes between species. The mitochondrial genes are often used as reference data for species identification based on genetic data (DNA barcoding). The need for such reference data has been increasing due to the application of environmental DNA (eDNA) analysis for environmental assessments. Recently, the number of publicly available sequence reads obtained with next-generation sequencing (NGS) has been increasing in the public database (the NCBI Sequence Read Archive, SRA). Such freely available NGS reads would be promising sources for assembling mitochondrial protein-coding genes (mPCGs) of organisms whose mitochondrial genes are not available in GenBank. The present study aimed to assemble annelid mPCGs from raw data deposited in the SRA. METHODS: The recent progress in the classification of Annelida was briefly introduced. In the present study, the mPCGs of 32 annelid species of 19 families in clitellates and allies in Sedentaria (echiurans and polychaetes) were newly assembled from the reads deposited in the SRA. Assembly was performed with a recently published pipeline mitoRNA, which includes cycles of Bowtie2 mapping and Trinity assembly. Assembled mPCGs were deposited in GenBank as Third Party Data (TPA) data. A phylogenetic tree was reconstructed with maximum likelihood (ML) analysis, together with other mPCGs deposited in GenBank. RESULTS AND DISCUSSION: mPCG assembly was largely successful except for Travisia forbesii; only four genes were detected from the assembled contigs of the species probably due to the reads targeting its parasite. Most genes were largely successfully obtained, whereas atp8, nad2, and nad4l were only successful in 22–24 species. The high nucleotide substitution rates of these genes might be relevant to the failure in the assembly although nad6, which showed a similarly high substitution rate, was successfully assembled. Although the phylogenetic positions of several lineages were not resolved in the present study, the phylogenetic relationships of some polychaetes and leeches that were not inferred by transcriptomes were well resolved probably due to a more dense taxon sampling than previous phylogenetic analyses based on transcriptomes. Although NGS data are generally better sources for resolving phylogenetic relationships of both higher and lower classifications, there are ensuring needs for specific loci of the mitochondrial genes for analyses that do not require high resolutions, such as DNA barcoding, eDNA, and phylogenetic analysis among lower taxa. Assembly from publicly available NGS reads would help design specific primers for the mitochondrial gene sequences of species, whose mitochondrial genes are hard to amplify by Sanger sequencing using universal primers. PeerJ Inc. 2023-11-29 /pmc/articles/PMC10693233/ /pubmed/38047014 http://dx.doi.org/10.7717/peerj.16446 Text en ©2023 Kobayashi https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Kobayashi, Genki Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title | Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title_full | Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title_fullStr | Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title_full_unstemmed | Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title_short | Buried treasure in a public repository: Mining mitochondrial genes of 32 annelid species from sequence reads deposited in the Sequence Read Archive (SRA) |
title_sort | buried treasure in a public repository: mining mitochondrial genes of 32 annelid species from sequence reads deposited in the sequence read archive (sra) |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10693233/ https://www.ncbi.nlm.nih.gov/pubmed/38047014 http://dx.doi.org/10.7717/peerj.16446 |
work_keys_str_mv | AT kobayashigenki buriedtreasureinapublicrepositoryminingmitochondrialgenesof32annelidspeciesfromsequencereadsdepositedinthesequencereadarchivesra |