
airpg: automatically accessing the inverted repeats of archived plastid genomes

Bibliographic Details
Main authors: Mehl, Tilman; Gruenstaeudl, Michael
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8379869/
https://www.ncbi.nlm.nih.gov/pubmed/34418956
http://dx.doi.org/10.1186/s12859-021-04309-y
author Mehl, Tilman
Gruenstaeudl, Michael
collection PubMed
description BACKGROUND: In most flowering plants, the plastid genome exhibits a quadripartite genome structure, comprising a large and a small single copy as well as two inverted repeat regions. Thousands of plastid genomes have been sequenced and submitted to public sequence repositories in recent years. The quality of sequence annotations in many of these submissions is known to be problematic, especially regarding annotations that specify the length and location of the inverted repeats: such annotations are either missing or portray the length or location of the repeats incorrectly. However, many biological investigations employ publicly available plastid genomes at face value and implicitly assume the correctness of their sequence annotations.
RESULTS: We introduce airpg, a Python package that automatically assesses the frequency of incomplete or incorrect annotations of the inverted repeats among publicly available plastid genomes. Specifically, the tool automatically retrieves plastid genomes from NCBI Nucleotide under variable search parameters, surveys them for length and location specifications of inverted repeats, and confirms any inverted repeat annotations through self-comparisons of the genome sequences. The package also includes functionality for automatic identification and removal of duplicate genome records and accounts for taxa that genuinely lack inverted repeats. A survey of the presence of inverted repeat annotations among all plastid genomes of flowering plants submitted to NCBI Nucleotide until the end of 2020 using airpg, followed by a statistical analysis of potential associations with record metadata, highlights that release year and publication status of the genome records have a significant effect on the frequency of complete and equal-length inverted repeat annotations.
CONCLUSION: The number of plastid genomes on NCBI Nucleotide has increased dramatically in recent years, and many more genomes will likely be submitted over the next decade. airpg enables researchers to automatically access and evaluate the inverted repeats of these plastid genomes as well as their sequence annotations and, thus, contributes to increasing the reliability of publicly available plastid genomes. The software is freely available via the Python package index at http://pypi.python.org/pypi/airpg.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04309-y.
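The RESULTS paragraph states that airpg confirms inverted repeat annotations through self-comparisons of the genome sequences. As a rough illustration of that idea only, the following Python sketch checks three things for one record: whether both IRs are annotated, whether the two annotated IRs have equal length, and whether one IR is the reverse complement of the other. It is not airpg's implementation. The names `reverse_complement`, `IRCheck`, and `check_ir_annotation`, the 0-based end-exclusive coordinate convention, and the exact-match criterion are all hypothetical assumptions.

```python
# Hypothetical sketch, not airpg's code: verify the annotated inverted
# repeats (IRa, IRb) of a plastid genome by comparing the sequence with itself.
from typing import NamedTuple, Optional, Tuple

# Translation table for complementing DNA bases; other characters
# (e.g. IUPAC ambiguity codes) are left unchanged, a simplification.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class IRCheck(NamedTuple):
    complete: bool      # both IRa and IRb are annotated
    equal_length: bool  # the two annotated IRs have the same length
    confirmed: bool     # IRb is the exact reverse complement of IRa


def check_ir_annotation(
    seq: str,
    ira: Optional[Tuple[int, int]],
    irb: Optional[Tuple[int, int]],
) -> IRCheck:
    """Check IR annotations given as 0-based, end-exclusive (start, end) pairs.

    Assumption: exact identity is required; real IR pairs may carry small
    differences that an alignment-based comparison would tolerate.
    """
    if ira is None or irb is None:
        # A missing IR annotation makes the record incomplete.
        return IRCheck(complete=False, equal_length=False, confirmed=False)
    equal_length = (ira[1] - ira[0]) == (irb[1] - irb[0])
    # Self-comparison: IRb, read on the opposite strand, should reproduce IRa.
    confirmed = equal_length and seq[ira[0]:ira[1]] == reverse_complement(
        seq[irb[0]:irb[1]]
    )
    return IRCheck(complete=True, equal_length=equal_length, confirmed=confirmed)


if __name__ == "__main__":
    # Toy genome laid out as LSC + IRa + SSC + IRb.
    ir = "ATGCCGTTAG"
    genome = "GGGGAAAA" + ir + "CCCTTT" + reverse_complement(ir)
    print(check_ir_annotation(genome, (8, 18), (24, 34)))
    # IRCheck(complete=True, equal_length=True, confirmed=True)
    print(check_ir_annotation(genome, (8, 18), (24, 33)))
    # IRCheck(complete=True, equal_length=False, confirmed=False)
```

Requiring exact identity keeps the sketch short; a more realistic check would align the two regions (for example with BLAST) and accept small differences. GenBank feature coordinates are 1-based and inclusive, so they would need converting before being passed to this function.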
format Online
Article
Text
id pubmed-8379869
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
journal BMC Bioinformatics
article_type Software
published_online 2021-08-21
rights © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title airpg: automatically accessing the inverted repeats of archived plastid genomes
topic Software