
ECuADOR—Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines

BACKGROUND: With the rapid increase in availability of genomic resources offered by Next-Generation Sequencing (NGS) and the availability of free online genomic databases, efficient and standardized metadata curation approaches have become increasingly critical for the post-processing stages of biol...


Bibliographic Details
Main authors: Armijos Carrion, Angelo D., Hinsinger, Damien D., Strijk, Joeri S.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147433/
https://www.ncbi.nlm.nih.gov/pubmed/32292644
http://dx.doi.org/10.7717/peerj.8699
author Armijos Carrion, Angelo D.
Hinsinger, Damien D.
Strijk, Joeri S.
collection PubMed
description BACKGROUND: With the rapid increase in the availability of genomic resources offered by Next-Generation Sequencing (NGS) and of free online genomic databases, efficient and standardized metadata curation approaches have become increasingly critical for the post-processing stages of biological data. Especially in organelle-based studies using circular chloroplast genome datasets, the assembly of the main structural regions in random order and orientation is a major obstacle to easily generating “ready-to-align” datasets for phylogenetic reconstruction, at both small and large taxonomic scales. In addition, current practices discard the most variable regions of the genomes to facilitate the alignment of the remaining coding regions. Nevertheless, no software is currently available to perform curation to such a degree through simple detection, organization and positioning of the main plastome regions, which leaves curation a time-consuming and error-prone process. Here we introduce ECuADOR, a fast and user-friendly Perl script specifically designed to automate the detection and reorganization of newly assembled plastomes obtained from any available source (NGS, Sanger sequencing or assembler output). METHODS: ECuADOR uses a sliding-window approach to detect long repeated sequences in draft sequences, identifies the inverted repeat regions (IRs) even in the case of artifactual breaks or sequencing errors, and automates the rearrangement of the sequence into the widely used LSC–IRb–SSC–IRa order. This facilitates rapid post-editing steps such as creation of genome alignments, detection of variable regions, SNP detection and phylogenomic analyses. RESULTS: ECuADOR was successfully tested on plant families throughout the angiosperm phylogeny by curating 161 chloroplast datasets. ECuADOR first identified and reordered the central regions (LSC–IRb–SSC–IRa) for each dataset and then produced a new annotation for the chloroplast sequences. The process took less than 20 min with a maximum memory requirement of 150 MB and an accuracy of over 99%. CONCLUSIONS: ECuADOR is the only de novo one-step recognition and reordering tool that facilitates post-processing analysis of extranuclear genomes from NGS data. The program is available at https://github.com/BiodivGenomic/ECuADOR/.
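The detection and reordering steps summarized under METHODS can be illustrated with a minimal Python sketch. This is not ECuADOR's code (ECuADOR is a Perl script, and its internals are not described in this record). The function names `revcomp`, `find_inverted_repeat` and `reorder_plastome`, the seed length `k`, and the exact-match seed-and-extend strategy are all assumptions made for illustration. The sketch also assumes that neither IR copy is split across the ends of the input string, tolerates no mismatches, and does not correct the orientation of the single-copy regions.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (other symbols are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeat(seq, k=12):
    """Return ((a0, a1), (b0, b1)), the longest exact inverted repeat found,
    as half-open intervals with a1 <= b0, or None if there is no seed hit.
    Hypothetical helper: slide a k-base window, look up its reverse
    complement in a k-mer index, then extend the match base by base."""
    n = len(seq)
    index = defaultdict(list)
    for j in range(n - k + 1):
        index[seq[j:j + k]].append(j)

    best = None
    i = 0
    while i <= n - k:
        next_i = i + 1
        for j in index.get(revcomp(seq[i:i + k]), ()):
            if j < i + k:
                continue  # ignore overlapping hits and mirrored duplicates
            a0, a1, b0, b1 = i, i + k, j, j + k
            # Copy A grows to the right while copy B grows to the left.
            while a1 < b0 - 1 and seq[a1] == seq[b0 - 1].translate(COMPLEMENT):
                a1, b0 = a1 + 1, b0 - 1
            # Copy A grows to the left while copy B grows to the right.
            while a0 > 0 and b1 < n and seq[a0 - 1] == seq[b1].translate(COMPLEMENT):
                a0, b1 = a0 - 1, b1 + 1
            if best is None or a1 - a0 > best[0][1] - best[0][0]:
                best = ((a0, a1), (b0, b1))
            next_i = max(next_i, a1)  # heuristic: skip seeds already covered
        i = next_i
    return best


def reorder_plastome(seq, k=12):
    """Rotate a circular plastome string into LSC, IRb, SSC, IRa order.
    The longer single-copy region is taken to be the LSC (an assumption)."""
    hit = find_inverted_repeat(seq, k)
    if hit is None:
        raise ValueError("no inverted repeat found")
    (a0, a1), (b0, b1) = hit
    between = seq[a1:b0]          # single-copy region between the two copies
    outer = seq[b1:] + seq[:a0]   # region that wraps around the string ends
    if len(outer) >= len(between):
        return outer + seq[a0:a1] + between + seq[b0:b1]
    return between + seq[b0:b1] + outer + seq[a0:a1]
```

A call such as `reorder_plastome(draft, k=20)` on a draft assembly string returns the rotated sequence. Handling sequencing errors or artifactual breaks inside the IRs, as the abstract says ECuADOR does, would need mismatch-tolerant extension, which this sketch leaves out.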
format Online
Article
Text
id pubmed-7147433
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7147433 2020-04-14 ECuADOR—Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines Armijos Carrion, Angelo D. Hinsinger, Damien D. Strijk, Joeri S. PeerJ Computational Biology PeerJ Inc. 2020-04-07 /pmc/articles/PMC7147433/ /pubmed/32292644 http://dx.doi.org/10.7717/peerj.8699 Text en © 2020 Armijos Carrion et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title ECuADOR—Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147433/
https://www.ncbi.nlm.nih.gov/pubmed/32292644
http://dx.doi.org/10.7717/peerj.8699