CIAlign: A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments

Bibliographic details
Main authors: Tumescheit, Charlotte; Firth, Andrew E.; Brown, Katherine
Format: Online article (text)
Language: English
Published: PeerJ Inc., 2022
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8932311/
https://www.ncbi.nlm.nih.gov/pubmed/35310163
http://dx.doi.org/10.7717/peerj.12983

BACKGROUND: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which can lead to poor alignment and large gaps. This slows down computation and can impact conclusions without being biologically relevant. Cleaning the alignment by removing common issues such as gaps, divergent sequences, large insertions and deletions and poorly aligned sequence ends can substantially improve analyses. Manual editing of MSAs is very widespread but is time-consuming and difficult to reproduce.

RESULTS: We present a comprehensive, user-friendly MSA trimming tool with multiple visualisation options. Our highly customisable command line tool aims to give intervention power to the user by offering various options, and outputs graphical representations of the alignment before and after processing to give the user a clear overview of what has been removed. The main functionalities of the tool include removing regions of low coverage due to insertions, removing gaps, cropping poorly aligned sequence ends and removing sequences that are too divergent or too short. The thresholds for each function can be specified by the user and parameters can be adjusted to each individual MSA. CIAlign is designed with an emphasis on solving specific and common alignment problems and on providing transparency to the user.

CONCLUSION: CIAlign effectively removes problematic regions and sequences from MSAs and provides novel visualisation options. This tool can be used to fine-tune alignments for further analysis and processing. The tool is aimed at anyone who wishes to automatically clean up parts of an MSA and those requiring a new, accessible way of visualising large MSAs.

Published online: 2022-03-15.
©2022 Tumescheit et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.