
Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants

Pattern detection and string matching are fundamental problems in computer science and the accelerated expansion of bioinformatics and computational biology have made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is...


Bibliographic Details
Main author: Xylogiannopoulos, Konstantinos F.
Format: Online Article Text
Language: English
Published: Elsevier B.V. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9527188/
https://www.ncbi.nlm.nih.gov/pubmed/36195206
http://dx.doi.org/10.1016/j.jbiotec.2022.09.015
author Xylogiannopoulos, Konstantinos F.
collection PubMed
description Pattern detection and string matching are fundamental problems in computer science and the accelerated expansion of bioinformatics and computational biology have made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although, in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity [Formula: see text] , is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences and this information can be used as input by meta-algorithms for further meta-analyses. For the proof of concept and technology of the proposed Framework scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc.
format Online
Article
Text
id pubmed-9527188
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-95271882022-10-03 Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants Xylogiannopoulos, Konstantinos F. J Biotechnol Article Pattern detection and string matching are fundamental problems in computer science and the accelerated expansion of bioinformatics and computational biology have made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although, in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity [Formula: see text] , is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences and this information can be used as input by meta-algorithms for further meta-analyses. For the proof of concept and technology of the proposed Framework scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc. Elsevier B.V. 2022-11-20 2022-10-03 /pmc/articles/PMC9527188/ /pubmed/36195206 http://dx.doi.org/10.1016/j.jbiotec.2022.09.015 Text en © 2022 Elsevier B.V. All rights reserved. 
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
Xylogiannopoulos, Konstantinos F.
Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants
title Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9527188/
https://www.ncbi.nlm.nih.gov/pubmed/36195206
http://dx.doi.org/10.1016/j.jbiotec.2022.09.015