Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants
Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is...
Main author: | |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier B.V. 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9527188/ https://www.ncbi.nlm.nih.gov/pubmed/36195206 http://dx.doi.org/10.1016/j.jbiotec.2022.09.015 |
_version_ | 1784801030302597120 |
---|---|
author | Xylogiannopoulos, Konstantinos F. |
author_facet | Xylogiannopoulos, Konstantinos F. |
author_sort | Xylogiannopoulos, Konstantinos F. |
collection | PubMed |
description | Pattern detection and string matching are fundamental problems in computer science, and the accelerated expansion of bioinformatics and computational biology has made them a core topic for both disciplines. Computational tools for genomic analyses, such as sequence alignment, are very important, although in most cases the resources and computational power they require are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity [Formula: see text], is enough to acquire knowledge of all repeated patterns that exist in multiple genome sequences, and this information can be used as input by meta-algorithms for further meta-analyses. As a proof of concept and technology demonstrating the proposed Framework's scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindrome and tandem repeat detection, genome comparisons between different organisms, polymerase chain reaction primer detection, etc. |
format | Online Article Text |
id | pubmed-9527188 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier B.V. |
record_format | MEDLINE/PubMed |
spelling | pubmed-95271882022-10-03 Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants Xylogiannopoulos, Konstantinos F. J Biotechnol Article Pattern detection and string matching are fundamental problems in computer science and the accelerated expansion of bioinformatics and computational biology have made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although, in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity [Formula: see text] , is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences and this information can be used as input by meta-algorithms for further meta-analyses. For the proof of concept and technology of the proposed Framework scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc. Elsevier B.V. 2022-11-20 2022-10-03 /pmc/articles/PMC9527188/ /pubmed/36195206 http://dx.doi.org/10.1016/j.jbiotec.2022.09.015 Text en © 2022 Elsevier B.V. All rights reserved. 
Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Xylogiannopoulos, Konstantinos F. Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title | Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title_full | Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title_fullStr | Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title_full_unstemmed | Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title_short | Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants |
title_sort | multiple genome analytics framework: the case of all sars-cov-2 complete variants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9527188/ https://www.ncbi.nlm.nih.gov/pubmed/36195206 http://dx.doi.org/10.1016/j.jbiotec.2022.09.015 |
work_keys_str_mv | AT xylogiannopouloskonstantinosf multiplegenomeanalyticsframeworkthecaseofallsarscov2completevariants |