Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning

Genomic data analysis is a fundamental system for monitoring pathogen evolution and the outbreak of infectious diseases. Based on bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and predict the impending mutation rate. Analysis of...

Bibliographic Details
Main authors: Hossain, Md Shahadat, Pathan, A.Q.M. Sala Uddin, Islam, Md Nur, Tonmoy, Mahafujul Islam Quadery, Rakib, Mahmudul Islam, Munim, Md Adnan, Saha, Otun, Fariha, Atqiya, Reza, Hasan Al, Roy, Maitreyee, Bahadur, Newaz Mohammed, Rahaman, Md Mizanur
Format: Online Article Text
Language: English
Published: The Authors. Published by Elsevier Ltd. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598266/
https://www.ncbi.nlm.nih.gov/pubmed/34812411
http://dx.doi.org/10.1016/j.imu.2021.100798
_version_ 1784600782953250816
author Hossain, Md Shahadat
Pathan, A.Q.M. Sala Uddin
Islam, Md Nur
Tonmoy, Mahafujul Islam Quadery
Rakib, Mahmudul Islam
Munim, Md Adnan
Saha, Otun
Fariha, Atqiya
Reza, Hasan Al
Roy, Maitreyee
Bahadur, Newaz Mohammed
Rahaman, Md Mizanur
author_sort Hossain, Md Shahadat
collection PubMed
description Genomic data analysis is a fundamental system for monitoring pathogen evolution and the outbreak of infectious diseases. Based on bioinformatics and deep learning, this study was designed to identify the genomic variability of SARS-CoV-2 worldwide and predict the impending mutation rate. Analysis of 259044 SARS-CoV-2 isolates identified 3334545 mutations, with an average of 14.01 mutations per isolate. Globally, single nucleotide polymorphism (SNP) is the most prevalent mutational event. C > T (52.67%) was the predominant alteration worldwide, followed by G > T (14.59%) and A > G (11.13%). Strains from India showed the highest number of mutations (48), followed by strains from Scotland, USA, Netherlands, Norway, and France, with up to 36 mutations. D416G, F106F, P314L, UTR:C241T, L93L, A222V, A199A, V30L, and A220V were the most frequent mutations. The missense mutations D1118H, S194L, R262H, M809L, P314L, A8D, S220G, A890D, G1433C, T1456I, R233C, F263S, L111K, A54T, A74V, L183A, A316T, V212F, L46C, V48G, Q57H, W131R, G172V, Q185H, and Y206S were found to largely decrease the structural stability of the corresponding proteins. Conversely, D3L, L5F, and S97I were found to largely increase the structural stability of the corresponding proteins. The multi-nucleotide mutations GGG > AAC, CC > TT, TG > CA, and AT > TA appeared in our analysis among the top 20 mutations. Future mutation rate analysis predicts increments of 17%, 7%, and 3% for C > T, A > G, and A > T, respectively. Conversely, decrements of 7%, 7%, and 6% are estimated for T > C, G > A, and G > T, respectively. T > G/A, C > G/A, and A > T/C are not anticipated in the future. Since SARS-CoV-2 is mutating continuously, our findings will facilitate the tracking of mutations and help to map the progression of COVID-19 intensity worldwide.
format Online
Article
Text
id pubmed-8598266
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Authors. Published by Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8598266
2021-11-18
Inform Med Unlocked
Article
The Authors. Published by Elsevier Ltd. 2021
2021-11-18
/pmc/articles/PMC8598266/ /pubmed/34812411 http://dx.doi.org/10.1016/j.imu.2021.100798
Text en
© 2022 The Authors. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title Genome-wide identification and prediction of SARS-CoV-2 mutations show an abundance of variants: Integrated study of bioinformatics and deep neural learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8598266/
https://www.ncbi.nlm.nih.gov/pubmed/34812411
http://dx.doi.org/10.1016/j.imu.2021.100798