High Performance Integration Pipeline for Viral and Epitope Sequences

Bibliographic Details
Main Authors: Alfonsi, Tommaso, Pinoli, Pietro, Canakoglu, Arif
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245902/
https://www.ncbi.nlm.nih.gov/pubmed/35822815
http://dx.doi.org/10.3390/biotech11010007
author Alfonsi, Tommaso
Pinoli, Pietro
Canakoglu, Arif
collection PubMed
description With the spread of COVID-19, sequencing laboratories started to share hundreds of sequences daily. However, the lack of a commonly agreed standard across deposition databases hindered the exploration and study of all the viral sequences collected worldwide in a practical and homogeneous way. During the first months of the pandemic, we developed an automatic procedure to collect, transform, and integrate viral sequences of SARS-CoV-2, MERS, SARS-CoV, Ebola, and Dengue from four major database institutions (NCBI, COG-UK, GISAID, and NMDC). This data pipeline allowed the creation of the data exploration interfaces VirusViz and EpiSurf, as well as ViruSurf, one of the largest databases of integrated viral sequences. Almost two years after the first release of the repository, the original pipeline underwent a thorough refinement process and became more efficient, scalable, and general (currently, it also includes epitopes from the IEDB). Thanks to these improvements, we constantly update and expand our integrated repository, encompassing about 9.1 million SARS-CoV-2 sequences at present (March 2022). This pipeline made it possible to design and develop fundamental resources for any researcher interested in understanding the biological mechanisms behind the viral infection. In addition, it plays a crucial role in many analytic and visualization tools, such as ViruSurf, EpiSurf, VirusViz, and VirusLab.
format Online
Article
Text
id pubmed-9245902
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9245902 2022-07-06 High Performance Integration Pipeline for Viral and Epitope Sequences Alfonsi, Tommaso Pinoli, Pietro Canakoglu, Arif BioTech (Basel) Article MDPI 2022-03-21 /pmc/articles/PMC9245902/ /pubmed/35822815 http://dx.doi.org/10.3390/biotech11010007 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title High Performance Integration Pipeline for Viral and Epitope Sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9245902/
https://www.ncbi.nlm.nih.gov/pubmed/35822815
http://dx.doi.org/10.3390/biotech11010007