
WhatsGNU: a tool for identifying proteomic novelty



Bibliographic Details
Main Authors: Moustafa, Ahmed M., Planet, Paul J.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7059281/
https://www.ncbi.nlm.nih.gov/pubmed/32138767
http://dx.doi.org/10.1186/s13059-020-01965-w
Description
Summary: To understand diversity in enormous collections of genome sequences, we need computationally scalable tools that can quickly contextualize individual genomes based on their similarities and identify the features that make each genome unique. We present WhatsGNU, a tool based on exact match proteomic compression that, in seconds, classifies any new genome and provides a detailed report of protein alleles that may have novel functional differences. We use this technique to characterize the total allelic diversity (panallelome) of Salmonella enterica, Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Staphylococcus aureus. The approach could be extended to other species. WhatsGNU is available from https://github.com/ahmedmagds/WhatsGNU. ELECTRONIC SUPPLEMENTARY MATERIAL: Supplementary information accompanies this paper at 10.1186/s13059-020-01965-w.
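To make the phrase "exact match proteomic compression" more concrete, here is a minimal sketch of the general idea, written under stated assumptions and not taken from WhatsGNU's code: identical protein sequences across a database are collapsed into a single entry with an occurrence count, each protein of a query genome is looked up by exact sequence (a count of 0 flags an allele never seen before), and database genomes are ranked by how many exact alleles they share with the query. The function names `compress_database`, `report_alleles`, and `closest_genomes`, and the toy in-memory database, are hypothetical; a real tool would read proteomes from files and scale to far larger collections.

```python
from collections import Counter, defaultdict


def compress_database(genomes):
    """Collapse identical protein sequences across genomes (hypothetical helper).

    genomes: dict mapping genome ID -> list of protein sequences.
    Returns (counts, owners):
      counts[seq] = number of times this exact sequence occurs in the database
      owners[seq] = set of genome IDs carrying this exact allele
    """
    counts = Counter()
    owners = defaultdict(set)
    for genome_id, proteins in genomes.items():
        for seq in proteins:
            counts[seq] += 1          # one stored entry per distinct sequence
            owners[seq].add(genome_id)
    return counts, owners


def report_alleles(query_proteins, counts):
    """Pair each query protein with its exact-match count; 0 marks a novel allele."""
    return [(seq, counts.get(seq, 0)) for seq in query_proteins]


def closest_genomes(query_proteins, owners, top=3):
    """Rank database genomes by the number of exact alleles shared with the query."""
    shared = Counter()
    for seq in set(query_proteins):
        # .get avoids inserting empty entries into the defaultdict
        for genome_id in owners.get(seq, ()):
            shared[genome_id] += 1
    return shared.most_common(top)


if __name__ == "__main__":
    # Toy database with made-up short "proteins" (illustration only).
    db = {
        "g1": ["MKT", "MAL", "MSS"],
        "g2": ["MKT", "MAL", "MSA"],
        "g3": ["MKT", "MVV", "MSS"],
    }
    counts, owners = compress_database(db)
    query = ["MKT", "MAL", "MQQ"]
    print(report_alleles(query, counts))  # [('MKT', 3), ('MAL', 2), ('MQQ', 0)]
    print(closest_genomes(query, owners))
```

In this sketch the dictionary keyed on the full sequence is what makes lookups exact and fast: a query protein either matches a stored allele character for character or it does not, so no alignment step is needed. Ties in `closest_genomes` are returned in no guaranteed order.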