
CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets

The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on the comparison of all the pairs of genomes, a step that is becoming increas...
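The definition above ("shared by all, or nearly all, strains") can be illustrated with a small sketch. This is not CoreCruncher's algorithm, which the abstract describes as a heuristic on identity-score distributions; it only shows the final filtering idea, assuming gene families have already been assigned to genomes. The function name `core_families`, the `min_fraction` parameter, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def core_families(
    presence: Dict[str, Set[str]],
    n_genomes: int,
    min_fraction: float = 0.9,
) -> List[str]:
    """Return gene families found in at least `min_fraction` of the genomes.

    `presence` maps a gene family name to the set of genome names that
    carry it. Both the mapping and the threshold are illustrative
    assumptions, not values taken from the CoreCruncher paper.
    """
    if n_genomes <= 0:
        raise ValueError("n_genomes must be positive")
    return sorted(
        family
        for family, genomes in presence.items()
        if len(genomes) / n_genomes >= min_fraction
    )


# Toy data set: four genomes, three gene families (hypothetical).
presence = {
    "famA": {"g1", "g2", "g3", "g4"},  # present in every genome
    "famB": {"g1", "g2", "g3"},        # missing from one genome
    "famC": {"g1"},                    # accessory gene
}

strict = core_families(presence, n_genomes=4, min_fraction=1.0)
relaxed = core_families(presence, n_genomes=4, min_fraction=0.75)
```

With a threshold of 1.0 only families present in every genome are kept ("shared by all"); lowering the threshold admits families missing from a few genomes ("nearly all").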


Bibliographic Details
Main Authors: Harris, Connor D, Torrance, Ellis L, Raymann, Kasie, Bobay, Louis-Marie
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7826169/
https://www.ncbi.nlm.nih.gov/pubmed/32886787
http://dx.doi.org/10.1093/molbev/msaa224
_version_ 1783640477068689408
author Harris, Connor D
Torrance, Ellis L
Raymann, Kasie
Bobay, Louis-Marie
author_facet Harris, Connor D
Torrance, Ellis L
Raymann, Kasie
Bobay, Louis-Marie
author_sort Harris, Connor D
collection PubMed
description The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on the comparison of all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
format Online
Article
Text
id pubmed-7826169
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-78261692021-01-27 CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets Harris, Connor D Torrance, Ellis L Raymann, Kasie Bobay, Louis-Marie Mol Biol Evol Resources The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses, however, most methods rely on the comparison of all the pairs of genomes; a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher; a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the python library Numpy and either Usearch or Blast. Certain options require the programs muscle or mafft. Oxford University Press 2020-09-04 /pmc/articles/PMC7826169/ /pubmed/32886787 http://dx.doi.org/10.1093/molbev/msaa224 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Resources
Harris, Connor D
Torrance, Ellis L
Raymann, Kasie
Bobay, Louis-Marie
CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title_full CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title_fullStr CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title_full_unstemmed CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title_short CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets
title_sort corecruncher: fast and robust construction of core genomes in large prokaryotic data sets
topic Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7826169/
https://www.ncbi.nlm.nih.gov/pubmed/32886787
http://dx.doi.org/10.1093/molbev/msaa224
work_keys_str_mv AT harrisconnord corecruncherfastandrobustconstructionofcoregenomesinlargeprokaryoticdatasets
AT torranceellisl corecruncherfastandrobustconstructionofcoregenomesinlargeprokaryoticdatasets
AT raymannkasie corecruncherfastandrobustconstructionofcoregenomesinlargeprokaryoticdatasets
AT bobaylouismarie corecruncherfastandrobustconstructionofcoregenomesinlargeprokaryoticdatasets