cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly
MOTIVATION: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the i...
Main Authors: | Lewis, T E, Sillitoe, I, Lees, J G |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Applications Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513158/ https://www.ncbi.nlm.nih.gov/pubmed/30295745 http://dx.doi.org/10.1093/bioinformatics/bty863 |
_version_ | 1783417731175940096 |
---|---|
author | Lewis, T E Sillitoe, I Lees, J G |
author_facet | Lewis, T E Sillitoe, I Lees, J G |
author_sort | Lewis, T E |
collection | PubMed |
description | MOTIVATION: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large data-sets for which they require prohibitive amounts of CPU-time and memory. RESULTS: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. AVAILABILITY AND IMPLEMENTATION: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6513158 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-65131582019-05-20 cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly Lewis, T E Sillitoe, I Lees, J G Bioinformatics Applications Notes MOTIVATION: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large data-sets for which they require prohibitive amounts of CPU-time and memory. RESULTS: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. AVAILABILITY AND IMPLEMENTATION: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-05-15 2018-10-08 /pmc/articles/PMC6513158/ /pubmed/30295745 http://dx.doi.org/10.1093/bioinformatics/bty863 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Lewis, T E Sillitoe, I Lees, J G cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title_full | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title_fullStr | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title_full_unstemmed | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title_short | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
title_sort | cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6513158/ https://www.ncbi.nlm.nih.gov/pubmed/30295745 http://dx.doi.org/10.1093/bioinformatics/bty863 |
work_keys_str_mv | AT lewiste cathresolvehitsanewtoolthatresolvesdomainmatchessuspiciouslyquickly AT sillitoei cathresolvehitsanewtoolthatresolvesdomainmatchessuspiciouslyquickly AT leesjg cathresolvehitsanewtoolthatresolvesdomainmatchessuspiciouslyquickly |
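
The description above states the problem as choosing, from a set of candidate matches, the optimal subset with limited or no overlap, and says CRH solves it with a dynamic-programming algorithm. The following is a minimal sketch of that general idea (classic weighted interval scheduling), not CRH's actual implementation. It assumes each match is a single continuous segment with an inclusive residue range, that "optimal" means the highest summed score, and that no overlap at all is allowed; discontinuous domains are not handled. The names `Hit` and `resolve_hits` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical candidate match: inclusive residue range plus a score."""
    label: str
    start: int
    stop: int
    score: float


def resolve_hits(hits: List[Hit]) -> Tuple[float, List[Hit]]:
    """Return (best total score, chosen non-overlapping hits).

    Assumption: continuous segments only, no overlap tolerated.
    """
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]

    # best[i] is the best total score using only the first i hits (by stop).
    best = [0.0] * (len(ordered) + 1)
    take = [False] * len(ordered)
    prev = [0] * len(ordered)

    for i, hit in enumerate(ordered):
        # Count the earlier hits that end strictly before this hit starts.
        prev[i] = bisect_right(stops, hit.start - 1, 0, i)
        with_hit = best[prev[i]] + hit.score
        if with_hit > best[i]:
            best[i + 1] = with_hit
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover which hits were chosen.
    chosen: List[Hit] = []
    i = len(ordered)
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen


if __name__ == "__main__":
    candidates = [
        Hit("a", 1, 50, 10.0),
        Hit("b", 40, 100, 12.0),
        Hit("c", 51, 120, 9.0),
        Hit("d", 101, 150, 8.0),
    ]
    total, chosen = resolve_hits(candidates)
    print(total, [h.label for h in chosen])
```

Sorting by stop position and using a binary search for the last compatible hit keeps the sketch at O(n log n) time. The real tool accepts several input formats and supports discontinuous domains, as the abstract notes; its documentation is the authority on the actual behaviour.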