ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices
Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One application is high-throughput screening of genomes for pathogenicity. Unlike sequence homology methods, protein comparison at a functional level provides us w...
Main authors: | Wanchai, Visanu; Nookaew, Intawat; Ussery, David W. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7719867/ https://www.ncbi.nlm.nih.gov/pubmed/33335686 http://dx.doi.org/10.1016/j.csbj.2020.10.023 |
_version_ | 1783619765784281088 |
---|---|
author | Wanchai, Visanu Nookaew, Intawat Ussery, David W. |
author_facet | Wanchai, Visanu Nookaew, Intawat Ussery, David W. |
author_sort | Wanchai, Visanu |
collection | PubMed |
description | Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One application is high-throughput screening of genomes for pathogenicity. Unlike sequence homology methods, protein comparison at the functional level makes it possible to classify proteins by their functional structures without dealing with the sequence complexity of distantly related species. Protein functions can be described abstractly by a set of protein functional domains, such as PfamA domains; a set of genomes can then be mapped to a matrix in which each row represents a genome and each column represents the presence or absence of a given functional domain. However, a powerful tool is needed to analyze the large sparse matrices that will be generated from the millions of genomes expected to become available in the near future. ProdMX is a tool with user-friendly utilities developed to facilitate high-throughput analysis of proteins, and it can be included as a module in a high-throughput pipeline. ProdMX employs a compressed sparse matrix algorithm to reduce the computational resources and time needed for matrix manipulation during functional domain analysis. ProdMX is a free and publicly available Python package that can be installed from PyPI or Conda, or with a standard installer from the source code available in the ProdMX GitHub repository at https://github.com/visanuwan/prodmx. |
format | Online Article Text |
id | pubmed-7719867 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-7719867 2020-12-16 ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices Wanchai, Visanu Nookaew, Intawat Ussery, David W. Comput Struct Biotechnol J Research Article Large-scale protein analysis has been used to characterize large numbers of proteins across numerous species. One application is high-throughput screening of genomes for pathogenicity. Unlike sequence homology methods, protein comparison at the functional level makes it possible to classify proteins by their functional structures without dealing with the sequence complexity of distantly related species. Protein functions can be described abstractly by a set of protein functional domains, such as PfamA domains; a set of genomes can then be mapped to a matrix in which each row represents a genome and each column represents the presence or absence of a given functional domain. However, a powerful tool is needed to analyze the large sparse matrices that will be generated from the millions of genomes expected to become available in the near future. ProdMX is a tool with user-friendly utilities developed to facilitate high-throughput analysis of proteins, and it can be included as a module in a high-throughput pipeline. ProdMX employs a compressed sparse matrix algorithm to reduce the computational resources and time needed for matrix manipulation during functional domain analysis. ProdMX is a free and publicly available Python package that can be installed from PyPI or Conda, or with a standard installer from the source code available in the ProdMX GitHub repository at https://github.com/visanuwan/prodmx. 
Research Network of Computational and Structural Biotechnology 2020-11-24 /pmc/articles/PMC7719867/ /pubmed/33335686 http://dx.doi.org/10.1016/j.csbj.2020.10.023 Text en http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Wanchai, Visanu Nookaew, Intawat Ussery, David W. ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title | ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title_full | ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title_fullStr | ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title_full_unstemmed | ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title_short | ProdMX: Rapid query and analysis of protein functional domain based on compressed sparse matrices |
title_sort | prodmx: rapid query and analysis of protein functional domain based on compressed sparse matrices |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7719867/ https://www.ncbi.nlm.nih.gov/pubmed/33335686 http://dx.doi.org/10.1016/j.csbj.2020.10.023 |
work_keys_str_mv | AT wanchaivisanu prodmxrapidqueryandanalysisofproteinfunctionaldomainbasedoncompressedsparsematrices AT nookaewintawat prodmxrapidqueryandanalysisofproteinfunctionaldomainbasedoncompressedsparsematrices AT usserydavidw prodmxrapidqueryandanalysisofproteinfunctionaldomainbasedoncompressedsparsematrices |
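
The description field above mentions a genome-by-domain presence/absence matrix held in compressed sparse form. As a rough illustration only, the following is a minimal pure-Python sketch of a compressed sparse row (CSR) layout for such a matrix. It is not ProdMX's implementation or API: the function names (`build_csr`, `domains_of`, `genomes_with`), the genome labels, and the toy domain assignments are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def build_csr(
    genome_domains: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Build a CSR-style presence/absence matrix (hypothetical helper).

    Rows are genomes and columns are domains. Because every stored
    value would be 1, only the row pointers and column indices are kept.
    """
    genomes = sorted(genome_domains)
    domains = sorted(set().union(*genome_domains.values()))
    column_of = {domain: j for j, domain in enumerate(domains)}

    indptr = [0]              # indptr[i]:indptr[i + 1] delimits row i
    indices: List[int] = []   # column indices of the present domains
    for genome in genomes:
        indices.extend(sorted(column_of[d] for d in genome_domains[genome]))
        indptr.append(len(indices))
    return genomes, domains, indptr, indices


def domains_of(row: int, domains: List[str],
               indptr: List[int], indices: List[int]) -> List[str]:
    """Return the domains present in one genome (one matrix row)."""
    return [domains[j] for j in indices[indptr[row]:indptr[row + 1]]]


def genomes_with(domain: str, genomes: List[str], domains: List[str],
                 indptr: List[int], indices: List[int]) -> List[str]:
    """Return the genomes whose row contains the given domain column."""
    j = domains.index(domain)
    return [
        genomes[i]
        for i in range(len(genomes))
        if j in indices[indptr[i]:indptr[i + 1]]
    ]


# Toy input: made-up genomes with illustrative Pfam-style accessions.
toy = {
    "genome_A": {"PF00005", "PF00072"},
    "genome_B": {"PF00072"},
    "genome_C": {"PF00005", "PF00106"},
}
genomes, domains, indptr, indices = build_csr(toy)
print(indptr, indices)
print(genomes_with("PF00072", genomes, domains, indptr, indices))
```

Only the five present entries are stored here instead of all nine cells of the 3 x 3 matrix; the saving grows as the matrix becomes sparser, which is the property the description attributes to compressed sparse matrices.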