An entropy-reducing data representation approach for bioinformatic data
Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-base...
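Main authors: | McCulloch, Alan F; Jauregui, Ruy; Maclean, Paul H; Ashby, Rachael L; Moraga, Roger A; Laugraud, Aurelie; Brauning, Rudiger; Dodds, Ken G; McEwan, John C |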
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5887302/ https://www.ncbi.nlm.nih.gov/pubmed/29688382 http://dx.doi.org/10.1093/database/bay029 |
_version_ | 1783312270994964480 |
---|---|
author | McCulloch, Alan F Jauregui, Ruy Maclean, Paul H Ashby, Rachael L Moraga, Roger A Laugraud, Aurelie Brauning, Rudiger Dodds, Ken G McEwan, John C |
author_facet | McCulloch, Alan F Jauregui, Ruy Maclean, Paul H Ashby, Rachael L Moraga, Roger A Laugraud, Aurelie Brauning, Rudiger Dodds, Ken G McEwan, John C |
author_sort | McCulloch, Alan F |
collection | PubMed |
description | Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-based agricultural, aqua-cultural and environmental sampling studies and commercial services. Even where rich semantic resources are available, semantic approaches to problems such as contrasting and comparing reference assemblies, and utilising multiple references in parallel to avoid reference bias, are costly and difficult to fully automate. We introduce and discuss a non-semantic data representation approach intended mainly for bioinformatic data called non-semantic labelling. Non-semantic labelling involves tensorially combining multiple kinds of model-based entropy-reducing data representation, with multiple representation models, so as to map both data and models into dual metric representation spaces, with goals of both reducing the statistical complexity of the data, and highlighting latent structure via machine learning and statistical analyses conducted within the dual representation spaces. As part of the framework, we introduce a novel algebraic abstraction of data representation mappings, and present four proof-of-concept examples of its application, to problems such as comparing and contrasting sequence assemblies, utilisation of multiple references for annotation and development of quality control diagnostics in a variety of high-throughput sequencing contexts. Database URL: https://github.com/AgResearch/data_prism |
format | Online Article Text |
id | pubmed-5887302 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-58873022018-04-11 An entropy-reducing data representation approach for bioinformatic data McCulloch, Alan F Jauregui, Ruy Maclean, Paul H Ashby, Rachael L Moraga, Roger A Laugraud, Aurelie Brauning, Rudiger Dodds, Ken G McEwan, John C Database (Oxford) Original Article Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-based agricultural, aqua-cultural and environmental sampling studies and commercial services. Even where rich semantic resources are available, semantic approaches to problems such as contrasting and comparing reference assemblies, and utilising multiple references in parallel to avoid reference bias, are costly and difficult to fully automate. We introduce and discuss a non-semantic data representation approach intended mainly for bioinformatic data called non-semantic labelling. Non-semantic labelling involves tensorially combining multiple kinds of model-based entropy-reducing data representation, with multiple representation models, so as to map both data and models into dual metric representation spaces, with goals of both reducing the statistical complexity of the data, and highlighting latent structure via machine learning and statistical analyses conducted within the dual representation spaces. As part of the framework, we introduce a novel algebraic abstraction of data representation mappings, and present four proof-of-concept examples of its application, to problems such as comparing and contrasting sequence assemblies, utilisation of multiple references for annotation and development of quality control diagnostics in a variety of high-throughput sequencing contexts. 
Database URL: https://github.com/AgResearch/data_prism Oxford University Press 2018-04-05 /pmc/articles/PMC5887302/ /pubmed/29688382 http://dx.doi.org/10.1093/database/bay029 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article McCulloch, Alan F Jauregui, Ruy Maclean, Paul H Ashby, Rachael L Moraga, Roger A Laugraud, Aurelie Brauning, Rudiger Dodds, Ken G McEwan, John C An entropy-reducing data representation approach for bioinformatic data |
title | An entropy-reducing data representation approach for bioinformatic data |
title_full | An entropy-reducing data representation approach for bioinformatic data |
title_fullStr | An entropy-reducing data representation approach for bioinformatic data |
title_full_unstemmed | An entropy-reducing data representation approach for bioinformatic data |
title_short | An entropy-reducing data representation approach for bioinformatic data |
title_sort | entropy-reducing data representation approach for bioinformatic data |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5887302/ https://www.ncbi.nlm.nih.gov/pubmed/29688382 http://dx.doi.org/10.1093/database/bay029 |