
An entropy-reducing data representation approach for bioinformatic data

Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-base...


Bibliographic Details
Main Authors:	McCulloch, Alan F; Jauregui, Ruy; Maclean, Paul H; Ashby, Rachael L; Moraga, Roger A; Laugraud, Aurelie; Brauning, Rudiger; Dodds, Ken G; McEwan, John C
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5887302/ https://www.ncbi.nlm.nih.gov/pubmed/29688382 http://dx.doi.org/10.1093/database/bay029

author	McCulloch, Alan F; Jauregui, Ruy; Maclean, Paul H; Ashby, Rachael L; Moraga, Roger A; Laugraud, Aurelie; Brauning, Rudiger; Dodds, Ken G; McEwan, John C
collection	PubMed
description	Non-semantic approaches to bioinformatic data analysis have potential relevance where semantic resources such as annotated finished reference genomes are lacking, such as in the analysis and utilisation of growing amounts of sequence data from non-model organisms, often associated with sequence-based agricultural, aqua-cultural and environmental sampling studies and commercial services. Even where rich semantic resources are available, semantic approaches to problems such as contrasting and comparing reference assemblies, and utilising multiple references in parallel to avoid reference bias, are costly and difficult to fully automate. We introduce and discuss a non-semantic data representation approach intended mainly for bioinformatic data called non-semantic labelling. Non-semantic labelling involves tensorially combining multiple kinds of model-based entropy-reducing data representation, with multiple representation models, so as to map both data and models into dual metric representation spaces, with goals of both reducing the statistical complexity of the data, and highlighting latent structure via machine learning and statistical analyses conducted within the dual representation spaces. As part of the framework, we introduce a novel algebraic abstraction of data representation mappings, and present four proof-of-concept examples of its application, to problems such as comparing and contrasting sequence assemblies, utilisation of multiple references for annotation and development of quality control diagnostics in a variety of high-throughput sequencing contexts. Database URL: https://github.com/AgResearch/data_prism
format	Online Article Text
id	pubmed-5887302
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5887302 2018-04-11 Database (Oxford) Oxford University Press 2018-04-05 /pmc/articles/PMC5887302/ /pubmed/29688382 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An entropy-reducing data representation approach for bioinformatic data
topic	Original Article
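
The description field in this record summarises an approach that relabels data under several models to reduce its entropy, then places both data and models in "dual" representation spaces. The Python sketch below is only a rough illustration of that general idea, not the authors' implementation (which the record points to at https://github.com/AgResearch/data_prism). The function names (`shannon_entropy`, `kmers`, `label`, `profile`), the use of k-mer sets as stand-in "models", and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def label(seq, k, model):
    """Relabel each k-mer: keep it if the model knows it, else collapse to 'other'."""
    return [km if km in model else "other" for km in kmers(seq, k)]


def profile(seq, k, models):
    """Fraction of the sequence's k-mers recognised by each model."""
    kms = kmers(seq, k)
    return [sum(km in m for km in kms) / len(kms) for m in models]


# Toy data and toy "models" (hypothetical; real models might be reference assemblies).
sequences = {"s1": "ACGTACGTAC", "s2": "AAAAAAAAAA"}
models = {"m1": {"AC", "CG"}, "m2": {"AA"}}

# Rows place each sequence in a space indexed by models ...
rows = {name: profile(seq, 2, list(models.values())) for name, seq in sequences.items()}
# ... and columns (the transpose) place each model in a space indexed by sequences.
cols = dict(zip(models, zip(*rows.values())))

raw = shannon_entropy(kmers(sequences["s1"], 2))
reduced = shannon_entropy(label(sequences["s1"], 2, models["m1"]))
print(f"entropy of s1 2-mers: raw={raw:.3f} bits, labelled={reduced:.3f} bits")
print("distance between sequences:", math.dist(rows["s1"], rows["s2"]))
print("distance between models:", math.dist(cols["m1"], cols["m2"]))
```

Because `label` is a deterministic relabelling, the entropy of the labelled stream can never exceed that of the raw k-mer stream. The same data-by-model matrix is read row-wise to compare sequences and column-wise to compare models, which is the sense of "dual" used in this sketch.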
