Statistical Approaches to Use a Model Organism for Regulatory Sequences Annotation of Newly Sequenced Species
Main authors: | Liò, Pietro; Angelini, Claudia; De Feis, Italia; Nguyen, Viet-Anh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439465/ https://www.ncbi.nlm.nih.gov/pubmed/22984403 http://dx.doi.org/10.1371/journal.pone.0042489 |
author | Liò, Pietro; Angelini, Claudia; De Feis, Italia; Nguyen, Viet-Anh |
---|---|
collection | PubMed |
description | A major goal of bioinformatics is the characterization of transcription factors and the transcriptional programs they regulate. Given the speed of genome sequencing, we would like to quickly annotate regulatory sequences in newly sequenced genomes. In such cases, it would be helpful to predict sequence motifs by using experimental data from closely related model organisms. Here we present a general algorithm that identifies transcription factor binding sites in a newly sequenced species by performing Bayesian regression on the annotated species. First we establish the rationale of our method by applying it within a single species; then we extend it to use data available in closely related species. Finally, we generalise the method to handle the case in which a number of experiments are available from several species close to the species on which inference is to be made. To show the performance of the method, we analyse three functionally related networks in the Ascomycota. Two gene network case studies are related to the G2/M phase of the Ascomycota cell cycle; the third is related to morphogenesis. We also compare the method with MatrixReduce and discuss other types of validation and tests. The first network is well known and provides a biological validation test of the method. The two cell cycle case studies, where the gene network size is conserved, demonstrate the method's utility in annotating new species sequences using all the available replicates from model species. The third case, where the gene network size varies among species, shows that combining information is less powerful but still informative. Our methodology is quite general and could be extended to integrate other high-throughput data from model organisms. |
format | Online Article Text |
id | pubmed-3439465 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
rights | © 2012 Liò et al. Published online 2012-09-11. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Statistical Approaches to Use a Model Organism for Regulatory Sequences Annotation of Newly Sequenced Species |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439465/ https://www.ncbi.nlm.nih.gov/pubmed/22984403 http://dx.doi.org/10.1371/journal.pone.0042489 |