Use of ChIP-Seq data for the design of a multiple promoter-alignment method
We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a valid...
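As a rough illustration of what a dinucleotide substitution matrix estimated from aligned binding sites could look like, the following sketch computes log-odds scores from pairs of gap-aligned sites. It is not the authors' estimation procedure: the function names, the pseudocount, and the toy sites are all assumptions made for this example, and no TRANSFAC data is used.

```python
import math
from collections import Counter
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotides(seq):
    """Return the overlapping dinucleotides of an aligned sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Estimate a symmetric dinucleotide log-odds matrix (hypothetical helper).

    aligned_pairs: iterable of (site_a, site_b) strings of equal length,
    where '-' marks a gap. Dinucleotides containing a gap or any symbol
    outside ACGT are skipped.
    """
    valid = set(DINUCS)
    counts = Counter()
    for a, b in aligned_pairs:
        for x, y in zip(dinucleotides(a), dinucleotides(b)):
            if x in valid and y in valid:
                # Count both orientations so the matrix is symmetric.
                counts[(x, y)] += 1
                counts[(y, x)] += 1

    total = sum(counts.values()) + pseudocount * len(DINUCS) ** 2
    joint = {
        (x, y): (counts[(x, y)] + pseudocount) / total
        for x in DINUCS
        for y in DINUCS
    }
    marginal = {x: sum(joint[(x, y)] for y in DINUCS) for x in DINUCS}
    # Log-odds of observing x aligned to y versus independent occurrence.
    return {
        (x, y): math.log2(joint[(x, y)] / (marginal[x] * marginal[y]))
        for x in DINUCS
        for y in DINUCS
    }


# Toy aligned sites, invented for illustration only.
toy_sites = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
matrix = log_odds_matrix(toy_sites)
```

With these toy sites, a conserved dinucleotide such as AC aligned to AC scores higher than a pairing that was never observed, such as AC aligned to CG.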
Main authors: | Erb, Ionas; González-Vallinas, Juan R.; Bussotti, Giovanni; Blanco, Enrique; Eyras, Eduardo; Notredame, Cédric |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326335/ https://www.ncbi.nlm.nih.gov/pubmed/22230796 http://dx.doi.org/10.1093/nar/gkr1292 |
_version_ | 1782229519321530368 |
---|---|
author | Erb, Ionas González-Vallinas, Juan R. Bussotti, Giovanni Blanco, Enrique Eyras, Eduardo Notredame, Cédric |
author_facet | Erb, Ionas González-Vallinas, Juan R. Bussotti, Giovanni Blanco, Enrique Eyras, Eduardo Notredame, Cédric |
author_sort | Erb, Ionas |
collection | PubMed |
description | We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments. |
format | Online Article Text |
id | pubmed-3326335 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-33263352012-04-16 Use of ChIP-Seq data for the design of a multiple promoter-alignment method Erb, Ionas González-Vallinas, Juan R. Bussotti, Giovanni Blanco, Enrique Eyras, Eduardo Notredame, Cédric Nucleic Acids Res Methods Online We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments. Oxford University Press 2012-04 2012-01-09 /pmc/articles/PMC3326335/ /pubmed/22230796 http://dx.doi.org/10.1093/nar/gkr1292 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Erb, Ionas González-Vallinas, Juan R. Bussotti, Giovanni Blanco, Enrique Eyras, Eduardo Notredame, Cédric Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title | Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title_full | Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title_fullStr | Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title_full_unstemmed | Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title_short | Use of ChIP-Seq data for the design of a multiple promoter-alignment method |
title_sort | use of chip-seq data for the design of a multiple promoter-alignment method |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326335/ https://www.ncbi.nlm.nih.gov/pubmed/22230796 http://dx.doi.org/10.1093/nar/gkr1292 |
work_keys_str_mv | AT erbionas useofchipseqdataforthedesignofamultiplepromoteralignmentmethod AT gonzalezvallinasjuanr useofchipseqdataforthedesignofamultiplepromoteralignmentmethod AT bussottigiovanni useofchipseqdataforthedesignofamultiplepromoteralignmentmethod AT blancoenrique useofchipseqdataforthedesignofamultiplepromoteralignmentmethod AT eyraseduardo useofchipseqdataforthedesignofamultiplepromoteralignmentmethod AT notredamecedric useofchipseqdataforthedesignofamultiplepromoteralignmentmethod |
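
The figures quoted in the description field can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers stated in the abstract (284, 316 and 331 correctly aligned sites; 73.5%, 77.6% and 80.4% accuracy). The helper name `relative_gain` is an assumption introduced for this example.

```python
def relative_gain(baseline, value):
    """Percentage by which value exceeds baseline."""
    return (value - baseline) / baseline * 100.0


# Average numbers of correctly aligned binding sites, as stated in the abstract.
default_sites = 284
trained_sites = 316
procoffee_sites = 331

# Pro-Coffee versus the default-method average.
procoffee_vs_default = round(relative_gain(default_sites, procoffee_sites), 1)
# Trained methods versus the default-method average.
trained_vs_default = round(relative_gain(default_sites, trained_sites), 1)

# Ortholog-classification accuracies (percent), as stated in the abstract.
default_acc, trained_acc, procoffee_acc = 73.5, 77.6, 80.4
# Differences in percentage points.
gain_over_default = round(procoffee_acc - default_acc, 1)
gain_over_trained = round(procoffee_acc - trained_acc, 1)

print(procoffee_vs_default, trained_vs_default)
print(gain_over_default, gain_over_trained)
```

The first value reproduces the 16.5% improvement over the default average reported in the abstract. The remaining values are derived here and do not appear in the record itself.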