Use of ChIP-Seq data for the design of a multiple promoter-alignment method

Bibliographic details
Main authors: Erb, Ionas; González-Vallinas, Juan R.; Bussotti, Giovanni; Blanco, Enrique; Eyras, Eduardo; Notredame, Cédric
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326335/
https://www.ncbi.nlm.nih.gov/pubmed/22230796
http://dx.doi.org/10.1093/nar/gkr1292

Abstract: We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated from alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy of predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only does this have interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
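
To make the idea of a dinucleotide substitution matrix concrete, here is a minimal Python sketch of scoring a pairwise promoter alignment with one. It is an illustration under stated assumptions, not Pro-Coffee's implementation: the 16x16 matrix values, the flat per-column gap penalty, and the helper names `placeholder_matrix` and `dinucleotide_score` are hypothetical. The abstract does not specify how Pro-Coffee combines dinucleotide scores with gaps or with its multiple-alignment procedure, and the real matrix is estimated from TRANSFAC binding-site alignments rather than the placeholder rule used here.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered dinucleotides: AA, AC, ..., TT
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def placeholder_matrix() -> dict[tuple[str, str], int]:
    """Build a HYPOTHETICAL, symmetric 16x16 dinucleotide substitution matrix.

    Pro-Coffee estimates its matrix from alignments of TRANSFAC binding
    sites; the values below are illustrative placeholders only.
    """
    matrix = {}
    for d1 in DINUCLEOTIDES:
        for d2 in DINUCLEOTIDES:
            shared = sum(x == y for x, y in zip(d1, d2))
            # Placeholder scores: +3 identical dinucleotide, +1 one shared base, -2 none
            matrix[(d1, d2)] = {2: 3, 1: 1, 0: -2}[shared]
    return matrix


def dinucleotide_score(a: str, b: str, matrix, gap_penalty: int = -4) -> int:
    """Score a pairwise alignment given as two equal-length strings ('-' = gap).

    Only A, C, G, T and '-' are supported. Every column i > 0 in which both
    column i and column i-1 are gap-free adds the matrix score of the two
    dinucleotides ending at column i. Every column containing a gap adds a
    flat penalty (an ASSUMED, simplified gap model).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    a, b = a.upper(), b.upper()
    score = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == "-" or y == "-":
            score += gap_penalty
        elif i > 0 and a[i - 1] != "-" and b[i - 1] != "-":
            # Dinucleotide ending at column i in each sequence
            score += matrix[(a[i - 1] + x, b[i - 1] + y)]
    return score


if __name__ == "__main__":
    m = placeholder_matrix()
    print(dinucleotide_score("ACGT", "ACGT", m))  # 9
    print(dinucleotide_score("ACGT", "ACCT", m))  # 5
```

Scoring pairs of adjacent columns lets a substitution be penalized differently depending on its neighbouring base, which a single-nucleotide matrix cannot express; substituting an estimated matrix only requires replacing `placeholder_matrix()`. For reference, the 16.5% figure quoted in the abstract follows from (331 - 284) / 284 ≈ 0.165.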

Journal: Nucleic Acids Res (section: Methods Online), 2012-04 issue; published online 2012-01-09.
Copyright: © The Author(s) 2012. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.