Classification of Promoter Sequences from Human Genome

We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potent...

Bibliographic Details
Main Authors: Zaytsev, Konstantin; Fedorov, Alexey; Korotkov, Eugene
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10454140/
https://www.ncbi.nlm.nih.gov/pubmed/37628742
http://dx.doi.org/10.3390/ijms241612561
_version_ 1785096116032765952
author Zaytsev, Konstantin
Fedorov, Alexey
Korotkov, Eugene
collection PubMed
description We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
format Online
Article
Text
id pubmed-10454140
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10454140 2023-08-26 Classification of Promoter Sequences from Human Genome Zaytsev, Konstantin Fedorov, Alexey Korotkov, Eugene Int J Mol Sci Article We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. Every other PPS found intersected with either true promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcript start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes. MDPI 2023-08-08 /pmc/articles/PMC10454140/ /pubmed/37628742 http://dx.doi.org/10.3390/ijms241612561 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Classification of Promoter Sequences from Human Genome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10454140/
https://www.ncbi.nlm.nih.gov/pubmed/37628742
http://dx.doi.org/10.3390/ijms241612561