Population-based change-point detection for the identification of homozygosity islands

Bibliographic details
Main authors: Prates, Lucas; Lemes, Renan B; Hünemeier, Tábita; Leonardi, Florencia
Format: Online article, text
Language: English
Published: Oxford University Press, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112956/
https://www.ncbi.nlm.nih.gov/pubmed/37039826
http://dx.doi.org/10.1093/bioinformatics/btad170
Collection: PubMed
Abstract: MOTIVATION: This work is motivated by the problem of identifying homozygosity islands in the genomes of individuals in a population. Our method tackles the identification of homozygosity islands directly at the population level, without the need to analyse individuals separately and then combine the results, as current state-of-the-art approaches do. RESULTS: We propose regularized offline change-point methods to detect changes in the parameters of a multidimensional distribution when several aligned, independent samples of fixed resolution are available. We present a penalized maximum likelihood approach that can be computed efficiently by a dynamic programming algorithm or approximated by a fast binary segmentation algorithm. Both estimators are shown to converge almost surely to the set of change-points without the need to specify the number of change-points a priori. In simulations, we observed similar performance from the exact and greedy estimators. Moreover, we provide a new methodology for selecting the regularization constant that has the advantage of being automatic, consistent, and less prone to subjective analysis. AVAILABILITY AND IMPLEMENTATION: The data used in the application are from the Human Genome Diversity Project (HGDP) and are publicly available. The algorithms were implemented in R (R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2020) in the R package blockcpd, available at https://github.com/Lucas-Prates/blockcpd.
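The two estimators named in the abstract can be illustrated with a minimal Python sketch. It is not the blockcpd implementation and does not reproduce the paper's model exactly; it assumes binary data (1 = homozygous at a marker) from several aligned, independent individuals, a single Bernoulli parameter per block shared by every individual and position in that block, and a hypothetical fixed penalty `lam` charged per block. `dp_changepoints` computes the exact penalized maximum-likelihood segmentation by dynamic programming (optimal partitioning), and `binseg_changepoints` approximates it greedily by binary segmentation. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical data layout: x[i][j] = 1 if individual i is homozygous at marker j, else 0.
Matrix = Sequence[Sequence[int]]


def block_cost(s: int, n: int) -> float:
    """Negative maximized Bernoulli log-likelihood of a block with n entries, s of them 1."""
    if s == 0 or s == n:
        return 0.0  # the MLE is 0 or 1, so the block's likelihood is exactly 1
    p = s / n
    return -(s * math.log(p) + (n - s) * math.log(1.0 - p))


def _segment_cost(x: Matrix) -> Tuple[Callable[[int, int], float], int]:
    """Return cost(a, b) for markers a..b-1, pooled over all aligned samples."""
    n_samples, m = len(x), len(x[0])
    prefix = [0] * (m + 1)  # prefix[j] = number of 1s in columns 0..j-1
    for j in range(m):
        prefix[j + 1] = prefix[j] + sum(row[j] for row in x)

    def cost(a: int, b: int) -> float:
        return block_cost(prefix[b] - prefix[a], n_samples * (b - a))

    return cost, m


def dp_changepoints(x: Matrix, lam: float) -> List[int]:
    """Exact minimizer of sum(block costs) + lam * (number of blocks), O(m^2) evaluations."""
    cost, m = _segment_cost(x)
    best = [0.0] + [math.inf] * m  # best[b]: optimal penalized cost of markers 0..b-1
    last = [0] * (m + 1)           # last[b]: start of the final block in that optimum
    for b in range(1, m + 1):
        for a in range(b):
            value = best[a] + cost(a, b) + lam
            if value < best[b]:
                best[b], last[b] = value, a
    cps, b = [], m
    while b > 0:                   # backtrack through the block starts
        b = last[b]
        if b > 0:
            cps.append(b)
    return sorted(cps)


def binseg_changepoints(x: Matrix, lam: float) -> List[int]:
    """Greedy approximation: split a segment while its best split beats the penalty lam."""
    cost, m = _segment_cost(x)
    cps, stack = [], [(0, m)]
    while stack:
        a, b = stack.pop()
        if b - a < 2:
            continue
        t = min(range(a + 1, b), key=lambda s: cost(a, s) + cost(s, b))
        gain = cost(a, b) - cost(a, t) - cost(t, b)
        if gain > lam:             # one extra block costs lam
            cps.append(t)
            stack.extend([(a, t), (t, b)])
    return sorted(cps)


# Toy example: 5 individuals x 30 markers, one shared homozygous run at markers 10..19.
island = [0] * 10 + [1] * 10 + [0] * 10
x = [list(island) for _ in range(5)]
print(dp_changepoints(x, lam=5.0))      # [10, 20]
print(binseg_changepoints(x, lam=5.0))  # [10, 20]
```

Both functions charge the same penalty per block, so they agree on this clean toy input; on noisy data the greedy splits can differ from the exact optimum. Here `lam` is chosen by hand, whereas the abstract describes an automatic, consistent selection procedure that this sketch does not implement.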
Record ID: pubmed-10112956
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-10112956 (2023-04-19)
Journal: Bioinformatics (Original Paper)
Publication: Oxford University Press, 2023-04-11
Copyright: © The Author(s) 2023. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Original Paper