Population-based change-point detection for the identification of homozygosity islands
MOTIVATION: This work is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population. Our method directly tackles the identification of homozygosity islands at the population level, without the need to analyse single individuals and then c...
Main Authors: | Prates, Lucas, Lemes, Renan B, Hünemeier, Tábita, Leonardi, Florencia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112956/ https://www.ncbi.nlm.nih.gov/pubmed/37039826 http://dx.doi.org/10.1093/bioinformatics/btad170 |
_version_ | 1785027724707889152 |
---|---|
author | Prates, Lucas Lemes, Renan B Hünemeier, Tábita Leonardi, Florencia |
author_facet | Prates, Lucas Lemes, Renan B Hünemeier, Tábita Leonardi, Florencia |
author_sort | Prates, Lucas |
collection | PubMed |
description | MOTIVATION: This work is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population. Our method directly tackles the identification of homozygosity islands at the population level, without the need to analyse single individuals and then combine the results, as is currently done in state-of-the-art approaches. RESULTS: We propose regularized offline change-point methods to detect changes in the parameters of a multidimensional distribution when we have several aligned, independent samples of fixed resolution. We present a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast binary segmentation algorithm. Both estimators are shown to converge almost surely to the set of change-points without the need to specify the number of change-points a priori. In simulations, we observed similar performance from the exact and greedy estimators. Moreover, we provide a new methodology for the selection of the regularization constant, which has the advantage of being automatic, consistent, and less prone to subjective analysis. AVAILABILITY AND IMPLEMENTATION: The data used in the application are from the Human Genome Diversity Project (HGDP) and are publicly available. Algorithms were implemented in the R software (R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing, 2020) as the R package blockcpd, available at https://github.com/Lucas-Prates/blockcpd. |
format | Online Article Text |
id | pubmed-10112956 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10112956 2023-04-19 Population-based change-point detection for the identification of homozygosity islands Prates, Lucas Lemes, Renan B Hünemeier, Tábita Leonardi, Florencia Bioinformatics Original Paper MOTIVATION: This work is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population. Our method directly tackles the identification of homozygosity islands at the population level, without the need to analyse single individuals and then combine the results, as is currently done in state-of-the-art approaches. RESULTS: We propose regularized offline change-point methods to detect changes in the parameters of a multidimensional distribution when we have several aligned, independent samples of fixed resolution. We present a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast binary segmentation algorithm. Both estimators are shown to converge almost surely to the set of change-points without the need to specify the number of change-points a priori. In simulations, we observed similar performance from the exact and greedy estimators. Moreover, we provide a new methodology for the selection of the regularization constant, which has the advantage of being automatic, consistent, and less prone to subjective analysis. AVAILABILITY AND IMPLEMENTATION: The data used in the application are from the Human Genome Diversity Project (HGDP) and are publicly available. Algorithms were implemented in the R software (R Core Team. R: A Language and Environment for Statistical Computing. Vienna (Austria): R Foundation for Statistical Computing, 2020) as the R package blockcpd, available at https://github.com/Lucas-Prates/blockcpd. Oxford University Press 2023-04-11 /pmc/articles/PMC10112956/ /pubmed/37039826 http://dx.doi.org/10.1093/bioinformatics/btad170 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Prates, Lucas Lemes, Renan B Hünemeier, Tábita Leonardi, Florencia Population-based change-point detection for the identification of homozygosity islands |
title | Population-based change-point detection for the identification of homozygosity islands |
title_full | Population-based change-point detection for the identification of homozygosity islands |
title_fullStr | Population-based change-point detection for the identification of homozygosity islands |
title_full_unstemmed | Population-based change-point detection for the identification of homozygosity islands |
title_short | Population-based change-point detection for the identification of homozygosity islands |
title_sort | population-based change-point detection for the identification of homozygosity islands |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10112956/ https://www.ncbi.nlm.nih.gov/pubmed/37039826 http://dx.doi.org/10.1093/bioinformatics/btad170 |
work_keys_str_mv | AT prateslucas populationbasedchangepointdetectionfortheidentificationofhomozygosityislands AT lemesrenanb populationbasedchangepointdetectionfortheidentificationofhomozygosityislands AT hunemeiertabita populationbasedchangepointdetectionfortheidentificationofhomozygosityislands AT leonardiflorencia populationbasedchangepointdetectionfortheidentificationofhomozygosityislands |
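
The description above mentions a penalized maximum likelihood estimator computed by dynamic programming over several aligned, independent samples. The following is a minimal illustrative sketch of that general idea, not the blockcpd implementation or its API. It assumes binary data (for example, 1 for a homozygous marker), a Bernoulli model with one parameter per block, and a constant penalty added for each block. The names `bernoulli_cost`, `detect_change_points`, and `penalty` are hypothetical and do not come from the paper or the package.

```python
import math


def bernoulli_cost(ones, total):
    """Negative log-likelihood of a block evaluated at its MLE p = ones / total."""
    if ones == 0 or ones == total:
        return 0.0  # a constant block is fitted perfectly
    p = ones / total
    return -(ones * math.log(p) + (total - ones) * math.log(1 - p))


def detect_change_points(samples, penalty):
    """Exact penalized segmentation by dynamic programming.

    samples: list of equal-length 0/1 sequences (aligned, independent samples).
    penalty: assumed constant cost charged for every block.
    Returns the sorted positions where a new block starts (excluding 0).
    """
    n = len(samples)
    m = len(samples[0])

    # prefix[j] = number of ones in columns 0..j-1, pooled over all samples
    prefix = [0] * (m + 1)
    for j in range(m):
        prefix[j + 1] = prefix[j] + sum(row[j] for row in samples)

    best = [0.0] + [math.inf] * m  # best[e] = optimal penalized cost of columns 0..e-1
    prev = [0] * (m + 1)           # prev[e] = start of the last block in that optimum

    for end in range(1, m + 1):
        for start in range(end):
            ones = prefix[end] - prefix[start]
            total = n * (end - start)
            cand = best[start] + bernoulli_cost(ones, total) + penalty
            if cand < best[end]:
                best[end] = cand
                prev[end] = start

    # Walk back through the stored block starts to recover the change-points
    change_points = []
    pos = m
    while pos > 0:
        pos = prev[pos]
        if pos > 0:
            change_points.append(pos)
    return sorted(change_points)


if __name__ == "__main__":
    # Four aligned samples: five columns of zeros followed by five columns of ones
    data = [[0] * 5 + [1] * 5 for _ in range(4)]
    print(detect_change_points(data, penalty=1.0))
```

Pooling the columns across samples before the search is what makes the detection population-based in this sketch: a single set of change-points is estimated for all individuals at once. A larger `penalty` yields fewer blocks; the automatic selection of the regularization constant described in the abstract is not reproduced here.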