NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses
Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, a reduction of available data. We developed a network-based...
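Main Authors: | Leal, Thiago Peixoto; Furlan, Vinicius C; Gouveia, Mateus Henrique; Saraiva Duarte, Julia Maria; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Lima-Costa, Maria Fernanda; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |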
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046962/ https://www.ncbi.nlm.nih.gov/pubmed/35521552 http://dx.doi.org/10.1016/j.csbj.2022.04.009 |
_version_ | 1784695630834171904 |
---|---|
author | Leal, Thiago Peixoto; Furlan, Vinicius C; Gouveia, Mateus Henrique; Saraiva Duarte, Julia Maria; Fonseca, Pablo AS; Tou, Rafael; Scliar, Marilia de Oliveira; Araujo, Gilderlanio Santana de; Costa, Lucas F.; Zolini, Camila; Peixoto, Maria Gabriela Campolina Diniz; Carvalho, Maria Raquel Santos; Lima-Costa, Maria Fernanda; Gilman, Robert H; Tarazona-Santos, Eduardo; Rodrigues, Maíra Ribeiro |
author_sort | Leal, Thiago Peixoto |
collection | PubMed |
description | Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, reducing the available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships in a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset, allowing its application to complex datasets. When compared with two other popular population genetics methodologies (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available, both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for different tests performed in this study. |
format | Online Article Text |
id | pubmed-9046962 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9046962 2022-05-04 NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses Comput Struct Biotechnol J Method Article Research Network of Computational and Structural Biotechnology 2022-04-09 /pmc/articles/PMC9046962/ /pubmed/35521552 http://dx.doi.org/10.1016/j.csbj.2022.04.009 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses |
topic | Method Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046962/ https://www.ncbi.nlm.nih.gov/pubmed/35521552 http://dx.doi.org/10.1016/j.csbj.2022.04.009 |
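
The description above says the method treats relatedness as a network and uses node degree centrality to find highly connected individuals. The sketch below illustrates that general idea only, assuming a simple greedy rule: repeatedly drop the individual with the most remaining relatives until no related pair is left. It is not NAToRA's actual code or heuristics; the function name `prune_related`, the input format, the example data, and the 0.0884 cutoff are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Return the individuals a greedy degree-based pruning would remove.

    pairs: iterable of (id_a, id_b, relatedness) tuples (hypothetical format).
    threshold: pairs at or above this value count as related (assumed cutoff).
    """
    # Build an undirected relatedness network: one edge per related pair.
    neighbors = defaultdict(set)
    for a, b, value in pairs:
        if value >= threshold and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = []
    while True:
        # Individuals that still have at least one relative in the network.
        candidates = [n for n in neighbors if neighbors[n]]
        if not candidates:
            break
        # Highest degree first; ties are broken by the smallest ID.
        worst = min(candidates, key=lambda n: (-len(neighbors[n]), n))
        for other in neighbors[worst]:
            neighbors[other].discard(worst)
        del neighbors[worst]
        removed.append(worst)
    return removed


# Made-up example: A is related to B, C and D; E is related to F;
# G and H fall below the cutoff and are treated as unrelated.
example_pairs = [
    ("A", "B", 0.25),
    ("A", "C", 0.25),
    ("A", "D", 0.25),
    ("E", "F", 0.50),
    ("G", "H", 0.01),
]
removed = prune_related(example_pairs, threshold=0.0884)
print(removed)  # prints ['A', 'E']
```

In this toy network, removing the hub A resolves three related pairs at once, so only two individuals are dropped. A rule that removed one member of each pair without looking at degree could drop up to four (B, C, D and one of E or F). A greedy rule like this is only an approximation and does not guarantee the minimal removal set on every network.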