
NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, a reduction of available data. We developed a network-based...


Bibliographic Details
Main authors: Leal, Thiago Peixoto, Furlan, Vinicius C, Gouveia, Mateus Henrique, Saraiva Duarte, Julia Maria, Fonseca, Pablo AS, Tou, Rafael, Scliar, Marilia de Oliveira, Araujo, Gilderlanio Santana de, Costa, Lucas F., Zolini, Camila, Peixoto, Maria Gabriela Campolina Diniz, Carvalho, Maria Raquel Santos, Lima-Costa, Maria Fernanda, Gilman, Robert H, Tarazona-Santos, Eduardo, Rodrigues, Maíra Ribeiro
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9046962/
https://www.ncbi.nlm.nih.gov/pubmed/35521552
http://dx.doi.org/10.1016/j.csbj.2022.04.009
author Leal, Thiago Peixoto
Furlan, Vinicius C
Gouveia, Mateus Henrique
Saraiva Duarte, Julia Maria
Fonseca, Pablo AS
Tou, Rafael
Scliar, Marilia de Oliveira
Araujo, Gilderlanio Santana de
Costa, Lucas F.
Zolini, Camila
Peixoto, Maria Gabriela Campolina Diniz
Carvalho, Maria Raquel Santos
Lima-Costa, Maria Fernanda
Gilman, Robert H
Tarazona-Santos, Eduardo
Rodrigues, Maíra Ribeiro
author_sort Leal, Thiago Peixoto
collection PubMed
description Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, solutions involve removing related individuals (or observations) and, consequently, a reduction of available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships in a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to complex datasets. When compared with two other popular population genetics methodologies (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available, both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for different tests performed in this study.
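The degree-centrality idea in the description can be illustrated with a small sketch. This is not NAToRA's code or its actual heuristics, which this record does not detail. It is a plain greedy approximation that assumes relatedness arrives as (individual, individual, value) triples and that a pair counts as related when its value meets a cutoff. The function name `prune_related`, the sample IDs, and the 0.1 cutoff are all hypothetical.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy sketch: return the individuals to remove, in removal order.

    pairs     -- iterable of (id_a, id_b, relatedness_value) triples (assumed format)
    threshold -- hypothetical cutoff; pairs at or above it count as related
    """
    # Build an undirected relatedness network: one edge per related pair.
    neighbors = defaultdict(set)
    for a, b, value in pairs:
        if a != b and value >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    removed = []
    while True:
        # Degree centrality here is simply the number of remaining relatives.
        degrees = {node: len(adj) for node, adj in neighbors.items() if adj}
        if not degrees:
            break  # no related pair is left
        # Remove the most connected individual; ties are broken by ID
        # so that the result is deterministic.
        target = min(degrees, key=lambda node: (-degrees[node], node))
        for other in neighbors[target]:
            neighbors[other].discard(target)
        del neighbors[target]
        removed.append(target)
    return removed


# Hypothetical toy data: A is related to B, C and D; E is related to F.
example_pairs = [
    ("A", "B", 0.25),
    ("A", "C", 0.25),
    ("A", "D", 0.12),
    ("E", "F", 0.50),
    ("G", "H", 0.01),  # below the cutoff, so not treated as related
]
print(prune_related(example_pairs, threshold=0.1))  # ['A', 'E']
```

Removing the highest-degree node first clears three related pairs by dropping only A, where removing B, C and D would cost three individuals. A greedy pass like this is not guaranteed to find the smallest possible removal set on every network, which is why the description speaks of heuristics that approximate the minimal reduction.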
format Online
Article
Text
id pubmed-9046962
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9046962 2022-05-04 Comput Struct Biotechnol J Method Article
Research Network of Computational and Structural Biotechnology 2022-04-09 /pmc/articles/PMC9046962/ /pubmed/35521552 http://dx.doi.org/10.1016/j.csbj.2022.04.009 Text en © 2022 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses
topic Method Article