
Triku: a feature selection method based on nearest neighbors for single-cell data

BACKGROUND: Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the...


Bibliographic Details
Main authors: M Ascensión, Alex, Ibáñez-Solé, Olga, Inza, Iñaki, Izeta, Ander, Araúzo-Bravo, Marcos J
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917514/
https://www.ncbi.nlm.nih.gov/pubmed/35277963
http://dx.doi.org/10.1093/gigascience/giac017
_version_ 1784668562939445248
author M Ascensión, Alex
Ibáñez-Solé, Olga
Inza, Iñaki
Izeta, Ander
Araúzo-Bravo, Marcos J
author_facet M Ascensión, Alex
Ibáñez-Solé, Olga
Inza, Iñaki
Izeta, Ander
Araúzo-Bravo, Marcos J
author_sort M Ascensión, Alex
collection PubMed
description BACKGROUND: Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these feature selection methods biases the genes selected towards highly expressed genes, instead of the genes defining the cell populations of the dataset. RESULTS: Triku is a feature selection method that favors genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the k-nearest neighbor graph. The expression of these genes is higher than the expected expression if the k-cells were chosen at random. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on adjusted Rand index, normalized mutual information, supervised classification, and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes. CONCLUSION: Triku is developed in Python 3 and is available at https://github.com/alexmascension/triku.
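The neighbor-based idea in the description can be illustrated with a minimal sketch. It is not triku's implementation, only a simplified stand-in written under stated assumptions: the k-nearest neighbor graph is built with plain Euclidean distances on the raw count matrix, the "expected" expression is estimated by drawing the other cells of each neighborhood uniformly at random, and each gene's score is the observed minus the expected neighborhood-summed expression, averaged over the cells that express the gene. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of each cell's k nearest cells (Euclidean), the cell itself first."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, -1.0)  # guarantee every cell ranks itself first
    return np.argsort(d2, axis=1)[:, :k]


def mean_neighborhood_sum(X, nn):
    """Per gene: neighborhood-summed expression, averaged over expressing cells."""
    sums = X[nn].sum(axis=1)  # shape (n_cells, n_genes)
    expressing = X > 0
    counts = expressing.sum(axis=0)
    totals = np.where(expressing, sums, 0.0).sum(axis=0)
    return totals / np.maximum(counts, 1)  # genes never expressed get 0


def neighbor_enrichment_scores(X, k=5, n_random=20, seed=0):
    """Observed minus expected neighborhood expression for each gene (assumed score)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    observed = mean_neighborhood_sum(X, knn_indices(X, k))
    expected = np.zeros(X.shape[1])
    for _ in range(n_random):
        # Random "neighborhood": the cell itself plus k - 1 cells drawn
        # uniformly with replacement (a simplification of the null model).
        nn_random = rng.integers(0, n_cells, size=(n_cells, k))
        nn_random[:, 0] = np.arange(n_cells)
        expected += mean_neighborhood_sum(X, nn_random)
    return observed - expected / n_random


def make_toy_counts(seed=0, n_per_group=30):
    """Hypothetical toy data: two cell populations, 6 genes.
    Genes 0 and 1 each mark one population; genes 2-5 are background noise."""
    rng = np.random.default_rng(seed)
    n_cells = 2 * n_per_group
    X = rng.poisson(2.0, size=(n_cells, 6)).astype(float)
    X[:, :2] = 0.0
    X[:n_per_group, 0] = rng.poisson(8.0, size=n_per_group)
    X[n_per_group:, 1] = rng.poisson(8.0, size=n_per_group)
    return X


X = make_toy_counts()
scores = neighbor_enrichment_scores(X, k=5)
print(np.argsort(scores)[::-1])  # gene indices ranked from highest to lowest score
```

In this toy setting the two population-specific genes should receive the largest scores, because the cells expressing them sit next to other expressing cells in the graph, whereas genes expressed everywhere gain little over random neighborhoods.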
format Online
Article
Text
id pubmed-8917514
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-89175142022-03-14 Triku: a feature selection method based on nearest neighbors for single-cell data M Ascensión, Alex Ibáñez-Solé, Olga Inza, Iñaki Izeta, Ander Araúzo-Bravo, Marcos J Gigascience Technical Note BACKGROUND: Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these feature selection methods biases the genes selected towards highly expressed genes, instead of the genes defining the cell populations of the dataset. RESULTS: Triku is a feature selection method that favors genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the k-nearest neighbor graph. The expression of these genes is higher than the expected expression if the k-cells were chosen at random. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on adjusted Rand index, normalized mutual information, supervised classification, and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes. CONCLUSION: Triku is developed in Python 3 and is available at https://github.com/alexmascension/triku. Oxford University Press 2022-03-12 /pmc/articles/PMC8917514/ /pubmed/35277963 http://dx.doi.org/10.1093/gigascience/giac017 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
M Ascensión, Alex
Ibáñez-Solé, Olga
Inza, Iñaki
Izeta, Ander
Araúzo-Bravo, Marcos J
Triku: a feature selection method based on nearest neighbors for single-cell data
title Triku: a feature selection method based on nearest neighbors for single-cell data
title_full Triku: a feature selection method based on nearest neighbors for single-cell data
title_fullStr Triku: a feature selection method based on nearest neighbors for single-cell data
title_full_unstemmed Triku: a feature selection method based on nearest neighbors for single-cell data
title_short Triku: a feature selection method based on nearest neighbors for single-cell data
title_sort triku: a feature selection method based on nearest neighbors for single-cell data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917514/
https://www.ncbi.nlm.nih.gov/pubmed/35277963
http://dx.doi.org/10.1093/gigascience/giac017
work_keys_str_mv AT mascensionalex trikuafeatureselectionmethodbasedonnearestneighborsforsinglecelldata
AT ibanezsoleolga trikuafeatureselectionmethodbasedonnearestneighborsforsinglecelldata
AT inzainaki trikuafeatureselectionmethodbasedonnearestneighborsforsinglecelldata
AT izetaander trikuafeatureselectionmethodbasedonnearestneighborsforsinglecelldata
AT arauzobravomarcosj trikuafeatureselectionmethodbasedonnearestneighborsforsinglecelldata