
Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods

A large number of nucleotide sequences of various pathogens are available in public databases. The growth of the datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one c...


Bibliographic Details
Main authors: Yonezawa, Kouki; Igarashi, Manabu; Ueno, Keisuke; Takada, Ayato; Ito, Kimihito
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3583903/
https://www.ncbi.nlm.nih.gov/pubmed/23460894
http://dx.doi.org/10.1371/journal.pone.0057684
author Yonezawa, Kouki
Igarashi, Manabu
Ueno, Keisuke
Takada, Ayato
Ito, Kimihito
collection PubMed
description A large number of nucleotide sequences of various pathogens are available in public databases. The growth of these datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods to reduce the sampling bias. A novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset was proposed. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials for life sciences, we anticipate that applying our algorithm to various datasets will reduce sampling bias.
format Online
Article
Text
id pubmed-3583903
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3583903 2013-03-04 Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods Yonezawa, Kouki Igarashi, Manabu Ueno, Keisuke Takada, Ayato Ito, Kimihito PLoS One Research Article A large number of nucleotide sequences of various pathogens are available in public databases. The growth of these datasets has resulted in an enormous increase in computational costs. Moreover, due to differences in surveillance activities, the number of sequences found in databases varies from one country to another and from year to year. Therefore, it is important to study resampling methods to reduce the sampling bias. A novel algorithm, called the closest-neighbor trimming method, that resamples a given number of sequences from a large nucleotide sequence dataset was proposed. The performance of the proposed algorithm was compared with other algorithms by using the nucleotide sequences of human H3N2 influenza viruses. We compared the closest-neighbor trimming method with the naive hierarchical clustering algorithm and the k-medoids clustering algorithm. Genetic information accumulated in public databases contains sampling bias. The closest-neighbor trimming method can thin out densely sampled sequences from a given dataset. Since nucleotide sequences are among the most widely used materials for life sciences, we anticipate that applying our algorithm to various datasets will reduce sampling bias. Public Library of Science 2013-02-27 /pmc/articles/PMC3583903/ /pubmed/23460894 http://dx.doi.org/10.1371/journal.pone.0057684 Text en © 2013 Yonezawa et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Resampling Nucleotide Sequences with Closest-Neighbor Trimming and Its Comparison to Other Methods
topic Research Article